AI Cyber Capability Is Doubling Every 4.5 Months — Your Enterprise Risk Framework Just Broke

The UK AI Security Institute findings on Mythos and GPT-5.5 should scare every CISO, CIO, and board director into action. But here's the thing — I don't think they will. Not yet. And that's the problem.


The Numbers

Claude Mythos Preview and GPT-5.5 have shattered every trend line AISI was tracking. The cyber time horizon, a proxy for the length of task an AI can complete autonomously with 80% reliability, has gone from doubling every 8 months (Nov 2025) to doubling roughly every 4.5 months today. And Mythos and GPT-5.5 have outperformed even that accelerated curve.
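
To make the difference between those two doubling times concrete, here is a rough back-of-the-envelope sketch. It assumes clean exponential growth at a fixed doubling time, which is a simplification of my own and not something AISI claims; the function name is just illustrative.

```python
def growth_factor(elapsed_months: float, doubling_months: float) -> float:
    """Multiplicative growth after elapsed_months, assuming a constant doubling time."""
    return 2 ** (elapsed_months / doubling_months)


# Compare one annual review cycle (12 months) under each doubling time.
old_pace = growth_factor(12, 8.0)   # doubling every 8 months
new_pace = growth_factor(12, 4.5)   # doubling every 4.5 months

print(f"8-month doubling over 12 months:   {old_pace:.1f}x")   # 2.8x
print(f"4.5-month doubling over 12 months: {new_pace:.1f}x")   # 6.3x
```

Under that assumption, a single annual review cycle spans roughly a 6x jump in task horizon instead of a little under 3x.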

🧪 Claude Mythos Preview became the first model ever to complete both of AISI's cyber ranges, including "The Last Ones," a 32-step simulated corporate network attack, which it solved in 6 out of 10 attempts.

🧪 GPT-5.5 solved it in 3 of 10.

🛡️ Palo Alto Networks independently confirmed: "The latest models are extraordinarily capable at finding vulnerabilities and changing them into critical exploit paths in near-real-time."

Ethan Mollick's Point

Wharton professor Ethan Mollick, who shared the findings, hit the nail on the head:

"Most enterprise incident playbooks were written when the threat landscape shifted on a 12-18 month cycle. A 4.5-month doubling means your decision rights matrix for AI-assisted cyber defense is obsolete before the ink dries."
"The question is no longer 'how capable are these models' but 'who in your org has the authority to update defensive posture every 4.5 months, and what triggers that update.'"

My Take — From Inside the Enterprise Machine

I work at SAP. I see how large enterprises think about risk, governance, and technology adoption. And here's what I know: most organizations are still treating AI as a vendor evaluation problem. Which model to pick. Which platform to standardize on. Which partner to call.

That framing is already wrong.

The AISI data makes it brutally clear: this isn't about choosing between Mythos and GPT-5.5. The gap that matters isn't between frontier models — it's the widening chasm between what these models can do and what enterprise organizations are structurally equipped to handle.

A 4.5-month capability doubling time doesn't just challenge your risk framework. It makes a mockery of it. Annual review cycles? Threat modeling as a quarterly artifact? Decision rights that need three levels of sign-off? All of it assumes a world that stopped existing sometime in late 2024.

The practical implication for anyone running enterprise tech:

  1. Your incident playbook was written for a different era. If you haven't looked at it since last quarter, it's stale.
  2. The decision rights question is real. Who in your org can update defensive posture without waiting for a committee? If the answer is "nobody," you have a problem.
  3. Instrument your own environment. Mollick's advice: probe your systems on the same cadence as capability advances — 4.5 months. Don't discover your exposure from a headline.
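
One way to turn the cadence point into something checkable is a small staleness report over your own governance artifacts. This is a minimal sketch under my own assumptions: the artifact names and dates are hypothetical, and 4.5 months is approximated as 137 days.

```python
from datetime import date, timedelta

# Assumption: 4.5 months approximated as 137 days (4.5 * ~30.44).
REVIEW_CADENCE = timedelta(days=137)


def stale_artifacts(last_reviewed: dict[str, date], today: date) -> list[str]:
    """Return the names of artifacts whose last review is older than the cadence."""
    return sorted(
        name
        for name, reviewed in last_reviewed.items()
        if today - reviewed > REVIEW_CADENCE
    )


# Hypothetical inventory; replace with your own artifacts and review dates.
inventory = {
    "incident-playbook": date(2025, 6, 1),
    "threat-model": date(2026, 1, 15),
    "decision-rights-matrix": date(2025, 3, 10),
}

print(stale_artifacts(inventory, today=date(2026, 3, 1)))
# ['decision-rights-matrix', 'incident-playbook']
```

The useful part is not the script but the forcing function: if the list is never empty, nobody actually owns the update.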

The Operating System Problem

This is the part that keeps me up at night.

We're trying to layer frontier intelligence onto legacy workflows, approval paths, and decision structures that were designed for a completely different pace of change. The bottleneck isn't the AI. It's the organizational wiring around it.

The leaders who win here won't just deploy better tools. They'll redesign how decisions get made, how fast defensive posture can shift, and who gets to pull the trigger. That's not a technology problem. It's an operating system problem. And most enterprises haven't even started thinking about it.
