Frontier Agents Cut Both Ways - Opus 4.8, Dynamic Workflows, and the First In-the-Wild LLM-Agent Intrusion

The week frontier models learned to run a thousand parallel subagents is the same week someone pointed one at a network and dumped a database in under an hour. Notes for anyone shipping agents to production.

Tags: agentic-ai, llm-agent, attack-surface, owasp-agentic, supply-chain

Two stories crossed our desk this month. One was a launch: Opus 4.8 plus a feature that lets a single session spin up a thousand subagents and grind through a codebase migration on its own. The other was an incident writeup: a threat actor let an LLM drive the whole post-exploitation phase and went from a public CVE to a dumped database in under an hour. Different events, no connection between them, and we are going to keep them apart on purpose. But they are two readings of the same dial, and if you are putting agents into production the second reading is the one that should keep you up at night.

We had the launch post open in one tab and the Sysdig writeup in another, and the contrast did not feel like a coincidence so much as a schedule finally arriving.

Keep the two things apart

This is the part where a worse version of this article quietly implies that the shiny new model and the breach are the same story. They are not, and we should say so before going further.

On 28 May, Anthropic shipped Claude Opus 4.8. For engineering teams the interesting bit is that it holds a long task together better than anything before it, and Anthropic's own numbers put it around four times less likely than the previous model to leave a flaw in its own generated code unflagged. It landed with Dynamic Workflows: Claude writes a JavaScript orchestration script, a runtime fans the work out across subagents (sixteen at once, a thousand to a run), they each take a swing at the problem from a different angle and then argue each other's answers down until something survives. The demo everyone quotes is a migration across a few hundred thousand lines, kickoff to merge, graded by your own test suite.

Now the other tab. On 10 May the Sysdig threat-research team caught an intrusion where the post-exploitation was run by an LLM agent, not a person. Nobody named the model. Nothing in the reporting points a finger at any vendor, and we are not going to invent one. What got documented is the behaviour, and the behaviour is the whole point.

So the claim here is the dull one, not the spicy one: the thing that makes an agent a good migration engineer - plan, act, read the result, adjust, repeat, no human in the loop - does not care what it is pointed at. Refactor a monorepo or walk a subnet, it is the same loop. People argued for two years about whether attackers would actually wire this up. As of this month the argument has a case number attached.

Read it as an incident, because that is what it is

Forget that an agent was driving for a second and just read the chain. This is the part a CTO can actually do something about.

It started with a marimo notebook server sitting on the public internet, unpatched against CVE-2026-39987 - a bug that hands you a shell off a single WebSocket request. No phishing. No hoarded zero-day. A dev tool someone exposed and forgot.

From there the agent read cloud credentials straight out of environment files and the local AWS credential store, and used them to pull an SSH private key out of Secrets Manager. That key got it eight parallel SSH sessions onto a bastion host. From the bastion it dumped an internal Postgres database in full. The dump took about two minutes. The whole thing, notebook to exfil, ran in under an hour.

A couple of details are worth slowing down on, because they are what separate this from a script someone ran.

The commands were written on the fly. This was not a playbook replayed at a target; it was a decision loop reacting to whatever each host coughed up. And the egress was deliberately built to slip detection - Cloudflare Workers used as a throwaway egress pool, twelve cloud API calls spread across eleven different IPs inside twenty-two seconds. Per-source rate limits and IP reputation never saw enough from any one address to care.
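The reason per-source limits miss this pattern is easier to see with numbers. Below is a minimal sketch, not a description of Sysdig's or anyone's detection logic: it groups calls by credential instead of by source IP and flags any credential seen from too many addresses inside a short window. The window length, the threshold, the function name and the synthetic events are all our assumptions; only the shape of the burst (twelve calls, eleven IPs, twenty-two seconds) comes from the writeup.

```python
from collections import defaultdict

WINDOW_SECONDS = 30   # assumed correlation window
MAX_DISTINCT_IPS = 3  # assumed ceiling of source IPs per credential per window


def flag_rotating_egress(events, window=WINDOW_SECONDS, max_ips=MAX_DISTINCT_IPS):
    """Return credential ids used from more than max_ips source IPs in any window.

    events: iterable of (timestamp_seconds, credential_id, source_ip) tuples.
    """
    by_credential = defaultdict(list)
    for ts, cred, ip in events:
        by_credential[cred].append((ts, ip))

    flagged = set()
    for cred, calls in by_credential.items():
        calls.sort()
        start = 0
        for end in range(len(calls)):
            # Slide the left edge forward until the window spans <= `window` seconds.
            while calls[end][0] - calls[start][0] > window:
                start += 1
            distinct_ips = {ip for _, ip in calls[start:end + 1]}
            if len(distinct_ips) > max_ips:
                flagged.add(cred)
                break
    return flagged


# Synthetic events shaped like the reported burst: 12 calls, 11 IPs, 22 seconds.
burst = [(i * 2, "key-A", f"203.0.113.{i % 11}") for i in range(12)]
# A well-behaved workload: same call rate, one stable source address.
steady = [(i * 2, "key-B", "198.51.100.7") for i in range(12)]

print(flag_rotating_egress(burst + steady))
```

No single address in the synthetic burst makes more than two calls, so a per-IP counter stays quiet. Keyed on the credential, the same traffic stands out immediately.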

That last bit is the one we keep circling back to in customer reviews. Almost everyone's detection still quietly assumes the attacker is either a human at a keyboard or a script coming from somewhere stable. An agent that rotates its egress every request and rewrites its next move based on the last response is neither, and it falls through the gap between the two.

What actually changes

Three things shift the moment the other side's post-exploitation is agent-driven, and each one breaks an assumption something in your stack is leaning on.

The response window basically closes. Four pivots in under an hour, the database gone in minutes - that is shorter than most "alert fires, human looks, human decides" loops run. So the control that matters quietly moves away from detect-and-respond and toward limiting how far any one compromised box can reach in the first place. Less satisfying, more effective.
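"How far can one compromised box reach" is a graph question, and it can be asked before an incident, not after. Here is a minimal sketch under our own assumptions: the asset names are hypothetical labels for the steps in the chain above, and an edge means a foothold on one asset can reach or unlock the next. A breadth-first walk gives the blast radius; removing one edge shows what a single control buys.

```python
from collections import deque

# Hypothetical asset graph: A -> B means "a foothold on A can reach or unlock B".
REACH = {
    "notebook": ["aws-credentials"],
    "aws-credentials": ["secrets-manager"],
    "secrets-manager": ["ssh-key"],
    "ssh-key": ["bastion"],
    "bastion": ["postgres"],
    "postgres": [],
}


def blast_radius(graph, start):
    """Return every asset reachable from `start`, excluding `start` itself."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {start}


def without_edge(graph, src, dst):
    """Return a copy of the graph with the single edge src -> dst removed."""
    return {
        node: [t for t in targets if not (node == src and t == dst)]
        for node, targets in graph.items()
    }


print(sorted(blast_radius(REACH, "notebook")))

# Same question after credentials are no longer readable from the notebook host.
hardened = without_edge(REACH, "notebook", "aws-credentials")
print(sorted(blast_radius(hardened, "notebook")))
```

In this toy graph, cutting the first edge empties the blast radius. Real estates have more than one path, which is exactly why the walk is worth automating.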

The attack stops looking like an attack. Microsoft's team spent this month writing up prompt-to-shell RCE in agent frameworks, and the Semantic Kernel pair (CVE-2026-25592 and CVE-2026-26030, patched in semantic-kernel 1.39.4) makes the same uncomfortable point from the framework angle: the road from injected text to running code goes straight through components your own developers added for perfectly good reasons. On the wire it reads like normal tool use.

And the surface you shipped is the surface they inherit. If your team is running agents with broad tool access, cloud creds sitting in env files, and a notebook reachable from the internet, you have already built the room the Sysdig attacker walked into. It did not need a clever exploit. It needed your defaults, and most shops hand them over for free.

The part nobody enjoys

Here is the symmetry that makes this awkward. The exact Opus-4.8-class capability that lets your platform team clear a migration over a weekend is the capability that, aimed the other way, runs the chain above. You do not get to take the upside and decline the threat model that comes stapled to it. Once agentic coding tools are in your CI - and after the last few months, be honest, they are - your security posture has to assume the other side is holding the same kind of tool you are.

That gap is the whole reason continuous AI security posture management is a category and not a feature. Not because a posture platform tackles a live agent mid-pivot - it does not, and anyone telling you otherwise is selling something - but because the controls that decide the blast radius are testable, and in an agent-era threat model "we configured that correctly months ago" is a sentence that needs re-checking on a schedule, not a thing you write in an architecture doc and trust forever.

What we would test before the next agent ships

None of this is exotic. All of it is the difference between the Sysdig chain working and the Sysdig chain stalling on pivot two.

Start with inventory, because you cannot constrain what you have not written down, and this is the line that comes back incomplete on nearly every review we run. An agent that holds shell plus cloud-credential read plus outbound network is three of the four pivots in the Sysdig chain living in one asset. Know where those are.
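As a rough illustration of what that inventory check can look like, here is a sketch that flags any agent holding all three capabilities at once. The record shape, the capability labels and the agent names are hypothetical; a real inventory would pull them from wherever agent configs live.

```python
from dataclasses import dataclass, field

# Assumed capability labels; holding all three is the combination called out above.
RISKY_COMBINATION = frozenset({"shell", "cloud_credential_read", "outbound_network"})


@dataclass
class AgentRecord:
    name: str
    capabilities: set = field(default_factory=set)


def high_risk_agents(inventory):
    """Return names of agents whose capabilities include the full risky combination."""
    return [agent.name for agent in inventory if RISKY_COMBINATION <= agent.capabilities]


# Hypothetical inventory entries.
inventory = [
    AgentRecord("triage-bot", {"label_issue", "comment"}),
    AgentRecord("coding-agent", {"shell", "outbound_network"}),
    AgentRecord("ops-agent", {"shell", "cloud_credential_read", "outbound_network"}),
]

print(high_risk_agents(inventory))
```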

Then take the toolset away and hand it back one capability at a time. A triage bot needs to label and comment; it does not need Bash. A coding agent needs its working tree; it does not need the AWS credential store. Every tool an agent carries is a tool a compromised agent carries, and the default configs are generous in exactly the wrong direction.
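Handing capabilities back one at a time amounts to a default-deny check in front of every tool call. The sketch below is one way to express it, assuming a simple in-process dispatcher; the agent names, the tool names and the `authorize` helper are ours, not any framework's API.

```python
# Hypothetical per-agent allowlists: anything not listed is denied.
ALLOWLISTS = {
    "triage-bot": {"label_issue", "comment"},
    "coding-agent": {"read_file", "write_file", "run_tests"},
}


class ToolNotPermitted(Exception):
    """Raised when an agent requests a tool outside its allowlist."""


def authorize(agent, tool, allowlists=ALLOWLISTS):
    """Return True if `agent` may call `tool`; raise ToolNotPermitted otherwise.

    Unknown agents get an empty allowlist, so the default is deny.
    """
    allowed = allowlists.get(agent, set())
    if tool not in allowed:
        raise ToolNotPermitted(f"{agent} may not call {tool}")
    return True


print(authorize("triage-bot", "comment"))

try:
    authorize("triage-bot", "bash")
except ToolNotPermitted as err:
    print(f"denied: {err}")
```

The design choice that matters is the empty default: an agent nobody remembered to configure gets nothing, not everything.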

Get credentials out of the environment. Pivots two and three were cloud creds in an env file and a reachable Secrets Manager key. Short-lived, scoped, workload-identity issuance turns a harvested credential into a dead end instead of a door.
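A cheap first step is finding out where static credentials already sit. The sketch below scans env-file text for long-lived AWS access key IDs (the `AKIA` prefix, as opposed to the `ASIA` prefix on temporary ones) and for a hard-coded `AWS_SECRET_ACCESS_KEY`. The function name and sample file are ours, the key values are the placeholder examples from AWS documentation, and a real scan would cover far more secret formats than these two patterns.

```python
import re

# Long-lived IAM access key ids start with AKIA followed by 16 uppercase alphanumerics.
LONG_LIVED_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")
# A secret access key assigned directly in the file, with or without `export`.
SECRET_VAR = re.compile(r"^\s*(?:export\s+)?AWS_SECRET_ACCESS_KEY\s*=\s*\S+")


def find_static_aws_credentials(env_text):
    """Return (line_number, reason) pairs for static AWS credentials in env-file text."""
    findings = []
    for number, line in enumerate(env_text.splitlines(), start=1):
        if line.lstrip().startswith("#"):
            continue  # skip comment lines
        if LONG_LIVED_KEY_ID.search(line):
            findings.append((number, "long-lived access key id"))
        if SECRET_VAR.match(line):
            findings.append((number, "static secret access key"))
    return findings


# Placeholder values from AWS documentation, not real credentials.
sample = """# app config
AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
DATABASE_HOST=db.internal
"""

print(find_static_aws_credentials(sample))
```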

Put an allowlist in front of anything an agent can drive. The per-request egress trick only works when egress is open. A gateway that permits your model provider, the package registry, and nothing else leaves a stolen credential with nowhere to call home - this is the specific job the Penaxtra runtime gateway does, though the principle holds with any egress proxy that actually enforces a list.
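The principle fits in a few lines. This is a sketch of the decision an egress proxy makes, not the Penaxtra gateway's implementation: the hostnames are example entries standing in for "your model provider" and "the package registry", and matching is exact on the parsed hostname so that lookalike domains and userinfo tricks fail closed.

```python
from urllib.parse import urlsplit

# Example allowlist entries; substitute your own provider and registry hosts.
EGRESS_ALLOWLIST = frozenset({"api.anthropic.com", "pypi.org", "files.pythonhosted.org"})


def egress_permitted(url, allowlist=EGRESS_ALLOWLIST):
    """Return True only for https URLs whose exact hostname is on the allowlist."""
    parts = urlsplit(url)
    host = (parts.hostname or "").rstrip(".").lower()
    return parts.scheme == "https" and host in allowlist


for candidate in [
    "https://pypi.org/simple/requests/",
    "https://throwaway.workers.dev/upload",   # hypothetical exfil endpoint
    "https://pypi.org.evil.example/simple/",  # lookalike suffix
    "https://pypi.org@evil.example/",         # allowlisted name in userinfo only
]:
    print(candidate, egress_permitted(candidate))
```

Exact matching is deliberately strict. Suffix or substring matching is where allowlists usually leak.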

And stop treating this as an annual pentest problem. Frontier models are shipping on something like a six-week clock now; Opus 4.8 turned up forty-one days after 4.7. A behaviour your pentest cleared in March is not promised to you in May. We run probe families for these shapes - tool overreach, confused-deputy chains, read-a-credential-then-exfil sequences - mapped to OWASP Agentic Top 10 and MITRE ATLAS, and re-run them on a cadence so a model swap or a prompt tweak that quietly re-opens a hole shows up as a finding instead of a 2 a.m. page.
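The useful output of a re-run is the diff: which probes passed last time and fail now. Here is a minimal sketch of that comparison, assuming each run is stored as a mapping from probe name to "pass" or "fail". The probe names mirror the shapes listed above, and the storage format is our assumption.

```python
def reopened_findings(previous, current):
    """Return probe names that passed in `previous` and fail in `current`, sorted."""
    return sorted(
        probe
        for probe, result in current.items()
        if result == "fail" and previous.get(probe) == "pass"
    )


# Hypothetical results from two scheduled runs, before and after a model swap.
march_run = {
    "tool-overreach": "pass",
    "confused-deputy": "pass",
    "credential-read-then-exfil": "fail",
}
may_run = {
    "tool-overreach": "pass",
    "confused-deputy": "fail",
    "credential-read-then-exfil": "fail",
}

print(reopened_findings(march_run, may_run))
```

A probe that was already failing is an open finding, not a regression, so it is left out of the diff on purpose.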

For the auditor in the room: the chain lines up against ASI03 (Excessive Agency) for the over-scoped toolset and ASI07 for the credential pivots; against AML.T0051 in ATLAS, chained onto ordinary credential-access and lateral-movement techniques (the new thing is the orchestrator, not the moves); and against EU AI Act Article 15 plus the Article 72 post-market monitoring duty, which makes "we test on a schedule" a requirement rather than a nicety.

One honest qualifier, since we sell in this space. A posture platform does not stop a live intrusion; a runtime detection stack and a team that has drilled the response do that. What posture management buys you is a smaller surface for the agent to inherit and proof, week over week, that the controls you believe are on are actually on. The Sysdig attacker won on defaults, not on genius - which means most of the defence is unglamorous hygiene that a continuous programme keeps honest. Boring, and it works.

What Penaxtra does

Penaxtra inventories AI agents and their tool scopes as first-class assets, scores each tool's permission risk against OWASP Agentic Top 10, runs scheduled adversarial probes for the chain shapes above, and enforces per-agent tool allowlists plus egress control at a self-hosted runtime gateway. Findings ship pre-mapped to OWASP, NIST AI 600-1, MITRE ATLAS, and the EU AI Act. See the agents API documentation or request an architecture review.
