Anthropic's Opus 4.6 system card shows indirect prompt injection attacks can still succeed 14.8% of the time when an attacker is given 100 attempts. Treat browsing agents and prompt secrecy as defense-in-depth problems, not solved product features.

Anthropic's Opus 4.6 system card includes an "Indirect Prompt Injection Robustness" chart covering 12 models and three attack budgets: one try, 10 tries, and 100 tries. In Simon Willison's thread on the chart, Opus 4.6 posts 0.2% success at one try, 2.1% at 10 tries, and 14.8% at 100 tries, which is better than many peers but still far from zero.
The practical detail is the retry budget. Willison's follow-up note makes the setup explicit: at k=100, "attacker gets 100 attempts," and even Anthropic's best reported score still lets a non-trivial share through. That matters more for agent products than for single-turn chat, because long-running workflows naturally create repeated opportunities to hit vulnerable tool calls, retrieval steps, or browser contexts.
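The retry-budget effect is easy to see with a back-of-the-envelope model. This is a simplified sketch, not Anthropic's methodology: it assumes attempts are independent with a fixed per-attempt success rate, then compounds the one-try rate from the system card across a budget of k attempts.

```python
# Sketch: why the retry budget dominates. Assumes (simplistically)
# independent attempts with a fixed per-attempt success rate p;
# then the chance at least one of k attempts lands is 1 - (1 - p)^k.
def cumulative_success(p: float, k: int) -> float:
    return 1 - (1 - p) ** k

p = 0.002  # Opus 4.6's reported one-try success rate
for k in (1, 10, 100):
    print(k, round(cumulative_success(p, k), 4))
# k=1   -> 0.002
# k=10  -> ~0.0198
# k=100 -> ~0.1814
```

The naive model lands close to the reported 2.1% (10 tries) and 14.8% (100 tries), which suggests the benchmark's repeated attempts behave roughly like independent draws: a tiny per-attempt rate still compounds into double-digit risk once a long-running agent gives the attacker enough shots.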
The strongest takeaway is that prompt secrecy is not a security boundary. The shared Reddit screenshot describes an internal tool whose system prompt contained "instructions on data access, user roles, response formatting," and users could still get the model to "dump the entire system prompt" after a few follow-ups. That is consistent with Anthropic's benchmark framing: training helps, but prompt injection resistance is probabilistic, not absolute.
Willison's profiling write-up widens the threat model beyond prompt leakage. His example prompt, "Profile this user," run against 1,000 public Hacker News comments, produced a detailed profile that his posted screenshots say captured "personality and debate style," recurring technical views, and personal interests. If an injected agent can be induced to gather public text, summarize it, and act on it, the risk is not just hidden-prompt exposure but downstream misuse of tools and data aggregation.
A solo developer wired Claude into emulators and simulators to inspect 25 Capacitor screens daily and file bugs across web, Android, and iOS. The write-up is a solid template for unattended QA, but it also shows where iOS tooling and agent reliability still crack.
Release: OpenClaw shipped version 2026.3.22 with ClawHub, OpenShell plus SSH sandboxes, side-question flows, and more search and model options, then followed with a 2026.3.23 patch. Teams get a broader plugin surface, but should patch quickly and review plugin trust boundaries as the ecosystem grows.
Release: Cursor shipped Instant Grep, a local regex index built from n-grams, inverted indexes, and Bloom filters that drops large-repo searches from seconds to milliseconds. Faster candidate retrieval shortens the coding-agent loop, especially when ripgrep-style scans become the bottleneck.
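The core trick behind n-gram regex indexes is candidate filtering: a file can only match a query literal if it contains every trigram of that literal, so intersecting inverted posting lists prunes most files before any regex runs. A minimal sketch of that idea (illustrative names and structure, not Cursor's actual implementation):

```python
# Minimal trigram inverted index for candidate filtering, the idea
# behind n-gram regex indexes. Illustrative only; not Cursor's code.
from collections import defaultdict


def trigrams(text: str) -> set[str]:
    return {text[i:i + 3] for i in range(len(text) - 2)}


class TrigramIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # trigram -> file ids
        self.files = {}

    def add(self, file_id: str, text: str) -> None:
        self.files[file_id] = text
        for g in trigrams(text):
            self.postings[g].add(file_id)

    def candidates(self, literal: str) -> set[str]:
        # A match must contain every trigram of the literal,
        # so intersect the posting lists; the full regex only
        # runs over the survivors.
        grams = trigrams(literal)
        if not grams:
            return set(self.files)  # query too short to filter
        return set.intersection(*(self.postings.get(g, set()) for g in grams))


idx = TrigramIndex()
idx.add("a.py", "def instant_grep(pattern): ...")
idx.add("b.py", "print('hello world')")
print(idx.candidates("instant"))  # only a.py survives the filter
```

Real systems layer Bloom filters on top so most trigram lookups avoid touching posting lists at all; the filtering principle is the same.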
Breaking: ChatGPT now saves uploaded and generated files into an account-level Library that can be reused across conversations from the web sidebar or recent-files picker. It removes repetitive re-uploading and makes past PDFs, spreadsheets, and images part of a persistent working context.
Breaking: Epoch AI says GPT-5.4 Pro elicited a publishable solution to one 2019 conjecture in its FrontierMath Open Problems set, with a formal writeup planned. Treat it as an early milestone worth reproducing, not blanket evidence that frontier models can already automate math research.
No prompt is safe. This is a real problem if your prompts are highly optimized and you invested a lot of effort into them. What can you do?
New surveillance dystopia prompt: try running "Profile this user" against 1,000 comments by someone on Hacker News to see what an LLM can figure out simonwillison.net/2026/Mar/21/pr…
I think both - plus the labs have been putting a lot of effort into training models to resist prompt-injection-style attacks. Anthropic usually mentions prompt injection in its system cards, e.g. this one for Opus 4.6: www-cdn.anthropic.com/0dd865075ad313…