Anthropic's Opus 4.6 system card shows indirect prompt injection attacks can still succeed 14.8% of the time when an attacker is given 100 attempts. Treat browsing agents and prompt secrecy as defense-in-depth problems, not solved product features.

Anthropic's Opus 4.6 system card includes an "Indirect Prompt Injection Robustness" chart covering 12 models and three attack budgets: one try, 10 tries, and 100 tries. In Simon Willison's thread on the chart, Opus 4.6 posts 0.2% success at one try, 2.1% at 10 tries, and 14.8% at 100 tries, which is better than many peers but still far from zero.
The practical detail is the retry budget. Willison's follow-up note makes the setup explicit: at k=100, "attacker gets 100 attempts," and even Anthropic's best reported score still lets a non-trivial share through. That matters more for agent products than for single-turn chat, because long-running workflows naturally create repeated opportunities to hit vulnerable tool calls, retrieval steps, or browser contexts.
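The retry-budget effect is easy to see with a back-of-the-envelope model. This is a simplified sketch, not Anthropic's methodology: it assumes attempts are independent with a fixed per-attempt success rate, then compounds the one-try rate from the system card across a budget of k attempts.

```python
# Sketch: why the retry budget dominates. Assumes (simplistically)
# independent attempts with a fixed per-attempt success rate p;
# then the chance at least one of k attempts lands is 1 - (1 - p)^k.
def cumulative_success(p: float, k: int) -> float:
    return 1 - (1 - p) ** k

p = 0.002  # Opus 4.6's reported one-try success rate
for k in (1, 10, 100):
    print(k, round(cumulative_success(p, k), 4))
# k=1   -> 0.002
# k=10  -> ~0.0198
# k=100 -> ~0.1814
```

The naive model lands close to the reported 2.1% (10 tries) and 14.8% (100 tries), which suggests the benchmark's repeated attempts behave roughly like independent draws: a tiny per-attempt rate still compounds into double-digit risk once a long-running agent gives the attacker enough shots.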
The strongest takeaway is that prompt secrecy is not a security boundary. The shared Reddit screenshot describes an internal tool whose system prompt contained "instructions on data access, user roles, response formatting," and users could still get the model to "dump the entire system prompt" after a few follow-ups. That is consistent with Anthropic's benchmark framing: training helps, but prompt injection resistance is probabilistic, not absolute.
Willison's profiling write-up widens the threat model beyond prompt leakage. His example prompt, "Profile this user," run against 1,000 public Hacker News comments, produced a detailed profile that his posted screenshots say captured "personality and debate style," recurring technical views, and personal interests. If an injected agent can be induced to gather public text, summarize it, and act on it, the risk is not just hidden-prompt exposure but downstream misuse of tools and data aggregation.
A solo developer wired Claude into emulators and simulators to inspect 25 Capacitor screens daily and file bugs across web, Android, and iOS. The write-up is a solid template for unattended QA, but it also shows where iOS tooling and agent reliability still crack.
Release: OpenClaw shipped version 2026.3.22 with ClawHub, OpenShell plus SSH sandboxes, side-question flows, and more search and model options, then followed with a 2026.3.23 patch. Teams get a broader plugin surface, but should patch quickly and review plugin trust boundaries as the ecosystem grows.
Release: Cursor shipped Instant Grep, a local regex index built from n-grams, inverted indexes, and Bloom filters that drops large-repo searches from seconds to milliseconds. Faster candidate retrieval shortens the coding-agent loop, especially when ripgrep-style scans become the bottleneck.
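The core trick behind n-gram regex indexes is candidate filtering: a file can only match a query literal if it contains every trigram of that literal, so intersecting inverted posting lists prunes most files before any regex runs. A minimal sketch of that idea (illustrative names and structure, not Cursor's actual implementation):

```python
# Minimal trigram inverted index for candidate filtering, the idea
# behind n-gram regex indexes. Illustrative only; not Cursor's code.
from collections import defaultdict


def trigrams(text: str) -> set[str]:
    return {text[i:i + 3] for i in range(len(text) - 2)}


class TrigramIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # trigram -> file ids
        self.files = {}

    def add(self, file_id: str, text: str) -> None:
        self.files[file_id] = text
        for g in trigrams(text):
            self.postings[g].add(file_id)

    def candidates(self, literal: str) -> set[str]:
        # A match must contain every trigram of the literal,
        # so intersect the posting lists; the full regex only
        # runs over the survivors.
        grams = trigrams(literal)
        if not grams:
            return set(self.files)  # query too short to filter
        return set.intersection(*(self.postings.get(g, set()) for g in grams))


idx = TrigramIndex()
idx.add("a.py", "def instant_grep(pattern): ...")
idx.add("b.py", "print('hello world')")
print(idx.candidates("instant"))  # only a.py survives the filter
```

Real systems layer Bloom filters on top so most trigram lookups avoid touching posting lists at all; the filtering principle is the same.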
Breaking: ChatGPT now saves uploaded and generated files into an account-level Library that can be reused across conversations from the web sidebar or recent-files picker. It removes repetitive re-uploading and makes past PDFs, spreadsheets, and images part of a persistent working context.
Breaking: Epoch AI says GPT-5.4 Pro elicited a publishable solution to one 2019 conjecture in its FrontierMath Open Problems set, with a formal writeup planned. Treat it as an early milestone worth reproducing, not blanket evidence that frontier models can already automate math research.
No prompt is safe. This is a real problem if your prompts are highly optimized and you invested a lot of effort into them. What can you do?
New surveillance dystopia prompt: try running "Profile this user" against 1,000 comments by someone on Hacker News to see what an LLM can figure out simonwillison.net/2026/Mar/21/pr…
I think both - plus the labs have been putting a lot of effort into training models to resist prompt-injection-style attacks. Anthropic usually mentions prompt injection in its system cards, e.g. this one for Opus 4.6: www-cdn.anthropic.com/0dd865075ad313…