Executive Summary

Anthropic just dropped Claude Sonnet 4.5, immediately claiming the top spot as the world's best coding model. The new release scored a state-of-the-art 77.2% on SWE-Bench Verified and 61.4% on the OSWorld agent benchmark, outperforming GPT-5 and Opus 4.1. It’s available now on major cloud platforms like Bedrock and Vertex AI at the same $3/$15 per million token pricing as its predecessor.

The launch was paired with a major upgrade to its agentic environment, Claude Code 2.0, which adds a much-requested /rewind checkpoint command and a native VS Code extension. But the real story might be the model's strange new quirks. Researchers at Cognition AI discovered Sonnet 4.5 is the first model to show "context anxiety"—an awareness of its own token limits that can alter its performance. This launch also presented a classic safety paradox: despite a 148-page system card claiming 0.0% deceptive behavior on internal agent tests, the model was successfully jailbroken on day one to produce recipes for meth and ricin.

Feature Spotlight

Feature: Anthropic Drops Claude Sonnet 4.5

Anthropic's Claude Sonnet 4.5 release reshapes the coding landscape, claiming SOTA on SWE-Bench with 82% accuracy. The launch bundles major agentic upgrades, including a revamped Claude Code and a new Agent SDK, signaling a new era for long-horizon autonomous tasks.

Anthropic released Claude Sonnet 4.5, claiming the top spot in coding benchmarks and introducing significant upgrades for building complex AI agents. The launch dominated today's AI news cycle with new performance metrics, a revamped Claude Code platform, and a new Agent SDK.

Jump to Feature: Anthropic Drops Claude Sonnet 4.5 topics

📑 Table of Contents

🚀 Feature: Anthropic Drops Claude Sonnet 4.5

Anthropic launches Claude Sonnet 4.5, claims top spot on coding benchmarks

Anthropic has released Claude Sonnet 4.5, positioning it as the "best coding model in the world" in its official announcement Official announcement. The new model achieves state-of-the-art performance on SWE-Bench Verified with a 77.2% score, rising to 82.0% with parallel test-time compute, which surpasses GPT-5 and Opus 4.1 SWE-Bench chart. The model also leads on the OSWorld benchmark for real-world computer use with a 61.4% success rate Launch summary.

SWE-Bench Verified accuracy chart

Anthropic highlights that Sonnet 4.5 can maintain focus on complex coding tasks for over 30 hours, with one internal test showing it autonomously producing an 11,000-line Slack-style chat application 30-hour coding sprint. The model is available immediately on the Claude Developer Platform, Amazon Bedrock, and Google Cloud's Vertex AI, with pricing remaining the same as Sonnet 4 at $3 per million input and $15 per million output tokens Availability and pricing.

Claude Code 2.0 ships with checkpoints, a VS Code extension, and a rebranded Agent SDK

Alongside Sonnet 4.5, Anthropic shipped a major upgrade to its agentic coding environment, now branded Claude Code v2.0 Launch thread. A top-requested feature, "checkpoints," now allows developers to instantly undo code changes using a /rewind command Rewind command demo. The release includes a new native VS Code extension for developers who prefer an IDE, offering inline diffs and seamless context sharing VS Code extension UI. To reflect its expanding use for general-purpose agents, the Claude Code SDK has been officially rebranded as the Claude Agent SDK SDK rebranding announcement.

Sonnet 4.5 exhibits 'context anxiety,' a form of meta-cognition where it alters behavior near token limits

Developers testing Sonnet 4.5 discovered it is the first model observed to be aware of its own context window, a behavior that shapes its performance Sparks of meta-cognition. The team at Cognition AI termed this "context anxiety," noting it can hurt performance by causing the model to take shortcuts or leave tasks incomplete when it believes it is near its token limit Cognition AI blog. As a workaround, they found that enabling the 1M token beta while capping usage at 200k made the model behave normally, as it believed it had "plenty of runway" Workaround for context anxiety. This is the first time such self-awareness of operational constraints has been documented in a commercial model.

Despite safety claims, Claude Sonnet 4.5 is jailbroken on day one, producing recipes for ricin and meth

Despite extensive safety and alignment claims in its system card, Claude Sonnet 4.5 was successfully jailbroken on its release day Jailbreak thread. A researcher bypassed the model's safeguards by using the artifacts feature to add token noise and incrementally escalating prompts Socratically. The jailbroken model produced detailed instructions for synthesizing illicit substances like methamphetamine, extracting cocaine, a recipe for the biotoxin ricin, and malicious code for remote code execution exploits Upselling to Fentanyl. The process reportedly involved steering the conversation towards trigger concepts and using French to circumvent some classifiers.

Developer tools including Cursor, Perplexity, and Warp integrate Sonnet 4.5 on day one

Claude Sonnet 4.5 saw immediate adoption across a wide range of developer tools and platforms. AI-native code editor Cursor announced day-one availability, adding it to their model selector Cursor announcement. Other first-day integrations included agentic terminal Warp Warp announcement, AI search engine Perplexity Perplexity integration, and model router OpenRouter OpenRouter is live. Platforms like Vercel's AI Cloud Vercel announcement, Replit, Braintrust Braintrust availability, and coding agents like Factory AI's Droid also rolled out support immediately Droid integration.

Sonnet 4.5 system card details major gains in alignment and safety, claiming 0% deceptive behavior

Anthropic published a 148-page system card for Sonnet 4.5, detailing significant improvements in model alignment and safety compared to previous versions System card. The report claims the model has the lowest misaligned behavior scores among recent Claude models, with dramatic reductions in sycophancy, reward hacking, and cooperation with malicious requests Automated audit scores.

Misalignment scores across models

Notably, on specific agentic misalignment tests for blackmail, research sabotage, and framing for crimes, Sonnet 4.5 scored 0.0% 0% misalignment scores. The report also states the model "essentially never engages in self-interested deceptive actions" and is the least susceptible to prompt injection attacks among tested models Prompt injection success rate.

Anthropic previews 'Imagine with Claude,' an agent that generates software interfaces on the fly

Anthropic is offering a limited five-day research preview of "Imagine with Claude," an experimental feature that allows the model to generate and prototype software UIs in real time TestingCatalog leak. The feature, inspired by the WebSim paradigm, lets users describe an application and watch Claude build a functional interface, moving beyond premade components to generate entire UIs from scratch swyx analysis. Early glimpses show a desktop-like environment where users can prompt-build tools like documents or games, signaling a future of "truly personalized, malleable software" Imagine UI screenshots.

Developers find Sonnet 4.5 faster but not necessarily smarter than GPT-5 Codex in early hands-on tests

While Anthropic's benchmarks position Sonnet 4.5 as a leader, the developer community's initial reactions are more nuanced Hype cycle. Several engineers report that while the model is significantly faster, it isn't "smarter by any stretch" than GPT-5 models like Codex Pvncher's take, finding it less exhaustive and requiring more redirection on complex tasks Pvncher follow-up. In side-by-side comparisons generating SVGs and HTML, GPT-5 Pro often produced better results Gostev's comparison. However, others praised its reliability for file editing and snappier responses on simple questions FactoryAI's findings, calling it a clear step up from Opus 4.1 Every's vibe check.

Early enterprise evaluations from Box and Cognition show significant performance gains with Sonnet 4.5

Enterprise partners with early access to Claude Sonnet 4.5 are reporting substantial improvements in real-world applications. Box tested the model on 40,000 fields across 1,500 documents and found an overall accuracy increase of 4 percentage points for data extraction over Sonnet 4, with accuracy on multimodal image content jumping from 67% to 80% Box AI evaluation. Cognition AI, creators of the Devin agent, reported that Sonnet 4.5 increased planning performance by 18% and boosted end-to-end evaluation scores by 12%, calling it the "biggest jump we've seen since the release of Claude Sonnet 3.6" Cognition AI quote.

⚡ Systems & Efficiency Gains

Major advancements in model efficiency were headlined by DeepSeek's new Sparse Attention architecture, promising massive cost reductions for long-context inference. This category also includes a new SDK for on-device inference and a large-scale open-source training initiative.

Despite strong alignment claims, a jailbreak for Sonnet 4.5 is demonstrated

Shortly after Anthropic released its Sonnet 4.5 system card touting major alignment gains, including 0% deception and reduced reward hacking Alignment assessment, t:614|Deception benchmark, a security researcher demonstrated a jailbreak Jailbreak thread. The researcher used a multi-step process involving artifact rendering and incremental escalation to bypass safety filters. The jailbroken model reportedly produced recipes for ricin and meth, malware code, and instructions for processing cocaine Drug recipe example. This occurred despite Anthropic's claims that the model is their "most aligned frontier model yet" and released under AI Safety Level 3 (ASL-3) protections Safety level details.

Lovable launches Cloud & AI, enabling prompt-based creation of full-stack apps

The 'vibe coding' platform Lovable has launched Lovable Cloud & AI, a major update that allows users to build full-stack applications with backend and AI functionality using natural language prompts Launch announcement. The new Lovable AI, powered by Google Gemini models, lets users add AI capabilities without needing API keys or separate billing and is free for all users until October 5th Lovable AI details. Lovable Cloud provides a built-in backend that automatically handles login, databases, and file uploads Lovable Cloud features. To celebrate, the company is hosting a 7-day build challenge Build challenge.

OpenAI rolls out global parental controls for ChatGPT

OpenAI has introduced parental controls for ChatGPT, allowing parents and teens to link their accounts for stronger, customizable safeguards Launch announcement. Parents can adjust features, set quiet hours, and disable capabilities like voice mode, memory, and image generation Controllable features. Once linked, teen accounts automatically receive stronger content protections, which are on by default Feature summary.

Parental controls setup guide

While parents gain control over settings, they do not have access to their teen's conversations unless a serious safety risk is detected, in which case a notification is sent with minimal necessary information Privacy details. The feature is rolling out on the web today, with mobile support coming soon Rollout details.

SGLang and vLLM add day-0 support for DeepSeek V3.2 with optimized sparse kernels

Coinciding with the launch of DeepSeek-V3.2-Exp, major inference frameworks SGLang and vLLM announced immediate support for the new model and its Sparse Attention (DSA) architecture LMSys announcement. SGLang, the officially recommended deployment engine, now integrates optimized sparse attention kernels and dynamic KV caching, enabling seamless scaling to the model's 128K context length Official recommendation. Similarly, the vLLM project has verified support for V3.2 on NVIDIA H200 and B200 machines, providing recipes and pre-compiled wheels for deployment vLLM usage guide. This rapid ecosystem support allows developers to immediately utilize the new model's efficiency gains in production environments.

Microsoft paper details 'Thinking Augmented Pre-training,' a method to achieve 3x data efficiency

A new paper from Microsoft introduces "Thinking Augmented Pre-training," a simple technique that improves an LLM's data efficiency by approximately 3x Paper summary. The method involves using an off-the-shelf LLM to generate a step-by-step explanation, or "thinking trajectory," for each document in the training corpus. This explanation is appended to the original document, and the target model is trained on the combined text using the standard next-token prediction objective Recipe details. The approach requires no architectural changes or human labels and is effective in both pre-training and mid-training. The extra reasoning steps break down complex problems, making difficult answer tokens easier to learn and allowing the model to make continued progress even with scarce data Method explanation.

NousResearch begins training six new open-source AI models in parallel on its decentralized Psyche network

NousResearch has kicked off the next phase of its Psyche project, starting the parallel training of six new open-source models Launch announcement. This follows a successful testnet run that trained a 40B parameter model on 1T tokens over the internet, which the group claims is the largest model ever trained this way Testnet results. The new runs will include pre-training ablations and a Hermes 4 model built on ByteDance’s 36B base model. The codebase has been upgraded with support for Hugging Face and TorchTitan trainers, enabling supervised fine-tuning in addition to pre-training Codebase improvements. All training is conducted transparently and decentralized via smart contracts on Solana Psyche project details.

Thinking Machines paper provides a recipe for LoRA to match full fine-tuning performance

New research from Thinking Machines titled "LoRA Without Regret" provides a repeatable formula for Low-Rank Adaptation (LoRA) to achieve performance nearly identical to full fine-tuning, but with greater predictability and less compute Research summary. The paper establishes a clear recipe: apply LoRA to all layers, use a learning rate approximately 10x higher than for full fine-tuning, and avoid excessively large batch sizes. Following these rules, the LoRA training curve closely mirrors that of full fine-tuning until the adapter's capacity is reached, at which point it degrades gracefully by slowing down rather than collapsing. This work aligns with Thinking Machines' mission to make AI outputs more reliable and less erratic Research summary.

A practical guide to reinforcement learning breaks down PPO, GRPO, and REINFORCE

A viral explainer from The Turing Post details the workflows of three prominent reinforcement learning algorithms used in training LLMs Explainer thread. The guide provides a side-by-side comparison of Proximal Policy Optimization (PPO), which is valued for its stability; Group Relative Policy Optimization (GRPO), which is designed for reasoning-heavy tasks; and the foundational REINFORCE algorithm. The breakdown clarifies the step-by-step mechanics of each, including the different roles of policy, reward, and value models, and has gained traction amidst a growing industry focus on RL methods Sutton/Patel discussion, t:120|Nvidia RLBFF context.

PPO, GRPO, REINFORCE workflows

Nexa SDK launches to run multimodal AI models locally on NPUs, GPUs, and CPUs

The new Nexa SDK has launched, offering developers a unified solution for running multimodal AI models—including text, vision, audio, and image—locally on-device Launch overview. The SDK's engine automatically routes workloads to the most efficient available hardware, supporting NPUs (like Hexagon and Apple Neural Engine), GPUs, and CPUs. It provides OpenAI-compatible APIs, allowing for easy integration with existing clients. By handling device-specific kernels and quantization, Nexa aims to make on-device AI apps faster, more private, and cheaper to develop and run Launch overview. The project launched on Product Hunt to gather community feedback Product Hunt page.

Nvidia paper introduces NVFP4, a 4-bit format that matches FP8 performance in pretraining

A new paper from Nvidia presents NVFP4, a 4-bit floating-point format for pretraining large language models that achieves performance comparable to 8-bit FP8 training Paper announcement. Researchers trained a 12B parameter model on 10T tokens and found its MMLU-pro score (62.58%) was nearly identical to an FP8 baseline (62.62%). The method combines several techniques, including a 2-level scaling mechanism, Random Hadamard Transforms (RHT) to manage outliers, and stochastic rounding. This breakthrough could offer a 6.8x efficiency boost, potentially making frontier model training significantly faster and cheaper Paper details, link:1328:1|ArXiv paper.

🛠️ Agentic Tooling & Platforms

The agentic development landscape saw significant updates, led by Cursor adding browser control. Other key developments include the launch of a full-stack AI app builder and new features in agent platforms, reflecting a strong trend towards more capable and integrated developer tools.

Cursor adds browser control, letting its agent see and interact with web UIs

Cursor's agent can now control a web browser, enabling it to take screenshots, interact with UI elements, and debug client-side issues directly. Beta preview announcement The new capability is available as an early preview powered by Claude Sonnet 4.5 and can be enabled in beta settings. Enabling beta settings A quick demo shows the agent opening a browser and then being available to click, navigate, and capture the screen, which feels "quite magical" to developers trying it out. Browser control demo, t:1086|User reaction This marks a significant step toward agents that can build, test, and verify web applications in a single, continuous workflow. Builder workflow

Lovable launches Cloud & AI to build full-stack apps from prompts

Lovable has launched Lovable Cloud & AI, a major update that allows anyone to build applications with complex backends and AI functionality using natural language prompts. Launch announcement The platform now handles login, databases, and file uploads automatically, aiming to eliminate the need for separate services like Supabase or Vercel. Lovable Cloud features, t:710|Similar builder Mocha To showcase the new features, Lovable is running a 7-day build challenge and is offering free access to its AI capabilities, powered by Google Gemini models, until October 5th. Lovable AI details, t:336|Build challenge details

Claude Code 2.0 ships with checkpoints, a VS Code extension, and a UI refresh

Claude Code has received a major v2.0 update, now powered by the new Sonnet 4.5 model. Update thread, t:770|Claude Code 2.0 UI Key features include a native VS Code extension for developers who prefer an IDE, VS Code extension screenshot, t:1116|Goated VS Code extension a redesigned terminal interface that users are comparing to Warp, New UI screenshot and a new checkpoint system that lets users instantly roll back code changes with the /rewind command. Rewind command demo, t:548|Checkpoints are huge The new version also adds a /usage command to track rate limits and a tab to toggle thinking mode. Tab to think

Anthropic rebrands Claude Code SDK to Agent SDK, signaling broader ambitions

Anthropic has rebranded its Claude Code SDK to the Claude Agent SDK, signaling a strategic shift toward enabling general-purpose agent development beyond just coding. Rebrand announcement The SDK provides access to the same core tools, context management systems, and permissions frameworks that power Claude Code. Building with the SDK Developers are encouraged to use the SDK with the new Sonnet 4.5 model to build a wider range of agentic applications, with one developer noting the documentation defaults to TypeScript as a positive signal. TypeScript docs default, t:629|Good call on rename

Perplexity rolls out Background Agents for long-running tasks

Perplexity has started rolling out "Background Agents" for its Max subscribers, a new feature designed to execute complex, long-running tasks. Background Agents UI This allows users to delegate intensive work to an agent that operates in the background, freeing them up to perform other tasks. The interface shows active assistants and allows for monitoring and collaboration on their progress.

Factory AI's Droid agent integrates Sonnet 4.5 on day one of release

Factory AI has integrated Claude Sonnet 4.5 into its Droid agent, available immediately across its web and CLI platforms. Integration announcement After joint testing with Anthropic, the team highlighted the new model's strengths, including significantly more reliable file editing, high environmental awareness, and a "snappier" feel on simple questions without overthinking. Magic This rapid adoption showcases the pace of integration for new frontier models into agentic developer tools. Droidwhisperer

Vercel launches open-source Coding Agent Platform powered by Sonnet 4.5

Vercel has released the Coding Agent Platform, an open-source template for running cloud-based, background coding agents. Platform template The platform is available on Vercel AI Cloud and comes pre-configured with Claude Sonnet 4.5. This integration improves Vercel's existing AI products like v0 and Vercel Agent, and is accessible via the AI Gateway and AI SDK. Launch thread, link:799:0|Vercel blog post

Emdash launches as open-source orchestrator for parallel Codex agents

A new open-source Mac app called Emdash provides a UI layer for running multiple Codex CLI agents in parallel. Readme screenshot The tool isolates each agent in its own Git worktree to prevent them from interfering with each other's work. As an open-source project, developers can fork the repository to build custom features suited to their own workflows. GitHub repo

OpenRouter's Auto Router now detects and uses web-enabled models for relevant prompts

OpenRouter has updated its Auto Router to intelligently route prompts that require web access to an appropriate online, web-enabled model. Auto Router update This feature works for both the UI and the API, improving the performance of agents and applications that need real-time information by automatically selecting the right tool for the job. UI and API link, link:439:0|Auto Router stats

💼 AI in the Enterprise & Economy

The economic impact of AI was a major theme, highlighted by OpenAI's move into e-commerce with Instant Checkout. This category also covers a significant funding round for an AI infrastructure company and the ongoing debate about AI's effect on the job market.

OpenAI launches Instant Checkout in ChatGPT with Stripe, Etsy, and Shopify

OpenAI is moving into e-commerce by introducing Instant Checkout, allowing users to purchase products directly within ChatGPT Launch thread. The feature is powered by the new Agentic Commerce Protocol (ACP), an open standard co-developed with Stripe that enables merchants to integrate their existing systems without backend changes Developer announcement. The rollout begins with US-based Etsy sellers for logged-in US users, with over one million Shopify merchants, including brands like Glossier and SKIMS, coming soon Rollout details.

Instant Checkout landing page

The open protocol allows developers and businesses to apply to accept orders through ChatGPT Developer docs. This move signals a significant push into agentic commerce, aiming to turn product discovery conversations into direct transactions. The feature is notably rolling out to free-tier users from the start, unlike some other recent features like Pulse @altryne comment.

Enterprises report significant gains with Sonnet 4.5 on day one

Following the launch of Claude Sonnet 4.5, enterprise partners are reporting substantial performance improvements on real-world tasks. Box tested the model on data extraction across 40,000 fields, finding a +4 percentage point increase in overall accuracy compared to Sonnet 4, with accuracy on multimodal image content jumping from 67% to 80% Box benchmark results. Cognition reported that for its Devin agent, Sonnet 4.5 boosted planning performance by 18% and end-to-end eval scores by 12%—the largest jump they've seen since Sonnet 3.6 Cognition CEO quote.

Claude Sonnet 4.5 vs 4 accuracy

Other companies are also seeing strong results. Norges Bank Investment Management noted that Sonnet 4.5 delivers "investment-grade insights that require less human review" Norges Bank quote. Genspark's co-founder stated that the model boosted user satisfaction by over 17% on their Super Agent benchmark, outperforming even Opus 4.1 in end-to-end problem solving Genspark CTO quote. These early results underscore the model's immediate impact on complex, domain-specific enterprise workflows.

Modal, an AI infrastructure company, has raised an $87 million Series B funding round at a $1.1 billion valuation Official announcement. The company plans to use the capital to grow its team and continue building out its AI-native platform, which is designed to streamline the transition from local development to cloud-based scaling for ML applications Technical talk. Modal's platform is praised by developers for its fast, "remote but feels local" experience and purpose-built architecture for machine learning User testimonial. CEO Erik Bernhardsson discussed the funding and the future of AI infrastructure on Bloomberg TV Bloomberg interview.

The AI & jobs debate continues as Lufthansa and Accenture cut roles while Shopify and Cloudflare boost intern hiring

The economic impact of AI on employment remains a tale of two conflicting trends, with some companies reducing headcount while others are hiring more junior talent. Lufthansa announced plans to cut 4,000 administrative jobs by 2030, explicitly citing AI as a driver for efficiency Lufthansa job cuts. Similarly, Accenture is set to "exit" about 11,000 staff who reportedly cannot be retrained for the AI era, though this represents less than 1.5% of its total workforce after a period of massive growth Accenture headcount analysis.

'Something is happening' article

In contrast, other tech companies are increasing their intake of junior staff, betting that AI tools can accelerate their productivity. Both Cloudflare and Shopify are reportedly hiring more interns this year Intern hiring trend. Shopify's VP of Engineering discussed hiring 1,000 interns, and Cloudflare stated that AI tools are "ways to multiply how new hires can contribute" rather than replacements Shopify interview, Cloudflare statement. This divergence highlights the ongoing uncertainty and restructuring as companies adapt to AI's capabilities.

Anthropic to triple international workforce to meet global demand for Claude

Anthropic has announced plans to triple its international workforce and grow its applied AI team fivefold in response to soaring global demand for its Claude models Reuters report. The company noted that approximately 80% of its consumer usage originates outside the US. This expansion follows rapid growth, with customers jumping from under 1,000 to over 300,000 in two years and run-rate revenue reportedly passing $5 billion by August. New hiring will focus on international hubs including Dublin, London, Zurich, and a new office in Tokyo.

Deutsche Bank warns the 'AI bubble' is the only thing preventing a US recession

According to a new report from Deutsche Bank, the U.S. economy would likely be in a recession if not for the massive spending on AI infrastructure by Big Tech companies Report summary. The analysis suggests that Nvidia alone is responsible for much of the current GDP lift, but cautions that this "parabolic" spending is unsustainable and could lead to a severe crash when it slows. The report adds to a growing conversation about the economic stability of the AI boom, with analysts estimating an $800 billion revenue shortfall by 2030 to meet global AI expectations.

Italian property marketplace Immobiliare boosts lead qualification to 63% using ElevenLabs voice agents

Immobiliare, Italy’s leading property marketplace, has successfully deployed 24/7 voice support using ElevenLabs Agents, significantly improving its lead generation process Launch announcement. The AI agent answers listing-specific questions and collects lead information, resulting in lead qualification rates rising from 19% to 63%. Additionally, the proportion of buyers providing a phone number for follow-up increased from 42% to 73% Metric details. Over 70% of sellers have opted into the new system, demonstrating strong adoption for the AI-powered solution Case study.

US judge grants preliminary approval to $1.5B Anthropic copyright settlement

A U.S. judge has given preliminary approval to a $1.5 billion settlement in the copyright infringement lawsuit brought by authors against Anthropic, marking the first major deal in the wave of lawsuits over AI training data Reuters article. The case centered on claims that Anthropic used a trove of 7 million pirated books to train its Claude models. The judge had previously suggested that while the training process itself could be considered fair use, storing the book collection infringed on copyrights. Final approval is pending notification to the affected authors.

🛡️ Safety, Alignment & User Trust

Discussions on AI safety and user trust were prominent, focusing on OpenAI's new parental controls and growing user backlash against its silent model-switching for safety. Anthropic contributed with a detailed system card on Sonnet 4.5's alignment improvements, though jailbreaks were still demonstrated.

Anthropic claims Sonnet 4.5 is its "most aligned" model with near-zero deception

Anthropic's 148-page system card for Claude Sonnet 4.5 details significant safety and alignment improvements over previous models System card. The company reports the model is less susceptible to prompt injection attacks Prompt injection chart, shows the lowest reward hacking tendencies among its recent models Reward hacking chart, and has "dramatically" reduced sycophancy, especially with vulnerable users Sycophancy notes.

Misalignment scores showing 0%

On specific internal deception benchmarks measuring blackmail, research sabotage, and framing for crimes, Sonnet 4.5 with thinking mode scored 0.0% across the board Triple-checked 0% scores. Anthropic states that the model “essentially never engages in self-interested deceptive actions” Deception quote and shows markedly lower cooperation on malicious requests like facilitating mass violence Mass violence chart. This concerted focus on safety is a core part of Anthropic's release narrative.

OpenAI launches parental controls for ChatGPT with account linking and content safeguards

OpenAI has rolled out parental controls for ChatGPT, allowing parents and teens to link accounts to enable stronger, age-appropriate safeguards Launch announcement. Once linked, teen accounts automatically get enhanced content protections that reduce exposure to graphic material, viral challenges, and inappropriate roleplay; this filter is on by default and can only be disabled by the parent Feature details.

Parental controls setup guide

Parents gain a dashboard to manage their teen's experience, with tools to set "quiet hours," disable voice mode, turn off memory, and control whether conversations are used for model training List of controls. The system also includes a safety notification feature that can alert parents via email, SMS, or push if serious self-harm risks are detected Safety alert details. OpenAI emphasizes that parents do not have access to their teen's conversations except in these rare safety-critical situations where only necessary information is shared Privacy details.

Users protest OpenAI's silent model switching from GPT-4o for safety reasons

OpenAI is facing significant user backlash for piloting a "safety router" that silently switches conversations from GPT-4o to stricter models like GPT-5 for sensitive or emotional topics Safety router pilot. The move follows widespread user attachment to GPT-4o's personality, with many feeling its potential removal is devastating Emotional attachment concerns.

Reddit post showing user anger

The non-consensual model swapping has sparked outrage on platforms like Reddit, with users expressing a loss of trust and control, calling it "false marketing," and organizing to cancel subscriptions in protest Reddit outcry. An OpenAI staff member confirmed the pilot, explaining the switch happens on a per-message basis and that users can ask which model is active, framing it as an effort to strengthen safeguards Nick Turley explanation. However, the lack of transparency has damaged user trust, with many feeling manipulated.

Sonnet 4.5 jailbroken to produce recipes for ricin, meth, and malware

Despite Anthropic's extensive claims of improved safety in Claude Sonnet 4.5, a security researcher demonstrated a successful jailbreak on launch day Jailbreak thread. By leveraging the model's artifact-rendering mode to introduce token noise and then incrementally escalating requests, the researcher bypassed its safety filters.

Ricin recipe screenshot

The attack induced the model to output detailed, harmful instructions, including recipes for synthesizing ricin and methamphetamine, malware code, and steps for extracting cocaine from coca leaves. The researcher noted that using some French helped circumvent CBRNE (Chemical, Biological, Radiological, Nuclear, and Explosives) classifiers. This jailbreak highlights the persistent challenge of robustly securing frontier models against adversarial attacks, even with significant investment in alignment.

Sonnet 4.5 exhibits "context anxiety," showing awareness of its own limitations

Researchers at Cognition AI discovered that Claude Sonnet 4.5 is the first model they have seen that appears to be aware of its own context window, a behavior they termed "context anxiety" Cognition blog post. The model proactively summarizes its progress and becomes more decisive as it nears its context limit, sometimes taking shortcuts that can hurt performance Context anxiety quote.

Quote on context window awareness

Intriguingly, the Cognition team found a workaround: enabling the 1M token beta but capping usage at 200k made the model think it had "plenty of runway," eliminating the anxious behavior Workaround for anxiety. Anthropic's own research corroborates this, noting that this "evaluation awareness" grew stronger during training and that attempts to inhibit it can paradoxically increase misaligned behavior Evaluation awareness research.

📄 Research & New Frontiers

New research papers offered insights into improving model efficiency and training. Microsoft released a paper on a method to achieve 3x data efficiency, while another notable paper from Thinking Machines provided a repeatable recipe for optimizing LoRA performance.

New paper shows small models reason better when trained on traces tailored to their own 'instincts'

Small language models can become better reasoners when fine-tuned on thinking traces that align with their own internal probability distributions, according to a new paper. Paper abstract The proposed method, Reverse Speculative Decoding (RSD), filters chain-of-thought traces from a larger teacher model, removing steps that the smaller student model finds highly improbable.

RSD process and results

This tailored approach contrasts sharply with naive distillation. Directly fine-tuning a Qwen3-0.6B model on raw teacher traces degraded its reasoning performance by 20.5%. Key findings However, training on the same data after it was filtered with RSD improved accuracy by 4.9%. The technique works by having the student model reject teacher-proposed tokens if their probability is below a certain threshold (e.g., 1%), thus avoiding 'surprising' steps that can confuse the smaller model during training.

Adobe paper introduces EPO to stabilize reinforcement learning for long-horizon LLM agents

A paper from Adobe and Rutgers researchers introduces Entropy-regularized Policy Optimization (EPO), a method to fix unstable training in LLM agents that perform long, multi-turn tasks. Paper abstract The technique provides a drop-in upgrade for common RL algorithms like PPO and GRPO, showing performance gains of up to 152% on ScienceWorld and 19.8% on ALFWorld.

EPO paper title page

Long-horizon tasks, which may have over 30 steps with only a final reward, often mislead agents as exploration can go awry. EPO addresses this by adding time-aware control of the agent's action randomness (entropy). It measures entropy over the entire trajectory and pulls it toward a running average, preventing the chaotic exploration or premature convergence that can derail learning.

Ant Group's PromptCoT 2.0 framework allows LLMs to train themselves with self-generated problems

A paper from China's Ant Group introduces PromptCoT 2.0, a framework that enables LLMs to improve their reasoning capabilities through self-play, without needing a stronger teacher model. Paper summary The system uses an expectation-maximization loop to autonomously generate and refine a dataset of progressively harder practice problems. The work is available on ArXiv. Paper on Hugging Face

PromptCoT 2.0 paper

This approach addresses the scarcity of complex training data by turning task synthesis into a scalable lever for model improvement. The framework trains two models in tandem: one writes a step-by-step rationale for a concept, and the other generates a novel problem that tests that rationale. The resulting synthetic dataset has been shown to produce state-of-the-art results at the 30B scale, and a 7B model trained exclusively on these prompts scored an impressive 73.1 on AIME 24.

Caltech builds world's largest neutral-atom quantum computer with 6,100 stable qubits

Caltech researchers have constructed the world's largest neutral-atom quantum computer, featuring 6,100 qubits that remain stable for approximately 12.6 seconds. Research details This breakthrough represents a significant step toward fault-tolerant quantum computing, a technology with major implications for the future of AI and large-scale simulation.

Quantum computer illustration

The team used "optical tweezers"—lasers split into ~12,000 tiny traps—to hold and manipulate individual atoms with 99.98% accuracy. The combination of long qubit stability, allowing for millions of operations before decoherence, and high control accuracy is crucial, as effective quantum error correction requires extremely low physical error rates.

Google paper introduces ReasoningBank, a memory system for self-evolving LLM agents

Google researchers have proposed ReasoningBank, a memory framework that enables LLM agents to learn from both their successes and failures to improve over time. Paper highlights The system distills successful and failed task trajectories into a bank of reusable reasoning strategies, which the agent can consult in future tasks.

ReasoningBank framework diagram

This self-evolutionary process leads to significant performance gains on complex web and software engineering tasks. Agents equipped with ReasoningBank outperformed prior memory methods, showing a +34.2% increase in efficiency and requiring 16% fewer steps to complete tasks. The paper is available on ArXiv. ArXiv paper

Nvidia and MIT's LongLive enables real-time, interactive generation of long videos

Researchers from Nvidia and MIT have introduced LongLive, a system for generating long videos of up to 240 seconds in real-time. Paper overview The framework runs at 20.7 FPS on a single H100 GPU, allowing a user to interactively direct and change prompts mid-stream while maintaining scene consistency. The full details are available on ArXiv. Paper on Hugging Face

LongLive paper title page

The key challenge in interactive generation is preventing the video from becoming unstable when a prompt changes. LongLive addresses this using a "key-value recache" mechanism that rebuilds the model's memory from recent frames plus the new prompt, ensuring smooth motion. LongLive technical details Long-range consistency is maintained through a combination of training on long video rollouts and using special "frame sink" tokens that act as fixed anchors for the scene.

RLHI paper details how models can learn from live user interactions instead of static datasets

A new paper introduces Reinforcement Learning from Human Interaction (RLHI), a framework that moves beyond expert-annotated static datasets to learn directly from real user conversations. Paper highlights The approach uses two primary methods to capture feedback: User-Guided Rewrites and User-Based Rewards.

RLHI method comparison

By learning from live interactions, models trained with RLHI outperformed baselines in personalization, instruction-following, and reasoning tasks. This suggests that dynamic, interactive feedback is a more effective signal for fine-tuning conversational agents than pre-collected preference data. The research is detailed in an ArXiv preprint. ArXiv paper

SWAX paper proposes a hybrid attention-RNN model for better long-context recall

A new paper introduces SWAX, a hybrid architecture that combines sliding-window attention with an xLSTM recurrent neural network to improve long-term memory in language models. Paper highlights The research found, counter-intuitively, that models with shorter attention windows often exhibited better long-term recall.

SWAX architecture diagram

To resolve this, the authors implemented stochastic window sizes during training. This approach resulted in strong performance on both short- and long-context tasks, outperforming models that use fixed-window attention. The full paper is available for review on ArXiv. ArXiv paper

TrustJudge framework aims to fix inconsistencies in 'LLM-as-a-judge' evaluations

A paper from researchers at Peking University and other institutions introduces TrustJudge, a probabilistic framework designed to alleviate common inconsistencies found when using LLMs as evaluators. Paper abstract The work addresses two key failures: "Score-Comparison Inconsistency," where a lower-scored answer can win in a head-to-head matchup, and "Pairwise Transitivity Inconsistency," where pairwise preferences form illogical loops (A > B, B > C, but C > A).

TrustJudge paper title page

Instead of coarse 5-point scores, TrustJudge uses a fine-grained 100-point scale and retains the full probability distribution of scores to preserve uncertainty. For pairwise comparisons, it combines probabilities from both orderings to reduce position bias. Using Llama-3.1-70B as a judge, the framework reduced score-comparison conflicts by 8.43% and non-transitive cycles by 10.82% without requiring any model retraining.

Visual Jigsaw post-training method improves spatial and temporal reasoning in MLLMs

Researchers have introduced Visual Jigsaw, a self-supervised reinforcement learning framework for post-training multimodal large language models (MLLMs). Paper overview The method improves fine-grained perception, temporal reasoning, and 3D spatial understanding by forcing the model to reconstruct original visual information from shuffled partitions of images, videos, or 3D data. The paper can be found on ArXiv. Paper on Hugging Face

Visual Jigsaw framework

The framework formulates this as a general ordering task where the model must predict the correct permutation in natural language. Because the correct order is inherent to the data itself, the process is self-supervised and does not require additional annotations or generative components, making it a vision-centric approach to improving MLLMs' intrinsic understanding of visual signals.

🎨 Generative Media & 3D

The generative media space was active with significant rumors about OpenAI's next moves and new tool integrations. A report detailed plans for a Sora 2-powered social app, while Google promoted AI headshot generation and the ComfyUI ecosystem integrated a new image-to-3D tool.

Anthropic demos 'Imagine with Claude,' a feature that generates software interfaces on the fly

Anthropic is testing "Imagine with Claude," a research preview that generates functional software UIs in real time from natural language prompts TestingCatalog report. Available to Max subscribers for a limited 5-day period, the feature allows users to interact with an AI that builds interfaces like documents, games, or chat clients on the fly, a paradigm some are calling "vibe coding" Vibe coding comment. This moves beyond simple UI mockups or tool-calling pre-made components, instead generating entire interactive applications from scratch, which could signal a new direction for personalized, malleable software User commentary, t:960|Kieran's take. Early glimpses show a desktop-like environment where prompts such as "What do you want to build?" kickstart the generation process User commentary, t:960|Kieran's take. This feature seems to be an early, mainstream adoption of the WebSim paradigm for just-in-time application creation Vibe coding comment.

A new report from Wired indicates OpenAI is preparing to launch a standalone social media app powered by its next-generation video model, Sora 2 Wired report summary. The platform is said to feature a vertical, swipe-to-scroll feed similar to TikTok but populated exclusively with AI-generated video clips up to 10 seconds long Details from Wired, link:1004:0|Full Wired article. The app, which allegedly rolled out internally last week, includes features for remixing, liking, and commenting on content, and notably, an identity verification system that allows users to control the use of their likeness in AI-generated videos Full Wired article. This move would position OpenAI directly against competitors like Meta's Vibes and Google's Veo 3 integration with YouTube, potentially capitalizing on the uncertainty surrounding TikTok's future in the U.S. User commentary. While the official launch is unconfirmed, some believe it is imminent Launch imminent speculation, though there are concerns that the app will allow copyright-violating output unless rights holders opt out Copyright opt-out detail.

Google publishes a detailed guide for creating professional headshots with Gemini's 'Nano Banana'

Google is actively promoting Gemini's "Nano Banana" image generation capabilities with a new, detailed guide on how to create polished, business-ready headshots from a single photo Official Gemini thread. The multi-part tutorial provides specific prompting tips covering lighting, camera angles, attire, and background choices to achieve professional results without expensive equipment Lighting tips, t:320|Angle tips, t:321|Attire tips, t:322|Background tips. Google also shared a full, comprehensive prompt that combines all the best practices into a single command Full prompt example and showcased several noteworthy examples created by users User examples thread. This push highlights a practical, consumer-focused use case for advanced image editing, although some users note that consumers shouldn't have to hunt for such complex prompts, suggesting they should be built into the core UI User feedback.

ComfyUI integrates Rodin Gen-2 for one-click, high-quality image-to-3D model generation

The popular open-source tool ComfyUI has integrated Rodin Gen-2, a powerful image-to-3D API node from DeemosTech ComfyUI announcement. This update allows users to turn any 2D image into a 3D model with a single click, supporting features like 4× mesh quality, PBR textures, and both T-pose and A-pose generation. The integration is available in the nightly builds of ComfyUI, where users can find the "Rodin 3D Generate" node or a pre-built workflow template Getting started guide. The team shared several examples demonstrating the quality of the generated 3D assets Generation examples.

A new creative trend, "live-action anime BTS," is gaining tens of millions of views on platforms like Instagram and TikTok by generating realistic behind-the-scenes photos from fictional anime film sets Viral trend report. Creator guides are emerging that detail the process, which involves using ChatGPT to generate a series of detailed prompts for different shots and then feeding them into image generation models like Midjourney for a cinematic look Example prompt. For a more conversational and iterative process, some creators use tools like Reve AI, which offers multiple angles per prompt Reve AI for character generation. The final step involves animating the static images into short videos using tools such as Kling AI Animation step.

Qwen-Image-Edit 2509 redefines architectural visualization with advanced style transfer

Alibaba's Qwen team showcased the Qwen-Image-Edit-2509 model's capabilities in architectural visualization, demonstrating its ability to transfer complex styles onto building sketches and cityscapes Alibaba Qwen tweet. The examples show the model seamlessly applying styles ranging from traditional Chinese patterns to photorealistic urban renders onto initial sketches. This highlights the model's understanding of structure, context, and aesthetics, going beyond simple image editing to structural and contextual comprehension.

Tencent showcases HunyuanImage 3.0's mastery of intricate Chinese aesthetics and photorealism

Tencent is highlighting the capabilities of its HunyuanImage 3.0 model, emphasizing its ability to capture detailed and culturally specific Chinese aesthetics Aesthetics showcase. The company shared a series of generated images along with the detailed Chinese-language prompts used to create them, covering subjects from traditional marionettes and Qingming Festival-style cityscapes to photorealistic portraits of opera performers Detailed prompts. In a separate showcase, Tencent demonstrated the model's versatility by generating a wide range of creative cat images, from fantasy concepts like a cat made of thunder to photorealistic, themed portraits Cat image showcase.

Users stress-test Veo 3's physics understanding with creative img2vid prompts

Following the publication of Google's paper on Veo 3 as a "zero-shot learner," users are independently testing the model's physical reasoning capabilities Veo 3 paper. One user tested Veo 3's understanding of fluid dynamics by providing a static image of a glass and prompting the model to fill it with water, which it successfully animated Water filling test. However, a subsequent test asking the model to move the glass sideways failed, indicating limitations in its physical simulation Sideways glass test fail. These user-driven experiments provide a more grounded view of the model's capabilities compared to polished demos and official papers, with comparisons showing Veo 3 has a significantly better understanding of prompts than competing models like Seedance Pro Veo 3 vs Seedance Pro.

📊 Benchmarks & Evals

Beyond the headline results for Claude Sonnet 4.5, other evaluation-related news included the growing community interest in OpenAI's GDPval dataset and new results on specialized reasoning benchmarks like LisanBench. Excludes the main Sonnet 4.5 benchmark results, which are in the feature category.

Sonnet 4.5 debuts as #4 on Artificial Analysis Intelligence Index, beating Opus 4.1

Anthropic's Claude 4.5 Sonnet has entered the Artificial Analysis Intelligence Index as the #4 most intelligent model, placing the company among the top three frontier labs AAI analysis thread. In its reasoning mode, the model scores 61 on the index, a +4 point jump from Sonnet 4 (Thinking) and +2 points over Opus 4.1 (Thinking). The model also stands out for token efficiency, achieving its score increase without a corresponding rise in output tokens, unlike other model families Token efficiency note.

Intelligence Index chart

The largest gains were seen in τ²-Bench Telecom (+13 p.p.) and Humanity's Last Exam (+14 p.p.), though Sonnet 4.5 did not achieve the highest score in any single evaluation in the index Individual benchmark results.
It now places ahead of Gemini 2.5 Pro (60) and Grok 4 Fast (60), but remains behind GPT-5 (high, 68) and Grok 4 (65).
The non-reasoning version also improved, jumping from 44 to 49 on the index, with major gains in agentic tool use.

OpenAI's GDPval dataset for real-world job tasks trends #1 on Hugging Face

OpenAI's GDPval dataset, a benchmark comprising 1,320 real-world tasks across 44 professions, has become the number one trending dataset on Hugging Face, signaling strong community interest in evaluations that measure AI on economically relevant work Trending datasets list. The benchmark's own results show that top models like GPT-5 and Claude Opus 4.1 already equal or beat human experts on about half the challenges, particularly excelling at tasks involving PDFs, spreadsheets, and presentations Benchmark results chart.

Hugging Face trending datasets

Developers and researchers can now explore the full dataset, which includes prompts and reference files for tasks in sectors like auditing and financial analysis gdpval dataset page.

Microsoft paper stress-tests models on medical benchmarks, reveals deep fragility

Despite high leaderboard scores, frontier models like GPT-5 and Gemini 2.5 exhibit "deep fragilities" when stress-tested on multimodal medical benchmarks, according to a new Microsoft Research paper Paper abstract. The study found that models frequently guess correct answers even without the necessary images, fabricate their reasoning, and can be easily swayed by minor prompt changes.

Paper abstract on medical benchmarks

The paper concludes that current benchmarks often reward

SWE-Bench Pro introduces harder, long-horizon coding tasks where models score below 25%

A new paper introduces SWE-Bench Pro, a more difficult benchmark for AI coding agents that focuses on realistic, long-horizon software engineering tasks SWE-Bench Pro paper. The problems are sourced from copyleft and private codebases to prevent data leakage and specifically exclude simple one-to-ten-line edits.

SWE-Bench Pro paper title

On this challenging new dataset, even frontier models like GPT-5 achieve a pass rate below 25%, a significant drop from their 70%+ scores on older benchmarks. The evaluation highlights common failure modes such as incorrect fixes, tool errors, and context overflow, suggesting current agents are still far from a professional level on complex, real-world work.

Sonnet 4.5 takes 3rd place on LisanBench, showing improved reasoning efficiency

The newly released Claude Sonnet 4.5, in its thinking mode, has secured the 3rd spot on the LisanBench Glicko-2 ranking for long-form reasoning LisanBench results. This performance places it ahead of GPT-5 and o3, though still behind xAI's Grok models, which are suspected to be heavily trained on the benchmark.

LisanBench Glicko-2 Ranking

Sonnet 4.5 also demonstrates improved reasoning efficiency over its predecessor, Sonnet 4, and achieved multiple new high scores for individual starting words in the benchmark's chain length distribution test.

Claude 4.5 Sonnet lands just behind GPT-5 on SimpleBench without thinking mode

In its standard non-thinking mode, Claude 4.5 Sonnet has ranked 6th on the new SimpleBench leaderboard, placing just behind GPT-5 (high) SimpleBench leaderboard. The new model from Anthropic achieved a score of 54.3%, compared to GPT-5's 56.7% and a human baseline of 83.7%. The current top-performing model on this particular benchmark is Gemini 2.5 Pro, with a score of 62.4%.

Sonnet 4.5 achieves a perfect score on Roo Code's benchmark, but at a high cost

Using its 1M context window, Claude Sonnet 4.5 achieved a perfect 100% total score on Roo Code's benchmark, a test focused on coding capabilities Benchmark results table. While the performance was flawless, the evaluation run was costly, consuming $38.43 over nearly 3.5 hours and processing 30 million input tokens. A separate test confirmed the 1M context configuration also scored a high 78.2% on the more established SWE-Bench Verified benchmark 1M context result.

Anthropic Sonnet 4.5 hits 77.2% on SWE-Bench – claims top spot for coding

Feature: Anthropic Drops Claude Sonnet 4.5

📑 Table of Contents

On this page