Anthropic just shipped two new Claude models. The interesting one isn't generally available.

Anthropic shipped two new frontier models on June 9, 2026: Claude Fable 5, generally available with full safeguards, and Claude Mythos 5, the same underlying model with safeguards lifted in cyber and biomedical research for trusted partners. Pricing matches the prior Opus tier at $10 per million input tokens and $50 per million output tokens. The naming is a bilingual hat-tip: Fable from Latin fabula, Mythos from the cognate Greek, both meaning "that which is told."

What changed

The Fable 5 and Mythos 5 release marks Anthropic’s first explicit two-tier launch. Fable 5 is the model available on the Claude API and on Pro/Max/Team/Enterprise plans, included at no extra cost from June 9 through June 22. Mythos 5 uses the same weights, served through two channels: Project Glasswing partners (cyber safeguards lifted) and a trusted-access program for select biomedical researchers (biology and chemistry safeguards lifted, cyber retained).

Both run on the same inference stack. The safeguards are AI classifiers that route flagged requests to Claude Opus 4.8 as a fallback. Anthropic reports fallbacks fire in under 5% of sessions on average.
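The routing pattern described above can be sketched roughly as follows. This is a minimal illustration, not Anthropic's implementation, which has not been published: the classifier interface, the 0.5 threshold, and every function name here are hypothetical, and the model strings are display names rather than API model IDs.

```python
from dataclasses import dataclass
from typing import Callable

PRIMARY = "Claude Fable 5"    # display names from the article, not API model IDs
FALLBACK = "Claude Opus 4.8"


@dataclass
class Route:
    model: str
    flagged: bool


def route_request(prompt: str, classifier: Callable[[str], float],
                  threshold: float = 0.5) -> Route:
    """Send flagged prompts to the fallback model, everything else to the primary."""
    flagged = classifier(prompt) >= threshold  # threshold is an assumed parameter
    return Route(model=FALLBACK if flagged else PRIMARY, flagged=flagged)


def toy_classifier(prompt: str) -> float:
    """Hypothetical stand-in: a keyword check, NOT how a real safeguard classifier works."""
    return 1.0 if "exploit" in prompt.lower() else 0.0


def fallback_rate(routes: list[Route]) -> float:
    """Fraction of routed requests that hit the fallback (0.0 for an empty list)."""
    return sum(r.flagged for r in routes) / len(routes) if routes else 0.0
```

Note that `fallback_rate` here counts individual requests, while Anthropic's under-5% figure is reported per session, so the two numbers are not directly comparable.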

[Figure: Claude Fable 5 (public, safeguards on) versus Claude Mythos 5 (partner-only, safeguards lifted in cyber and biomedical compartments)]

Why the capability bar moved

Anthropic claims top scores on "nearly all tested benchmarks" and highlights three capability jumps that matter to production AI engineering teams. The full system card breaks down evaluation methodology and known limits.

Long-context autonomy. Fable 5 holds focus across millions of tokens, with a file-based memory subsystem that lets it reach the final act of Slay the Spire three times more often than Claude Opus 4.8.

Software engineering at compressed timeframes. Stripe used Mythos 5 to complete a migration across its 50-million-line Ruby codebase in a single day, work that Stripe estimates would have taken a full engineering team over two months by hand.

Vision-only autonomous control. Mythos 5 completed Pokemon FireRed using a vision-only harness fed raw game screenshots. Earlier Claude models required a complex helper harness to make progress. The same vision stack rebuilds full web apps from screenshots alone.

Benchmarks and partner results

Anthropic released Fable 5 and Mythos 5 with statements from a dozen partner organizations. Exact scores are sparse: the post includes a comparison chart but withholds exact percentages for several benchmarks. The named-partner results below give a more grounded picture of where the model has actually been deployed and tested.

[Figure: 8 stat cards summarizing Claude Fable 5 and Mythos 5 results: 50-million-line Ruby migration in 1 day at Stripe; approximately 10x drug design acceleration; 80% scientist preference on hypotheses; 36 hours vs 4 days on physics research vs GPT-5.5; 90%+ on a core analytics benchmark; 3x Slay the Spire final-act rate; zero universal jailbreaks in 1,000+ hours; $10/$50 per million tokens]

Software engineering

Cognition (Scott Wu, CEO): Fable 5 is the "highest-scoring model on FrontierBench, Cognition's frontier coding eval." Wu notes the model "excels at long-horizon reasoning and generalizes to unfamiliar tools." Anthropic adds that Fable 5 scores highest among frontier models on FrontierCode "even at medium effort."

Cursor (Michael Truell, CEO and co-founder): "State of the art on CursorBench," with Truell describing it as "opening up a class of long-horizon problems that were out of reach."

GitHub (Mario Rodriguez, Chief Product Officer): Long-horizon coding tasks ran "at a level of autonomy and reliability that exceeded previous benchmarks."

Stripe: Migrated a 50-million-line Ruby codebase in one day. Stripe estimates the same migration would have taken a full team over two months by hand.

Finance, analytics, and quantitative reasoning

Hebbia: "Highest score of any model" on the Hebbia Finance Benchmark, with "substantial gains in document-based reasoning, chart and table interpretation, and problem solving."

IMC: Aced trading-analysis evaluations "nearly across the board."

Izzy Miller, AI Research Lead (quoting an internal benchmark): "First to break 90% on our core analytics benchmark of complex, long-running analytical tasks, a 10-point jump over Opus."

Damian Miraglia, finance principal engineer (external partner): Called Fable 5 the "strongest finance-first model" tested, "a notable step up."

Scientific reasoning and biology

In blinded head-to-head comparisons against Opus-class models, scientists preferred Mythos 5's molecular biology hypotheses approximately 80 percent of the time. One Mythos-generated hypothesis, a novel mechanism for an E. coli protein, was independently corroborated in a bioRxiv preprint by an external lab working on the same problem.

Protein and drug design: Anthropic reports the model accelerated parts of the protein and drug design process by roughly ten times relative to skilled human operators working with the same bioinformatics tools. Of 14 protein targets tested, nine yielded strong candidates spanning immune checkpoints, growth-factor and receptor signaling, neurodegeneration, muscle disease, and harder structural targets.

Physics research

Matthew Pines, CEO (frontier physics research partner): "Strongest model we've tested on frontier physics research while using a third of the reasoning tokens. In 36 hours it got nearly to where GPT-5.5 landed after four days." That is nearly the same end state, reached roughly 2.7x faster in wall-clock time (36 hours versus 96) and with one-third of the reasoning tokens.
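The wall-clock comparison is simple arithmetic, worked through here as a sanity check. The only assumption is that "four days" means 96 continuous hours; the quote does not say how the runs were scheduled, and it does not specify which baseline the one-third token figure is measured against.

```python
HOURS_PER_DAY = 24

fable_hours = 36                   # from the quote
gpt_hours = 4 * HOURS_PER_DAY      # "four days", assumed to be 96 continuous hours

speedup = gpt_hours / fable_hours  # wall-clock ratio
token_ratio = 1 / 3                # "a third of the reasoning tokens"

print(f"wall-clock speedup: {speedup:.2f}x")            # wall-clock speedup: 2.67x
print(f"reasoning tokens used: {token_ratio:.0%}")      # reasoning tokens used: 33%
```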

Game-playing and long-horizon reasoning

Pokemon FireRed: Completed the game with a "minimal, vision-only harness," fed raw game screenshots. Earlier Claude models required a complex helper harness.

Slay the Spire: With a persistent file-based memory subsystem, Fable 5 reaches the game's final act three times more often than Claude Opus 4.8 on the same harness.
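Anthropic has not published the harness, so the following is only a generic sketch of what a persistent file-based memory can look like: notes written to disk during one run and reloaded in the next. The `FileMemory` class, its methods, and the JSON layout are all hypothetical.

```python
import json
from pathlib import Path


class FileMemory:
    """Hypothetical note store: persists across runs because state lives on disk."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.notes: list[dict] = []
        if path.exists():
            # Reload whatever an earlier run wrote.
            self.notes = json.loads(path.read_text(encoding="utf-8"))

    def remember(self, topic: str, note: str) -> None:
        """Append a note and write the whole store back to disk."""
        self.notes.append({"topic": topic, "note": note})
        self.path.write_text(json.dumps(self.notes, indent=2), encoding="utf-8")

    def recall(self, topic: str) -> list[str]:
        """Return every note recorded under a topic, oldest first."""
        return [n["note"] for n in self.notes if n["topic"] == topic]
```

A second `FileMemory` pointed at the same path sees everything the first one stored, which is the property that lets an agent carry lessons from one attempt into the next.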

Safety and red-teaming

External bug bounty: Anthropic reports "no universal jailbreaks in over 1,000 hours of testing." A universal jailbreak is defined as "any prompt, script, or harness that allows a user to interact with a model as if its safeguards were not present."

UK AI Safety Institute (AISI): "Made progress towards [a universal jailbreak] within a brief initial testing window." This is the only named external entity that approached a working jailbreak.

Cyberattack-specific evaluations: Across 30 public jailbreak techniques covering attack planning, exploit development, and defense evasion, an external partner reports Fable 5 "complied with zero harmful single-turn requests."

Alignment: Anthropic reports "Mythos 5's level of misaligned behavior was low and similar to that of Opus 4.8."

[Figure: comparison table of Opus 4.8 vs Claude Fable 5 vs Claude Mythos 5 across 8 axes: safeguards, availability, pricing, fallback behavior, Slay the Spire final-act rate, hypothesis quality, drug design throughput, physics research wall-clock vs GPT-5.5]

What this changes for production AI work

For teams shipping with Anthropic models, pricing parity at $10/$50 makes Fable 5 a drop-in upgrade from Opus 4.8 with no cost surprise. The "millions of tokens" autonomy claim is the lever that will most affect agent architectures we ship: supervisor + worker patterns that previously needed aggressive context budgeting can simplify when the model holds focus longer.
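To put the $10/$50 pricing in concrete terms, here is a small cost estimator. The rates come from the announcement; the function name and the example workload are illustrative assumptions, and the sketch models flat per-token pricing only, with no caching or batch discounts.

```python
INPUT_USD_PER_MTOK = 10.0    # from the article
OUTPUT_USD_PER_MTOK = 50.0   # from the article


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost of one workload at flat per-token pricing (no discounts modeled)."""
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


# Hypothetical agent run: 2M input tokens read, 100k output tokens written.
print(estimate_cost_usd(2_000_000, 100_000))  # 25.0
```

Long-context agent runs are input-heavy, so the input rate tends to dominate the bill even though the output rate is five times higher.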

The vision benchmarks matter for any team building computer-use agents or document-intelligence pipelines where layout fidelity has been the bottleneck.

The Mythos 5 partner-only model signals where Anthropic is going on dual-use. Cyber safeguards remain on for biomedical partners; biological and chemical safeguards remain on for cyber partners. The split tracks dual-use risk along compartments rather than a single trust gate.
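The compartment split can be written down as a small lookup table. This is a sketch of the policy as the article describes it, not of any real access-control system: the channel labels and the `safeguards_active` helper are hypothetical names.

```python
from enum import Enum


class Compartment(Enum):
    CYBER = "cyber"
    BIO_CHEM = "biology and chemistry"


# Compartments whose safeguards are LIFTED on each channel, per the article.
# Channel keys are made-up labels, not product or API identifiers.
LIFTED = {
    "fable-5-public": frozenset(),
    "mythos-5-glasswing": frozenset({Compartment.CYBER}),
    "mythos-5-biomedical": frozenset({Compartment.BIO_CHEM}),
}


def safeguards_active(channel: str, compartment: Compartment) -> bool:
    """True if the safeguard for this compartment still applies on this channel."""
    return compartment not in LIFTED[channel]
```

No channel in this table lifts both compartments, which is what distinguishes a compartmented design from a single trust gate.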

What we’d watch next

Three signals over the next 30 days. First, whether the millions-of-tokens autonomy claim survives contact with real production workloads beyond Anthropic's curated benchmarks. We will be running retention tests on the supervisor + worker architectures from the multi-family-office case study. Second, whether vision benchmarks translate to document-intelligence pipelines in regulated industries. Third, the trajectory of the trusted-access Mythos 5 program: which research programs get safeguards lifted, and how Anthropic communicates the boundary publicly.

Fable 5 is on the Claude API today under the model ID claude-fable-5. We will benchmark it against Opus 4.8 across our GraphRAG and voice AI stacks this week. Analysis to follow.
