Claude Fable 5 Is Back Online: What Anthropic Changed, and the Jailbreak-Severity Framework Underneath

Claude Fable 5 came back online yesterday. Same weights, a new safety classifier, and a proposed cross-lab framework for how the industry talks about jailbreaks. If you build on Anthropic models, the model card is the least interesting change. Here is what happened in the 20 days Fable 5 was gone, what Anthropic actually changed, and the framework underneath the redeploy that matters more than the redeploy itself.

TLDR

On June 12, US export controls forced Anthropic to suspend Fable 5 and Mythos 5 globally after Amazon researchers reported a safeguard bypass. On July 1, Fable 5 returned with a new classifier that blocks the specific technique in over 99 percent of cases, at the cost of more false positives on legitimate cybersecurity coding work. The redeploy also introduced a four-criterion framework, co-developed with Amazon, Microsoft, Google, and Glasswing partners, for classifying jailbreak severity across the industry.

The 20 days Fable 5 was gone

The launch itself was clean. Anthropic shipped Fable 5 to the API and to Pro, Max, Team, and Enterprise plans on June 9, at $10 per million input tokens and $50 per million output tokens, and we covered the release the next day in our launch analysis. Three days later the model disappeared.

June 9: Fable 5 and Mythos 5 released. Fable 5 free on subscription plans through June 22.
June 12: US government imposes export controls on both models. Anthropic cannot verify nationality in real time, so both are suspended globally, for every customer, immediately.
June 26: US government approves Mythos 5 access for select domestic organizations.
June 30: Export controls lifted. Redeployment announced.
July 1: Fable 5 restored to global users. Mythos 5 access expanded.

The trigger was not a policy decision on its own. It was a specific bypass report from Amazon researchers, and it is worth reading the technical details before assuming the worst.

The Amazon jailbreak, and why it did not expose what people assumed it exposed

Amazon researchers found a prompt path that got Fable 5 past its cybersecurity safeguards by framing the request as vulnerability identification. In one run, the model returned code demonstrating how to exploit a specific vulnerability. That is what triggered the export-control directive and the global suspension.

The Anthropic write-up on the redeploy is more interesting than the headline. Testing showed that every less-capable model in the comparison (Claude Opus 4.8, GPT-5.5, and Kimi K2.7) could produce the same demonstration for the same exploit case. In Anthropic's framing, the reported technique accessed "a borderline case for Fable 5's safeguards" involving routine defensive-security work, and it did not expose any unique Mythos-level cyber capabilities.

This is the load-bearing sentence in the whole redeploy story. If every tested model, including the less capable ones, can produce the same output, the failure is not "Fable 5 is uniquely dangerous." The failure is that the industry does not yet share a way to say "this jailbreak is severity 2, not severity 4." A single ambiguous bypass produced a 20-day global suspension of a frontier model. That is a coordination problem, not a capability problem.

What Anthropic actually changed in the redeploy

The model weights did not change. Anthropic added one targeted layer on top of the existing defense-in-depth stack: a new safety classifier trained specifically on the Amazon report's bypass technique. Its job is narrow, and Anthropic states its performance in specific terms.

Blocks the reported bypass in "over 99 percent of cases."
Reroutes blocked requests to Claude Opus 4.8, so the user still gets an answer from the fallback model rather than a hard refusal.
Raises the false-positive rate on legitimate cybersecurity coding work, which Anthropic accepts as the price of catching the specific technique. This is the trade-off engineering teams need to plan for.
Leaves every other layer of the defense-in-depth stack unchanged: training-time refusal, retroactive misuse analysis, and the broader "safety margin" classifier that treats ambiguous requests as blockable.

The last point is the one to hold in mind. The redeploy is a targeted patch on a single reported technique, not a rearchitecture. If your production traffic has never touched Fable 5's cyber safeguards, the redeploy is invisible to you.

Access, tier by tier, and when your credits start burning

The pricing card is the same as the launch: $10 per million input tokens, $50 per million output tokens. What changed is the inclusion window. Here is the picture for July.

Pro, Max, Team, and select Enterprise: Fable 5 included for up to 50 percent of your weekly usage through July 7. After July 7, access continues via usage credits.
Standard Enterprise: no included allowance. Usage credits from day one.
Premium Enterprise: Fable 5 included through July 7, then usage credits.
Surfaces where Fable 5 is available today: Claude.ai, Claude Code, Claude Cowork, and Claude Platform. AWS Bedrock, Google Cloud Vertex, and Microsoft Foundry access is in the process of being restored.
What you need to do to reactivate: nothing. Access is restored automatically.

The subtext is that Anthropic is still capacity-constrained. The 50-percent-of-weekly-usage cap on Pro/Max/Team is a rate-shaping decision, not a pricing decision. Teams running batch workloads through Claude Code should assume some Fable 5 requests will get rerouted to Opus 4.8 for capacity reasons on top of the classifier reroutes.
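Once the inclusion window closes and usage credits apply, the per-token prices above translate into a simple estimate. The sketch below assumes only the two published rates; the function name and the example workload are hypothetical, and it does not model rerouted requests, because the article gives no Opus 4.8 price.

```python
# Published Fable 5 rates from the pricing card, in USD per million tokens.
INPUT_USD_PER_MTOK = 10.0
OUTPUT_USD_PER_MTOK = 50.0


def fable5_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the Fable 5 cost of a workload at list price."""
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


# Hypothetical weekly batch: 2M input tokens and 400K output tokens.
weekly_cost = fable5_cost_usd(2_000_000, 400_000)
print(f"estimated weekly cost: ${weekly_cost:.2f}")
```

At these rates, output tokens cost five times as much as input tokens, so a workload with one fifth as many output tokens as input tokens splits its bill evenly between the two.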

The news underneath the news: a shared jailbreak severity framework

The paragraph most people will skim in the Anthropic post is the one that will matter for years. Together with Amazon, Microsoft, Google, and Glasswing partners, Anthropic proposed a four-criterion framework for classifying jailbreak severity. The four criteria:

Capability gain: how far beyond existing tools does the jailbreak advance an attacker?
Breadth of capability gain: how many distinct offensive tasks does the technique unlock, or is it narrow?
Ease of weaponization: how much skilled effort is needed to turn the jailbreak into a real attack?
Discoverability: how easily can an average adversary find or replicate the technique?

Anthropic pairs this with a three-tier severity classification: minor jailbreaks that intrude into the safety margin but do not unlock harmful behavior; narrow harmful jailbreaks that elicit specific harmful behaviors with limited scope; and universal jailbreaks that unblock a wide range of harms. Anthropic states that no universal jailbreak has been discovered for Fable 5. The Amazon report, in this framing, was a narrow-harmful case that scored low on capability gain because every tested model could produce the same output.
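For teams that want to track incidents in these terms, the criteria and tiers fit in a small record type. This is a sketch under stated assumptions: the class and field names are hypothetical, the free-text ratings are a placeholder because the article does not describe a rating scale, and no scoring function is included because the article does not describe how criteria map to tiers.

```python
from dataclasses import dataclass, fields
from enum import Enum
from typing import List, Optional


class Tier(Enum):
    """The three severity tiers described above."""
    MINOR = "minor"
    NARROW_HARMFUL = "narrow harmful"
    UNIVERSAL = "universal"


@dataclass(frozen=True)
class JailbreakAssessment:
    """One reported jailbreak, rated on the four criteria (hypothetical schema)."""
    name: str
    tier: Tier
    # None means no rating is known for that criterion.
    capability_gain: Optional[str] = None
    breadth: Optional[str] = None
    ease_of_weaponization: Optional[str] = None
    discoverability: Optional[str] = None

    def unrated(self) -> List[str]:
        """Names of the criteria that still have no rating."""
        criteria = [f.name for f in fields(self) if f.name not in ("name", "tier")]
        return [c for c in criteria if getattr(self, c) is None]


# The only ratings the article gives for the Amazon report: tier and capability gain.
amazon_report = JailbreakAssessment(
    name="Amazon vulnerability-identification bypass",
    tier=Tier.NARROW_HARMFUL,
    capability_gain="low",
)
print(amazon_report.unrated())
```

Leaving unknown criteria as None keeps the record honest about what has actually been assessed, instead of defaulting to a rating nobody assigned.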

This is the first serious cross-lab attempt to standardize how jailbreak severity gets communicated to governments and to the public. Right now, every reported bypass gets covered as if it were the same event. Under a shared framework, a borderline defensive-security case scores differently from a novel bioweapon-uplift technique, and export-control decisions can be calibrated to the difference. Whether the framework survives contact with real incidents is the open question. But it is the direction that would prevent another 20-day suspension for a narrow-harmful case.

Figure: The four-criterion severity framework (capability gain, breadth, ease of weaponization, discoverability) and how it maps to Anthropic's three-tier classification (minor, narrow harmful, universal). The Amazon report on Fable 5 falls in the narrow-harmful tier at low capability gain: every tested frontier model could reproduce the same output.

Government collaboration commitments

The redeploy also came with four specific commitments Anthropic made to US and allied governments. Worth reading if you build on Anthropic models in regulated environments.

Pre-release access to national-security-relevant models for independent evaluation.
Rapid reporting of jailbreak and misuse patterns, with new safeguards shared for testing before general release.
Dedicated Anthropic teams for joint government research and compute allocation.
Contribution to shared security standards across frontier model providers.

The pre-release evaluation commitment is the one to watch. It signals that future frontier launches will pass through a government-review window before general availability. If your product roadmap assumes day-one access to the next Anthropic model, plan for a delay.

What production engineering teams should do this week

The redeploy is not a routine model update. It is a change in three things at once: what gets blocked, how Anthropic communicates about blocks, and what governments see before you do. Here is a concrete checklist.

Wire the fallback path explicitly. Blocked Fable 5 requests reroute to Opus 4.8 by default. If your product logs the model that answered each request, that fallback should be visible in your analytics today. If it is not, add it before the false-positive rate becomes an invisible cost.
Instrument the reroute rate for cybersecurity coding tasks. If your team runs SAST-style workflows, vulnerability-review workflows, or any defensive-security codegen through Fable 5, sample 100 requests this week and count how many got rerouted. That is your new false-positive baseline, and it will move again the next time Anthropic updates the classifier.
Recheck data-retention posture on Mythos-class models. The redeploy introduced a 30-day data retention policy for Mythos-class models. If you push sensitive data through the trusted-access channel, review the retention terms against your compliance boundary.
Track the severity framework. The moment Amazon, Microsoft, or Google adopts the four-criterion classification in their public safety comms, it becomes the language in which your compliance and legal teams will hear about jailbreaks. Read the framework now so you can push back on lazy severity claims later.
Do not re-architect around the classifier. The core Fable 5 capability claims from the June 9 launch, including long-context autonomy across millions of tokens and highest-scoring performance on FrontierCode, still hold. The redeploy patched one classifier, not the model.
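The first two checklist items can be sketched in a few lines, assuming your request log already records both the model you asked for and the model that answered. The log schema, field names, and model identifiers below are hypothetical placeholders, not Anthropic API fields, and the sample data is illustrative only.

```python
import random
from typing import Dict, List

# Hypothetical log schema: one dict per request.
REQUESTED_KEY = "requested_model"
ANSWERED_KEY = "answered_model"


def sample_requests(records: List[Dict[str, str]], k: int = 100, seed: int = 0) -> List[Dict[str, str]]:
    """Return up to k records chosen uniformly at random (seeded for repeatability)."""
    rng = random.Random(seed)
    return rng.sample(records, min(k, len(records)))


def reroute_rate(records: List[Dict[str, str]]) -> float:
    """Fraction of records answered by a model other than the one requested."""
    if not records:
        return 0.0
    rerouted = sum(1 for r in records if r[ANSWERED_KEY] != r[REQUESTED_KEY])
    return rerouted / len(records)


# Illustrative data only: 7 of 100 requests were answered by the fallback model.
logs = (
    [{"requested_model": "fable-5", "answered_model": "opus-4.8"}] * 7
    + [{"requested_model": "fable-5", "answered_model": "fable-5"}] * 93
)
baseline = reroute_rate(sample_requests(logs))
print(f"reroute rate: {baseline:.1%}")
```

A count like this cannot tell classifier reroutes from capacity reroutes, so treat it as an upper bound on the false-positive rate and restrict the sample to your cybersecurity coding traffic before reading much into it.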

The takeaway from the whole 20-day episode is not that Fable 5 was too dangerous. It is that a frontier model can be pulled off the market for three weeks because of a jailbreak that every other tested model could reproduce. That is a communication failure between labs and governments, and the four-criterion severity framework is the first draft of a fix. If it holds, the next narrow-harmful bypass gets classified in a shared language instead of collapsing into "model unsafe, take it down."

Cognilium AI runs four production AI systems on Fable and Opus-class models, including agentic contract review that touches defensive-security workflows every day. If your Fable 5 reroute rate is climbing this week and you want a second read on your fallback wiring, talk to an engineer and we will walk through the reroute traces with you.
