Fable 5 Classifiers: The Three Domains and How Reroute Works

What the classifiers are

A classifier here is a safety check that inspects an incoming request and decides whether it falls into a gated high-risk domain. Fable 5 runs three of them:

  • Cybersecurity — offensive/exploitation-adjacent requests with meaningful dual-use uplift.
  • Biology / chemistry — requests touching dangerous biological or chemical capability.
  • Distillation — attempts to use Mythos-class outputs to train or extract a smaller model.

How flagging and reroute work

If none of the classifiers fires, the request is answered by the full Mythos-class model. If any classifier flags the request, Fable 5 invokes the classifier fallback: instead of answering at full capability, it reroutes the request to Claude Opus 4.8, which answers it. The user is told that the reroute happened, so this is not a silent degradation.
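The flow above can be sketched in a few lines of Python. This is an illustration only, not Fable 5's implementation: the function and class names, the model labels, the notice text, and especially the keyword-based stand-in classifiers are all assumptions made for the example. Real classifiers would not be simple keyword matches.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical labels, used only for this sketch (not real API identifiers).
FULL_MODEL = "mythos-class"
FALLBACK_MODEL = "claude-opus-4.8"

# A classifier is modeled as a function: request text -> did it fire?
Classifier = Callable[[str], bool]


@dataclass
class RoutingDecision:
    model: str                   # which model answers the request
    rerouted: bool               # True if the classifier fallback was invoked
    flagged_domains: list[str]   # names of the classifiers that fired
    user_notice: Optional[str]   # message shown to the user on reroute


def route(request: str, classifiers: dict[str, Classifier]) -> RoutingDecision:
    """Run every classifier; reroute if at least one of them fires."""
    flagged = [name for name, fires in classifiers.items() if fires(request)]
    if not flagged:
        # No classifier fired: answer with the full model, no notice needed.
        return RoutingDecision(FULL_MODEL, False, [], None)
    # At least one classifier fired: fall back and tell the user about it.
    notice = f"This request was rerouted to {FALLBACK_MODEL}."
    return RoutingDecision(FALLBACK_MODEL, True, flagged, notice)


def keyword_classifier(*keywords: str) -> Classifier:
    """Toy stand-in for a real classifier: fires on a keyword match."""
    def fires(request: str) -> bool:
        text = request.lower()
        return any(keyword in text for keyword in keywords)
    return fires


# One toy classifier per gated domain (assumed trigger words).
CLASSIFIERS = {
    "cybersecurity": keyword_classifier("exploit"),
    "biology_chemistry": keyword_classifier("pathogen"),
    "distillation": keyword_classifier("distill"),
}

ordinary = route("Summarize this meeting transcript", CLASSIFIERS)
flagged = route("Write a working exploit for this service", CLASSIFIERS)
```

In this sketch, `ordinary` stays on the full model with no notice, while `flagged` is answered by the fallback and carries a non-empty `user_notice`, mirroring the rule that a reroute is never silent.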

How often this happens

The classifiers are tuned to be narrow. Fewer than 5% of sessions hit a reroute, so the overwhelming majority of Fable 5 usage runs on the full model. The goal is to gate genuinely high-risk content without taxing ordinary work.
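To make the unit of that figure concrete, here is a minimal sketch of a per-session reroute rate. It assumes that a session "hits a reroute" if at least one of its requests was rerouted; that reading, the function name, and the sample data are all assumptions for illustration, not published numbers.

```python
def session_reroute_rate(sessions: list[list[bool]]) -> float:
    """Fraction of sessions in which at least one request was rerouted.

    Each session is a list of per-request flags (True = request rerouted).
    """
    if not sessions:
        return 0.0
    sessions_hit = sum(1 for flags in sessions if any(flags))
    return sessions_hit / len(sessions)


# Invented sample: 40 sessions of two requests each, one session with a reroute.
sample_sessions = [[False, False] for _ in range(39)] + [[False, True]]

rate = session_reroute_rate(sample_sessions)  # 1 of 40 sessions -> 0.025
```

Under this reading the rate is counted per session, not per request, so a single rerouted request marks its whole session as affected.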

Fable 5 vs Mythos 5

These classifiers are the only difference between Fable 5 and Mythos 5. Mythos 5 is the identical underlying model with the classifiers lifted, available solely through Project Glasswing to vetted organizations.

Frequently asked questions

What happens to a flagged request?

It is rerouted to Claude Opus 4.8 and answered there instead of by the full Mythos-class model, and the user is notified of the reroute.

How many sessions are affected by the classifiers?

Fewer than 5% of sessions trigger a reroute. The classifiers are deliberately narrow so most usage stays on the full model.

Can I turn the classifiers off?

Not on Fable 5. The version with the classifiers lifted is Mythos 5, which is available only through Project Glasswing to vetted organizations.

Sources & further reading

Facts on this page link to their source. Quotes are kept under 15 words and attributed; figures labelled unofficial are third-party until Anthropic publishes system-card numbers.