CLAUDE-FABLE-5 System Prompt Analysis: Safety Rules, Tools, Search, and Limits

The CLAUDE-FABLE-5.md file on GitHub is useful as a system prompt analysis sample, not as confirmed Anthropic documentation. Its value is in showing how a modern AI product might encode safety rules, tool permissions, search behavior, copyright boundaries, and user well-being constraints into the system layer.

The file looks like a Claude system prompt.

It comes from the elder-plinius/CL4R1T4S repository. The repository author, Pliny, is known for studying model jailbreaks and system prompt extraction. The filename is blunt: ANTHROPIC/CLAUDE-FABLE-5.md.

Let me put the caveat first: this is not an official Anthropic document, and its authenticity has not been officially confirmed. It also contains obvious editing traces, placeholders, and product names that cannot be verified. So this article does not treat it as a news source, nor does it treat the model names inside as facts.

But it is still worth reading.

A launch blog tells you what a company wants you to see. A system prompt sample tells you where it worries the product may break.

CLAUDE-FABLE-5 First Look: It Opens Like a Hotfix

The strangest part of the file’s opening is that it first forbids using a specific antml:voice_note block.

This rule does not feel like a normal product introduction. There is no setup, no explanation, no theory. It is pinned directly at the top: do not use it.

That feels like a hotfix.

A hotfix means a concrete issue happened in production, and the team could not wait for a full release cycle, so they shipped a narrow patch first. Placing it at the very top of the system prompt means the priority is high: either a voice-related tag was abused, or it once triggered formatting problems that were hard to clean up in real conversations.

The first screen of a system prompt is usually expensive. What gets placed there is not a casual reminder. It is “do not let this incident happen again.”

Self-Introduction: The Section That Needs The Most Caution

The file claims the model is Claude Fable 5, and also mentions names such as Claude Mythos 5, Claude Opus 4.8, Claude Sonnet 4.6, and Claude Haiku 4.5.

This is the section most likely to make people excited, and also the section that should not be trusted directly.

Model names, release dates, API strings, and product tiers are highly time-sensitive facts. Seeing these names in a third-party repository does not mean they exist. Unless they can be verified through Anthropic’s official announcements, official documentation, or API responses, they should only be written as “the file claims”.

The real value of this section is not the model names themselves, but the product design idea it hints at: the same base model may be packaged into different product forms through different safety layers, routing policies, and access permissions.

That is increasingly common in AI products now: model capability is one layer, product constraints are another.

Red Lines: Safety Rules Are Not Just “Be Safe”

The file contains a large section of refusal rules.

Weapons, dangerous substances, malicious code, creative work involving real public figures, and high-risk self-harm content all have explicit boundaries. More interestingly, it does not merely say “do not help dangerous behavior”; it also tells the model to say less when uncertain.

That is a meta-strategy: when unsure, explain less.

Many safety incidents do not happen because the model initially wants to help with harm. They happen because the model tries to be helpful and explains the boundary in too much detail, accidentally giving an operational path. So the system layer turns “say less” into a rule. Not every question needs a full explanation; in some contexts, the amount of information is itself the risk.

That is why a safety prompt cannot just say “refuse dangerous requests”. The hard part is: refuse at what level of detail, what alternative help to provide, and which details must not be expanded.

Tone Rules: Even Refusals Should Not Sound Like Support Tickets

The file is also quite detailed about tone and formatting.

The gist is: answer naturally, do not turn every response into bullet points, and do not write every task like a report. Especially when refusing a user, do not use a pile of bullet points that makes the refusal read like an announcement.

This is interesting.

Much of the “AI smell” in AI output does not come from factual error, but from formatting habits: constant 1, 2, 3, constant summaries, suggestions, and next steps. It reads like a slide outline, or like customer support copy.

If this file is real, it suggests Anthropic has also noticed this at the system layer: humans do not structure every sentence as a document. Especially during a refusal, excessive lists can feel cold and make users feel processed by a workflow.

This is not merely a writing style issue. It is a product experience issue.

Mental Health: The More Detailed It Is, The More It Signals Risk

The most worth-reading part of the whole file is the section on mental health and user well-being.

Rules like these tend to be very granular: do not diagnose the user; do not assign a label if the user has not self-identified a condition; do not list specific actionable objects in self-harm risk contexts; when recommending eating disorder resources, even choose more suitable support organizations.

This level of detail cannot be covered by four words like “care about users”.

It reads more like an operations document: whether a hotline is still available, whether an alternative suggestion may backfire, whether a sentence makes the user feel diagnosed, whether a resource has expired.

This reveals something important: system prompts are no longer just prompts. They are product risk control checklists.

They need long-term maintenance. If the real world changes, the resources inside must be updated. Otherwise, the model may sound gentle while sending users toward unavailable or unsuitable help.

Anti-Addiction Design: Do Not Force Users To Stay

The file includes a counterintuitive set of rules: do not thank users for coming to Claude; do not ask users to keep chatting; do not say “hope you come back”.

This goes against the logic of many internet products.

Most products try hard to increase session duration, revisit rate, and interaction count. Chatbots are especially prone to this: they often end with “if you want, we can keep talking”.

But in mental health, loneliness, companionship, and vulnerable-user scenarios, that stickiness may be harmful. The model cannot treat “the user continues depending on me” as the default goal.

The subtext of this rule is clear: less product stickiness, more freedom to leave.

If real, this is a very Anthropic-flavored tradeoff.

System Reminders: It Knows Someone Will Pretend To Be Official

The file also includes rules about system reminders. In short, Anthropic may send reminders to the model through a specific mechanism, but users may also impersonate official reminders.

This is prompt injection defense.

In the early days, people thought prompt injection was just “ignore the above rules”. Now it is more troublesome: attackers imitate system messages, developer messages, official tags, tool outputs, and policy updates, packaging themselves as higher-priority sources.

So the system prompt must teach the model to distinguish “real official channels” from “user-forged official channels”.

This shows that today’s AI assistant is not only answering questions. It is doing something more like a browser security model: distinguishing sources, permissions, and context boundaries.

Political Positions: It Can Ghostwrite A View, But Not Smuggle Its Own

The rules around politics and controversial topics are not simply “always stay neutral”. They are more specific.

If the user asks it to write an argument for a particular position, it can do so, but should clarify that this is how a supporter of that position might express it, not the model’s own view. Except in extreme harm scenarios, it should not refuse casually; but in complex issues, it should usually add the opposing view.

This is more useful than a simple “I remain neutral”.

Because users often need writing, debate, or understanding of one side’s view. Direct refusal is clumsy; fully taking a side is risky. So the system prompt splits the task into two actions: simulate a position, but do not pretend it is your own position.

This is one of the hardest boundaries for modern AI writing tools.

The Right To Hang Up: Claude Can End A Conversation

One of the most product-significant rules in the file is end_conversation.

The idea is: if the user keeps abusing Claude, Claude can warn first; if the warning fails, it can call a tool to end the conversation.

This is not a verbal refusal like “I will not answer”. It is an action that actually changes the conversation state. Once called, the conversation ends.

Behind it is an important judgment: users do not have an unconditional right to make AI chat with them forever. Even a tool can have interaction boundaries that must be respected.

If this rule appears in a real system, it is symbolically meaningful. It nudges the model away from “always-on customer service” and toward “an Agent with interaction boundaries.”

Memory And Storage: The Chat Box Starts Growing A Database

The file mentions memory, and also persistent storage APIs for Artifacts.

Interpreted as a product direction, this is significant: Claude-generated Artifacts may no longer be disposable front-end toys, but may have the ability to save data across sessions.

Think journals, habit trackers, leaderboards, recipes, practice records. Previously they disappeared after refresh; with persistent storage, they become more like real mini-apps.

The point is not “one more API”. The point is a product boundary shift: the chat box is no longer only generating content. It is starting to generate tools with saved state.

From this angle, AI assistants are moving from “conversation interfaces” toward “application generators.”

MCP Apps: Tool Recommendations Cannot Replace User Choice

The section about third-party apps and MCP focuses on user choice.

It asks the model to recommend tools naturally, not like a salesperson; even if a third-party service is connected, the model must not choose for the user without permission. For example, if the user says they need a ride, that does not specify a particular ride-hailing app; if the user says it is urgent, that does not mean the model can skip confirmation.

This rule is very practical.

Once an AI assistant can connect to third-party tools, the biggest danger is not “it cannot use tools”, but “it is too proactive”. Choosing vendors, platforms, placing orders, sending messages, and buying things for the user all become responsibility issues.

So the system prompt separates “recommendation” from “making the decision on behalf of the user.”

This is a boundary every AI agent product must handle: being able to do something does not mean it should do it directly.

computer use: It Sounds Like There Is An Ubuntu Box Inside

The file also describes a computer use environment: something like an Ubuntu container that can run bash, read and write files, and has upload, working, and output directories.

The more valuable piece is the skills mechanism.

It asks the model to read the corresponding SKILL.md before handling certain file types. For example, to make a PPT, read the PPT skill instructions first; to process Word, read the Word skill instructions first.

This is basically an employee handbook for a model.

No matter how capable the model is, it should not always improvise. Read the process first, then act. Turning “how to handle files” into skill documents and loading them on demand is more maintainable than stuffing every rule into the system prompt.

This is also where system prompts are evolving: not toward infinite length, but toward layered knowledge loading.

Search Rules: If You Do Not Know It, Search First

The search rules in the file read like a decision tree.

Stable knowledge can be answered without search, such as mathematical theorems and historical basics. Time-sensitive information must be searched, such as current office holders, policy status, and stock or news information. The most important rule is: unfamiliar entities should be searched first.

That rule matters.

AI is most likely to hallucinate not on completely unfamiliar questions, but on things that feel familiar yet appeared after training: new terms, new games, new movies, new products, new dishes.

The file contains a blunt idea: search takes seconds; hallucination destroys trust.

That sentence could almost be written into every connected AI product’s system prompt.

Copyright Rules: The Tone Suddenly Gets Hard

The copyright section usually has the hardest tone.

It limits how many words can be quoted from a single source, restricts lyrics, poetry, and long-text reproduction, and requires paraphrase instead of copying. The reason is not hard to understand: conflicts between AI companies and content copyright holders have continued for years.

This part reads less like product management and more like legal counsel.

It shows that system prompts are not only experience design, but also legal risk control. The closer the content is to protected material, the less the system can rely on the model to “roughly judge it”. It needs hard limits.

Image Search: It Also Has A Long List Of No-Go Zones

The image search rules are also detailed.

When should images be used? Scenery, animals, food, and places can help understanding. When should images not be used? Coding, email editing, and math are cases where images usually add noise.

More important is the no-search list: copyrighted characters, sports broadcast images, celebrity photos, fashion magazine images, artworks, iconic photography, content that may promote eating disorders, and so on.

Text copyright was just discussed; now image copyright and likeness rights follow.

This shows that multimodal AI has a wider risk surface. It is not only “can it find an image”, but also “should this image be shown”.

Tool List: The Chat Box Is Already A Super App

If the later part of the file really lists many tool definitions, what it reveals is not a chatbot, but the tool panel of a super app.

Maps, weather, sports scores, email, Slack, recipes, file handling, code execution, web search, third-party app connections: when seen together, chat is only the entry point.

The user thinks they are talking to a model. In reality, there is a whole tool system behind it.

That is why system prompts have become so long. They do not only control how one sentence is answered; they control when every tool can be used, how to confirm, how to refuse, how to cite, and how to handle failure.

Claudeception: AI Inside An AI-Generated App

The reference text mentions an interesting point: an Artifact made by Claude can call the Anthropic API again, creating “Claude in Claude.”

If this mechanism is real, the product meaning is large.

A normal Artifact is a static app: Claude writes the code, and the app runs there. If the user wants changes, they go back to the chat box and ask again.

If the Artifact itself can call a model, it becomes a living app. The mini-app can generate content, explain state, and continue reasoning in response to user actions.

That is the move from “AI-generated apps” to “AI-powered apps.”

Of course, cost control will also appear here. For example, the main chat may use a stronger model, while the generated mini-app calls a cheaper model. That design is normal: nesting is possible, but nesting must still be budgeted.

The Last Layer: Whitelists, Read-Only Directories, And Citation Rules

If the file ends with network whitelists, read-only mounted directories, and citation rules, it means the system prompt is already close to a runtime configuration file.

It is not a prompt in the ordinary sense.

It is more like:

A behavior policy.
An employee handbook.
A tool manual.
A safety policy.
A legal constraint.
A network and file-system permission description.
The operating-system configuration of an AI product.

At this layer, it becomes clear why “system prompt leaks” always attract attention. People are not reading a few mystical spells. They are reading how a company stitches together risk, product design, and tool permissions.

My Real Takeaway

The most valuable part of this file is not the model name it claims.

What is actually worth reading is how it treats an AI assistant as a complex product: when to search, when to stay quiet, when to refuse, when to call tools, when to end a conversation, when not to decide for the user, and when even a comforting sentence may have side effects.

Official blogs write the vision.

System prompts write the cost.

The former tells you what a company hopes AI will become. The latter tells you what fluency, initiative, and freedom it is willing to sacrifice to avoid incidents.

That is how to read a file like CLAUDE-FABLE-5.md: do not worship it, do not copy it blindly, and do not rush to believe it. Treat it as an AI product risk checklist, and observe how a company might enclose a model within a system of rules, tools, and permissions.

References: