JuliusBrussee/caveman is an output-compression skill/plugin for Claude Code, Codex, Gemini, Cursor, Windsurf, Cline, Copilot, and other agent tools. Its goal is simple: make agents drop the fluff, keep the technical point, and reduce output tokens.
The README slogan is “why use many token when few do trick.” In normal technical language, that means compressing away polite padding, long explanatory phrasing, and repeated confirmations, leaving the conclusion, cause, fix, and necessary code. It does not reduce reasoning/thinking tokens, and it does not make the model less capable. It mainly affects the visible final answer.
This is slightly counterintuitive. Many people treat “detailed” as “reliable,” but in programming work, long answers often bury the useful part. caveman’s idea is: keep the technical judgment accurate; make the delivery shorter.
What problem caveman solves
When using agents for coding, the issue is often not that the model cannot answer. It is that it answers too much:
- it repeats your question;
- it adds a polite opening;
- it explains background you already know;
- only then does it give the command, file, bug, or fix you need.
That consumes output tokens and slows reading. In long sessions, it also adds low-density text to the context.
caveman gives the agent a skill rule set:
- remove filler;
- keep technical substance;
- use short sentences and fragments;
- preserve code, commands, paths, and error strings exactly;
- compress in the current language instead of forcing a language switch.
So it is not making the agent dumb. It is making the agent deliver the same information with fewer words.
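To make the "remove filler, preserve code exactly" idea concrete, here is a minimal sketch in Python. It is not how caveman works internally: the skill is a set of prose instructions for the model, not a regex pass. The filler list and the function name `strip_filler` are assumptions made up for illustration.

```python
import re

# Hypothetical filler phrases, chosen only for this illustration.
FILLER = [
    r"\bsure\b[,!]?\s*",
    r"\bgreat question\b[.!]?\s*",
    r"\bi'd be happy to help(?: with that)?[.!]?\s*",
    r"\bbasically\b,?\s*",
    r"\bit is worth noting that\s*",
]
FILLER_RE = re.compile("|".join(FILLER), re.IGNORECASE)

# One capture group, so re.split keeps the code spans in the result.
CODE_SPAN_RE = re.compile(r"(`[^`]*`)")


def strip_filler(text: str) -> str:
    """Remove filler from prose while leaving backtick code spans untouched."""
    parts = CODE_SPAN_RE.split(text)
    # After splitting on a single capture group, odd indices are code spans.
    cleaned = [
        part if i % 2 == 1 else FILLER_RE.sub("", part)
        for i, part in enumerate(parts)
    ]
    return "".join(cleaned).strip()


print(strip_filler("Sure! Basically, run `npm install sure` first."))
```

The point of the split is the asymmetry in the rule set: prose is compressible, but anything inside a code span passes through byte-for-byte, even when it happens to contain a filler word.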
A typical comparison
The README gives a React re-render example. A normal answer explains that the component re-renders because a new object reference is created on every render, React shallow comparison sees the prop as changed, and useMemo should be used. caveman compresses that into something like:
|
|
Same conclusion, less padding. For developers who already understand the context, that density is useful.
The README also emphasizes language preservation. If you ask in Chinese, the answer should stay Chinese. If you use Portuguese, Spanish, or French, the style is compressed, not translated to English. Code, commands, and error strings should remain exact.
Installation
The official installer is a one-liner that detects available agents and installs caveman into the matching locations.
macOS / Linux / WSL / Git Bash:
|
|
Windows PowerShell 5.1+:
|
|
The README says installation usually takes about 30 seconds and requires Node.js 18 or later. The script skips agents that are not installed locally and is safe to rerun.
For the full installation matrix, read:
|
|
Trigger and exit
After installation, you can usually type:
|
|
or say:
|
|
To exit, say:
|
|
It supports several compression levels:
|
|
Roughly:
- lite: removes filler and repetition;
- full: default compressed style;
- ultra: more telegraphic;
- wenyan: classical Chinese style for even shorter Chinese output.
In practice, start with lite or full. ultra is shorter, but it can hurt readability on complex tasks.
What you get
The README lists these main capabilities:
| Feature | Use |
|---|---|
| `/caveman [lite \| full \| …]` | Compressed mode at the chosen level |
| `/caveman-commit` | Conventional Commit messages with subject under 50 characters |
| `/caveman-review` | One-line PR comments such as `L42: bug: user null. Add guard.` |
| `/caveman-stats` | Real session token usage, lifetime savings, and estimated cost |
| `/caveman-compress <file>` | Compress memory files such as CLAUDE.md or project notes while preserving code, URLs, and paths |
| `caveman-shrink` | MCP middleware for compressing MCP server tool descriptions |
| `cavecrew-*` | Compressed investigator / builder / reviewer subagents |
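The commit-message constraint in the table is easy to check mechanically. The sketch below validates a subject line against the Conventional Commits shape and the "under 50 characters" limit. It is an independent illustration, not code from caveman; the name `check_subject` and the exact regex are assumptions.

```python
import re

# Conventional Commits subject: type, optional (scope), optional "!", then ": description".
SUBJECT_RE = re.compile(
    r"^(?P<type>[a-z]+)(?:\((?P<scope>[^()\s]+)\))?(?P<breaking>!)?: (?P<desc>\S.*)$"
)
MAX_SUBJECT_LEN = 49  # "under 50 characters"


def check_subject(subject: str) -> list[str]:
    """Return a list of problems; an empty list means the subject passes."""
    problems = []
    if len(subject) > MAX_SUBJECT_LEN:
        problems.append(f"subject is {len(subject)} chars, limit is {MAX_SUBJECT_LEN}")
    if SUBJECT_RE.match(subject) is None:
        problems.append("subject does not match 'type(scope): description'")
    return problems


print(check_subject("fix(auth): reject expired tokens"))  # passes
print(check_subject("Fixed the bug"))                     # wrong shape
```

A check like this could sit in a commit-msg hook, so a generated message that drifts from the format is caught before it lands.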
/caveman-compress <file> is worth noting. Long-term agent cost does not only come from output. It also comes from instruction files, preferences, and project notes injected into every session. Compressing those files reduces repeated low-density context.
How to read the benchmarks
The README claims that, based on real Claude API token counts, 10 prompts saw an average 65% output reduction, ranging from 22% to 87%. Example tasks include:
- explaining a React re-render bug;
- fixing auth middleware token expiry;
- setting up a PostgreSQL connection pool;
- explaining git rebase vs merge;
- reviewing PR security issues;
- debugging a PostgreSQL race condition;
- implementing a React error boundary.
The README also says the eval harness has three arms: baseline, terse, and skill. That matters because caveman is compared not only with verbose default output but also with a plain concise prompt such as “Answer concisely.”
Still, benchmark numbers need caution. Shorter output does not make every task better. Simple bug localization, PR review, commit messages, and command explanations are good candidates. Requirement clarification, architecture tradeoffs, tutorials, and complex incident reviews may still need fuller explanation.
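The three-arm comparison can be reproduced on your own prompts with a few lines of arithmetic. The sketch below computes per-prompt output reduction relative to baseline and summarizes it per arm. The token counts are invented placeholders, not README data, and averaging per-prompt percentages is an assumption about method, since the README's exact aggregation is not described here.

```python
from statistics import mean


def reduction_pct(baseline: int, candidate: int) -> float:
    """Percent fewer output tokens than the baseline arm."""
    return 100.0 * (baseline - candidate) / baseline


# Placeholder output-token counts per prompt; replace with your own measurements.
runs = [
    {"baseline": 400, "terse": 300, "skill": 100},
    {"baseline": 200, "terse": 180, "skill": 150},
]


def summarize(runs: list[dict], arm: str) -> dict:
    """Mean, min, and max reduction for one arm across all prompts."""
    per_prompt = [reduction_pct(r["baseline"], r[arm]) for r in runs]
    return {"mean": mean(per_prompt), "min": min(per_prompt), "max": max(per_prompt)}


for arm in ("terse", "skill"):
    print(arm, summarize(runs, arm))
```

Reporting the range next to the mean matters: a wide spread, like the 22% to 87% the README cites, means the average says little about any single task type.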
caveman-compress for long-term context
The README also shows memory-file compression receipts for files like claude-md-preferences.md, project-notes.md, claude-md-project.md, todo-list.md, and mixed-with-code.md, averaging about 46% reduction.
That can matter more than one-off answer compression because files such as CLAUDE.md, project rules, and preference notes often enter every session. If they contain lots of redundant wording, the long-term cost is real.
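A rough way to size that long-term cost is to multiply the tokens removed from a file by the number of sessions that load it. In the sketch below, the 46% figure is the README's reported average; the file size and session count are hypothetical inputs.

```python
def tokens_saved(file_tokens: int, reduction: float, sessions: int) -> int:
    """Input tokens avoided when a compressed file is injected into every session."""
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be a fraction between 0 and 1")
    return round(file_tokens * reduction * sessions)


# Hypothetical: a 2,000-token CLAUDE.md loaded in 100 sessions at 46% reduction.
print(tokens_saved(2000, 0.46, 100))
```

This ignores prompt caching and any per-provider pricing differences, so treat the result as an upper-bound estimate rather than a bill.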
Compress rule files carefully:
- preserve code, URLs, paths, and commands byte-for-byte;
- do not remove safety constraints, permission boundaries, or test requirements;
- do not turn required workflows into vague suggestions;
- reread the compressed file manually to confirm the rules still mean the same thing.
For team projects, start with personal preferences and notes. Do not immediately compress critical safety policies.
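The byte-for-byte requirement in the checklist above can be partly automated. The following sketch extracts fenced blocks, inline code spans, and URLs from the original file and reports any that no longer appear verbatim in the compressed version. It is a separate checking aid, not part of caveman; the function names are made up, and paths or commands are only covered when they sit inside backticks.

```python
import re

FENCE_RE = re.compile(r"```.*?```", re.DOTALL)   # fenced code blocks
INLINE_RE = re.compile(r"`[^`\n]+`")             # inline code spans
URL_RE = re.compile(r"https?://[^\s)>\]`]+")     # bare URLs


def protected_spans(text: str) -> list[str]:
    """Collect spans that must survive compression unchanged."""
    spans = FENCE_RE.findall(text)
    rest = FENCE_RE.sub("", text)  # avoid re-scanning inside fenced blocks
    spans += INLINE_RE.findall(rest)
    # Trailing sentence punctuation is not part of the URL.
    spans += [url.rstrip(".,;:") for url in URL_RE.findall(rest)]
    return spans


def missing_spans(original: str, compressed: str) -> list[str]:
    """Protected spans from the original that are absent from the compressed text."""
    return [span for span in protected_spans(original) if span not in compressed]
```

An empty result from `missing_spans` only shows that literal spans survived. It says nothing about whether a rule's meaning changed, so the manual reread is still needed.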
MCP middleware: caveman-shrink
caveman-shrink is an MCP middleware that compresses MCP server tool descriptions. This is practical because tool descriptions are often injected repeatedly into model context. If the server exposes many tools with verbose descriptions, input tokens grow quickly.
Good fit:
- the MCP server exposes many tools;
- tool descriptions are repetitive or bulky;
- the agent often reads the full tool list;
- you want to reduce description tokens without modifying the original MCP server.
Poor fit:
- tool descriptions are already short;
- descriptions contain strict parameter constraints that should not be compressed;
- you have not verified whether compression changes tool-use accuracy.
MCP description compression needs more care than normal answer compression because it affects how the model selects and calls tools.
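One way to verify this before rollout is to replay the same prompts against the original and compressed descriptions and compare which tool the agent picked. The sketch below assumes you have already logged those choices; `Trial`, `accuracy`, and `regressions` are hypothetical names, and nothing here calls caveman-shrink or any MCP API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    prompt: str
    expected_tool: str
    chosen_tool: str


def accuracy(trials: list[Trial]) -> float:
    """Fraction of trials where the agent chose the expected tool."""
    if not trials:
        raise ValueError("no trials to score")
    return sum(t.chosen_tool == t.expected_tool for t in trials) / len(trials)


def regressions(original: list[Trial], compressed: list[Trial]) -> list[str]:
    """Prompts answered correctly with original descriptions but not with compressed ones."""
    ok_before = {t.prompt for t in original if t.chosen_tool == t.expected_tool}
    return sorted(
        t.prompt
        for t in compressed
        if t.prompt in ok_before and t.chosen_tool != t.expected_tool
    )
```

The regression list is more useful than the headline accuracy: it points at the specific descriptions where compression removed a cue the model was relying on.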
OpenClaw integration
The README also covers OpenClaw integration. You can install only for OpenClaw:
|
|
On Windows:
|
|
Installation does two main things:
- Places the skill at:
|
|
- Appends a marker-fenced block to:
|
|
so OpenClaw injects the concise style rule every turn.
For a custom workspace:
|
|
To uninstall, rerun the same installer with --uninstall; it removes the skill folder and cleans the marker block from SOUL.md.
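The marker-fenced block is what makes install and uninstall safe to repeat. Below is a minimal sketch of that general technique on plain strings. The marker text and function names are hypothetical, and the real installer's markers and logic may differ.

```python
import re

# Hypothetical markers; the actual installer's marker strings are not shown in this article.
BEGIN = "<!-- caveman:begin -->"
END = "<!-- caveman:end -->"
BLOCK_RE = re.compile(re.escape(BEGIN) + r".*?" + re.escape(END) + r"\n?", re.DOTALL)


def remove_block(text: str) -> str:
    """Delete the fenced block, leaving the rest of the file untouched."""
    return BLOCK_RE.sub("", text)


def upsert_block(text: str, body: str) -> str:
    """Append the fenced block, replacing any earlier copy so reruns are idempotent."""
    base = remove_block(text)
    if base and not base.endswith("\n"):
        base += "\n"
    return f"{base}{BEGIN}\n{body.strip()}\n{END}\n"


soul = "# Soul\nBe kind.\n"
once = upsert_block(soul, "Answer tersely.")
twice = upsert_block(once, "Answer tersely.")
print(once == twice)                # rerunning changes nothing
print(remove_block(twice) == soul)  # uninstall restores the original text
```

Because only the text between the markers is ever touched, hand-written content elsewhere in the file survives both install and uninstall.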
Best-fit scenarios
caveman works best for:
- daily code changes;
- bug localization;
- commit messages;
- PR reviews;
- short command explanations;
- collaboration sessions where context is already clear;
- cases where agent output is too verbose and you only need the conclusion and fix.
It is less suitable for:
- tutorials;
- beginner-facing explanations;
- product or architecture long-form docs;
- complex decisions that need a fuller reasoning path;
- legal, medical, financial, or other high-risk advice needing careful qualifiers;
- unclear problems that need substantial clarification.
In short: caveman is good for “I know the background, tell me what to change.” It is not good for “I am learning this for the first time, start from the concept.”
Usage advice
If you want to try it:
- Install it in your personal agent environment first, not as a team default.
- Use lite or full for a few days and watch quality on bug fixes, reviews, and commit messages.
- Switch manually to normal mode for complex tasks.
- Try /caveman-compress <file> later, starting with low-risk note files.
- If compressing MCP tool descriptions, compare tool-call accuracy in a non-production environment first.
The ideal state is not that every answer becomes extremely short. It is that the agent can switch between “short, accurate, enough” and “explain with context.”
Summary
caveman is an interesting small tool. It does not try to make the model smarter; it controls expression density. For people who use Claude Code, Codex, Gemini, or Cursor heavily every day, fewer output tokens, less filler, and earlier conclusions can make the workflow lighter.
But it is not a universal default. Compressed answers fit execution-heavy development tasks; they do not always fit teaching, architecture discussion, or high-risk decisions. The right way to use it is as a switchable work mode: turn it on when you need speed and density, turn it off when you need explanation and context.