Claude Code on KnightLi Blog

Claude Opus 4.8 Released: Anthropic Keeps Strengthening Coding and Agent Tasks

Fri, 29 May 2026 15:22:47 +0800

Anthropic released Claude Opus 4.8 on May 28, 2026. This is a new version in the Opus series, and the official positioning is clear: it is not a generational renaming, but a continued improvement over Opus 4.7 in coding, agent tasks, reasoning, and expert knowledge work.

This update certainly matters for regular chat users, but Claude Code and long-running agent scenarios are the more interesting part. Anthropic describes Opus 4.8 as a more reliable collaborator: in complex tasks, it should be better at judging when to ask questions, when to move forward, and when to handle things conservatively.

Key points in this update

Claude Opus 4.8 is now available, with pricing unchanged. Anthropic also highlighted several accompanying changes:

Opus 4.8 continues to improve over the previous generation in coding, agent capabilities, reasoning, and knowledge work evaluations.
claude.ai users can control how much effort Claude spends on a task.
Claude Code adds dynamic workflows for handling larger-scale problems.
Opus 4.8’s fast mode can work at roughly 2.5x speed and is three times cheaper than the previous model’s fast mode.

Taken together, these changes show that Anthropic is not merely making a small model-score upgrade. It is reshaping the product around “running complex tasks for a long time.” A stronger model is only one part of that; task control, workflow decomposition, and cost structure matter just as much.

Why Claude Code users should pay closer attention

For a coding agent like Claude Code, the biggest risk is not failing to write a single function, but getting lost inside a real repository. It needs to read files, understand dependencies, run tests, inspect errors, revise its plan, and keep changes within a reasonable scope.

Opus 4.8’s selling points line up closely with these problems:

It is better suited to agentic tasks, meaning tasks where the model must keep planning, call tools, observe results, and adjust strategy.
It puts more emphasis on judgement, so it can stop and confirm when uncertain instead of confidently writing the wrong thing all the way through.
Dynamic workflows make Claude Code better suited to large, multi-step problems.

If these abilities prove stable in real projects, Claude Code will feel closer to “give it a clear goal and let it push forward” instead of only asking it to fill in a piece of code.

What effort control means

Anthropic added effort control to claude.ai this time, and the meaning is straightforward: users can adjust how much energy the model spends on a task.

That is very practical for everyday use. Simple questions do not need deep reasoning, while complex tasks are worth giving the model more time to think. In the past, many users could only express “be more careful” or “answer quickly” through prompts. Now this kind of control is starting to appear in the product layer.

For developers, this is also a signal: future agent products will not expose only “which model to choose.” They will also expose more execution strategies, such as speed, cost, reasoning depth, tool-call aggressiveness, and risk preference.

The cost change in fast mode matters

Anthropic says Opus 4.8’s fast mode can reach roughly 2.5x speed, while costing much less than the previous model’s fast mode.

This point is easy to miss under the model-capability headlines, but it matters a lot for real workflows. Many agent tasks do not run just once; they repeat:

Generate an initial draft
Run tests
Fix failures
Run tests again
Continue revising based on review

If fast mode is cheap enough, teams will be more willing to put it into high-frequency workflows instead of using the top model only occasionally for critical tasks. Once speed and cost come down, agents can more easily move from “demo effect” to “everyday tool.”

Its relationship to Opus 4.7

Opus 4.8 feels more like a usability-focused enhancement. It inherits Opus 4.7’s positioning, but pushes further into coding, agent tasks, and professional work.

Based on Anthropic’s wording, Opus 4.8 is not just better at answering. It is better at collaborating. During a task, it should be clearer about when it needs information, when a plan is shaky, and when it should build confidence before making large changes.

These capabilities are hard to judge from a single benchmark. The real test is how it performs in large repositories, complex business rules, long-context tasks, and multi-round fixes.

Impact on AI coding competition

In 2026, model competition has clearly shifted from “chat ability” to “can it get work done.” OpenAI, Anthropic, Google, and xAI are all binding models more tightly to toolchains: models handle reasoning, tools handle execution, and the product layer keeps tasks within a controllable range.

The release of Claude Opus 4.8 continues this trend. Its focus is not showing off one isolated capability, but strengthening three links:

The model itself is better suited to code and agent tasks.
Claude Code can break down larger workflows.
The product layer is starting to offer execution controls such as effort and fast mode.

For developers, the practical meaning is that choosing a model cannot be only about “which one is smartest.” You also need to ask whether it fits the tool you use, whether it can call tools reliably, whether the cost of long tasks is acceptable, and whether it is easy to correct when it fails.

My take

Claude Opus 4.8 is a pragmatic update. It does not build the story around an exaggerated new parameter, but keeps filling in what agent workflows need most: judgement, stability, speed, cost, and task control.

If you already use Claude Code, this update is worth trying soon. It is especially suitable for comparison on long tasks in real repositories, such as cross-module refactors, test fixes, documentation sync, and complex bug hunting.

If you are only a regular chat user, Opus 4.8 may not feel immediately stunning in the way a new model generation does. But as a product-direction signal, it shows Anthropic is still pushing Claude toward “reliably executing complex work.”

Original link: Introducing Claude Opus 4.8

CLIProxyAPI: Wrapping Codex, Claude Code, and Gemini CLI into a Unified API

Sun, 24 May 2026 10:03:33 +0800

CLIProxyAPI is a very practical, community-engineering kind of project. It is not another large model, and it is not merely an API forwarder. Instead, it repackages a set of AI tools that are originally interactive, CLI-oriented, or OAuth-login-oriented into a unified API service.

It supports Gemini CLI, OpenAI Codex, Claude Code, Amp CLI, AI Studio Build, and upstream OpenAI-compatible services. In plain terms, it tries to answer this question:

I have CLI tools, subscription accounts, and OAuth login sessions. Can I connect these capabilities to my own client, scripts, IDE, or internal services just like calling a normal API?

CLIProxyAPI’s answer is yes: put a proxy layer in the middle and translate CLI capabilities from different sources into OpenAI-, Gemini-, Claude-, and Codex-compatible interfaces.

The Real Pain Point It Solves

Many AI coding tools are powerful, but their default usage patterns are not automation-friendly.

For example:

Gemini CLI can log in with an account, but your program may prefer calling an HTTP API.
Claude Code is excellent for interactive coding, but integrating it into other clients can run into protocol mismatches.
Codex CLI supports OAuth login and Responses-style capabilities, but not every upper-layer tool knows how to talk to it.
A team may have multiple accounts and need rotation, load balancing, unhealthy account removal, and quota visibility.
You may want some tools to see only an OpenAI-style interface, while the backend is actually Gemini, Claude, or Codex.

CLIProxyAPI is positioned as the protocol adaptation layer between these tools and your clients.

It hides the complex side behind the scenes: OAuth, CLI login, multiple accounts, different protocols, and different providers. On the front side, it exposes familiar interfaces such as OpenAI Chat Completions, OpenAI Responses, Gemini, Claude Messages, and Codex-related endpoints.

Capability Overview

According to the official README and documentation, CLIProxyAPI currently focuses on:

Providing OpenAI-, Gemini-, Claude-, and Codex-compatible API endpoints for CLI models.
Connecting OpenAI Codex and Claude Code through OAuth login.
Supporting streaming and non-streaming responses, plus WebSocket in some scenarios.
Supporting function calling, tool calling, and multimodal input.
Supporting multi-account rotation and load balancing for Gemini, OpenAI, and Claude.
Supporting Gemini AI Studio API keys.
Supporting account pools for AI Studio Build, Gemini CLI, Claude Code, and OpenAI Codex.
Connecting OpenAI-compatible upstreams through configuration, such as OpenRouter.
Providing a Go SDK so the proxy capability can be embedded into your own services.

The most valuable part of this kind of project is not that it supports a few more model names. It is that it packages account login, protocol translation, and request routing into one operational layer.

Who It Is For

CLIProxyAPI is better suited to several groups of users.

The first group is heavy AI coding users. You already use Codex, Claude Code, and Gemini CLI, but you want to connect them to Cursor, Cline, RooCode, Amp, internal scripts, or custom workflows.

The second group is people with multiple account pools. For example, you may have several Gemini, OpenAI, or Claude login sessions and do not want to switch manually. You want automatic rotation, balanced usage, and quick troubleshooting when an account becomes abnormal.

The third group is people building internal team gateways. The team may not want every client to separately adapt to Gemini, Claude, and Codex. Instead, it wants one middle layer that exposes a unified API.

The fourth group is people who like working with protocols. You may care how Responses, Chat Completions, Claude Messages, and Gemini v1beta interfaces can be converted between one another, or you may want to switch backends from the same client.

If you only ask AI a few questions occasionally, or only use the official apps for chat, the deployment and maintenance cost of CLIProxyAPI may feel heavy.

How It Differs from a Regular API Proxy

A typical API proxy looks like this:

`1`	`Client -> Proxy API -> Upstream model API`

CLIProxyAPI is closer to this:

`1`	`Client -> CLIProxyAPI -> CLI / OAuth session / account pool -> model service`

The difference is that it handles more than API key forwarding. It also deals with CLI tools, OAuth accounts, protocol surfaces, and model aliases.

Tools such as Codex and Claude Code are not traditional “give me one API key and I can call it stably” services. CLIProxyAPI wraps their login sessions and calling logic so external clients can access them as if they were normal APIs.

That is what makes it attractive, and also what makes it complex.

Common Misunderstandings

First, do not assume that a unified /v1/... path eliminates all protocol differences.

The CLIProxyAPI documentation specifically notes that when you need the request and response shape of a certain backend type, you should prefer provider-specific paths. For example, use /api/provider/{provider}/v1/messages for messages-style requests, /api/provider/{provider}/v1beta/models/... for Gemini model paths, and /api/provider/{provider}/v1/chat/completions for chat-completions-style requests.

A unified entry point is convenient, but the semantics of different protocols do not disappear. Tool calling, streaming responses, multimodal input, and system message handling may all differ by backend.

Second, a model name does not uniquely identify a backend.

If multiple backends expose the same client-visible model name, the path alone may not lock the request to the backend that actually performs inference. To strictly pin a backend, use unique aliases, prefixes, or avoid exposing the same model name from multiple backends.

Third, multi-account rotation is not unlimited quota.

Rotation only spreads usage more evenly across the account pool. It cannot bypass the real limits of upstream services. Abnormal accounts, exhausted quota, risk controls, and expired OAuth sessions still need monitoring.

Fourth, it is not a maintenance-free magic box.

Once you put it into your daily workflow, you need to care about configuration, logs, upstream account status, version upgrades, client compatibility, and security boundaries.

Management and Monitoring

The official README notes that since v6.10.0, CLIProxyAPI and CPAMC no longer include built-in data statistics. If you need usage statistics, you can use separate projects:

CPA Usage Keeper: syncs CLIProxyAPI data into SQLite and provides aggregation APIs and a dashboard.
CLIProxyAPI Usage Dashboard: a local-first usage and quota dashboard that can show accounts, models, time windows, and remaining Codex quota.
CPA-Manager: a fuller management center for request monitoring, cost estimation, account pool inspection, abnormal account location, and cleanup suggestions.

This suggests that CLIProxyAPI’s core is closer to a proxy and protocol layer, not an all-in-one commercial admin backend. If a team uses it, logs, monitoring, and account pool management should be considered from the beginning.

A Reasonable Way to Try It

If you want to test it, a safer order is:

Start it with the official Quick Start documentation.
Connect only one provider first, such as Gemini CLI or Codex, and confirm basic requests work.
Then test higher-risk capabilities such as streaming responses, tool calling, and multimodal input.
Confirm which endpoint the client actually uses, and avoid mixing protocol paths.
Finally add multi-account rotation, management panels, and usage statistics.

Do not connect Gemini, Codex, Claude, OpenRouter, multiple accounts, and all clients at once from the start. When something breaks, it becomes hard to tell whether the issue is authentication, protocol conversion, model naming, or the upstream account.

Think Through the Security Boundary

CLIProxyAPI can touch account login sessions, API keys, OAuth-related credentials, and request contents. If it only runs on your own machine, the risk is relatively manageable. If it is exposed to the public internet or a team intranet, authentication, access control, log redaction, and network isolation become mandatory.

Management endpoints especially should be limited to localhost or a trusted internal network. Do not expose management interfaces directly just to save a few minutes.

Conclusion

CLIProxyAPI’s value is that it gathers AI capabilities scattered across multiple CLIs, accounts, and protocols into one programmable API layer.

It fits heavy AI coding users, multi-account users, and internal team gateway scenarios. It is less suitable for lightweight users who want something completely plug-and-play with no maintenance.

If you are already experimenting with Codex, Claude Code, and Gemini CLI, and want to connect them to your own client or automation workflow, CLIProxyAPI is worth a serious look. Treat it as infrastructure, not as a disposable small utility.

References:

What is CodeGraph? A local code map for Claude Code, Codex, and Cursor

Sat, 23 May 2026 21:09:46 +0800

CodeGraph is a local code knowledge graph designed for AI coding tools. It indexes a project ahead of time and organizes symbol relationships, call graphs, code structure, route relationships, and related information into a queryable graph. That lets Claude Code, Codex CLI, Cursor, OpenCode, Hermes Agent, and similar tools avoid relying on grep, glob, Read, and exploratory subagents every time they need to understand a project.

It solves a very practical problem: when an AI Agent works on a large codebase, much of the cost is not spent on changing code, but on finding where the relevant code lives. If every task starts with repeated searches, reads, and filtering, tokens, time, and tool calls are wasted. CodeGraph tries to turn the repository into a local map first, so the agent can ask the map before deciding which files to read.

What pain points does it address?

AI coding tools usually work well in small projects. There are few files, search is fast, and reading files is cheap. In larger projects, common problems appear:

The agent repeatedly calls grep, find, ls, and Read just to understand one module.
Exploratory subagents read many irrelevant files, while the main task context remains unclear.
Architecture questions spend too many tokens locating files.
Before changing a function, it is unclear who calls it and what it calls.
In web projects, URL routes and handler functions are not always obvious.

CodeGraph tries to move this “find the way first” work earlier. Once the project index exists, the agent can query related symbols, callers, callees, impact scope, and code snippets directly.

Installation

The project provides cross-platform installation scripts and does not require users to prepare Node.js manually:

`1`	`curl -fsSL https://raw.githubusercontent.com/colbymchenry/codegraph/main/install.sh \| sh`

On Windows PowerShell:

`1`	`irm https://raw.githubusercontent.com/colbymchenry/codegraph/main/install.ps1 \| iex`

If you already have a Node environment, you can use npm directly:

`1`	`npx @colbymchenry/codegraph`

Or install it globally:

`1`	`npm i -g @colbymchenry/codegraph`

The installer detects and configures installed agents such as Claude Code, Cursor, Codex CLI, opencode, and Hermes Agent. It writes the relevant MCP server configuration and instruction files so those tools know when to call CodeGraph.

Initializing a project

After installation, build an index inside the target project:

1
2

cd your-project
codegraph init -i

This command creates a project-level knowledge graph index. The README notes that as long as a .codegraph/ directory exists in the project, agents can automatically use CodeGraph tools.

To stop using it, you can remove the global configuration:

`1`	`codegraph uninstall`

That removes the MCP server configuration, instructions, and permissions written by the installer. The .codegraph/ index in the project is not deleted automatically. To remove the project index, use codegraph uninit.

Why it helps agents

Tools like Claude Code, Codex CLI, and Cursor often explore before making changes: find files, read entry points, inspect references, and follow call chains. For humans this feels like browsing a project. For models, it becomes a series of tool calls and context cost.

CodeGraph turns that into index queries. An agent can first use codegraph_context to find relevant entry points, symbols, and snippets, then use codegraph_explore or other tools to read the necessary details. The benefits are:

Fewer irrelevant files read.
Fewer search tool calls.
Faster discovery of relevant code.
Clearer impact scope before edits.
Easier answers to architecture questions in large repositories.

The README reports benchmark results across seven real open source repositories comparing runs with and without CodeGraph. On average, enabling CodeGraph reduced cost, tokens, latency, and tool calls. The exact numbers depend on project size, language, question type, and agent behavior, but the direction is clear: the larger the repository, the more valuable pre-indexing becomes.

Core capabilities

1. Smart context construction

One tool call can return entry points, related symbols, and code snippets, reducing the need for the agent to launch many exploratory tasks before filtering the results. This is useful for architecture understanding, module location, and feature entry-point analysis.

2. Full-text search

CodeGraph uses FTS5 for full-text search, letting it quickly search names and text across the codebase. It does not replace every grep use case, but it gives the agent a more structured first stop.

3. Impact analysis

Before changing a function, class, method, or route, the agent can query callers, callees, and impact radius. This is especially useful for refactoring, bug fixing, and deleting old code, where missing upstream or downstream calls is the main risk.

4. Automatic freshness

The README says CodeGraph uses native filesystem events such as FSEvents, inotify, and ReadDirectoryChangesW, along with debounced auto-sync. In practice, the index updates as local code changes, so users do not need to rebuild it manually after every edit.

5. Multi-language support

The project lists support for more than 19 languages, including TypeScript, JavaScript, Python, Go, Rust, Java, C#, PHP, Ruby, C, C++, Swift, Kotlin, Dart, Lua, Luau, Svelte, Liquid, and Pascal / Delphi.

That makes it suitable for multi-language repositories and full-stack projects, not just one language.

6. Web route awareness

CodeGraph also detects route files and route declarations in many web frameworks, connecting URL patterns with handler functions. The README mentions Django, Flask, FastAPI, Express, NestJS, Laravel, Rails, Spring, Gin, Axum, ASP.NET, Vapor, React Router, SvelteKit, and others.

This is practical because the real entry point of many web projects is not an obvious main function, but routes, controllers, handlers, views, or resolvers. If an agent can first understand the URL-to-handler relationship, it can understand business flow much faster.

Local-first design

CodeGraph emphasizes being 100% local. It does not require an API key or external service. Index data is stored in a local SQLite database.

For enterprise projects, private repositories, or sensitive code, this matters. The concern with AI coding tools is often not only “can they find the code?”, but “will the code structure and index be sent elsewhere?” CodeGraph is positioned as local indexing, local querying, and local service for agents.

Of course, local indexing also means considering disk usage, indexing time, file watching, and project size. Very large repositories still require resources for initial indexing and later synchronization.

Suitable scenarios

CodeGraph is a good fit for:

Large codebases where architecture and call-chain questions are common.
Teams using Claude Code, Codex CLI, Cursor, or similar agents for code understanding and edits.
Reducing random file reads, broad searches, and repeated exploration by agents.
Analyzing impact before code changes.
Web projects with complex routing, where URL-to-handler lookup matters.
Teams that want a more stable local project index for AI agents.

For a small project with a few dozen files, normal search may be enough and CodeGraph’s advantage may not be obvious. It is most valuable in medium-to-large repositories and workflows where agents do a lot of exploration.

Things to watch out for

First, CodeGraph is not a substitute for code review or testing. It helps agents find relevant code faster, but it does not guarantee that their changes are correct.

Second, index quality affects results. If a project has complex structure, lots of generated code, mixed languages, or unignored build artifacts, the index may become noisy. Before using it seriously, check .gitignore, project layout, and indexing scope.

Third, MCP configuration and agent instructions matter. The README also warns that CodeGraph helps only when it is queried properly. If an agent ignores it and still reads many files directly, pre-indexing becomes extra overhead.

Fourth, even though it is local, permissions still matter. The installer writes agent configuration and permission lists. In team environments, review those configurations centrally.

Summary

CodeGraph can be understood simply: it gives AI agents a local map of the codebase. It does not make the model smarter; it helps the model get less lost.

When tools like Claude Code, Codex CLI, and Cursor face large repositories, the expensive part is often exploration. CodeGraph uses pre-indexed symbol relationships, call graphs, route graphs, and full-text search to handle “where is the code?” earlier, leaving more budget for understanding and editing.

If you already use AI coding tools in real projects and often see the agent read many files without finding the point, CodeGraph is worth trying. It represents an important direction for AI coding tools: not only stronger models, but better local code context for those models.

References:

GitHub project: https://github.com/colbymchenry/codegraph

Claude Code has a plugin marketplace now: what you can install, how to install it, and what to watch out for

Sat, 23 May 2026 19:03:30 +0800

anthropics/claude-plugins-official is the official Claude Code plugin directory managed by Anthropic. It is not just a normal code repository. It is a marketplace that Claude Code’s plugin system can use directly, collecting Claude Code plugins maintained or curated by Anthropic.

This repository matters because Claude Code is moving from “an AI coding command-line tool” toward “an extensible development environment.” Plugins can package Skills, Agents, Hooks, MCP servers, LSP servers, background monitors, and default settings so teams and communities can distribute them in a consistent way.

What is this repository?

The README describes it directly: it is a curated directory of high-quality Claude Code plugins.

The directory is mainly split into two parts:

/plugins: plugins developed and maintained internally by Anthropic.
/external_plugins: third-party plugins from partners and the community.

In other words, it contains both official capabilities and curated external ecosystem entries. For regular users, the direct value is that plugins can be discovered and installed through Claude Code’s /plugin system. For developers, it is a useful window into Claude Code’s plugin format and ecosystem direction.

How to install plugins

The README gives a simple installation command. You can install directly through Claude Code’s plugin system:

`1`	`/plugin install {plugin-name}@claude-plugins-official`

You can also open the plugin discovery entry inside Claude Code:

`1`	`/plugin > Discover`

The key part is @claude-plugins-official, which refers to the official plugin marketplace. According to the Claude Code documentation, claude-plugins-official is the official marketplace maintained by Anthropic and is available by default in Claude Code installations.

What does a plugin look like?

The repository README shows a standard plugin structure:

plugin-name/
├── .claude-plugin/
│   └── plugin.json
├── .mcp.json
├── commands/
├── agents/
├── skills/
└── README.md

.claude-plugin/plugin.json is the metadata file, usually declaring the plugin name, description, version, author, and related fields. Other directories are optional and depend on what the plugin provides:

skills/: instructions for skills Claude can invoke automatically.
commands/: slash commands.
agents/: custom agent definitions.
hooks/: event-triggered logic.
.mcp.json: MCP server configuration.
.lsp.json: language server configuration.
monitors/: background monitor configuration.
settings.json: default settings shipped with the plugin.

This means a Claude Code plugin is not one single kind of extension. It is a packaging format. A plugin can be a tiny command, or it can be an entire workflow for a specific stack.

What directions are already in the official directory?

The /plugins directory already covers many development scenarios, including:

LSP plugins: typescript-lsp, pyright-lsp, rust-analyzer-lsp, gopls-lsp, clangd-lsp, csharp-lsp, jdtls-lsp, kotlin-lsp, lua-lsp, php-lsp, ruby-lsp, swift-lsp.
Programming workflows: code-review, feature-dev, code-modernization, code-simplifier, commit-commands, pr-review-toolkit.
Claude Code configuration and plugin development: claude-code-setup, claude-md-management, plugin-dev, skill-creator, mcp-server-dev.
Output styles and specialized capabilities: explanatory-output-style, learning-output-style, security-guidance, session-report, math-olympiad.

The /external_plugins directory points toward more third-party tools and services, such as github, gitlab, linear, asana, firebase, playwright, terraform, context7, serena, telegram, and discord.

Together, these plugins suggest a trend: Claude Code does not only want to edit files. It also wants to connect with code intelligence, project management, cloud services, testing, infrastructure, and team collaboration tools.

Why the plugin system matters

Previously, many Claude Code customizations could live inside a project’s .claude/ directory, such as commands, agents, skills, or hooks. That works for personal workflows or one project, but it is not ideal for reuse across projects or consistent team distribution.

Plugins solve the reuse and distribution problem:

The same configuration can be installed across multiple projects.
Commands and skills are namespaced, reducing conflicts.
Plugins can be published and updated through a marketplace.
Teams can package internal best practices as standard plugins.
The community can maintain extensions for specific frameworks, languages, or services.

This resembles VS Code extensions, JetBrains plugins, or browser extensions. Once a tool has a stable plugin ecosystem, it is no longer just a single product; it starts becoming a platform.

What does it mean for developers?

If you are only a Claude Code user, the most practical use of this repository is finding plugins. For example, if you need LSP support for TypeScript, Python, Rust, or Go, you can first check whether the official directory already has the corresponding plugin. If you need PR review, commit helpers, or code modernization workflows, the official plugins are also a good starting point.

If you develop plugins, this repository is more like a reference library. You can study its directory layout, plugin.json style, README structure, and how Anthropic combines skills, agents, MCP, LSP, and hooks.

The Claude Code documentation also gives a clear guideline: use .claude/ for single-project customization, but turn it into a plugin when you want to share it with a team, reuse it across projects, version releases, or distribute it through a marketplace.

Security boundaries matter

The repository README opens with an important warning: make sure you trust a plugin before installing, updating, or using it. The reason is simple. A plugin may include MCP servers, files, scripts, or other software. Anthropic maintaining the directory does not mean every plugin will behave exactly as expected in your local environment.

In practice, it is worth doing at least a few checks:

Read the plugin homepage and README before installing.
Check whether it includes .mcp.json, hooks, executable scripts, or background monitors.
Be extra careful with plugins that access accounts, code repositories, chat tools, or cloud services.
Test plugins in a sandbox or test repository before enabling them in important projects.
In team environments, review plugin sources and versions centrally.

AI coding plugins often have much higher privileges than ordinary editor themes. They may read project files, call external services, start local commands, or affect commit and deployment flows. Treat the trust boundary more strictly than “installing a small tool.”

Relationship with the community marketplace

The Claude Code documentation says Anthropic maintains two public plugin marketplaces:

claude-plugins-official: a curated set of plugins maintained by Anthropic.
claude-community: a community plugin directory where third-party submissions go through review.

They have different roles. Community plugins can enter the review pipeline through submission forms. The official directory is curated separately by Anthropic, with no public application process. In short, claude-plugins-official is closer to an official curated directory, while claude-community is the open community directory.

Summary

The significance of anthropics/claude-plugins-official is not merely that another GitHub repository exists. It shows Claude Code’s extension mechanism becoming platform-like: Skills, Agents, Hooks, MCP, LSP, background monitors, and default settings can now be packaged, installed, updated, and distributed.

For individual developers, the official plugin directory can lower the cost of configuring Claude Code. For teams, it offers a way to standardize internal workflows. For plugin developers, it shows the plugin structure and ecosystem direction Anthropic is endorsing.

The next thing to watch is not just any single plugin, but whether the Claude Code plugin ecosystem forms stable layers: official curated plugins, community plugins, private team marketplaces, and specialized extensions for mainstream languages, frameworks, and SaaS services. If that path works, Claude Code will look more and more like a programmable AI development platform, not just a command-line assistant.

References:

GitHub project: https://github.com/anthropics/claude-plugins-official
Claude Code plugin documentation: https://code.claude.com/docs/en/plugins

Graphify Solves Claude Code's Biggest Limitation: Turning a Codebase into an AI-Queryable Knowledge Graph

Thu, 21 May 2026 08:02:32 +0800

safishamsi/graphify is a knowledge graph tool for AI coding assistants. Its goal is direct: take the code, docs, SQL schemas, scripts, papers, images, video, and audio inside a project folder, turn them into a queryable knowledge graph, and stop AI assistants from relying only on grep, full-file reading, or ad hoc search to understand a project.

Project link: safishamsi/graphify

At the time of writing, the GitHub page shows about 50.2k stars and 5.4k forks, with an MIT license. The README describes it like this: type /graphify inside your AI coding assistant, and it maps the entire project into a queryable knowledge graph.

The Core Problem It Solves

AI coding assistants are becoming stronger, but in real codebases they still frequently run into several problems:

They do not know how key modules connect.
They read many files but do not form an overall architecture map.
Search finds text, but not upstream and downstream dependencies.
Code, database schemas, docs, and infrastructure configuration are scattered across different places.
In team collaboration, each person may have a different mental model of the project structure.

Graphify tries to add a “memory layer” to the project. It connects code entities, documentation concepts, database tables, configuration, design notes, and cross-file relationships so the AI assistant can query the graph instead of scanning files from scratch every time.

Minimal Usage

Graphify’s minimal workflow is simple. After installation, type this inside your AI coding assistant:

`1`	`/graphify .`

In PowerShell, the leading / is treated as a path separator, so on Windows PowerShell use:

`1`	`graphify .`

After running, it generates a graphify-out/ directory with three core files:

graphify-out/
├── graph.html
├── GRAPH_REPORT.md
└── graph.json

These files serve different purposes:

graph.html: an interactive graph you can open in a browser, with clickable nodes, filters, and search.
GRAPH_REPORT.md: highlights, key concepts, surprising connections, and suggested questions.
graph.json: the full graph, which can be queried later without rereading all files.

To generate a more readable architecture page with Mermaid call-flow diagrams, run:

`1`	`graphify export callflow-html`

Installation and Platform Support

Graphify’s PyPI package name is graphifyy, with a double y. The README specifically warns that other graphify* packages on PyPI are not affiliated with the project, although the CLI command is still graphify.

The recommended installation method is:

`1`	`uv tool install graphifyy`

Alternatives:

1
2

pipx install graphifyy
pip install graphifyy

Then register it with your AI assistant:

`1`	`graphify install`

The project supports many platforms, including Claude Code, Codex, OpenCode, GitHub Copilot CLI, VS Code Copilot Chat, Aider, Cursor, Gemini CLI, Kimi Code, Kiro, and Google Antigravity. Different platforms can use different install commands, for example:

graphify install --platform codex
graphify install --platform gemini
graphify cursor install
graphify antigravity install

Codex users also need to add this under [features] in ~/.codex/config.toml:

`1`	`multi_agent = true`

The README also notes that Codex uses $graphify, not /graphify.

What Files It Handles

Graphify supports a wide range of input types.

For code, it supports 31 languages, including Python, TypeScript, JavaScript, Go, Rust, Java, C/C++, Ruby, C#, Kotlin, Scala, PHP, Swift, Lua, Zig, PowerShell, SQL, Shell, JSON, and more.

For documents, it supports:

.md
.mdx
.qmd
.html
.txt
.rst
.yaml
.yml

Optional dependencies extend it further:

pip install "graphifyy[pdf]"
pip install "graphifyy[office]"
pip install "graphifyy[video]"
pip install "graphifyy[mcp]"
pip install "graphifyy[neo4j]"
pip install "graphifyy[sql]"
pip install "graphifyy[all]"

Here, pdf is for PDF extraction, office for .docx and .xlsx, video for video and audio transcription, mcp for an MCP stdio server, neo4j for pushing to Neo4j, and sql for SQL schema extraction.

Why the Report Matters

GRAPH_REPORT.md is not a normal summary. It extracts relationships inside the project that are especially useful for AI assistants.

The README mentions report contents such as:

God nodes: the most-connected core concepts in the project.
Surprising connections: unexpected links across files or modules.
The why: design rationale extracted from comments, docstrings, and design docs.
Suggested questions: questions the graph is particularly well suited to answer.
Confidence tags: relationships are labeled EXTRACTED, INFERRED, or AMBIGUOUS.

This is important. Normal search only tells you “where this word appears.” A graph can answer “which modules, configs, tables, and docs this concept is connected to.” For large codebases, that is closer to architecture understanding than simple full-text search.

Common Commands

Common Graphify commands include:

/graphify .
/graphify ./docs --update
/graphify . --cluster-only
/graphify . --no-viz
/graphify . --wiki
graphify export callflow-html
/graphify query "what connects auth to the database?"
/graphify path "UserService" "DatabasePool"
/graphify explain "RateLimiter"

You can also add a paper or video to the graph:

1
2

/graphify add https://arxiv.org/abs/1706.03762
/graphify add <youtube-url>

For PR-assisted analysis:

graphify prs
graphify prs 42
graphify prs --triage
graphify prs --conflicts

These commands fit code review scenarios: identify which graph communities a PR affects, whether it risks conflicts with other PRs, and which review queues deserve priority.

MCP, Neo4j, and CI

Graphify is not only an HTML graph generator. It can also expose the graph to AI assistants for repeated tool use.

For example, start an MCP server:

`1`	`python -m graphify.serve graphify-out/graph.json`

The MCP server provides capabilities such as query_graph, get_node, get_neighbors, shortest_path, list_prs, get_pr_impact, and triage_prs.

It also supports Neo4j export or push:

1
2

/graphify ./raw --neo4j
/graphify ./raw --neo4j-push bolt://localhost:7687

For team collaboration, the README suggests committing graphify-out/ so everyone on the team starts with the same project map. You can also run:

`1`	`graphify hook install`

This rebuilds the graph after each git commit and sets up a merge driver so graph.json does not get left with conflict markers when multiple people commit in parallel.

Privacy and Cost

Graphify’s README is fairly clear about privacy boundaries.

Code files are parsed locally through tree-sitter and do not trigger API calls. Video and audio can be transcribed locally with faster-whisper. Docs, PDFs, and images used for semantic extraction go through your AI assistant’s model API.

For headless graphify extract, you may need these environment variables:

ANTHROPIC_API_KEY
GEMINI_API_KEY
GOOGLE_API_KEY
OPENAI_API_KEY
DEEPSEEK_API_KEY
MOONSHOT_API_KEY
OLLAMA_BASE_URL

Local Ollama, AWS Bedrock, and Claude Code CLI can also be used as backends. The README also states that the project has no telemetry, usage tracking, or analytics.

In practice, remember that local code parsing does not mean everything stays offline. When docs, PDFs, images, or cloud models are involved, you still need to consider the backend, API keys, enterprise compliance, and data boundaries.

Suitable Scenarios

Graphify is suitable for several types of users:

Developers who want Claude Code, Codex, Cursor, and Gemini CLI to better understand project structure.
People who need to quickly understand a large unfamiliar codebase.
Teams that need to analyze code, SQL schemas, docs, and configuration together.
People doing architecture review, PR review, or refactor impact analysis.
Teams that want to expose project knowledge as an MCP tool for Agents.
Technical leads who want to keep a “project map” for the team.

It is not necessary for every project. For small scripts, one-off demos, or very simple repositories, normal search and README files may be enough. Graphify’s value shows up more clearly in projects with many modules, many docs, team collaboration, and frequent AI assistant involvement.

Summary

Graphify matters because it moves AI coding assistants from “temporarily reading files” toward “a long-lived, queryable project knowledge graph.”

For developers, it does not replace the IDE, search, or LSP. It adds a structured memory layer for AI assistants: which modules matter, which concepts are tightly connected, which docs explain design rationale, and which communities a PR may affect. As Codex, Claude Code, Gemini CLI, Antigravity, and similar Agent tools become more common, this kind of project graph layer will become increasingly useful.

References:

GitHub: safishamsi/graphify

Open Design Explained: Turning Claude Code and Codex into AI Design Tools

Mon, 18 May 2026 18:57:16 +0800

Open Design is an open-source AI design project from nexu-io. Its positioning is local-first and open-source, as an alternative direction to Claude Design and Figma.

The problem it targets is clear: Claude Design showed that large models can generate design artifacts directly, but if this capability only exists inside a closed, cloud-only, single-model product, users cannot easily self-host, connect their own agents, swap models, build private design systems, or put outputs into a local workflow.

Open Design does not try to build a new foundation model. Instead, it connects the coding-agent CLIs already installed on your computer into a design workspace. Claude Code, Codex, Cursor Agent, Gemini CLI, OpenCode, Qwen, Copilot CLI, Kimi, DeepSeek TUI, and similar tools can become its design engine.

What is Open Design?

Open Design can be understood as a combination of three parts:

A Web UI for chat, preview, project management, and export.
A local daemon that schedules agents, manages files, stores projects, and provides APIs.
A set of Skills, Design Systems, and templates that guide agents toward design-quality artifacts instead of generic AI pages.

After the user enters a request, Open Design does more than pass a sentence to a model. It first asks the user to refine the design brief, choose the scenario and direction, then injects project metadata, the active design system, Skill files, templates, and checklists into the agent’s context. The agent reads and writes files inside a real project folder and produces an artifact that can be previewed in a sandboxed iframe.

This makes it closer to an AI design workflow than a one-shot webpage generator.

Why it differs from ordinary AI webpage generation

Many AI tools can generate an HTML page. Open Design’s focus is not “make the model write a page,” but “make the model follow a design workflow to deliver artifacts that can be previewed, exported, and iterated.”

It emphasizes several design choices:

Ask questions before generating. A new design brief starts with an interactive question form to lock down audience, tone, brand context, constraints, and visual direction.
Skills are files, not black-box plugins. Each Skill is made of SKILL.md, assets/, and references/, so it can be read, replaced, and extended.
Design Systems are Markdown, not fixed theme JSON. Color, typography, spacing, components, motion, brand voice, and anti-patterns can be written into DESIGN.md.
The agent works in a real project directory. It can read templates, write files, generate images, and output .pptx, .pdf, .zip, and other files.
Artifacts are previewed in a sandboxed iframe, reducing the risk of directly executing uncontrolled code.

The goal is to make AI behave more like a design collaborator with rules, assets, and checklists.

Which agents does it support?

One highlight of Open Design is that it treats agents as the runtime instead of locking into one model vendor.

The README lists support for Claude Code, Codex CLI, Devin for Terminal, Cursor Agent, Gemini CLI, OpenCode, Qwen Code, Qoder CLI, GitHub Copilot CLI, Hermes, Kimi, Pi, Kiro, Kilo, Mistral Vibe, DeepSeek TUI, and more. It detects these CLIs from PATH and lets users switch between them.

If no local CLI is available, it can also use an OpenAI-compatible BYOK proxy. Users provide baseUrl, apiKey, and model name, and the daemon normalizes streaming output into the same chat stream.

This design brings several benefits:

It does not lock users into one model.
It can reuse agents users have already installed and configured.
Local file reads and writes are managed by the daemon, making the permission boundary clearer.
For enterprises and advanced users, it is easier to connect custom models and API providers.

Skills and Design Systems are the core assets

Open Design bundles many Skills and Design Systems. The README says the built-in Skills cover web prototypes, SaaS landing pages, dashboards, mobile apps, gamified apps, social carousels, magazine posters, decks, weekly updates, finance reports, HR onboarding, invoices, kanban boards, OKRs, and more.

Design Systems provide brand-grade visual constraints for the agent. The repository description mentions sources such as Linear, Stripe, Vercel, Airbnb, Tesla, Notion, Apple, Anthropic, Cursor, Supabase, Figma, Xiaohongshu, and others.

The relationship is simple:

Skill decides what kind of artifact should be delivered.
Design System decides what brand style the artifact should follow.

Without these two layers, AI can easily produce generic pages that look familiar but lack judgment. With Skills and Design Systems, the model has clearer task boundaries, visual references, and review rules.

What can it generate?

Open Design is not limited to web prototypes.

According to the README, it covers web, desktop, mobile prototypes, slides, images, videos, HyperFrames, and more. It also supports export formats such as HTML, PDF, PPTX, ZIP, and Markdown. Media generation is included in the same design loop, covering posters, avatars, infographics, illustrated maps, short videos, and HTML-to-MP4 motion graphics.

This gives it a wide range of use cases:

Startup teams can create pitch decks quickly.
Product teams can generate landing pages or functional prototypes.
Operations teams can create campaign pages, social media images, and weekly reports.
Designers can use it for moodboards, visual directions, and first-pass layouts.
Developers can turn requirements into runnable frontend artifacts.

Its value is not just generating one page, but bringing multiple content forms into the same agent workflow.

What local-first means

Open Design emphasizes local-first. It does not hand everything to a remote SaaS backend. Instead, it runs a local daemon and project workspace.

The architecture described in the README roughly looks like this:

The frontend uses Next.js, React, and TypeScript.
The local daemon uses Node, Express, SQLite, and SSE.
Projects, sessions, messages, tabs, and templates are stored in local SQLite and .od/projects/<id>/.
Agents are started through child_process.spawn and read/write inside project artifact folders.
Preview is rendered through a sandboxed iframe.
Export includes HTML, PDF, PPTX, ZIP, and Markdown.

This structure is better for users who want design outputs to stay on their machine, local agents to be connected, API keys to remain under their control, and private workspaces to be maintained.

However, local-first does not mean fully offline. Actual generation still depends on the agent and model you use. If you use a cloud model API, content still goes to that provider. A more accurate description is that Open Design brings workspace, scheduling, files, and preview back to local control, while leaving the model layer to the user.

Relationship with Claude Design and Figma

Open Design explicitly describes itself as an open-source alternative direction to Claude Design and Figma, but it is not a traditional Figma clone.

Figma is a professional tool for manual design editing, collaboration, and delivery. Open Design is more agent-native: users drive agents through natural language, forms, Skills, and design systems to produce runnable artifacts.

It combines several ideas:

Claude Design’s artifact-first experience.
Figma’s design-system awareness.
File reading, writing, and execution from agents such as Claude Code and Codex.
Local daemon project management and sandboxed preview.

So it may not replace the full professional design workflow, but it is well suited as a fast path from idea to previewable prototype.

Who is it for?

Open Design is better suited for:

Developers already using Claude Code, Codex, Cursor, Gemini CLI, and similar agents.
Users who want AI design outputs managed inside local project folders.
Startup teams that need web prototypes, decks, posters, and marketing assets quickly.
Advanced users who want to customize Skills, Design Systems, and prompt stacks.
Teams that do not want to be locked into a single model or cloud product.

It is less suitable for:

Lightweight users who only want to open a webpage, type one sentence, and download an image.
Users who do not want to touch Node, pnpm, daemons, CLIs, or local configuration.
Professional Figma workflows that need mature collaboration, design review, and vector editing.

In short, Open Design is more of a tool for agent users and technical design teams than a lightweight design SaaS for everyone.

Things to watch

The README marks the project as 0.8.0-preview and notes that it is still evolving quickly. That energy is valuable, but it also means APIs, data directories, desktop migration, Skills structure, and export flows may change.

Before using it seriously:

Do not treat it as a fully stable enterprise design platform.
Try the workflow with test projects before importing important materials.
Back up .od/ before migration, and make sure the daemon and desktop app are stopped.
When using BYOK, pay attention to API keys, proxy addresses, and local private-network access risks.
Review generated designs manually, especially for brand, copyright, copy, and visual consistency.

The benefit of an open-source project is that it can be inspected, modified, and contributed to. The cost is that you need to accept some engineering friction.

Summary

The interesting part of Open Design is not just that it is an open-source Claude Design alternative. What matters is how it organizes Agent CLIs, Skills, Design Systems, a local daemon, and sandboxed preview into one design workflow.

It pushes design generation from a single prompt toward a more structured process: ask questions, choose direction, load a design system, read the Skill, write real files, preview the artifact, then export the result.

If you already use Claude Code, Codex, or Cursor for coding work, Open Design is worth watching. It represents a new product shape: AI is not only drawing one picture for you, but working inside a local project space, following design systems and task skills to generate design artifacts that can keep evolving.

References

nexu-io/open-design GitHub repository

Claude Code Token-Saving Guide: How Models, MCP, CLAUDE.md, and Skills Affect Cache

Mon, 18 May 2026 18:30:24 +0800

In long Claude Code tasks, Prompt Cache hit rate directly affects cost and speed. Many users know that caching can save tokens, but not which actions make the cache suddenly miss.

The simplest mental model is a left-to-right context chain:

`1`	`tools -> system -> CLAUDE.md / skills -> messages`

The farther left something sits, the more stable it should be and the larger the cache benefit. If a left-side section changes, everything after it may need to be recalculated. If a right-side section changes, the impact is smaller.

So optimizing Prompt Cache in Claude Code is not guesswork. The rule is simple: before a task begins, prepare the model, MCP servers, Skills, CLAUDE.md, and other base context. Once the task starts, change as little of that fixed context as possible.

Prompt Cache does not cache plain text

Prompt Cache is not just a string cache for prompts. In Transformer inference, what matters is the Key/Value state calculated by attention layers from the prefix context, usually called KV cache.

That means two things:

If the prefix stays stable, part of the previous computation can be reused.
If the model, tool definitions, system prompt, or prefix messages change, old cache entries may no longer match.

Anthropic’s documentation summarizes the invalidation hierarchy as tools -> system -> messages. Changes to tool definitions can invalidate the whole cache; system changes affect system and messages; message changes mainly affect message cache.

Claude Code adds more context sources such as CLAUDE.md, Skills, MCP, plugins, and subagents, so it is easier to accidentally break cache reuse.

Cache killer 1: switching models mid-task

Switching models is one of the most expensive changes.

Prompt Cache is isolated by model. Opus, Sonnet, and Haiku have different architectures and weights, so the KV cache calculated from the same text is not interchangeable. If you build a long context in Opus and then switch to Sonnet, Sonnet cannot reuse Opus’s cache.

This creates a counterintuitive result: switching models mid-task to save money may make the previous cache useless. Context that could have been read at cache-read price may need to be written and computed again.

A steadier pattern is:

Keep the main conversation on one model.
Use a subagent for side tasks that can run on a cheaper model.
Let the side agent search, explore, or summarize, then hand a concise result back to the main conversation.

This keeps the long main-context prefix stable and improves cache hit consistency.

Cache killer 2: adding MCP or reloading plugins mid-task

MCP provides tools to Claude Code. When you add an MCP server, the tool list changes, and tool definitions sit at the far left of the context chain.

From a Prompt Cache perspective, when the tool list changes, the system and messages that follow may need to be recalculated. If you use many MCP servers, the tool definitions themselves can be large, so the cost of invalidation becomes obvious.

One detail matters: Claude Code usually reads MCP configuration at session startup. Changing config mid-session may not affect the current session immediately. The dangerous moments are restart, resume, plugin reload, or anything that rebuilds the tool list.

Recommended practice:

Install required MCP servers before starting a long task.
Avoid discovering missing tools halfway through and then reloading.
Reduce default-enabled MCP servers when possible.
Do not keep rarely used MCP servers always enabled.

Stable tool definitions are the foundation of stable Prompt Cache hits.

Cache killer 3: editing CLAUDE.md mid-session

CLAUDE.md is Claude Code’s project memory file. It is useful for build commands, test commands, architecture conventions, code style, and project-specific constraints.

It is helpful, but it also enters the context. Claude’s help documentation explains that CLAUDE.md is read at session start and delivered as a user message. It also benefits from Anthropic Prompt Cache: the first request pays full input price, while later requests can hit the lower cache-read price if the cache is still valid.

The catch is that CLAUDE.md is content-addressed. Once the file changes, the old cache no longer matches.

So avoid frequently editing CLAUDE.md during a long task. Better practices:

Check whether CLAUDE.md is sufficient before the task starts.
Put stable rules in the file and temporary instructions in the current conversation.
Do not edit long-term memory for one-off instructions.
If you must change it, treat the next stage as a new session or new phase.

CLAUDE.md should be stable project guidance, not a scratchpad that changes every round.

Cache killer 4: installing or updating Skills mid-task

Skills are also part of the context. Installing a new Skill, updating a Skill, or changing the Skill list changes what gets injected into the session.

These changes often do not fully take effect until reload, resume, or a new session. Once messages are rebuilt, old cache entries may no longer match.

The same advice applies:

Decide which Skills are needed before starting.
Keep the Skill set stable for the same kind of task.
Avoid installing Skills in the middle of a long task.
If you install a new Skill, treat it as the beginning of a new stage.

For repeatable workflows such as content production, review, deployment, and translation, keeping a fixed Skill set helps keep the context structure stable.

Cache killer 5: idle time exceeding TTL

Prompt Cache does not last forever. A common default TTL is on the order of minutes, and Claude Code-related documentation often refers to roughly a five-minute cache window. After TTL expires, even the same request may need to rebuild the cache.

This explains a common feeling in long tasks: everything was cheap and fast, then after a coffee break the token cost jumps again.

Long tasks hit this easily. You may review Claude Code output, inspect files, run tests, or think about the next step. Five minutes can disappear quickly.

If your environment supports it, you can request a one-hour Prompt Cache TTL before long tasks:

`1`	`export ENABLE_PROMPT_CACHING_1H=1`

In Windows PowerShell:

`1`	`$env:ENABLE_PROMPT_CACHING_1H="1"`

One-hour cache writes usually cost more than five-minute cache writes. It is not always worth it for short tasks, but for large codebases, long conversations, and complex multi-step development, it may be cheaper than repeated cache expiration.

A token-saving Claude Code workflow

A steadier long-task setup looks like this:

Choose the model before the task starts and avoid frequent switching.
Enable the MCP servers you need and disable the ones you do not.
Keep CLAUDE.md short, stable, and focused on durable rules.
Prepare the Skills needed for this task in advance.
For complex tasks, consider one-hour TTL.
Split the task into phases, but keep context structure stable within each phase.
Use subagents or separate sessions for side exploration instead of disturbing the main conversation.

The goal is not to prevent every cache miss. It is to avoid the high-cost misses that are easy to overlook.

A simple rule of thumb

Ask one question:

Does this operation change the model, tool definitions, system context, or fixed messages near the start of the session?

If yes, it probably affects Prompt Cache. The farther left it is in the context chain, the greater the impact.

Common operations:

Switch model: high risk, model caches are isolated.
Add MCP or reload plugins: high risk, tool list changes.
Edit CLAUDE.md: medium-high risk, project memory changes.
Install Skills: medium-high risk, injected context changes.
Continue normal conversation: low risk, mostly appends messages.
Idle past TTL: high risk, server-side cache expires.

Summary

Prompt Cache optimization in Claude Code is about keeping the session prefix stable.

Do not switch models casually. Do not install MCP servers and Skills halfway through. Do not use CLAUDE.md as a temporary scratchpad. For complex tasks, consider a longer TTL. Once these basics are stable, token cost and response speed become much more predictable.

The most practical sentence is: configure before you start, change less after you start.

References

Anthropic Founder’s Playbook Explained: How Claude Helps Startup Teams Move Faster

Mon, 18 May 2026 18:02:58 +0800

Anthropic published The Founder’s Playbook on the official Claude blog, aimed at founders. Its core question is direct: how can an AI-native startup move faster from insight to product, launch, and scale?

The playbook is not simply a feature list for Claude. It breaks the startup journey into four stages: Idea, MVP, Launch, and Scale. The point is not to let AI replace founders’ judgment, but to hand repetitive work such as market research, copy drafts, code scaffolding, operations workflows, and sales materials to Claude first, so founders can spend more time on judgment, taste, trade-offs, and trust.

What this playbook is about

AI startups increasingly face a kind of compression race: product cycles are shorter, competitors are more numerous, and users expect speed and quality at the same time. Work that once required a multi-person team can now often be drafted by AI first, then reviewed, corrected, and advanced by the founding team.

Anthropic’s framework is clear: do not try to make the entire company “AI-powered” on day one. Instead, find one process that is time-consuming, repetitive, and low in creative density. Let Claude generate the first draft, script, research summary, or execution checklist. Founders remain responsible for defining goals, calibrating direction, judging quality, and connecting useful output to real business work.

Stage 1: Idea

The Idea stage is not about coming up with a cool concept. It is about validating whether the idea deserves further investment.

Claude can help founders at this stage by mapping markets, summarizing user pain points, comparing competitor positioning, proposing possible wedges, and turning vague ideas into clearer value propositions.

But the most important part is still human judgment. AI can help you see more possibilities faster, but it cannot take responsibility for whether a market truly has strong demand. Founders still need to talk to real users, observe whether they are willing to change existing workflows, and see whether they are willing to pay.

Stage 2: MVP

The MVP stage is where Claude Code can be especially useful.

For small teams, the scarcest resource is often not ideas, but the speed of turning ideas into something users can try. Claude Code can help generate scaffolding, write scripts, fill in components, check edge cases, and produce technical plan notes, helping teams get to a testable version faster.

The key is not asking AI to write a perfect product in one pass. It is reducing the friction from zero to first version. Founders and engineers still need to review architecture, security, data handling, and user experience, but they do not need to spend as much time on mechanical first drafts.

Stage 3: Launch

The Launch stage tests narrative, distribution, and feedback speed.

Many startup teams underestimate how complex a launch can be: website copy, product demos, emails, social media content, user interviews, sales scripts, investor updates. Every item needs to clearly explain why this product is needed now.

Claude can act as a high-frequency collaborator here: generating different positioning variants, rewriting introductions for different user groups, simulating user questions, organizing the launch rhythm, and turning early feedback into the next round of product and market actions.

Stage 4: Scale

The Scale stage shifts the focus from “building it” to “growing repeatably.”

Once a company has stable users and revenue, the founding team gets pulled into operations, sales, support, data analysis, and internal coordination. Agent-like capabilities such as Claude Cowork are better suited to more complete tasks: conducting market research, designing campaigns, organizing fundraising strategy, summarizing growth metrics, or turning an operations process into repeatable steps.

This is also where the difference between AI-native companies and traditional software companies begins to appear. The real change is not simply that employees use AI tools. It is that company processes are designed around AI collaboration from the beginning: which tasks require humans to define standards, which tasks should be drafted by AI first, which outputs must be reviewed, and which workflows can become reusable templates.

What Claude Code, Claude Cowork, and Chat are best for

Based on the official blog post, Anthropic wants founders to think about Claude across three kinds of use cases.

Claude Code is more engineering-oriented. It is suited for writing code, generating scripts, analyzing edge cases, producing component specs, and drafting technical documentation. It helps move ideas toward something that can run.

Claude Cowork is closer to a delegatable work agent. It fits tasks that require continued execution, such as market research, campaign design, fundraising strategy, and operations analysis. It helps push a relatively complete business task through a first pass.

Claude Chat is better suited for founder judgment moments: thinking through go-to-market strategy, stress-testing product positioning, comparing roadmap priorities, and refining key narratives. It is not an execution machine, but a thinking partner that can support rapid iteration.

What is actually useful for startup teams

The value of this playbook is not that it tells founders “AI is important.” That is no longer new.

Its more useful contribution is shifting AI use from scattered tool calls into a company-building method. Each stage has different bottlenecks, and each bottleneck can be broken into parts where AI can participate.

At the Idea stage, AI expands the search space. At the MVP stage, it compresses implementation time. At the Launch stage, it accelerates messaging and distribution experiments. At the Scale stage, it helps turn processes into repeatable workflows.

This logic is especially important for small teams. Small teams do not have enough people to cover every function, but they can use AI to create a first version of a capability, then spend limited human energy on the parts that most require judgment and relationship building.

Pitfalls to watch for

The first pitfall is treating AI-generated output as a conclusion. Market research, competitor analysis, user personas, and growth strategies all need to be validated against real data and user feedback.

The second pitfall is underestimating review cost. AI can significantly reduce the cost of first drafts, but code quality, legal risk, brand expression, commercial promises, and security issues still need human accountability.

The third pitfall is automating too early. A process that has not yet worked manually should not be handed to an agent for automatic execution. A steadier approach is to let AI participate in one small part of the workflow, observe output quality, and then gradually expand the scope.

Summary

The signal from Anthropic’s Founder’s Playbook is clear: the advantage of an AI-native startup is not merely that it can use AI to write code. It is that from day one, AI becomes a collaboration layer across product, engineering, marketing, sales, and operations.

For founders, the most practical starting point is not building a grand AI workflow. It is choosing one task that consumes too much time, repeats too often, and slows progress the most, then letting Claude produce the first version. Real competitiveness comes from human founders’ control over direction, quality, and trust, and from whether the team can embed this collaboration pattern into everyday work.

References

The founder’s playbook for the age of AI

easy-vibe: A Learning Map for Vibe Coding Beginners

Sat, 16 May 2026 22:44:43 +0800

easy-vibe is an open source Vibe Coding learning project from Datawhale. It is not aimed at developers who are already fluent with AI coding tools. It is aimed at students, product managers, designers, operators, indie developers, and technical hobbyists who are just starting with Vibe Coding.

The value of this project is not that it lists another batch of AI tools. It turns “how to start building projects with AI” into a learning path that is easier to understand. For many beginners, the hard part is not knowing that Claude Code, Cursor, MCP, or Agents exist. The hard part is knowing what to learn first, how to practice, and when to move into more advanced tools.

Beginners Need a Path Most

Vibe Coding has become popular in recent years, but it is not very friendly to beginners.

On the surface, as long as you can describe a requirement, you can ask AI to write code. In reality, as soon as the task becomes slightly more complex, problems appear: the requirement is unclear, the model edits the wrong file, the project structure is confusing, errors are hard to handle, dependencies fail to install, prompts become messier, and the workflow falls back to “copy code into a chat box”.

So getting started with Vibe Coding cannot only mean learning “how to write prompts”. It needs to solve several things:

How to split an idea into executable tasks;
How to let AI understand a project structure;
How to read code generated by the model;
How to handle errors and iterate;
How to use the terminal and local development environment;
How to move from web chat to real AI coding tools.

This is where easy-vibe matters: it tries to organize these topics into a learning route, instead of leaving beginners lost among tools, tutorials, and terminology.

It Is a Roadmap, Not a Single Tutorial

According to the project description, easy-vibe covers basic tutorials, interactive exercises, visual content, RAG, terminal tools, AI coding tools, and more advanced topics such as Claude Code, MCP, Skills, and Agent Teams.

This structure is suitable for beginners because AI coding is not a single skill. It is a combination of abilities:

Describing requirements;
Splitting tasks;
Reading projects;
Asking the model to edit code;
Running and verifying results;
Iterating based on errors;
Turning repeated workflows into tools or skills.

If you only learn one tool, it is easy to be constrained by that tool’s interface. Switch models, editors, or CLIs, and the workflow becomes unclear again. A roadmap helps build the working method first, then places tools where they belong.

Especially Useful for Non-Programmers

The biggest appeal of Vibe Coding is that it lets non-professional programmers build prototypes.

Product managers can turn product ideas into interactive demos. Designers can validate interaction logic. Operators can write internal tools. Students can quickly build course projects. Founders can validate demand early. These people do not necessarily need to become full-time engineers in the traditional sense, but they do need a method for “letting AI help me turn ideas into working things”.

This is also why easy-vibe fits the Chinese community. Many Chinese users already know AI can write code, but they still lack systematic beginner materials. Development environment, prompts, project structure, debugging methods, and Agent tools are easier to learn when explained clearly in Chinese and paired with exercises.

For these users, the most important thing is not to learn a complex framework immediately. It is to complete a full loop first: propose a requirement, generate a project, run it, find problems, keep modifying, and finally get a usable version.

The Advanced Part Moves Toward Real AI Development Workflows

The Claude Code, MCP, Skills, and Agent Teams mentioned in easy-vibe are no longer just beginner concepts.

Claude Code represents terminal coding Agents: the model can enter a local project, read files, edit code, and run commands. MCP solves tool and data source integration, so the model is not trapped in a chat box. Skills preserve reusable workflows, such as fixed project generation, document organization, test checks, or content production processes. Agent Teams further split tasks across multiple agents.

These topics may feel distant for beginners, but they are worth understanding early. The direction of Vibe Coding is already clear: from “let AI write a piece of code” to “let AI participate in a complete project workflow”.

If a learning route stops at prompts, it will quickly fall behind tool evolution. On the other hand, if every advanced concept is thrown at beginners immediately, they will not know where to start. The useful part of easy-vibe is that it places these topics on a gradual upgrade path.

Two Mistakes to Avoid

The first mistake is thinking that Vibe Coding means you can ignore code entirely.

AI can generate a lot, but the user still needs to judge whether the result is correct. At minimum, you need to understand the project structure, know how to run it, and roughly know where an error is happening. Even if you do not write complex code, you still need basic engineering common sense.

The second mistake is thinking that more advanced tools are always better.

Beginners do not necessarily need Claude Code, MCP, or multiple Agents at the start. A better order is to first build a feedback loop with simple projects, then gradually introduce the terminal, version control, testing, tool calling, and automated workflows. Tools should match task complexity; otherwise they look powerful but have no clear use.

How to Use It

If you are just starting with Vibe Coding, you can use easy-vibe as a learning checklist.

Start with basic concepts and simple exercises. Do not rush to chase every tool. Build a small project, such as a personal homepage, data dashboard, form tool, automation script, or knowledge base demo. During the process, observe where AI helps and where you still need to confirm things yourself.

Once you can complete small projects consistently, move into more complex topics:

Use terminal tools to work with local projects;
Use Git to manage each change;
Use RAG to connect your own materials;
Use MCP to connect external tools;
Use Skills to solidify repeated workflows;
Use Agent Teams to split complex tasks.

Learning Vibe Coding this way is not just learning to ask AI. It is learning to put AI into your own workflow.

Conclusion

easy-vibe is best seen as a Chinese learning map for Vibe Coding. It organizes scattered AI coding concepts, tools, and exercises into a route that helps beginners move from “I heard AI can write code” to “I can build a project with AI”.

The real value of Vibe Coding is not that it lets people skip all learning. It lowers the threshold from idea to prototype. You still need to understand requirements, organize tasks, verify results, and control risks. But many repetitive, tedious, and blocking steps can be handled with AI assistance.

If you want a systematic entry point into AI coding, without getting trapped immediately in tool names and complex engineering setup, easy-vibe is a good place to start.

Claude Code + Ollama Local Deployment Guide: Build a Free AI Coding Assistant with CC Switch

Fri, 15 May 2026 23:27:50 +0800

Claude Code has become a popular AI coding assistant recently. Its appeal is not just that it can chat about code, but that it can read a project, modify files, run commands, install dependencies, and keep fixing errors in an agent-like workflow.

The hard part is cost. Once a project grows, long context and repeated agent turns can burn through API quota quickly. If you just want to experiment, refactor small utilities, generate scripts, or work on a private local project, it is natural to ask: can Claude Code’s workflow be kept while the model runs locally?

The key tool in this setup is CC Switch. It lets Claude Code connect to the local Ollama service through an OpenAI-compatible API endpoint, so requests can be forwarded to a local model instead of the official Claude API.

What This Setup Solves

You can think of the whole setup as:

1
2
3

Claude Code desktop
+ CC Switch API forwarding layer
+ Ollama local model

Claude Code is still responsible for the coding workflow and project operations. CC Switch handles model provider configuration and API compatibility. Ollama runs the model locally.

This does not make a local model suddenly become Claude. Its real value is that it makes Claude Code’s agent workflow usable in lower-cost, offline, and private local scenarios.

Basic Preparation

Before you start, prepare these pieces:

Install Git.
Install Ollama.
Pull a local model suitable for coding.
Install CC Switch.
Have Claude Code available on your machine.

For the model side, you can start with coding-oriented models, such as Qwen Coder, DeepSeek Coder, or other models with decent tool-calling and code generation behavior. The larger the model, the better the result may be, but memory and GPU pressure will also rise.

If your machine only has limited memory, start with a smaller model first. Confirm that the workflow runs smoothly before trying a larger one.

Key CC Switch Configuration

After Ollama starts, its default local API address is usually:

`1`	`http://127.0.0.1:11434/v1`

In CC Switch, choose an OpenAI-compatible provider type, commonly:

`1`	`OpenAI Chat Completions`

Then point the base URL to Ollama’s local address.

For the API key field, local Ollama normally does not need a real key, but many tools still require an environment variable or placeholder. You can use:

`1`	`ANTHROPIC_API_KEY`

or another placeholder variable accepted by your local setup.

One configuration item is worth special attention:

`1`	`"inferenceModels"="[\"haiku\",\"sonnet\",\"opus\"]"`

This means mapping Claude Code’s expected model roles to the local provider. In practice, you need to bind haiku, sonnet, and opus to the model names exposed by Ollama or CC Switch. If this mapping is wrong, Claude Code may fail to call the model or may keep falling back to an unexpected configuration.

Where Claude Code Is Strong

Claude Code’s biggest advantage is not raw completion. It is the full coding workflow:

reading and understanding project structure;
locating related files based on a task;
editing code directly;
running commands and tests;
observing errors and iterating;
completing multi-step tasks in one session.

This is why many people want to keep Claude Code even when switching to a local model. A normal chat UI can generate code snippets, but it does not naturally operate inside a repository. Claude Code is closer to an executable development assistant.

What Role Ollama Plays Here

Ollama is responsible for local model runtime and management. It handles model downloading, loading, and local inference.

The advantage is clear: requests stay on your machine, repeated use does not create API bills, and you can use it when the network is limited. For private code, this is also easier to accept than sending every context window to a cloud model.

The trade-off is also clear. Local models depend heavily on your hardware and on model quality. A smaller model can handle simple edits, explanations, and script generation, but it may struggle with large cross-file refactors or subtle architectural decisions.

Where The Experience Has Boundaries

This setup should not be treated as a full replacement for Claude’s strongest cloud models.

You may run into these issues:

weaker long-context understanding;
unstable tool-calling behavior in complex tasks;
slower inference on CPU-only machines;
more hallucinated file paths or APIs;
less reliable multi-round planning;
lower success rate on large repository refactors.

So the better expectation is: use it as a free local development assistant, not as a perfect substitute for a top-tier cloud model.

Multimodal Compatibility Is Still Unstable

Some users want Claude Code to handle screenshots, UI images, diagrams, or other multimodal inputs. This part depends on the local model and the forwarding layer.

If the selected Ollama model does not support vision, or CC Switch does not translate the request format correctly, multimodal features may fail. Even with a vision model, behavior may differ from Claude’s official API.

For now, this setup is more suitable for text and code workflows. Treat multimodal support as experimental.

Who Should Try It

This setup is suitable for:

developers who want to try Claude Code’s workflow at low cost;
users who frequently write scripts, small tools, and automation snippets;
teams that want to keep code on local machines;
learners who want an AI coding assistant without constant API spend;
people testing different local coding models.

It is less suitable if you rely heavily on long context, large monorepos, strict code review quality, or complex full-project refactors.

Usage Advice

Start with small tasks.

For example:

explain a single file;
refactor a small function;
generate a shell script;
fix a simple error;
add a small feature;
write unit tests for a narrow module.

After each change, run tests or at least review the diff yourself. A local model can be useful, but you should not blindly accept every generated edit.

If the model keeps losing context, reduce the task scope. Instead of asking it to “refactor the whole project”, ask it to “refactor this function” or “add validation in this file”.

Summary

Claude Code + CC Switch + Ollama is an interesting combination. It keeps Claude Code’s agent-style development workflow while moving inference to a local model.

Its biggest strengths are lower cost, local privacy, and a smooth development workflow. Its limits are also obvious: model quality, hardware performance, long context, and tool-calling stability all affect the final experience.

If you already use Ollama and want a more practical local AI coding workflow, this setup is worth trying. Just remember to start small, verify every change, and treat the local model as an assistant rather than an automatic engineer.

Superpowers: a skills framework that pulls coding agents back into engineering process

Fri, 15 May 2026 08:53:17 +0800

obra/superpowers is both a skills framework for coding agents and a software development methodology. Its goal is not to add another universal prompt, but to make agents follow a process: clarify goals, produce a design, write a plan, implement through TDD, then review and finish.

Project: https://github.com/obra/superpowers

At the time of writing, the GitHub API shows more than 190,000 stars, an MIT license, and recent activity. The README describes it plainly: An agentic skills framework & software development methodology that works.

What problem it solves

Many AI coding tools are not weak at writing code; they are too eager to write code.

A user says something vague, the agent edits files, and the result looks finished while boundaries, tests, and architecture remain unclear. Small tasks may survive this. Complex projects turn it into rework and technical debt.

Superpowers makes the agent enter a workflow before touching code:

When the user wants to build something, ask about the goal first.
Turn the conversation into a spec and confirm it in sections.
After design approval, write an implementation plan.
After the user says “go”, begin implementation.
During implementation, emphasize TDD, YAGNI, DRY, and code review.

This is not new software engineering. It is important because fast agents need stronger guardrails.

Supported tools

Superpowers is not tied to a single agent. The README lists installation paths for Claude Code, Codex CLI, Codex App, Factory Droid, Gemini CLI, OpenCode, Cursor, and GitHub Copilot CLI.

That makes it more like a workflow layer across harnesses than a model-specific trick.

The base workflow

The base workflow has several stages.

First is brainstorming. Before implementation, the agent turns rough ideas into an executable design and confirms it with the user.

Second is using-git-worktrees. After design approval, it creates an isolated worktree and branch, then checks that install and test baselines are clean.

Third is writing-plans. It decomposes design into small tasks with paths, code scopes, and validation steps. The plan should be clear enough for someone without context to execute.

Fourth is execution. subagent-driven-development can dispatch tasks to subagents, while executing-plans runs them in batches. Each task should be reviewable and verifiable.

Fifth is test-driven-development: true RED-GREEN-REFACTOR. Write a failing test, confirm failure, implement minimally, confirm pass, refactor.

Sixth is requesting-code-review. Reviews happen between tasks; critical findings block progress.

Finally, finishing-a-development-branch validates tests and offers choices such as merge, PR, keep, or discard the worktree.

What is in the skills library

The skills library can be grouped by purpose.

Testing centers on test-driven-development.

Debugging includes systematic-debugging and verification-before-completion. They focus on reproduction, minimization, hypotheses, validation, and not claiming completion before verification.

Collaboration skills include:

brainstorming
writing-plans
executing-plans
dispatching-parallel-agents
requesting-code-review
receiving-code-review
using-git-worktrees
finishing-a-development-branch
subagent-driven-development

Meta skills include writing-skills and using-superpowers.

Together they give the agent engineering habits: when to ask, when to plan, when to test, and when to stop for review.

How it differs from a prompt

A normal prompt often piles rules into one system message: do not over-edit, think first, test, explain, be concise. As rules accumulate, complex tasks make the model forget or ignore some of them.

Superpowers splits rules into phase-specific workflow modules. Each skill is shorter and focused. The agent knows the current phase, complex processes become checkable, and teams can turn their own practices into reusable skills.

The lesson is not just “use a smarter model”. Give the model a repeatable way to work.

Who should use it

Superpowers is most useful for developers already using coding agents on real projects, especially when:

The task spans multiple files.
The agent should design before implementation.
TDD or validation matters.
Multiple branches or worktrees are common.
Subagents can help with implementation or review.
A team wants to encode its workflow as skills.

For a one-line config change, it may feel heavy. For multi-step development, the constraints are valuable.

Notes before using it

Do not treat it as full autopilot. It gives the agent process, but humans still own requirements, tradeoffs, and final acceptance.

TDD and review add upfront cost. For small tasks they may slow things down; for complex tasks they reduce rework.

Parallel subagents are not always better. They work when boundaries and write scopes are clear. If the requirement is still fuzzy, parallelism only multiplies confusion.

Teams must maintain skill quality. Outdated processes, vague instructions, and conflicting rules can also hurt agents.

Summary

Superpowers is valuable because it pulls coding agents away from “receive request, edit code” and back into software engineering process.

AI coding often lacks not generation speed, but clarification, planning, verification, review, and closure. The stronger the model becomes, the less these steps should be skipped.

If you use Codex, Claude Code, Cursor, or Gemini CLI on real projects, Superpowers is worth studying. Even if you do not install it, its skill decomposition is a good reference for designing your own agent workflow.

Reject Vibe Coding: Matt Pocock's skills repo adds engineering constraints to AI coding

Fri, 15 May 2026 08:46:23 +0800

The faster AI writes code, the faster a project can lose control. The real question is not whether a model can generate functions, but whether it understands the requirement, follows the team’s language, and makes small changes inside the existing architecture.

Matt Pocock’s mattpocock/skills repository points in the opposite direction of casual vibe coding: do not let AI take over the whole development process. Put it inside mature software engineering constraints.

Project: https://github.com/mattpocock/skills

This is not about one magic prompt. It is a set of composable agent skills that turn requirement clarification, domain modeling, TDD, debugging, and architecture review into AI-friendly workflows.

Solve alignment failure first

The most common failure in AI coding is assuming the model understood the request when it merely guessed from a vague sentence.

grill-me flips the interaction. Before writing code, the agent acts like a demanding reviewer and keeps asking about branches, boundaries, and unresolved decisions.

If you ask for a login page, it should first ask:

How should password reset work?
Should third-party login be supported?
What should failed-login errors look like?
Are account lockout, CAPTCHA, or risk controls in scope?
Where should the user go after success?

This feels slower, but it prevents expensive rework later. The cheaper code generation becomes, the more costly unclear requirements become.

Write domain language into context

Another common problem is generic vocabulary. The model does not know the team’s business terms, so names and documents drift.

grill-with-docs asks questions while also checking CONTEXT.md, ADRs, and domain docs. Once terms and decisions are confirmed, they can be written back into shared context.

This is close to the “ubiquitous language” idea in domain-driven design. If a team says customer instead of user, or transaction instead of order, the model should inherit that language.

Context documents are valuable because they reduce guessing.

Use TDD to slow down generation

AI is risky because it is fast. Bad code used to take time to write; now hundreds of lines can appear in seconds. The problem is not speed itself, but lack of feedback loops.

The tdd skill brings back red-green-refactor:

Write a failing test for one behavior.
Implement only enough code to pass.
Refactor.
Continue with the next vertical slice.

The key is one behavior at a time. AI executes, while humans keep control of direction and boundaries.

Debug through a loop

When facing a bug, many agents guess and patch repeatedly until the code becomes messier.

diagnose asks the agent to build a feedback loop:

Reproduce the issue
Minimize the case
Form a hypothesis
Add observations or logs
Fix the cause
Add a regression test

This process is old, but it matters even more with AI. The model is good at trying things; the loop keeps it close to the root cause.

Review architecture regularly

A task passing tests does not mean the codebase is healthier. Repeated AI patches can blur module boundaries, make interfaces more complex, and make tests harder to write.

Skills such as improve-codebase-architecture ask the agent to step back and inspect the whole codebase:

Are responsibilities mixing across modules?
Which interfaces are too complex?
Which paths are hard to test?
Which names conflict with domain language?
Which duplicate logic should be merged?

This is not automatic large-scale refactoring. It is structured observation and suggested direction; humans still decide whether and how far to change.

What really needs limiting is freedom

The core idea is simple: AI coding is not about letting the model improvise freely. It is about giving it clear goals, context, tests, and stopping conditions.

Humans define the problem, architecture, tradeoffs, and acceptance criteria. AI generates code, fills in tests, repeats edits, and handles local refactors. Used well, AI amplifies capability; used poorly, it amplifies confusion.

Software engineering fundamentals did not become obsolete because AI improved. Requirement clarity, domain language, TDD, diagnosis, and architecture review are becoming more important.

More people will be able to write code. The gap will be between people who can put AI inside a maintainable, verifiable, evolving engineering system and those who cannot.

What is cc-haha? A project that turns Claude Code into a desktop workbench

Thu, 14 May 2026 22:38:04 +0800

cc-haha is a project built around a modified Claude Code workflow. Its full repository name is NanmiCoder/cc-haha. The project page says plainly that it is based on Claude Code source code leaked from the Anthropic npm registry on 2026-03-31, and that its current main form is a desktop Claude Code workbench.

Project URL: https://github.com/NanmiCoder/cc-haha

There are two important points in that description.

First, it is not Anthropic’s official Claude Code. The README also states that the original source code copyright belongs to Anthropic and that the project is only for learning and research.

Second, its focus is no longer just “run a Claude Code CLI locally.” Judging from the README and the latest release, cc-haha is more like a desktop app that brings Claude Code sessions, projects, permissions, diffs, Computer Use, remote access, and model provider configuration into one place.

What problem is it trying to solve?

Claude Code is originally terminal-oriented. Sessions, command execution, permission prompts, file edits, and context switching all happen in the terminal. That works for people who are comfortable with CLI tools, but long-term use exposes a few rough edges:

Multiple projects and sessions are hard to manage side by side.
To see what files the AI changed, you often need to switch to Git or an editor.
Permission approvals, command execution, and file diffs are spread across different surfaces.
Remote viewing from a phone or another device requires extra setup.
Connecting non-Anthropic models requires dealing with protocol compatibility.

cc-haha tries to package these pieces into a graphical workbench. It is not just a skin for Claude Code; it moves session management and local development flow control into the desktop app.

Desktop workbench: from terminal to control center

According to the README, the cc-haha desktop app brings these capabilities into a macOS / Windows app:

Multi-session workbench: manage tasks with tabs, project switching, terminal entry points, and session history.
Branch / Worktree launch: choose a repository branch for a new session and decide whether to use the current worktree or an isolated Worktree.
Right-side code changes panel: view modified files, added and removed lines, and workspace status while chatting.
Visualized code edits: inspect AI edits, diffs, and execution steps.
Permission and approval flow: review dangerous commands, tool calls, and AI questions in the desktop app.
Multiple model providers: supports Anthropic-compatible APIs, third-party models, WebSearch fallback, and local configuration.
H5 remote access: use a one-time token to connect to the current desktop session from a phone or another device.
IM integration: use Telegram, Feishu, WeChat, or DingTalk to chat remotely, switch projects, and approve permissions.
Scheduled tasks and token usage: create scheduled tasks and view local token usage trends.

These features make it closer to an “AI coding workbench” than a simple command-line replacement. It tries to put the common surfaces of AI coding into one place: chat, file changes, permissions, projects, remote access, and model configuration.

Installation and startup

Most users should download the desktop installer from Releases.

The README describes the desktop install flow as:

Go to GitHub Releases and download the macOS or Windows installer.
On first launch, configure the model provider, API key, and default model in the desktop settings.
If macOS says the app cannot be opened, follow the installation guide to handle Gatekeeper permissions.

The latest release page shows that v0.2.6 was published on 2026-05-13. That version mainly focuses on restoring secure H5 mobile access, desktop session management, file mention search, and desktop UX polish.

If you want to start the CLI from source, the README provides:

1
2
3

bun install
cp .env.example .env
./bin/claude-haha

That path is better for people who want to debug the lower-level CLI, server, or build their own changes. For normal use, the desktop app is more direct.

What changed in v0.2.6

The main point of v0.2.6 is that H5/LAN access was pulled back from a temporary open state into an explicit enablement and token pairing model.

Notable changes include:

H5/LAN access must be explicitly enabled locally.
QR links carry a one-time visible token.
Remote APIs, proxies, and WebSockets are no longer exposed without protection.
Settings now has a separate H5 Access page.
The desktop sidebar gained batch management for selecting and deleting sessions.
Desktop file mention search became git-first, respects ignore rules, and reduces noise from node_modules and build output.
A pure white theme was added, and bugs such as long URLs breaking chat layout and draft leakage across tabs were fixed.

This shows the project has moved beyond “it runs” and is now filling in the safety boundaries and daily UX details that a desktop product needs.

The H5 access part deserves special care. The author explicitly notes in the release that H5 is a browser access entry for individuals or trusted teams, not a public multi-tenant login system. In practice, it should not be treated as an internet-facing SaaS admin console.

Computer Use: letting the Agent operate the desktop

Another important selling point of cc-haha is Computer Use.

The project docs say this feature is a heavily modified version of the Computer Use implementation in the leaked Claude Code source. The official implementation depends on Anthropic’s private native modules, such as @ant/computer-use-swift and @ant/computer-use-input, which are not publicly available. cc-haha replaces the low-level operation layer with a Python bridge using public libraries such as pyautogui, mss, and pyobjc.

Computer Use supports operations such as:

Screenshot: screenshot, zoom
Mouse: click, drag, move, scroll, and read cursor position
Keyboard: type text, press keys, hold keys
Applications: open applications, switch displays
Permissions: request app access, list granted applications
Clipboard: read and write clipboard content
Other: wait, batch operations

Its workflow is a “screenshot - analyze - act” loop:

The model receives a user request.
It calls screenshot to capture the screen.
The model uses vision to identify buttons, input fields, and coordinates.
It calls click, typing, or application tools.
It screenshots again to confirm the result, then continues.

From the docs, the fully supported platform is mainly macOS, including Apple Silicon and Intel. Windows / Linux are theoretically possible, but the pyobjc app-management parts need platform-specific replacements and are not fully adapted yet.

Runtime requirements include:

Bun >= 1.1.0
Python >= 3.8
macOS Accessibility permission
macOS Screen Recording permission

This kind of feature is powerful, but it also raises permission risk. When letting AI operate desktop apps, it is better to authorize only the applications that are clearly needed and avoid leaving sensitive content open in unrelated windows.

Multi-model access through an Anthropic-compatible layer

cc-haha still communicates using the Anthropic Messages API protocol. The project docs recommend using LiteLLM as a protocol conversion proxy.

The basic structure is:

`1`	`claude-code-haha ──Anthropic协议──▶ LiteLLM Proxy ──OpenAI协议──▶ 目标模型 API`

In other words, cc-haha sends Anthropic Messages API requests, LiteLLM converts them to formats such as OpenAI Chat Completions, and then forwards them to OpenAI, DeepSeek, Ollama, or other model services.

The LiteLLM install command in the docs is:

`1`	`pip install 'litellm[proxy]'`

Then you can configure OpenAI, DeepSeek, Ollama, and other models in litellm_config.yaml. After the proxy starts, set these values in .env or ~/.claude/settings.json:

ANTHROPIC_AUTH_TOKEN=sk-anything
ANTHROPIC_BASE_URL=http://localhost:4000
ANTHROPIC_MODEL=gpt-4o
ANTHROPIC_DEFAULT_SONNET_MODEL=gpt-4o
ANTHROPIC_DEFAULT_HAIKU_MODEL=gpt-4o
ANTHROPIC_DEFAULT_OPUS_MODEL=gpt-4o
API_TIMEOUT_MS=3000000
DISABLE_TELEMETRY=1
CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1

There are a few practical caveats:

drop_params: true is important, because Anthropic parameters such as thinking and cache_control do not exist in the OpenAI API.
Extended Thinking is an Anthropic-specific feature and is unavailable with third-party models.
Prompt Caching will not work in the Anthropic-native way.
Tool calls must be converted from Anthropic tool_use to OpenAI function calling, so complex tool use may have compatibility issues.
Small local Ollama models may not handle this tool-heavy workflow reliably.

So multi-model access can work, but that does not mean every model will feel the same. cc-haha still demands strong tool use, code understanding, and long-context ability from the model.

Who is it for?

cc-haha is better suited for:

People already familiar with Claude Code who want desktop session management.
Users who often work across multiple repositories, branches, and AI sessions.
People who want to inspect AI file changes, diffs, and workspace status in a side panel.
Users who want to experiment with Computer Use and let an Agent operate desktop apps.
People who want to connect OpenAI, DeepSeek, Ollama, or other models through an Anthropic-compatible protocol.
Users who need phone or IM-based remote viewing and permission approval.

It is less suitable for:

Users who only want the stable official Claude Code experience.
People who cannot accept the leaked-source background and copyright uncertainty.
Users who do not want to grant high system permissions to local tools.
Teams that need enterprise compliance, auditability, and official support.
Users unfamiliar with API keys, proxies, model compatibility, and local service configuration.

Risks and boundaries

This article cannot only talk about features. It also has to talk about risk.

The origin of cc-haha means it is not an ordinary community reimplementation. The README clearly states that it is based on leaked Claude Code source code and that the original source belongs to Anthropic. This creates uncertainty around copyright, compliance, and long-term maintenance.

Computer Use, H5 remote access, IM integration, and local permission approval are also high-permission capabilities. The more convenient they are, the more clearly boundaries need to be defined:

Do not expose H5 access on untrusted networks.
Do not treat the token as a long-term public login credential.
Do not grant the Agent access to unrelated sensitive applications.
Do not casually use it in production or company compliance environments.
Do not expose third-party model proxy settings or API keys in public repositories.

If your goal is to study AI coding tool architecture, desktop workflows, and Computer Use implementation, it is a useful reference. If you want to put it into a long-term production workflow, evaluate legal, permission, security, and maintenance risks first.

Summary

The most interesting thing about cc-haha is not whether it can replicate Claude Code. It is that it pushes Claude Code-style AI coding tools toward a desktop workbench form.

Sessions, projects, Worktree, diffs, permissions, remote access, Computer Use, model providers, scheduled tasks, and token usage are all brought into one desktop experience. That suggests the next step for AI coding tools is not only stronger models, but also a more complete workflow interface.

But its boundaries are also clear: it is not an official Anthropic product, it has a sensitive source-code background, and its high-permission features require caution. A better way to view it is as a project for observing where AI coding tools may evolve, not as a careless replacement for official Claude Code.

References

GitHub repository: https://github.com/NanmiCoder/cc-haha
Latest release: https://github.com/NanmiCoder/cc-haha/releases/tag/v0.2.6
Computer Use documentation: https://github.com/NanmiCoder/cc-haha/blob/main/docs/computer-use.md
Third-party model documentation: https://github.com/NanmiCoder/cc-haha/blob/main/docs/guide/third-party-models.md

Codex /goal vs Claude Code /goal: running long tasks until they are done

Thu, 14 May 2026 22:25:31 +0800

/goal is becoming an important command in AI coding tools.

It is not about making the model write a few more lines of code. It solves a more practical problem: when a task has clear completion conditions, can the agent keep going until those conditions are met, instead of stopping after every turn and waiting for the user to say “continue”?

Codex CLI has already added an experimental /goal command in its official docs. Claude Code has also published its own /goal documentation, describing it as an automation capability that can keep working across multiple turns. The names are the same, but the product direction is not exactly the same.

What problem does `/goal` solve?

Ordinary AI coding conversations usually work as a one-turn-at-a-time loop:

The user describes a task.
The agent analyzes, edits code, and runs tests.
The agent reports the result.
The user decides what to do next.

That workflow is fine for short tasks. But for migrations, refactors, test fixes, or issue backlog cleanup, it gets fragmented. The agent may move forward a little, then stop and wait for you to type “continue”.

/goal changes the question from “what should you do next?” to “what final state counts as done?” For example:

`1`	`/goal 完成登录模块迁移，所有 auth 测试通过，lint 无报错`

This kind of target naturally fits long tasks because it has a clear endpoint: tests pass, the build succeeds, files are split, a queue is empty, or acceptance criteria are satisfied.

Codex `/goal`: experimental and attached to the current thread

OpenAI’s Codex CLI documentation marks /goal as experimental. It is not a stable default capability and requires features.goals to be enabled first.

There are two ways to enable it:

`1`	`/experimental`

Or add this to config.toml:

1
2

[features]
goals = true

Once enabled, you can use it like this:

`1`	`/goal Finish the migration and keep tests green`

Common commands include:

/goal
/goal pause
/goal resume
/goal clear

According to OpenAI’s docs, Codex attaches the goal to the current active thread and keeps tracking that target while a larger task continues.

One detail matters here: the official wording for Codex /goal is restrained. It emphasizes setting an experimental goal for long-running work and attaching the goal to the current thread, but it does not describe, in the same level of detail as Claude Code’s docs, an independent evaluator that automatically checks every turn and starts the next one. So for now, it is better to treat Codex /goal as an experimental long-task goal mechanism, not a fully stable unattended execution mode.

Claude Code `/goal`: multi-turn execution driven by completion conditions

Claude Code’s /goal documentation is more explicit: after the user sets a completion condition, Claude keeps working across turns until that condition is met.

Example:

`1`	`/goal all tests in test/auth pass and the lint step is clean`

Claude Code’s mechanism is roughly:

After the current turn finishes, control is not immediately returned to the user.
A small, fast model checks whether the goal condition has already been met.
If it has not been met, Claude automatically starts the next turn.
If it has been met, the goal is cleared automatically and the completion status is recorded in the transcript.

This makes Claude Code’s /goal more like “auto-continue until the completion condition is satisfied.” It does not merely pin a target to the conversation; it gives an independent evaluation step the decision of whether to continue.

Claude Code also supports checking status directly:

/goal

The status shows the goal condition, elapsed time, evaluated turn count, token usage, and the evaluator’s latest reason.

To stop early, use:

`1`	`/goal clear`

stop, off, reset, none, and cancel also work as clearing aliases. After a goal is enabled, if the session is interrupted and later resumed with --resume or --continue, an active goal can be restored. However, elapsed time, turn count, and token baselines are recalculated.

The biggest difference

Both Codex and Claude Code are pushing AI coding from single-turn answers toward long-running task execution, but their /goal commands have different positioning.

Comparison	Codex CLI `/goal`	Claude Code `/goal`
Status	experimental	documented on a dedicated official page
Enablement	requires `features.goals`	usable directly in a trusted workspace
Goal scope	current active thread	current session
Common operations	set / view / pause / resume / clear	set / view / clear
Automatic evaluation	docs emphasize attachment and tracking	docs explicitly describe evaluator checks after each turn
Auto-continuation	official wording is restrained	starts the next turn automatically when conditions are unmet
Best fit	keeping a long-term target in a Codex task	letting Claude Code keep moving toward completion conditions

In short, Codex /goal is closer to “attach an experimental long-term target to the current thread.” Claude Code /goal is closer to “set a verifiable stop condition for the current session and let it keep working until satisfied.”

How to write a good `/goal`

Whichever tool you use, /goal is not a good place for vague wishes.

Not a great goal:

`1`	`/goal 把项目优化一下`

A better goal:

`1`	`/goal 将 payment 模块迁移到新 API，npm test -- payment 退出码为 0，git diff 只包含 payment 相关文件`

A good goal usually includes three things:

A clear completed state.
An executable validation method.
Boundaries that must be respected.

If the goal is large, add a stop condition:

`1`	`/goal 修复 eslint 报错，npm run lint 退出码为 0；如果超过 20 轮仍未完成，停止并总结剩余问题`

This matters. The stronger /goal becomes, the more it needs boundaries. Otherwise, the agent may modify too many files, run too long, consume too many tokens, or keep pushing forward on a question that should have been paused for human input.

When `/goal` is a good fit

Good fits:

Test fixes: until specific tests pass.
Code migrations: until all call sites are updated and compilation succeeds.
Batch cleanup: until a class of lint or type errors is reduced to zero.
Documentation completion: until all specified modules have documentation.
Issue queue handling: until every issue under a tag is handled or clearly classified.

Poor fits:

The requirement itself is still unclear.
The task needs frequent product judgment.
It involves high-risk deletion, data migration, or permission changes.
Acceptance can only be judged subjectively.
The task spans many unrelated modules.

A practical rule: if you can write “which command to run, what result to see, and which files must not be touched,” it is a good candidate for /goal. If you can only write “make this better,” ordinary conversation, plan mode, or human review is still safer.

What this means for AI coding tools

/goal points to a clear direction: AI coding tools are moving from interactive assistants toward continuously executable work units.

In the past, using an agent often meant staying nearby. If it got stuck, you prompted it. If tests finished, you told it to continue. If errors appeared, you issued another command. /goal compresses that interaction into a completion condition and lets the agent decide what the next turn should do.

But this also raises the bar for users. Writing prompts is no longer just describing a task; it also means defining acceptance criteria, validation commands, modification boundaries, and stop rules. In other words, the user’s job shifts from “keep telling it to continue” to “define what done means.”

The fact that both Codex and Claude Code have reached /goal shows that long-running agents are no longer only for background tasks or cloud queues. Local terminal coding tools now also need stronger autonomous progress.

Summary

Codex CLI and Claude Code both have /goal, but at this stage they should not be treated as the same feature.

Codex /goal is still experimental, requires features.goals, and is better understood as a way to maintain a long-term target in the current Codex thread. Claude Code /goal more explicitly connects completion conditions with auto-continuation, using an independent evaluator to decide whether to keep going.

For everyday development, this kind of command is best for engineering tasks with clear acceptance criteria. It does not replace product judgment or code review, but it can reduce the repetitive “continue,” “run it again,” and “fix until tests pass” loop inside long tasks.

The real skill is not memorizing the command. It is learning how to write tasks as clear, verifiable, stoppable goals.

References

OpenAI Codex CLI Slash Commands: https://developers.openai.com/codex/cli/slash-commands
Claude Code Goal documentation: https://code.claude.com/docs/en/goal

Why DeepSeek Became the Cost-Saving Key in This Round of AI Coding Tools

Mon, 11 May 2026 04:59:00 +0800

In this round of AI coding tool competition, the surface battle is about model capability, plugin ecosystems, and agent automation. But once you actually use these tools, the first wall you hit is cost.

Claude Code, Codex, OpenClaw, and Superpowers are all useful, but they share one trait: once a task becomes complex, they eat tokens aggressively. They need to read the project, build a plan, call tools, summarize context, repeatedly check results, and sometimes launch multiple subtasks. The smarter the model and the more automated the workflow, the easier it is for the bill to quietly grow.

That is why DeepSeek has become important in this cycle. Not merely because it can write code, but because its long context and cache pricing happen to hit the most expensive part of AI coding tools.

Why Agent Tools Burn So Many Tokens

Traditional chat-style coding assistants usually work in question-and-answer mode. You ask how to write a function, and the model returns a code snippet. This still costs tokens, but it is relatively controllable.

Agent tools are different. They do not just answer questions. They enter the project like a temporary engineer:

scan directories and key files;
understand the requirement and existing architecture;
make a plan;
modify files;
run commands or tests;
keep fixing based on errors;
summarize what changed at the end.

During this process, the model repeatedly reads the same context. Project descriptions, code snippets, tool outputs, conversation history, plans, and error logs all get placed back into the context. Once the task is a little complex, hundreds of thousands of tokens can disappear quickly.

If you add more aggressive plugins, the cost becomes even more obvious. Some OpenCode or Claude Code enhancement tools may organize a whole agent team by default. You only wanted to change a small feature, but it may still start planning, review, execution, and retrospective steps. The task may look more “intelligent”, but the token count keeps climbing.

The Advantage of Superpowers Is On-Demand Activation

One advantage of tools like Superpowers is that they do not force a full agent workflow onto every task.

Most of the time, you can still let Claude Code, OpenCode, or Codex work in their normal mode. Only when you explicitly call a skill, such as brainstorming, planning, executing a plan, or doing a retrospective, does it enter a heavier automation flow.

That matters for cost.

AI coding should not use heavy artillery for every task. Changing one config line, checking one error, or writing a small script can be handled through ordinary conversation. Only complex refactors, cross-file changes, long-document processing, and multi-round validation deserve a full agent workflow.

The stronger the tool, the more you need to control when it triggers. Otherwise, more automation simply means more waste.

DeepSeek’s Key Advantage Is Cheap Cache Hits

One important reason DeepSeek fits these agent tools is its low cache-hit cost.

AI coding tasks contain a lot of repeated prefixes: project background, system prompts, tool instructions, file content, and earlier conversation turns often appear again in later requests. If the model service supports prompt caching, those repeated parts become much cheaper after a cache hit.

For many models, a cache hit is only somewhat cheaper than a miss, perhaps around one third of the original price. DeepSeek’s advantage is that the gap after a cache hit can be much larger. For long-context, multi-round agent workflows that repeatedly read the same project, this gap shows up directly on the bill.

In other words, DeepSeek is not necessarily the strongest answer on every single turn. But in scenarios with long tasks, many rounds, and repeated context reads, its cost structure is unusually suitable for AI coding.

Long Context Makes Claude Code More Useful

When Claude Code or similar tools are connected to DeepSeek V4, another clear advantage is long context.

AI coding tools fear insufficient context. Once context runs short, compression becomes frequent. Once compression becomes frequent, previously read details may be lost. The model may start forgetting the project structure, constraints, or why a certain file was changed, and quality declines afterward.

DeepSeek V4’s long-context capability makes it better suited for code repositories, document batch processing, subtitle translation, and site article cleanup. Especially when connected to tools like Claude Code or OpenClaw, the right configuration can delay context compression and preserve more project detail.

That is why some tasks feel “durable” when run on DeepSeek. It may not be dazzling at every step, but it can tolerate long-running, low-cost, repeated calls.

How to Split Work Between V4 Pro and V4 Flash

DeepSeek V4 Pro and V4 Flash should not be mixed casually.

For simple tasks, DeepSeek V4 Flash is usually a better fit. It is fast and cheap, and is often enough for:

subtitle translation;
document cleanup;
ordinary script generation;
small code edits;
lightweight OpenClaw tasks;
simple site content processing.

For complex tasks, consider DeepSeek V4 Pro:

large-scale refactoring;
multi-module code understanding;
complex reasoning;
long-chain agent tasks;
high-risk code changes;
engineering tasks that require stronger planning.

Many people want to attach the strongest model immediately, but that is often uneconomical. The practical way to use AI coding tools is to layer tasks: let the cheaper model handle a large amount of routine work, and reserve the expensive model for key decision points.

MiniMax, Doubao, and DeepSeek Occupy Different Positions

Among domestic models and plans, MiniMax, Doubao, Kimi, and DeepSeek each have their own place.

MiniMax’s advantage is generous quota, low price, and broad functionality. It may not be the smartest coding model, but it is cost-effective for translation, lightweight cleanup, and batch processing. For example, batch subtitle processing, format conversion, and simple proofreading are good fits for MiniMax-style plans.

Doubao’s advantage is a broader tool ecosystem: image, video, search, TTS, possible STT, and embedding can be connected together. It feels more like a comprehensive toolbox.

DeepSeek’s position is clearer: text, code, long context, and low-cost caching. It lacks a complete image generation, voice, and video ecosystem, and its weaknesses are obvious. But in AI coding and long-text agent workflows, its strengths are long enough to matter.

So this is not about one tool replacing another. It is about splitting the task and using each tool where it fits.

Saving Money Is Not Just Choosing a Cheap Model

Saving money in AI coding does not mean simply switching every request to the cheapest model.

The effective methods are:

Do not start a heavy agent for simple tasks.
Do not use Pro when Flash is enough.
Use cache as much as possible for long tasks.
Keep repeated context stable, so meaningless changes do not break cache hits.
Let a cheaper model draft and batch-process first, then use a stronger model for key reviews.
Tell the agent clearly not to repeat facts or summarize the same point again and again.

The last point matters more than it looks. AI tools are prone to verbosity, and verbosity is not only a reading problem; it is also a cost problem. Putting “describe each fact once and state each opinion once” into the prompt can improve both article quality and token consumption.

What AI Coding Workflows DeepSeek Fits Best

DeepSeek is best suited for:

reading long code repositories;
lightweight multi-file edits;
batch document cleanup;
batch subtitle translation;
Hugo article cleanup;
agent plan execution;
low-cost automation with lots of repeated context.

It is not the best fit for every task. If you need especially strong frontend taste, complex product judgment, or cross-modal creation, you may still need Claude, GPT, Gemini, Doubao, or other tools.

But whenever a task is long-text, long-context, repeated-call, and cost-sensitive, DeepSeek can easily become the first choice.

Summary

In this round of AI coding tools, DeepSeek’s value is not just that a domestic model can write code. Its real value is that it addresses the most practical pain point of agent tools: long tasks are too expensive.

Tools like Claude Code, OpenClaw, and Superpowers make the development process increasingly automated, but behind that automation are massive context reads and multi-round calls. Whoever can lower this part of the cost can make AI coding go from “fun once in a while” to “affordable every day”.

DeepSeek’s long context, low cache cost, and layered use of V4 Flash / V4 Pro put it in exactly that position.

The real cost-saving key in this cycle is not avoiding good models. It is combining good models, cheap models, cache, and agent workflows properly. Once you understand that bill, AI coding tools can become real productivity rather than a beautiful but expensive toy.

How to Choose AI Coding Plans: Convenience for Light Users, Flexibility for Heavy Users

Sun, 10 May 2026 08:20:58 +0800

AI coding plans have changed quickly over the past six months. Many tools have shifted from message-style pricing to usage-based pricing, generous low-cost tiers have become tighter, and some overseas services have added stricter identity checks, regional limits, and usage rules.

For developers, the question is no longer just which model is strongest. It is also about how much to spend every month, whether the quota is enough, whether the tool feels comfortable to use, and whether you can switch smoothly when a provider suddenly raises prices or changes the rules.

A practical conclusion is this: light users should buy convenience, mid-level users should buy value, and heavy users should buy flexibility. The heavier your usage, the less you should bind models and tools together in a single plan.

Four things to evaluate before choosing a plan

In the past, people usually looked at three things when choosing an AI coding plan:

Whether the model was strong enough.
Whether the response speed was stable.
Whether the usage quota was sufficient.

Now there is a fourth factor: whether the model and the tool can be separated.

The model provides reasoning ability, while the tool provides context management, file editing, agent orchestration, and workflow experience. Both matter, but they are better not fully tied together. For example, if you like Claude models, you can buy an official plan or connect the API to another tool. If you like a certain editor or agent environment, it is better if it can connect to different models instead of only its own.

The value here is not complexity for its own sake. It is risk reduction. AI coding is one of the fastest-changing segments in the industry. A plan that feels generous today may switch pricing in two months, and a tool that feels good today may become worse after the next model integration change. Separating models from tools gives you room to move.

Overseas plans are getting tighter

Tools such as GitHub Copilot, Cursor, Windsurf, and Claude Code are still the primary choices for many users, but the trend is clear: cheap plans with unusually high quotas are becoming harder to sustain, and usage-based billing is becoming more common.

Once services like GitHub Copilot lean more heavily on usage-based billing, the room for plan-based arbitrage becomes much smaller. For light users, these products are still convenient. But for people who frequently use agents, long context, and complex code tasks, actual consumption starts to look much closer to real API cost.

Cursor and Windsurf essentially package model capability into an IDE experience. Their strength is convenience and a mature editor workflow. Their weakness is tighter tool lock-in. Once you become dependent on their proprietary agents, indexing, and automation flow, migration costs can rise quickly.

Claude Code remains attractive in terms of experience and ecosystem attention, but overseas subscriptions, identity verification, regional restrictions, and the safety of relay services are all risks that users in China have to factor in. Third-party relay services may mix models, be unstable, expose user data, or even disappear entirely, which makes them hard to treat as long-term infrastructure for important work.

The strengths and limits of domestic plans

One advantage of domestic AI coding plans is that many of them are offered through APIs, which means they are less tightly bound to a specific tool. You can connect them to OpenCode, Cline, Continue, your own scripts, or internal agents.

The weakness is also clear: if you want model strength, high speed, and generous quota all at once, very few plans can deliver everything together.

GLM models are strong within the domestic model landscape, but throughput during peak hours may not be stable, which can make heavy tasks feel slow. Kimi is capable, but pricing and quota rules still need ongoing attention, especially whether backend quota is transparent. Models like MiniMax are friendlier in speed and allowance, which makes them suitable for light day-to-day tasks, batch jobs, and simpler coding help, though they may sit a tier lower on harder engineering reasoning. DeepSeek can be highly cost-effective when a new model is still in its promotional pricing period, but once that ends, you have to evaluate it again under normal pricing.

That is why domestic options are often better used as a model pool: different tasks use different models, instead of betting everything on one model and one plan.

Light users: choose what feels convenient and do not overbuild

If you only ask AI to tweak scripts, patch documentation, explain errors, or generate small tools once or twice a week, you probably do not need a complicated setup.

For this kind of user, convenience matters most. Cursor, Windsurf, Trae, CodeBuddy, Tongyi Lingma, GitHub Copilot, and similar tools are all worth trying. The goal is not the absolute lowest unit cost. The goal is low friction: something stable inside your editor, decent completions, and easy recovery when it makes a mistake.

Light users usually should not spend too much time building multi-layer API setups, relays, and proxy chains just to save a little money. The time cost, account risk, and debugging overhead are often more expensive than the subscription fee you save.

Mid-level users: focus on value, but also on portability

If you use AI every day for coding, project edits, test generation, and document work, then quota and actual consumption start to matter much more.

For this kind of user, it makes sense to separate the main tool from backup models. For example, one convenient IDE plan can handle daily editing, while a multi-tool API or aggregator plan can be used for longer-context and more complex agent tasks.

Three things matter most at this stage:

Whether it supports third-party tool integration.
Whether token or quota consumption is visible and understandable.
Whether overage means throttling, downgrade, shutdown, or pure usage-based billing.

If a plan looks cheap but can only be used inside its own tool, you need to count migration cost as part of the real price. If a plan costs more but can plug into multiple tools, it may be the better long-term choice.

Heavy users: do not lock models and tools together

For heavy users, flexibility is the core requirement.

When a person or team uses AI agents intensively every day, consumption grows very quickly. Repository search, long-context edits, multi-round debugging, and automated test repair can all multiply token use. Once you rely on a single plan, three problems show up easily:

The quota suddenly becomes too small.
The pricing rule suddenly changes.
A tool or model becomes temporarily unavailable.

A more stable approach is to prepare a layered setup: one primary agent tool, one or more replaceable model endpoints, one low-cost model for simple work, and one high-capability model for harder tasks. Small routine work should not always go to the most expensive model, and critical work should not rely only on the cheapest model either.

For heavy users, the ability for tools to connect to any model and for models to connect to any tool matters more than saving a few dozen dollars per month. The real expense is not the subscription itself. It is the cost of being locked into one ecosystem and having to rebuild your workflow later.

A more stable combination strategy

A relatively steady way to structure your setup looks like this:

Use a low-cost model for light tasks such as code explanations, small scripts, formatting, and simple documents.
Use a value-oriented model for mid-level tasks such as standard feature work, test completion, and refactor suggestions.
Use a stronger model for difficult tasks such as architecture changes, cross-file fixes, hard bugs, and long-context reasoning.
Keep the tool layer open by choosing tools that can connect to APIs, export configuration, and switch models.
Maintain a backup path so that when a main plan changes rules, you can switch quickly to another model or tool.

This may not be the absolute cheapest setup, but it is much more resilient. AI coding prices and quotas will keep changing. The thing worth investing in for the long term is a portable workflow, not a short-term deal that only looks unusually generous for a while.

Summary

AI coding plans should not be judged by monthly price alone. Light users should keep things simple and choose a convenient tool. Mid-level users should start paying attention to quota, consumption, and portability. Heavy users should decouple models from tools and avoid being trapped in one ecosystem.

The most useful thing to remember is that plans will change, models will change, and tools will change too. Keeping the choice in your own hands is the most important form of cost control in long-term AI coding work.

Claude Code Limits Doubled: Anthropic Uses SpaceX Compute Expansion to Ease Usage Constraints

Sat, 09 May 2026 10:59:48 +0800

On May 6, 2026, Anthropic announced higher usage limits for Claude Code and the Claude API, along with a new compute partnership with SpaceX. For everyday users, the most direct change is more usable capacity for Claude Code. For developers and enterprises, the larger point is that Claude’s inference capacity is still expanding.

The announcement has two parts:

Higher limits for Claude Code and the Claude API.
New compute capacity from SpaceX data centers.

What changed for Claude Code limits

Anthropic says the following three changes took effect on the day of the announcement:

Claude Code’s five-hour rate limit doubled for Pro, Max, Team, and seat-based Enterprise plans.
Peak-hour limit reductions for Pro and Max Claude Code accounts were removed.
Claude Opus API rate limits were significantly increased.

In practical terms, if you often use Claude Code for long coding sessions, repository analysis, refactoring, debugging, or agent workflows, this change may reduce the number of times a task stops before it is finished.

That does not mean unlimited usage. Claude Code is still affected by subscription plan, usage pattern, model, task length, context size, and platform policy. But Anthropic has clearly expanded the usable room compared with the previous limits.

Why compute affects the Claude Code experience

Tools like Claude Code consume more resources than ordinary chat. A single coding task can involve:

Reading many files.
Long-context analysis.
Multiple tool calls.
Generating, editing, and checking code.
Repeatedly running tests or explaining errors.
Using Opus for difficult reasoning.

Behind those actions are not only tokens, but also inference capacity, concurrency, and scheduling resources. Users see limits, queues, or slower peak-hour behavior; the platform sees pressure between compute supply and demand.

So Anthropic putting limit increases and a compute partnership in the same announcement is meaningful. It is saying that improving Claude Code is not just a plan-setting change, but also depends on more backend inference capacity.

What the SpaceX partnership adds

Anthropic says it has signed an agreement with SpaceX to use the full compute capacity of SpaceX’s Colossus 1 data center. The announced capacity is over 300 megawatts, corresponding to more than 220,000 NVIDIA GPUs, and will be made available to Anthropic within a month.

This added capacity is expected to directly improve available capacity for Claude Pro and Claude Max subscribers.

Anthropic also says it is interested in future work with SpaceX on orbital AI compute. That is more of a long-term direction, not the same thing as the Claude Code limit increase users can feel immediately.

Anthropic’s compute footprint is getting larger

SpaceX is only one part of Anthropic’s recent compute expansion. The company also lists other partnerships:

Up to 5GW with Amazon, including nearly 1GW of new capacity planned to come online by the end of 2026.
5GW with Google and Broadcom, expected to come online starting in 2027.
A strategic partnership with Microsoft and NVIDIA, including $30 billion of Azure capacity.
A $50 billion U.S. AI infrastructure investment with Fluidstack.

Anthropic also notes that Claude training and inference will use multiple types of AI hardware, including AWS Trainium, Google TPUs, and NVIDIA GPUs.

The trend is clear: competition among leading model companies is not only about model names, benchmarks, and product features. It is also about power, data centers, GPUs, TPUs, networking, and global deployment capacity.

Practical impact for Claude Code users

For developers, the most important change is the doubled five-hour Claude Code limit. It affects scenarios such as:

Reading large repositories.
Multi-file refactoring.
Bug investigation and test fixing.
Code migration and dependency upgrades.
Long-running agentic coding tasks.
Multiple people using Claude Code in Team or Enterprise plans.

A common Claude Code problem has been reaching the limit while a task is still in progress. Higher limits make it easier for an agent to complete a full task instead of stopping halfway.

For Pro and Max users, removing peak-hour limit reductions is also important. It means the experience may become more stable during busy periods, with less disruption from temporary tightening.

What it means for API users

The announcement also says Claude Opus API rate limits have increased significantly. For teams using Opus for difficult tasks, that usually means:

Higher concurrency.
Fewer 429 rate-limit errors.
Easier support for batch workloads.
Better fit for long-context, complex reasoning, and agent workflows.

Actual limits still vary by account, organization, model, and plan. Before production deployment, teams should still check their Anthropic Console, rate limit documentation, and error logs.

Enterprise and regional deployment matter more

Anthropic also notes that regulated industries such as finance, healthcare, and government increasingly need regional infrastructure to satisfy compliance and data residency requirements. Part of its capacity expansion will therefore be outside the United States, especially for inference capacity in Asia and Europe.

This matters for enterprise customers. Once large model applications enter core business workflows, the questions are not only whether the model is good enough. They also include:

Whether data stays in the required region.
Whether industry compliance requirements are met.
Whether peak-hour capacity is stable.
Whether team-level and organization-level concurrency are supported.
Whether audit, permission, and security controls are available.

From that perspective, compute expansion is not just performance news. It can shape enterprise procurement and deployment decisions.

Summary

Anthropic’s message is direct: Claude Code and Claude API usage constraints are being relaxed because new compute capacity is coming online.

For everyday Claude Code users, the most important points are the doubled five-hour limit and the removal of peak-hour reductions for Pro and Max. For API and enterprise users, the main points are higher Opus rate limits and Anthropic’s longer-term compute partnerships with SpaceX, Amazon, Google, Microsoft, NVIDIA, and Fluidstack.

AI tools are increasingly infrastructure services. Model quality matters, but stable capacity, regional compliance, limit policy, and cost control also shape the user experience.

Reference:

Anthropic: Higher usage limits for Claude and a compute deal with SpaceX

What to Do if Your Claude Account Is Suspended: Claude Code Limits and Appeal Guide

Sat, 09 May 2026 10:32:12 +0800

When a Claude or Claude Code account is suddenly limited, suspended right after payment, loses Pro access, or shows lower-than-expected usage capacity, many users naturally look for quick explanations. The important point is that this should not be treated as a simple “change IP” or “create another account” technical problem. Account risk systems usually combine signals such as region, payment, device, login behavior, usage content, automation, and sharing patterns.

A safer way to handle the issue is to first identify what kind of problem you actually have: normal quota limit, payment or subscription mismatch, Claude Code authorization issue, or an account-level action because Anthropic believes usage violated its policies or terms.

First, distinguish three situations

The first category is normal usage limits. Claude Pro, Max, Team, API, and Claude Code have different quota models. Peak-hour use, long context, coding tasks, and agent workflows may consume limits faster. Seeing “limit reached” does not necessarily mean your account is banned.

The second category is subscription or authorization trouble. For example, payment may have succeeded but access has not refreshed, a mobile subscription may not match the web account, Claude Code may not be logged in correctly, or an old ANTHROPIC_API_KEY may remain in your environment. Start by checking billing, login state, and client configuration.

The third category is account suspension or termination. Typical signs include emails mentioning suspension, disabled account, or termination, or a login page that says the account is unavailable. In this case, do not repeatedly switch devices, networks, and accounts to try again. That may make the risk signals more complicated.

Common triggers

Anthropic’s help and privacy documentation mention common risk areas such as violations of the Usage Policy, account creation or use from unsupported regions, terms violations, repeated violations, unusual access, and abuse.

In practice, risky patterns include:

Account registration, login region, and payment region do not match.
Long-term use of datacenter proxies, shared proxies, or frequent IP switching.
Multiple people sharing one personal account.
Frequent logins from many devices or regions in a short time.
Automated high-frequency access to Claude.ai.
Treating Claude Code as a shared service or resale entry point.
Requesting content that clearly violates Anthropic’s policies.
Conflicts among payment method, billing address, and account region.

The key is not that any single signal always causes suspension. The risk increases when multiple abnormal signals appear together.

Do not solve it by evading risk controls

Online advice often suggests “stable usage solutions” such as fingerprint browsers, device fingerprint reset, deleting local folders, changing environments, aligning time zone and language, or registering with a new email. Some of this is ordinary troubleshooting, but some is clearly aimed at evading platform risk controls.

Do not treat “bypassing risk control” as the solution. Reasons are simple:

It may violate the terms of service.
It may add more account risk signals.
It does not solve root causes such as payment, region, or policy violations.
For team or business use, it makes later appeals harder to explain.

If your goal is long-term stable use of Claude, the right direction is not disguise. It is making account information, region, payment, device, and usage real, consistent, and explainable.

Troubleshooting Claude Code limits

Claude Code users can start with:

1
2

claude --version
claude auth status

If you use an API key, confirm that the environment variable points to the right account:

`1`	`echo $ANTHROPIC_API_KEY`

In Windows PowerShell:

`1`	`echo $env:ANTHROPIC_API_KEY`

If you have used web login, OAuth, API keys, third-party clients, or different terminals, standardize the authentication method first. One tool may still be using old credentials.

Also distinguish two cases:

Claude Code reached its usage limit: usually a quota or subscription issue.
The account or organization is disabled: usually an account, organization, payment, or policy risk issue.

For the first, wait for quota refresh or adjust the plan. For the second, keep screenshots and emails, then use official support or appeal channels.

Compliant stability tips

To reduce the chance of account problems, start with the basics:

Use a normal account in a supported country or region.
Keep login region, payment method, and billing information consistent when possible.
Avoid sharing a personal account among multiple people.
Do not use a personal Pro/Max account as a team API pool.
Avoid frequent changes of IP, device, and browser environment.
Do not use unknown third-party Claude clients.
Avoid high-frequency automation against Claude.ai’s web interface.
For business or team use, prefer Team, Enterprise, or API plans.
Read Anthropic’s Usage Policy and avoid restricted use cases.

If you genuinely need to use Claude on multiple devices, log in normally. Do not keep clearing environments, changing fingerprints, or switching proxies. Excessive environment manipulation can itself look abnormal.

What to do after suspension

If the account is already suspended, handle it in this order:

Check emails from Anthropic or Claude and confirm the stated reason or message type.
Stop creating new accounts, changing networks, and retrying from more devices.
Collect account email, subscription order, payment proof, and recent usage context.
If you believe it is a mistake, submit an appeal or contact support through official channels.
Explain the real usage scenario. Do not invent region, identity, or purpose.
If payment is involved, ask separately about refund or subscription handling.

When appealing, be specific. Mention whether you used Claude Code, switched devices, used a VPN, shared with a team, or connected third-party tools. The platform needs to identify the source of risk. A vague “I did nothing” usually does not help much.

Claims to treat carefully

Some posts or videos claim that “fixed fingerprints prevent bans”, “one browser prevents suspension completely”, “deleting one directory resets device identity”, or “matching IP, time zone, and language solves everything”. Do not accept these claims uncritically.

Platform risk systems are usually multidimensional. They do not only look at browser fingerprint or IP. Account history, payment information, region policy, content, access frequency, automation patterns, client version, and API calling behavior may all matter. Single-signal disguise is not long-term stability and may create more inconsistencies.

More importantly, many so-called anti-ban solutions are actually selling tools or services. What users really need is to identify the risk source, use the service compliantly, and preserve appeal evidence, not rely on third-party environment wrappers for account safety.

Summary

Claude account suspension or Claude Code limitation is not always caused by one thing. It may be quota, subscription, authorization, or a combined risk signal involving region, payment, device, sharing, automation, or policy-sensitive content.

The key to long-term stable use of Claude is not bypassing risk controls. It is compliant usage, consistent account information, stable access patterns, and formal plans for team use. If an account is suspended, stop manipulating the environment, preserve evidence, and use official appeal and support channels.

References:

From PPT to Prototypes: Use Cases for Guizang PPT Skill and Huashu Design

Sat, 09 May 2026 08:34:23 +0800

Two design-oriented Agent Skills made by Chinese developers are worth looking at side by side: guizang-ppt-skill by Guizang, and huashu-design by Huashu.

They are not “design tools” in the traditional sense. Instead, they turn a design process, aesthetic preferences, checklists, and engineering templates into Skills that an Agent can execute. You are not opening a UI and slowly dragging elements around. You hand the requirement to an Agent such as Claude Code, Codex, or Cursor, and let it generate HTML, PPT, animation, or prototypes through a fixed workflow.

The value of these projects is not that they let AI improvise. It is that they turn “how to make this not look bad” into a repeatable process.

guizang-ppt-skill: focused on magazine-style web PPT

Guizang’s guizang-ppt-skill has a clear positioning: it generates single-file HTML, horizontally paged PPTs with a visual baseline of “digital magazine x e-ink.” It feels more like a layout system prepared for talks than a general-purpose design framework.

The repository README lists these core capabilities:

Single-file HTML output, with no build step or server required. Open it directly in a browser.
Horizontal page navigation, with support for keyboard, mouse wheel, touch swipes, bottom dots, and an ESC index.
5 preset theme palettes, including Ink Classic, Indigo Porcelain, Forest Ink, Kraft Paper, and Dune.
10 page layouts, including opening cover, section divider, big-number data poster, text-left-image-right, image grid, Pipeline, suspense question, large quote, Before/After comparison, and mixed text-image layout.
Built-in templates, component notes, layout skeletons, theme configuration, and quality checklists.

It is suitable for offline sharing, internal industry talks, private salons, AI product launches, demo days, and presentation decks with a strong personal style. It is less suited to large tables, training courseware, or multi-person collaborative editing.

This project makes a good tradeoff: it does not try to cover every design scenario, but narrows itself to “magazine-style PPT.” Theme colors are chosen from presets, and layouts have clear skeletons. That actually reduces the chance of the Agent drifting off course.

If you often need to turn opinions, industry observations, or product launch content into a presentation deck, it can be highly practical.

The install command is straightforward:

`1`	`npx skills add https://github.com/op7418/guizang-ppt-skill --skill guizang-ppt-skill`

huashu-design: a fuller HTML-native design workflow

Huashu’s huashu-design has broader coverage. Its goal is not just to make PPTs, but to treat HTML as a native design canvas and let an Agent produce deliverable design assets.

The repository README lists these capabilities:

Clickable App or Web prototypes.
HTML slides, plus editable PPTX export.
Product launch animations, MP4, GIF, and versions with music.
Multiple design directions shown side by side for comparison.
Infographics, data visualizations, and PDF, PNG, SVG export.
5-dimensional expert review, covering philosophical consistency, visual hierarchy, execution craft, functionality, and innovation.

Its core idea is to let the Agent understand the brand and assets first, then produce high-fidelity design. The project emphasizes a Core Asset Protocol: when dealing with a specific brand, first confirm the logo, product images, UI screenshots, color palette, fonts, and brand guidelines instead of guessing from memory.

This matters. Many AI-generated designs look “like design,” but they do not look like a real product or brand. huashu-design tries to solve that problem up front: find real assets first, then design.

The install command is:

`1`	`npx skills add alchaincyf/huashu-design`

It is better suited to people who want to complete a fuller design delivery from the terminal: product prototypes, launch animations, presentations, infographics, and design reviews can all be handled inside one Agent workflow.

The biggest difference between the two

In simple terms, guizang-ppt-skill is a narrower and steadier presentation deck generator; huashu-design is a broader and more complete HTML-native design system.

If you only look at PPT:

guizang-ppt-skill emphasizes magazine feel, rhythm, layout, and single-file browser presentations.
huashu-design emphasizes general design capability, editable PPTX, brand assets, export paths, and review workflows.

If you look at overall design capability:

guizang-ppt-skill has clearer boundaries and is suitable for quickly making a stylish horizontal presentation.
huashu-design is more comprehensive and is suitable for breaking a product or brand design task into prototypes, animations, slides, and infographics.

These two projects also represent two different ways to write Skills. The former is like a highly constrained set of templates and aesthetic rules. The latter is like a workflow manual for a small design team.

Why this kind of Skill matters

A common problem with Agents is that they “can do it, but not consistently.” The same request may produce a strong result once, then drift into purple gradients, rounded cards, fake icons, and a pile of fancy-sounding empty copy the next time.

Skills are a way to add stability. They lock down things such as:

Reusable templates.
Executable checklists.
Clear aesthetic preferences.
Rules for avoiding common mistakes.
Output formats and validation flows.
When to ask questions and when to start directly.

This is far more reliable than simply writing “please make it look more premium.”

This is especially true for design tasks. Aesthetics cannot be reproduced reliably by a single prompt. What really helps is process: confirm assets first, decide the direction, build the structure, work on the visuals, then inspect the output. When this process is written as a Skill, the Agent becomes more like a collaborative executor rather than a one-shot image generator.

Usage recommendations

If you just want to turn a topic into an offline talk or sharing deck, try guizang-ppt-skill first. Its output boundary is narrow, and single-file HTML is also easy to distribute and preview.

If you want an Agent to take on a more complete design task, such as App prototypes, launch animations, branded slides, exportable PPTX, or infographics, look at huashu-design first. Its workflow is longer and better suited to tasks that need multiple rounds of iteration and exported deliverables.

If you are already writing your own Codex or Claude Code Skill, both projects are worth studying:

To learn “how to make a narrow scenario stable,” look at guizang-ppt-skill.
To learn “how to break a complex workflow into executable protocols,” look at huashu-design.

Summary

What Guizang and Huashu have in common is that both turn “design capability” from a one-time prompt into a repeatable process.

guizang-ppt-skill focuses on magazine-style HTML PPT and works well for highly stylized presentations. huashu-design focuses on an HTML-native design system covering prototypes, animations, slides, infographics, and reviews. The problem they solve is not “can AI generate design,” but “can AI generate deliverable design through a stable method.”

This may become an important type of open-source project in the Agent tooling ecosystem: not just code templates, but packaged human experience, aesthetics, and working methods as Skills.

Reference links:

Codex vs Claude Code: How to Choose Between Two Subagent Designs

Fri, 08 May 2026 14:14:01 +0800

AI coding tools are paying more attention to subagents. This is not just feature chasing. A single agent eventually hits limits when it has to handle real engineering work.

If one agent reads code, checks logs, edits implementation, runs tests, analyzes failures, and summarizes results at the same time, the main context quickly becomes noisy. Search results, command output, test logs, and intermediate reasoning get mixed together. Later decisions become less reliable. Work also becomes hard to parallelize: exploration, implementation, verification, and review all sit on one main thread.

The purpose of subagents is to reduce that pressure. The main session stops doing everything from start to finish and becomes more like a coordinator: define goals, assign work, receive results, and merge them into the final answer. A subagent handles a local piece of work, such as exploration, implementation, verification, or review, and returns a compressed conclusion.

So a subagent is not “another copy of me.” It is a way to split tangled engineering work into clearer roles.

Shared Foundations

A mature subagent system usually needs four foundations:

Context isolation.
Role specialization.
Project and user-level configuration.
Tool and permission boundaries.

Context isolation comes first. Real repositories produce a lot of intermediate material: dozens of search hits, hundreds of test log lines, noisy command output. If all of that is poured into the main session, the main thread gets confused. A subagent can digest that local process and bring back only the signals that matter.

Role specialization is just as important. Multi-agent does not mean opening several identical models. Exploration roles should search, read, and summarize. Implementation roles should focus on local code changes. Verification roles should run checks, identify risks, and report clearly.

Tool and permission boundaries determine whether the system can be used safely. A subagent should not automatically inherit every capability of the main session. A read-only explorer does not need write access. A verifier may not need to change implementation. Background tasks and isolated worktrees need visible boundaries.

Codex and Claude Code share these concerns, but they take different routes.

Codex: Explicit Delegation

Codex’s subagent design feels restrained.

It gives you a controlled, lightweight delegation mechanism around the current main session. When to delegate, who receives the task, and when results are collected are all explicit decisions. The control flow stays in the current task.

Its traits are clear:

The main session explicitly delegates subwork.
The role set stays small.
The main session knows which agent is doing what.
Results return to the main line for final judgment.
Collaboration boundaries are transparent.

This works well for teams that care about manual orchestration, predictability, and execution determinism. You can ask an explorer to inspect a call chain, ask a worker to make a bounded change, then let the main session merge the result and decide whether to test further.

The tradeoff is that orchestration pressure still sits with the main session. The main thread must decide when to split work, how to split it, who should take it, and how to merge the result. For lightweight collaboration this is pleasant; for long-running engineering workflows it can become tiring.

Claude Code: Agents as Workstations

Claude Code takes a more platform-like route.

It treats agents as describable, selectable, configurable, memorable, isolated, and background-capable objects. A subagent is not just a helper in a conversation. It is closer to a workstation in an engineering system.

The system can expose agent lists, use cases, descriptions, and tool boundaries to the model, allowing the model to decide which role should handle a turn. That makes delegation more automatic.

Several capabilities define this direction.

First, a role system. Explorer, planner, general-purpose, and verifier roles can carry usage descriptions, tool restrictions, default models, and runtime conditions. A read-only explorer can be prevented from editing files. A planner can focus on architecture. A verifier can focus on checks.

Second, inheritance and overrides. A subagent is not completely free. It inherits the larger boundary of the main session by default, but can adjust local behavior within allowed rules. The main session defines the big boundary; the agent performs local assembly inside it.

Third, memory. Memory is not just “remember a few things.” It can have scope. User memory is like long-term preference. Project memory is repository background. Local memory is environment-specific state. This lets some agents avoid relearning the project from scratch.

Fourth, background work and worktree isolation. Some verification tasks can keep running in the background, while the main thread continues. When stronger isolation is needed, an agent can work in a separate worktree, keeping the project connected but the operation space separated.

Fifth, plugin ecosystem. If agents are first-class objects, you have to think about distribution, installation, priority, override rules, and safety. Plugin agents can enter the system, but high-risk fields such as permission mode, hooks, and MCP servers should remain guarded.

This makes Claude Code feel more like an agent runtime than a one-session collaboration tool.

The Difference

Codex is closer to a controlled delegation tool:

Explicit delegation.
Lightweight role set.
Clean control flow.
Subtasks centered on the current session.
Good for deterministic, human-orchestrated work.

Claude Code is closer to an engineering workstation system:

Agents are formally modeled.
Roles are more systematic.
Memory, background execution, isolation, and plugins are part of the runtime.
The model can help choose roles.
Good for long-term projects and platform-like workflows.

The real question is not which one has more features. It is whether you want a subagent to be “a helper I explicitly call” or “a long-lived workstation in the system.”

How to Choose

Choose the Codex style if you value explicit control, lightweight delegation, and safe parallelism inside the current session. It is good for code review, small changes, clearly scoped implementation tasks, and workflows where a human wants to keep the rhythm.

Choose the Claude Code style if you want systematic roles, long-term memory, background execution, worktree isolation, plugin extension, and a more complete agent runtime.

Ask two questions:

Are you comfortable with the model choosing who should do the work?
Do you need a fuller agent runtime?

If the first question makes you uncomfortable, explicit delegation is likely better. If the second answer is yes, a platform-like workstation system may fit better.

Practical Advice

Do not treat subagents as “more models means stronger.” Better practice is:

Give every role a clear task boundary.
Limit the tools each role can use.
Ask subagents to return conclusions, not raw logs.
Keep final decisions in the main session.
Make background tasks and worktree isolation visible.
Set clear safety boundaries for plugin agents.

The value of subagents is not quantity. It is clean division of labor, cleaner context, and more stable main-thread decisions.

Summary

Codex and Claude Code solve the same problem: one agent cannot comfortably carry all real engineering work. Both recognize the importance of context isolation, role specialization, permissions, and local summarization.

Codex is more restrained, emphasizing explicit delegation and main-session control. Claude Code is more systematic, treating agents as configurable, memorable, isolated, background-capable workstations that can also enter a plugin ecosystem.

The choice is not which brand wins. It is whether your workflow needs a controlled collaboration tool or a full agent runtime.

9Router: Connect Claude Code, Codex, and Cursor to One AI Router

Fri, 08 May 2026 13:41:15 +0800

9Router is a local router for AI coding tools. It lets Claude Code, Codex, Cursor, Cline, Copilot, OpenCode, OpenClaw, and similar tools connect to one OpenAI-compatible endpoint, then routes requests to different models and providers.

It is not trying to be another chat client. It sits between your AI coding tools and model providers, solving a few practical problems: incompatible API formats, manual provider switching, fast token burn from tool output, interrupted work when quotas run out, and messy multi-account configuration.

According to the project README, 9Router supports 40+ providers and 100+ models. It includes RTK Token Saver, automatic fallback, quota tracking, multi-account rotation, format translation, and request logging. The project is written in JavaScript and uses Node.js, Next.js, React, Tailwind CSS, and LowDB. It is licensed under MIT.

What It Is Good For

9Router is most useful when you use multiple AI coding tools and multiple model sources at the same time.

Examples:

Claude Code uses a subscription account.
Codex or Cursor needs a custom OpenAI endpoint.
Cline, Continue, or RooCode needs an OpenAI-compatible API.
Free providers are used for experiments.
GLM, MiniMax, or Kimi is used as a cheaper backup.
High-quality models are reserved for difficult tasks.

Without 9Router, these settings are scattered across many tools. Each tool needs its own endpoint, API key, model name, and fallback plan. 9Router centralizes that into one local routing layer.

Default local API:

`1`	`http://localhost:20128/v1`

Dashboard:

`1`	`http://localhost:20128/dashboard`

Quick Install

For local use, npm is the simplest path:

1
2

npm install -g 9router
9router

The dashboard opens locally, and the README uses 20128 as the default port.

Run from source:

git clone https://github.com/decolua/9router.git
cd 9router
cp .env.example .env
npm install
PORT=20128 NEXT_PUBLIC_BASE_URL=http://localhost:20128 npm run dev

Production mode:

1
2

npm run build
PORT=20128 HOSTNAME=0.0.0.0 NEXT_PUBLIC_BASE_URL=http://localhost:20128 npm run start

The npm package requires Node.js >=18.0.0. For VPS or Docker deployment, configure JWT_SECRET, INITIAL_PASSWORD, DATA_DIR, and API_KEY_SECRET instead of exposing defaults.

Connect Coding Tools

9Router exposes an OpenAI-compatible API, so most tools that support custom OpenAI endpoints can connect to it.

Typical configuration:

1
2
3

Base URL: http://localhost:20128/v1
API Key: copied from the 9Router dashboard
Model: a model name or combo name configured in 9Router

For Codex CLI:

export OPENAI_BASE_URL="http://localhost:20128"
export OPENAI_API_KEY="your-9router-api-key"

codex "your prompt"

For Cline, Continue, or RooCode, choose OpenAI Compatible and set:

1
2
3

Base URL: http://localhost:20128/v1
API Key: your-9router-api-key
Model: cc/claude-opus-4-7

Model names depend on connected providers. The README shows prefixes such as cc/, cx/, gh/, glm/, minimax/, kr/, and vertex/.

RTK Token Saver

AI coding tools often burn tokens fastest on tool outputs:

git diff
git status
grep
find
ls
tree
logs
long file lists

9Router includes RTK Token Saver, which compresses these outputs before they are sent to the model. The project says this can save 20%-40% input tokens in many requests.

The value is that you do not need to change tools or models. The routing layer removes waste before the request reaches the provider. Still, for critical logs or complete file content, test the behavior first and make sure answer quality does not drop.

Automatic Fallback

9Router can arrange models in priority order:

1
2
3

1. Subscription model
2. Cheap API
3. Free provider

When the first tier is rate-limited, out of quota, or failing, it can switch to the next one. This reduces manual switching and keeps coding sessions from stopping suddenly.

Example:

1
2
3

1. cc/claude-opus-4-7
2. glm/glm-5.1
3. kr/claude-sonnet-4.5

Fallback changes output consistency. Different models have different style and reasoning quality. For large refactors, protocols, migrations, or other consistency-sensitive work, prefer a fixed model and switch manually only when needed.

Be Careful with Free Providers

The README highlights Kiro, OpenCode Free, Vertex, and also notes that some old free tiers have changed or are no longer recommended.

Always confirm provider policy at the time of use:

Is it really free?
Is it region-limited?
Is third-party tool access allowed?
Can it trigger bans or rate limits?
Does the free quota expire?

9Router manages routing, not upstream terms. Be especially careful when using personal subscriptions, OAuth tokens, or free quotas with automated tools.

Local Deployment Advice

For personal use, bind to localhost. Local tools can reach it, but the internet cannot.

For VPS or LAN deployment:

Change the default login password.
Set a strong JWT_SECRET.
Set API_KEY_SECRET.
Put authentication in front of the dashboard.
Do not expose the dashboard directly to the public internet.
Require Bearer API keys for /v1/*.
Back up DATA_DIR.

Docker example:

docker run -d \
  --name 9router \
  -p 20128:20128 \
  --env-file ./.env \
  -v 9router-data:/app/data \
  -v 9router-usage:/root/.9router \
  9router

Start locally first, verify providers, combos, logs, and model names, then decide whether server deployment is worth it.

Who Should Use It

9Router is a good fit if you use multiple AI coding tools, multiple providers, subscription plus free or cheap tiers, and want a central fallback policy. It is less useful if you only use one model and one tool.

Its real value is turning scattered model access into a configurable local routing layer.

Summary

9Router is a local gateway for AI coding tools. It lets Claude Code, Codex, Cursor, Cline, and similar tools talk to http://localhost:20128/v1, while it handles model selection, format translation, token compression, quota tracking, and fallback.

It is best for heavy AI coding users who already switch between providers. Start with one tool and one provider, then add accounts and combos gradually.

References

24 Claude Code Tips: Plan Mode, Rewind, CLAUDE.md, Skills, Agents, and Plugins

Fri, 08 May 2026 08:54:14 +0800

Claude Code is not just a chat box. It is closer to a coding Agent that can enter a project directory, read and write files, run commands, and maintain context.

If you only throw a requirement at it and wait for code, problems appear quickly: unclear plans, repeated permission prompts, growing context, unsatisfactory output, no clear rollback path, and no persistent place for project rules.

Here is a set of common operations for developers getting started with Claude Code.

Start Inside the Project Directory

Claude Code works best when launched inside the project directory, not from a random terminal location.

Create a folder as the project directory, enter it, open a command line, and start Claude Code:

claude

When first entering a project, if Claude Code asks whether to trust the current folder, confirm before continuing. This lets it read files, create files, and run later operations around the current project.

A simple practice task is to ask it to create a photographer portfolio website. The task is visual enough to inspect, and it also lets you practice file generation, command execution, rewind, and later refactoring.

Use Plan Mode First

For more complex tasks, Claude Code may enter plan mode. Plan mode is meant to discuss requirements and break down steps before you approve execution.

After it writes a plan, you usually see options like:

Approve the plan and automatically allow future edit tools.
Approve the plan, but require manual approval for later edits.
Pause and continue discussing the plan with Claude Code.

If the task is clear, approve and continue. If it is not clear yet, ask it to refine the plan, such as page style, tech stack, directory structure, interactions, and acceptance criteria.

Plan mode reduces rework. If an Agent starts directly, it may quickly generate many files; if the direction is wrong, later changes can get messy.

Switch Modes With Shift + Tab

In Claude Code, Shift + Tab can switch between working modes. A common use is entering plan mode or switching into an auto-approve-edit mode.

Suggested habits:

New projects, new features, major changes: start in plan mode.
Small edits and clear fixes: execute directly.
Deletion, bulk replacement, dependency installation: keep manual approval.

In plan mode, Claude Code may ask project-detail questions. Use arrow keys to choose and Enter to confirm. After submitting feedback, it updates the plan.

Do Not Open All Permissions Blindly

When Claude Code runs commands, edits files, or starts programs, it may request permission.

Common choices include:

Allow only this time.
Allow this command type for the current session.
Reject or pause.

For local preview, dev server startup, or file inspection, approve as needed. But do not permanently use a mode that auto-approves all permissions just to save clicks.

Full automation is only suitable when the task is low-risk, clearly understood, and the project already has Git backups. For daily use, keep human approval for deletion, overwriting folders, dependency installation, networking, commits, and scripts.

Run Local Commands in Terminal Mode

Claude Code can enter a terminal-command mode to run local commands.

For example, after generating a page, you can open an HTML file with:

`1`	`start index.html`

start is a Windows command for opening a file, followed by the filename. This is faster than finding the file manually.

Terminal mode is useful for:

Opening generated pages.
Listing directory contents.
Starting local development servers.
Running tests or builds.

Still, be careful with high-risk commands such as recursive deletion, moving directories, bulk overwrites, and system environment changes.

Rewind When the Result Goes Wrong

If the page or code produced by Claude Code is not what you want, and each correction makes it worse, rewind early.

Rewind can return code or conversation to a previous point. Common options include:

Rewind both code and conversation.
Rewind only conversation.
Rewind only code.
Compress earlier content into a summary.
Cancel.

When the direction is clearly wrong, it is usually better to rewind both code and conversation. That returns context and files to a cleaner state together.

Note that Claude Code rewind usually only covers files it created or changed through built-in tools. Files created through external commands may not be fully rewindable. Important projects should still use Git.

Write Long Prompts in an Editor

Do not squeeze complex requirements into one input line.

If the system supports editing a long prompt in a text editor, open the editor, write the requirement clearly, save it, and then send it to Claude Code.

Long prompts should include:

The goal.
The tech stack.
What not to do.
Which files must be kept.
How to verify completion.
Page or feature acceptance criteria.

For example, if you want Claude Code to refactor a plain HTML page into a more modern stack, do not just say “refactor it.” Explain component structure, visual preservation, responsive layout, and ask it to run a build check.

Restore Sessions After Exit

If you need to quit Claude Code midway, exit normally. Later, return to the same project directory and start again:

claude

If previous records do not appear directly, use history-related commands to view and load recent sessions.

This is useful for continuing interrupted work. But do not treat session history as the only memory. Project rules, tech stack, common commands, and notes should live in project files.

Use CLAUDE.md for Project Rules

CLAUDE.md is an important memory file for Claude Code. It usually sits at the project root and tells Claude Code project rules, tech stack, directory structure, and collaboration constraints.

You can ask Claude Code to initialize it:

/init

CLAUDE.md is good for:

Project goals.
Tech stack.
Common start, test, and build commands.
Directory notes.
Code style.
Forbidden actions.
Commit and deployment rules.

During each conversation, Claude Code can use these rules as part of the context. Think of it as a project manual.

A simple test is to add a clear rule into CLAUDE.md, then ask Claude Code something. If its answer follows the rule, it has read the project memory.

Reference Files With @

Typing @ in the input box lets you select files or Agents and add them to the current context.

This is useful when you want Claude Code to:

Read a config file.
Modify a specific page.
Continue based on CLAUDE.md or another document.
Only inspect a specific file instead of guessing the whole project.

Compared with copying file contents into the input box, @ references are clearer and less error-prone.

View and Compress Context

After a long conversation, context grows. When it gets too long, the model may slow down or start ignoring earlier details.

Use:

`1`	`/context`

If context is long, compress history:

`1`	`/compact`

If the result is still poor, consider clearing the current context:

/clear

After clearing, Claude Code can still understand part of the project through files, CLAUDE.md, and the current directory, but it will not keep the full conversation history.

A practical habit: start a new chat after a task is done, write project rules into CLAUDE.md, and do not let temporary discussion grow forever in one chat.

Skills: Turn Repeated Work Into Instructions

Skills are reusable task instructions for Claude Code. They are not one-off prompts, but packaged workflows.

For example, if you often generate weekly reports, create a weekly-report Skill that defines:

Required input.
Output format.
Tone and structure.
What must be preserved.
What must not be invented.

Skills usually contain name, description, and detailed instructions. Once installed in the global Skills directory, Claude Code can recognize and load them for related tasks.

Good Skill candidates include:

Weekly reports.
Code review templates.
Document cleanup.
Image batch processing.
Fixed-format articles.
Project initialization flows.

If you repeatedly copy the same prompt, consider turning it into a Skill.

Agents: Delegate Subtasks to Independent Helpers

Agents are different from Skills.

A Skill is more like an instruction manual. An Agent is more like an independent helper that can work outside the main conversation and return results.

The value of Agents is context isolation. For code inspection, you can create a read-only Agent that only reads the project and outputs a report, without modifying files. This avoids polluting the main conversation and lowers risk.

When creating an Agent, consider:

Project-level or user-level Agent.
Whether Claude Code should generate the config.
Which tools are allowed.
Which model to use.
Whether memory should be saved.
Whether the Agent prompt is clear enough.

For code-audit Agents, give read-only permissions first. Let it output a report, then decide in the main conversation whether to change code.

Plugins: Package Skills, Agents, MCP, and Hooks

Plugins are more complete capability packages. They may include:

Skills
Agents
MCP
Hooks

Compared with installing one Skill, a plugin is better for a full capability set. For example, a frontend design plugin may package visual rules, layout habits, component preferences, and related Agents together.

When installing a plugin, you may choose:

Install to the user directory, effective for all projects.
Install to the project directory, shareable with the project.
Install to a local project directory, effective only on your computer.

Use the user directory for personal common capabilities, the project directory for team conventions, and local project install for temporary testing.

Plugins Can Improve Specific Tasks

For frontend page generation, plugins can be more stable than raw prompts.

For example, for “make a photographer portfolio website,” a plain prompt may generate an acceptable page. If you explicitly use a frontend design plugin, the structure, visual hierarchy, spacing, colors, and overall finish are often better.

This does not mean plugins replace human taste. A better workflow is to let the plugin generate a stronger first draft, then refine details manually.

A More Stable Claude Code Workflow

Putting these tips together gives a steadier workflow:

Start claude inside the project directory.
Discuss requirements in plan mode first.
Confirm tech stack and acceptance criteria before approving the plan.
Keep manual approval for high-risk actions.
Use terminal mode for local preview and tests.
Rewind early when the result goes off track.
Write project rules into CLAUDE.md.
Check and compress context during long chats.
Turn repeated workflows into Skills.
Delegate inspection, research, and analysis to read-only Agents.
Use plugins for domain-specific tasks.
Always keep Git checkpoints for important projects.

This is much more stable than simply sending one requirement and waiting for generation.

Summary

Claude Code efficiency does not come only from model capability. It also comes from workflow control.

Plan mode sets direction, permission approval controls risk, rewind reduces rework, CLAUDE.md stores project rules, /context, /compact, and /clear manage context, Skills reuse fixed workflows, Agents isolate complex subtasks, and plugins package complete capabilities.

The best way to use Claude Code is to let it move tasks forward inside clear boundaries, not to hand the entire project to it at once.

opencode, Claude Code, and Codex: What's the Difference? A Guide to Open Source AI Coding Tools

Fri, 08 May 2026 08:33:37 +0800

opencode is an open source AI Coding Agent from anomalyco. Its positioning is straightforward: give developers a programmable, extensible coding assistant in the terminal that can connect to multiple model providers.

If you compare it with Claude Code and Codex, all three solve the same broad problem: bringing AI into real codebases so it can understand context, edit files, run commands, and execute tests. But their product directions are different.

opencode emphasizes open source, multi-model support, and a terminal TUI. Claude Code emphasizes Anthropic’s model ecosystem and local engineering collaboration. Codex is OpenAI’s AI coding agent, available through the terminal, IDEs, the Codex app, and cloud tasks.

Who opencode Is For

opencode is a better fit for these kinds of developers:

People who want to complete code changes, project analysis, and engineering tasks in the terminal.
People who do not want their AI Coding Agent tied to a single model provider.
People who prefer open source tools and want to audit, extend, or build on top of them.
People already comfortable with Neovim, TUIs, and command-line workflows.
People who want to eventually drive the same coding agent remotely through a desktop app, mobile app, or other clients.

Its point is not to create another chat window, but to put AI coding capability inside the terminal and project directories developers already use.

Installation

The official README provides several installation methods.

# Direct install
curl -fsSL https://opencode.ai/install | bash

# npm
npm i -g opencode-ai@latest

# Windows
scoop install opencode
choco install opencode

# macOS and Linux
brew install anomalyco/tap/opencode
brew install opencode

# Arch Linux
sudo pacman -S opencode
paru -S opencode-bin

# Other methods
mise use -g opencode
nix run nixpkgs#opencode

The official README also recommends removing versions older than 0.1.x before installing to avoid problems caused by older remnants.

The installation script chooses the installation directory by priority:

$OPENCODE_INSTALL_DIR
$XDG_BIN_DIR
$HOME/bin
$HOME/.opencode/bin

If you need to specify a path, use:

1
2

OPENCODE_INSTALL_DIR=/usr/local/bin curl -fsSL https://opencode.ai/install | bash
XDG_BIN_DIR=$HOME/.local/bin curl -fsSL https://opencode.ai/install | bash

The Desktop App Is Still Beta

In addition to the command-line tool, opencode also provides a desktop app, currently marked as Beta. It can be downloaded from GitHub Releases or opencode.ai/download.

The desktop app covers these platforms:

Platform	File
macOS Apple Silicon	`opencode-desktop-mac-arm64.dmg`
macOS Intel	`opencode-desktop-mac-x64.dmg`
Windows	`opencode-desktop-windows-x64.exe`
Linux	`.deb`, `.rpm`, or `.AppImage`

macOS and Windows users can also install the desktop app through package managers.

# macOS
brew install --cask opencode-desktop

# Windows
scoop bucket add extras
scoop install extras/opencode-desktop

Two Built-In Agent Modes

opencode includes two built-in Agents, switchable with the Tab key.

build is the default mode. It has full development permissions and is suitable for editing code directly, running commands, and moving engineering tasks forward.

plan is read-only mode. It is better for analyzing unfamiliar codebases, understanding project structure, and planning changes. It denies file edits by default and asks before running bash commands.

opencode also includes a general subagent for complex searches and multi-step tasks. Users can invoke it by typing @general in a message.

This design is practical: use plan to understand the project before acting, then switch to build when code needs to change. For large repositories, separating read and write permissions helps reduce mistakes.

What Is Codex?

Codex is OpenAI’s AI coding agent for helping developers write code, review code, fix bugs, and ship engineering tasks.

Unlike a simple code completion tool, Codex is closer to an Agent that can operate on a codebase. It can pair with you in local tools, and it can also take delegated tasks in the cloud. OpenAI’s official materials describe Codex as available through multiple surfaces, including CLI, IDEs, the Codex app, and ChatGPT/Codex cloud workflows.

For developers, Codex has several important traits:

It can read codebases, edit files, run commands, and execute tests.
It supports multiple interfaces, including terminal, IDE, app, and cloud.
It fits bug fixing, feature work, refactoring, migrations, code review, and test generation.
It is more closely tied to OpenAI accounts, models, and the Codex product ecosystem.
Cloud tasks are useful for running multiple well-scoped engineering tasks in parallel.

If opencode is more like an open terminal agent framework, Codex is more like a full AI coding workbench from OpenAI: local pairing, cloud delegation, and longer engineering workflows for teams.

Core Differences

opencode, Claude Code, and Codex are all AI coding tools, but the choice becomes clearer if you look at these dimensions.

Tool	Core Positioning	Main Advantages	Best Fit
`opencode`	Open source AI Coding Agent	Open source, multi-model, TUI, client/server architecture	Developers who want an open toolchain, replaceable models, and a terminal-first workflow
`Claude Code`	Anthropic’s command-line coding tool	Claude model experience, code understanding, long context, engineering task collaboration	Developers already using the Claude/Anthropic ecosystem who want to work on local code tasks
`Codex`	OpenAI’s AI coding agent	CLI, IDE, Codex app, cloud tasks, multi-Agent workflows	Teams already using ChatGPT/OpenAI who want both local pairing and cloud delegation

In short, opencode is about openness and replaceability, Claude Code is about the Claude ecosystem and local engineering agents, and Codex is about the OpenAI ecosystem and multi-surface collaboration.

How It Differs From Claude Code

opencode’s official FAQ directly compares it with Claude Code. The two are similar in capability, but the main differences are these.

First, opencode is a 100% open source project, hosted on GitHub and released under the MIT license.

Second, opencode is not tied to a single model provider. It recommends models provided through OpenCode Zen, but it can also work with Claude, OpenAI, Google, or local models. For developers, this means that when model cost, capability, or availability changes, you are not locked into one platform.

Third, opencode includes optional LSP support. For code completion, navigation, diagnostics, and project understanding, LSP is a very important foundation.

Fourth, opencode emphasizes TUI. It is built by Neovim users and the creators of terminal.shop, so the product focus is clearly on the terminal experience.

Fifth, opencode uses a client/server architecture. That means opencode can run on your computer while being controlled in the future by a TUI, desktop app, mobile app, or other clients. The TUI is only one possible frontend.

When to Choose opencode, Claude Code, or Codex

If you already use Claude Code or Codex, opencode does not have to replace them immediately. A better way to think about it is that opencode provides an open, model-replaceable, terminal-first option.

Consider opencode first when:

You want your AI coding tool to be as open source as possible.
You do not want your workflow tied to one model provider.
You want to test Claude, OpenAI, Google, or local models with the same tool.
You like TUI workflows and do not want a desktop or web app to interrupt your main workflow.
You care about the remote-control potential of a client/server architecture.

Consider Claude Code first when:

You mainly use Claude models.
You care about long context, code understanding, and complex engineering task collaboration.
You want to keep moving edits, tests, and refactors forward in a local repository.
You trust Anthropic’s default Claude Code product experience.

Consider Codex first when:

You already use ChatGPT or the OpenAI account ecosystem.
You want one coding agent across terminal, IDE, desktop app, and cloud tasks.
You want to delegate well-scoped bug fixes, feature work, migrations, or test generation to the cloud in parallel.
You need code review, background tasks, team collaboration, and multi-Agent workflows.

If you care more about an official end-to-end experience, default model configuration, enterprise management, and ready-made integrations, Claude Code or Codex may be easier. If you care more about control, openness, and being provider-agnostic, opencode is worth watching.

Things to Note

opencode, Claude Code, and Codex are all moving quickly. GitHub releases, installation commands, desktop app file names, model availability, and plan access can all change. Before installing or choosing a tool, check the official README, documentation, and release pages.

Also, opencode’s desktop app is still marked as Beta, so it should not be treated as the default stable production tool. For everyday engineering tasks, the terminal version is still the main entry point.

From a tooling trend perspective, opencode represents the open-toolchain direction for AI Coding Agents: replaceable models, replaceable clients, and an open core agent capability. Codex and Claude Code are closer to model companies turning coding agents into complete product surfaces. For developers, both directions will likely coexist for a long time.

References

opencode GitHub: https://github.com/anomalyco/opencode
opencode official site: https://opencode.ai
opencode docs: https://opencode.ai/docs
opencode Releases: https://github.com/anomalyco/opencode/releases
OpenAI Codex: https://openai.com/codex/
Using Codex with your ChatGPT plan: https://help.openai.com/en/articles/11369540-codex-in-chatgpt
OpenAI Codex CLI Getting Started: https://help.openai.com/en/articles/11096431-openai-codex-ci-getting-started

Claude Opus 4.7, Sonnet 4.6, and Haiku 4.5: Differences and Model Selection Guide

Fri, 08 May 2026 08:19:03 +0800

Anthropic’s core large language models mainly evolve through the Claude series. As of May 2026, Claude’s mainstream product line has entered the 4.x stage, while still following a three-tier structure: Opus is for maximum capability, Sonnet balances performance and cost, and Haiku focuses on speed and cost effectiveness.

If you only want a quick rule of thumb, remember this:

For the most complex and demanding reasoning and agentic coding: start with Claude Opus 4.7.
For most development, writing, analysis, and enterprise API scenarios: Claude Sonnet 4.6 is the safest starting point.
For high-concurrency, low-latency, cost-sensitive tasks: consider Claude Haiku 4.5.

Current Mainstream Models

According to Anthropic’s official model documentation, the current Claude mainstream models can be understood this way.

Model	Positioning	Suitable Scenarios
`Claude Opus 4.7`	The strongest generally available model, built for complex reasoning and agentic coding	Large codebase refactoring, multi-step tasks, complex strategy analysis, work that requires stronger consistency
`Claude Sonnet 4.6`	The balance point between speed, capability, and cost, with a 1 million token context window	Code generation, long-document analysis, enterprise knowledge work, Agent development, everyday high-quality production tasks
`Claude Haiku 4.5`	The fastest and lower-cost small-model tier, while still retaining capabilities close to frontier models	Real-time chat, customer support, batch classification, simple code collaboration, high-concurrency API calls

There are two naming details worth noting.

First, the official name is Claude Haiku 4.5, not Claude 4.5 Haiku. Second, Claude Mythos Preview is not a mainstream available model for regular users or developers. It is a controlled research preview related to Project Glasswing, mainly aimed at defensive cybersecurity workflows, and should not be mixed into regular Claude model selection.

Opus: For the Hardest Problems

Opus is the tier Anthropic uses for its strongest models. The point of Claude Opus 4.7 is not being cheap or the fastest option, but being better suited to complex, multi-step tasks that require repeated verification.

It is better suited to these situations:

Large code changes across many files.
Complex system refactoring and architectural reasoning.
Long-chain Agent tasks.
Work requiring stronger visual understanding, document understanding, and multi-turn planning.
Enterprise analysis tasks where mistakes are costly.

If the cost of a single failed task is high, or you want the model to spend more time understanding context before acting, Opus is usually more worth trying.

Sonnet: The Default Starting Point for Most People

Claude Sonnet 4.6 is better suited as the default entry point. Its positioning is not “a lower-end Opus,” but rather a way to put sufficiently strong reasoning, coding, visual understanding, long context, and agent planning into a more controllable cost and speed profile.

For developers, the value of Sonnet 4.6 mainly comes from three points:

It can handle very long context, making it suitable for codebases, contracts, reports, or multiple documents.
It is easier to use as a regular model in Claude Code, API, and enterprise scenarios.
It costs less than Opus, making it more suitable for high-frequency use.

If you do not know which Claude model to start with, Claude Sonnet 4.6 is usually the right beginning. Switch to Opus only when the task clearly needs stronger capability.

Haiku: When Fast and Affordable Matter More

Claude Haiku 4.5 is the small-model tier, but it should not simply be understood as a “weak model.” Anthropic positions it as fast and low cost while retaining capabilities close to frontier models.

It fits these scenarios:

Real-time chat and customer support bots.
Large-scale short-text classification.
Low-latency API calls.
Simple code edits and rapid prototypes.
Subtask execution in multi-Agent workflows.

If the task itself is clear, the context is not complex, and throughput matters, Haiku is often more reasonable than blindly using a larger model.

Claude’s Tool Capabilities

The Claude series is not just a set of chat models. Anthropic now places model capabilities inside multiple products and developer tools.

Claude Code is a command-line coding tool for developers. It can read codebases, edit files, run commands, and execute tests, making it suitable for sustained engineering work. Its experience depends heavily on the model’s code understanding, context management, and tool-calling stability.

Computer Use lets the model operate a desktop environment through screenshots, mouse actions, and keyboard input. It still needs to be used carefully, and the official documentation emphasizes running it in an isolated environment to avoid mistakes or security risks.

Artifacts is more of a Claude app-side experience. It can place code, page prototypes, charts, or document outputs into the interface for preview and iteration. It is not a standalone model, but part of the Claude product experience.

As for terms like “Managed Agents” or “self-evolving Agents,” be careful when writing about them. Anthropic is indeed strengthening Agent SDK, Claude Code, long context, tool use, and enterprise workflows, but it should not be described as already having uncontrolled self-evolution capability.

Access Options

Regular users can use Claude through the Claude.ai web app or mobile apps. Different plans affect available models, usage limits, and features.

Developers usually have several access options:

Anthropic Console and Claude API.
Amazon Bedrock.
Google Cloud Vertex AI.
Microsoft Foundry.

Specific available models, context windows, pricing, and regional support can change. Before development, it is best to rely on Anthropic’s official model documentation and the relevant cloud platform pages.

How to Choose

In actual use, you do not need to chase the strongest model from the beginning. A better approach is to tier model choice by task cost.

For everyday writing, code generation, long-document analysis, knowledge organization, and most Agent prototypes, start with Claude Sonnet 4.6. It is usually the best starting point for cost effectiveness and general capability.

If the task requires stronger complex reasoning, cross-file engineering changes, long-chain planning, or higher reliability, switch to Claude Opus 4.7.

If the task is simple, high-volume, and latency-sensitive, such as classification, summarization, customer support, or batch processing, put Claude Haiku 4.5 on the shortlist.

Claude’s model line is not simply “new versions replacing old versions.” It is a toolbox layered by task difficulty, speed, and cost. Choosing the right model matters more than blindly using the most expensive one.

References

Anthropic Models Overview: https://platform.claude.com/docs/en/about-claude/models/overview
Introducing Claude Opus 4.7: https://www.anthropic.com/news/claude-opus-4-7
Introducing Claude Sonnet 4.6: https://www.anthropic.com/news/claude-sonnet-4-6
Introducing Claude Haiku 4.5: https://www.anthropic.com/news/claude-haiku-4-5
Anthropic Computer Use Tool: https://docs.anthropic.com/en/docs/build-with-claude/computer-use

How ChatGPT, Claude Code, and Gemini memory mechanisms differ

Thu, 07 May 2026 14:47:17 +0800

“Memory” is becoming increasingly important in AI products. It marks the shift from one-off conversations to long-term collaboration: you do not need to reintroduce your background, repeat your preferences, or ask the model to understand the same project again and again.

But memory does not mean the same thing in every product. ChatGPT, Claude Code, and Gemini all try to help AI remember longer, but their goals, storage locations, transparency, and use cases are very different.

As of May 7, 2026, they can be roughly understood as three types:

ChatGPT is more like personal assistant memory.
Claude Code is more like engineering project memory.
Gemini is more like Google ecosystem context.

ChatGPT: long-term preferences around the person

ChatGPT memory is mainly designed for personal collaboration. It cares about who you are, what you prefer, and what you work on over time.

OpenAI currently separates ChatGPT memory into saved memories and chat history.

saved memories are important pieces of information ChatGPT stores, such as your name, preferences, goals, common tech stack, and writing habits. You can explicitly ask it to remember something, and it may also save information from conversation when it thinks it will be useful later.

chat history lets ChatGPT reference past conversations when answering. It does not mean every chat becomes a permanent memory. Instead, ChatGPT can search past conversations for relevant context when needed.

So ChatGPT’s core logic is: understand the same user across sessions.

Typical examples include:

“Keep code examples concise for me.”
“I mainly use Python and TypeScript.”
“I am writing a Hugo blog about AI tools.”
“I prefer conclusions first, then details.”

These memories are not bound to one project. They follow the account and the user’s working habits.

Memory Sources: making personalization more visible

OpenAI emphasized Memory sources in its May 2026 update.

The purpose is not to add another type of memory, but to show users what sources ChatGPT referenced when personalizing a response. According to OpenAI help documents, Memory Sources may show:

Past chats.
Saved memories.
Custom instructions.
Files in the file library.
Emails from connected Gmail.

Files and Gmail visibility depend on plan, region, and connection status. OpenAI also states that Memory sources may not show every factor that influenced a response, but they help users understand and manage personalization.

This matters. The more AI can “remember you,” the more users need to know what it used to answer. Otherwise personalization becomes a black box: it seems to know you, but you do not know why.

ChatGPT’s advantage is cross-session, cross-topic understanding of personal preferences. The risk is that memories can become outdated, or users may forget an old memory is still affecting answers. It is worth periodically cleaning saved memories and old chats.

Claude Code: around codebases and engineering rules

Claude Code memory is more engineering-oriented. It cares less about a user’s everyday preferences and more about how this codebase should be changed.

Claude Code has two memory mechanisms that are easy to confuse:

Explicit project memory: CLAUDE.md.
Automatic project memory: Auto Memory.

CLAUDE.md is the most basic and stable project memory file. It can live at the project root or inside subdirectories. Claude Code reads these files as project instructions and operating rules.

Good content for CLAUDE.md includes:

Common build, test, and lint commands.
Code style and naming rules.
Project architecture notes.
Module boundaries and risky areas.
Team conventions and commit workflow.

If CLAUDE.md is stored in the repository, it can be committed to Git and shared as a team agent guide. This is completely different from ChatGPT’s cloud-based personal memory.

Claude Code Auto Memory: accumulating project experience

Claude Code also has Auto Memory. Its goal is to let Claude automatically accumulate project knowledge across sessions without requiring users to write every note manually.

According to Claude Code documentation, Auto Memory lets Claude save notes while working, such as build commands, debugging discoveries, architecture notes, code style preferences, and workflow habits. It does not save every session, but judges what may be useful later.

One common misconception is that Auto Memory writes by default to .claude/memory.md in the project root. Official documentation says each project has its own memory directory under the user’s home directory, with a path like:

`1`	`~/.claude/projects/<project>/memory/`

MEMORY.md loads the first 200 lines or 25KB at the start of each conversation, while detailed content may be split into other topic files. Auto Memory files are local Markdown files, and users can view, edit, or delete them through /memory.

This makes Claude Code memory more like a local project knowledge base. It is closer to the codebase than ChatGPT’s personal memory, and more dynamic than a plain CLAUDE.md.

But Auto Memory is local to the machine. It does not naturally follow the repository to other machines or cloud environments. For team-shared stable rules, put them in the repository’s CLAUDE.md.

Gemini: around Google ecosystem context

Gemini’s memory logic is different again.

Gemini also supports saved information and past-chat references. Google help documents say users can save information about life, work, or preferences, and Gemini can reference past chats before answering. When it uses this information, the response may show sources such as Your saved info or Previous chats.

But Gemini’s differentiation is not only “saving a few preferences.” It is Google ecosystem integration.

With user authorization and feature availability, Gemini can access context from connected Google apps such as Gmail, Google Drive, Docs, and Sheets. Its advantage is not making users teach it every item manually, but turning existing Google account data into searchable work context.

A typical difference:

ChatGPT remembers: “I have been repairing an LTO tape drive recently.”
Gemini may find the purchase confirmation email in Gmail or read repair notes from Drive.

This does not mean Gemini can unconditionally read all Google data. It depends on account type, region, permissions, connected apps, Keep Activity settings, and product availability. Enterprise and school accounts may also be controlled by Google Workspace administrators.

More accurately, Gemini memory is a combination of saved info, past chats, and connected Google ecosystem data.

Core differences

Dimension	ChatGPT	Claude Code	Gemini
Core object	Person and preferences	Project and codebase	Google account and ecosystem data
Typical memory	Preferences, background, long-term goals	Architecture, commands, conventions, debugging experience	Saved info, past chats, Gmail/Drive/Docs context
Storage form	Memory and chat context in OpenAI account	`CLAUDE.md`, `MEMORY.md`, local Markdown files	Google account activity, saved info, connected app data
Transparency	Memory sources show part of the source	Markdown files can be opened and edited	Managed through source prompts, Gemini Apps Activity, and Google settings
Cross-project ability	Strong, follows user account	Weak, mainly follows project or local project memory	Strong, depends on Google data and permissions
Team sharing	Not suitable for direct sharing	`CLAUDE.md` can be shared through Git	Mainly depends on Workspace and permissions
Best for	Personal preferences and long-term assistant behavior	Long-term coding projects and agent collaboration	Google Workspace retrieval and cross-tool work

How to choose

If you want AI to remember who you are, what style you prefer, and how you usually work, ChatGPT memory is more suitable.

It is good for saving personal preferences such as writing style, tech stack, answer format, professional background, and long-term project direction. Its focus is reducing self-introduction cost so each new conversation can start faster.

If you want AI to remember how a codebase should be changed, which commands work, and which traps to avoid, Claude Code is more suitable.

Put stable rules into CLAUDE.md for team sharing. Let Auto Memory assist with dynamic experience. Important decisions should still be organized into documentation or CLAUDE.md, not left only in local automatic memory.

If most of your materials live in Gmail, Drive, Docs, and Sheets, Gemini’s ecosystem context has an advantage.

It is useful for finding old emails, organizing Drive documents, and connecting calendar and office materials. The key to using Gemini is not repeatedly reminding it in chat, but making sure the relevant app connections, permissions, and activity settings are correct.

A practical division of labor

You can divide them like this:

ChatGPT remembers general personal preferences.
Claude Code remembers engineering knowledge for a repository.
Gemini retrieves materials from your Google ecosystem.

In other words, ChatGPT is like a personal secretary, Claude Code is like a senior engineer inside the project, and Gemini is like an indexer for your Google account.

There is no absolute winner. They have different goals.

The biggest mistake is mixing them together. Personal preferences do not always belong in project memory; project architecture does not always belong in cloud personal memory; and Google ecosystem retrieval does not mean the model has truly understood you long-term.

Short Take

The next stage of AI memory is not simply “remember more.” Memory needs layers, visibility, and control.

ChatGPT focuses on cross-session personalization. Claude Code focuses on code project continuity. Gemini focuses on Google ecosystem context. Good long-term AI collaboration does not put all information into one black box; it keeps different kinds of memory in the right places.

Put personal preferences in personal memory, engineering rules in the codebase, and historical materials in the original document and email systems. AI’s job is to call the right context when needed, not mix everything into one pile.

Anthropic raises Claude usage limits and expands compute with SpaceX

Thu, 07 May 2026 14:26:14 +0800

Anthropic announced on May 6, 2026 that it is raising some Claude Code and Claude API usage limits, while also disclosing a new compute partnership with SpaceX.

On the surface, this is about “more quota.” The more important signal is that model companies are tying product experience, subscription tiers, API rate limits, and infrastructure supply together. For heavy users, compute is not abstract. It determines whether they can run more Claude Code tasks, wait less, and call Opus models more reliably.

How Claude Code and API limits are changing

Anthropic announced three changes, all effective from the day of the announcement.

First, Claude Code’s five-hour usage limits are being doubled for Pro, Max, Team, and seat-based Enterprise plans.

This matters directly for heavy Claude Code users. In the past, continuous code reading, editing, and task execution could quickly run into the five-hour limit. Doubling the limit allows more sustained development work in the same working window.

Second, Pro and Max accounts will no longer see reduced Claude Code limits during peak hours.

This is more important than the number itself. The most frustrating part of many AI tools is not the normal quota, but sudden slowdowns or unstable limits during busy periods. Removing peak-hour reductions shows Anthropic wants paid users to have a more predictable experience even when demand is high.

Third, Anthropic is considerably raising API rate limits for Claude Opus models. The original article presents the detailed numbers in an image table; the core point is that Opus API capacity is being raised meaningfully.

For developers, Opus is the more expensive, heavier, and more capable model. Higher Opus API limits suggest Anthropic wants more companies and developers to put Opus into real business workflows, not just use Claude in a chat interface.

The weight of the SpaceX compute deal

The higher limits are backed by new compute supply.

Anthropic says it has signed an agreement with SpaceX to use all compute capacity at SpaceX’s Colossus 1 data center. The partnership will provide more than 300 megawatts of new capacity within a month, corresponding to more than 220,000 NVIDIA GPUs.

Those numbers say two things.

First, compute is still a bottleneck for frontier model companies. Model capability, context length, tool use, coding agents, multimodality, and enterprise use cases all consume large amounts of inference resources. The more users and complex tasks a platform supports, the more stable large-scale GPU supply it needs.

Second, AI infrastructure competition has entered a massive scale phase. In the past, attention focused more on model rankings, product features, and pricing. Now, whoever can secure power, facilities, networking, and GPUs faster has a better chance of turning model capability into a stable product.

Anthropic also says the SpaceX capacity will directly improve capacity for Claude Pro and Claude Max subscribers. In other words, this is not just training infrastructure; it also supports user-facing inference.

Anthropic’s compute map

SpaceX is not Anthropic’s only compute partner.

The announcement also points to several previously announced infrastructure arrangements:

An up to 5GW agreement with Amazon, including nearly 1GW of new capacity by the end of 2026.
A 5GW agreement with Google and Broadcom, expected to begin coming online in 2027.
A strategic partnership with Microsoft and NVIDIA that includes $30 billion of Azure capacity.
A $50 billion investment in American AI infrastructure with Fluidstack.

The common thread is that Anthropic is not binding itself to one hardware stack or one cloud platform. The original article explicitly says Claude is trained and run on AWS Trainium, Google TPUs, and NVIDIA GPUs.

This multi-supplier strategy is practical. It is hard for one cloud provider to satisfy frontier training and large-scale inference demand over the long term. A multi-platform approach increases engineering complexity, but reduces supply chain and capacity risk.

Why usage limits are really a compute issue

AI product “limits” are not just membership copy. They map to real costs.

Every time Claude Code reads a repository, generates a patch, or runs a long task, it consumes inference resources. API users who put Opus into support, financial analysis, code review, document processing, or agent workflows create sustained demand. For the platform, loosening limits means having more reliable compute behind the scenes.

So the logic of this announcement is clear: first explain that users get higher limits, then explain why those limits can now be raised. The new SpaceX capacity, along with existing Amazon, Google, Microsoft, NVIDIA, and Fluidstack partnerships, supports heavier usage.

This also explains why AI products increasingly emphasize tiering. Free, Pro, Max, Team, and Enterprise users consume compute differently and pay differently. Model companies have to realign quotas, priority, model access, and infrastructure costs.

The signal from orbital AI compute

The announcement includes one futuristic detail: Anthropic says it has also expressed interest in partnering with SpaceX to develop multiple gigawatts of orbital AI compute capacity.

That does not mean orbital data centers are becoming a product immediately. A safer reading is that frontier AI companies are already thinking beyond ground-based data centers for future compute supply.

AI data centers are constrained by power, land, cooling, networking, and regulation. As training and inference demand grows, the industry will explore more infrastructure forms. Orbital compute may sound distant, but its appearance in an official Anthropic announcement is itself a signal: the imagination around compute competition is expanding.

International expansion and compliance

Anthropic also says enterprise customers, especially in regulated sectors such as finance, healthcare, and government, increasingly need in-region infrastructure for compliance and data residency.

That means model companies cannot build all infrastructure in the United States. Enterprise AI has to handle regional compliance, data residency, supply chain security, power costs, and relationships with local communities. Anthropic says its collaboration with Amazon already includes additional inference in Asia and Europe.

It also says it will be intentional about adding capacity in democratic countries whose legal and regulatory frameworks support large-scale investment and secure supply chains, while exploring ways to extend its US data center electricity-price commitment to other jurisdictions.

This shows that AI infrastructure is not just a technical issue. It is increasingly an energy, manufacturing, and geopolitical economic issue.

Short Take

Anthropic’s announcement can be summarized simply: Claude limits are going up because new large-scale compute is coming online.

For users, the near-term effects are higher Claude Code five-hour limits, fewer peak-hour reductions for Pro and Max, and more Opus API room. For the industry, the bigger point is that model competition is expanding from “whose model is stronger” to “who can continuously secure enough stable and compliant compute.”

Future AI product experience may differ not only because of model parameters and product design, but also because of infrastructure capacity. Whoever can organize power, GPUs, data centers, cloud partnerships, and regional compliance has a better chance of turning frontier models into long-term services.

CC Switch: A desktop tool for managing Claude Code, Codex, Gemini CLI, and OpenClaw in one place

Wed, 06 May 2026 09:03:08 +0800

CC Switch is a desktop management tool for heavy AI coding users. The problem it tries to solve is straightforward: many people now use Claude Code, Codex, Gemini CLI, OpenCode, and OpenClaw at the same time, but each tool has its own configuration format, Provider syntax, MCP setup, and Skills management method.

When you only use one tool, manually editing configuration files is still tolerable. Once several tools are mixed together, plus official accounts, third-party APIs, relay services, local models, and shared team configuration, editing JSON, TOML, and .env files by hand quickly becomes tedious.

CC Switch is positioned as a way to pull these scattered configurations into one cross-platform desktop app.

What problem does it solve

Modern AI coding tools increasingly feel like “development colleagues inside the command line”, but their ecosystems are still not fully unified.

Common pain points include:

Claude Code, Codex, Gemini CLI, OpenCode, and OpenClaw all use different configuration formats.
Switching API Providers requires repeated configuration-file edits.
MCP servers are configured repeatedly across different tools.
Prompt files such as CLAUDE.md, AGENTS.md, and GEMINI.md are hard to maintain consistently.
Skills installation, sync, backup, and removal lack a single central entry point.
Switching between multiple accounts, relays, and model services can easily become confusing.
Once a manually edited configuration file breaks, troubleshooting is costly.

The idea behind CC Switch is to stop forcing users to remember every tool’s configuration details, and instead use one unified interface to manage Providers, MCP, Prompts, Skills, Sessions, and proxies.

Supported tools

The README lists five core supported targets:

Claude Code
Codex
Gemini CLI
OpenCode
OpenClaw

These tools are similar in positioning: all center on AI coding, Agent workflows, and command-line collaboration. But their configuration systems differ, and the value of CC Switch lies in wrapping those differences.

For people who often compare different AI coding tools, this is much easier than manually opening configuration files every time.

Provider management

The first layer of CC Switch is Provider management.

It includes more than 50 Provider presets. The README mentions directions such as AWS Bedrock, NVIDIA NIM, and various community relays. Users can copy an API key, import it with one click, and then switch from the interface.

The practical points include:

Add Providers with one click.
Reorder Providers by dragging.
Quickly switch from the system tray.
Import and export Providers.
Sync some common Providers across multiple apps.

For many people, this feature alone is already attractive. In daily AI coding work, the problem is often not “I do not know how to use the model”, but “which tool, endpoint, and account should this key use today”.

Local proxy and failover

Besides writing configuration files, CC Switch also provides a local proxy mode.

The focus of this capability is:

Hot-switching Providers.
Format conversion.
Automatic failover.
Circuit breakers.
Provider health checks.
Request correction.

In simple terms, it does not only write configuration into target tools. It can also add a local proxy layer in the middle, so different tools access model services through the proxy.

This is useful for users with multiple Providers: if one service is down, switch to another; if one model is expensive, move to a cheaper one; if a request format is incompatible, adapt it through the proxy layer.

MCP, Prompts, and Skills

The second important layer of CC Switch is unified management for MCP, Prompts, and Skills.

MCP

It provides a unified MCP panel for managing MCP servers across multiple apps, with support for bidirectional sync and Deep Link import.

This is practical for users already working with MCP. Once there are many MCP servers, configuration easily becomes scattered across different clients. A unified panel reduces duplicate configuration and makes migration easier.

Prompts

The Prompts section supports Markdown editing and can sync corresponding files across different tools, such as:

CLAUDE.md
AGENTS.md
GEMINI.md

These files are essentially project manuals for Agents. Unified management makes it easier to maintain team rules, project conventions, and global prompts.

Skills

Skills can be installed with one click from GitHub repositories or ZIP files. Custom repository management, symbolic links, and file copying are also supported.

If you use tools such as Claude Code, Codex, and OpenClaw at the same time, Skills can easily turn into scattered files across different directories. CC Switch centralizes them and reduces maintenance cost.

Sessions and workspace

The README also mentions Session Manager and Workspace features.

It can browse, search, and restore session history from multiple apps. For people who use AI coding tools over a long period, session management is genuinely important: many valuable contexts, debugging trails, and solution comparisons are buried in old conversations.

It also provides a Workspace editor for OpenClaw, allowing users to edit agent files such as AGENTS.md and SOUL.md with Markdown preview.

This shows that CC Switch is not just a small “key switching” utility. It is expanding toward an AI Agent workstation.

Cloud sync and data storage

CC Switch supports syncing Provider data through Dropbox, OneDrive, iCloud, NAS, or WebDAV.

Local data storage is also clearly defined:

Database: ~/.cc-switch/cc-switch.db
Local settings: ~/.cc-switch/settings.json
Automatic backups: ~/.cc-switch/backups/
Skills: ~/.cc-switch/skills/
Skill backups: ~/.cc-switch/skill-backups/

It uses SQLite as the main data source and emphasizes atomic writes and automatic backups, with the goal of avoiding configuration-file corruption during switching or writing.

This design matters for heavy users. If the configuration management tool itself writes a bad configuration, every AI coding tool can be affected.

Installation

CC Switch is a cross-platform desktop app built on Tauri 2.

The approximate system requirements are:

Windows: Windows 10 or later
macOS: macOS 12 Monterey or later
Linux: Ubuntu 22.04+, Debian 11+, Fedora 34+, and other mainstream distributions

Windows users can download the .msi installer or a portable compressed package.

macOS users can install it with Homebrew:

1
2

brew tap farion1231/ccswitch
brew install --cask cc-switch

To update:

`1`	`brew upgrade --cask cc-switch`

Linux users can choose .deb, .rpm, or AppImage. Arch Linux users can also install it through paru -S cc-switch-bin.

As of May 6, 2026, the repository page shows the latest release as CC Switch v3.14.1, published on April 23, 2026.

Tech stack

Judging from the repository structure, CC Switch is a typical Tauri desktop app:

Frontend: React 18, TypeScript, Vite, TailwindCSS, TanStack Query, shadcn/ui
Backend: Tauri 2, Rust, SQLite, Tokio
Testing: Vitest, MSW, Testing Library

Core design patterns include:

SQLite as the Single Source of Truth.
JSON for device-level local settings.
Writing into target tools’ live config during switching.
Filling current Provider edits back from live config.
Atomic writes using temporary files plus rename.
Locked database connections to avoid concurrent write issues.

This architecture suggests the project is not a simple script, but a desktop tool designed for long-term use.

Who it is for

CC Switch suits these users:

People who use Claude Code, Codex, Gemini CLI, OpenCode, and OpenClaw together.
People who frequently switch between official accounts, third-party relays, local models, or team Providers.
Users already making heavy use of MCP.
Teams that want to maintain CLAUDE.md, AGENTS.md, and GEMINI.md in one place.
Users who often install, test, and migrate Skills.
People who want to view session history and usage across different tools.

If you only use one AI coding tool, rely on official login, and rarely touch Providers, MCP, or Skills, its value may not be obvious.

But if you have already entered a “many tools, many accounts, many Providers, many projects” state, it can remove a lot of repetitive configuration work.

What to watch out for

Tools like this are convenient, but they also need clear boundaries.

First, it manages configuration for multiple AI CLIs, so users should be sure they trust the tool and its write logic.

Second, API keys, relay endpoints, and MCP servers are all sensitive configuration. Before enabling cloud sync, make sure the sync folder and WebDAV service are secure and trustworthy.

Third, after switching Providers, most tools still need the terminal or CLI to be restarted before changes take effect. The README mentions that Claude Code supports hot-switching Provider data, but other tools usually still require a restart.

Fourth, when switching back to official login, it is better to add the official provider according to the project instructions and then rerun the corresponding tool’s login flow.

Summary

The value of CC Switch is not that it creates yet another AI coding tool. Its value is that it acknowledges a reality: the AI coding ecosystem has entered a stage where multiple tools coexist.

Claude Code, Codex, Gemini CLI, OpenCode, and OpenClaw each have their own configuration systems, while MCP, Skills, Prompts, and Providers are expanding quickly. Continuing to edit configuration files by hand will eventually become a burden.

CC Switch pulls these pieces into one desktop app, making it easier to switch Providers, sync MCP, manage Skills, maintain prompt files, and view sessions. For heavy AI coding users, tools like this may move from “optional utility” to “daily infrastructure”.

References

farion1231/cc-switch

What Happened in Claude Code's HERMES.md Billing Incident

Sat, 02 May 2026 11:19:23 +0800

Claude Code recently had a typical billing incident: a user only started the CLI and had not made an explicit request, yet a large local HERMES.md file was read and generated a significant charge.

This is worth looking at because it exposes a new risk in AI coding tools. Once a tool automatically reads context, local files can become real token cost.

What Happened

The public issue shows that the user had a large HERMES.md file in the working directory. When Claude Code started, the CLI scanned and loaded project context. The problem was that this file was automatically included in context and counted toward API usage.

The user did not explicitly ask the model to process that file, but billing had already happened. The harder part is that this can occur during initialization or context preparation, so users may not immediately realize that cost is being generated.

Anthropic later replied in the issue that it would refund the abnormal charge and provide extra credits. That confirms the problem was acknowledged and handled, but it also reminds users that “automatic context” in an AI CLI is not free.

Why HERMES.md Triggered It

HERMES.md itself is not the point. It could be any large file: logs, exported documents, test data, database dumps, generated reports.

The real issue is the combination of three things:

Claude Code automatically reads project context.
The file being read may be large.
Context tokens enter the billing path.

If a file is large enough, even being pulled in “incidentally” can create noticeable cost. For token-based models, stronger automation needs clearer boundaries.

This Is Not an Ordinary Bug

An ordinary CLI bug may mean a failed command, wrong output, or broken feature. A billing bug is more sensitive because it affects the user’s bill directly.

For AI coding tools, the billing boundary can be blurry:

System prompts consume tokens.
Project rules consume tokens.
Automatically read files consume tokens.
Tool call results consume tokens.
Retries, compression, and summaries can keep consuming tokens.

Users may see only “starting the tool” or “one chat,” while the background may already have sent multiple requests with a large amount of context.

How Users Can Reduce Risk

If you use Claude Code, Codex, Cline, or similar AI coding tools, start with a few habits:

Do not put large files directly in the project root.
Add logs, exported data, build outputs, and temporary files to ignore rules.
Check whether the tool supports .ignore, context exclusion, or file allowlists.
Enable budget alerts or usage limits.
Test in a small directory before running in a large repository.

If a repository must keep large files, explicitly tell the tool not to read them. Project rules can also say: do not proactively read logs, dumps, datasets, archives, or large Markdown files.

What Tool Vendors Should Improve

This cannot rely only on user caution. Tools should provide hard boundaries.

Better designs include:

Initialization should not silently bill for large files.
Reading very large files automatically should require confirmation.
The CLI should show estimated tokens and cost range for the request.
Common large files and generated directories should be ignored by default.
Abnormal token spikes should have protective thresholds.

The more AI coding tools behave like autonomous agents, the more transparent their costs need to be. Otherwise users cannot judge how much a single operation will cost.

Summary

The Claude Code HERMES.md billing incident is essentially a conflict between automatic context and usage-based billing.

For users, the key is to control project context: do not expose large files to AI tools by default, and set budget and usage limits. For tool vendors, automatic file reading needs visible cost prompts and protective mechanisms.

References:

How DeepSeek V4 Price Cuts Rewrite the Cost Model for AI Agents

Fri, 01 May 2026 19:47:47 +0800

DeepSeek V4 did not arrive with an especially loud launch. There was no major event, nor a benchmark story that instantly crushed every competitor. But a few days later, the part that truly affects the industry became visible: repeated price cuts.

The point of this change is not that “the model got a little stronger”, but that “usage cost has been pushed into another tier”. When token prices become low enough that an ordinary Agent task can finish for a few cents or a couple of yuan, the business logic behind many Coding Plans and Token Plans needs to be reconsidered.

Launch Day Was Not Explosive

The first wave of feedback to DeepSeek V4 was not especially heated. Many people expected it to deliver the kind of shock R1 did: across-the-board benchmark leadership, validation of domestic compute, and simultaneous breakthroughs in multimodal and Agent capabilities. After the actual release, however, it looked more like a steady upgrade.

V4 Pro is indeed a strong model, especially in coding, math, long context, and agentic coding. But it is not the kind of product that instantly makes every peer model look outdated. So on launch day, the discussion felt a little awkward: people wanted to praise it, but it was hard to find a sufficiently explosive angle.

The real turning point was not launch day, but the price adjustments that followed.

Successive Price Cuts Are the Key

After DeepSeek V4 was released, prices started to move downward. According to DeepSeek’s official pricing page and the information summarized in the source article, the rough prices at that time were:

DeepSeek V4 Flash: about 1 yuan per 1 million input tokens; about 0.02 yuan per 1 million tokens after a cache hit;
DeepSeek V4 Pro: about 3 yuan per 1 million input tokens; about 0.025 yuan per 1 million tokens after a cache hit;
the cache-hit input price across the model family dropped to one tenth of the launch price;
V4 Pro was once in a 75% discount period, extended until May 31, 2026 at 23:59.

The API prices in US dollars make the difference easier to see:

Model	Cached input	Non-cached input	Output	Context
`deepseek-v4-flash`	$0.0028 / 1M tokens	$0.14 / 1M tokens	$0.28 / 1M tokens	1M
`deepseek-v4-pro` promotional price	$0.003625 / 1M tokens	$0.435 / 1M tokens	$0.87 / 1M tokens	1M
`deepseek-v4-pro` regular price	$0.0145 / 1M tokens	$1.74 / 1M tokens	$3.48 / 1M tokens	1M

Two details matter here.

First, V4 Pro’s $0.435 / $0.87 is a promotional price, not the long-term regular price. In DeepSeek’s official notes, this 75% discount was extended until May 31, 2026 at 15:59 UTC.

Second, cache-hit pricing is the key variable in the Agent cost model. Flash’s cached input price is as low as $0.0028 / 1M tokens, while Pro’s promotional cached input price is $0.003625 / 1M tokens. That means repeated project context, tool definitions, system prompts, and historical summaries no longer need to be charged at the full input price.

The most important thing about this pricing is that it makes the token cost of many tasks “insensitive”. In the past, developers worried that one Agent task would consume a large amount of context, repeatedly read and write code, and call tools frequently. Now, as long as the cache hit rate is high enough, the cost can be pushed very low.

Price Comparison With GPT and Claude

DeepSeek’s own prices alone do not fully convey the gap. The contrast becomes much clearer when placed next to common closed-source models from the same period.

Model	Input	Cached input	Output	Best fit
`deepseek-v4-flash`	$0.14 / M	$0.0028 / M	$0.28 / M	High-frequency Agents, routine coding, batch tasks
`deepseek-v4-pro` promotional price	$0.435 / M	$0.003625 / M	$0.87 / M	Complex coding, planning, fact checking
`deepseek-v4-pro` regular price	$1.74 / M	$0.0145 / M	$3.48 / M	Pro cost baseline after the promotion
GPT-5.5	$5 / M	$0.50 / M	$30 / M	High-quality complex tasks, general reasoning
GPT-5.4	$2.50 / M	$0.25 / M	$15 / M	Mid-range choice for programming and professional tasks
GPT-5.4 mini	$0.75 / M	$0.075 / M	$4.50 / M	Lower-cost general and subtask model
Claude Opus 4.7	$5 / M	$0.50 / M	$25 / M	High-quality writing, complex reasoning, long tasks
Claude Sonnet 4.6	$3 / M	$0.30 / M	$15 / M	Programming, Agents, general work
Claude Haiku 4.5	$1 / M	$0.10 / M	$5 / M	Lightweight tasks, summarization, classification

The most striking number in this table is output price. Agents do not only read context; they also keep generating plans, patches, explanations, logs, and next actions. If there is a lot of output, DeepSeek V4 Pro’s promotional $0.87 / M becomes dramatically cheaper than GPT-5.5’s $30 / M or Claude Sonnet 4.6’s $15 / M.

Even at V4 Pro’s regular output price of $3.48 / M, it is still clearly below GPT-5.4, GPT-5.5, and Claude Sonnet / Opus. If the task can be handled by Flash, the output price drops further to $0.28 / M.

The cached input gap is even more extreme. DeepSeek V4 Flash’s cached input price is $0.0028 / M, while GPT-5.5 and Claude Opus 4.7 are both $0.50 / M. These are not in the same order of magnitude. For Agents that repeatedly read the same code repository, this gap matters more than it does in ordinary chat.

Why Agent Tasks Are Especially Affected

AI Agents are different from ordinary chat. Ordinary chat is usually a question-and-answer flow with relatively limited input context. Agent tasks repeatedly read project files, generate plans, call tools, inspect results, and then modify code again.

These tasks have two traits:

large token consumption;
lots of repeated context.

The second point is crucial. In a code project, the model repeatedly reads the same files, directory structure, error logs, and modification results. If the platform supports cache hits, the cost of repeated input drops sharply.

The source article mentioned a real experience: connecting DeepSeek V4 Pro and Flash to a Claude Code-like tool, asking it to pull a prompt repository and turn it into a local search site. The task was completed, with a total cost of roughly a little over 0.8 yuan, and Pro reached a cache hit rate of 98.7%.

This example illustrates a practical issue: the more an Agent task resembles “repeated work around the same project”, the more valuable cache hits become. If generating a website, fixing a bug, or changing a frontend costs only a few cents to a few yuan, subscription plans become less attractive.

We can estimate the gap with a simplified task. Assume one coding agent task includes:

500,000 input tokens, of which 80% can hit cache;
50,000 output tokens;
no tool calls, search costs, or platform markup included, only model token cost.

The rough costs are:

Model	Estimated cost
DeepSeek V4 Flash	about $0.03
DeepSeek V4 Pro promotional price	about $0.09
DeepSeek V4 Pro regular price	about $0.36
GPT-5.4 mini	about $0.30
GPT-5.4	about $1.01
GPT-5.5	about $1.75
Claude Sonnet 4.6	about $1.11
Claude Opus 4.7	about $1.65

This estimate does not mean DeepSeek is better for every task. Model quality, tool-call stability, long-context retrieval ability, coding style, and factual reliability all need separate evaluation. But from a cost perspective, DeepSeek V4 pushes the marginal cost of “letting the Agent run a few more rounds” very low. That will encourage developers to design longer workflows, more frequent self-checks, and more candidate solutions instead of worrying about the token bill every time.

The Difference Between Coding Plans and Token Plans

Many AI products now offer two types of plans: Coding Plans and Token Plans.

The rough difference is:

Coding Plans are usually mainly for programming;
Token Plans usually cover more capabilities, such as STT, TTS, image generation, search, embedding, and RAG;
STT means speech to text;
TTS means text to speech;
Coding Plans often restrict users to programming scenarios, while other capabilities still require separate purchases.

From a business perspective, a Coding Plan is more like a buffet. Users pay a fixed fee in advance, while the vendor bets that most people will not use up the quota. Some users consume more, others consume less, and the platform can still make money on average.

But if pay-as-you-go token prices are low enough, users start calculating: why do I have to buy a plan? If the real monthly usage cost is only a few yuan or a dozen yuan, a 40-yuan or 200-yuan plan may no longer be worthwhile.

Why Price Cuts Challenge the Subscription Model

Subscription plans rely on one premise: users feel that each individual use is expensive, or they do not want to calculate the cost of every call. When token prices are high, a plan feels reassuring. When token prices are almost negligible, pay-as-you-go becomes more natural.

DeepSeek V4’s price cut effectively reveals the underlying cost:

Agent tasks can be very cheap;
long context is not necessarily too expensive to use;
cache hits can reduce cost significantly;
ordinary developers do not necessarily need a fixed subscription;
the model entry point can shift from a “plan platform” to a “low-cost API”.

This will make platforms built around Coding Plans uncomfortable. If users find pay-as-you-go calls cheaper and freer, they have less reason to be locked into one platform’s subscription.

How to Choose Between Flash and Pro

A practical way to use DeepSeek V4 is to split work between Flash and Pro.

Flash is suitable for high-frequency, lightweight, repeatable tasks:

fixing bugs;
writing frontend code;
writing scripts;
routine code understanding;
processing ordinary information in long context;
running large numbers of subtasks.

Flash is cheap, fast, and also supports very long context. For everyday coding agents, many tasks do not need Pro from the start.

Pro is better for complex judgment and fallback work:

multi-round planning;
complex Agent workflows;
multiple function calls;
fact checking;
financial research;
content production that requires stronger knowledge and judgment;
high-risk code changes.

A reasonable setup is: Flash handles volume, Pro handles fallback. Start ordinary tasks with Flash, then switch to Pro for long-horizon planning, complex judgment, fact checking, or multi-tool collaboration. This keeps cost under control while preserving model quality.

Why DeepSeek Can Price This Way

DeepSeek has a different business structure from many large platforms. It does not have e-commerce, social networking, short video, cloud computing, phones, cars, office suites, operating systems, browsers, or a large enterprise SaaS ecosystem.

That means it does not need to lock users into a complete platform. It can simply sell text model capability: use cheap text models here, and call any other capability elsewhere.

Large platforms usually think differently. If you buy their Coding Plan or Token Plan, you are pulled into their cloud, search, image generation, voice, database, and developer-tool ecosystem. The plan is not merely selling the model; it is competing for the user entry point.

DeepSeek’s approach is more direct: push text model prices down and try to become the default model entry point for Agents. Once the default entry point is occupied, many developers and toolchains will naturally adapt around it.

Open Models and the Default Entry Point

If DeepSeek V4 keeps an open model route, third-party cloud vendors and platforms may deploy it themselves and provide services. For DeepSeek, that is both distribution and potential diversion.

This is where a low-price official API matters. If the official price is already low enough, other platforms will struggle to offer an obvious price advantage even if they can deploy the model. Users will tend to use the default, cheap, stable entry point directly.

This is especially true for Agent tools. Agent tasks depend on long context, caching, tool calls, and stable throughput. Once a model is cheap enough in these scenarios, it has a chance to become the default option.

Coding Plans Are Still Not Useless

This does not mean Coding Plans will disappear immediately. They still fit some users.

If some users are truly heavy users who max out their quota every day, a fixed subscription may still be economical. Just like a buffet, if nobody could ever eat enough to get their money’s worth, users would not buy it.

The problem is that most users are not that kind of extremely high-frequency user. Low-frequency users, lightweight developers, and people who occasionally write scripts or modify projects are better suited to pay-as-you-go. After DeepSeek lowers pay-as-you-go costs, the appeal of plans weakens.

The future is more likely to become a layered choice:

heavy high-frequency users keep buying Coding Plans;
ordinary users move to low-cost APIs;
Agent tools automatically choose Flash / Pro according to the task;
platform plans need to provide more non-model value, such as workflows, IDE integration, deployment, team management, and security auditing.

Summary

DeepSeek V4 did not create its biggest impact through benchmarks. What truly changed industry expectations was the price reduction that followed.

When input tokens and cache-hit pricing are pushed very low, the cost of using AI Agents changes. Long context, code-project analysis, and multi-round tool calls that used to look expensive may now become everyday costs of a few cents to a few yuan.

This directly challenges the business logic of Coding Plans and Token Plans. If users can pay by usage, freely combine models and tools, and keep costs low enough, they may not want to be tied to a specific platform plan.

What DeepSeek V4 truly touches this time is not only the ranking of model capability, but the cost structure of AI Agents and the battle for the default entry point.

References:

mattpocock/skills: A Practical Skill Collection for AI Coding Agents

Fri, 01 May 2026 03:43:20 +0800

mattpocock/skills is a public collection of AI coding agent skills from Matt Pocock.

It is not a full application, nor a new chat client. It is a set of working skills that can be used by AI coding assistants. The idea is practical: break common AI coding problems into small skills that an Agent can call in the right task, instead of relying on one huge prompt every time.

If you often use Claude Code, Codex, Cursor, or similar AI coding tools, this kind of skills collection is worth watching. What really affects the AI coding experience is often not whether the model can write code, but whether it can move through the task in your preferred working style.

What Problem It Solves

AI coding assistants are powerful, but they can easily go wrong.

Common situations include:

Starting code changes before understanding the requirement
Modifying too many files at once
Producing lots of explanation but little useful action
Blindly trying things after errors
Not running tests or checks in time
Ignoring existing project patterns
Introducing unnecessary abstractions to finish a task
Writing code without truly reviewing risks afterward

These problems are not always caused by weak model capability. Often, the workflow is not constrained well enough.

The value of mattpocock/skills is that it turns these common failure modes into reusable operating methods, making the Agent behave more like an experienced engineering collaborator in different scenarios.

What Are Skills

In the AI Agent context, a skill can be understood as a reusable task instruction, working method, or professional workflow.

It does not have to be a code plugin, and it does not always need to call an external service. In many cases, a skill is simply a clear set of rules:

When to use it
What to do first
What not to do
What output is required
How to judge task completion

This is somewhat like a normal prompt template, but the granularity is closer to a task capability.

Normal prompt templates are usually copied and pasted manually by the user. Skills are better as part of an agent toolbox, allowing the Agent to choose the right workflow for the task.

Why Small and Composable Matters

The README emphasizes that these skills are small and composable.

This direction matters.

If one skill tries to handle everything, it quickly becomes a new giant prompt: long, vague, and hard to maintain. The advantage of small skills is clear boundaries.

For example, one skill can focus on:

Planning first
Fixing TypeScript errors
Running tests and fixing based on results
Doing code review
Summarizing project conventions
Improving prompts
Removing unnecessary abstractions

These skills can be combined according to the task. A simple task may need only one skill, while a complex task can chain several together.

This is closer to real engineering work. You do not use the same workflow for every problem; you choose tools according to the situation.

Keeping the Engineer in Control

One important direction of this repository is keeping the engineer in control.

AI coding can easily slide into two extremes.

The first is fully manual. AI only helps write a few lines of code, while all context, planning, and verification still depend on you.

The second is fully hands-off. You throw a task to an Agent, let it change a lot of things, and then face a diff that is hard to review.

Skills help find a more stable middle position.

They let AI take on more repetitive workflow, while still constraining it with rules:

Understand the task before acting
Read relevant files before editing
Keep the modification scope controlled
Report uncertainty
Verify after changes
Do not refactor unrelated code just to show off

This does not weaken AI. It makes AI actions easier for humans to review and take over.

Alignment Problems

The first kind of AI coding failure is often alignment failure.

The user wants a very specific change, but the Agent may understand it as a larger refactor. The user only wants a bug fixed, but it changes styles along the way. The user wants existing architecture to be followed, but it introduces a new pattern.

Skills can help the Agent do several things at the start of a task:

Restate the goal
Identify the impact scope
Recognize existing implementation patterns
Provide a plan
Clarify what will not be done

This step is like an engineer’s self-check before starting work.

If the Agent cannot clearly state the task boundary and starts writing code directly, it is easy for the task to drift.

Feedback Loop Problems

AI should not write code through one-shot generation alone.

In real development, feedback loops matter:

Change a small piece
Run tests or type checks
Read the errors
Fix them
Verify again

Many Agents fail because they skip the middle feedback. They change many things at once and then summarize from intuition that “it should work.”

Skills can make the feedback loop explicit. For example, they can require the Agent to:

Run relevant checks after modification
Read error messages first if checks fail
Avoid blindly changing unrelated files
Re-verify after each round of fixes
Report final verification results

This makes AI coding more like real debugging and less like one-shot writing.

Architecture Control Problems

AI is good at generating abstractions, and also good at over-generating abstractions.

To complete a small requirement, it may create a service layer, helper functions, configuration objects, type wrappers, and adapters, making the code much more complex than the requirement itself.

This is especially dangerous in large projects. AI-generated abstractions often look “professional,” but they may not match existing project style and may increase maintenance cost.

Good skills remind the Agent to:

Prefer existing patterns
Avoid unnecessary new abstractions
Avoid refactoring unrelated areas
Match the change to the size of the task
Understand the code before designing structure

This reduces output that looks engineered but is actually harder to maintain.

Why Review Skills Matter

Writing code and reviewing code are different states.

When an Agent writes code, it usually tends to prove that its implementation works. It may explain why the change should work, but it does not always actively look for risks.

The purpose of a review skill is to switch the Agent’s role:

Find potential bugs
Find behavior regressions
Find missing tests
Find edge cases
Find increased complexity
Find inconsistencies with existing conventions

This matters for AI coding because AI generates code quickly. Without review, users can easily be overwhelmed by large diffs.

A good review output should list issues first, not praise the implementation first. It should help the engineer decide whether the change can be merged.

Difference from Normal Rules Files

Many AI coding tools support rules, instructions, or memory.

These files usually record long-term rules, such as:

Project tech stack
Naming conventions
Test commands
Directories not to modify
Answer style preferences

Skills are more focused on task workflow.

Rules tell the Agent “how to behave in the long term,” while skills tell the Agent “how to execute this kind of task.”

The two work best together.

For example, rules can say the project uses pnpm test, while a review skill requires checking test coverage after changes. Then the Agent knows not only the command, but also when to use it.

Suitable Scenarios

Repositories like mattpocock/skills are suitable for:

Frequent use of AI coding tools
Agents working on real codebases
Reducing out-of-scope AI edits
Making the Agent verify results more actively
Turning your engineering habits into skills
Learning how others design agent workflows
Turning temporary prompts into a maintainable skill collection

If you only occasionally ask AI to write a small function, you may not need to maintain skills.

But if you already treat AI as a long-term development partner, skills become increasingly important. They are like a reusable working method for the Agent.

How to Learn from This Repository

Even if you do not use every skill directly, you can learn several things from this repository.

First, write down failure modes.

Do not only complain when AI makes a mistake. Turn the patterns it often gets wrong into rules, so a skill can prevent them next time.

Second, keep skills short.

One skill should solve one clear problem. The shorter it is, the easier it is to call correctly and maintain.

Third, make output format clear.

If you want the Agent to list a plan first, execute next, and summarize verification results at the end, write that structure clearly. Vague requirements usually produce vague results.

Fourth, keep human handoff points.

A good skill should not let AI run too far alone. When there is uncertainty, expanded impact scope, failing tests, or a product decision, it should stop and explain the situation.

Notes for Use

First, do not turn everything into a skill.

Too many skills make the system complex, and the Agent may not know which one to choose. Start with the highest-frequency and most painful scenarios.

Second, skills need iteration.

The first version of a skill may not be good. Watch how AI actually executes it, then gradually delete, add, and rewrite.

Third, do not let skills replace engineering judgment.

Skills can improve workflow, but they cannot guarantee correct implementation. Tests, review, build checks, and human judgment still matter.

Fourth, pay attention to differences between Agents.

Claude Code, Codex, Cursor, and Copilot support instructions, skills, and rules differently. The same idea can be reused, but the specific format should be adjusted for each tool.

Reference

mattpocock/skills

Final Thought

What makes mattpocock/skills worth watching is not one magic prompt inside it, but the practical AI coding idea it demonstrates: break engineering experience into small skills, then let the Agent combine them by scenario.

As AI coding moves from occasional assistance into daily workflow, skills become important tools for constraining Agents, keeping engineers in control, and improving feedback quality.

free-claude-code: Connecting Claude Code to OpenRouter, DeepSeek, and Local Models Through a Proxy

Fri, 01 May 2026 03:41:49 +0800

free-claude-code is an Anthropic-compatible proxy for Claude Code.

Its idea is not to crack Claude Code, nor to provide an official free Claude service. Instead, it starts a local proxy service that looks like an Anthropic API, then forwards requests from Claude Code to other model backends. The README mentions backends such as NVIDIA NIM, OpenRouter, DeepSeek, LM Studio, llama.cpp, and Ollama.

In simple terms, it solves this problem: you like the terminal experience of Claude Code, but want to send model requests to another provider or a local model.

What Problem It Solves

Claude Code has an interaction model that works well for development tasks.

It can read code, edit files, run commands, and move tasks forward based on project context inside the terminal. But many users may not always want to use the same model backend:

They want to try different models on OpenRouter
They want to use models such as DeepSeek to reduce cost
They want to route requests to local Ollama
They want to run local models through LM Studio or llama.cpp
They want one proxy entry point in the development environment
They want to compare different models inside the Claude Code workflow

free-claude-code is positioned as a compatibility layer between Claude Code and these model services.

Claude Code still sends requests in an Anthropic-like style, while the proxy adapts those requests to different backends.

How It Works

You can think of it as three layers:

The frontend is Claude Code
The middle layer is the free-claude-code proxy
The backend is OpenRouter, DeepSeek, a local model, or another model service

Claude Code believes it is accessing an Anthropic-compatible API.

After the proxy receives a request, it selects a target provider according to configuration, transforms the necessary fields, and returns the response to Claude Code.

The benefit of this structure is that you do not need to modify Claude Code itself, and you do not need every model service to natively support Claude Code. As long as the proxy can align the interfaces, more models can be connected to the same workflow.

Supported Backends

The README lists these directions:

NVIDIA NIM
OpenRouter
DeepSeek
LM Studio
llama.cpp
Ollama

These backends represent different usage styles.

OpenRouter is more like a model aggregation entry point, useful for testing different commercial and open-source models.

DeepSeek is suitable for people who care about Chinese ability, coding ability, and cost.

LM Studio, llama.cpp, and Ollama are more local-model oriented. They are suitable for running models on your own machine or inside an intranet, reducing dependence on external APIs and making offline experiments easier.

NVIDIA NIM is more oriented toward enterprise and GPU inference deployment scenarios.

Why an Anthropic-Compatible Proxy

Claude Code was originally designed around Anthropic interfaces and model conventions.

If you want to connect it to other models, the most direct problem is interface mismatch:

Request fields differ
Model names differ
Streaming formats differ
Tool use is represented differently
Error response formats differ
Token and context limits differ

This is where the proxy layer is useful.

It keeps the interface seen by Claude Code close to the Anthropic shape, then adapts to the backend. For users, after configuring the proxy once, they can test different models inside the same Claude Code workflow.

Suitable Scenarios

free-claude-code is suitable for:

Using the Claude Code terminal workflow
Testing non-Anthropic models in Claude Code
Reducing model calling costs
Connecting Claude Code to OpenRouter
Connecting to compatible model services such as DeepSeek
Running local models through Ollama, LM Studio, or llama.cpp
Giving a team one unified model proxy entry point

If you only use official Claude Code normally and have no special needs around providers, cost, or local deployment, you may not need this type of proxy.

But if you often compare models, or want Claude Code to connect to local and third-party models, this type of tool is useful.

Difference from Directly Using OpenRouter or Ollama

Using OpenRouter, Ollama, or LM Studio directly usually means chatting with a model or calling it through an API.

The point of free-claude-code is not to replace those services, but to connect them to the Claude Code development workflow.

The difference is:

You still use the Claude Code terminal experience
AI can execute tasks around a code repository
The model backend can be changed to another provider
Local models can enter the Claude Code workflow
Configuration is centralized in the proxy layer instead of changed in each tool

So it is more like a bridge than a new chat client.

Notes About Local Models

Connecting Claude Code to local models is attractive, but there are real limitations.

First, model capability differs.

Claude Code tasks are usually not just chat. They include understanding code, planning modifications, editing files, and handling command output. Smaller local models may not complete these tasks reliably.

Second, context window matters.

Code tasks need a lot of context. If the model context is too small, it may fail to read full files, miss constraints, or lose background across multi-turn tasks.

Third, tool use compatibility matters.

Claude Code workflows depend on tool calls and structured behavior. Even if a backend model can chat, it may not follow tool-use protocols well.

Fourth, speed and hardware matter.

Local model speed depends on machine configuration, quantization, and model size. If code tasks respond too slowly, the experience drops noticeably.

So local models are better for experiments, low-risk tasks, and specific scenarios. For truly complex coding tasks, choose carefully according to model capability.

Usage Boundaries

Projects like this are easy to misunderstand from the title, so the boundaries should be clear.

First, it is not an official free Claude Code quota.

It only forwards Claude Code requests to other model backends. When using OpenRouter, DeepSeek, NVIDIA NIM, or other APIs, you still need to follow the pricing, quotas, and terms of the corresponding services.

Second, it is not a tool for bypassing authorization.

When using any proxy tool, you should follow the licenses and terms of Claude Code, model providers, and the project itself. Do not interpret it as a way to avoid official restrictions.

Third, the proxy handles your request content.

Code, command output, and project context may pass through the proxy and backend services. When deploying, consider logs, keys, network boundaries, and privacy. For company code or sensitive projects, use a controlled environment.

Fourth, model performance varies greatly.

The same Claude Code operation may behave very differently after switching models. Do not assume every model can replace Claude.

Relationship with Proxies Such as LiteLLM

Conceptually, free-claude-code belongs to the category of compatible interface proxies.

The shared goal of such tools is to reduce coupling between upper-level applications and lower-level model services. The upper-level application faces a relatively unified interface, while backend providers can be switched by configuration.

Different projects focus on different areas. Some are general model gateways, some focus on OpenAI-compatible APIs, and some specifically adapt tools such as Claude Code.

What makes free-claude-code worth noting is that it puts Claude Code directly at the center, rather than building a generic chat proxy.

Suitable Users

It is better suited to users who are comfortable tinkering:

Familiar with Claude Code
Know how to configure API keys and model providers
Understand proxy service startup and environment variables
Can troubleshoot network, port, model name, and streaming issues
Want to compare different models on coding tasks

If you only want something that works out of the box, the official configuration is usually simpler.

If you are willing to set up a proxy, switch models, tune parameters, and let Claude Code enter more model environments, this project is worth studying.

Reference

Alishahryar1/free-claude-code

Final Thought

The value of free-claude-code is not in the word “free,” but in the bridge it builds between Claude Code and more model backends.

When you want to keep the Claude Code development experience while testing OpenRouter, DeepSeek, local models, or enterprise inference services, an Anthropic-compatible proxy like this becomes useful.

Compound Engineering Plugin: Turning AI Coding into a Plan, Execute, Review Engineering Loop

Fri, 01 May 2026 03:15:39 +0800

Compound Engineering Plugin is an open-source AI coding workflow plugin from Every Inc.

It is not focused on “making AI write a piece of code faster.” Instead, it places AI coding inside a loop that looks more like an engineering team: plan first, implement next, review afterward, then preserve what was learned. For people who frequently use tools such as Claude Code, Codex, Cursor, and Copilot, this kind of plugin solves a workflow problem, not just a prompt problem.

AI coding tools are becoming stronger, but in real projects the hardest part is often not generating code. It is making the AI continuously follow project rules, understand task boundaries, avoid repeating mistakes, and accumulate context across multiple iterations.

What Problem It Solves

Many people use AI coding assistants in a flow like this:

Describe the requirement directly
Ask AI to modify the code
Check whether the result runs
Add more explanation after errors appear
Explain the background again in the next task

This can work for small tasks, but it easily breaks down in complex projects:

Requirements are not clarified before AI starts editing
There is no systematic review after code changes
Project conventions depend on repeated user reminders
Similar mistakes happen again next time
Multiple Agent tools lack a shared working method
Experience is not turned into reusable rules

Compound Engineering Plugin is designed for this class of problems. It splits AI coding into multiple stages, so an Agent is not only executing commands but participating in a more complete engineering process.

What Is Compound Engineering

From the project README, Compound Engineering can be understood as a method for AI-assisted software development.

It emphasizes a loop:

Plan: understand the goal, split the task, confirm the path
Execute: modify code according to the plan, run commands, handle problems
Review: check implementation quality, risks, and test coverage
Learn: preserve experience as reusable rules for future work

This loop resembles how real engineering teams work.

A reliable engineer does not receive a requirement and immediately make random changes, nor does he finish edits and hand them off without checking. He first judges the impact scope, then implements, then checks risks and test results, and finally records the traps he stepped into. AI Agents need similar constraints.

Why a Plugin Is Needed

A prompt can tell AI, “Please plan before executing,” but prompts themselves are not always stable.

Once a conversation becomes long and context becomes complex, the model may skip planning, ignore rules, or become overconfident in order to finish the task. The value of a plugin is that it fixes the workflow so different Agent environments can follow similar methods.

This kind of plugin usually breaks a workflow into commands, rules, templates, or subflows. The user does not need to manually write the full prompt every time. Instead, a fixed entry point triggers a specific stage.

For example:

Ask the Agent to generate a plan first
Implement step by step according to the plan
Trigger review after edits
Return to fixing after problems are found
Write useful experience into memory or rules

This makes AI coding feel more like controlled collaboration instead of one-off chat.

Supported Agent Environments

The README mentions support for multiple AI coding environments, including:

Claude Code
Codex
Cursor
GitHub Copilot
Amp
Factory
Qwen Code

This is worth noting.

Many workflow tools are tied to one client. Once you switch tools, the rules cannot be reused. Compound Engineering Plugin is more like a cross-Agent engineering method, bringing similar planning, execution, and review workflows to different tools.

If you use multiple AI coding assistants at the same time, this unified workflow becomes more valuable. Different tools have different capabilities, but project conventions, review habits, and task decomposition methods should remain as consistent as possible.

Why the Planning Stage Matters

The value of the planning stage is to stop AI from acting too early.

In complex tasks, the truly important questions are usually:

Which files need to change?
Which modules may be affected?
What existing pattern should be followed?
Are there tests?
Where are the risks?
Should documents be read first?
Can the task be split into smaller steps?

If an Agent starts writing code before thinking through these questions, it can easily produce an implementation that looks finished but deviates from the project structure.

A plan does not need to be long. A good plan should be short, specific, and executable. Its purpose is not to create documentation, but to give the following implementation clear boundaries.

What to Avoid in Execution

When AI executes coding tasks, several problems appear easily:

Refactoring unrelated code
Overwriting existing user changes
Only handling the happy path
Ignoring error handling
Not following the existing project style
Not running necessary verification
Blindly trying things after errors

A workflow plugin cannot guarantee these problems will disappear, but it can reduce their probability through rules and staged constraints.

For example, the execution stage can require the Agent to proceed according to the plan. When it discovers something outside the plan, it should explain the risk first. When modifying shared modules, it should add tests or at least run related verification.

This is especially important in large codebases. The faster AI writes code, the more process is needed to constrain its momentum.

Why Review Matters

Many AI coding failures are not caused by code that cannot run at all. They come from detail problems:

Edge cases are not handled
State updates are inconsistent
API contracts are changed quietly
Tests do not cover key paths
Error messages are unclear
Performance or security risks are not mentioned

The review stage switches the Agent from “author mode” to “reviewer mode.”

Author mode tends to justify its own implementation. Reviewer mode should actively look for holes, regression risks, and missing tests. Separating these two stages is more reliable than asking the same response to both implement and self-review.

For users, review output is also more valuable. It helps you quickly judge whether the change is ready to merge or still needs rework.

The Meaning of Learning and Memory

The word “Compound” in the project name suggests an important idea: engineering experience should compound.

If AI fixes a mistake only for the current task and then repeats the same mistake next time, the productivity gain is limited. A better approach is to preserve useful experience:

Directory conventions in this project
Debugging methods for a class of errors
Test commands and notes
Generated files that should not be touched
Code style preferences
Common implementation patterns

These experiences can become rules, memories, documents, or templates. In later tasks, the Agent reads these accumulated notes before starting work.

This is the key to moving AI coding from “one-off Q&A” toward “long-term collaboration.”

Suitable Scenarios

Compound Engineering Plugin is suitable for:

Long-term use of AI Agents for coding
Projects that receive many rounds of modifications
Teams that want AI to plan before implementing
Users who want review thinking after changes
Teams that want a unified AI coding workflow
People who use Claude Code, Codex, Cursor, and other tools at the same time
Teams that want to turn project experience into reusable rules

If you only occasionally ask AI to write a small script, the full workflow may feel heavy.

But if you treat AI coding assistants as daily development partners, the plan, execute, review, learn loop becomes clearly useful.

Difference from Normal Prompt Templates

Normal prompt templates usually solve “how to state the task clearly.”

For example:

Please think step by step
Please read the files first
Please keep code style consistent
Please run tests
Please summarize the changes

These prompts are useful, but they still rely on the user using them correctly every time.

Compound Engineering Plugin operates more at the workflow layer. It organizes these requirements into a repeatable process and adapts them to different Agent tools. You are not writing prompts from scratch every time; you are moving tasks through a workflow.

Simply put, a prompt template is like a reminder, while a workflow plugin is like a system.

Notes for Use

First, do not let the process become a burden.

Small tasks do not always need a full plan and long review. A good workflow should adapt to task complexity: handle simple problems quickly and use the full loop for complex ones.

Second, review cannot replace tests.

Agent review can find many problems, but it can still miss real runtime errors. Final judgment still depends on tests, type checks, build results, and human review.

Third, rules need continuous cleanup.

Preserving experience is important, but rules can become noise as they accumulate. Outdated rules, duplicate rules, and temporary experience that only applied to one task should be cleaned up regularly.

Fourth, cross-tool consistency does not mean everything is identical.

Claude Code, Codex, Cursor, Copilot, and other tools have different capabilities and interaction models. What should be unified is the working method, not necessarily every command or configuration detail.

Suitable Teams

If a team already allows AI Agents to modify real code, it is not enough to discuss only “which model is stronger.”

The more important questions are:

Does AI understand the task before editing?
Does AI follow project boundaries during editing?
Does AI actively review risks after editing?
Can AI learn from historical mistakes?
Does the team have unified Agent usage conventions?

This is where projects such as Compound Engineering Plugin matter. They move AI coding one step away from personal tricks and toward reusable team workflow.

Reference

EveryInc/compound-engineering-plugin

Final Thought

What makes Compound Engineering Plugin worth watching is not that it adds another AI coding command, but that it organizes AI coding into an engineering workflow that can improve over time.

When AI Agents start participating in real projects, planning, execution, review, and experience preservation become more important than one-off code generation.

Claude Code Hooks Mastery: An Introduction to 13 Hook Lifecycle Events and Automation Control

Fri, 01 May 2026 03:11:27 +0800

claude-code-hooks-mastery is a learning project focused on Claude Code Hooks.

It is not just a collection of scattered scripts. It explains the Claude Code hook lifecycle, configuration methods, script patterns, and common automation scenarios in one place. For people who want Claude Code to be more controllable and more like an engineering assistant, this kind of material is worth reading.

Claude Code can already read code, edit files, and run commands by default. But if you want it to automatically check permissions, block risky operations, inject project rules, run tests, or remind it of team conventions at specific moments, chat instructions alone are not stable enough. The value of hooks is that they turn “rules I need to remind the AI about every time” into executable workflow.

What Problems Hooks Solve

After using Claude Code for a while, common pain points include:

Every new session needs the same project rules repeated
You worry that it may run commands it should not run
You want checks before and after file edits
You want formatting, tests, or security scans before committing
You want team conventions as fixed workflow instead of verbal reminders
You want context before and after tool calls for logging or blocking
You want complex tasks to trigger subagents or dedicated scripts

Hooks are designed for these “automatic actions at fixed moments.”

You can think of them as event hooks in the Claude Code workflow. When a session starts, a user submits a prompt, the model is about to call a tool, a tool call finishes, or an agent is about to stop, Claude Code can run the scripts you configured.

The 13 Hook Lifecycle Events

One of the main points in the project README is that it systematically covers the 13 Claude Code hook events.

These events span multiple stages, from session startup to tool calls, and from user input to agent termination. By purpose, they can be roughly grouped as:

Session startup: initialize environment and inject project context
User input: inspect prompts, add rules, and perform auditing
Before tool calls: permission checks, command blocking, and security validation
After tool calls: log results, trigger formatting, and run verification
Task ending: summarize, clean up, notify, or save state

This lifecycle design means you do not need to put every rule into one very long prompt.

For example, permission control should happen before tool calls. Formatting checks are better after file edits. Project rule injection is better at session startup or after user input. Putting rules at the right hook point is usually more reliable than stuffing everything into a system prompt.

Where Configuration Lives

Claude Code hooks are usually configured through settings files.

Common locations include:

User-level configuration: ~/.claude/settings.json
Project-level configuration: .claude/settings.json

User-level configuration is good for personal preferences, such as general security rules, command blocking, and log paths.

Project-level configuration is better for repository-specific rules, such as which tests must run, which directories cannot be edited, how generated files are handled, and which checks are required before commit.

If you use Claude Code in a team, it is better to put project-level configuration into the repository. That way everyone opens the project with the same AI collaboration constraints instead of relying on personal memory.

Why Single-File Scripts Matter

The project emphasizes UV single-file scripts.

The benefit is simple deployment. A single Python file can declare dependencies and run without maintaining a complex environment for one hook. This fits hooks well because many hooks only do one small thing:

Check whether a command is allowed
Determine whether a file path is safe
Read project rules and return them to Claude
Scan output for sensitive information
Run formatting or tests after edits
Write events to logs

The smaller a hook script is, the easier it is to maintain, and the less likely it is to become a new complicated system.

What Automation Can Hooks Do

claude-code-hooks-mastery shows many directions. In real work, the most common ones are below.

1. Permission and Security Control

This is the most direct use of hooks.

Before Claude Code executes a command, a hook can inspect the command content. If it contains high-risk actions such as deletion, reset, cleanup, or overwrite, it can block execution or require manual confirmation.

Similar rules can apply to file paths:

Do not modify production configuration
Do not write to secret files
Do not delete migration scripts
Do not touch specific directories
Do not run unapproved network commands

Putting this protection before tool calls is more reliable than writing “do not perform dangerous operations” in a prompt.

2. Context Injection

Many projects have fixed background information:

Tech stack
Coding conventions
Test commands
Branching strategy
Directory structure
Prohibited actions
Rules for generated files

Telling Claude Code this manually every time is annoying and easy to forget. Hooks can automatically inject necessary context at session startup or after the user submits a prompt.

This is like giving Claude Code a project-level work manual. It does not replace the README or development documentation, but it helps AI enter the correct state before executing a task.

3. Verification After Edits

After Claude Code modifies files, hooks can automatically trigger checks.

Common actions include:

Run formatting
Run lint
Run unit tests
Check type errors
Scan generated files
Validate Markdown or JSON format

This helps reduce low-level mistakes. When AI edits multiple files, a lightweight verification pass after modification can reveal problems earlier.

However, hooks should not run heavy tasks by default. Running the full test suite after every file change can make the experience slow. A better approach is to choose checks based on file type, directory, and task risk.

4. Team Rule Validation

If a team already has clear conventions, some of them can be placed in hooks.

For example:

Commit message format
Code style rules
Do not directly edit certain generated files
Documentation must be updated together
API changes must update tests
Certain directories can only be generated by specific tools

This makes Claude Code more like part of the team workflow rather than an unconstrained external assistant.

Of course, hooks should not replace CI. They are better for local reminders and early blocking. Final validation should still belong to CI, review, and test systems.

5. Subagents and Dedicated Tasks

The README also mentions subagent-related content.

This type of usage is suitable for sending complex tasks into more specialized workflows. For example, the main conversation can understand the requirement, while a hook or configuration triggers dedicated checking, auditing, summarizing, or documentation tasks.

For individual users, the first useful step is not complex agent orchestration. It is better to hand repetitive, clear, low-risk actions to hooks first. More complex automation can come after the rules become stable.

Statusline and Output Styles

The project also covers statusline and output styles.

This may look like a small experience detail, but it matters for long-term Claude Code usage. A statusline can show current context, task state, environment information, or hints. Output styles can make Claude Code answers fit your working habits better.

If you collaborate with AI in the same terminal every day, these details affect efficiency. Good status hints reduce mistakes and help you quickly determine whether the current session is in the right project, branch, and environment.

Do Not Make Hooks Too Heavy

Hooks are powerful, but they are not the place to put everything.

Good rules are:

High-frequency actions should be fast
Security blocking should be clear
Output should be short
Failure reasons should be readable
Scripts should have a single responsibility
Heavy checks should be explicit commands or CI tasks

If a hook takes more than ten seconds every time, users will soon want to disable it. If a hook has vague blocking rules, both Claude Code and the user will struggle to understand what to do next.

Hooks are best for tasks with clear boundaries: allow or reject, add context, log events, run lightweight checks, and suggest the next step.

Who Should Use It

If you only occasionally ask Claude Code to edit a small piece of code, you may not need to study hooks deeply yet.

But this project is useful if you:

Use Claude Code frequently
Often let AI modify real project code
Worry about AI running dangerous commands
Want to automatically inject team rules into AI workflows
Want checks to run automatically after edits
Want to turn repeated reminders into configuration
Are building a more stable AI coding workflow

Hooks are especially meaningful in collaborative projects. They can turn part of team experience into scripts instead of relying on every person to remind AI manually.

Notes for Use

First, start with security hooks.

Compared with complex automation, command blocking, path protection, and sensitive file checks are easier to implement and immediately reduce risk.

Second, commit project-level rules carefully.

.claude/settings.json affects everyone who uses the repository. Before committing rules, make sure they do not over-restrict normal development or depend on paths that only exist on your machine.

Third, keep hook output concise.

Claude Code consumes this output. If it is too long, it pollutes the context. If it is too vague, it does not guide the next step. It is best to return only the necessary judgment and next recommendation.

Fourth, keep hooks debuggable.

When hooks increase in number, problems can come from configuration, scripts, permissions, paths, dependencies, or Claude Code itself. Clear logs make later debugging much easier.

Reference

disler/claude-code-hooks-mastery

Final Thought

The value of Claude Code Hooks is turning “rules I hope AI remembers every time” into workflows that actually execute.

If you already use Claude Code in real projects, hooks are a key step from “a coding assistant that can chat” toward “a constrained engineering collaborator.”

Claude-Mem: Adding Cross-Session Long-Term Memory to Claude Code

Fri, 01 May 2026 03:01:02 +0800

Claude-Mem is a persistent memory system for Claude Code.

It tries to solve a very specific problem: every time an AI coding assistant starts a new session, it often forgets earlier architecture decisions, past pitfalls, project preferences, and implementation context.
If a project lasts for a long time, repeatedly explaining the same background becomes a waste of time.

The idea behind Claude-Mem is to compress Claude Code conversations into memories, store them in a local database and vector store, and then retrieve them later through a search tool.

What Problem Does It Solve?

Claude Code is good at code tasks, but session context is still limited.

Common pain points include:

A new session does not know what previous sessions did
Project design decisions need to be explained repeatedly
Problems that were already debugged are easy to repeat
Long-running tasks lack continuity
Project knowledge is hard to accumulate across conversations

Claude-Mem is designed around these problems.

It is not simply saving chat logs. Instead, it compresses conversations into memory fragments that are easier to retrieve. When needed later, semantic search can bring the relevant context back.

How It Works

From the README design, Claude-Mem mainly consists of several parts.

The first part is hooks.

It integrates with the Claude Code session flow and captures conversation data at the right time.

The second part is a background worker.

The worker processes raw conversation content into shorter, more searchable memories.

The third part is local storage.

The project uses SQLite for structured metadata and Chroma for vector indexing. This preserves basic session information while supporting semantic retrieval.

The fourth part is mem-search.

This is the query entry point for Claude Code. When old context is needed, it can search relevant memories through this tool.

The overall flow can be understood like this:

Claude Code sessions generate content
Hooks capture session data
The worker asynchronously compresses and organizes it
Memories are written to SQLite and Chroma
Later sessions retrieve them through mem-search

When Is It Useful?

Claude-Mem is suitable for long-running projects, not one-off small tasks.

For example:

A repository is developed over many days
The code structure is complex and has a lot of background
Project conventions, naming habits, and architecture choices need to be remembered
Claude Code is often used for bug fixes, features, and documentation
You want the AI to remember why something was changed earlier

If you only ask Claude Code to make a one-line change, long-term memory is not very meaningful.
But if you treat Claude Code as a long-term collaborator, it becomes useful.

Installation and Startup

The README gives a direct installation flow:

1
2

npm install -g claude-mem
claude-mem install

Start it with:

`1`	`claude-mem start`

Check status:

`1`	`claude-mem status`

Stop it when needed:

`1`	`claude-mem stop`

The goal behind these commands is to connect the memory system as a long-running local service to the Claude Code workflow.

How to Use `mem-search`

mem-search is the key entry point for retrieving memory.

It is not meant to replace ordinary search. It lets Claude Code query past conversations by meaning.

For example, Claude Code can search for:

Why a module was designed in a certain way
How a bug was debugged earlier
Naming rules agreed on in the project
Technical trade-offs discussed before
The background behind a refactor

This is different from simple keyword search.
If memory compression and vector indexing work well, you can retrieve semantically related content even if you do not remember the exact wording.

How Is It Different from Project Documentation?

Project documentation is good for stable conclusions.

For example:

Architecture notes
Deployment procedures
API conventions
Database structure
Development rules

Claude-Mem is better for context created during conversations.

For example:

Why a plan was rejected
How a temporary issue was worked around
The discussion behind an implementation
Project preferences not yet written into docs
Task background accumulated across multiple conversations

The two are not replacements for each other.
A good workflow is to write stable knowledge into project docs and use the memory system to help retrieve conversational context.

Things to Watch Out For

First, more long-term memory is not always better.

If every conversation is saved without distinction, later retrieval can become noisy. The most valuable memories are project decisions, implementation background, debugging history, and long-term preferences.

Second, memory cannot replace code and documentation.

Old context found by AI is only a reference. Final judgment still depends on the current code, test results, and latest requirements.

Third, pay attention to privacy and local data.

Since it stores conversation content, you should know which projects are suitable for it and which sensitive information should not enter the conversation.

Fourth, memory systems need maintenance.

As a project moves forward, old memories may become outdated. If outdated context is reused incorrectly, it can mislead later tasks.

Why This Kind of Tool Matters

AI coding tools are moving from one-off Q&A toward long-term collaboration.

In one-off Q&A, the model only needs to answer the current question.
In long-term collaboration, it needs to know project history, earlier decisions, team preferences, and pitfalls that have already been found.

This is where tools like Claude-Mem matter: they turn “remembering context” from a temporary chat capability into a local system that can be installed, run, and searched.

For real engineering projects, this is more practical than simply making the model context window longer.
Much information does not need to be stuffed into context all at once; it needs to be retrieved at the right time.

Who Should Try It?

You may want to try it if:

You use Claude Code frequently
You often work on the same project across multiple days
The project context is complex
You repeatedly explain the same background to AI
You want to preserve experience from conversations

If you only use Claude Code occasionally, or the project is small, you may not need this kind of system yet.

Reference

thedotmack/claude-mem

Final Thought

The point of Claude-Mem is not “saving chat logs.” It is helping Claude Code retrieve useful context in later tasks.

As AI coding moves from one-off tasks to long-running project collaboration, memory systems will become increasingly important.
They cannot replace documentation and tests, but they can reduce repeated explanations and make the AI feel more like an assistant that understands project history.

Ralph and Multi-Agent Collaboration: How to Keep AI Working Reliably Over Long Tasks

Mon, 27 Apr 2026 08:19:02 +0800

If you have been using coding agents lately, you quickly run into a very practical question: AI can work, sure, but how do you keep it working for hours without drifting, forgetting requirements, or redoing the same work?

That is the real question behind many discussions around Ralph and multi-agent collaboration. The point is not simply to compare which model is stronger. The more useful question is this: how do you design a workflow that lets AI stay stable during long tasks?

If you break the problem down, there are usually two main routes:

The Ralph approach: keep starting fresh sessions and connect context through the filesystem
The multi-agent approach: let a lead agent coordinate while worker agents split the execution

Put more simply, the question is not “which model is more powerful,” but “how do you organize AI so it behaves more like a small team that can keep delivering?”

01 Why Long Tasks Go Off the Rails

In short tasks, many problems stay hidden. You give an instruction, the model reads a few files, changes a few lines, and the job is done.

Once the task gets longer, the common failure modes start to pile up:

Conversations grow longer and context starts to bloat
Earlier requirements get squeezed out by newer information
One agent has to plan, implement, and test at the same time
Without a clear acceptance step, “it is done” often just means “it says it is done”

So when AI runs for a long time, the real challenge is often not single-shot model quality. It is task slicing, state handoff, role separation, and feedback loops.

02 The Ralph Approach: Break Long Tasks into Short Rounds

Ralph is a good fit when the main problem is dirty, overloaded context.

Its core pattern is straightforward:

Keep launching new agent sessions in a loop
Let each round handle only one small enough task
Store cross-round state in files instead of forcing everything into one conversation

The benefit is immediate: every round starts with fresh context, so the session stays more focused and is less likely to get dragged down by old history.

If you have already looked at Ralph-style projects, the structure will feel familiar:

Current tasks live in structured files
Intermediate learnings go into progress files
Code changes stay in git history

In other words, Ralph does not try to make one agent remember everything forever. It externalizes memory on purpose so the session itself can stay lighter.

This kind of setup works especially well when:

The work can already be split into small stories
Each story can fit inside one context window
The project already has tests, typecheck, or other checks

It is a solution to the problem of how to keep AI moving forward one round at a time.

03 The Multi-Agent Approach: Split the Work One Agent Cannot Handle Alone

The other route is multi-agent collaboration.

In this kind of workflow design, the more promising pattern is usually this: the lead agent should not do all the work directly. Instead, it coordinates while other agents handle development, testing, checking, and acceptance.

That differs from Ralph in an important way:

Ralph feels more like serial iteration
Multi-agent work feels more like parallel division of labor

When the task naturally contains different roles, multi-agent collaboration becomes easier to use. For example:

One agent breaks down the task and writes the execution plan
One agent implements the actual change
One agent tests and validates the result
One agent checks whether the result still matches the original goal

The point is not to open more windows for the sake of it. The real value is role separation. Tasks that used to be piled onto one agent can now be split into clearer stages.

Once the role boundaries are clear, several problems become lighter:

The person writing does not have to be the same one reviewing
The testing side does not have to reconstruct the full requirement every time
The lead agent is less likely to drown in implementation detail

This is a solution to the problem of how to make AI cooperate more like a small team.

04 The Real Key Is Not Parallelism, but Task Design

Whether you choose Ralph or multi-agent collaboration, the easiest thing to underestimate is this: workflow design matters more than opening more agents.

If the task split is wrong, adding more agents only parallelizes the confusion.

A more stable breakdown usually has a few traits:

One task maps to one clear objective
One role owns one category of output
Every round has a clear done condition
The output of one round can be consumed directly by the next

For example, instead of giving AI one giant instruction like “build the whole feature,” a steadier structure is often:

Break out requirements and boundaries first
Then split implementation
Then split testing
Then make acceptance its own step

The advantage is that when something goes wrong, it becomes easier to tell whether the problem sits in understanding, implementation, testing, or delivery criteria.

05 Why Acceptance Matters So Much

Many AI workflows fail not because nothing happened earlier, but because the last step lacked a genuinely independent confirmation pass.

In long tasks, there is often a wide gap between “a result was produced” and “the result is actually usable.”

So one especially important direction is to separate development from acceptance. Even without a complex process, it is worth asking at least these questions:

Did it really complete the original task?
Did it only patch the surface without fixing the root cause?
Did testing cover only the happiest path?
Did the upstream requirement get silently changed along the way?

Without that layer, AI can easily keep declaring success inside a long workflow.

06 How to Choose Between the Two

If you want a fast rule of thumb:

If your main pain is context bloat and long-session drift, start with Ralph
If your main pain is one agent wearing too many hats, start with multi-agent collaboration

More specifically:

Ralph fits work that is clear, granular, and easy to move forward round by round
Multi-agent collaboration fits work with strong role boundaries and a need for parallelism and cross-checking

In practice, these two approaches are not always competitors. A mature setup often combines them:

Use a Ralph-style outer loop to push the larger task forward
Use multi-agent collaboration inside each round for research, implementation, testing, and acceptance

That gives you both better control over long context and better collaboration inside a single round.

07 One-Sentence Summary

What makes these approaches worth studying is not that they recommend Ralph or multi-agent collaboration in isolation. It is that they make one practical truth very clear: keeping AI stable over long tasks depends less on the model itself and more on whether you designed context, tasks, roles, and acceptance well.

If you are already asking Claude Code, Codex, or other coding agents to handle longer real-world tasks, this kind of workflow thinking is often more valuable than simply switching to a stronger model.

What Ralph Is: Turning Claude Code and Amp into a Repeatable Autonomous Development Loop

Mon, 27 Apr 2026 08:08:55 +0800

If you have been paying attention to long-running coding agent workflows lately, snarktank/ralph is a project worth a close look. It is not another model wrapper or another chat UI. Instead, it organizes Claude Code or Amp into an autonomous loop that keeps running through stories in a PRD until everything is done.

Its core idea is simple: do not force the same agent to keep working inside an increasingly long and messy context. Start a brand-new AI coding session for every iteration instead. That keeps context from bloating and makes task boundaries much clearer.

01 What Ralph Is

Ralph describes itself very clearly: it is an autonomous AI agent loop that repeatedly runs an AI coding tool until the items in a PRD are complete.

The repository currently supports two tools:

Amp CLI
Claude Code

Each iteration starts a fresh instance. In other words, it does not depend on one endlessly extended conversation. Instead, it keeps memory in external state:

git history
progress.txt
prd.json

That detail matters a lot. When people let an agent run on large tasks, the main problem is often not that the model cannot code. It is that the session becomes heavier over time, starts losing context, forgets requirements, and repeats work. Ralph is designed almost entirely around that problem.

02 How It Works

Ralph’s workflow has three steps.

1. Write a PRD first

The README suggests starting with the bundled prd skill to generate a requirements document and break the feature into smaller stories.

2. Convert the PRD into `prd.json`

Then the ralph skill converts the Markdown PRD into a structured prd.json. That file stores the user stories and whether each one has passed.

3. Run the loop script

The actual execution is handled by ralph.sh. The commands look like this:

1
2

./scripts/ralph/ralph.sh [max_iterations]
./scripts/ralph/ralph.sh --tool claude [max_iterations]

The default is 10 iterations. In each round, Ralph roughly does the following:

Create a branch from branchName
Pick the highest-priority story where passes: false
Implement only that story
Run quality checks such as typecheck and tests
Commit if the checks pass
Update prd.json
Append learnings to progress.txt
Continue to the next round

So Ralph is not trying to finish everything in one go. It compresses work into many small loops that can fit inside a single context window.

03 What Makes Ralph Interesting

1. Every round uses fresh context

This is Ralph’s defining design choice. The README emphasizes that every iteration is a brand-new AI instance, and cross-iteration memory lives only in git, progress.txt, and prd.json.

That is very different from the common pattern of keeping Claude Code or another tool inside one long conversation. Once tasks get larger, that approach often slows down under its own history and gradually loses focus. Ralph accepts that no single round should remember everything, then moves memory into files instead.

2. It forces tasks to stay small

The docs explicitly say that each PRD item must be small enough to finish within one context window. Tasks like adding a filter, updating a server action, or adding a database column are about the right size. Tasks like rebuilding the whole API or creating an entire dashboard are too large.

That constraint is practical. Many autonomous agent loops fail not because the loop is bad, but because the task slicing is too coarse and each round carries too much at once.

3. It preserves learnings, not just code

Beyond progress.txt, the README also stresses updating AGENTS.md. The reason is straightforward: future iterations and future developers will read those notes, so patterns, gotchas, and conventions discovered in each round should be written down in the project itself.

Put differently, Ralph is not only trying to keep an agent coding continuously. It is also trying to help the agent build working memory about the codebase over time.

04 When It Fits Best

Ralph is a good fit when your task looks like this:

It can already be broken into a clear set of user stories
The codebase has reliable feedback loops such as tests, typecheck, or CI
You want the agent to keep moving forward without putting everything into one long conversation
You are fine with iterative progress instead of demanding a one-shot completion

On the other hand, if the requirement is still vague, or the work depends on frequent discussion and constant changes of direction, Ralph may not be the first thing to reach for. It fits better once the requirements are already shaped and execution needs to be steady.

05 How It Differs from Normal Claude Code Usage

With plain Claude Code, the usual pattern is simple: open a session and let it keep reading code, editing files, and running commands. That works very well for small and medium tasks, but larger tasks often hit two problems:

Context keeps growing
Intermediate decisions are harder to preserve in a structured way

Ralph turns Claude Code or Amp into something closer to a batch executor:

The task source is prd.json, not ad hoc chat instructions
Each iteration recognizes only one story
Completion state is written back to files
Learnings go into progress.txt
Code changes are preserved in git

So in practice, it feels less like a new AI assistant and more like an iteration controller added on top of a coding agent.

06 One Important Requirement

Whether Ralph works well depends less on the loop itself and more on the quality of your feedback loops. The README says this very directly: without typecheck, tests, and CI, errors will compound across later iterations.

For frontend tasks, the repository even recommends adding browser verification to the acceptance criteria. Without real verification, an agent can easily confuse “it looks done” with “it actually works.”

That point is important. Ralph is not magical automation. It is more like a force multiplier for the engineering discipline you already have. If your project already has clear task breakdowns and reliable checks, Ralph becomes much more useful. If those foundations are missing, the loop will only repeat the confusion.

07 One-Sentence Summary

What makes Ralph worth studying is not that it introduces a huge amount of new infrastructure. It takes a simple but useful idea and turns it into a practical workflow: let Claude Code or Amp handle one small story per round, keep focus with fresh context, and preserve continuity through git, prd.json, and progress.txt.

If you are already using coding agents in real projects and keep getting stuck on how to push long tasks forward reliably, Ralph’s approach is well worth borrowing.

References

GitHub repository: https://github.com/snarktank/ralph
Interactive flowchart: https://snarktank.github.io

Claude Code's Four-Part Environment Setup: CLAUDE.md, Rules, Memory, and Hooks Explained

Thu, 23 Apr 2026 10:43:40 +0800

If you use Claude Code for a while, you quickly realize something: the model itself is important, but the environment you give it, the boundaries you define, and the rules you set matter just as much.

At first, many people focus on “how should I write this prompt?” But once you really start using Claude Code seriously, you care more about something else:

Does it know who you are?
Does it know how you work?
Does it know which rules cannot be broken?
Does it know which actions require confirmation first?
Can it remember these boundaries over time?

What makes Claude Code a mature tool is not just model capability. It is that there is a whole system for turning your working style into persistent structure. At a high level, that system can be divided into four layers:

CLAUDE.md
Rules
Memory
Hooks

This article explains all four in one pass.

Why environment setup matters more than one-off prompts

You can think of Claude Code as an assistant you hired.

On day one, you would not just tell them, “help me do things.” You would give them a handbook and explain:

who you are
what communication style you prefer
which actions always require confirmation
which mistakes have happened before and must not happen again
where the most important project documents live

That is why, in the long run, environment setup usually matters more than a single prompt.

A prompt solves “what should we do this time?” Environment setup solves “how should we work every time from now on?”

Layer 1: `CLAUDE.md`

Start with the most basic piece. CLAUDE.md is essentially just a text file.

You can write instructions for Claude in it, such as:

who you are
what you are working on
your communication preferences
rules that must be followed
special background for the current project
where important documents or directories are

Every time Claude Code starts, this document is automatically injected into context, so the model will definitely read it.

I usually think of it as a “shared understanding file”, because that is what it really is: the standing agreement between you and the model.

What belongs in `CLAUDE.md`

The best things to put in CLAUDE.md are usually these kinds of information:

identity and work background
tone and output preferences
global behavior rules
important project background that comes up often
common mistakes and how to avoid them

For example:

your time zone
whether you allow the model to send emails or messages directly
which actions count as irreversible
your habits for handling documents and files
security practices and boundaries around sensitive information

One very important principle: keep it concise

There is one especially important principle for CLAUDE.md: keep it as concise as possible.

The reason is simple: it gets injected into context every time.

If it becomes too long, it takes up too much context space and dilutes the information that actually matters. The model is not ignoring it, but its attention becomes more spread out, so it is more likely to miss the rules you care about most.

The official recommendation is usually to stay under 400 lines.

My own habit is even more conservative: I try to keep it under 200 lines.

Common scopes for `CLAUDE.md`

In practice, CLAUDE.md can exist at different levels, and those levels determine its scope. The two most common ones are:

1. User Level

This is the global level.

It lives in your machine-level environment and applies to all projects you work on locally.

This is a good place for:

your identity information
general communication preferences
habits that apply across projects
global security rules

For example, if your time zone is not the default one people often assume, but Bangkok time, that fits very well at the user level, because it helps the model avoid mistakes whenever it works with dates and times later.

2. Project Level

This is the project level.

It sits inside a specific project directory and only applies to that project.

This is a good place for:

project-specific background
rules that only make sense in this project
explanations of the project’s directory structure
entry points to the project’s key documents

For example, if one project handles finance and another handles HR, the background and constraints are obviously different, so they should not live in the same global instructions.

How to decide which level to use

The rule is actually simple:

If the thing you are writing would still be true in another project, put it at the user level.

If it stops being true as soon as you switch projects, put it at the project level.

How to write the first version

There are two common ways to get started:

1. Use `/init`

You can run the slash command /init directly in the terminal and let Claude scan the current project to generate a basic CLAUDE.md for you.

2. Let Claude help you organize it

You can also ask Claude to look up how other people structure CLAUDE.md, ask you some questions based on your situation, and then help you organize a version that fits you.

In many cases, that is much easier than writing it from scratch yourself.

One very practical habit

As you collaborate with Claude over time, whenever you notice something that definitely needs to be remembered in the future, or something that must not go wrong again, you can ask it to write that into CLAUDE.md.

Before doing that, though, you still want to decide:

is this a global rule?
or a rule for the current project only?

Do not dump everything into one file.

Layer 2: `Rules`

Next is Rules.

The biggest difference between it and CLAUDE.md is not the file format. It is how it gets loaded.

CLAUDE.md is read no matter what you are doing.

The advantage of Rules is that they can be loaded conditionally.

In other words, a rule can be loaded only for certain paths, files, tools, or scenarios.

Why conditional loading matters

Because context space is always scarce.

If every rule gets shoved into context all the time, two things happen:

the model carries more overhead
truly important rules get buried

That is the value of loading rules on demand: the model sees the right information at the right time.

When to move rules from `CLAUDE.md` into `Rules`

Usually there are two situations:

1. `CLAUDE.md` has become too long

If your CLAUDE.md starts going beyond 200 lines, keeps growing, and the important content gets diluted, it is time to split some rules out.

2. Some rules only apply to specific paths

If you clearly know that some rules only make sense for certain kinds of files, for example:

rules that only apply to Python scripts
rules that only apply to a hooks directory
rules that only apply to one subproject

then those rules belong in Rules much more naturally.

Where `Rules` fit best

The most typical use case is “specific situation, specific path, specific file type”.

For example:

conventions that only apply when handling hook files
coding rules that only matter in a certain class of scripts
ways of working that only apply under one directory

Keeping that kind of content inside CLAUDE.md is usually not cost-effective.

Layer 3: `Memory`

The third layer is Memory.

Like CLAUDE.md and Rules, it also enters model context, but its core difference is this:

CLAUDE.md is something you define deliberately.

Memory is more like notes Claude writes for itself during collaboration.

What goes into `Memory`

When Claude judges that something is worth remembering, or should be kept for a while, it writes that information into Memory.

Common examples include:

a way of working you corrected
a newly added preference
temporary state in the current project
something you did not finish today and need to continue tomorrow
who you have been collaborating with recently
personal information or context that only came up recently

In other words, Memory is closer to dynamic knowledge than long-term policy.

How it differs from the first two layers

A simple distinction is:

CLAUDE.md / Rules: long-term, policy-like, explicit rules
Memory: temporary, dynamic, newly learned context during work

If something only matters for the next few days, or keeps changing as the project evolves, it usually belongs in Memory, not in a permanent rule file.

`Memory` can also be written manually

Even though Memory can be maintained automatically, you can also explicitly tell Claude things like:

please remember what I need to do tomorrow
please remember whose status I need to follow up on
please remember the key milestone for this project this month

It can write that into Memory for you.

You can also use the slash command /memory to see what memories currently exist and edit or delete them manually.

That said, I personally do not maintain it manually too often, because Claude itself can periodically reorganize these memories and clear out what has gone stale.

Layer 4: `Hooks`

The last and most advanced layer is Hooks.

Everything before this, including CLAUDE.md, Rules, and Memory, is still ultimately natural-language guidance.

You write rules, and the model usually follows them, but it is still operating by interpreting them first and then acting.

As long as the rule lives in natural language, a few problems remain:

the model may occasionally miss it
too many rules can dilute attention
in some situations, the model may decide the rule is not important enough

That is not because you wrote the rule badly. It is because natural-language rules are hard to enforce with 100% reliability.

What `Hooks` really are

Hooks are no longer natural-language instructions. They are scripts.

They are event-triggered, program-level enforcement logic.

Once a certain event happens, that logic will run. It will not be skipped because the model decided to ignore it.

That is the key value of Hooks:

they turn “should follow this” into “must execute this”.

When you should upgrade to `Hooks`

If you notice that a rule is already written in CLAUDE.md or Rules, but Claude still occasionally fails to follow it, and missing it carries real risk, then that rule should probably become a Hook.

In simple terms:

low-risk behavior: use rules
high-risk behavior: use Hooks

The most typical `Hooks` scenarios

The most obvious examples are actions you absolutely do not want to get wrong, such as:

requiring confirmation before sending an email
requiring confirmation before sending Slack, Outlook, or Gmail messages
intercepting dangerous file deletions
blocking the outbound leak of passwords or API keys

If those are only written as natural-language rules, the model can still make a mistake someday.

If they are implemented as Hooks, the event gets intercepted every time.

That is a real program-level safety barrier.

Common trigger points for `Hooks`

Hooks can be attached at different stages, for example:

injecting reminders at the start of a conversation
checking conditions before a tool runs
validating results after a tool runs

You do not necessarily need to know the formal terminology yourself.

Often, as long as you can describe the requirement clearly and ask Claude whether it should become a hook, it can help you design it.

You can also use the slash command /hook to inspect which hooks are currently configured.

A more practical way to get started

If you want to connect all four layers together, I usually recommend this order:

Step 1: use `/init` to generate a basic `CLAUDE.md`

Do not try to hand-write a huge complete rule document at the start.

Let Claude scan the project and generate a starting point, then iterate from there.

Step 2: add things as you work

As you collaborate, whenever you notice:

this must be remembered in the future
this mistake must not happen again
this preference will keep applying every time

then ask Claude to add it to CLAUDE.md.

Step 3: move things into `Rules` once `CLAUDE.md` grows

Once CLAUDE.md gets longer and longer, and the model no longer reliably follows every rule, split things out:

which rules are global?
which ones only apply to certain paths?

Move the second kind into Rules so they load conditionally.

Step 4: upgrade high-risk rules into `Hooks`

If some rules still get missed even after you wrote them down, and the cost of missing them is high, do not stay in the natural-language layer. Upgrade them into Hooks.

That is the point where “reminder” becomes “enforcement”.

Step 5: let `Memory` handle temporary state

For things that expire, change frequently, or are not long-term policy, do not shove everything into CLAUDE.md.

It is usually cleaner to let Memory hold things like:

current project progress
recent collaborators
newly added preferences
short-term plans and to-dos

That keeps context cleaner and makes the model behave more consistently.

What each layer should store

If you want a quick mental model, use this:

CLAUDE.md: long-term shared understanding, global instructions, foundational project background
Rules: specialized rules loaded by path or scenario
Memory: dynamic knowledge, temporary state, things learned recently
Hooks: program-level enforcement for high-risk actions

Closing

Many people treat Claude Code as “a chat interface that can write code”. But once you use it deeply, it feels more like a long-term intelligent workstation.

The key is not only how you phrase each instruction. It is whether you have given it a stable, clear, and accumulative environment.

Once you build these four layers:

CLAUDE.md
Rules
Memory
Hooks

the quality of collaboration between you and the model usually improves very noticeably.

Because you are no longer re-explaining from scratch who you are, how you work, and what must not happen every single time. You have actually turned those things into part of the environment.

That is the key step in turning a strong model into a mature tool.

Claude Code Multi-Agent Collaboration: How to Choose Between Subagents and Agent Teams

Wed, 22 Apr 2026 21:35:52 +0800

When people talk about multi-agent collaboration in Claude Code, the easiest two concepts to mix up are Subagents and Agent Teams. They both sound like “spin up several agents to work together,” but they are meant for different kinds of work. In short, the former is better for splitting off independent tasks, while the latter is better when several agents need to collaborate around the same problem and cross-check each other over time.

If you have used Skills before, this framing also helps:

A Skill defines the workflow and rules
A Subagent or Agent teammate does the actual execution

So the real question is not which one is “more advanced,” but what kind of collaboration problem you are solving.

Subagents: split off side tasks

Subagents are closer to temporary worker copies launched from the current session. Each one gets its own context window, and when it finishes, it returns only a summary of the result. The main conversation stays cleaner because it does not have to absorb all the intermediate logs and output.

That gives Subagents a few very practical strengths:

The main thread stays clean instead of being flooded by test logs, search results, or long output
Independent research or execution tasks can run in parallel
They work well for tasks where “just bring me the result” is enough

The original article notes that Claude Code comes with three built-in kinds of Subagents:

Explore: read-only, useful for quickly searching a codebase
Plan: read-only, useful for gathering information in the background during plan mode
General-purpose: can read and write, suitable for tasks that mix exploration and editing

Custom Subagents

If the built-in options are not enough, you can define your own Subagent. The mechanism is simple: write a Markdown file in one of these locations:

.claude/agents/: only active in the current project
~/.claude/agents/: active across all your projects

The file format looks like this:

---
name: code-reviewer
description: Expert code review specialist. Proactively reviews code for quality, security, and maintainability. Use immediately after writing or modifying code.
tools: Read, Grep, Glob, Bash
model: inherit
---
You are a senior code reviewer ensuring high standards of code quality and security.

When invoked:

1. Run git diff to see recent changes
2. Focus on modified files
3. Begin review immediately

Review checklist:

- Code is clear and readable
- Functions and variables are well-named
- No duplicated code
- Proper error handling
- No exposed secrets or API keys
- Input validation implemented
- Good test coverage
- Performance considerations addressed
Provide feedback organized by priority:

- Critical issues (must fix)
- Warnings (should fix)
- Suggestions (consider improving)

Include specific examples of how to fix issues.

The key field here is description. Claude uses it to decide when this Subagent should be called, so the more precise the description is, the more reliable the trigger tends to be.

A few other common configuration fields are also worth knowing:

tools: limits which tools the Subagent can use
model: chooses between sonnet, opus, haiku, or inherit
permissionMode: controls edit permissions and permission prompt behavior
memory: gives the Subagent a cross-conversation memory directory

If you only need a Subagent temporarily, you can also define it through the CLI:

claude --agents '{
  "code-reviewer": {
    "description": "Expert code reviewer. Use proactively after code changes.",
    "prompt": "You are a senior code reviewer. Focus on code quality, security, and best practices.",
    "tools": ["Read", "Grep", "Glob", "Bash"],
    "model": "sonnet"
  }
}'

When Subagents fit best

Subagents are usually the best fit for tasks like these:

Running tests and returning only the failure summary instead of flooding the main thread with thousands of log lines
Investigating several unrelated modules in parallel
Splitting “find the issue” and “fix the issue” into a simple pipeline

For example:

`1`	`Research the authentication, database, and API modules in parallel using separate subagents`

`1`	`Use the code-reviewer subagent to find performance issues, then use the optimizer subagent to fix them`

But if a task needs constant back-and-forth adjustments, shares a lot of context across stages, or concentrates changes in only one or two files, handling it directly in the main conversation is often simpler than spinning up a Subagent.

Agent Teams: multiple independent sessions working together

Agent Teams operate at a different level. Instead of launching worker copies inside one session, they start multiple fully independent Claude Code instances that collaborate around a shared task list and can also message one another directly.

That makes an Agent Team feel more like a real small team than a simple side-task worker setup.

The article notes that this is currently an experimental feature and needs to be enabled first:

{
    "env": {
        "CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS": "1"
    }
}

Once this is added to settings.json, you can ask Claude to organize a team around a specific goal. For example:

1
2
3

I'm designing a CLI tool that helps developers track TODO comments across
their codebase. Create an agent team to explore this from different angles: one
teammate on UX, one on technical architecture, one playing devil's advocate.

What an Agent Team consists of

An Agent Team mainly includes three parts:

Team lead: the main session you are using, responsible for organizing, assigning, and summarizing
Teammates: multiple independent Claude Code instances
Task list and Mailbox: the shared task list and communication channel

The biggest difference from Subagents is that teammates can communicate directly with one another instead of routing everything through the lead. Tasks usually move through states such as pending, in progress, and completed, and once a teammate finishes one task, it can pick up the next one.

When Agent Teams fit best

When a task needs several perspectives, active discussion, conflicting hypotheses, or parallel work across modules, Agent Teams are a better fit.

The article gives several representative examples:

Several reviewers inspect the same PR in parallel, each focusing on a different dimension
Multiple agents investigate the same bug with competing explanations and challenge each other’s conclusions
Frontend, backend, and testing move forward in parallel on different parts of the project

For example, parallel code review:

Create an agent team to review PR #142. Spawn three reviewers:
- One focused on security implications
- One checking performance impact
- One validating test coverage
Have them each review and report findings.

And for debate-style debugging:

Users report the app exits after one message instead of staying connected.
Spawn 5 agent teammates to investigate different hypotheses. Have them talk to
each other to try to disprove each other's theories, like a scientific
debate. Update the findings doc with whatever consensus emerges.

The common pattern here is that you do not just want one answer. You want several agents to exchange judgments, challenge assumptions, and gradually converge on a stronger conclusion.

How to choose between them

If you want a quick rule of thumb, use this:

If you just need the result, use Subagents
If the work requires discussion and cross-validation, use Agent Teams

Expanded a bit further, the main differences are:

Communication style: Subagents mainly report results back to the main session, while Agent Teams members can talk directly to one another
Coordination model: Subagents depend more on the main conversation to orchestrate them, while Agent Teams work from a shared task list that members can claim themselves
Token cost: Subagents are cheaper, while Agent Teams cost more because each teammate is an independent instance
Best-fit tasks: Subagents are better for independent, result-oriented work, while Agent Teams are better for discussion-heavy and cross-check-heavy work

Practical cautions

Agent Teams are more powerful, but that does not mean every task deserves a full team. The article specifically calls out a few practical concerns:

token usage is noticeably higher
if multiple teammates edit the same file at once, overwrite conflicts become very likely
adding too many teammates increases coordination cost without guaranteeing better results

A safer default is usually:

start with 3 to 5 teammates
split tasks by module or file to avoid edit conflicts
if the lead starts doing teammate work too early, explicitly tell it to wait for the others first

The current experimental version also has a few limitations, such as:

no support for /resume and /rewind for in-process teammates
task status can lag and sometimes needs manual correction
one lead can manage only one team at a time
teammates cannot spawn child teams of their own

Short conclusion

These two features are not substitutes for one another. They solve two different collaboration problems.

If your goal is “parallelize side tasks and keep the main context clean,” start with Subagents. If your goal is “let several agents work like a small team, discuss, and cross-check each other,” then Agent Teams are the better tool.

Trying both in a real task usually makes the distinction obvious very quickly: one is optimized for context isolation and result collection, and the other is optimized for multi-perspective collaboration and ongoing interaction.

Original article: https://cloud.tencent.com/developer/article/2652960

nuwa-skill: Turning "distilling a person" from an idea into an executable workflow

Wed, 22 Apr 2026 16:20:00 +0800

[alchaincyf/nuwa-skill](https://github.com/alchaincyf/nuwa-skill) can easily make people think of one thing first: using AI to answer in a famous person’s voice. But what makes it genuinely interesting is not whether it sounds convincing. The key is that it tries to turn “distilling how a person thinks” into a repeatable workflow.

If that works, the value goes far beyond a few entertaining character prompts. It means taking someone’s judgment framework, priorities, common heuristics, and communication habits, and turning them into a skill that can be called again and again. What you want is not a sentence that sounds like something a person might say, but something closer to a working interface for “if this person analyzed the issue, what would they look at first, how would they trade things off, and what would they question?”

It solves modeling, not imitation

Many so-called persona prompts are basically just style overlays.

They usually ask the model to:

speak in someone’s tone
quote their signature lines more often
imitate the phrasing they use in public

That looks great in demos, but it often falls apart in real work. The reason is simple: tone is surface-level, while judgment structure is the core. A person is memorable not because they like a few certain words, but because they reliably approach problems in certain ways.

The direction of nuwa-skill is closer to extracting those stable methods. In other words, it cares less about “how to sound like them” and more about “how to think like them.”

A more complete workflow

From the repository description, nuwa-skill aims to build an end-to-end flow: enter a person’s name, then automatically do the research, extraction, and validation, and finally organize the result into a skill that can be used inside Claude Code.

There are several important shifts behind that idea.

First, it assumes the person being distilled does not have to be your coworker. Many people first encounter this kind of idea in the form of “capture how a strong teammate works.” That is valuable, but it is also limited: the sample pool is small, and it usually only covers internal team experience. nuwa-skill expands the target set to a much broader range of people, such as founders, investors, scientists, product managers, and writers.

Second, it emphasizes automation rather than asking the user to handcraft prompts. What really makes this kind of capability practical is not beautiful prompt wording, but whether you can consistently do source gathering, viewpoint synthesis, pattern extraction, and result validation. As soon as any one of those steps depends entirely on manual work, the reuse cost rises quickly.

Third, it tries to make the output a skill rather than a one-off conversation. The former can be reused, combined, and iterated on. The latter usually only works in the current context and falls apart after a few turns.

Why this direction matters

If you treat AI as a question-answering machine, the natural use case is “give me an answer.” But if you treat AI as a workbench, the question becomes “give me a way to look at this problem.”

That is where the value of nuwa-skill leans.

For example, when facing a product decision, what you want may not be one standard answer. You may want several sharply different analytical frames:

one person starts with long-term compounding
one starts with resource constraints
one starts with consistency of user experience
one starts with timing of market entry

If those frames can be packaged reliably, AI stops being “something that writes a paragraph for you” and becomes “something that helps you switch perspectives quickly.” That is much more useful than simply imitating famous quotes, because it directly affects decision quality.

Its most compelling part: turning tacit knowledge into callable assets

Many high-value capabilities are hard to write down as SOPs in the first place.

Why someone consistently judges better than others is often not because they know more explicit rules, but because they have built a tacit filtering system through years of practice:

which signals deserve attention first
which noise should be ignored immediately
which questions should be broken apart
which questions should be inverted
which conclusions must wait for more evidence

This kind of ability is hard to preserve because people cannot always explain it clearly themselves. That is exactly why structured extraction is so valuable. What makes nuwa-skill appealing is that it is not trying to move around surface knowledge. It is trying to reorganize cognitive habits.

Where it fits best

I think this kind of skill is especially useful in a few scenarios.

1. Multi-perspective review before a decision

If you already have a plan but worry that you are only thinking along the path you already know, switching into different “persona perspectives” to review the same issue is more valuable than asking the model to keep expanding your original wording.

2. Learning the judgment framework of a certain kind of expert

Many people learn from experts by collecting quotes, watching interviews, and copying summaries. In the end, they often only remember a few nice lines. Once a thinking pattern becomes a skill, learning becomes much closer to “repeatedly invoking it with real questions” rather than “making a pile of static notes.”

What teams truly lack is often not just documentation, but a shared answer to “how do we usually think when we hit a problem?” If this workflow matures further, it could also be used in reverse to preserve the methods of strong internal operators. It is just clear that the project does not want to limit the idea to internal use cases.

The hard part of projects like this

Of course, an attractive direction does not mean the hard problems are already solved.

The real challenge is never simply installing a skill. It is things like:

whether the sources are reliable enough
whether the extracted patterns are stable rather than illusions from scattered text
whether the model is actually using a person’s framework or merely repeating common impressions
whether the boundaries between different personas will blur inside the model

In other words, the key question is not “can it generate something that sounds plausible?” It is “can the cognitive framework produced by this skill survive reuse across many tasks?” If the project keeps going deeper on validation, its credibility will improve a lot.

Why it goes beyond a prompt template library

In the past, many projects handled this kind of capability as a prompt template library: one persona, one prompt, and the user copies it into a chat. The problem is that a template library is still basically a static asset. It updates slowly, validation is weak, and it is hard to turn it into a complete production workflow.

What nuwa-skill pushes further is that it turns “persona distillation” from a template problem into a workflow problem.

Once the center of gravity shifts from “write a prompt” to “systematically generate, validate, and iterate on a persona skill,” the whole thing starts to look more like engineering than inspiration. For anyone who wants to use it over the long term, that is the more important shift.

Closing

nuwa-skill is interesting not because it turns AI into a celebrity impression show, but because it pushes “how to learn how someone thinks” one step closer to something executable, reusable, and iterable.

If many persona prompts solve “how to talk like someone,” what this project wants to solve is “how to look at problems the way someone does.” The former is great for demos. The latter is much closer to a real productivity tool.

References

GitHub repository: https://github.com/alchaincyf/nuwa-skill
Project README: https://github.com/alchaincyf/nuwa-skill/blob/main/README.md
Skill definition: https://github.com/alchaincyf/nuwa-skill/blob/main/SKILL.md

Karpathy's 65-Line CLAUDE.md: Helping AI Coding Avoid Three Common Mistakes

Sun, 19 Apr 2026 18:27:23 +0800

A GitHub project about AI coding has been getting a lot of attention recently. Its core is not a complex codebase, but a roughly 65-line CLAUDE.md file. The reason it attracted so many stars is not technical complexity. It is that it captures problems many people repeatedly run into when using AI to write code.

The background starts with Andrej Karpathy’s observations on AI coding. Karpathy is an influential educator and engineer in AI: a Stanford PhD, an early OpenAI contributor, and a former Tesla AI leader responsible for Autopilot’s vision system. He has continued to share his views on large models, education, and AI tools, so his comments on changes in programming workflows tend to draw a lot of attention from developers.

He once said that after using Claude Code for a few weeks, his programming style changed noticeably. Previously it was roughly 80% handwritten code and 20% AI assistance. Now it is closer to 80% code written by AI and 20% edits by himself. He described it as “programming in English”, telling an LLM what to write through natural language.

But he also pointed out several recurring problems in AI coding.

01 Wrong Assumptions

The first problem is that models easily make assumptions on behalf of the user, then keep writing along that path. They do not always manage their own confusion, and they do not always stop to ask questions when the requirement is ambiguous.

For example, if the user only says “add a user export feature”, the model might assume it should export all users, output JSON, write to a local file, and skip any confirmation around permissions or fields. Only after the code is done does the user discover that the model’s understanding does not match the real scenario.

A better approach is to list the uncertainties first: should it export all users or filtered results? Should it trigger a browser download or run as a background job? Which fields are needed? How large is the data set? Are there permission constraints? If these questions are not clarified, writing faster only means drifting farther.

02 Over-Complexity

The second problem is that models often turn simple problems into complex ones. A task that could be handled with one function might receive abstract classes, strategy patterns, factory patterns, configuration layers, and a pile of extension points that may never be needed.

This kind of code can look engineered, but in practice it increases maintenance cost. AI is especially good at quickly generating large structures, but it does not always judge whether those structures are necessary. The result is that a task solvable in 100 lines becomes inflated into 1,000 lines.

The test is straightforward: would a senior engineer look at the change and think it is over-designed? If the answer is yes, remove the extra layers and solve the current problem with the least code needed.

03 Collateral Damage

The third problem is that models sometimes modify or delete code they do not fully understand. While fixing a small bug, they may casually change comments, reformat nearby code, clean up imports that look unused, or even touch logic unrelated to the current task.

These “drive-by improvements” are risky because they expand the change scope and make review harder. The user may only want to fix a validator crash caused by an empty email, but the model may also enhance email validation, add username validation, and rewrite docstrings. In the end, it becomes hard to tell which line changed behavior.

A safer rule is: only change what must be changed, and only clean up issues caused by your own change. Existing dead code, formatting problems, or historical baggage should not be touched unless the task explicitly asks for it. At most, mention it.

04 Turning Complaints Into CLAUDE.md

After Karpathy’s comments spread widely, developer Forrest Cheung did something clever: he organized these complaints into executable behavior rules and put them into a CLAUDE.md file.

The project does not contain complicated code. Its key idea is to turn the most failure-prone parts of AI coding into clear working rules. They can be summarized as four principles.

The first is to think before writing. Do not silently assume. Do not hide confusion. If a requirement has multiple interpretations, list them. If there is a simpler approach, say so. Ask when clarification is needed, and push back when needed.

The second is to keep things simple. Do not add features that were not requested. Do not abstract one-off code. Do not add unnecessary configuration. Do not write large amounts of defensive code for extremely unlikely scenarios. If 50 lines can solve it, do not write 200.

The third is to make precise changes. Every changed line should trace directly back to the user’s request. Do not improve nearby code as a side quest. Do not refactor something that is not broken. Match the existing project style as much as possible.

The fourth is goal-driven execution. Do not give the model only a vague instruction. Give it a verifiable success criterion. For example, “fix the bug” can become “write a test that reproduces the bug, then make it pass”; “add validation” can become “write invalid-input tests and make them pass”. The clearer the success criterion, the easier it is for the model to loop toward completion.

05 Why It Took Off

This project became popular not because the content is mysterious, but because it is close to real development work.

Many people using AI for coding have seen similar scenes: the model confidently misunderstands the requirement, the code gets more complex as it goes, or it touches places it should not touch. The value of CLAUDE.md is that it turns those experiences into collaboration rules that can be placed inside a project.

The entry cost is also low: one file can start making a difference, with no complicated integration. Combined with Karpathy’s influence and the project’s practical comparison examples, it naturally spread through the Claude Code user base and the broader AI coding community.

More importantly, these rules are not only for Claude Code. No matter which AI coding tool you use, the underlying issues are similar: the model needs to know when to ask, when to simplify, when to stop, and how to decide that the task is complete.

06 What Developers Can Take Away

The lesson for ordinary developers is simple: AI coding is not about throwing one sentence at a model and waiting for a miracle. The effective approach is to give the model boundaries.

When the requirement is unclear, ask it to expose its assumptions first. When the implementation starts getting complicated, ask it to return to the smallest viable solution. When changing code, keep it focused on the task goal. When finishing work, use tests, commands, or explicit checkpoints to verify the result.

AI is already very capable at writing code, but it still needs good collaboration constraints. The fact that a short CLAUDE.md can attract so much attention shows that developers do not only need smarter models. They also need more reliable ways of working.

In short:

Think before writing to reduce wrong assumptions.
Keep things simple to avoid over-design.
Make precise changes to control change scope.
Work toward goals with verifiable success criteria.

These four rules are not complicated, but they are practical. The prerequisite for AI coding to truly improve efficiency is not making the model write more. It is making it write more accurately, with less code, and under better control.

Using Claude Code Quota More Efficiently: Models, Context, Caching, and /compact

Sun, 19 Apr 2026 15:29:06 +0800

Many Claude Code or Claude Max users run into the same problem: even after paying for Pro, Max 5x, or Max 20x, the usage warning appears quickly, or they have to wait for the next reset. This feels especially obvious when Claude Code reads many files, fixes complicated bugs, or runs long tasks in a large project.

The key point is this: usage is not deducted linearly by “minutes.” It depends on the model, context length, attachments, codebase size, conversation history, tool calls, and current capacity. In the same 5-hour window, one person may work for a long time while another hits the limit in minutes. Usually the account is not broken; each request is simply too heavy.

This note collects a set of practical habits for using quota more efficiently.

01 First Understand Claude’s Usage Window

Claude Pro and Max both have usage limits. Claude Code usage is shared with Claude on web, desktop, and mobile under the same subscription quota. Anthropic’s help center explains that message counts depend on message length, attachment size, current conversation length, model or feature used, and that Claude Code usage is also affected by project complexity, codebase size, and auto-accept settings.

A simple way to think about it:

Pro: suitable for light usage and small projects.
Max 5x: suitable for more frequent usage and larger codebases.
Max 20x: suitable for heavier daily collaboration.
Usage windows reset on a 5-hour session basis.
Long messages, long conversations, large files, and complex tasks consume usage faster.
Stronger models such as Opus hit limits faster than Sonnet.

So “I only used it for 20 minutes” does not explain much by itself. What matters is how much context Claude read during those 20 minutes, which model was used, whether large files were processed repeatedly, and whether the same long conversation kept accumulating more tasks.

02 First Habit: Do Not Default to the Most Expensive Model

The Claude model family is commonly positioned like this:

Opus: strongest capability, suitable for complex reasoning, architecture decisions, and hard bugs.
Sonnet: balanced capability and cost, suitable for most everyday coding tasks.
Haiku: lighter, suitable for simple classification, summarization, and format conversion.

For daily scripts, small bug fixes, documentation cleanup, and code explanation, Sonnet is usually enough. Save Opus for cases such as:

Complex architecture design.
Deep multi-file refactors.
Bugs that are hard to reproduce.
Long-chain troubleshooting.
Tasks where the normal model is clearly stuck.

In Claude Code, use /model to switch models, or set the default in /config. A steadier habit is to use Sonnet by default and switch to Opus only at key points, rather than running the whole task on Opus.

03 Second Habit: Control Context, Do Not Drag Old Tasks Along

The longer the context, the more Claude needs to process on each turn, and the faster usage is consumed. The Claude Code docs explicitly recommend proactive context management:

Use /clear when switching to an unrelated task.
Use /compact when one phase is done but important context should remain.
Use /context to see what is taking space.
Configure a status line if you want continuous status visibility.

A useful rhythm:

Small phase done: /compact
Large task done: /clear
Switching to unrelated work: /clear
Context usage getting high: /compact early

/compact summarizes earlier conversation history while preserving key task state, conclusions, file paths, and remaining work. It reduces the amount of history carried into later requests. You can also add a short instruction:

`1`	`/compact Preserve changed files, test results, remaining TODOs, and key design decisions`

Do not wait for automatic compaction. The docs note that Claude Code auto-compacts when context approaches the limit, but manually compacting at phase boundaries is usually easier to control.

04 Third Habit: Long Conversations and Large Files Make Every Request Heavier

Many people assume that “I only asked one more question” should be cheap. But in a long conversation, that question may carry a lot of history, file summaries, tool definitions, and system rules behind it.

Things that easily bloat context include:

Long conversations that are never cleared.
Asking Claude to read entire large files.
Pasting long logs, build output, or test output.
Adding many screenshots or images at once.
Asking it to repeatedly scan the whole repository.
An overly long CLAUDE.md.
Too many MCP servers enabled.

A more efficient approach: paste only key errors from logs, include only failing parts of test output, and let Claude use rg, head, tail, and symbol search before reading only the necessary parts. If command-line filtering can shrink the content, do not paste the whole thing into context.

05 Fourth Habit: Understand Caching, but Do Not Worship It

Anthropic’s Prompt Caching can cache repeated prompt prefixes. The default cache lifetime is 5 minutes, and a 1-hour cache is also supported. When cache hits, large repeated context does not need to be fully reprocessed, which helps reduce cost and improve rate limit utilization.

But caching has limitations:

Content must match exactly, including text and images.
The default cache is short-lived.
Changing models, tools, system prompts, or context structure may reduce cache hits.
Output tokens do not disappear because of caching; the response still needs to be generated.
How Claude Code uses caching is a product-level implementation detail, so do not treat it as permanent “free memory.”

In practice, the important part is not studying every caching detail. It is keeping the session stable:

Avoid frequent model switching within the same phase.
Do not repeatedly rewrite large rule blocks mid-task.
Do not keep adding new images inside the same task.
Do not leave a long task idle for too long and then return with another huge request.
Use /compact at phase boundaries.

This makes repeated context easier to reuse and reduces later request weight.

06 About Peak Hours: Avoid Them When You Can, but Do Not Treat Them as a Formula

People often say certain hours feel tighter. Anthropic’s help center is more careful: message counts can be affected by current Claude capacity, conversation length, attachments, model, and features. In other words, peak capacity can affect the experience, but do not treat a specific local time window as a permanent rule.

Practical suggestions:

Put large refactors and heavy analysis in periods when both your network and the service are stable.
Do not start a huge task right before you plan to step away.
If you expect to leave for a long time, run /compact or /clear first.
For small edits, do not use Opus with a long context unless you really need it.

This is more reliable than memorizing a fixed “do not use it from X to Y” rule.

07 Slim Down CLAUDE.md, rules, MCP, and skills

Claude Code loads project rules, tool information, and some environment context into the session. The official docs also recommend separating general rules from specialized rules so every session does not start with a large amount of unrelated text.

A useful split:

CLAUDE.md: only global rules that always apply.
rules: path-specific or file-type-specific rules.
skills: specific workflows, such as publishing posts, deployment, image generation, or committing code.
MCP: only enable servers that the current task actually needs.

If CLAUDE.md is hundreds or thousands of lines long, every session carries that cost. A better pattern is to move occasional workflows into skills and load them only when needed.

MCP is similar. More tools do not automatically mean more efficiency. The Claude Code docs mention using /mcp to view and disable unnecessary servers, and /context to see what is consuming context space.

08 Practical Command List

These are the most useful daily commands:

/model

Switch models. Sonnet is a good default; use Opus for complex reasoning.

/clear

Clear the current context. Use it when switching to unrelated work.

`1`	`/compact`

Compress conversation history. Use it when a phase is done but the same task continues.

`1`	`/context`

Inspect context usage and find what is taking space.

/status

Check subscription or usage-related status. Anthropic’s help center also recommends monitoring remaining allocation.

/mcp

View and manage MCP servers, and disable tools not needed for the current task.

If you use API billing, /cost can be useful. But for Pro/Max subscriptions, the Claude Code docs explain that the dollar estimate from /cost is not the right billing reference; subscribers should rely more on usage information such as /stats and /status.

09 A Quota-Saving Workflow

A practical workflow looks like this:

Run /clear before starting a new task.
Use Sonnet by default.
Let Claude inspect project structure and key files first, not the whole repository.
Run /compact after each small phase.
Switch to Opus only for hard blockers.
Filter logs, errors, and test output before pasting them.
Run /clear after the task is done; do not start new work with stale context.
Periodically review CLAUDE.md, MCP, and skills to shrink always-on context.

The core idea is simple: let Claude see only what it truly needs for the current task.

10 Summary

Claude Code usage running out quickly is usually not caused by one thing. It is often a combination of high-cost models, long uncleared conversations, too many files and logs, heavy MCP and rule context, weaker cache reuse, and peak capacity fluctuations.

The practical fixes are also simple:

Use Sonnet for daily work.
Save Opus for truly complex problems.
Use /compact when a phase is done.
Use /clear when switching tasks.
Use /context to find context bloat.
Slim down CLAUDE.md, rules, MCP, and skills.
Do not dump the whole repository, full logs, or large image batches into context.

How much work the same Pro or Max plan can support depends heavily on how you manage context. Make the context smaller and task boundaries clearer, and Claude Code will feel much steadier.

References

Claude Help Center: Using Claude Code with your Pro or Max plan: https://support.claude.com/en/articles/11145838-using-claude-code-with-your-pro-or-max-plan
Claude Help Center: About Claude’s Max Plan Usage: https://support.anthropic.com/en/articles/11014257-about-claude-s-max-plan-usage/
Claude Code Docs: Manage costs effectively: https://code.claude.com/docs/en/costs
Anthropic Docs: Prompt caching: https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching