MCP on KnightLi Blog

Claude Code has a plugin marketplace now: what you can install, how to install it, and what to watch out for

Sat, 23 May 2026 19:03:30 +0800

anthropics/claude-plugins-official is the official Claude Code plugin directory managed by Anthropic. It is not just a normal code repository. It is a marketplace that Claude Code’s plugin system can use directly, collecting Claude Code plugins maintained or curated by Anthropic.

This repository matters because Claude Code is moving from “an AI coding command-line tool” toward “an extensible development environment.” Plugins can package Skills, Agents, Hooks, MCP servers, LSP servers, background monitors, and default settings so teams and communities can distribute them in a consistent way.

What is this repository?

The README describes it directly: it is a curated directory of high-quality Claude Code plugins.

The directory is mainly split into two parts:

/plugins: plugins developed and maintained internally by Anthropic.
/external_plugins: third-party plugins from partners and the community.

In other words, it contains both official capabilities and curated external ecosystem entries. For regular users, the direct value is that plugins can be discovered and installed through Claude Code’s /plugin system. For developers, it is a useful window into Claude Code’s plugin format and ecosystem direction.

How to install plugins

The README gives a simple installation command. You can install directly through Claude Code’s plugin system:

`1`	`/plugin install {plugin-name}@claude-plugins-official`

You can also open the plugin discovery entry inside Claude Code:

`1`	`/plugin > Discover`

The key part is @claude-plugins-official, which refers to the official plugin marketplace. According to the Claude Code documentation, claude-plugins-official is the official marketplace maintained by Anthropic and is available by default in Claude Code installations.

What does a plugin look like?

The repository README shows a standard plugin structure:

plugin-name/
├── .claude-plugin/
│   └── plugin.json
├── .mcp.json
├── commands/
├── agents/
├── skills/
└── README.md

.claude-plugin/plugin.json is the metadata file, usually declaring the plugin name, description, version, author, and related fields. Other directories are optional and depend on what the plugin provides:

skills/: instructions for skills Claude can invoke automatically.
commands/: slash commands.
agents/: custom agent definitions.
hooks/: event-triggered logic.
.mcp.json: MCP server configuration.
.lsp.json: language server configuration.
monitors/: background monitor configuration.
settings.json: default settings shipped with the plugin.

This means a Claude Code plugin is not one single kind of extension. It is a packaging format. A plugin can be a tiny command, or it can be an entire workflow for a specific stack.

What directions are already in the official directory?

The /plugins directory already covers many development scenarios, including:

LSP plugins: typescript-lsp, pyright-lsp, rust-analyzer-lsp, gopls-lsp, clangd-lsp, csharp-lsp, jdtls-lsp, kotlin-lsp, lua-lsp, php-lsp, ruby-lsp, swift-lsp.
Programming workflows: code-review, feature-dev, code-modernization, code-simplifier, commit-commands, pr-review-toolkit.
Claude Code configuration and plugin development: claude-code-setup, claude-md-management, plugin-dev, skill-creator, mcp-server-dev.
Output styles and specialized capabilities: explanatory-output-style, learning-output-style, security-guidance, session-report, math-olympiad.

The /external_plugins directory points toward more third-party tools and services, such as github, gitlab, linear, asana, firebase, playwright, terraform, context7, serena, telegram, and discord.

Together, these plugins suggest a trend: Claude Code does not only want to edit files. It also wants to connect with code intelligence, project management, cloud services, testing, infrastructure, and team collaboration tools.

Why the plugin system matters

Previously, many Claude Code customizations could live inside a project’s .claude/ directory, such as commands, agents, skills, or hooks. That works for personal workflows or one project, but it is not ideal for reuse across projects or consistent team distribution.

Plugins solve the reuse and distribution problem:

The same configuration can be installed across multiple projects.
Commands and skills are namespaced, reducing conflicts.
Plugins can be published and updated through a marketplace.
Teams can package internal best practices as standard plugins.
The community can maintain extensions for specific frameworks, languages, or services.

This resembles VS Code extensions, JetBrains plugins, or browser extensions. Once a tool has a stable plugin ecosystem, it is no longer just a single product; it starts becoming a platform.

What does it mean for developers?

If you are only a Claude Code user, the most practical use of this repository is finding plugins. For example, if you need LSP support for TypeScript, Python, Rust, or Go, you can first check whether the official directory already has the corresponding plugin. If you need PR review, commit helpers, or code modernization workflows, the official plugins are also a good starting point.

If you develop plugins, this repository is more like a reference library. You can study its directory layout, plugin.json style, README structure, and how Anthropic combines skills, agents, MCP, LSP, and hooks.

The Claude Code documentation also gives a clear guideline: use .claude/ for single-project customization, but turn it into a plugin when you want to share it with a team, reuse it across projects, version releases, or distribute it through a marketplace.

Security boundaries matter

The repository README opens with an important warning: make sure you trust a plugin before installing, updating, or using it. The reason is simple. A plugin may include MCP servers, files, scripts, or other software. Anthropic maintaining the directory does not mean every plugin will behave exactly as expected in your local environment.

In practice, it is worth doing at least a few checks:

Read the plugin homepage and README before installing.
Check whether it includes .mcp.json, hooks, executable scripts, or background monitors.
Be extra careful with plugins that access accounts, code repositories, chat tools, or cloud services.
Test plugins in a sandbox or test repository before enabling them in important projects.
In team environments, review plugin sources and versions centrally.

AI coding plugins often have much higher privileges than ordinary editor themes. They may read project files, call external services, start local commands, or affect commit and deployment flows. Treat the trust boundary more strictly than “installing a small tool.”

Relationship with the community marketplace

The Claude Code documentation says Anthropic maintains two public plugin marketplaces:

claude-plugins-official: a curated set of plugins maintained by Anthropic.
claude-community: a community plugin directory where third-party submissions go through review.

They have different roles. Community plugins can enter the review pipeline through submission forms. The official directory is curated separately by Anthropic, with no public application process. In short, claude-plugins-official is closer to an official curated directory, while claude-community is the open community directory.

Summary

The significance of anthropics/claude-plugins-official is not merely that another GitHub repository exists. It shows Claude Code’s extension mechanism becoming platform-like: Skills, Agents, Hooks, MCP, LSP, background monitors, and default settings can now be packaged, installed, updated, and distributed.

For individual developers, the official plugin directory can lower the cost of configuring Claude Code. For teams, it offers a way to standardize internal workflows. For plugin developers, it shows the plugin structure and ecosystem direction Anthropic is endorsing.

The next thing to watch is not just any single plugin, but whether the Claude Code plugin ecosystem forms stable layers: official curated plugins, community plugins, private team marketplaces, and specialized extensions for mainstream languages, frameworks, and SaaS services. If that path works, Claude Code will look more and more like a programmable AI development platform, not just a command-line assistant.

References:

GitHub project: https://github.com/anthropics/claude-plugins-official
Claude Code plugin documentation: https://code.claude.com/docs/en/plugins

Graphify Solves Claude Code's Biggest Limitation: Turning a Codebase into an AI-Queryable Knowledge Graph

Thu, 21 May 2026 08:02:32 +0800

safishamsi/graphify is a knowledge graph tool for AI coding assistants. Its goal is direct: take the code, docs, SQL schemas, scripts, papers, images, video, and audio inside a project folder, turn them into a queryable knowledge graph, and stop AI assistants from relying only on grep, full-file reading, or ad hoc search to understand a project.

Project link: safishamsi/graphify

At the time of writing, the GitHub page shows about 50.2k stars and 5.4k forks, with an MIT license. The README describes it like this: type /graphify inside your AI coding assistant, and it maps the entire project into a queryable knowledge graph.

The Core Problem It Solves

AI coding assistants are becoming stronger, but in real codebases they still frequently run into several problems:

They do not know how key modules connect.
They read many files but do not form an overall architecture map.
Search finds text, but not upstream and downstream dependencies.
Code, database schemas, docs, and infrastructure configuration are scattered across different places.
In team collaboration, each person may have a different mental model of the project structure.

Graphify tries to add a “memory layer” to the project. It connects code entities, documentation concepts, database tables, configuration, design notes, and cross-file relationships so the AI assistant can query the graph instead of scanning files from scratch every time.

Minimal Usage

Graphify’s minimal workflow is simple. After installation, type this inside your AI coding assistant:

`1`	`/graphify .`

In PowerShell, the leading / is treated as a path separator, so on Windows PowerShell use:

`1`	`graphify .`

After running, it generates a graphify-out/ directory with three core files:

graphify-out/
├── graph.html
├── GRAPH_REPORT.md
└── graph.json

These files serve different purposes:

graph.html: an interactive graph you can open in a browser, with clickable nodes, filters, and search.
GRAPH_REPORT.md: highlights, key concepts, surprising connections, and suggested questions.
graph.json: the full graph, which can be queried later without rereading all files.

To generate a more readable architecture page with Mermaid call-flow diagrams, run:

`1`	`graphify export callflow-html`

Installation and Platform Support

Graphify’s PyPI package name is graphifyy, with a double y. The README specifically warns that other graphify* packages on PyPI are not affiliated with the project, although the CLI command is still graphify.

The recommended installation method is:

`1`	`uv tool install graphifyy`

Alternatives:

1
2

pipx install graphifyy
pip install graphifyy

Then register it with your AI assistant:

`1`	`graphify install`

The project supports many platforms, including Claude Code, Codex, OpenCode, GitHub Copilot CLI, VS Code Copilot Chat, Aider, Cursor, Gemini CLI, Kimi Code, Kiro, and Google Antigravity. Different platforms can use different install commands, for example:

graphify install --platform codex
graphify install --platform gemini
graphify cursor install
graphify antigravity install

Codex users also need to add this under [features] in ~/.codex/config.toml:

`1`	`multi_agent = true`

The README also notes that Codex uses $graphify, not /graphify.

What Files It Handles

Graphify supports a wide range of input types.

For code, it supports 31 languages, including Python, TypeScript, JavaScript, Go, Rust, Java, C/C++, Ruby, C#, Kotlin, Scala, PHP, Swift, Lua, Zig, PowerShell, SQL, Shell, JSON, and more.

For documents, it supports:

.md
.mdx
.qmd
.html
.txt
.rst
.yaml
.yml

Optional dependencies extend it further:

pip install "graphifyy[pdf]"
pip install "graphifyy[office]"
pip install "graphifyy[video]"
pip install "graphifyy[mcp]"
pip install "graphifyy[neo4j]"
pip install "graphifyy[sql]"
pip install "graphifyy[all]"

Here, pdf is for PDF extraction, office for .docx and .xlsx, video for video and audio transcription, mcp for an MCP stdio server, neo4j for pushing to Neo4j, and sql for SQL schema extraction.

Why the Report Matters

GRAPH_REPORT.md is not a normal summary. It extracts relationships inside the project that are especially useful for AI assistants.

The README mentions report contents such as:

God nodes: the most-connected core concepts in the project.
Surprising connections: unexpected links across files or modules.
The why: design rationale extracted from comments, docstrings, and design docs.
Suggested questions: questions the graph is particularly well suited to answer.
Confidence tags: relationships are labeled EXTRACTED, INFERRED, or AMBIGUOUS.

This is important. Normal search only tells you “where this word appears.” A graph can answer “which modules, configs, tables, and docs this concept is connected to.” For large codebases, that is closer to architecture understanding than simple full-text search.

Common Commands

Common Graphify commands include:

/graphify .
/graphify ./docs --update
/graphify . --cluster-only
/graphify . --no-viz
/graphify . --wiki
graphify export callflow-html
/graphify query "what connects auth to the database?"
/graphify path "UserService" "DatabasePool"
/graphify explain "RateLimiter"

You can also add a paper or video to the graph:

1
2

/graphify add https://arxiv.org/abs/1706.03762
/graphify add <youtube-url>

For PR-assisted analysis:

graphify prs
graphify prs 42
graphify prs --triage
graphify prs --conflicts

These commands fit code review scenarios: identify which graph communities a PR affects, whether it risks conflicts with other PRs, and which review queues deserve priority.

MCP, Neo4j, and CI

Graphify is not only an HTML graph generator. It can also expose the graph to AI assistants for repeated tool use.

For example, start an MCP server:

`1`	`python -m graphify.serve graphify-out/graph.json`

The MCP server provides capabilities such as query_graph, get_node, get_neighbors, shortest_path, list_prs, get_pr_impact, and triage_prs.

It also supports Neo4j export or push:

1
2

/graphify ./raw --neo4j
/graphify ./raw --neo4j-push bolt://localhost:7687

For team collaboration, the README suggests committing graphify-out/ so everyone on the team starts with the same project map. You can also run:

`1`	`graphify hook install`

This rebuilds the graph after each git commit and sets up a merge driver so graph.json does not get left with conflict markers when multiple people commit in parallel.

Privacy and Cost

Graphify’s README is fairly clear about privacy boundaries.

Code files are parsed locally through tree-sitter and do not trigger API calls. Video and audio can be transcribed locally with faster-whisper. Docs, PDFs, and images used for semantic extraction go through your AI assistant’s model API.

For headless graphify extract, you may need these environment variables:

ANTHROPIC_API_KEY
GEMINI_API_KEY
GOOGLE_API_KEY
OPENAI_API_KEY
DEEPSEEK_API_KEY
MOONSHOT_API_KEY
OLLAMA_BASE_URL

Local Ollama, AWS Bedrock, and Claude Code CLI can also be used as backends. The README also states that the project has no telemetry, usage tracking, or analytics.

In practice, remember that local code parsing does not mean everything stays offline. When docs, PDFs, images, or cloud models are involved, you still need to consider the backend, API keys, enterprise compliance, and data boundaries.

Suitable Scenarios

Graphify is suitable for several types of users:

Developers who want Claude Code, Codex, Cursor, and Gemini CLI to better understand project structure.
People who need to quickly understand a large unfamiliar codebase.
Teams that need to analyze code, SQL schemas, docs, and configuration together.
People doing architecture review, PR review, or refactor impact analysis.
Teams that want to expose project knowledge as an MCP tool for Agents.
Technical leads who want to keep a “project map” for the team.

It is not necessary for every project. For small scripts, one-off demos, or very simple repositories, normal search and README files may be enough. Graphify’s value shows up more clearly in projects with many modules, many docs, team collaboration, and frequent AI assistant involvement.

Summary

Graphify matters because it moves AI coding assistants from “temporarily reading files” toward “a long-lived, queryable project knowledge graph.”

For developers, it does not replace the IDE, search, or LSP. It adds a structured memory layer for AI assistants: which modules matter, which concepts are tightly connected, which docs explain design rationale, and which communities a PR may affect. As Codex, Claude Code, Gemini CLI, Antigravity, and similar Agent tools become more common, this kind of project graph layer will become increasingly useful.

References:

GitHub: safishamsi/graphify

agentmemory: Persistent Memory for Claude Code, Codex, Cursor, and Other Coding Agents

Tue, 19 May 2026 10:56:50 +0800

rohitg00/agentmemory is a persistent memory system for AI coding agents. Its goal is straightforward: Claude Code, Codex CLI, Cursor, Gemini CLI, OpenCode, and similar tools should not have to relearn the project background, architecture decisions, and historical problems every time a new session starts.

Project URL: https://github.com/rohitg00/agentmemory

At the time of writing, the GitHub API showed about 13k stars, TypeScript as the main language, and an Apache-2.0 license. The README describes it as “Persistent memory for AI coding agents.”

What Problem Does It Solve

A common pain point for coding agents is memory fragmentation. You may ask an agent to fix an authentication issue today, then open a new conversation tomorrow, and it no longer knows:

Why a certain architecture decision was made.
Which files are sensitive and should be changed carefully.
What bugs were fixed before.
What commands, tools, or local services the project uses.
Which conventions the team follows.

Static notes help, but they are often forgotten or not connected to the active workflow. agentmemory tries to provide a shared memory layer that can be used across different AI coding tools.

Supported Agents

The README lists support for Claude Code, Codex CLI, Cursor, Gemini CLI, OpenCode, and other MCP-compatible tools. The core idea is to expose memory through a local service, MCP, hooks, and integrations, so multiple assistants can share the same project context.

This is especially useful for teams that switch between tools. One developer may use Cursor, another may use Claude Code, while automation runs through Codex CLI. A shared memory layer reduces repeated explanation.

Quick Start

Install globally:

npm install -g @agentmemory/agentmemory
agentmemory
agentmemory demo
agentmemory connect claude-code

Or run with npx:

`1`	`npx @agentmemory/agentmemory`

The local service is available at:

`1`	`http://localhost:3113`

In practice, the first step is usually to start the memory service, connect the coding assistant, and then let the agent read or write project memories during development.

How It Differs From Static Memory Files

Many teams already maintain AGENTS.md, CLAUDE.md, README notes, or local documentation. These files are useful, but they are static. They do not automatically capture session history, task outcomes, or recurring decisions.

agentmemory is closer to a persistent context service. It can store and surface memories that are relevant to the current project or task. The goal is not to replace documentation, but to make working context easier to reuse.

Typical Scenarios

Useful scenarios include:

Remembering project setup steps and common commands.
Recording why a risky refactor was avoided.
Keeping notes about flaky tests or local services.
Sharing domain terminology across coding assistants.
Helping agents continue work after a new session starts.

This is particularly valuable for long-running products, monorepos, and projects with many hidden conventions.

Things To Watch Out For

First, memory quality matters. If old or wrong information is written into memory, future agents may repeat the mistake. Teams should keep important memories short, clear, and reviewable.

Second, privacy matters. Do not store secrets, API keys, customer data, or sensitive production information in a memory system unless the security model is clear.

Third, memory is not a substitute for tests. It helps agents understand context, but the final guarantee still comes from code review, tests, and verification.

Who It Is For

agentmemory is suitable for developers who use multiple AI coding tools, teams working on large codebases, and users who often need agents to continue previous work. It is less necessary for very small one-off scripts.

Summary

agentmemory is interesting because it treats memory as infrastructure for AI coding, not as a small prompt trick. If coding agents are becoming part of daily development, persistent project memory is a practical missing piece.

Let AI Operate Your Computer? UI-TARS-desktop Connects Desktop, Browser, and Tools

Tue, 19 May 2026 10:56:50 +0800

bytedance/UI-TARS-desktop is ByteDance’s open source multimodal AI agent project. It is not just a single desktop app, but an agent stack. The current README mainly contains two directions: Agent TARS and UI-TARS Desktop.

Project URL: https://github.com/bytedance/UI-TARS-desktop

Official site: https://agent-tars.com

At the time of writing, the GitHub API showed about 34k stars, TypeScript as the main language, and an Apache-2.0 license. The README describes it as an “Open-Source Multimodal AI Agent Stack.”

Difference Between Agent TARS and UI-TARS Desktop

The README places the two projects in one comparison table:

Agent TARS: a general multimodal AI agent stack that connects GUI agents, vision, terminal, browser, and product workflows.
UI-TARS Desktop: a desktop application based on UI-TARS models, providing native GUI agent capabilities for operating local or remote computers and browsers.

Simply put, Agent TARS is more like a general agent runtime, while UI-TARS Desktop is the desktop GUI operation entry point.

What Agent TARS Can Do

Agent TARS mainly provides a CLI and Web UI. Its goal is to let multimodal models complete task flows closer to human operation through MCP and various tools.

Core capabilities listed in the README include:

One-command CLI startup, supporting headful Web UI and headless server.
Hybrid browser agent control through GUI Agent, DOM, or mixed strategies.
Event Stream for tracing and debugging data flows.
MCP integration for mounting MCP Servers and real tools.

Quick start:

`1`	`npx @agent-tars/cli@latest`

Global installation:

`1`	`npm install @agent-tars/cli@latest -g`

Run with a model provider:

1
2

agent-tars --provider volcengine --model doubao-1-5-thinking-vision-pro-250428 --apiKey your-api-key
agent-tars --provider anthropic --model claude-3-7-sonnet-latest --apiKey your-api-key

What UI-TARS Desktop Can Do

UI-TARS Desktop is a desktop GUI Agent. Based on UI-TARS and Seed-1.5-VL / 1.6 model families, it focuses on letting the model understand the screen and execute mouse and keyboard operations.

Capabilities listed in the README include:

Natural language control.
Screenshots and visual recognition.
Precise mouse and keyboard control.
Cross-platform support for Windows, macOS, and browsers.
Real-time feedback and status display.
Local processing with an emphasis on privacy and security.

Example tasks include changing VS Code settings, checking GitHub issues, and operating remote computers or browsers.

Why GUI Agents Matter

Traditional automation depends on APIs, DOM, or scripts. A GUI Agent starts from the interface: it sees buttons, input boxes, menus, and state, then operates through mouse and keyboard.

This has two values. First, many applications do not have stable APIs, or APIs do not cover the full workflow. A GUI Agent can interact from the same surface a human uses.

Second, multimodal models can handle screenshots, documents, web pages, and app interfaces, combining visual understanding with execution.

The limitation is also clear. GUI operations are affected by resolution, language, layout changes, pop-ups, and network latency. Production workflows still need permission control, confirmation steps, and rollback plans.

Relationship With MCP

Agent TARS emphasizes MCP integration. MCP is useful because it gives agents a unified way to call browsers, files, command lines, databases, internal services, and other tools.

For complex tasks, GUI clicking alone is not stable enough. A better pattern is often:

Use APIs where APIs are available.
Use vision when page state must be understood.
Use browser control when real web interaction is needed.
Use GUI Agent when local software must be operated.

Projects like UI-TARS-desktop are exploring how to place these capabilities in one agent stack.

What To Watch Out For

First, desktop agents have execution risk. They can operate mouse, keyboard, and browser, so permissions must be limited to avoid accidental file changes, account operations, payment, or production system actions.

Second, remote computer and remote browser control needs a clear security boundary. Do not expose unauthenticated control endpoints to the public internet.

Third, multimodal models can misread interfaces. Critical operations should require human confirmation, especially delete, submit, pay, publish, trade, or other irreversible actions.

Who It Is For

UI-TARS-desktop is suitable for developers exploring GUI agents, teams building AI assistants for desktop workflows, and researchers comparing browser, DOM, MCP, and visual-control strategies. It is not a simple consumer assistant yet.

Summary

UI-TARS-desktop is worth watching because it moves AI agents from “answering in chat” toward “seeing the screen and operating tools.” Its value is not only in desktop control, but in combining GUI, browser, terminal, and MCP capabilities in one stack.

Too Many Platforms to Post To? AiToEarn Wants AI Agents to Help Creators Save Time

Tue, 19 May 2026 10:56:50 +0800

yikart/AiToEarn is an AI content marketing project for creators, brands, and one-person companies. It tries to put content creation, publishing, engagement, and monetization into one agent workflow, covering platforms such as Douyin, Xiaohongshu, Kuaishou, Bilibili, WeChat Channels, TikTok, YouTube, Facebook, Instagram, Threads, X, Pinterest, and LinkedIn.

Project URL: https://github.com/yikart/AiToEarn

Official site: https://aitoearn.ai/

At the time of writing, the GitHub API showed about 15k stars, TypeScript as the main language, and an MIT license. The README describes it as a content marketing agent platform for OPCs, creators, brands, and enterprises.

Positioning

AiToEarn is not just a copywriting generator or a scheduled posting tool. It breaks content marketing into four agent capabilities:

Monetize: content monetization.
Publish: cross-platform content publishing.
Engage: content interaction and community operations.
Create: content creation.

That positioning fits the current creator workflow. The hard part for many teams is not only “can AI write a post”, but what happens after that: scheduling, distribution, replies, review, and connecting content to business tasks.

Core Features

Monetize: Making Money From Content

AiToEarn provides monetization capabilities around promotional tasks. The README mentions three settlement models:

Model	Full name	Meaning
CPS	Cost Per Sale	Settlement by sales
CPE	Cost Per Engagement	Settlement by engagement
CPM	Cost Per Mille	Settlement by impressions or views

This part is closer to a content task marketplace that connects brand promotion needs with creator distribution.

Publish: Content Publishing Agent

Publish distributes content across multiple platforms and reduces the repeated work of posting manually. The README covers mainstream short video, graphic, and social platforms in China and overseas.

Its practical value is unified scheduling and management. For account matrices, cross-platform distribution, and global content teams, this is often more useful than a single AI copywriting feature.

Engage: Content Engagement Agent

Engage uses a browser extension to support automated engagement operations such as likes, saves, follows, comment replies, and brand monitoring.

This capability should be used carefully. Automated engagement can trigger platform risk controls, so teams need to check account permissions, frequency limits, platform terms, and internal compliance rules.

Create: Content Creation Agent

Create handles content generation. The README mentions video generation models, video translation, video editing, image generation, and batch creation tasks.

This is useful for large-scale content production, but human review is still necessary. Brand content, ad materials, and multilingual assets need factual accuracy, copyright checks, and tone consistency.

Five Ways To Use It

Method	Best for	Deployment needed
Use the website directly	All users	No
Use it in OpenClaw	OpenClaw users	No
Use it in Claude / Cursor and other AI assistants	AI tool users	No
One-click Docker deployment	Teams that want self-hosting	Server needed
Source development	Developers	Development environment needed

MCP support is a notable point. It means Claude, Cursor, or other MCP-compatible agents can call AiToEarn as an external capability.

A common MCP configuration contains:

1
2

MCP URL: https://aitoearn.ai/api/unified/mcp
Auth Header: x-api-key: your-API-Key

Self-hosted users should replace it with their own service URL.

Docker Deployment

The README provides a Docker deployment path:

1
2
3

git clone https://github.com/yikart/AiToEarn.git
cd AiToEarn
docker compose up -d

Then visit:

`1`	`http://localhost:8080`

For teams that care about data control, private deployment, or custom workflows, Docker is more practical than only using the hosted website.

Who It Is For

AiToEarn is suitable for creators who publish across many platforms, small teams running content operations, one-person companies, brands that need creator collaboration, and developers who want to connect content workflows to AI agents.

It is less suitable if you only need a simple text generator. Its value is in connecting creation, publishing, engagement, and monetization.

Notes Before Use

First, automated posting and engagement must respect platform rules. A tool can improve efficiency, but it cannot remove the need for account safety and compliance.

Second, generated content still needs human review. Ads, brand posts, and cross-language content can all carry factual, copyright, or tone risks.

Third, monetization features involve commercial tasks, so settlement rules, disclosure requirements, and platform policies should be checked before use.

Summary

AiToEarn is worth watching because it treats content operations as a workflow, not just a writing task. For creators and small teams, the attractive part is saving repeated work across platforms. For developers, the interesting part is MCP and agent integration.

Claude Code Token-Saving Guide: How Models, MCP, CLAUDE.md, and Skills Affect Cache

Mon, 18 May 2026 18:30:24 +0800

In long Claude Code tasks, Prompt Cache hit rate directly affects cost and speed. Many users know that caching can save tokens, but not which actions make the cache suddenly miss.

The simplest mental model is a left-to-right context chain:

`1`	`tools -> system -> CLAUDE.md / skills -> messages`

The farther left something sits, the more stable it should be and the larger the cache benefit. If a left-side section changes, everything after it may need to be recalculated. If a right-side section changes, the impact is smaller.

So optimizing Prompt Cache in Claude Code is not guesswork. The rule is simple: before a task begins, prepare the model, MCP servers, Skills, CLAUDE.md, and other base context. Once the task starts, change as little of that fixed context as possible.

Prompt Cache does not cache plain text

Prompt Cache is not just a string cache for prompts. In Transformer inference, what matters is the Key/Value state calculated by attention layers from the prefix context, usually called KV cache.

That means two things:

If the prefix stays stable, part of the previous computation can be reused.
If the model, tool definitions, system prompt, or prefix messages change, old cache entries may no longer match.

Anthropic’s documentation summarizes the invalidation hierarchy as tools -> system -> messages. Changes to tool definitions can invalidate the whole cache; system changes affect system and messages; message changes mainly affect message cache.

Claude Code adds more context sources such as CLAUDE.md, Skills, MCP, plugins, and subagents, so it is easier to accidentally break cache reuse.

Cache killer 1: switching models mid-task

Switching models is one of the most expensive changes.

Prompt Cache is isolated by model. Opus, Sonnet, and Haiku have different architectures and weights, so the KV cache calculated from the same text is not interchangeable. If you build a long context in Opus and then switch to Sonnet, Sonnet cannot reuse Opus’s cache.

This creates a counterintuitive result: switching models mid-task to save money may make the previous cache useless. Context that could have been read at cache-read price may need to be written and computed again.

A steadier pattern is:

Keep the main conversation on one model.
Use a subagent for side tasks that can run on a cheaper model.
Let the side agent search, explore, or summarize, then hand a concise result back to the main conversation.

This keeps the long main-context prefix stable and improves cache hit consistency.

Cache killer 2: adding MCP or reloading plugins mid-task

MCP provides tools to Claude Code. When you add an MCP server, the tool list changes, and tool definitions sit at the far left of the context chain.

From a Prompt Cache perspective, when the tool list changes, the system and messages that follow may need to be recalculated. If you use many MCP servers, the tool definitions themselves can be large, so the cost of invalidation becomes obvious.

One detail matters: Claude Code usually reads MCP configuration at session startup. Changing config mid-session may not affect the current session immediately. The dangerous moments are restart, resume, plugin reload, or anything that rebuilds the tool list.

Recommended practice:

Install required MCP servers before starting a long task.
Avoid discovering missing tools halfway through and then reloading.
Reduce default-enabled MCP servers when possible.
Do not keep rarely used MCP servers always enabled.

Stable tool definitions are the foundation of stable Prompt Cache hits.

Cache killer 3: editing CLAUDE.md mid-session

CLAUDE.md is Claude Code’s project memory file. It is useful for build commands, test commands, architecture conventions, code style, and project-specific constraints.

It is helpful, but it also enters the context. Claude’s help documentation explains that CLAUDE.md is read at session start and delivered as a user message. It also benefits from Anthropic Prompt Cache: the first request pays full input price, while later requests can hit the lower cache-read price if the cache is still valid.

The catch is that CLAUDE.md is content-addressed. Once the file changes, the old cache no longer matches.

So avoid frequently editing CLAUDE.md during a long task. Better practices:

Check whether CLAUDE.md is sufficient before the task starts.
Put stable rules in the file and temporary instructions in the current conversation.
Do not edit long-term memory for one-off instructions.
If you must change it, treat the next stage as a new session or new phase.

CLAUDE.md should be stable project guidance, not a scratchpad that changes every round.

Cache killer 4: installing or updating Skills mid-task

Skills are also part of the context. Installing a new Skill, updating a Skill, or changing the Skill list changes what gets injected into the session.

These changes often do not fully take effect until reload, resume, or a new session. Once messages are rebuilt, old cache entries may no longer match.

The same advice applies:

Decide which Skills are needed before starting.
Keep the Skill set stable for the same kind of task.
Avoid installing Skills in the middle of a long task.
If you install a new Skill, treat it as the beginning of a new stage.

For repeatable workflows such as content production, review, deployment, and translation, keeping a fixed Skill set helps keep the context structure stable.

Cache killer 5: idle time exceeding TTL

Prompt Cache does not last forever. A common default TTL is on the order of minutes, and Claude Code-related documentation often refers to roughly a five-minute cache window. After TTL expires, even the same request may need to rebuild the cache.

This explains a common feeling in long tasks: everything was cheap and fast, then after a coffee break the token cost jumps again.

Long tasks hit this easily. You may review Claude Code output, inspect files, run tests, or think about the next step. Five minutes can disappear quickly.

If your environment supports it, you can request a one-hour Prompt Cache TTL before long tasks:

`1`	`export ENABLE_PROMPT_CACHING_1H=1`

In Windows PowerShell:

`1`	`$env:ENABLE_PROMPT_CACHING_1H="1"`

One-hour cache writes usually cost more than five-minute cache writes. It is not always worth it for short tasks, but for large codebases, long conversations, and complex multi-step development, it may be cheaper than repeated cache expiration.

A token-saving Claude Code workflow

A steadier long-task setup looks like this:

Choose the model before the task starts and avoid frequent switching.
Enable the MCP servers you need and disable the ones you do not.
Keep CLAUDE.md short, stable, and focused on durable rules.
Prepare the Skills needed for this task in advance.
For complex tasks, consider one-hour TTL.
Split the task into phases, but keep context structure stable within each phase.
Use subagents or separate sessions for side exploration instead of disturbing the main conversation.

The goal is not to prevent every cache miss. It is to avoid the high-cost misses that are easy to overlook.

A simple rule of thumb

Ask one question:

Does this operation change the model, tool definitions, system context, or fixed messages near the start of the session?

If yes, it probably affects Prompt Cache. The farther left it is in the context chain, the greater the impact.

Common operations:

Switch model: high risk, model caches are isolated.
Add MCP or reload plugins: high risk, tool list changes.
Edit CLAUDE.md: medium-high risk, project memory changes.
Install Skills: medium-high risk, injected context changes.
Continue normal conversation: low risk, mostly appends messages.
Idle past TTL: high risk, server-side cache expires.

Summary

Prompt Cache optimization in Claude Code is about keeping the session prefix stable.

Do not switch models casually. Do not install MCP servers and Skills halfway through. Do not use CLAUDE.md as a temporary scratchpad. For complex tasks, consider a longer TTL. Once these basics are stable, token cost and response speed become much more predictable.

The most practical sentence is: configure before you start, change less after you start.

References

easy-vibe: A Learning Map for Vibe Coding Beginners

Sat, 16 May 2026 22:44:43 +0800

easy-vibe is an open source Vibe Coding learning project from Datawhale. It is not aimed at developers who are already fluent with AI coding tools. It is aimed at students, product managers, designers, operators, indie developers, and technical hobbyists who are just starting with Vibe Coding.

The value of this project is not that it lists another batch of AI tools. It turns “how to start building projects with AI” into a learning path that is easier to understand. For many beginners, the hard part is not knowing that Claude Code, Cursor, MCP, or Agents exist. The hard part is knowing what to learn first, how to practice, and when to move into more advanced tools.

Beginners Need a Path Most

Vibe Coding has become popular in recent years, but it is not very friendly to beginners.

On the surface, as long as you can describe a requirement, you can ask AI to write code. In reality, as soon as the task becomes slightly more complex, problems appear: the requirement is unclear, the model edits the wrong file, the project structure is confusing, errors are hard to handle, dependencies fail to install, prompts become messier, and the workflow falls back to “copy code into a chat box”.

So getting started with Vibe Coding cannot only mean learning “how to write prompts”. It needs to solve several things:

How to split an idea into executable tasks;
How to let AI understand a project structure;
How to read code generated by the model;
How to handle errors and iterate;
How to use the terminal and local development environment;
How to move from web chat to real AI coding tools.

This is where easy-vibe matters: it tries to organize these topics into a learning route, instead of leaving beginners lost among tools, tutorials, and terminology.

It Is a Roadmap, Not a Single Tutorial

According to the project description, easy-vibe covers basic tutorials, interactive exercises, visual content, RAG, terminal tools, AI coding tools, and more advanced topics such as Claude Code, MCP, Skills, and Agent Teams.

This structure is suitable for beginners because AI coding is not a single skill. It is a combination of abilities:

Describing requirements;
Splitting tasks;
Reading projects;
Asking the model to edit code;
Running and verifying results;
Iterating based on errors;
Turning repeated workflows into tools or skills.

If you only learn one tool, it is easy to be constrained by that tool’s interface. Switch models, editors, or CLIs, and the workflow becomes unclear again. A roadmap helps build the working method first, then places tools where they belong.

Especially Useful for Non-Programmers

The biggest appeal of Vibe Coding is that it lets non-professional programmers build prototypes.

Product managers can turn product ideas into interactive demos. Designers can validate interaction logic. Operators can write internal tools. Students can quickly build course projects. Founders can validate demand early. These people do not necessarily need to become full-time engineers in the traditional sense, but they do need a method for “letting AI help me turn ideas into working things”.

This is also why easy-vibe fits the Chinese community. Many Chinese users already know AI can write code, but they still lack systematic beginner materials. Development environment, prompts, project structure, debugging methods, and Agent tools are easier to learn when explained clearly in Chinese and paired with exercises.

For these users, the most important thing is not to learn a complex framework immediately. It is to complete a full loop first: propose a requirement, generate a project, run it, find problems, keep modifying, and finally get a usable version.

The Advanced Part Moves Toward Real AI Development Workflows

The Claude Code, MCP, Skills, and Agent Teams mentioned in easy-vibe are no longer just beginner concepts.

Claude Code represents terminal coding Agents: the model can enter a local project, read files, edit code, and run commands. MCP solves tool and data source integration, so the model is not trapped in a chat box. Skills preserve reusable workflows, such as fixed project generation, document organization, test checks, or content production processes. Agent Teams further split tasks across multiple agents.

These topics may feel distant for beginners, but they are worth understanding early. The direction of Vibe Coding is already clear: from “let AI write a piece of code” to “let AI participate in a complete project workflow”.

If a learning route stops at prompts, it will quickly fall behind tool evolution. On the other hand, if every advanced concept is thrown at beginners immediately, they will not know where to start. The useful part of easy-vibe is that it places these topics on a gradual upgrade path.

Two Mistakes to Avoid

The first mistake is thinking that Vibe Coding means you can ignore code entirely.

AI can generate a lot, but the user still needs to judge whether the result is correct. At minimum, you need to understand the project structure, know how to run it, and roughly know where an error is happening. Even if you do not write complex code, you still need basic engineering common sense.

The second mistake is thinking that more advanced tools are always better.

Beginners do not necessarily need Claude Code, MCP, or multiple Agents at the start. A better order is to first build a feedback loop with simple projects, then gradually introduce the terminal, version control, testing, tool calling, and automated workflows. Tools should match task complexity; otherwise they look powerful but have no clear use.

How to Use It

If you are just starting with Vibe Coding, you can use easy-vibe as a learning checklist.

Start with basic concepts and simple exercises. Do not rush to chase every tool. Build a small project, such as a personal homepage, data dashboard, form tool, automation script, or knowledge base demo. During the process, observe where AI helps and where you still need to confirm things yourself.

Once you can complete small projects consistently, move into more complex topics:

Use terminal tools to work with local projects;
Use Git to manage each change;
Use RAG to connect your own materials;
Use MCP to connect external tools;
Use Skills to solidify repeated workflows;
Use Agent Teams to split complex tasks.

Learning Vibe Coding this way is not just learning to ask AI. It is learning to put AI into your own workflow.

Conclusion

easy-vibe is best seen as a Chinese learning map for Vibe Coding. It organizes scattered AI coding concepts, tools, and exercises into a route that helps beginners move from “I heard AI can write code” to “I can build a project with AI”.

The real value of Vibe Coding is not that it lets people skip all learning. It lowers the threshold from idea to prototype. You still need to understand requirements, organize tasks, verify results, and control risks. But many repetitive, tedious, and blocking steps can be handled with AI assistance.

If you want a systematic entry point into AI coding, without getting trapped immediately in tool names and complex engineering setup, easy-vibe is a good place to start.

Anthropic financial-services: Reusable Templates for Financial Agents

Sat, 16 May 2026 22:43:08 +0800

anthropics/financial-services is a reference project from Anthropic for the financial services industry. It is not a single application, but a set of examples that can be studied and reused separately: Agents, Plugins, Skills, MCP connectors, and prompts and integration patterns designed around financial workflows.

This project is worth watching not because it provides a “universal financial assistant”, but because it breaks common AI implementation problems in finance into more concrete components: what kind of Agent each role needs, which data sources need to be connected, which tasks can be automated, and which steps still require human judgment.

It Is More Like a Showroom for Financial Agents

When companies talk about AI Agents, the discussion can easily stay abstract: reading files, querying data, writing reports, and calling tools. Once the scenario enters finance, the questions become much more specific.

Investment banking analysts need to organize company materials, generate transaction briefs, and compare comparable companies. Equity research needs to read filings, follow news, perform valuation, and analyze risks. Private equity and asset management teams need to screen deals, write memos, and track portfolio companies. Wealth management needs to place client profiles, market information, and investment advice within a compliance framework.

These scenarios cannot be handled by a generic chat box alone. They require roles, processes, data sources, output formats, and permission boundaries. The value of this Anthropic repository is that it turns multiple typical financial services roles and tasks into Agent templates that can be used as references.

Why Provide Agents, Plugins, Skills, and MCP Together

Judging from the project structure, Anthropic did not only provide a set of prompts. It provides several kinds of components at the same time. This maps to several layers of enterprise Agent implementation.

Agents are more like work units for roles or tasks. They define what the agent should do, how it should do it, when to call tools, and how to produce output.

Plugins are more like external capability extensions. Financial work rarely happens only inside the model. It often needs to connect databases, document systems, market data, CRM, research libraries, and internal workflow systems.

Skills are reusable professional capability packages. Fixed analysis frameworks, report structures, checklists, and data processing methods can be turned into skills instead of being rewritten as prompts every time.

MCP connectors solve tool integration and context standardization. For enterprises, the more tools there are, the more they need a relatively unified way to connect them. Otherwise every system needs separate adaptation, and maintenance cost rises quickly.

Only when these pieces are combined does the result begin to resemble a real enterprise AI workflow.

Why Finance Is a Good Industry for Agent Examples

Financial services is a good industry for showing Agents because it has three traits at the same time.

First, information density is high. Financial work relies heavily on filings, announcements, meeting notes, research reports, trading data, client records, and regulatory documents. If a model only relies on general knowledge, it quickly becomes ineffective. It must connect to real data sources.

Second, output formats are stable. Investment memos, company profiles, KYC documents, research summaries, client briefings, and fund operation reports all have relatively fixed structures. This makes it easier for Agents to form verifiable workflows.

Third, risk boundaries are clear. Finance has strict requirements for compliance, auditability, permissions, and traceability. AI cannot casually provide investment advice or bypass approval processes. This forces Agent design to become more engineering-driven: keep references, separate facts from inferences, record tool calls, and limit executable actions.

That means this project is not only for financial companies. Any team building enterprise Agents can use it to observe how Anthropic decomposes industry scenarios.

What Typical Workflows It Covers

According to the project description, the repository covers several financial services areas, including:

Investment banking;
Equity research;
Private equity;
Wealth management;
Fund operations;
KYC and compliance-related workflows.

These workflows have one thing in common: they all require a lot of reading, organizing, comparison, and structured document generation. The best role for AI here is not to make decisions directly, but to reduce the time spent on information processing and document production.

For example, in investment banking, an Agent can help organize target company information, extract key financial metrics, and generate a first draft of a transaction summary. In research, it can read filings and news first, then list key changes and open questions. In KYC, it can help check whether materials are complete and whether there are unusual signals.

The final judgment should still belong to professionals. The Agent’s role is closer to assistant, analyst, and workflow accelerator.

What It Suggests for Enterprise Adoption

The most useful part of this repository is that it turns “model capability” into “business components”.

Internal AI projects often run into the same problem: model demos look impressive, but once they are connected to real business, they are hard to reuse. One team writes one set of prompts, another team writes another. One system connects a database, another builds its own interface. Security and audit requirements are scattered everywhere.

A steadier approach is to split capabilities into several types of assets:

Role-oriented Agents;
Process-oriented Skills;
MCP connectors for system integration;
Execution rules for permissions and audit;
Templates and checklists for business output.

The benefit is that the enterprise does not restart from “building a chatbot” every time. It gradually accumulates maintainable AI workflow assets.

Compliance and Responsibility Boundaries Cannot Be Ignored

The easiest misunderstanding around financial Agents is treating “can generate analysis” as “can replace decisions”.

In financial services, AI output should usually be treated as supporting material. It can organize facts, draft documents, highlight risks, and complete files, but it cannot bypass investment research, risk control, legal, compliance, and suitability requirements. Especially when investment advice, trading decisions, asset allocation, or identity checks are involved, human approval and responsibility chains must remain.

That is why enterprise Agents cannot be evaluated only by answer quality. They must also be evaluated by:

Whether data sources are reliable;
Whether references and evidence are traceable;
Whether tool calls are recorded;
Whether sensitive data is restricted;
Whether output has human confirmation;
Whether wrong results can be discovered and rolled back.

If these questions are not solved, the more automated the Agent becomes, the larger the risk radius becomes.

Conclusion

anthropics/financial-services is more like a financial Agent reference implementation than an out-of-the-box financial product. It shows one way Anthropic thinks about enterprise AI adoption: do not build only generic chat assistants; organize Agents around specific roles, specific workflows, specific data sources, and specific permission boundaries.

For financial institutions, it can serve as a reference for designing internal AI workflows. For developers, it is a sample for observing enterprise Agent architecture: Agents handle roles and tasks, Skills preserve professional processes, Plugins and MCP connect external systems, and the model eventually enters real business workflows.

If early AI tools solved “how to make models answer questions”, projects like this care more about “how to let models participate in work within controlled boundaries”. That is where enterprise Agents become truly difficult.

How Did AI Agents Evolve? A Complete 2022-2026 Five-Generation Timeline

Sat, 16 May 2026 19:19:52 +0800

AI Agents did not appear overnight.

At the end of 2022, ChatGPT was still mainly a chat window. By 2026, agents had begun to gain tool calling, file operations, computer control, long-term memory, remote collaboration, and persistent execution. In four years, they moved from “models that answer questions” toward “digital workers that can move tasks forward.”

If we look at the timeline, AI Agents have roughly gone through five generations. Each generation solved the previous one’s core limitation, while creating new bubbles and new safety problems.

Overview: five generations of Agents

Stage	Time	Keyword	Capability shift	Core problem
Generation 0	Late 2022 - early 2023	Chat box	Generates text, but cannot act	Model and real world are disconnected
Generation 1	Mid-2023 - late 2023	Tool calling	Outputs structured calls, connects APIs and RAG	Open-loop execution and task drift
Generation 2	Late 2023 - 2024	Engineered workflows	Planning, state, reflection, and multi-agent collaboration	Workflows are easy to copy; low-code bubble
Generation 3	2024 - 2025	Computer Use	Sees screens, clicks, and operates GUIs	Permission, safety, and misoperation risks
Generation 4	2025 - 2026	MCP / Skills / persistence	Tool networks, long-term context, and professional skills	Persistent execution expands the risk radius
Generation 5 preview	After 2026	Loops and world models	Stronger memory, validation, and physical action	Governance becomes harder

Late 2022: Generation 0, the ChatGPT chat-box era

Generation 0 begins with the release of ChatGPT on November 30, 2022.

This generation was not yet a real Agent. It had strong language generation ability, but it was mostly trapped in a chat box. It could write Python code, but not run it on your computer. It could plan a trip, but not book tickets. It could tell you how to edit a file, but not enter the file system and make the change.

Its capability boundary was clear:

understand natural language;
generate articles, answers, code, and plans;
no active access to fresh data;
no stable access to internal company knowledge;
no external action;
no long-term task state.

The core issue was the break between model capability and the real world. It could think and speak, but not act.

This stage also produced the first bubble: prompt engineers, prompt template markets, prompt courses, and prompt certifications. Early models were indeed sensitive to prompts, but the market mistook a temporary patch for a long-term moat.

As GPT-4-level models, system prompts, function calling, and better product defaults matured, many prompt templates lost scarcity. This pattern would repeat: a new capability creates a middle layer; the next generation internalizes it; the middle layer evaporates.

Mid-2023: Generation 1, tool calling wakes up

The keyword for Generation 1 is tool calling.

In June 2023, OpenAI released function calling. Developers could describe function names, purposes, parameter types, and JSON Schema. After understanding a user request, the model could output a structured JSON call instead of ordinary natural language, and an external system would execute it.

The architectural significance was large: the model started moving from a brain that only talks to a brain that can drive external tools.

Key capabilities included:

choosing tools based on user intent;
outputting structured arguments;
calling external APIs;
feeding API results back into the model;
using RAG to access external knowledge;
forming early personas through plugins and knowledge bases.

At the same time, RAG and vector databases became popular. They addressed the model’s lack of fresh information, private enterprise materials, and internal knowledge. The system retrieved relevant document chunks, injected them into context, and let the model answer from those materials.

The basic Agent structure became:

who you are: system prompt and persona;
what you know: knowledge base, RAG, private documents;
what you can do: function calling, plugins, external APIs.

The most dramatic bubble of this generation was AutoGPT. It showed an attractive idea: the user gives a broad goal, and AI breaks it down, searches, writes files, evaluates, loops, and stops when it believes the work is done.

But AutoGPT quickly exposed the problem. It lacked state constraints, stopping conditions, and reliable feedback. Tasks drifted, APIs were called with bad arguments again and again, and bills could be burned by huge numbers of model calls. The lesson was simple: tools plus an infinite loop do not make a production-grade Agent.

Late 2023 to 2024: Generation 2, engineered workflows

AutoGPT’s failure taught the industry that models cannot simply be left to improvise. Complex tasks need structure.

Generation 2 is about engineered workflows. An Agent became not just one model call, but a software system with state, control flow, and evaluation.

Key capabilities included:

task planning: breaking large goals into steps;
state management: tracking where work stands;
reflection and revision: generating, reviewing, and improving;
tool orchestration: switching between tools;
human-in-the-loop: asking for confirmation at key points;
multi-agent collaboration: dividing roles.

A typical pattern is ReAct, or Reasoning + Acting. The model reasons, calls a tool, observes the result, and then reasons again. The Agent no longer acts blindly; each step has auditable logic and feedback.

Common agentic workflow patterns emerged:

reflection: generate, review, revise;
tool use: choose search, databases, code execution, and enterprise APIs;
planning: decompose goals and track state;
multi-agent collaboration: product, developer, tester, reviewer roles.

The value of Generation 2 was putting model capability inside a controllable process. A well-designed workflow can sometimes make a smaller model produce more stable results than a single large-model call.

This generation also produced the low-code Agent platform bubble. Many tools used drag-and-drop interfaces to combine prompts, RAG, plugins, and flows. They lowered the building barrier, but if a workflow can be copied cheaply, the platform itself has a weak moat.

Low-code tools can capture early demand, but a demand window is not a defensible wall.

2024 to 2025: Generation 3, Computer Use reaches real interfaces

The keyword for Generation 3 is Computer Use.

Earlier tool calling relied mostly on APIs. What an Agent could do depended on what developers had connected. But many real-world apps do not have clean APIs, or their APIs are incomplete, closed, or inconsistent.

Computer Use lets models look at screens, click, and operate GUIs. The general computer interface itself becomes a tool.

Key capabilities included:

recognizing screen content;
clicking buttons, typing text, switching windows;
operating web and desktop software;
reading repositories, editing files, running tests;
inspecting terminal output and errors;
behaving more like a real engineering assistant.

This pushed Agents from “using connected tools” toward “operating software like a person.” It also made coding agents closer to real workflows: read a project, change code, run tests, and continue from errors.

But the trust boundary expanded. If AI operates a computer, it can click the wrong button, delete the wrong file, submit the wrong form, or be manipulated by webpage text, documents, and UI instructions. Prompt injection becomes a file-operation, permission, and system-safety problem.

Vibe coding debates also concentrated in this stage. Fast AI-generated projects feel exciting, but without tests, evaluation, permissions, and deployment boundaries, fast prototypes can become fast incidents.

Generation 3’s lesson: the closer an Agent gets to real operations, the more it needs sandboxing, approvals, rollback, and least privilege.

2025 to 2026: Generation 4, MCP, Skills, and persistent digital workers

Generation 4 is about persistence, connection, memory, and specialization.

The focus is not only stronger single tasks. Agents start to have long-term context, tool networks, professional skills, and a sense of time. They become less like helpers in one chat and more like digital workers that can continue working.

MCP addresses tool connection. It lets Agents connect to file systems, databases, browsers, design tools, project management tools, and enterprise systems in a more standardized way. Once the protocol stabilizes, many “tool-connection middle layer” products get compressed.

Skills address professional method. Tools tell an Agent what it can do; skills tell it how to do the work. A good skill is not just a prompt. It packages domain workflows, constraints, checks, common pitfalls, and tool-call order.

Key capabilities included:

long-term memory: storing preferences, project rules, and history;
project context: understanding repositories, docs, and work rules;
tool networks: connecting through MCP, APIs, browsers, and file systems;
professional skills: packaging task methods through Skills;
persistent execution: waiting, waking, reminding, and following up;
remote collaboration: users can return from different devices to approve and steer.

This generation starts to feel like an employee:

identity and responsibility boundaries;
long-term context;
professional work methods;
time awareness;
tool permissions;
ability to continue work without being watched.

But the more it resembles an employee, the more its risk radius resembles an employee’s. Persistent execution, local data access, secrets, tool calls, and task handling move security from the edge to the center.

One point matters especially: text is also an attack surface. If an Agent reads and follows Markdown, documentation, skill packs, or webpages, malicious text can change its behavior. Prompt injection becomes a supply-chain, permission, and execution-safety problem.

Generation 4’s lesson: persistent Agents need governance, not just capability.

After 2026: Generation 5 preview, loops, internal memory, and world models

Generation 5 is not established history yet. It is an extrapolation from the previous four years.

The first direction is more complete closed loops.

A mature Agent needs at least three loops:

execution loop: verify after each action, rollback, revise, and retry if needed;
time loop: track long-term goals across multiple wake cycles;
cognitive loop: know what is certain, what is guessed, and what is outdated.

The second direction is internal memory.

Most memory so far is outside the model: RAG, vector stores, chat logs, local files, and memory.md. If future model architectures support persistent state across sessions, Agent memory systems may be rebuilt.

The third direction is world models.

Many Agents today are still reactive: observe, respond, observe again. High-risk tasks require the model to simulate consequences. Before changing a database script, it should think about data loss, rollback failure, and compatibility issues, not learn only after an accident.

The fourth direction is embodiment.

Earlier generations mainly happened in digital space: APIs, screens, files, browsers, and enterprise tools. The next step may extend Agent action into the physical world, including robots, device control, industrial systems, and standardized physical interfaces.

Generation 5 will need to solve not only how Agents execute tasks, but how they understand consequences, manage long-term state, and stay reliable inside a larger risk radius.

Six patterns behind the timeline

First, base-model capability remains the ceiling. An Agent is not magic outside the model; it is a way to release model capability through engineering systems.

Second, engineered architecture amplifies model capability. Planning, verification, reflection, revision, evaluation, and permission control are closer to deliverable work than one-shot generation.

Third, open protocols reshape value distribution. Once MCP, Skills, and project-context standards stabilize, competition shifts from “who connected the tool first” to “who accumulated real domain capability.”

Fourth, the hidden main line of Agent evolution is expanding human-machine trust. From trusting text, to API calls, to workflows, to computer operations, to persistent execution, each generation pushes the risk radius outward.

Fifth, every generation’s accidents become the next generation’s rules. AutoGPT’s loops pushed structured orchestration; vibe coding failures pushed evaluation-driven development; production deletions pushed least privilege and sandboxing; skill poisoning pushed supply-chain safety.

Sixth, the Agent ecosystem repeatedly booms and collapses. New capabilities create temporary middle layers, and model or platform internalization later removes them. Mistaking a time window for a moat is dangerous.

The real moat

The real moat in AI Agents is not packaging a new capability first.

More reliable moats include three things.

First, vertical depth. Do you truly understand an industry’s workflow, risks, exceptions, and responsibility boundaries? General models can learn concepts, but they may not replace hard-earned domain execution experience.

Second, a data flywheel. Can you collect high-quality feedback from real usage and improve workflows, evaluation, fine-tuning, and product decisions?

Third, user trust. Will users hand you higher-value, longer-running, riskier work, or only treat you as a one-off tool?

If a platform or base model absorbs a capability, the products that still retain process, feedback, responsibility boundaries, and trust are more likely to survive. Many others are temporary bubbles.

Final note

From 2022 to 2026, AI Agent evolution was not “models getting better at chatting.” It was “humans becoming willing to hand more work to AI.”

A mature Agent is not the system most eager to execute automatically. It is the system that knows when to execute, when to verify, when to pause, and when to ask a human.

To judge whether an Agent product has long-term value, ask one question: when the next model or platform builds this capability in, what remains?

If the answer is domain workflow, real data, verifiable results, and user trust, there may be long-term value.

Connecting Claude to Fusion 360: An Example of Editing STEP Models With AI

Thu, 14 May 2026 20:58:04 +0800

After Claude is connected to Fusion 360, it can do more than “talk through ideas”. It can directly participate in CAD model editing. A typical workflow is to open an existing STEP file, let Claude read the current model, analyze structural conflicts, plan dimensions, and then execute modeling changes through the Fusion plugin.

The following uses a planetary gear indexer modification as an example to summarize the basic Claude + Fusion 360 workflow.

Enable Fusion 360’s API/MCP Service First

Start with a basic Fusion 360 setup:

Open Preferences in the upper-right corner.
Go to General.
Find the API option.
Enable the MCP server.
Note the port number. The default example is 27182.

Then return to Claude, go to Connectors, find the Fusion connector, and enter the Fusion 360 address and port. In most cases, the default port 27182 is enough.

After the connection succeeds, Claude can interact with the currently opened model through the Fusion plugin.

Open the STEP File and Define the Goal Clearly

The part to modify is a gear inside a planetary gear indexer. In the original design, the gear is fixed to the bracket with a screw acting as the central shaft.

The goal is to convert it into a bearing-based structure:

the center hole needs to fit a bearing;
surrounding screw holes must not interfere with the enlarged center hole;
the self-tapping screw hole on the bracket should also be adjusted into a shaft structure suitable for bearing rotation;
the final model should be importable into slicer software and usable for 3D printing.

The key is not to simply tell Claude “modify this for me”. You need to clearly state the use case, assembly method, material, and manufacturing process.

Claude Can Understand the Current Model Through Screenshots

Some people worry that the Fusion plugin can only execute commands and cannot let Claude see the model. In actual testing, Claude can recognize the current model state through screenshots.

In this case, Claude could see the gear structure and complete several tasks:

identify the gear and center hole;
measure or estimate related dimensions;
recommend bearing dimensions;
judge which structures would affect bearing installation;
notice that after enlarging the center hole, surrounding screw holes might create geometric interference.

This step matters. It shows that Claude is not blindly editing from text instructions. It can combine the current model view with structural reasoning.

Specify Material and Manufacturing Method in Advance

If the model will be used for 3D printing, you must clearly tell Claude the material and process.

For example, when printing with PLA, the bearing hole should not be designed strictly according to CNC metal machining tolerances. For a 6mm bearing that needs a press fit, a hole diameter around 6.1mm may be considered. Whether that size is appropriate still depends on printer accuracy, material shrinkage, slicer settings, and real testing.

If you do not specify the material, Claude may default to CNC-style tolerances. The resulting hole size may be too small for 3D printing, making assembly difficult.

A useful prompt might be:

1
2
3

This model is for FDM 3D printing, using PLA.
The goal is to install a 6mm bearing, so printing tolerance and press fit should be considered.
Do not handle it as CNC metal machining tolerance.

Let Claude Modify the Gear Structure

After the goal is clear, Claude can perform specific modifications:

enlarge the center hole;
adjust surrounding screw holes that interfere;
add a bearing seat;
add chamfers to edges;
keep the gear body and key meshing structure unchanged.

In this case, Claude first produced a plan and then called Fusion 360 to perform modeling operations. For example, after detecting a conflict between the original screw holes and the center hole, it moved the holes slightly outward to protect the bearing installation space.

After modification, check the model:

whether the central bearing seat is formed correctly;
whether surrounding holes still preserve their function;
whether the gear structure was accidentally damaged;
whether chamfers affect assembly;
whether there are overhangs, thin walls, or slicing risks.

The Bracket Must Be Modified Too

Changing only the gear is not enough. The original bracket had a self-tapping screw hole. If the gear center is converted to a bearing, the bracket must also be changed into a bearing shaft structure.

You can ask Claude to perform a similar modification on the bracket:

preserve the overall mounting position;
convert the original self-tapping screw hole into a cylindrical shaft;
control shaft diameter and height;
reserve space for bearing rotation;
avoid interference with other bracket structures.

After printing, the gear can be pressed into the bearing, and the bracket can provide the new rotation center. The final result changes a screw-fixed structure into a smoother bearing-rotating structure.

Export, Slice, and Print for Verification

After the CAD modification is done, the actual manufacturing process still matters:

Export the modified model from Fusion 360.
Import it into slicer software.
Check holes, thin walls, overhangs, and supports.
Print the gear and bracket.
Press the bearing into place.
Check whether rotation is smooth.

AI-edited CAD results cannot be judged only by whether the on-screen model looks good. They must be verified through printing. For mechanical structures such as bearings, holes, clips, and gears, an error at the 0.1mm level can decide whether the part fits and rotates smoothly.

Usage Suggestions

Claude + Fusion 360 is well suited for:

making local modifications to existing STEP models;
adjusting holes, chamfers, brackets, and mounting seats;
converting screw-fixed structures into bearing, snap-fit, or pin structures;
correcting tolerances for 3D printed models;
quickly generating multiple revised versions.

But it is not suitable for directly producing final parts without inspection. A more reliable workflow is:

Define the assembly goal and material process yourself.
Let Claude analyze the structure and propose modifications.
Let Claude call Fusion to execute modeling.
Manually check key dimensions and interference.
Print a small test sample.
Iterate based on the physical result.

Summary

The value of connecting Claude to Fusion 360 is not replacing CAD fundamentals. It is making local edits to existing models much faster.

As long as you clearly specify the goal, material, dimensions, tolerance, and assembly method, it can help read the model, find interference, modify structures, add chamfers, and push the model toward a printable state. For 3D printing, open-source mechanical part modification, and small-batch iteration in personal workshops, this AI CAD workflow is already practical.

goose: An Open Source AI Agent with Desktop, CLI, and API

Fri, 08 May 2026 13:41:15 +0800

goose is an open source AI agent that runs on your own machine. It is not limited to code completion; it aims to cover code, research, writing, automation, data analysis, and other tasks. The README positions it as a desktop app, CLI, and API that can serve both normal users and custom workflows.

The project has moved from block/goose to the Agentic AI Foundation (AAIF) at the Linux Foundation. The current repository is:

`1`	`https://github.com/aaif-goose/goose`

goose is mainly written in Rust and TypeScript and uses the Apache-2.0 license. Its GitHub description says it is an open source, extensible AI agent that goes beyond code suggestions and can install, execute, edit, and test with any LLM.

What Problem It Solves

Many AI coding tools focus on suggestions or local code edits. goose takes a broader view: let an AI agent complete tasks directly on your machine.

It can be used for:

Code changes and tests.
Local automation.
Research and writing.
Data analysis.
Multi-step workflows.
Embedding through an API.
Tool extension through MCP.

If you only need IDE completion, a Copilot-style tool may be enough. goose is more useful when you want AI inside the local task execution chain.

Desktop, CLI, and API

goose has three entry points.

The desktop app supports macOS, Linux, and Windows. It is good for users who prefer a visual interface.

The CLI fits terminal workflows and local development automation.

The API lets other systems or internal tools embed goose as an agent runtime.

Personal users can start with the desktop app or CLI. Teams and workflow builders should also look at the API and custom distribution support.

Installation

The README recommends downloading the desktop app:

`1`	`https://goose-docs.ai/docs/getting-started/installation`

CLI install:

`1`	`curl -fsSL https://github.com/aaif-goose/goose/releases/download/stable/download_cli.sh \| bash`

GitHub Releases provide builds for multiple platforms. The latest release checked here was v1.33.1, published on 2026-04-29, with macOS, Linux, Windows, deb, rpm, and Flatpak assets.

After installation, configure a provider from the official quickstart and test in a low-risk directory first. goose can execute local tasks, so avoid giving it broad permissions in a production repository from the start.

Providers

goose supports 15+ providers, including:

Anthropic
OpenAI
Google
Ollama
OpenRouter
Azure
Bedrock
other cloud or OpenAI-compatible providers

It can use API keys, and it can also use existing Claude, ChatGPT, or Gemini subscriptions through ACP.

ACP is important because many users already pay for subscriptions, but different tools cannot easily reuse them. goose uses ACP providers to bring those subscriptions into an agent workflow.

Provider policies change quickly. Check whether the access method is allowed, whether there are quotas, and whether it is suitable for company code or sensitive data.

MCP Extensions

goose supports Model Context Protocol extensions. The README mentions 70+ extensions.

MCP matters because an agent should not only chat and edit files. Through standard protocol servers, it can connect to documentation, databases, browsers, internal systems, search services, design tools, or project management tools.

For teams, MCP can become a safer integration layer: expose internal capabilities through explicit interfaces instead of letting the model touch every system directly.

Difference from a Coding Assistant

goose is not just a code completion tool. It is closer to a local agent runtime.

Common coding assistants focus on:

Code completion.
Code explanation.
Function generation.
Local editor edits.

goose emphasizes:

Local task execution.
Multi-step workflows.
Switchable providers.
Extensions.
Desktop and CLI.
Embeddable API.
Non-code tasks too.

This also means more complexity. You must think about model configuration, permissions, extensions, workspace scope, logs, and credentials.

Custom Distributions

The repository includes CUSTOM_DISTROS.md, which explains how to build a custom goose distribution with preconfigured providers, extensions, and branding.

This is useful for teams:

Preconfigure allowed model providers.
Connect internal MCP servers.
Set safety policies and logging.
Block disallowed external services.
Apply company branding and onboarding.

Members do not need to configure everything from scratch, and the risk of wrong provider or key setup is reduced.

Suggested Use

Start gradually:

Install the desktop app or CLI.
Configure one known-good provider.
Run simple tasks in a test directory.
Observe what it reads and executes.
Add MCP extensions.
Try larger repositories later.

Keep a few habits:

Commit important changes before agent work.
Do not store API keys in project files.
Use high-permission modes only in trusted workspaces.
Review company data and provider policy first.
Keep human review for automation results.

Who Should Use It

goose is a good fit if you want a desktop and CLI AI agent, multiple model providers, MCP integration, API embedding, or custom team distributions. It may be heavy if all you need is IDE code completion.

Summary

goose is an open source AI agent under AAIF/Linux Foundation. It provides desktop, CLI, and API entry points, supports 15+ providers, ACP subscription access, and 70+ MCP extensions.

Its value is not only writing code, but placing models, tools, extensions, and local execution into one agent framework. Start small, define permission and data boundaries, then expand usage.

References

CC Switch: A desktop tool for managing Claude Code, Codex, Gemini CLI, and OpenClaw in one place

Wed, 06 May 2026 09:03:08 +0800

CC Switch is a desktop management tool for heavy AI coding users. The problem it tries to solve is straightforward: many people now use Claude Code, Codex, Gemini CLI, OpenCode, and OpenClaw at the same time, but each tool has its own configuration format, Provider syntax, MCP setup, and Skills management method.

When you only use one tool, manually editing configuration files is still tolerable. Once several tools are mixed together, plus official accounts, third-party APIs, relay services, local models, and shared team configuration, editing JSON, TOML, and .env files by hand quickly becomes tedious.

CC Switch is positioned as a way to pull these scattered configurations into one cross-platform desktop app.

What problem does it solve

Modern AI coding tools increasingly feel like “development colleagues inside the command line”, but their ecosystems are still not fully unified.

Common pain points include:

Claude Code, Codex, Gemini CLI, OpenCode, and OpenClaw all use different configuration formats.
Switching API Providers requires repeated configuration-file edits.
MCP servers are configured repeatedly across different tools.
Prompt files such as CLAUDE.md, AGENTS.md, and GEMINI.md are hard to maintain consistently.
Skills installation, sync, backup, and removal lack a single central entry point.
Switching between multiple accounts, relays, and model services can easily become confusing.
Once a manually edited configuration file breaks, troubleshooting is costly.

The idea behind CC Switch is to stop forcing users to remember every tool’s configuration details, and instead use one unified interface to manage Providers, MCP, Prompts, Skills, Sessions, and proxies.

Supported tools

The README lists five core supported targets:

Claude Code
Codex
Gemini CLI
OpenCode
OpenClaw

These tools are similar in positioning: all center on AI coding, Agent workflows, and command-line collaboration. But their configuration systems differ, and the value of CC Switch lies in wrapping those differences.

For people who often compare different AI coding tools, this is much easier than manually opening configuration files every time.

Provider management

The first layer of CC Switch is Provider management.

It includes more than 50 Provider presets. The README mentions directions such as AWS Bedrock, NVIDIA NIM, and various community relays. Users can copy an API key, import it with one click, and then switch from the interface.

The practical points include:

Add Providers with one click.
Reorder Providers by dragging.
Quickly switch from the system tray.
Import and export Providers.
Sync some common Providers across multiple apps.

For many people, this feature alone is already attractive. In daily AI coding work, the problem is often not “I do not know how to use the model”, but “which tool, endpoint, and account should this key use today”.

Local proxy and failover

Besides writing configuration files, CC Switch also provides a local proxy mode.

The focus of this capability is:

Hot-switching Providers.
Format conversion.
Automatic failover.
Circuit breakers.
Provider health checks.
Request correction.

In simple terms, it does not only write configuration into target tools. It can also add a local proxy layer in the middle, so different tools access model services through the proxy.

This is useful for users with multiple Providers: if one service is down, switch to another; if one model is expensive, move to a cheaper one; if a request format is incompatible, adapt it through the proxy layer.

MCP, Prompts, and Skills

The second important layer of CC Switch is unified management for MCP, Prompts, and Skills.

MCP

It provides a unified MCP panel for managing MCP servers across multiple apps, with support for bidirectional sync and Deep Link import.

This is practical for users already working with MCP. Once there are many MCP servers, configuration easily becomes scattered across different clients. A unified panel reduces duplicate configuration and makes migration easier.

Prompts

The Prompts section supports Markdown editing and can sync corresponding files across different tools, such as:

CLAUDE.md
AGENTS.md
GEMINI.md

These files are essentially project manuals for Agents. Unified management makes it easier to maintain team rules, project conventions, and global prompts.

Skills

Skills can be installed with one click from GitHub repositories or ZIP files. Custom repository management, symbolic links, and file copying are also supported.

If you use tools such as Claude Code, Codex, and OpenClaw at the same time, Skills can easily turn into scattered files across different directories. CC Switch centralizes them and reduces maintenance cost.

Sessions and workspace

The README also mentions Session Manager and Workspace features.

It can browse, search, and restore session history from multiple apps. For people who use AI coding tools over a long period, session management is genuinely important: many valuable contexts, debugging trails, and solution comparisons are buried in old conversations.

It also provides a Workspace editor for OpenClaw, allowing users to edit agent files such as AGENTS.md and SOUL.md with Markdown preview.

This shows that CC Switch is not just a small “key switching” utility. It is expanding toward an AI Agent workstation.

Cloud sync and data storage

CC Switch supports syncing Provider data through Dropbox, OneDrive, iCloud, NAS, or WebDAV.

Local data storage is also clearly defined:

Database: ~/.cc-switch/cc-switch.db
Local settings: ~/.cc-switch/settings.json
Automatic backups: ~/.cc-switch/backups/
Skills: ~/.cc-switch/skills/
Skill backups: ~/.cc-switch/skill-backups/

It uses SQLite as the main data source and emphasizes atomic writes and automatic backups, with the goal of avoiding configuration-file corruption during switching or writing.

This design matters for heavy users. If the configuration management tool itself writes a bad configuration, every AI coding tool can be affected.

Installation

CC Switch is a cross-platform desktop app built on Tauri 2.

The approximate system requirements are:

Windows: Windows 10 or later
macOS: macOS 12 Monterey or later
Linux: Ubuntu 22.04+, Debian 11+, Fedora 34+, and other mainstream distributions

Windows users can download the .msi installer or a portable compressed package.

macOS users can install it with Homebrew:

1
2

brew tap farion1231/ccswitch
brew install --cask cc-switch

To update:

`1`	`brew upgrade --cask cc-switch`

Linux users can choose .deb, .rpm, or AppImage. Arch Linux users can also install it through paru -S cc-switch-bin.

As of May 6, 2026, the repository page shows the latest release as CC Switch v3.14.1, published on April 23, 2026.

Tech stack

Judging from the repository structure, CC Switch is a typical Tauri desktop app:

Frontend: React 18, TypeScript, Vite, TailwindCSS, TanStack Query, shadcn/ui
Backend: Tauri 2, Rust, SQLite, Tokio
Testing: Vitest, MSW, Testing Library

Core design patterns include:

SQLite as the Single Source of Truth.
JSON for device-level local settings.
Writing into target tools’ live config during switching.
Filling current Provider edits back from live config.
Atomic writes using temporary files plus rename.
Locked database connections to avoid concurrent write issues.

This architecture suggests the project is not a simple script, but a desktop tool designed for long-term use.

Who it is for

CC Switch suits these users:

People who use Claude Code, Codex, Gemini CLI, OpenCode, and OpenClaw together.
People who frequently switch between official accounts, third-party relays, local models, or team Providers.
Users already making heavy use of MCP.
Teams that want to maintain CLAUDE.md, AGENTS.md, and GEMINI.md in one place.
Users who often install, test, and migrate Skills.
People who want to view session history and usage across different tools.

If you only use one AI coding tool, rely on official login, and rarely touch Providers, MCP, or Skills, its value may not be obvious.

But if you have already entered a “many tools, many accounts, many Providers, many projects” state, it can remove a lot of repetitive configuration work.

What to watch out for

Tools like this are convenient, but they also need clear boundaries.

First, it manages configuration for multiple AI CLIs, so users should be sure they trust the tool and its write logic.

Second, API keys, relay endpoints, and MCP servers are all sensitive configuration. Before enabling cloud sync, make sure the sync folder and WebDAV service are secure and trustworthy.

Third, after switching Providers, most tools still need the terminal or CLI to be restarted before changes take effect. The README mentions that Claude Code supports hot-switching Provider data, but other tools usually still require a restart.

Fourth, when switching back to official login, it is better to add the official provider according to the project instructions and then rerun the corresponding tool’s login flow.

Summary

The value of CC Switch is not that it creates yet another AI coding tool. Its value is that it acknowledges a reality: the AI coding ecosystem has entered a stage where multiple tools coexist.

Claude Code, Codex, Gemini CLI, OpenCode, and OpenClaw each have their own configuration systems, while MCP, Skills, Prompts, and Providers are expanding quickly. Continuing to edit configuration files by hand will eventually become a burden.

CC Switch pulls these pieces into one desktop app, making it easier to switch Providers, sync MCP, manage Skills, maintain prompt files, and view sessions. For heavy AI coding users, tools like this may move from “optional utility” to “daily infrastructure”.

References

farion1231/cc-switch

Codex App Beginner Guide: Installation, Sandbox, Parallel Tasks, Skills, and MCP

Wed, 06 May 2026 08:41:17 +0800

Codex App can be understood as a task workspace for AI coding. It is not a traditional IDE, nor just a chat window. It brings multitasking, project management, sandbox permissions, Git, cloud execution, plugins, Skills, MCP, and automation into one interface.

If you already use Codex CLI, Claude Code, Cursor, or other coding agents, the most interesting part of Codex App is that it turns “running multiple agents in parallel” into a clearer desktop workflow.

What Codex App Is Good For

The core value of Codex App is not answering questions, but letting AI continuously execute tasks inside a project directory:

Edit code, run commands, and start development servers.
Manage multiple projects and multiple tasks.
Run long tasks locally or in the cloud.
Call plugins, Skills, and MCP for extended capabilities.
Manage changes through Git, worktree, and PR workflows.

OpenAI also positions Codex App as an interface for managing multiple coding agents. It is suitable for people who need to advance several coding tasks at once, especially frontend pages, scripts, small apps, documentation, and automation workflows.

Preparation Before Installation

Before using Codex App, it is best to prepare three basic tools:

Git
Node.js
VS Code or your preferred IDE

Codex App supports macOS and Windows. After installation, sign in with your ChatGPT account. On first launch, you can choose your main usage scenario, such as programming or daily work. Codex will preload some plugins and Skills based on your choices, and you can adjust them later in settings and the plugin marketplace.

The main features on Windows and macOS are broadly similar, but some computer automation capabilities may depend on platform and plugin support. Use whatever your current version actually displays.

Interface Structure: Projects, Tasks, and Chats

Codex App uses a classic three-column layout:

Left: projects, tasks, chat history, plugins, and automation entry points.
Middle: current chat window.
Right: files, browser, terminal, run results, and other panels.

A project usually corresponds to a local folder. You can open multiple chats inside the same project, or open several projects at once so different agents can work in parallel.

The task list shows different states:

Running: the agent is still executing.
Waiting for approval: you need to confirm permissions, networking, dependency installation, or a high-risk action.
Completed: the task has finished, and you can inspect the result or continue asking.

This is more intuitive than switching between multiple terminal windows, and it is better suited to managing several AI tasks at once.

Sandbox and Permission Control

Codex App’s permission system is built around the sandbox. By default, the current project folder becomes the agent’s main workspace.

Common permission boundaries include:

It can read and modify files inside the project directory.
It cannot freely modify files outside the project by default.
Networking or high-risk commands are restricted by default.
When elevated access is needed, it asks the user for approval.

A practical mode is “auto review”: low-risk actions are automatically allowed, while high-risk actions are still confirmed by the user. This reduces frequent pop-ups while keeping dangerous operations from happening silently.

“Full access” should be enabled cautiously. It is suitable when you know exactly what the agent needs to do and the project already has Git backups and important files have separate backups. It is not recommended as a long-term daily default.

Context, Models, and Quotas

Codex App shows the current chat’s context usage. The longer the conversation and the more history it contains, the more context the model needs to process.

Useful habits:

Start a new chat after finishing a task.
Long chats can be compressed manually, but do not treat compression as perfect memory.
For complex tasks, clearly state goals, boundaries, and acceptance criteria.
Do not dump large irrelevant logs, errors, or files into a chat all at once.

For model selection, adjust reasoning strength according to task complexity. Simple edits, writing, and repetitive tasks do not always need the strongest model. Architecture migration, difficult bugs, and cross-file refactors are better suited to stronger models.

If the interface has a fast mode, remember that it usually consumes more quota. Use it when speed matters, but not as a daily default.

Image Generation and Multimodal Inputs

Codex App can accept images and files as context, and can call image generation in suitable scenarios.

This is useful for frontend and content projects. For example, you can ask Codex to:

Fix page styles based on screenshots.
Replace unsuitable images in a webpage.
Generate product images, carousel images, or page assets.
Point out what needs to be changed from a UI screenshot.

A more efficient approach is not to say only “make it look better”, but to use screenshots and point to concrete problems, such as “the spacing in this card is too large”, “this image does not match the service scene”, or “make the map area clearer”.

Steer: Correcting Direction During Execution

Steer can be understood as taking over the direction during execution. If the agent has already started but you realize it misunderstood the direction, you should not always wait for it to finish before correcting it.

You can use steering to insert a new instruction into the current execution flow and make Codex correct course.

Good use cases for Steer include:

The agent misunderstood the requirement.
The generated page style is clearly wrong.
The current plan is too expensive or heavy.
You need to add a key constraint temporarily.

In general, keep the default queued behavior and manually use Steer only when intervention is needed. This avoids disrupting normal tasks while still letting you pull the direction back at key moments.

Plan Mode and Built-In Browser

For complex tasks, start with plan mode. In plan mode, Codex does not immediately modify code. It first outputs a plan and may ask key questions with cards.

Tasks suitable for plan mode include:

Framework migration, such as moving a React project to Next.js.
Large refactors.
Features involving databases, authentication, or deployment.
Requirements where you have not decided the technical path.

The right panel in Codex App can open a built-in browser to preview the local development server. You can annotate the page and let Codex modify a specific UI location. This “look at the page, click the position, ask AI to change it” workflow is often better for frontend debugging than pure text descriptions.

Git, IDE, and Code Rollback

Codex App is not a full IDE. It can view code and add annotations, but handwritten editing is still better done in VS Code, Cursor, Windsurf, or another IDE.

Every Codex project should initialize Git early:

Ask Codex to create or check .gitignore.
Commit once after reaching a usable state.
Ensure a clean commit point before each large change.
Roll back with Git if you are not satisfied.

If you roll back only the chat history, the code will not automatically roll back. A safer approach is to return the chat to the right point, then use a Git commit hash to return the code to the corresponding state.

Worktree: Parallel Development in Multiple Directions

git worktree is especially suitable for parallel agents in Codex App.

It creates multiple independent working directories from the same repository, each corresponding to a different branch. This lets different agents work in different folders at the same time without overwriting each other.

Typical usage:

One worktree optimizes the customer review component.
One worktree adjusts store information and map layout.
Merge both tasks back to main after completion.
Remove temporary worktrees after merging.

This is much safer than letting multiple agents modify code in the same directory. If conflicts happen, review and merge them using normal Git workflows.

Cloud Execution Environment

Codex can work not only on your local machine, but also in a cloud environment.

Cloud execution is suitable when:

You are outside and only have a phone.
You want agents to run long tasks in the background.
The code has already been synced to GitHub and Codex needs to modify the remote repository.
You want changes reviewed and merged through PRs.

A typical flow is: push local code to GitHub, let Codex pull the repository in a cloud environment, execute the task, generate changes, then present them as a PR or diff for review.

When continuing local development, remember to pull down the latest remote changes.

Memory System: Write a Good AGENTS.md

New chats do not have complete historical memory by default. Once a project becomes complex, repeatedly explaining the background is inefficient.

The most general solution is to maintain AGENTS.md in the project root. This file can record:

Project goals and main tech stack.
Common commands.
Directory structure.
Code style and naming conventions.
Prohibited actions, such as bulk deleting files.
Test, build, and deployment rules.

You can also ask Codex to read the project and generate a first version of AGENTS.md, then review it manually. For complex projects, this file is worth maintaining.

Global rules should be used carefully. They are suitable for universal safety constraints, such as “do not recursively delete directories” or “confirm before destructive operations”. Do not put project-specific details into global rules, or they will pollute other projects.

Plugins and Automations

Plugins connect Codex to external services such as GitHub, Gmail, Google Drive, databases, and deployment platforms.

Their value is reducing copy and paste. For example, Codex can:

Check star trends for a GitHub repository.
Summarize email content and send it to you.
Run a recurring check.
Write the result as a summary.

Automations are suitable for repeated tasks. For example, checking repository data every Friday afternoon and sending an email report. Simple automation tasks usually do not require the strongest model; a lighter model is enough.

Skills: Turn Workflows Into Reusable Capabilities

Skills are “professional playbooks” for Codex. They are not one-off prompts. They package a task flow, rules, scripts, and notes so Codex can reuse them reliably later.

Common sources include:

Official Skills.
Third-party Skills.
Skills you write yourself.

Good candidates for Skills include:

Turning subtitles into illustrated notes.
Writing weekly reports in a company format.
Batch-processing images or documents.
Fixed-format code reviews.
Project initialization for a specific framework.

If you have copied and pasted the same prompt many times, it is worth turning it into a Skill.

MCP: Connect External Tools and Databases

MCP can be understood as a standardized tool protocol for large models. Through MCP, Codex can call external services to complete more concrete tasks.

For example, after connecting Supabase, Codex can:

Create database tables.
Read database schemas.
Modify backend endpoints.
Submit frontend forms to the database.
Debug problems based on database state.

This is powerful, but permissions matter. Databases, production environments, deployment platforms, and email accounts are high-risk resources. When connecting for the first time, use a test project and a low-privilege account.

Deployment Plugins

Deployment platform plugins can let Codex complete builds and releases directly, such as deploying a frontend project to Netlify.

These plugins are suitable for small websites, prototypes, internal tools, and demo projects. In real use, pay attention to:

Run a local build before deployment.
Do not write environment variables directly into code.
Check whether the page opens normally after publishing.
Keep human review for production projects.

AI can help connect the deployment flow, but deployment permissions should still be managed carefully.

Computer Automation

With supported platforms and plugin environments, Codex can also operate browsers or desktop apps, completing tasks closer to RPA.

Examples:

Open a chat app and prepare a message.
Browse a project board and summarize task status.
Generate an English brief.
Send it to a specified recipient after you confirm.
Turn the flow into a scheduled automation.

These capabilities are imaginative, but they require the strongest safety boundaries. Any operation involving sending messages, sending email, submitting forms, payments, or deleting data should retain human confirmation.

Usage Suggestions

The right way to use Codex App is not to let it fully take over everything at once, but to break tasks down and let it execute efficiently in a controlled environment.

Recommended habits:

Initialize Git for every project.
Use plan mode for complex tasks.
Use worktree for parallel tasks.
Put project rules in AGENTS.md.
Keep human confirmation for high-risk actions.
Turn repeated workflows into Skills or automations.
Validate plugins and MCP in a test environment first.

References

Summary

Codex App is not “one more AI chat window”. Its focus is turning AI coding into a manageable workspace where local projects, cloud tasks, Git, worktree, plugins, Skills, MCP, and automation can connect.

The key to using it well is balancing freedom and control. Small tasks can be handed to Codex boldly. Complex tasks should start with a plan. High-risk actions must be confirmed. Used this way, Codex can become not just a code-writing assistant, but a long-term engineering tool.

Claude for Creative Work: Anthropic Brings Claude into Adobe, Blender, Ableton, and SketchUp

Fri, 01 May 2026 05:52:14 +0800

Anthropic released Claude for Creative Work on April 28, 2026. The point is not another new chatbot, but bringing Claude into the software that creative industries already use.

The partnership list is telling: Blender, Autodesk, Adobe, Ableton, and Splice, along with tool ecosystems such as Affinity by Canva, Resolume, and SketchUp.

In simple terms, Anthropic wants Claude to do more than offer suggestions in a chat box. It wants Claude to enter concrete workflows for design, 3D, music, video, and live visuals.

Claude Cannot Replace Taste, but It Can Replace a Lot of Drudgery

Anthropic’s announcement is fairly restrained: Claude cannot replace a creator’s taste and imagination.

That is the right judgment. The hard part of creative work is often not “generating something,” but deciding which direction is worth pursuing, which details should be kept, and which proposal fits the character of a project.

But creative workflows also contain a lot of repetitive labor:

Batch-resizing images
Renaming layers
Exporting files in different formats
Organizing assets
Looking up software documentation
Writing scripts to modify scenes
Converting formats between multiple tools
Turning an idea into a visible draft quickly

These steps do not necessarily require “inspiration,” but they consume a lot of time. Claude’s role is more like freeing creators from these mechanical steps.

Connectors Are the Core of This Release

The key to this release is connectors.

connectors can be understood as bridges between Claude and external platforms or software. Instead of copying a request into Claude and then manually returning to the software to act on it, users can let Claude understand the tool directly, call capabilities, or read relevant documentation.

The connection areas mentioned in Anthropic’s announcement include:

Ableton: lets Claude answer questions based on official Live and Push documentation.
Adobe for creativity: connects to more than 50 tools in Creative Cloud, including Photoshop, Premiere, and Express.
Affinity by Canva: automates repetitive production tasks in professional creative workflows, such as batch image adjustment, layer renaming, and file export.
Autodesk Fusion: lets designers and engineers with Fusion subscriptions create and modify 3D models through conversation.
Blender: uses Blender’s Python API through natural language, helping users understand complex scenes, access documentation, and extend functionality.
Resolume Arena and Resolume Wire: let VJs and live visual artists control Arena, Avenue, and Wire in real time using natural language.
SketchUp: turns a conversation with Claude into a starting point for 3D modeling, such as describing a room, furniture, or a site concept before refining it in SketchUp.
Splice: lets music producers search royalty-free sample libraries directly from Claude.

These integrations cover design, audio, 3D, video, live performance, and engineering modeling. They are not a small experiment in one direction; they show Anthropic clearly moving toward a “creative software workbench.”

What It Means for Creative Work

Based on the announcement, Claude’s uses in creative work can be grouped into several categories.

The first is learning complex tools.

Many creative applications are powerful, but their learning curves are steep. Blender, Ableton, Fusion, and Premiere are classic examples. Users can ask Claude to explain a modifier stack, describe a compositing technique, or demonstrate an unfamiliar feature instead of jumping between search results, forums, and official docs.

The second is writing scripts and plugins.

Creative software contains a lot of room for automation. Claude Code can help users write scripts, plugins, shaders, procedural animations, or parametric models. For creators who know a little technology but do not want to keep digging through APIs, this is very practical.

The third is connecting toolchains.

Real projects are rarely completed in a single application. Design may happen in Adobe, 3D in Blender or SketchUp, audio in Ableton, assets from Splice, and the final result may still need to enter a video or performance system. Claude can help convert formats, reorganize data, synchronize assets, and reduce manual handoffs.

The fourth is rapid exploration and delivery.

Anthropic also mentioned Claude Design, a new product from Anthropic Labs for exploring software experience ideas. It can iterate visual proposals based on feedback, and its design results can be exported to other tools, starting with Canva.

The fifth is reducing repetitive production work.

For example: batch-processing assets, setting up project structures, modifying scene objects in bulk, and automating exports. Many creators know how to do these things; they simply do not want to spend an afternoon on repeated clicking.

Blender Is the Most Notable Piece

In this announcement, Blender has a particularly interesting position.

Blender is a free and open-source 3D creation suite used in indie games, motion graphics, architectural visualization, film production, and more. It already has a powerful Python API and many complex workflows.

Blender developers have created an MCP connector that can now be used officially in Claude.

This connector can do things such as:

Analyze and debug an entire Blender scene
Modify objects in a scene in bulk
Write custom scripts with the Blender Python API
Add new tools directly to the Blender interface
Help users understand complex settings and documentation

More importantly, Anthropic has joined the Blender Development Fund as a patron, supporting Blender’s continued development of its Python API.

This sends two signals.

First, Anthropic is not only trying to connect with commercial software; it is also betting on open-source creative tools.

Second, this connector is based on MCP, so in theory it is not limited to Claude. Other large models could connect to it as well. That aligns well with Blender’s open-source and interoperability direction.

This Is Not “AI Replacing Designers”; It Is “AI Entering the Tool Layer”

The most important thing about this release is not whether Claude can generate an image, a piece of music, or a 3D model.

The more important point is that AI is moving from the chat box into the tool layer.

In the past, many AI creative tools worked like this:

Describe a need inside an AI tool.
Get a result.
Download or copy it out.
Return to professional software and modify it manually.

The new direction looks more like this:

Claude understands your creative software.
Claude reads relevant documentation or project context.
Claude generates scripts, operates tools, organizes assets, or builds drafts.
The creator continues judging and refining inside familiar software.

This is more attractive to professional users because they do not want to leave their existing toolchains or migrate all their work to a completely new AI platform.

The Impact on Students and Creative Education

Anthropic also mentioned that it is working with art and design programs to support courses involving creative computation.

The first group of programs includes:

Art and Computation at Rhode Island School of Design
Fundamentals of AI for Creatives at Ringling College of Art and Design
MA/MFA Computational Arts at Goldsmiths, University of London

Students and teachers will receive access to Claude and the new connectors, and their feedback will help Anthropic understand what creative practitioners actually need.

This is interesting as well. If AI creation stays at the level of “generating assets,” it can easily become a showpiece. Once it enters courses, the more important questions become:

How should students understand the processes behind tools?
How can AI be used as a tool for exploration and prototyping?
How can they preserve their own judgment?
How can code and automation expand creative boundaries?
How can they avoid every work taking on the same AI flavor?

These questions are more practical than simply debating whether AI will replace creators.

Who Should Pay Attention to This Release

Claude for Creative Work is especially worth watching for several groups:

People using Blender, SketchUp, or Fusion for 3D modeling
People using Adobe or Affinity for design and video production
People using Ableton or Splice for music production
People who need to connect multiple creative tools into a workflow
People with some scripting ability who want to automate creative software
People working in creative education, interaction design, or computational arts courses

If you only occasionally use AI to generate images, this release may not immediately change your experience.

But if you already work inside professional software and often run into the feeling of “I know what to do, but these steps are too tedious,” connectors could be very valuable.

Boundaries to Keep in Mind

These tools are not omnipotent.

First, Claude still needs users to judge whether the result fits the aesthetics, brand, and project goals.

Second, when automating operations in professional software, it is best to start with small tasks rather than immediately letting it batch-modify project files that may be hard to recover.

Third, connector quality is crucial. A connector that can only look up documentation and a connector that can actually operate software are two very different experiences.

Fourth, creative software projects often contain complex files, asset dependencies, and version management. Once AI is involved, backups and rollback workflows become even more important.

Fifth, copyright, licensing, and asset sources still need to be checked by the user. For example, Splice emphasizes royalty-free samples, but real project use still requires confirming the specific license terms.

Conclusion

Claude for Creative Work is not a single feature update. It is Anthropic’s step toward pushing Claude into the creative software ecosystem.

The point is not to turn Claude into the creator, but to make Claude a tool assistant beside creators: looking up docs, writing scripts, batch-processing, connecting software, generating drafts, and reducing repetitive labor.

The long-term value lies in Claude beginning to enter the environments creators use every day, such as Blender, Adobe, Ableton, and SketchUp.

When AI is no longer just a standalone web page, but can understand and call professional tools, creative workflows will change in more practical ways.

Reference link:

Claude for Creative Work - Anthropic

qmd: Local Markdown Document Search for AI Agents

Fri, 01 May 2026 03:12:57 +0800

qmd is a search tool for local Markdown documents, with AI Agents as its main target users.

It solves a specific problem: when a project contains many .md documents, AI coding assistants often do not know which file to read, which section to cite, or which instructions are current. Full-text grep can find keywords, but it does not understand meaning well. Putting all documentation into the context wastes window space and easily introduces irrelevant content.

The idea behind qmd is to index Markdown documents first, then return the most relevant snippets through a search interface for AI to use. It can be used as a command-line tool, integrated through an SDK, or exposed as an MCP Server for clients that support MCP.

What Problem It Solves

Real projects usually have more than one or two README files.

You may have:

Architecture notes
API documentation
Development conventions
Deployment procedures
Architecture decision records
Troubleshooting notes
Requirement documents
AI usage instructions
Toolchain notes and reminders

Humans can browse documents through directories, but AI Agents need a clear retrieval entry point. Otherwise, they may:

Read the wrong document
Miss key constraints
Use outdated instructions
Put irrelevant content into context
Invent rules in answers based on experience

This is where qmd is useful. It turns local Markdown documents into a searchable knowledge source, so AI can search first when it needs context, then answer or act based on matched snippets.

Search Approach

The README says qmd combines several retrieval methods:

BM25 keyword search
Vector search
LLM reranking

BM25 is good for clear keywords. If you search for a function name, configuration key, error code, or file name, it is usually direct and effective.

Vector search is better for semantic questions. For example, if you ask “how does this project handle permission validation,” the documentation may not contain that exact phrase, but it may contain related descriptions about authentication, access control, and role checks.

LLM reranking is used to reorder candidate results. The first two steps find potentially relevant content, and the model then judges which snippets best match the current question.

This combination is more suitable for AI Agents than plain keyword search, because Agent questions are often task intentions rather than fixed keywords.

Why Markdown

Markdown is the most common documentation format in development projects.

It is simple enough to store in Git and structured enough to include headings, lists, code blocks, links, and tables. For AI, Markdown is also easier to parse than PDFs, web snapshots, or screenshots.

Because qmd focuses on Markdown, it can process developer documentation more directly:

Split content by headings and paragraphs
Preserve code blocks
Preserve document paths
Return snippets suitable for citation
Let the Agent know which document an answer comes from

This is more stable than asking AI to randomly scan a repository, and it saves more context than putting every document into a prompt at once.

Three Entry Points

qmd provides three entry points: CLI, SDK, and MCP Server.

1. CLI

The CLI is suitable for direct terminal use and for scripts.

You can index a documentation directory and then search related content with commands. For developers, the CLI is the easiest way to validate the tool: first see whether it can find the correct documents, then consider integrating it into more complex workflows.

This kind of tool is useful inside local projects. For example, before changing code you can search design documents; before debugging, search troubleshooting notes; before writing an API, search API conventions.

2. SDK

The SDK is suitable for integrating qmd into your own tools.

If you are building an internal development assistant, documentation Q&A system, code review bot, or project knowledge base, you can call the search capability through the SDK instead of asking users to run commands directly.

The SDK gives more control over:

Search directories
Query content
Number of returned results
Result format
Whether to pass results to a model for summarization

This fits scenarios that need deeper integration.

3. MCP Server

MCP is the most valuable entry point for AI Agents.

Through MCP Server, clients that support MCP can call qmd as a document search tool. This lets an Agent search local Markdown documents before acting, instead of guessing project rules.

A typical workflow could be:

The user asks AI to modify a feature
AI calls qmd to search related design documents
qmd returns the most relevant Markdown snippets
AI modifies code based on those document constraints

This is more natural than manually pasting all rules into a new session, and it is better suited to long-term projects.

Suitable Scenarios

qmd is suitable for:

Projects with many Markdown documents
AI Agents that often need to look up project rules
Teams that want AI answers to cite local documents
Documentation spread across multiple directories
Reusing the same retrieval capability across CLI, SDK, and MCP
Reducing AI coding assistants’ tendency to guess project conventions
Connecting local knowledge bases to Claude Desktop, Claude Code, or other MCP clients

If your project only has one short README, directly asking AI to read the file is enough.

But if the documentation has grown to dozens or hundreds of files, or if you want the Agent to search documents before acting, this type of indexing tool becomes meaningful.

Difference from grep

Tools such as grep and rg are excellent for exact search.

If you know you need DATABASE_URL, authMiddleware, 404, or docker compose, keyword search is usually the fastest.

qmd is better when you do not know the exact words.

For example, you may ask:

What is the release process for this project?
What conventions apply when adding a new API?
Was the caching strategy documented before?
Which documents should AI read before changing code?
Where is the design background for a module?

These questions usually require semantic retrieval rather than matching one word. The BM25 + vector + reranking combination in qmd is intended to make these questions find the right context more easily.

Relationship with RAG

qmd can be seen as a lightweight RAG component for Markdown documents.

It does not try to build a full Q&A system for you. It focuses on one step: finding relevant document snippets. How those snippets are used afterward can be handled by CLI, SDK, an MCP client, or your own Agent workflow.

This positioning is practical. Many projects do not need a large knowledge base system; they only need AI to search local documents more accurately and quickly, then bring the results back into the current task.

Notes for Use

First, documentation quality still matters.

A retrieval tool can only find existing content. If the documents are outdated, duplicated, or contradictory, AI may still receive wrong context. Before connecting qmd to an Agent, clean up the key documents first.

Second, do not make the index scope too broad.

Indexing every Markdown file in the repository is not always better. Dependency documentation, temporary notes, and old draft solutions can pollute results. A better approach is to define which directories are trusted documentation sources.

Third, search results should preserve sources.

When AI uses document snippets, it should know which file and section they came from. This makes human review traceable and reduces the risk of “this looks like a document conclusion, but it is only a model summary.”

Fourth, do not replace human judgment completely.

qmd can improve context recall quality, but it is not a replacement for the source of truth. Important changes still require current code, test results, and the latest requirements.

Suitable Teams

If your team has already started putting AI Agents into daily development workflows, tools like qmd can be valuable.

They are especially suitable for teams that:

Write a lot of documentation
Have a long project history
Need both new people and AI to quickly understand context
Maintain architecture decision records
Have many Markdown convention documents
Want AI to check rules before modifying code

Its goal is not to make AI all-knowing. It is to make AI guess less and look things up more.

Reference

tobi/qmd

Final Thought

The value of qmd is that it turns local Markdown documents into a search entry point that AI Agents can reliably call.

When project documentation moves from “instructions for humans” to “a context source searchable by both humans and AI,” AI coding assistants can follow project rules more easily.

Prompt Optimizer: An Open-Source Tool for Prompt Optimization, Testing, and MCP

Fri, 01 May 2026 03:09:07 +0800

Prompt Optimizer is an open-source tool for improving prompts. Its goal is straightforward: help you turn a rough prompt into something clearer, more stable, and easier for large language models to follow.

It is not just a page that “polishes my prompt.” The project provides prompt optimization, result testing, comparison and evaluation, multi-model access, image prompt handling, and MCP integration. For people who often write system prompts, user prompts, and AI workflow templates, it feels more like a dedicated prompt workbench.

What Problem It Solves

Many people run into similar problems when using AI:

Prompts keep getting longer, but output quality does not clearly improve
The same task behaves differently after switching models
System prompts and user prompts are mixed together and hard to debug
After changing a prompt, it is unclear whether the new version is better
Variable templates are useful, but manual replacement and testing are tedious
Prompt optimization should be available to other AI tools, but there is no standard interface

Prompt Optimizer is designed around these problems. It breaks “writing a prompt” into optimization, testing, evaluation, comparison, and iteration, so prompt tuning is no longer based only on intuition.

Main Features

1. Optimize System Prompts and User Prompts

There is more than one kind of prompt.

System prompts usually define roles, goals, boundaries, output rules, and working methods. User prompts are closer to the input for one specific task. When the two are mixed together, the model can miss the key point, and reuse becomes harder.

Prompt Optimizer supports both system prompt optimization and user prompt optimization. You can improve long-term reusable role definitions separately from the input for a specific task.

This is useful for:

Writing rules for AI coding assistants
Designing customer service, reviewer, translation, and analysis roles
Optimizing text-to-image prompts
Turning temporary requirements into reusable templates
Preparing different prompt styles for different models

2. Test and Compare Outputs

Optimizing a prompt is not enough. The important question is whether the optimized prompt actually performs better.

The project supports analysis, single-result evaluation, and multi-result comparison. You can run the original prompt and the optimized prompt on the same task, then compare whether the output is more accurate, stable, and aligned with the goal.

This is more practical than prompts that only “look more professional.” Many prompts look complete on the surface but produce verbose, rigid, or even misdirected output. Comparison testing helps reveal that early.

3. Multi-Model Support

The README says the project supports model services such as OpenAI, Gemini, DeepSeek, Zhipu AI, and SiliconFlow, as well as custom OpenAI-compatible APIs.

This matters because prompt performance depends heavily on the model. The same prompt can behave very differently across models. Multi-model testing helps determine:

Whether the prompt itself is weak
Whether a specific model is unsuitable for the task
Whether different model-specific prompt versions are needed
Whether a smaller model can become usable with a clearer prompt

If you use Ollama locally, or your company has an OpenAI-compatible internal model service, it can also be connected through a custom API.

4. Advanced Testing Mode

The project provides context variable management, multi-turn conversation testing, and Function Calling support.

Variable management is useful for templated tasks. For example, if you have prompts for second-hand sales replies, product descriptions, email responses, code reviews, or document generation, you can replace variables such as product, price, tone, and target user to test different inputs quickly.

Multi-turn conversation testing helps validate long-running dialogue behavior. Many prompts look fine in a single turn, but once follow-up questions begin, they may forget constraints, drift away from the role, or repeat explanations. Multi-turn testing is closer to real usage.

Function Calling support is suitable for more engineering-oriented AI applications. It helps validate model behavior around tool calls, parameter generation, and structured output.

5. Image Generation Prompts

Prompt Optimizer also supports text-to-image and image-to-image workflows. The README mentions integration with image models such as Gemini and Seedream.

Image prompt optimization is different from text tasks. It focuses more on subject, composition, spatial relationship, style, material, lighting, mood, and constraints. Turning a vague idea into a controllable visual description is often more valuable than simply making the prompt longer.

If you often generate product images, covers, illustrations, key visuals, or style references, this type of optimization is useful.

Ways to Use It

The project provides several entry points:

Online version
Vercel self-hosting
Desktop app
Chrome extension
Docker deployment
Docker Compose deployment
MCP Server

The online version is good for quick trials. The project notes that it is a pure frontend app: data is stored locally in the browser and sent directly to AI providers.

The desktop app is better when you need to connect directly to different model APIs. Browser environments can run into CORS limits; the desktop app avoids those issues, especially when connecting to local Ollama or commercial APIs with strict cross-origin policies.

Docker deployment is suitable for your own server or intranet environment. The README gives this basic command:

`1`	`docker run -d -p 8081:80 --restart unless-stopped --name prompt-optimizer linshen/prompt-optimizer`

To configure API keys and access passwords, pass environment variables:

docker run -d -p 8081:80 \
  -e VITE_OPENAI_API_KEY=your_key \
  -e ACCESS_USERNAME=your_username \
  -e ACCESS_PASSWORD=your_password \
  --restart unless-stopped \
  --name prompt-optimizer \
  linshen/prompt-optimizer

If Docker Hub is slow in China, the project also provides an Alibaba Cloud image address in the README.

What MCP Enables

Prompt Optimizer supports Model Context Protocol, or MCP.

When running through Docker, the MCP service can start together with the Web app and be accessed through the /mcp path. This turns it from a Web tool into something that can be called by MCP-compatible apps such as Claude Desktop.

The README lists these MCP tools:

optimize-user-prompt: optimize user prompts
optimize-system-prompt: optimize system prompts
iterate-prompt: perform targeted iteration on an existing prompt

These interfaces are well suited for AI workflows. For example, when writing a complex task prompt, an MCP-compatible client can call the prompt optimization tool directly instead of requiring you to open a Web page and copy text manually.

Difference from Normal Chat Tools

Normal chat tools can also help rewrite prompts, but they usually lack several parts:

Saving and comparing multiple versions is inconvenient
Testing multiple models at once is inconvenient
Turning variables into templates is inconvenient
Multi-turn conversation validation is inconvenient
Integrating through MCP or self-hosting is inconvenient

The value of Prompt Optimizer is that it turns prompt optimization into a repeatable process. It does not just give you a version that “looks more complete”; it lets you keep adjusting prompts around real outputs.

Who Should Use It

This project is worth attention if you:

Often write system prompts
Design roles and output formats for AI applications
Need to compare outputs from different models
Want to turn prompts into reusable templates
Need to test multi-turn dialogue or tool calls
Want to connect prompt optimization to an MCP workflow
Want to deploy a prompt tool locally or inside an intranet

If you only occasionally ask AI a simple question, a normal chat page is enough. This tool is better for people who treat prompts as maintainable assets.

Notes for Use

First, do not treat optimization results as absolutely correct.

Prompt optimization tools can improve expression quality, but they cannot guarantee that a model will never misunderstand. Important tasks still need test cases, manual review, and version comparison.

Second, do not only chase length.

A good prompt is not necessarily longer. It should express goals, boundaries, input and output formats, and evaluation criteria more clearly. Meaningless rule stacking can make the model miss the point.

Third, tune prompts by model.

Different models respond differently to role settings, format constraints, reasoning steps, and examples. A prompt that works well on a large model may not suit a smaller model. Multi-model testing is one reason this tool is useful.

Fourth, consider keys and access control when deploying.

If you deploy it publicly, configure an access password and handle API keys carefully. The project supports access control through environment variables; do not write sensitive configuration directly into public repositories.

Reference

linshenkx/prompt-optimizer

Final Thought

Prompt Optimizer is useful for turning prompts from “a temporary paragraph I wrote by hand” into “a work asset that can be tested, compared, and iterated.”

When you start maintaining prompts across multiple models, scenarios, and versions, this kind of tool is more convenient than a normal chat window.

AI Terms Explained: Agent, MCP, RAG, and Token in Plain Language

Thu, 23 Apr 2026 13:13:40 +0800

When people first get into AI, what pushes them away is often not the models themselves, but the long list of terms that keeps showing up in every discussion. Agent, MCP, RAG, AIGC, and Token all look familiar, but without a simple explanation, many people only recognize the words without really understanding them.

This article follows a common beginner-friendly line of explanation and condenses 10 high-frequency AI terms into a set of meanings that is easier to remember. The goal is not to sound academic. It is to help you build a basic mental model that lets you follow everyday AI conversations.

10 common AI terms and what they mean

1. Agent: an AI that does more than chat

Agent can be understood as an AI assistant that actually gets work done.

A normal chatbot usually works in a simple question-and-answer pattern. An Agent goes a step further. It can break a task into steps, arrange a process, call tools, and return a finished result. If you ask it to organize materials, look something up, or generate a document, it may do more than give advice. It may actually chain those actions together and complete them.

That is why the key point of an Agent is not whether it can talk, but whether it can act.

2. OpenClaw: an AI assistant that stays on your computer

Here, OpenClaw is described as a kind of AI assistant that lives on your computer.

You can think of this type of tool as a more desktop-oriented AI helper. It does not only receive text. It may also observe the interface, call local tools, and execute tasks step by step. Compared with a normal web chat interface, this kind of tool emphasizes operational ability much more.

If Agent is the abstract idea of an execution-oriented AI, this kind of desktop assistant is a more concrete personal-computer version of that idea.

3. Skills: capability packs added to an Agent

Skills can be understood as functional modules or operating instructions for an Agent.

The same Agent can behave very differently depending on which Skills it has. Some may focus on copywriting, some on data organization, and some on code-related work. They are a bit like apps on a phone, and a bit like reusable workflows.

So in many cases, it is not that the model suddenly became smarter. It is that a clearer set of rules, tools, and steps was added behind it.

4. MCP: a unified way for AI to connect to tools

MCP stands for Model Context Protocol.

In everyday terms, it is a bit like a Type-C connector for the AI world. In the past, connecting a model to different tools often meant building separate integrations one by one. With a unified protocol, the way those tools connect becomes more standardized and easier to reuse.

For most users, the most important thing to remember is this: MCP is not about whether a model can answer a question. It is about how a model can connect to external tools and resources in a safe and stable way.

5. Gacha: AI output is inherently random

The term “gacha” often appears in AI image generation, video generation, and creative work.

The idea is simple. Even with the same prompt and the same general direction, the result can still be different each time. Sometimes the output is great. Sometimes it falls apart. That is why people compare repeated generation attempts to pulling gacha in a game.

What this really reminds us is that AI generation is not a fixed formula. It is a probabilistic process with variation.

6. API: the connection between an app and a model

API stands for Application Programming Interface.

You can think of it as the standard entry point through which programs communicate. When you call a model service from your own app, script, or editor, you are essentially using an API to send a request and receive a result.

If you compare a model service to a restaurant, then:

the menu is like the API documentation
placing an order is like making an API request
the kitchen sending back the dish is like the model returning a result

That is why many tools may look different on the surface while still calling some form of API underneath.

7. Multimodality: AI handles more than text

Multimodality means AI no longer only reads and writes text. It can process multiple kinds of input and output.

For example, it may be able to read images, understand voice, interpret video, generate pictures, or even support real-time voice and video interaction. Compared with early text-only models, multimodal models are much closer to having the combined abilities to see, hear, speak, and write.

That is also why many AI products are no longer centered around a single text box.

8. RAG: retrieve information first, then generate an answer

RAG stands for Retrieval-Augmented Generation.

It is useful for solving a practical problem: a model’s training data has a time boundary, and it does not automatically know your company’s newest documents, customer-service records, or business rules. The idea behind RAG is to retrieve relevant material from specified sources first, and then generate an answer based on that material.

Its value usually shows up in three ways:

answers are more likely to stay close to real source material
you can trace where the answer came from
new documents can be added and reflected quickly

That is why many enterprise knowledge bases, AI customer-service systems, and internal Q&A tools rely on RAG.

9. AIGC: the general term for AI-generated content

AIGC stands for AI Generated Content.

It is not a single tool. It is a broad label for content produced by AI, including text, images, audio, video, and more. AI writing, AI illustration, AI short-form video generation, and AI voice synthesis all fit under the umbrella of AIGC.

What matters most about this term is that it describes a way of producing content, not one specific model.

10. Token: the unit used to measure model processing

Token can be understood as the basic unit a model uses to process text.

It is not exactly the same as one character or one word, but in practice, you can treat it as the common unit used for model computation and billing. Your input consumes Token, the model’s output consumes Token, and the context kept in memory also takes up Token.

That is why model services keep talking about context length, cost control, and prompt compression. At the core, all of those topics are tied to Token.

Using Claude Code Quota More Efficiently: Models, Context, Caching, and /compact

Sun, 19 Apr 2026 15:29:06 +0800

Many Claude Code or Claude Max users run into the same problem: even after paying for Pro, Max 5x, or Max 20x, the usage warning appears quickly, or they have to wait for the next reset. This feels especially obvious when Claude Code reads many files, fixes complicated bugs, or runs long tasks in a large project.

The key point is this: usage is not deducted linearly by “minutes.” It depends on the model, context length, attachments, codebase size, conversation history, tool calls, and current capacity. In the same 5-hour window, one person may work for a long time while another hits the limit in minutes. Usually the account is not broken; each request is simply too heavy.

This note collects a set of practical habits for using quota more efficiently.

01 First Understand Claude’s Usage Window

Claude Pro and Max both have usage limits. Claude Code usage is shared with Claude on web, desktop, and mobile under the same subscription quota. Anthropic’s help center explains that message counts depend on message length, attachment size, current conversation length, model or feature used, and that Claude Code usage is also affected by project complexity, codebase size, and auto-accept settings.

A simple way to think about it:

Pro: suitable for light usage and small projects.
Max 5x: suitable for more frequent usage and larger codebases.
Max 20x: suitable for heavier daily collaboration.
Usage windows reset on a 5-hour session basis.
Long messages, long conversations, large files, and complex tasks consume usage faster.
Stronger models such as Opus hit limits faster than Sonnet.

So “I only used it for 20 minutes” does not explain much by itself. What matters is how much context Claude read during those 20 minutes, which model was used, whether large files were processed repeatedly, and whether the same long conversation kept accumulating more tasks.

02 First Habit: Do Not Default to the Most Expensive Model

The Claude model family is commonly positioned like this:

Opus: strongest capability, suitable for complex reasoning, architecture decisions, and hard bugs.
Sonnet: balanced capability and cost, suitable for most everyday coding tasks.
Haiku: lighter, suitable for simple classification, summarization, and format conversion.

For daily scripts, small bug fixes, documentation cleanup, and code explanation, Sonnet is usually enough. Save Opus for cases such as:

Complex architecture design.
Deep multi-file refactors.
Bugs that are hard to reproduce.
Long-chain troubleshooting.
Tasks where the normal model is clearly stuck.

In Claude Code, use /model to switch models, or set the default in /config. A steadier habit is to use Sonnet by default and switch to Opus only at key points, rather than running the whole task on Opus.

03 Second Habit: Control Context, Do Not Drag Old Tasks Along

The longer the context, the more Claude needs to process on each turn, and the faster usage is consumed. The Claude Code docs explicitly recommend proactive context management:

Use /clear when switching to an unrelated task.
Use /compact when one phase is done but important context should remain.
Use /context to see what is taking space.
Configure a status line if you want continuous status visibility.

A useful rhythm:

Small phase done: /compact
Large task done: /clear
Switching to unrelated work: /clear
Context usage getting high: /compact early

/compact summarizes earlier conversation history while preserving key task state, conclusions, file paths, and remaining work. It reduces the amount of history carried into later requests. You can also add a short instruction:

`1`	`/compact Preserve changed files, test results, remaining TODOs, and key design decisions`

Do not wait for automatic compaction. The docs note that Claude Code auto-compacts when context approaches the limit, but manually compacting at phase boundaries is usually easier to control.

04 Third Habit: Long Conversations and Large Files Make Every Request Heavier

Many people assume that “I only asked one more question” should be cheap. But in a long conversation, that question may carry a lot of history, file summaries, tool definitions, and system rules behind it.

Things that easily bloat context include:

Long conversations that are never cleared.
Asking Claude to read entire large files.
Pasting long logs, build output, or test output.
Adding many screenshots or images at once.
Asking it to repeatedly scan the whole repository.
An overly long CLAUDE.md.
Too many MCP servers enabled.

A more efficient approach: paste only key errors from logs, include only failing parts of test output, and let Claude use rg, head, tail, and symbol search before reading only the necessary parts. If command-line filtering can shrink the content, do not paste the whole thing into context.

05 Fourth Habit: Understand Caching, but Do Not Worship It

Anthropic’s Prompt Caching can cache repeated prompt prefixes. The default cache lifetime is 5 minutes, and a 1-hour cache is also supported. When cache hits, large repeated context does not need to be fully reprocessed, which helps reduce cost and improve rate limit utilization.

But caching has limitations:

Content must match exactly, including text and images.
The default cache is short-lived.
Changing models, tools, system prompts, or context structure may reduce cache hits.
Output tokens do not disappear because of caching; the response still needs to be generated.
How Claude Code uses caching is a product-level implementation detail, so do not treat it as permanent “free memory.”

In practice, the important part is not studying every caching detail. It is keeping the session stable:

Avoid frequent model switching within the same phase.
Do not repeatedly rewrite large rule blocks mid-task.
Do not keep adding new images inside the same task.
Do not leave a long task idle for too long and then return with another huge request.
Use /compact at phase boundaries.

This makes repeated context easier to reuse and reduces later request weight.

06 About Peak Hours: Avoid Them When You Can, but Do Not Treat Them as a Formula

People often say certain hours feel tighter. Anthropic’s help center is more careful: message counts can be affected by current Claude capacity, conversation length, attachments, model, and features. In other words, peak capacity can affect the experience, but do not treat a specific local time window as a permanent rule.

Practical suggestions:

Put large refactors and heavy analysis in periods when both your network and the service are stable.
Do not start a huge task right before you plan to step away.
If you expect to leave for a long time, run /compact or /clear first.
For small edits, do not use Opus with a long context unless you really need it.

This is more reliable than memorizing a fixed “do not use it from X to Y” rule.

07 Slim Down CLAUDE.md, rules, MCP, and skills

Claude Code loads project rules, tool information, and some environment context into the session. The official docs also recommend separating general rules from specialized rules so every session does not start with a large amount of unrelated text.

A useful split:

CLAUDE.md: only global rules that always apply.
rules: path-specific or file-type-specific rules.
skills: specific workflows, such as publishing posts, deployment, image generation, or committing code.
MCP: only enable servers that the current task actually needs.

If CLAUDE.md is hundreds or thousands of lines long, every session carries that cost. A better pattern is to move occasional workflows into skills and load them only when needed.

MCP is similar. More tools do not automatically mean more efficiency. The Claude Code docs mention using /mcp to view and disable unnecessary servers, and /context to see what is consuming context space.

08 Practical Command List

These are the most useful daily commands:

/model

Switch models. Sonnet is a good default; use Opus for complex reasoning.

/clear

Clear the current context. Use it when switching to unrelated work.

`1`	`/compact`

Compress conversation history. Use it when a phase is done but the same task continues.

`1`	`/context`

Inspect context usage and find what is taking space.

/status

Check subscription or usage-related status. Anthropic’s help center also recommends monitoring remaining allocation.

/mcp

View and manage MCP servers, and disable tools not needed for the current task.

If you use API billing, /cost can be useful. But for Pro/Max subscriptions, the Claude Code docs explain that the dollar estimate from /cost is not the right billing reference; subscribers should rely more on usage information such as /stats and /status.

09 A Quota-Saving Workflow

A practical workflow looks like this:

Run /clear before starting a new task.
Use Sonnet by default.
Let Claude inspect project structure and key files first, not the whole repository.
Run /compact after each small phase.
Switch to Opus only for hard blockers.
Filter logs, errors, and test output before pasting them.
Run /clear after the task is done; do not start new work with stale context.
Periodically review CLAUDE.md, MCP, and skills to shrink always-on context.

The core idea is simple: let Claude see only what it truly needs for the current task.

10 Summary

Claude Code usage running out quickly is usually not caused by one thing. It is often a combination of high-cost models, long uncleared conversations, too many files and logs, heavy MCP and rule context, weaker cache reuse, and peak capacity fluctuations.

The practical fixes are also simple:

Use Sonnet for daily work.
Save Opus for truly complex problems.
Use /compact when a phase is done.
Use /clear when switching tasks.
Use /context to find context bloat.
Slim down CLAUDE.md, rules, MCP, and skills.
Do not dump the whole repository, full logs, or large image batches into context.

How much work the same Pro or Max plan can support depends heavily on how you manage context. Make the context smaller and task boundaries clearer, and Claude Code will feel much steadier.

References

Claude Help Center: Using Claude Code with your Pro or Max plan: https://support.claude.com/en/articles/11145838-using-claude-code-with-your-pro-or-max-plan
Claude Help Center: About Claude’s Max Plan Usage: https://support.anthropic.com/en/articles/11014257-about-claude-s-max-plan-usage/
Claude Code Docs: Manage costs effectively: https://code.claude.com/docs/en/costs
Anthropic Docs: Prompt caching: https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching

Firecrawl Project Notes: Web Search, Scraping, and Interaction APIs for AI Agents

Wed, 15 Apr 2026 13:45:03 +0800

Firecrawl has a clear purpose: turning web pages into data that AI agents can consume more easily. It is not just a crawler script. It wraps search, single-page scraping, site crawling, page interaction, structured extraction, and agent workflows into APIs, so models and automation systems can spend less effort dealing with web noise.

01 What It Solves

Many AI applications need to read web pages, but real websites are messy: JavaScript-rendered content, pop-ups, pagination, login state, anti-bot defenses, PDFs or DOCX files, and plenty of navigation, ads, scripts, and styling that have nothing to do with the main content.

Firecrawl tries to solve this middle-layer problem. The application asks for data from a page, a site, or a topic; Firecrawl handles opening, scraping, cleaning, and returning output in formats that are easier for LLMs to use, such as Markdown, HTML, screenshots, or JSON.

The value of this kind of tool is not merely whether it can request a URL. The real question is whether it can reliably turn complex pages into usable data. For RAG, AI search, competitive research, automated information gathering, and web content monitoring, this layer often becomes the unpleasant plumbing in the system.

02 Core Features

The Firecrawl README groups its capabilities into several areas:

Search: Search the web and return full page content from the results.
Scrape: Convert a single URL into Markdown, HTML, screenshots, or structured JSON.
Interact: Scrape a page, then use prompts or code to click, scroll, type, wait, and perform other actions.
Agent: Describe what you want, and let the agent search, navigate, and return the result.
Crawl: Scrape multiple pages under a website.
Map: Quickly discover URLs on a website.
Batch Scrape: Asynchronously scrape large batches of URLs.

At first glance, it looks like a scraping service. But as a full set of features, it is closer to a data entry point for AI applications: search discovers sources, scraping cleans content, interaction handles dynamic pages, and Agent pushes the whole “find information” task further toward automation.

03 Why It Fits AI Agents

Traditional crawlers usually assume that you already know the URL and understand the page structure. Agent workflows are often different. A user might simply ask, “Find the differences between the latest pricing plans on a company’s pricing page.” The system then has to search, open pages, compare content, and return sources.

Firecrawl’s Agent endpoint is designed for this kind of task. It can accept only a natural-language prompt, or it can be constrained to specific URLs. If structured results are needed, it can also work with a schema to return fixed fields.

This gives the application layer two benefits:

You do not have to write a separate parser for every website.
The returned result is easier to send into an LLM, a database, or a downstream automation flow.

Of course, this does not mean it replaces every custom crawler. For highly constrained, high-frequency, large-scale tasks with very stable fields, writing dedicated parsing logic may still be cheaper and easier to control. Firecrawl is a better fit when sources are scattered, page structures change often, and you want to connect web data to an AI workflow quickly.

04 MCP, CLI, and Integrations

Firecrawl is also clearly moving toward the agent tooling ecosystem. The README provides MCP Server setup, along with Skill/CLI initialization commands for AI coding agents.

This means it is not only intended for backend API calls. It also wants to plug directly into Claude Code, OpenCode, Antigravity, MCP clients, and similar workflows. For people who frequently ask agents to research, scrape, and organize web content, this kind of integration is lighter than hand-writing API calls.

It also lists integrations with platforms such as Zapier, n8n, and Lovable. That direction is practical: web data does not always go into code. It may flow into automation tables, low-code workflows, content systems, or internal knowledge bases.

05 Open Source, Self-Hosting, and Licensing

Firecrawl is open source. The main repository is primarily licensed under AGPL-3.0; the README also notes that SDKs and some UI components use the MIT license, with details depending on the LICENSE files in each directory.

This matters. If you only use the cloud service, the main concerns are API cost, reliability, and compliance boundaries. If you plan to self-host it and provide a service to others, the obligations of AGPL-3.0 need careful review.

The README also reminds users to respect website policies, privacy policies, and terms of use, and says that Firecrawl respects robots.txt by default. The stronger this type of tool becomes, the more important it is to design compliance and scraping boundaries into the system instead of patching them in after launch.

06 Suitable Use Cases

I would consider Firecrawl first in these scenarios:

Scraping web content for a RAG system and wanting clean Markdown directly.
Building AI search or research assistants that need to read full pages after search.
Scraping JavaScript-heavy sites without maintaining a browser cluster yourself.
Monitoring public information such as competitors, pricing, documentation, news, and job pages.
Giving MCP clients or AI coding agents real-time web reading ability.
Quickly validating a web-data product before building crawler infrastructure.

The less suitable cases are also clear:

The target site has very few fields, a stable structure, and can be handled by a simple script.
The scraping volume is huge, and cost sensitivity matters more than development and maintenance cost.
The business needs very fine control over sources, retry strategy, anti-bot behavior, and audit trails.
Licensing or compliance requirements do not allow AGPL components or external cloud services.

07 Quick Take

Firecrawl’s core value is productizing the messy path from “web page” to “AI-usable data.” It puts search, scraping, cleaning, interaction, batch processing, and agent-style research into one interface, which is convenient for AI application developers.

If your project often needs models to read real web pages, especially when sources are scattered, structures are unstable, and MCP or agent workflows are involved, Firecrawl is worth keeping in the toolbox. If the task is just low-cost bulk collection from fixed websites, a traditional crawler or dedicated parser may still be the better choice.

GitHub project: https://github.com/firecrawl/firecrawl

What Is Hermes Agent: Overview, Strengths, Getting Started, and How It Compares to OpenClaw

Sun, 12 Apr 2026 14:07:58 +0800

If you have been following open-source AI agents lately, Hermes Agent is a project worth paying attention to. Built by Nous Research, its main appeal is not simply that it is “another chat wrapper,” but that it tries to bring long-term memory, reusable skills, context files, MCP extensions, a messaging gateway, and parallel sub-agents into one unified agent runtime.

Based on the official README, Hermes Agent has a very clear goal: it can work like a local CLI assistant in your terminal, or like a cloud-hosted personal assistant that stays available through Telegram, Discord, Slack, WhatsApp, Signal, and other channels. For users who want to combine a coding assistant, an automation assistant, and a personal AI workspace into one system, that positioning is compelling.

01 An overview of Hermes Agent

Hermes Agent is an open-source self-improving AI agent from Nous Research. It supports multiple model providers, including Nous Portal, OpenRouter, OpenAI, and custom OpenAI-compatible endpoints. It can also run across different execution backends such as a local terminal, Docker, SSH, Daytona, and Modal.

What separates Hermes from many “tool-using chatbots” is that it does not focus only on tool calls within a single session. It puts much more emphasis on building persistent capability across sessions. The official docs break this idea down into several parts:

Persistent memory: stores key information about the environment, project, and user preferences through MEMORY.md and USER.md.
Skills system: turns successful workflows into reusable skills that can be loaded on demand.
Context files: automatically reads files such as AGENTS.md, SOUL.md, and .cursorrules to inject project conventions directly into the session.
MCP integration: can connect to any MCP-compatible tool server to extend database, GitHub, filesystem, and scraping capabilities.
Messaging gateway: beyond the CLI, it can also be used through Telegram, Discord, Slack, WhatsApp, Signal, Email, and other entry points.

In one sentence, Hermes Agent feels more like a general-purpose agent operating layer with memory, skills, extensibility, and multi-channel access.

02 Where it stands out

1. It covers both CLI workflows and messaging workflows

Many agent projects lean either toward terminal-based developer assistance or toward chat-platform bots. Hermes tries to combine both. You can run hermes directly in the terminal, or continue with the same assistant through Telegram or Discord after starting the gateway.

The practical benefit is that Hermes is not limited to being useful only when you are sitting in front of your computer. If you deploy it to the cloud or a VPS, it can become a continuously available personal AI assistant.

2. It is designed for long-term use

Hermes does more than chat and call tools. It is also built around long-term accumulation:

Persistent memory with boundaries, instead of endlessly stuffing more context into each conversation.
A skills system that lets you save and reuse successful workflows.
Search across past sessions for retrieval and recall.
Project context files that reduce the need to repeatedly explain the same background.

This matters a lot for people who work repeatedly inside the same repositories, workflows, and team conventions. It means the agent is not just helping once; it can gradually become more familiar with your environment.

3. MCP support gives it strong extensibility

The Hermes documentation explicitly supports MCP and describes both stdio and HTTP integration modes. In practice, that means if an external system already has an MCP server, Hermes can usually connect to it with much lower integration cost.

That is more flexible than writing a custom plugin for every single system. For users who already have tools built around the MCP ecosystem, Hermes should be much easier to extend.

4. It is friendly to OpenClaw users

This part is especially interesting. The Hermes README directly provides hermes claw migrate, and explicitly says it can import configuration, memory, skills, API keys, and messaging platform settings from OpenClaw.

That suggests Hermes is not trying to ignore the existing ecosystem and start from zero. It is clearly positioning some OpenClaw users as a migration audience.

03 How to get started quickly

The officially recommended Hermes Agent installation method is very straightforward:

`1`	`curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh \| bash`

According to the official README, it supports Linux, macOS, WSL2, and Android Termux. One important note is that native Windows is explicitly not supported right now, so Windows users are advised to use WSL2.

After installation, you would usually refresh your shell first:

`1`	`source ~/.bashrc`

Then you can launch it directly:

hermes

If you want to go through a more complete step-by-step initialization flow, the easiest command is:

`1`	`hermes setup`

Based on the official documentation and README, a simple first-time setup path looks like this:

Run hermes setup to finish the base configuration.
Use hermes model to choose a model provider and model.
Use hermes tools to enable the toolsets you want.
Run hermes to enter the interactive CLI.
If you want channels such as Telegram or Discord, continue with hermes gateway.

If you are already an OpenClaw user, it is also worth previewing the migration command:

`1`	`hermes claw migrate --dry-run`

That lets you inspect what can be migrated before doing a real import.

04 How to think about it versus OpenClaw

From the official docs and README, Hermes Agent and OpenClaw are not simply a case of one replacing the other. Their positioning overlaps, but their priorities are clearly different.

What Hermes Agent feels like

Hermes feels more like a product centered on an agent core and workflow system. It emphasizes:

CLI experience
Memory and skill accumulation
Project context files
MCP extensibility
Parallel sub-agents
Switching execution backends across local, container, remote, and serverless environments

If your main goal is to make the agent understand your project better, reuse capabilities over time, and connect more naturally into MCP and developer workflows, Hermes is likely the better fit.

What OpenClaw feels like

OpenClaw feels more like a platform centered on a personal AI assistant plus a messaging gateway. It emphasizes:

Rich messaging channel integration
A continuously running Gateway
A browser-based Control UI
Device pairing, remote access, and status management
Stronger assistant-oriented surfaces such as voice, mobile access, and Canvas

If your main goal is to keep a personal AI assistant reliably available across multiple chat channels and devices, with a control panel to manage it, OpenClaw has a stronger product feel in that direction.

A more practical rule of thumb

You can roughly think of the two like this:

Hermes Agent: more of a “growing general-purpose agent workspace”
OpenClaw: more of a “multi-channel always-on personal AI assistant platform”

That distinction is not absolute, because both projects are still expanding and Hermes also offers a migration path from OpenClaw. But based on the currently public material, Hermes is more prominent on the memory, skills, context, MCP, and developer-workflow side, while OpenClaw looks more mature on the gateway, multi-channel, Control UI, and device-access side.

05 Who should try it

Hermes Agent is especially worth trying first if you fit one of these profiles:

You already rely heavily on AI tools in the terminal and want an agent that better understands your codebase and project rules.
You want to combine AGENTS.md, skills, memory, and MCP into one workflow.
You do not want to be locked into a single model vendor and prefer flexible provider switching.
You already use OpenClaw and want to explore a direction that is more centered on agent workflows.

If you care more about mobile reach, broad IM platform integration, a browser control console, and the feeling of an always-online personal assistant, OpenClaw still has a lot of appeal.

References

Hermes Agent GitHub: https://github.com/NousResearch/hermes-agent
Hermes Agent Docs: https://hermes-agent.nousresearch.com/docs/
Hermes Features Overview: https://hermes-agent.nousresearch.com/docs/user-guide/features/overview
Hermes MCP: https://hermes-agent.nousresearch.com/docs/user-guide/features/mcp/
OpenClaw GitHub: https://github.com/openclaw/openclaw
OpenClaw Getting Started: https://docs.openclaw.ai/start/quickstart
OpenClaw Control UI: https://docs.openclaw.ai/web/control-ui

Drop MCP? Why CLI Is Becoming the Default Tool Layer for Agents

Fri, 10 Apr 2026 21:55:12 +0800

Over the last year, debates about agent toolchains have increasingly centered on one question:

Does MCP (Model Context Protocol) make tool calling simpler, or does it make simple tasks more complex?

For most day-to-day engineering tasks, CLI is becoming the more practical default.

Cost gap is not a UX issue, but an order-of-magnitude issue

The biggest practical pressure in MCP is token overhead.

In common scenarios, MCP often has to load large tool schemas before actual execution. Using a GitHub MCP Server as an example, initialization alone can consume tens of thousands of tokens. For long tasks, this directly squeezes context budget.

Community benchmarks keep pointing to the same conclusion:

Single MCP calls commonly cost several to dozens of times more than CLI
Retry recovery is also more expensive (reconnect plus context reload)

This is not just “a little slower.” It scales into API cost, latency, and reliability issues.

Why models are naturally better at CLI

A frequently overlooked fact is training distribution.

LLMs have seen massive amounts of terminal text during training: commands, outputs, errors, scripts, and man pages. In other words, CLI interaction is already close to the model’s native input pattern.

By contrast, MCP’s JSON-RPC and tool schema style became widespread only in recent years. Models can learn it, but familiarity and compression efficiency are often still weaker than long-established CLI patterns.

That also explains why, in many cases:

for the same goal, CLI commands are shorter
outputs are easier to continue reasoning over
error recovery paths are more stable

Security and isolation: MCP still has catching up to do

MCP is not incapable of security, but its ecosystem is still early.

Common concerns today include:

Tool Poisoning in descriptions
behavior drift (Rug Pull)
same-name tool override (Shadowing)

CLI also has security risks (injection, privilege misuse, path risks), but its process model, permission boundaries, and audit chain have been validated through decades of engineering practice. In production, that predictability matters.

This does not mean MCP has no value

I do not think MCP should be abandoned.

A more reasonable positioning is:

CLI handles the execution layer (local, low-latency, high-frequency calls)
MCP handles the connection layer (remote service discovery, unified auth, audit, and multitenancy)

That is the commonly discussed hybrid architecture: CLI + MCP Gateway.

When integrating many remote systems and enforcing unified governance and compliance, MCP still has clear value. But for helping agents complete engineering work quickly, CLI-first usually better matches current model capability boundaries.

In today’s engineering reality, CLI is closer to an agent’s working native language; MCP is better positioned as a connection protocol rather than the only execution protocol.