Developer Tools on KnightLi Blog

Google Pay and Wallet Launch Developer MCP Server: Bringing Payment Integration Into AI Assistants

Fri, 29 May 2026 15:24:18 +0800

Google Developers has released the Google Pay and Wallet Developer MCP Server. Built for developers integrating the Google Pay API and Google Wallet API, it connects official documentation, account status, integration checks, and some business metrics to MCP-compatible AI development tools.

This kind of update may not look as flashy as a model launch, but it is very practical for developers. Payment and wallet integrations are often less about whether someone can write the code and more about dense documentation, many configuration items, review requirements, and scattered error feedback. The value of an MCP Server is that it lets an AI assistant help developers troubleshoot from a position much closer to the real context.

What Problem It Solves

When developers integrate Google Pay or Google Wallet, they usually have to switch back and forth across several places:

Checking official documentation and sample code
Confirming account and merchant configuration
Inspecting request parameters and returned errors
Verifying whether the integration meets requirements
Observing call performance and key metrics

If these resources are treated only as web documentation, an AI assistant can mostly explain concepts or generate examples. After connecting to an MCP Server, the assistant can access more specific information through tools and provide suggestions that are closer to the current project state.

Why MCP Fits Here

MCP provides standardized tool interfaces for AI applications. For developer products such as Google Pay and Wallet, it is especially well suited to tasks that combine documentation, status, and validation.

For example, developers can ask an MCP-capable AI tool to help answer:

Which configuration items are missing from the current integration
Why a particular Google Pay request failed
Which fields in a Wallet pass definition may not meet the requirements
How a code sample should be adapted to the current business need
Which items still need to be checked before the integration goes live

These questions are risky if answered only from a large model’s memory. When the answer can combine official tools with current account information, it becomes much more actionable.
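To make the "standardized tool interface" concrete, here is a minimal sketch of the JSON-RPC 2.0 message an MCP client sends when it invokes a tool. The `tools/call` method and the `name` / `arguments` parameters come from the MCP specification. The tool name `check_integration_status` and its `merchant_id` argument are hypothetical placeholders, not tools documented for Google's server.

```python
import json


def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical tool name and argument, used only for illustration.
request = build_tool_call(1, "check_integration_status", {"merchant_id": "EXAMPLE_ID"})
print(request)
```

The point is that the assistant does not guess at account state. It asks a tool for it through a fixed message shape, and the server decides what to expose.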

What It Means for AI Coding Workflows

This release also points to a trend: AI coding assistants are moving from “reading code” to “reading product systems.”

In the past, when developers asked AI to help integrate payment capabilities, they usually had to paste in documentation snippets, error logs, and code. AI could explain them, but it did not necessarily know the real state of the current integration. The MCP Server pushes that capability one step forward, giving AI assistants a chance to work directly against the product integration environment.

This is especially valuable in several scenarios:

A new project integrates Google Pay or Wallet for the first time.
An old project migrates to a new API or configuration approach.
The team performs integration checks before launch.
The team needs to quickly locate issues involving error codes, review problems, or configuration mismatches.

For teams, the benefit is not just reading a few fewer pages of documentation. It is reducing the friction of “the documentation was understood correctly, but the live configuration does not match.”

It Will Not Replace Developer Judgment

Payment and wallet integrations involve security, compliance, and user experience requirements. The MCP Server can help AI assistants find information faster, check status, and generate suggestions, but developers still need to confirm the code, security strategy, merchant configuration, and launch process.

This is especially true for payment flows. A seemingly reasonable AI suggestion should not be pushed directly to production. A more reliable approach is to treat MCP tools as inspectors and navigators: they can shorten the time needed to locate a problem, but they cannot replace review, testing, or business responsibility.

My Take

The significance of the Google Pay and Wallet Developer MCP Server is not that there is “one more MCP example.” It is that MCP has been placed inside a real, complex, highly constrained developer scenario.

If MCP only connects to documentation, its value is limited. If it can connect to account status, integration validation, metrics, and product backends, AI assistants can take on more practical development work. Google’s release shows exactly that direction.

Similar capabilities are likely to appear in more cloud services, advertising platforms, payment systems, and enterprise SaaS products in the future. Developers need to adapt not just to “AI can write code,” but to “AI can participate in the entire integration process through standardized tool interfaces.”

Original link: Supercharge your integration workflow with the Google Pay & Wallet Developer MCP Server

Remotion: Generate Videos Programmatically with React

Wed, 27 May 2026 14:39:22 +0800

remotion-dev/remotion is a framework for creating videos programmatically with React. It pulls video production out of traditional timeline tools and turns it into a frontend engineering problem that can be controlled with components, state, data, API, CSS, Canvas, SVG, WebGL, and algorithms.

Project address: remotion-dev/remotion

This kind of tool fits today’s AI coding workflows very well: if an agent can generate web pages, charts, and data views, it can also keep going and generate video scripts, animation components, and renderable short films.

What Problem Does Remotion Solve

Traditional video tools are good at manual editing, but not at scale, parameterization, or automation.

For example, these tasks:

Generate a personalized annual recap video for each user
Automatically generate product demo videos from a database
Combine charts, code snippets, and explanatory subtitles into technical short videos
Batch-generate marketing assets, social media short videos, or course clips
Render videos on demand through CI/CD or backend services

With traditional editing software, these tasks are hard to fully automate. Remotion’s approach is to write video as a React application: every frame is the result of components and data at a specific point in time.
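The "every frame is a function of time" idea can be sketched without any video tooling. The Python below is a language-neutral illustration, not Remotion's API (Remotion itself is used from React and TypeScript). The function names and the 30 fps default are assumptions made for the example.

```python
def interpolate(frame: int, start: int, end: int, lo: float, hi: float) -> float:
    """Linearly map a frame number in [start, end] to [lo, hi], clamping outside."""
    if frame <= start:
        return lo
    if frame >= end:
        return hi
    progress = (frame - start) / (end - start)
    return lo + progress * (hi - lo)


def title_style(frame: int, fps: int = 30) -> dict:
    """Describe a title that fades in over the first second of the video."""
    return {"opacity": interpolate(frame, 0, fps, 0.0, 1.0)}


# The same frame number always yields the same style, so frames can be
# rendered in any order or in parallel.
print(title_style(0), title_style(15), title_style(60))
```

Because the style depends only on the frame number and the input data, swapping in a different user's data produces a different video from the same template.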

Why React

The reason given in the Remotion README is straightforward: React can reuse Web technologies and component-based development.

It lets you use:

CSS for layout and animation
SVG for vector graphics
Canvas and WebGL for complex drawing
JavaScript / TypeScript for variables, functions, API calls, math, and algorithms
React components for reuse, composition, and fast iteration

This means frontend developers do not need to learn an entirely unfamiliar video DSL from scratch. Many existing UI pieces, charts, design systems, and data logic can be moved into video generation scenarios.

Quick Start

If Node.js is already installed, the entry command given in the README is:

npx create-video@latest

After creating a project, you typically write React components to describe the scene, then let Remotion render the video frame by frame.

For more complete documentation, see:

Docs: remotion.dev/docs
API Reference: remotion.dev/api

What Scenarios Is It Good For

Remotion is best suited to scenarios where “video content is driven by data or code.”

Personalized Videos

Examples include annual recaps, user achievements, order summaries, and learning reports. Each user’s data is different, but the visual structure is the same. Using React components plus data-driven rendering feels more natural than manual editing.

Technical Demo Videos

If a video contains code, charts, product interfaces, step animations, and explanatory text, Remotion is well suited to organizing these elements into templates that can be rendered repeatedly.

Data Videos and Chart Animations

Data visualization is already a frontend strength. Remotion lets charts appear not only on web pages, but also enter videos along a timeline.

AI-Generated Video Workflows

An AI agent can first generate scripts and asset structures, then generate Remotion components, and finally render the video. This is more controllable than asking a model to directly generate the final video, because the intermediate artifact is code that can be inspected, edited, versioned, and reused.

Why It Matters for AI Coding Tools

Remotion is especially interesting for AI coding tools such as Codex, Claude Code, Cursor, and Gemini CLI.

The reason is that video generation is broken down into development tasks:

Generate React components.
Adjust styles and layout.
Connect data.
Preview the scene.
Modify based on feedback.
Render the output.

This workflow is a very good fit for agents: every step has files, code, a preview, and clear feedback. Compared with “directly generating a video file,” code-based video is easier to review and iterate on.

Combined with browser sidebars, screenshot inspection, automated rendering, and comment feedback, Remotion can become the video artifact layer inside an AI workflow.

Check the License Before Use

The Remotion README specifically notes that Remotion has a special license, and that certain company usage scenarios require a company license.

So do not treat it as just another small MIT utility. License requirements may differ for personal projects, open-source projects, commercial projects, and internal enterprise tools. Before using it in company production, you should first read its LICENSE page and official licensing notes.

This matters most when connecting Remotion to automated content generation, marketing asset generation, or internal enterprise video pipelines.

My Take

Remotion’s value is not just “making videos with React”; it is turning video into something programmable, reusable, and automatable.

For ordinary frontend teams, it is suitable for data-driven video templates. For AI tools, it is more like a stable output target: the model does not need to generate a black-box video in one shot, but can instead generate readable, editable, renderable React code.

If your content needs batch generation, personalization, updates based on data, or repeated visual adjustments by an agent, Remotion is worth putting into the toolbox. It is not a replacement for traditional editing software, but a way to connect video production to a software engineering workflow.

RTK: A CLI Proxy That Saves Tokens for AI Coding Agents

Wed, 27 May 2026 13:52:01 +0800

rtk-ai/rtk is a command-line proxy for AI coding agents. The idea is straightforward: many agents repeatedly call ls, cat, grep, git status, git diff, test commands, and build commands while working on a project, and the raw output from those commands is often long and repetitive. RTK filters and compresses command output before it enters the LLM context, so the model sees shorter and more useful results.

Project: rtk-ai/rtk

What Problem It Solves

The real cost of AI coding tools is not only the number of model calls. It is also the useless information pushed into the context window.

For example:

ls -la may output lots of permissions, timestamps, and irrelevant files.
git diff may include a lot of repeated context.
When pytest, cargo test, or npm test fails, the important part is the failing case and stack trace, not every passing case.
docker ps, kubectl get pods, and aws commands often expose many fields, while the agent only needs a few key details.

If all of that output enters the model context unchanged, it burns tokens quickly. RTK does not replace those commands. It adds a compression layer between them and the AI coding agent.
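As an illustration of that compression layer, the sketch below keeps only the failing lines and the final summary from a test run. It is not RTK's implementation, and the sample output is a simplified, made-up imitation of a test runner's verbose output.

```python
def keep_failures(output: str) -> str:
    """Drop passing cases; keep failures, errors, and the last summary line."""
    lines = output.strip().splitlines()
    kept = [line for line in lines if "FAILED" in line or "ERROR" in line]
    summary = lines[-1] if lines else ""
    if summary and summary not in kept:
        kept.append(summary)
    return "\n".join(kept)


raw = """\
test_cart.py::test_add PASSED
test_cart.py::test_remove PASSED
test_cart.py::test_total FAILED
test_user.py::test_login PASSED
1 failed, 3 passed"""

print(keep_failures(raw))
```

Five lines become two here. On a real suite with hundreds of passing cases, the ratio is what saves tokens.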

How RTK Works

RTK’s README positions it as a tool that filters and compresses command results before they reach the LLM context. It is a single Rust binary, supports many common development commands, and emphasizes low overhead.

It mainly applies four strategies:

Smart Filtering: removes comments, whitespace, boilerplate, and low-value noise.
Grouping: aggregates similar items, such as by directory, error type, or status.
Truncation: keeps the relevant context and cuts repeated or unimportant parts.
Deduplication: collapses repeated log lines into shorter counted output.

From the agent’s perspective, the commands are still familiar development commands. The difference is that the result returned to the model is shorter.
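The deduplication strategy, for instance, can be sketched in a few lines. This is a guess at the general technique rather than RTK's actual code (RTK is a Rust binary), and the `(xN)` suffix format is an assumption.

```python
from itertools import groupby


def collapse_repeats(lines: list[str]) -> list[str]:
    """Collapse runs of identical consecutive lines into one counted line."""
    result = []
    for line, run in groupby(lines):
        count = sum(1 for _ in run)
        result.append(line if count == 1 else f"{line} (x{count})")
    return result


log = [
    "WARN retrying connection",
    "WARN retrying connection",
    "WARN retrying connection",
    "ERROR connection refused",
]
print("\n".join(collapse_repeats(log)))
```

Only consecutive duplicates are merged, so the order of events in the log is preserved.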

Supported Commands

RTK focuses on everyday development workflows:

Files: rtk ls, rtk read, rtk find, rtk grep
Git: rtk git status, rtk git log, rtk git diff
GitHub CLI: rtk gh pr list, rtk gh pr view, rtk gh issue list
Tests: rtk pytest, rtk go test, rtk cargo test, rtk vitest, rtk playwright test
Build and lint: rtk lint, rtk tsc, rtk next build, rtk cargo clippy
Containers and cloud: rtk docker ps, rtk docker logs, rtk kubectl pods, rtk aws sts get-caller-identity
Data and logs: rtk json, rtk deps, rtk env, rtk log

This kind of tool is most useful when an agent reads command output frequently. It does not write code for you. It helps the agent read less noise.

Installation and Integration

The README lists several installation options.

Homebrew:

brew install rtk

Quick install on Linux/macOS:

curl -fsSL https://raw.githubusercontent.com/rtk-ai/rtk/refs/heads/master/install.sh | sh

Cargo:

cargo install --git https://github.com/rtk-ai/rtk

After installation, check the version and stats:

rtk --version
rtk gain

To integrate it with AI coding tools, use the init commands:

rtk init -g
rtk init -g --gemini
rtk init -g --codex
rtk init -g --agent cursor
rtk init --agent windsurf
rtk init --agent cline

Restart the corresponding AI tool after initialization. For hook-based agents, Bash commands are rewritten before execution. For example, git status becomes rtk git status, and the agent receives compressed output.
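Conceptually, the rewrite step looks something like the sketch below. This is an assumption about the general shape of such a hook, not RTK's code: the `REWRITABLE` set is an illustrative subset, and the function ignores pipelines and compound commands.

```python
import shlex

# Illustrative subset; the real list of supported commands lives in RTK itself.
REWRITABLE = {"git", "ls", "grep", "find", "pytest"}


def rewrite_command(command: str) -> str:
    """Prefix a supported command with `rtk`; leave everything else untouched."""
    tokens = shlex.split(command)
    if not tokens or tokens[0] == "rtk" or tokens[0] not in REWRITABLE:
        return command
    return "rtk " + command


for cmd in ["git status", "rtk git status", "make build"]:
    print(cmd, "->", rewrite_command(cmd))
```

The agent keeps issuing the commands it already knows, and the hook decides whether a compressed variant exists.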

What To Watch Out For

RTK’s benefit depends on whether the agent actually reads information through shell commands.

The README specifically notes that built-in tools such as Claude Code’s Read, Grep, and Glob do not pass through the Bash hook, so they are not automatically rewritten by RTK. To make RTK participate, use shell commands such as cat, head, tail, rg, grep, and find, or call rtk read, rtk grep, and rtk find directly.

That point matters. RTK is not a globally transparent compressor for all agent I/O. It is closer to a proxy at the shell-command layer.

Windows users should also note that the README recommends placing rtk.exe in PATH and running it from Command Prompt, PowerShell, or Windows Terminal instead of double-clicking it. It also says WSL is more natural if you want the full hook experience.

Who Should Use It

RTK is useful for three groups:

Heavy AI coding users: people who ask agents to run many git, rg, test, and build commands every day.
Large-repository users: teams whose command output often reaches hundreds of lines and drowns the agent in irrelevant context.
People who care about token cost and context windows: users who want the model to focus on failures, changes, and key files.

If your project is small, or if your agent mostly reads files through IDE-native tools, RTK may feel less impactful. It shines when command-line output is both long and frequent.

My Take

RTK points in a practical direction. Many AI coding workflows focus on stronger models, larger context windows, and longer tasks, but there is still a plain development problem: agents often read far more than they need.

Compressing command output before handing it to the model can reduce token usage and lower the chance that noise steers the model in the wrong direction.

It is not a magic switch, though. To use RTK well, the agent workflow needs to lean on shell commands, and you need to confirm that the hook actually works in your current tool. For Codex, Claude Code, Gemini CLI, Cursor, and Windsurf, RTK is a useful context throttle to test: it does not change the development commands themselves, but it gives the agent cleaner results to read.

Reading the Official Codex Article: How to Get the Most Out of Codex

Wed, 27 May 2026 08:21:18 +0800

Most developers start using Codex with code tasks: reading a repository, editing a diff, running tests, and opening a pull request.

That is still Codex’s core use case. But a lot of work on a computer is already surrounded by code and tools: running shell commands, browsing web pages, calling APIs, exporting documents, responding to messages, and triggering automation. As these capabilities gradually connect to Codex, it becomes less a narrow coding assistant and more a system that helps you complete work on a computer.

The Codex app makes this shift more concrete. A thread can preserve context, call tools, display artifacts, and keep moving across multiple rounds of prompts instead of restarting every conversation from scratch.

To use Codex more fully, the key is combining these capabilities:

Durable threads for preserving long-term context
Voice input, steering, and queuing, so the user still controls the process
Browser, computer use, MCP servers, and connectors, so Codex can move beyond the repository
Thread automations and Goals, so tasks can keep progressing after the user leaves
The sidebar for reviewing code, documents, slides, web pages, and other artifacts
Shared memory, which writes important context outside the thread

Durable threads

Durable threads are long-running threads that can preserve work context across multiple sessions.

Pinned threads are a very practical entry point. They are a good place for workflows you return to repeatedly, such as:

A Chief of Staff thread
A release thread
A document review thread
An external monitoring thread

These are not temporary chats, but persistent workspaces. Codex can return to the same thread later and reuse prior decisions, preferences, and background information, avoiding the need to rebuild context from zero every time.

Keyboard shortcuts also make this smoother. Command-1 through Command-9 can jump directly to saved threads.

Voice input

The value of voice input is that it captures ideas before they have been organized into formal text.

Codex has built-in voice input. It is especially useful for fuzzy starting points that feel natural to say but awkward to type:

I remember someone named Ben mentioning this in Slack.
I don't remember the details.
Go find it for me.

For an agent that can search, organize context, and report back, this is often enough to get started.

Voice is also useful for two- or three-minute thought dumps. Meeting transcripts, dictated planning notes, and unorganized raw records are often more useful than a one-sentence summary, because they preserve uncertainty, emphasis, and unfinished lines of thought.

Steering and queuing

Voice becomes more useful when combined with explicit control.

Steering means inserting a new direction while a Codex task is running, so it can change course before the current step finishes.

For example, while reviewing a web page, the user can annotate in the sidebar and interrupt the current task at the same time:

Make this a bit smaller.
The spacing between these two elements is off.
This line of copy is wrong.

Queuing is different. It does not interrupt the current task; it places the next piece of work in the queue:

When this work is done, send the preview link to the reviewer in Slack.

Steering changes what Codex is doing right now. Queuing changes what it should do next. Both keep the user close to the work as the task unfolds.
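The difference between the two controls can be shown with a toy model. Nothing here reflects how Codex is implemented; `WorkThread` and its methods are hypothetical names chosen for the example.

```python
from collections import deque


class WorkThread:
    """Toy model: one task in progress plus a queue of follow-up work."""

    def __init__(self, current: str):
        self.current = current
        self.pending = deque()

    def steer(self, correction: str) -> None:
        """Steering amends the task that is running right now."""
        self.current = f"{self.current} [steered: {correction}]"

    def queue(self, task: str) -> None:
        """Queuing leaves the running task alone and appends the next one."""
        self.pending.append(task)

    def finish_current(self) -> None:
        """Move on to the next queued task, if there is one."""
        self.current = self.pending.popleft() if self.pending else None


thread = WorkThread("review landing page")
thread.queue("send preview link to reviewer")
thread.steer("tighten the spacing")
print(thread.current)
print(list(thread.pending))
```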

Tools and reachable scope

Once threads have continuity, the next question is: what can they operate?

Codex can expand outward layer by layer:

$browser: good for web page inspection, annotation, and review in the sidebar
@chrome: good for browser workflows that depend on the user’s Chrome login state
@computer: good for tasks that can only be completed through a desktop GUI

MCP servers and connectors extend the same idea into more workflows. Slack, Gmail, and Calendar matter because many tasks do not first appear as code. They appear as messages, emails, and calendar problems.

Skills are good for solidifying repeated work. Once a process has proven useful, it can be packaged as a skill so Codex does not need to relearn the same steps next time.

Continue working from anywhere

The Codex mobile app changes how long the user has to stay in front of a computer.

A task can start on a Mac because the files, permissions, and local environment are there. Later, the user can leave the desktop and continue confirming, adding details, or changing direction from a phone.

This is valuable in many small scenarios: while Codex runs a long task, the user can leave their desk; if it needs confirmation, they can respond while away; if the direction is wrong, they can redirect it in time. What truly stays in place is the local environment, not the user.

Automation

Automations can run Codex work on a schedule.

If a recurring task should restart from a specific workspace, such as a daily report or routine repository check, scheduled automation is a good fit. If the schedule should return to an existing conversation and reuse its context, thread automation is better.

Thread automations are more like heartbeat wake-ups: they return to the same Codex thread on a fixed rhythm.

Pinned threads require the user to come back actively, while thread automation can check every few minutes or every few hours, keep running until a condition is satisfied, and adjust its rhythm over time.
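A heartbeat that adjusts its own rhythm can be sketched as follows. This is a hypothetical model of the idea, not Codex's scheduler: the function names, the halve-or-double rule, and the 5-minute and 4-hour bounds are all assumptions.

```python
def next_interval(current: float, found_work: bool,
                  shortest: float = 300, longest: float = 14400) -> float:
    """Check sooner after activity, back off while things stay quiet (seconds)."""
    proposed = current / 2 if found_work else current * 2
    return max(shortest, min(longest, proposed))


def run_until(check, is_done, interval: float, max_wakeups: int) -> int:
    """Wake up repeatedly until `is_done()`; return how many wake-ups it took."""
    for wakeup in range(1, max_wakeups + 1):
        found_work = check()
        if is_done():
            return wakeup
        interval = next_interval(interval, found_work)
        # A real scheduler would sleep for `interval` seconds here.
    return max_wakeups


print(next_interval(1800, found_work=True))   # activity: check again sooner
print(next_interval(1800, found_work=False))  # quiet: back off
```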

For example, a Chief of Staff thread could run every 30 minutes:

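Every 30 minutes, check Slack and Gmail for messages that need my attention but that I have not replied to yet.
Help me judge which ones matter most.
If someone has asked me a question, research the answer as deeply as you can and draft a reply for me, but do not send it.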

When the user returns, the most time-consuming context gathering is often already done. The actual decision about whether to send still belongs to a human.

Thread automations are also useful for feedback loops. They can periodically check pull request comments, Google Docs comments, or Slack replies, continuing adjacent work while the user is away.

For example, in an animation workflow, a reviewer sends video feedback in Slack, and a thread automation checks the thread on a schedule. If there is a new comment, it re-renders the version and replies to the reviewer in the same Slack thread. If an integration cannot complete the final upload, desktop automation can still fill in the last step through the GUI.

This loop may cross Slack, the codebase, and desktop applications, but to the user it still stays inside one workflow.

Goals

Goals are best suited to tasks that have a clear endpoint and can be pushed forward continuously by an agent.

A weaker goal might be:

Implement the plan in this Markdown file.

A stronger goal has measurable completion criteria.

For example, when migrating an internal tool from Python to Rust, you can first create the new directory and then define the target clearly: the new implementation is only complete once the unit tests pass.

A Goal is essentially continuous execution plus a verifier. The user needs to define the outcome, the stopping condition, and the signals that indicate whether Codex is getting closer to the goal.

Common verifiers include:

Test suites
Benchmarks
Bug reproductions
Validation matrices
End-to-end workflows that must keep passing

A task can be ambitious, but without verification criteria, it is more like a wish than a goal.
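"Continuous execution plus a verifier" reduces to a small loop. The sketch below is a hypothetical model, not how Codex Goals are implemented: `pursue_goal`, the attempt budget, and the simulated list of failing tests are all invented for illustration.

```python
from typing import Callable


def pursue_goal(attempt: Callable[[int], None],
                verifier: Callable[[], bool],
                max_attempts: int) -> bool:
    """Keep working until the verifier passes or the attempt budget runs out."""
    for number in range(1, max_attempts + 1):
        attempt(number)
        if verifier():
            return True   # stopping condition met
    return False          # budget exhausted without passing verification


# Simulated work: each attempt fixes one failing test.
failing_tests = ["test_parse", "test_render", "test_export"]


def fix_one(_attempt_number: int) -> None:
    if failing_tests:
        failing_tests.pop()


reached = pursue_goal(fix_one, lambda: not failing_tests, max_attempts=5)
print(reached)
```

Without the `verifier` argument the loop has no way to know when to stop, which is the difference between a goal and a wish.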

The sidebar

The sidebar places work artifacts next to the conversation that generated them. The user does not need to export files, switch context, and then describe the problem afterward. The artifact might be code, but it could also be a deck, PDF, web page, spreadsheet, or another artifact generated during the work.

It is especially useful for four types of work:

Inspecting an artifact
Annotating places that need changes
Operating a web interface
Reviewing changes

Markdown, spreadsheets, data tables, documents, and slides can all be viewed directly in the sidebar. The user can inspect, annotate, and revise them without turning the process into another handoff.

If it is a deck or PDF, it can stay beside the thread that produced it and accept review and fixes at any time.

The browser is a similar work surface. Codex can open a rendered page, inspect it, respond to user annotations on the page, and continue fixing the same object. A web page is both the output and the control surface.

These surfaces are especially good fits for the sidebar:

Lightweight static artifacts such as index.html
Storybook
Remotion Studio
Browser slides
Data analysis applications

A standalone index.html file can become a long-lived interactive artifact without necessarily requiring a server. Thread automations can also refresh static artifacts on a schedule, so the user sees new results when they return.

Shared memory

Long threads are useful, but important context should not exist only in the conversation history.

Shared memory means storing durable context outside the thread, so future work can continue from an explicit, reviewable place.

One stable practice is to anchor durable threads in an Obsidian vault. In practice, this can be very simple: a set of ordinary files that are easy to inspect, edit, move, and preserve long term. Teams can put it in cloud storage, Git, Dropbox, Google Drive, or another sync layer.

A vault might look like this:

vault/
├── TODO.md
├── people/
├── projects/
├── agent/
└── notes/

A top-level AGENTS.md can explain how Codex should maintain this workspace: what information should be written down, where it should go, and when not to create noise.

A practical AGENTS.md might look like this:

- Treat ~/vault as durable work memory.
- Prefer canonical notes over note sprawl.
- Route TODOs, people, projects, daily summaries, and scratch notes explicitly.
- Preserve decisions, blockers, owners, dates, and useful links.
- If nothing meaningful changed, do not churn the vault.

Do not copy a particular vault structure blindly. What matters more is teaching the agent where long-term context should live, which information is worth preserving, and when it should avoid repeatedly changing files.
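Two of the rules above, explicit routing and "do not churn the vault", are easy to express in code. The sketch below is only an illustration: the `ROUTES` table and both function names are hypothetical, and it runs against a temporary directory rather than a real vault.

```python
import tempfile
from pathlib import Path

# Hypothetical routing table; adapt the names to your own vault.
ROUTES = {"todo": "TODO.md", "person": "people", "project": "projects", "scratch": "notes"}


def note_path(vault: Path, kind: str, name: str = "") -> Path:
    """Resolve where a note of a given kind belongs inside the vault."""
    target = vault / ROUTES[kind]
    return target if target.suffix == ".md" else target / f"{name}.md"


def write_if_changed(path: Path, text: str) -> bool:
    """Write only when content differs, so unchanged notes are not churned."""
    if path.exists() and path.read_text(encoding="utf-8") == text:
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return True


with tempfile.TemporaryDirectory() as tmp:
    path = note_path(Path(tmp), "project", "wallet-migration")
    first = write_if_changed(path, "Status: blocked on review\n")
    second = write_if_changed(path, "Status: blocked on review\n")
    print(first, second)
```

The second call writes nothing, so a sync layer or Git history only records changes that actually mean something.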

The repository stores code. The vault stores rolling context: relevant people, what happened, where things are blocked, who owns what, what comes next, and the details that would otherwise disappear between conversations if they were not written down.

Codex also has first-party memory capabilities, configurable in Settings > Personalization > Memories. They are good for recording preferences, repeated workflows, and common pain points, but they work better as a supplement to explicit written context than as a replacement. Chronicle is moving in the same direction as well: helping Codex build memory from recent screen context.

Expanding outward from code

Codex still starts with code. But more of the work around code can now be reached by the same system: MCP servers, browser interfaces, desktop control, thread automations, and reviewable artifacts.

This changes how Codex is controlled. Steering interrupts work in progress. Queuing schedules the next step. Thread automations keep a thread active after the user leaves. Goals add clear endpoints and verification signals to long-running tasks.

When these capabilities connect, Codex can move a workflow from instruction to execution and then onward to artifact review. Even after a task has left the code repository, it can still be completed inside the same system.

Original link: Getting the most out of Codex

How to Fix Codex Goal Failed to Set Goal

Wed, 27 May 2026 08:17:57 +0800

Some users have recently reported that Codex Goal immediately shows Failed to set goal, or a similar goal-setting error, when they try to use it. This error does not appear to depend on prompt length, and it can happen in both the Codex app and the VS Code extension.

Based on the feedback in the discussion, this issue looks more like an abnormal local feature flag or configuration state than a mistake in the goal content itself.

First check the goals feature switch

The most direct fix is to check the Codex configuration file:

~/.codex/config.toml

Confirm that it has a [features] section and that goals is enabled:

[features]
goals = true

If [features] already exists, just add goals = true under that section. If the section does not exist, create it.

After making the change, restart the Codex app or the VS Code extension, then try setting the Goal again.

If the issue continues, check the configuration directory

Some feedback also mentioned that abnormal cache or temporary files inside the .codex directory can trigger similar issues.

A safer handling sequence is:

Back up ~/.codex/config.toml.
Close Codex-related applications.
Temporarily move or rename the ~/.codex directory.
Open Codex again and let it recreate the configuration directory.
Merge the settings you still need from the original config.toml.

Do not delete the configuration directory directly, especially because it may contain configuration, skills, sessions, or other local state that you maintain manually.
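The back-up-then-rename steps can be scripted. The sketch below is one possible way to do it, demonstrated on a throwaway directory instead of a real `~/.codex`. The `set_aside` helper, the `.bak` suffix, and the `config.toml.backup` file name are all assumptions.

```python
import shutil
import tempfile
from pathlib import Path


def set_aside(config_dir: Path, suffix: str = ".bak") -> Path:
    """Back up config.toml, then rename the directory instead of deleting it."""
    config = config_dir / "config.toml"
    if config.exists():
        shutil.copy2(config, config_dir.parent / "config.toml.backup")
    moved = config_dir.with_name(config_dir.name + suffix)
    if moved.exists():
        raise FileExistsError(f"{moved} already exists; pick another suffix")
    config_dir.rename(moved)
    return moved


# Demonstrated on a throwaway directory rather than a real ~/.codex.
with tempfile.TemporaryDirectory() as tmp:
    fake = Path(tmp) / ".codex"
    fake.mkdir()
    (fake / "config.toml").write_text("[features]\ngoals = true\n", encoding="utf-8")
    moved = set_aside(fake)
    moved_name = moved.name
    original_gone = not fake.exists()
    backup_kept = (Path(tmp) / "config.toml.backup").exists()
    print(moved_name, original_gone, backup_kept)
```

Close Codex before running anything like this against the real directory, and merge settings back from the renamed copy once Codex has recreated its configuration.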

Also check security software on Windows

There has also been feedback that Windows Defender may treat config.toml as a suspicious file. This is not necessarily the cause for everyone, but if you run into the same issue on Windows, it is worth quickly checking your security software’s quarantine history.

If the configuration file has been quarantined, renamed, or blocked from access, Codex may not be able to read the feature switch, which can also make Goal fail to enable.

How to tell whether the prompt is the problem

A simple way to check is to test with a very short goal.

For example:

Fix a failing test

If even a very short goal immediately reports Failed to set goal, it is probably not a prompt-writing issue. It is more likely a problem with local configuration, the feature switch, extension state, or the cache directory.

If only very long and complex goals fail, then consider whether the goal content is too complex, contains special links, or uses a field format the UI does not accept.

Summary

You can troubleshoot Codex Goal’s Failed to set goal in this order:

Check ~/.codex/config.toml.
Add goals = true under [features].
Restart the Codex app or the VS Code extension.
If it still fails, back up the configuration and rebuild ~/.codex.
On Windows, also check whether Defender or other security software has mistakenly blocked the configuration file.

The key point is not “how should I write the goal”, but first confirming that the Goal feature itself is properly enabled in the local configuration.

oh-my-codex: Adding Workflows, Skills, and Runtime Guardrails to Codex CLI

Mon, 25 May 2026 07:41:45 +0800

Yeachan-Heo/oh-my-codex, or OMX, is a workflow layer around OpenAI Codex CLI.

Project: https://github.com/Yeachan-Heo/oh-my-codex

It is not trying to build yet another coding agent. Its goal is to give people who already use Codex CLI a steadier daily workflow: start sessions with project guidance, clarify before planning when tasks get complex, keep durable goals and state during execution, and use review plus QA to hold the result together at the end.

At the time of writing, the GitHub page shows about 29.4k stars, and the latest release is v0.18.1, published on May 21, 2026. The README also makes it clear that the official project is Yeachan-Heo/oh-my-codex, and the official npm package is oh-my-codex. Third-party projects using names such as “OMX v2” should not be treated as official continuations unless this repository says so.

What it is

OMX does not replace Codex.

It keeps Codex CLI as the actual execution engine and mainly adds three things:

A more consistent task workflow.
Reusable prompts, skills, and specialist agents.
Plans, logs, state, and runtime records under .omx/.

In other words, Codex does the work, while OMX makes that work look more like an engineering process. That is also the main difference between OMX and a normal prompt pack: it is not just a pile of rules inside a system prompt. It breaks clarification, planning, execution, checking, team coordination, and runtime diagnostics into callable surfaces.

Recommended installation

The README and Getting Started docs both emphasize that the recommended default path is macOS or Linux with Codex CLI. Native Windows and Codex App are not the primary experience today and may behave inconsistently or receive less complete support.

If Codex CLI is already installed, start with:

codex --version
npm install -g oh-my-codex
omx setup
omx doctor

If you do not have Codex CLI yet and want npm to manage it:

npm install -g @openai/codex
npm install -g oh-my-codex
omx setup
omx doctor

One detail matters: do not combine @openai/codex and oh-my-codex into one global install command on a machine where Homebrew already owns the codex binary. The README notes that npm may hit an EEXIST conflict with a Homebrew-owned binary. OMX only needs a working, authenticated codex command on PATH; Codex does not have to be installed through npm.

After installation, run a real execution smoke test:

codex login status
omx exec --skip-git-repo-check -C . "Reply with exactly OMX-EXEC-OK"

omx doctor can show that the local install shape looks sane, but it cannot prove that the Codex account, proxy, base URL, and authentication path in the current shell/profile can actually make a model call. This distinction matters when you switch between different HOME directories, containers, remote environments, or local OpenAI-compatible proxies.
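
If you want to repeat this smoke test across shells, containers, or remote hosts, the two commands above can be wrapped in a small script. This is only a sketch: run_smoke_test is a hypothetical helper, and treating a zero exit code as success is an assumption about how both CLIs report failure.

```python
import subprocess

# The two commands from the text above, written as argument lists.
SMOKE_COMMANDS = [
    ["codex", "login", "status"],
    ["omx", "exec", "--skip-git-repo-check", "-C", ".",
     "Reply with exactly OMX-EXEC-OK"],
]


def run_smoke_test(commands, timeout=300):
    """Run commands in order and stop at the first failure.

    Returns a list of (argv, ok, detail) tuples. A zero exit code is
    treated as success, which is an assumption about these CLIs.
    """
    results = []
    for argv in commands:
        try:
            proc = subprocess.run(
                argv, capture_output=True, text=True, timeout=timeout
            )
        except FileNotFoundError:
            results.append((argv, False, "command not found on PATH"))
            break
        except subprocess.TimeoutExpired:
            results.append((argv, False, f"timed out after {timeout}s"))
            break
        ok = proc.returncode == 0
        detail = (proc.stdout if ok else proc.stderr).strip()
        results.append((argv, ok, detail))
        if not ok:
            break
    return results
```

Calling run_smoke_test(SMOKE_COMMANDS) in the same shell profile you normally use gives one tuple per command that ran. Read the captured output yourself rather than trusting the exit code alone.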

Default workflow

The main OMX workflow is roughly:

$deep-interview "clarify the authentication change"
$ralplan "approve the auth plan and review tradeoffs"
$prometheus-strict "stress-test the plan before durable execution"
$ultragoal "turn the approved plan into durable Codex goals"

The most common path uses three of these steps:

$deep-interview: ask for boundaries, goals, and non-goals when the request is still unclear.
$ralplan: turn the request into a plan, then confirm it through architectural and critical review.
$ultragoal: turn the approved plan into more durable goals and checkpoints.

If a task needs parallel coordination, use $team inside an Ultragoal story. If it only needs one persistent owner, use $ralph. The naming can feel heavy at first, but the idea is simple: do not let an agent start changing files as soon as it hears a request. First write down what to do, how to do it, how to verify it, and when to stop.

What skills and agents provide

OMX groups skills into several families.

Canonical Workflow includes $deep-interview, $ralplan, $prometheus-strict, $ultragoal, $code-review, and $ultraqa. These are aimed at complete engineering tasks: clarify, plan, execute, review, and QA.

Execution Modes include $team, $ralph, $autopilot, $ultrawork, and others. They decide whether a task moves through a single line, a team runtime, or a stronger autonomous loop.

The Agent Catalog is more like a role library. It includes analyst, planner, architect, debugger, executor, verifier, security-reviewer, performance-reviewer, code-reviewer, test-engineer, designer, researcher, and more. You do not need to invoke these roles manually every day, but they show that OMX is not a “one big prompt” system. It decomposes engineering work into reusable roles and phases.

That matters for long-running projects. Many AI coding failures do not happen because the model cannot write code at all. They happen because it moves into execution too quickly and skips requirement confirmation, architecture boundaries, test baselines, and final review. OMX tries to make those steps harder to skip.

Plugin shape and runtime state

The README says the repository also includes an official Codex plugin layout at plugins/oh-my-codex, with marketplace metadata.

The docs are also clear that this plugin shape is not a replacement for npm install -g oh-my-codex plus omx setup. The plugin mainly packages hooks, the skill surface, and Codex lifecycle integration. At runtime, it still depends on the installed omx CLI.

The latest v0.18.1 release also focuses on this area: plugin installs now use a pinned OMX launcher, hook failures are safer, Ultragoal state mutations are serialized, release packaging excludes crate-local .omx runtime caches, and npm, Cargo workspace, lockfiles, and the plugin manifest all share the same version.

These changes show that OMX is no longer just a prompt repository. It is also dealing seriously with install shape, hook safety, state writes, release package contents, and runtime consistency. For developer tooling, these are not flashy details, but they matter a lot.

Who it fits

OMX is best for developers who already use Codex CLI seriously, especially in situations like these:

You often ask Codex to handle multi-file, multi-step tasks.
You want the agent to clarify requirements before editing code.
You want planning, execution, checking, review, and QA to be separate stages.
You want .omx/ state, plans, and logs in the project.
You want to try tmux/team runtime or stronger long-task execution.
Your team wants to turn its engineering habits into reusable skills and prompts.

If you only ask Codex to change one line of config, generate a small script, or explain a code snippet, OMX may feel heavy. It is more like a tool belt for frequent AI coding users than a required first layer for beginners.

Things to watch

First, do not treat OMX as a guarantee of unattended completion. It can strengthen the workflow, but it cannot decide for you whether a requirement is reasonable, whether an architecture should change, or whether a risk is acceptable.

Second, pay attention to platform boundaries. The README currently recommends macOS/Linux plus Codex CLI. Native Windows exists, but it is not the default best experience. If you use Windows, WSL2 is usually the steadier path.

Third, omx doctor is not final validation. The stronger proof is codex login status followed by a real model call through omx exec.

Fourth, stronger workflow requires clearer task boundaries. $ultragoal, $team, and $autopilot are best for tasks with concrete acceptance criteria. If the request is still vague, use $deep-interview or a normal conversation first.

Summary

The value of oh-my-codex is not that it turns Codex into a different tool. It adds a more engineering-oriented working layer to Codex CLI.

It moves AI coding from “I say one thing, you make one pass” toward “clarify, plan, execute, check, and record state.” For lightweight tasks this may be too much. For people who use Codex on real projects often, stable workflows, reusable skills, runtime diagnostics, and durable goals can be exactly what saves effort.

If Codex CLI is already part of your daily development setup, OMX is worth trying. Even if you do not install it directly, its breakdown of skills, agents, planning, and acceptance flow is useful material for improving your own AI coding workflow.

References

Yeachan-Heo/oh-my-codex: https://github.com/Yeachan-Heo/oh-my-codex
Getting Started: https://github.com/Yeachan-Heo/oh-my-codex/blob/main/docs/getting-started.html
Agent Catalog: https://github.com/Yeachan-Heo/oh-my-codex/blob/main/docs/agents.html
Skills Reference: https://github.com/Yeachan-Heo/oh-my-codex/blob/main/docs/skills.html
v0.18.1 release: https://github.com/Yeachan-Heo/oh-my-codex/releases/tag/v0.18.1

CLI-Anything: Turning Software into an Agent-Usable Command Line

Mon, 25 May 2026 00:24:36 +0800

CLI-Anything is an open-source Agent tooling project from HKUDS. Its goal is to turn software that was originally designed for human GUI operation into command-line interfaces that AI Agents can call more easily. It does not reimplement a simplified version of the software. Instead, it builds a CLI harness around the existing codebase and real backend, allowing Agents to complete tasks through stable commands, stateful sessions, and structured output.

This direction addresses one of the most common gaps when Agents use software: GUI automation depends on screenshots, clicks, and coordinates, so it is easily affected by interface changes; a single API is also often incomplete, forcing the Agent to stitch together a large amount of context on its own. CLI-Anything chooses to condense software capabilities into a command line because commands are naturally easier for models to read, combine, and verify, while also fitting neatly into scripts and automation workflows.

How it works

The official repository describes CLI-Anything as a pipeline for automatically generating CLIs. After receiving a local software source path or a GitHub repository URL, the process analyzes the code structure, identifies the backend and data models, designs command groups, and then implements the CLI, tests, and documentation.

The generated CLI usually supports two usage modes. One is a REPL for continuous work, which preserves project state. The other is a subcommand mode, which is better suited to scripts and pipelines. Commands also provide JSON output so Agents can parse results directly, while still keeping a human-readable format for debugging.
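
To make the two modes concrete, here is a minimal sketch of a CLI with that shape. It is not code generated by CLI-Anything: the program name cli-anything-demo, the add-layer and list-layers commands, and the in-memory state are all hypothetical.

```python
import argparse
import json
import shlex


def build_parser():
    parser = argparse.ArgumentParser(prog="cli-anything-demo")
    parser.add_argument("--json", action="store_true",
                        help="emit machine-readable output")
    sub = parser.add_subparsers(dest="command")
    add = sub.add_parser("add-layer")
    add.add_argument("name")
    sub.add_parser("list-layers")
    return parser


def execute(args, state):
    """Apply one command to the shared project state and return a result dict."""
    if args.command == "add-layer":
        state["layers"].append(args.name)
        return {"ok": True, "layers": list(state["layers"])}
    if args.command == "list-layers":
        return {"ok": True, "layers": list(state["layers"])}
    return {"ok": False, "error": "no command given"}


def render(result, as_json):
    """JSON for agents, a short readable line for humans."""
    if as_json:
        return json.dumps(result, sort_keys=True)
    if not result["ok"]:
        return f"error: {result['error']}"
    return "layers: " + (", ".join(result["layers"]) or "(none)")


def run(argv, state):
    """Subcommand mode: one invocation, one result."""
    args = build_parser().parse_args(argv)
    return render(execute(args, state), args.json)


def repl(state, read_line=input):
    """REPL mode: the same commands, with state kept between lines."""
    while True:
        try:
            line = read_line("demo> ").strip()
        except EOFError:
            break
        if line in {"exit", "quit"}:
            break
        if not line:
            continue
        try:
            print(run(shlex.split(line), state))
        except SystemExit:
            # argparse exits on bad input; keep the session alive instead.
            continue
```

Both modes go through the same execute function, so a command behaves the same whether a script calls it once or an Agent issues it inside a long session. In this sketch the --json flag goes before the subcommand.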

In the official example, the Claude Code plugin can be used like this:

/plugin marketplace add HKUDS/CLI-Anything
/plugin install cli-anything
/cli-anything <software-path-or-repo>

If a harness has already been generated for a piece of software, later usage is closer to a normal Python CLI:

cd <software>/agent-harness
pip install -e .
cli-anything-<software> --help
cli-anything-<software>
cli-anything-<software> --json <command>

Where it fits

CLI-Anything is especially suitable for scenarios where “the capability exists in real software, but the Agent cannot operate it reliably.” Examples include image, video, audio, office documents, 3D modeling, data analysis, or AI/ML toolchains. As long as the project has an analyzable codebase, a callable backend, or a clear data model, it has a chance to be wrapped as a command set that Agents can use.

Its value is not merely adding another layer of wrapping in the command line. The real value is turning key software operations into discoverable, composable, and testable interfaces. An Agent can first understand capabilities through --help, then receive results through JSON output, and connect multiple commands into a workflow. For tasks that require batch processing, automatic validation, and continuous iteration, this is more controllable than asking an Agent to click through an interface ad hoc.
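
From the caller's side, consuming such a CLI can be as small as the sketch below. run_json is a hypothetical helper, and it assumes the command prints a single JSON document to stdout and exits with a non-zero code on failure.

```python
import json
import subprocess


def run_json(argv, timeout=60):
    """Run one CLI command and parse its stdout as JSON.

    Raises RuntimeError on a non-zero exit code so a workflow stops
    instead of continuing with a half-finished step.
    """
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    if proc.returncode != 0:
        raise RuntimeError(
            f"{argv[0]} exited with {proc.returncode}: {proc.stderr.strip()}"
        )
    return json.loads(proc.stdout)
```

An Agent or script can then chain calls, feeding fields from one parsed result into the arguments of the next command, and validate each step before moving on.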

Boundaries to keep in mind

CLI-Anything does not mean that any software can be integrated instantly at no cost. It depends on the target software’s source code, backend capabilities, file formats, and testability. If a piece of software is highly closed and its key logic exists only in the GUI layer, the difficulty of generating a high-quality CLI rises significantly.

The official methodology also emphasizes real backends and test validation. This means generating a harness is not finished after writing a few command wrapper scripts. To use it for serious work, you still need to confirm command coverage, output format, dependency installation, real software invocation, and end-to-end test reliability. A more realistic approach is to first generate a CLI for a clearly defined workflow, then gradually fill in capabilities through commands such as refine, test, and validate.

Summary

CLI-Anything’s idea is direct: instead of making Agents adapt to fragile human interfaces, add a stable, structured, and testable command-line entry point to existing software. It is suitable for people who want to bring professional software into Agent workflows, and also for developers studying the shape of “Agent-native software.” In real adoption, the key question is not how much code one command can generate, but whether the generated CLI can call real capabilities, preserve state, output structured results, and stand up to testing.

What Is GitHub Spec Kit? Using Spec-Driven Development to Tame AI Coding

Mon, 25 May 2026 00:19:14 +0800

GitHub’s Spec Kit is a new toolkit for AI coding. Its goal is to help developers practice Spec-Driven Development.

The problem it tackles is straightforward: many AI coding workflows today feel too much like “chat while coding.” A human gives a rough idea, and an Agent immediately starts changing code. It looks fast in the short term, but the requirement boundaries, acceptance criteria, technical trade-offs, and task breakdown often never settle into anything durable. Once a project becomes even slightly complex, it easily turns into one-off vibe coding.

Spec Kit takes the opposite route: write the spec clearly first, then move into planning, tasks, and implementation. Code is no longer the first step. The spec is.

What Is Spec Kit?

Spec Kit is GitHub’s open-source toolkit for spec-driven development. It provides the specify CLI, templates, scripts, and commands for AI coding agents, allowing teams to advance development around the same set of structured artifacts.

Its emphasis is not “make AI ask fewer questions.” Instead, it asks AI to generate and refine these things before writing code:

Project principles: the team’s constraints around quality, testing, experience, performance, and similar concerns;
Feature specs: what to build, why to build it, user stories, and functional requirements;
Technical plans: which technology stack to use, how to implement it, and what architecture decisions are involved;
Task lists: breaking the plan into executable steps;
Implementation process: changing code step by step according to tasks, instead of making one chaotic batch of edits.

This workflow makes AI coding feel more like engineering collaboration, rather than a one-time prompt performance.

Basic Usage Flow

The official README describes a getting-started flow roughly like this:

uv tool install specify-cli --from git+https://github.com/github/spec-kit.git@vX.Y.Z
specify init my-project --integration copilot
cd my-project

After initialization, the project gets a .specify directory, templates, scripts, and commands for Agent integration. You then use /speckit.* commands inside a supported AI coding agent to move development forward.

A typical sequence is:

/speckit.constitution
/speckit.specify
/speckit.clarify
/speckit.plan
/speckit.tasks
/speckit.implement

Here, /speckit.constitution establishes project principles, /speckit.specify describes product requirements, /speckit.clarify fills in ambiguity, /speckit.plan generates a technical plan, /speckit.tasks breaks the work into tasks, and /speckit.implement finally performs the implementation.
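
The ordering can be written down as data. The sketch below only illustrates the sequence described above; next_phase is a hypothetical helper and does not reflect how Spec Kit itself tracks or enforces phases.

```python
# Command names and purposes are taken from the sequence described above.
PHASES = [
    ("/speckit.constitution", "establish project principles"),
    ("/speckit.specify", "describe product requirements"),
    ("/speckit.clarify", "fill in ambiguity"),
    ("/speckit.plan", "generate a technical plan"),
    ("/speckit.tasks", "break the work into tasks"),
    ("/speckit.implement", "perform the implementation"),
]


def next_phase(completed):
    """Return the first command in the typical sequence that has not run yet."""
    done = set(completed)
    for command, _purpose in PHASES:
        if command not in done:
            return command
    return None  # every phase has been completed
```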

This is very different from directly telling an Agent, “Help me build an app.” Spec Kit asks you to first make clear what to build and how it will be accepted, then let the Agent start working.

It Changes the Entry Point of AI Coding

Traditional AI coding often starts with code:

I want to build a task management app. Help me write it.

Spec Kit is closer to this:

First define the users, scenarios, feature boundaries, acceptance criteria, and non-goals for this task management app;
then choose the technical approach based on those specs;
then break it into tasks;
finally implement it step by step.

This shift matters. AI is very good at executing from context, but if the context itself is loose, faster execution can also mean faster drift. Spec Kit turns context into files and templates, so requirements, plans, and tasks can all be reviewed, revised, and version-controlled.

In other words, it is not making AI more “free.” It is giving AI clearer engineering rails on which to work freely.

How to Understand the Core Commands

`/speckit.constitution`

This is the project’s “constitution.” It creates or updates .specify/memory/constitution.md, which records long-term principles for the project, such as code quality, testing standards, user experience consistency, performance requirements, and rules for technical decisions.

This step is best for writing down team consensus, not requirements for a single feature.

`/speckit.specify`

This is the feature specification phase. You describe what you want to build, who the users are, what problem it solves, and what the core flows are.

The official guidance specifically emphasizes that this phase should not focus too early on the technology stack. First make the what and why clear, then discuss the how.

`/speckit.clarify`

This is the phase for filling gaps. Many requirements have holes the first time they are written: how should permissions work? What are the error states? Does the data need to be persisted? How should edge cases be accepted?

The value of /speckit.clarify is that it lets the Agent actively find uncertain points in the spec and write the answers back into the specification document, reducing rework later.

`/speckit.plan`

This is the technical planning phase. Only here do you start to define the framework, database, architecture, APIs, testing strategy, and constraints.

If /speckit.specify is product language, then /speckit.plan is engineering language.

`/speckit.tasks`

This step breaks the plan into executable tasks. A good task list should let the Agent advance step by step, while still letting humans understand the purpose of each step.

`/speckit.implement`

Only at the end do you enter implementation. The Agent modifies code according to the specs, plans, and tasks that have already been settled. At this point, it is no longer guessing requirements from a single large prompt; it is executing inside a set of structured documents.

Why It Fits AI Coding

Spec Kit’s value is not in any single magic command. It is in restoring the things most easily lost in AI coding:

Requirements can be reviewed;
Plans can be discussed;
Tasks can be traced;
Decisions have context;
Artifacts can enter Git history;
Teams can reuse templates and principles;
The Agent’s implementation no longer depends only on a one-time chat record.

This is especially useful for complex projects. The more a project involves multiple collaborators, long-term maintenance, or high quality requirements, the less it can rely only on temporary prompts to drive development.

Extensions and Presets

Spec Kit also provides two kinds of customization:

Extensions: add new commands, new templates, or integrations with external tools;
Presets: change the format and terminology of existing spec, plan, and task templates.

In simple terms, use an Extension when you want to add new capability; use a Preset when you want to reshape the workflow style.

For example, a team can use a Preset to require security review, compliance traceability, domain terminology, or test-first rules. It can also use an Extension to add Jira integration, code review, project health checks, and other new phases.

This means Spec Kit is not trying to lock every team into the same workflow. It provides an extensible skeleton for spec-driven development.

Who Is It For?

Spec Kit is suitable for scenarios like these:

Using an AI coding agent to prototype a new project;
Turning vibe coding into a repeatable workflow;
Standardizing the requirement and planning format before teams let AI generate code;
Projects that need clear acceptance criteria and testing requirements;
Bringing requirements, plans, tasks, and implementation history into version control;
Teams exploring GitHub Copilot, Claude Code, Codex CLI, and similar tools in a team setting.

It is not necessarily a good fit for very small one-off scripts. For problems that can be solved in a few lines of code, the full spec workflow may feel heavy. But once a task involves multiple pages, multiple modules, state management, permissions, data models, or long-term maintenance, Spec Kit’s structure starts to pay off clearly.

My Take

Spec Kit represents an important turn in AI coding tools: from “make the Agent write code faster” toward “make the Agent participate in software engineering more reliably.”

Earlier AI coding focused on prompts and model capability. Spec Kit focuses more on process, artifacts, and constraints. It reminds us that the faster AI writes code, the less we can afford to skip specs, plans, and acceptance criteria.

If you are already used to letting AI implement features directly, you can try changing the starting move with Spec Kit:

First let AI help you write the requirement as a spec, then let it write the code.

That step may look slower, but in practice it reduces the later rework of “the code is done, but it is not what I wanted.”

References

github/spec-kit

What Is OpenAI Symphony? Codex Orchestration, Issue-Driven Development, and AI Agent Workflows

Mon, 25 May 2026 00:17:32 +0800

OpenAI recently open-sourced an interesting Codex orchestration specification: Symphony.

It is not another chat-based coding assistant, nor is it a complete new IDE. More precisely, Symphony is a way to orchestrate work around Codex: it turns an issue tracker such as Linear into the control plane for coding agents, so every open task can correspond to a continuously running Agent.

One line from the official article captures its direction well: in the past, engineers had to monitor multiple Codex sessions at once, continually assigning work, reviewing output, correcting course, and restarting sessions. Symphony is designed to address exactly that context-switching bottleneck.

Symphony addresses Agent management, not code writing

A single Codex session works well for interactive development: you give it a task, it changes the code, you review it, and then you keep asking follow-up questions. But once a team starts using multiple Agents at the same time, the problem shifts from “can the code be written?” to “who is working on what, how far along is it, and who takes over after a failure?”

OpenAI’s approach is to move the center of gravity from “sessions” to “tasks”:

the issue is the real unit of work;
every open issue can map to an independent Agent workspace;
Symphony continuously polls the task board and decides which tasks should be started, retried, stopped, or reclaimed;
Codex performs implementation, testing, commits, PR creation, status updates, and related actions inside the workspace;
humans no longer micromanage every session, but instead review results, adjust goals, and maintain boundaries.

The shift behind this is important: an Agent is no longer just a tool that humans temporarily summon, but a continuously running executor inside the development workflow.

Why an issue tracker?

Because teams already use issue trackers to manage real work.

Requirements, bugs, refactors, migrations, research, priorities, blockers, owners, and milestones are already recorded in Linear, GitHub Issues, or similar systems. Symphony does not reinvent a large console. Instead, it treats these existing systems as the task entry point for Agents.

This has several advantages:

Work does not need to be copied from an issue into a chat window.
Humans can keep creating, splitting, scheduling, and closing tasks in familiar ways.
Agent state changes can be written back to the same work system, making async collaboration easier for the team.
Task dependencies can naturally form a DAG, allowing unblocked tasks to move forward in parallel.
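
The DAG point can be sketched in a few lines. The issue names and the blocked_by mapping are made up for illustration, and ready_issues is a hypothetical helper, not part of the Symphony specification.

```python
def ready_issues(blocked_by, done):
    """Return open issues whose blockers are all finished, in sorted order.

    blocked_by maps each issue to the issues it waits on.
    """
    finished = set(done)
    return sorted(
        issue
        for issue, blockers in blocked_by.items()
        if issue not in finished and set(blockers) <= finished
    )


# Hypothetical task graph: two independent roots, then two dependent steps.
DEPENDENCIES = {
    "migrate-db": [],
    "update-docs": [],
    "update-api": ["migrate-db"],
    "release": ["update-api", "update-docs"],
}
```

With nothing finished, migrate-db and update-docs are both ready and can run in parallel; release only becomes ready once update-api and update-docs are done.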

If traditional CI is “automation after a code commit,” Symphony is closer to “automation after an issue is created.”

Its core workflow

A typical Symphony flow can be understood as:

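Create an issue
  -> Symphony polls the board and finds a runnable task
  -> Creates an independent workspace for that issue
  -> Starts a Codex agent session
  -> Agent reads the task, modifies code, runs tests
  -> Creates or updates a PR
  -> Writes back task status, comments, evidence, and deliverables
  -> Humans review, merge, or request changes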

The official specification also emphasizes several engineering details:

each issue uses an independent workspace to reduce cross-contamination;
the orchestrator maintains retry, concurrency, and recovery state;
workflow policy lives in the repository’s WORKFLOW.md, so teams can version the rules that describe how Agents should handle tasks;
implementations need to preserve observability, with at least structured logs;
a successful state does not have to be Done; it can also be an intermediate state handed to humans for review.

This shows that Symphony is not simply about “letting AI write code automatically.” It defines a runnable, recoverable, and auditable Agent work system.
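
To show what "retry, concurrency, and recovery state" can mean in practice, here is a minimal sketch of one polling pass. It is not the reference implementation: the Orchestrator class, its fields, and the action names are assumptions, and a real orchestrator would also create workspaces and launch Codex sessions.

```python
from dataclasses import dataclass, field


@dataclass
class Orchestrator:
    max_concurrency: int = 2
    max_attempts: int = 3
    running: set = field(default_factory=set)
    attempts: dict = field(default_factory=dict)

    def tick(self, open_issue_ids, failed_issue_ids=()):
        """One polling pass: return the (action, issue_id) pairs to perform."""
        actions = []
        # Reclaim workspaces whose issue is no longer open on the board.
        for issue_id in sorted(self.running - set(open_issue_ids)):
            self.running.discard(issue_id)
            actions.append(("reclaim", issue_id))
        # A failed run frees its slot so the issue can be retried below.
        for issue_id in failed_issue_ids:
            self.running.discard(issue_id)
        # Start or retry issues while there is spare concurrency.
        for issue_id in open_issue_ids:
            if len(self.running) >= self.max_concurrency:
                break
            if issue_id in self.running:
                continue
            used = self.attempts.get(issue_id, 0)
            if used >= self.max_attempts:
                continue  # attempts exhausted: leave the issue for a human
            self.attempts[issue_id] = used + 1
            self.running.add(issue_id)
            actions.append(("start" if used == 0 else "retry", issue_id))
        return actions
```

Keeping the decision logic in one pure step like tick makes it easy to log every action and to rebuild state after a restart, which is where the observability and recovery requirements come in.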

Goal-driven, not a rigid state machine

OpenAI mentions an important shift in the article: early on, they tried hard-coding many actions in the outer harness, such as committing code, running tests, and handling GitHub workflows. But as Codex became more capable, that approach started to constrain the Agent.

The later direction was to give the Agent a goal, rather than encoding every step as a fixed state transition.

For example, a task’s goal might be “complete the Vite migration and ensure CI passes.” The Agent can decide for itself whether it needs to change configuration, fix tests, read CI logs, handle review feedback, or even create new follow-up issues. Symphony provides boundaries, context, and the runtime framework instead of prescribing every action for the Agent.

This is also where it differs from traditional automation scripts: scripts are good at repeated, deterministic processes; Symphony is aimed at engineering tasks with uncertainty.

How is this different from normal Codex usage?

A normal Codex session is more like “a human writes code with AI”:

the human opens a session;
the human describes the task;
the human watches the output;
the human corrects course at any time;
after one task ends, the human starts the next session.

Symphony is more like “a team hands a task pool to a group of Agents”:

humans write clear issues;
the system continuously discovers executable tasks;
Agents make progress in isolated environments;
results come back as PRs, comments, test status, videos, or analysis reports;
humans review at key checkpoints.

This is not about replacing engineers. It is about freeing engineers from the burden of simultaneously watching many sessions. OpenAI notes in the official article that some teams saw a significant increase in PRs merged to the main branch. But the more important point is the change in working style: the startup cost of trying an idea, launching a refactor, or validating a hypothesis becomes lower.

Where does it fit?

Symphony is better suited to tasks such as:

routine feature implementation;
small refactors in an existing codebase;
infrastructure migrations;
dependency upgrades;
filling in tests;
CI fixes;
research followed by an implementation plan;
continuing to revise a PR based on review feedback.

It is not necessarily a good fit for highly ambiguous tasks that require strong business judgment or architectural decisions. For those problems, an interactive Codex session is still more natural because humans need to stay involved throughout the process.

Risks and boundaries

Symphony is appealing, but in real adoption, teams cannot look only at the “automation” side.

Several boundaries need to be made clear in advance:

issues must be written clearly, otherwise Agents will amplify vague requirements into incorrect implementations;
Agent permissions should be constrained, especially access to repositories, secrets, production environments, and third-party services;
every workspace should be isolated to avoid contamination between tasks;
CI, tests, lint, and review remain necessary quality gates;
task status, PR links, logs, and failure reasons need to be traceable;
human review cannot be skipped, especially for changes involving security, billing, data migration, and permission logic.

The official repository also positions Symphony as an engineering preview and reference implementation for trusted environments, not a finished platform that can blindly replace a development process.

My understanding of Symphony

The most valuable part of Symphony is not that it uses Linear, nor that the reference implementation chose Elixir. Its value is that it redefines the entry point for programming Agents.

In the past, we were used to starting AI coding from a chat window. That is flexible, but once the scale grows, human attention becomes the bottleneck. Symphony puts the entry point back in the issue tracker and lets Agents work continuously around real tasks. In that sense, AI coding starts moving from a “personal productivity tool” toward “team workflow infrastructure.”

If you are already using Codex, Claude Code, Cursor Agent, or similar tools, the most important thing to notice about Symphony is not any specific implementation, but the pattern behind it:

Do not only manage Agent sessions. Manage the work that needs to be done.

This may become a key dividing line for the next stage of AI coding tools.

What is browser-harness? A browser automation tool that lets AI agents control real Chrome

Sun, 24 May 2026 17:19:54 +0800

browser-use/browser-harness is a browser control tool for AI agents. Its goal is not to build another heavy automation framework, but to connect large language models directly to real Chrome through CDP, so they can browse pages, click, take screenshots, download files, upload files, and fill forms.

The README describes the project as a thin, editable CDP harness for letting LLMs connect to a real browser. When a task lacks a helper, the agent can add code during execution and turn reusable experience into domain skills.

This is worth watching because the browser is still the entry point for many real workflows: admin panels, SaaS dashboards, ecommerce sites, recruiting platforms, CRMs, reimbursement systems, cloud consoles, and document platforms. Many of them do not expose stable APIs, or their API permissions are harder to obtain than webpage access. Giving an agent reliable browser control is a way to fill that last mile of automation.

What browser-harness is

Structurally, browser-harness is closer to a browser runtime for agents than a browser extension for manual users.

Its core ideas are:

Connect directly to Chrome or Chromium.
Control pages through a CDP WebSocket.
Let agents combine screenshots, coordinate clicks, DOM inspection, network requests, and raw CDP.
Put task-specific helpers in agent-workspace/agent_helpers.py.
Store site-specific experience in agent-workspace/domain-skills/.
Keep the core thin instead of turning it into a large automation platform.

The README describes the core as roughly four files and about 1,000 lines of code, spanning install.md, SKILL.md, src/browser_harness/, agent-workspace/agent_helpers.py, and agent-workspace/domain-skills/.

The point is not to ship built-in support for every website. The point is to give the agent an operation layer close enough to a real browser, so it can fill in missing capabilities for the task at hand.

How it differs from traditional browser automation

Traditional browser automation usually revolves around testing frameworks such as Playwright, Selenium, or Puppeteer. They are good for deterministic scripts: open a page, locate an element, click it, and assert the result.

browser-harness targets a different kind of work. A user gives a goal, and the agent explores the page, judges the state, handles popups, adds helpers, and reuses site knowledge. It emphasizes adaptation during interaction.

The difference can be summarized like this:

Playwright is better when humans write scripts and agents run them.
browser-harness is better when agents look at the page and act step by step.
Traditional automation favors fixed flows.
browser-harness favors open-ended tasks.
Traditional scripts often depend on selectors.
browser-harness encourages screenshots first, visible UI actions next, and DOM or CDP when needed.

This does not mean it replaces Playwright. For stable tests, Playwright is still more mature. browser-harness is valuable because it turns real webpages into an environment an agent can operate, especially when page structure is complex, steps are not fixed, and situational judgment matters.

Why real Chrome matters

Many browser-agent tools use isolated headless browsers. That is simple to deploy and good for batch jobs, but it does not always reuse the user’s real working environment: login state, extensions, history, bookmarks, and daily browser setup.

browser-harness supports local Chrome and the Browser Use cloud browser. For local browsers, it offers two approaches:

Use chrome://inspect/#remote-debugging to allow the current Chrome instance to be connected.
Start an isolated profile with --remote-debugging-port=9222 --user-data-dir=....

If you want an agent to help with tasks inside real accounts, the docs lean toward the first approach because it reuses everyday Chrome login state, extensions, and bookmarks. For unattended automation, or when you do not want popups to interrupt work, an isolated profile or cloud browser is usually safer.

The trade-off is clear: real Chrome is closer to the user’s workflow, but the security boundary is more sensitive. An isolated browser is easier to control, but login and environment setup must be handled again.
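For the isolated-profile approach, a minimal Python sketch of how a script might build the launch flags and locate the CDP endpoint is shown here. It assumes Chrome was started with the --remote-debugging-port flag and relies on the DevTools HTTP endpoint /json/version; the function names are illustrative and are not part of browser-harness.

```python
import json
from urllib.request import urlopen


def chrome_args(port: int, user_data_dir: str) -> list[str]:
    """Flags for starting Chrome with an isolated profile and a debugging port."""
    return [
        f"--remote-debugging-port={port}",
        f"--user-data-dir={user_data_dir}",
    ]


def websocket_url_from_version(payload: str) -> str:
    """Extract the browser-level CDP WebSocket URL from a /json/version body."""
    return json.loads(payload)["webSocketDebuggerUrl"]


def fetch_websocket_url(port: int = 9222) -> str:
    """Ask a locally running Chrome for its CDP endpoint (requires Chrome to be up)."""
    with urlopen(f"http://127.0.0.1:{port}/json/version", timeout=5) as resp:
        return websocket_url_from_version(resp.read().decode("utf-8"))
```

Keeping the parsing step separate from the network call makes it easy to check the logic without a running browser.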

Editable helpers and domain skills

The most interesting part of browser-harness is that it designs “what the agent learns” into the project structure.

agent-workspace/agent_helpers.py stores helpers that are created during tasks. For example, if an agent needs to upload a file and the existing tools are not enough, it can add a stable upload helper. The next time it sees a similar page, it does not have to start from scratch.

agent-workspace/domain-skills/ stores site-level experience. The README mentions areas such as LinkedIn outreach, Amazon ordering, and reimbursement systems. The project recommends letting agents generate these skills from real tasks instead of hand-writing them, because they should reflect actual page behavior.

This fits browser automation well. The hard part is often not “how to click a button,” but:

How a website redirects after login.
Which popups block the main flow.
Which selectors are stable and which are temporary class names.
How uploads, downloads, iframes, shadow DOM, and cross-origin components behave.
What hidden waits and asynchronous states exist in a specific backend.

If this knowledge only stays in one run log, it is quickly lost. Turning it into domain skills gives the agent a chance to improve over time.

Suitable scenarios

browser-harness is well suited for:

Operating real web admin panels for users.
Completing repeated flows in systems without APIs.
Personal or enterprise web tasks that depend heavily on login state.
Complex interactions where screenshots are needed to judge page state.
Agents that need to add tools and site knowledge while running.
Multiple sub-agents each using an isolated browser.
Researching browser-agent runtime design.

Concrete examples include organizing web tables, submitting internal forms, downloading invoices, uploading files, handling reimbursement workflows, checking order status, configuring SaaS dashboards, and extracting information from logged-in pages.

If the task is only to fetch static pages, a browser may not be needed. The project’s own SKILL.md also notes that static pages can often be fetched through HTTP in bulk. Browsers should be reserved for tasks that truly need page state, login state, and interaction.

Risks to watch

Letting an AI agent control real Chrome is powerful, but risky.

First, the permission boundary must be clear. Real Chrome may contain email, payment dashboards, cloud consoles, company systems, and personal accounts. Once an agent can operate the browser, it effectively inherits whatever access those logged-in sessions carry.

Second, do not hand credentials to the model. For login pages, payment verification, and secondary confirmations, the user should handle the sensitive step. The agent can wait for login to finish, but it should not read or enter passwords, verification codes, or payment details from screenshots.

Third, automation is not the same as delegation. Many web tasks look simple but may involve risk controls, mistaken clicks, data deletion, bulk submissions, or irreversible operations. Start with read-only, low-risk, reversible workflows.

Fourth, domain skills should not leak private data. Site knowledge can be shared, but account names, internal URLs, customer data, coordinate logs, and one-off task details should not be written into skills.
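One way to enforce this is a small check that runs before a skill file is saved or shared. The sketch below is an assumption about how such a check could look, not something browser-harness ships; the patterns and the function name are illustrative and would need tuning for real data.

```python
import re

# Illustrative patterns only; a real project would tune these to its own data.
PRIVATE_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "internal_url": re.compile(r"https?://[^\s/]*\.(?:internal|corp|local)\b[^\s]*"),
    "click_coordinates": re.compile(r"\bclick\(\s*\d+\s*,\s*\d+\s*\)"),
}


def find_private_data(skill_text: str) -> list[tuple[int, str]]:
    """Return (line_number, pattern_name) for every suspicious line."""
    findings = []
    for number, line in enumerate(skill_text.splitlines(), start=1):
        for name, pattern in PRIVATE_PATTERNS.items():
            if pattern.search(line):
                findings.append((number, name))
    return findings
```

A non-empty result is a signal to review the skill by hand before it is committed.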

Fifth, choose the browser connection mode carefully. Reusing daily Chrome is convenient when login state matters. For long-running automation, an isolated profile or cloud browser is more controllable.

Why it matters for AI agent tools

browser-harness represents a pragmatic direction for agent tooling: build less platform, and give the model a direct interface to the real environment.

Many agents fail at two ends. On one end, the model can reason but cannot touch the real page. On the other, automation frameworks are powerful but require humans to hard-code the flow. browser-harness tries to connect the two: the browser holds real-world state, while the agent observes, decides, and adds tools.

That is also the meaning of a self-improving harness. It does not mean the agent magically becomes smarter. It means reusable operation experience is placed into the project structure, so the next task can avoid some of the same detours.

For developers, its value is mainly in three areas:

A browser control layer for personal agents.
A reference for studying browser automation and agent workflows.
An experimental framework for turning web workflows into reusable skills.

It is not the answer to every browser automation problem, but it points in a clear direction: when agents truly help people do work, the tool layer should not only call APIs. It should also understand and operate the web interfaces people use every day.

Conclusion

browser-use/browser-harness is interesting not because it wraps many advanced features, but because it brings several key browser-agent questions into focus: real Chrome, CDP, screenshot-driven control, editable helpers, site skill accumulation, and user permission boundaries.

If you are writing stable end-to-end tests, Playwright or Selenium is still a better fit. If you want agents such as Codex or Claude Code to handle real webpage tasks, browser-harness offers an entry point that matches how agents work.

In practice, start with low-risk tasks: let it read pages, take screenshots, and extract information first. Then gradually try clicking and submitting. Once it can reliably understand page state, you can consider giving it longer workflows.

References:

GitHub project: https://github.com/browser-use/browser-harness
README: https://github.com/browser-use/browser-harness/blob/main/README.md
Installation guide: https://github.com/browser-use/browser-harness/blob/main/install.md
Usage guide: https://github.com/browser-use/browser-harness/blob/main/SKILL.md

CLIProxyAPI Management Center: A Visual Admin Console for CLIProxyAPI

Sun, 24 May 2026 10:05:15 +0800

Cli-Proxy-API-Management-Center can be understood as the cockpit for CLIProxyAPI.

In the previous article, CLIProxyAPI was the service that proxies Gemini CLI, Codex, Claude Code, OpenRouter, and other capabilities into unified APIs. This Management Center solves a different problem:

Once the proxy service is running, should configuration, accounts, OAuth, logs, quotas, and credentials all be managed by manually editing files and scrolling through terminals?

It provides a web management interface so you can manage CLIProxyAPI configuration and runtime state from a browser.

What It Is

According to the project description, Cli-Proxy-API-Management-Center is an independent management frontend for CLIProxyAPI. Its core features include:

Visual editing for CLIProxyAPI configuration.
Uploading and managing authentication files such as auth.json.
Viewing request logs and model response logs.
Managing OAuth authentication flows.
Checking Gemini CLI account quotas.
Providing daily maintenance entry points for accounts, configuration, logs, and related tasks.

The official repository also notes that newer versions of CLIProxyAPI already include this management interface, accessible directly through /management.html. The standalone repository remains useful for people who need separate deployment or secondary development.

That point matters. Most ordinary users may not need to deploy this repository separately. First check whether your CLIProxyAPI version already ships with the management page.

It Manages the Entry Point, Not the Model Call Itself

The hard part of CLIProxyAPI is not only forwarding a request to a model.

The real trouble is in things like:

How to put multiple Gemini, OpenAI, Claude, and Codex accounts into a pool.
Which account has expired, and which account is close to its quota limit.
How to import, refresh, and troubleshoot OAuth login states.
How to edit configuration files without missing commas or fields.
Which provider, model, and account a request actually used.
Whether a failed request came from an upstream issue, a protocol issue, or local configuration.

This is where Management Center is useful: it turns daily maintenance of proxy infrastructure into visual operations.

If you only run one account locally and call the API occasionally, it may not be essential. But once you start using multiple accounts, multiple models, and multiple client integrations, a backend UI becomes noticeably easier to live with.

Typical Use Cases

First: managing account pools.

CLIProxyAPI supports multi-account rotation and load balancing, but the more accounts you have, the less suitable manual configuration-file editing becomes. The management center helps you view account state, import credentials, and troubleshoot abnormal accounts.

Second: troubleshooting failed requests.

When a client reports an error, you need to know whether the request reached the proxy, which provider it used, and what error came back. A log UI is much more comfortable than searching through terminal output.

Third: handling OAuth.

Tools such as Codex, Claude Code, and Gemini CLI often involve OAuth login state. Management Center provides OAuth-related operation entry points, reducing repeated command-line work.

Fourth: internal team usage.

If CLIProxyAPI becomes a shared team gateway, administrators need an interface for quickly checking configuration and state. Otherwise, every change requires logging into a server and editing files, which is inefficient and error-prone.

How It Relates to CLIProxyAPI

You can think of the two pieces as separate layers:

Client / IDE / Script
        |
        v
CLIProxyAPI: protocol proxy, account pool, model routing
        |
        v
Gemini CLI / Codex / Claude Code / OpenRouter / upstream models

Management Center is not in the core inference request path. It is more like an operations panel:

Browser
  |
  v
Management Center: edit config, view logs, manage accounts, check quotas
  |
  v
CLIProxyAPI admin APIs / config / logs / credentials

So do not treat it as another model proxy. It is a tool for managing CLIProxyAPI, not a replacement for CLIProxyAPI.

Why the Standalone Repository Still Matters

Since CLIProxyAPI already includes /management.html, why pay attention to the standalone repository?

There are three main reasons.

First, the standalone repository makes the management center’s boundaries easier to inspect. You can see what belongs to the frontend, and what must be provided by CLIProxyAPI backend APIs.

Second, if you want secondary development, such as changing the UI, adding authentication, or integrating your own monitoring system, the standalone repository is a better entry point.

Third, if your deployment environment is unusual, such as separate frontend and backend hosting, a dedicated management domain, or static assets served through an internal gateway, the standalone version is more flexible.

For ordinary individual users, the built-in CLIProxyAPI version is usually enough. For teams or deep customization, the standalone repository is more meaningful.

What to Watch During Deployment

The admin console touches sensitive things: accounts, OAuth, API keys, logs, request contents, and upstream quotas.

So the first rule is: do not expose the management page directly to the public internet.

Safer options include:

Allow only local access, such as binding to 127.0.0.1.
If remote access is necessary, put it behind a VPN, Tailscale, an internal jump host, or an authenticated reverse proxy.
Add authentication to admin endpoints. Do not rely on “nobody knows the URL”.
Avoid exposing full keys, cookies, OAuth tokens, and raw user requests in logs.
In team environments, separate people who can call the API from people who can change configuration.
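The first item in the list can be checked mechanically. The sketch below uses Python's standard ipaddress module to decide whether a configured bind address is reachable only from the local machine; the function name is illustrative, and how CLIProxyAPI itself stores its listen address is not assumed here.

```python
import ipaddress


def is_loopback_bind(host: str) -> bool:
    """True when a bind address is only reachable from the local machine."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (for example a hostname); treat it as potentially exposed.
        return False
```

An address such as 0.0.0.0 fails this check, which is the case where a VPN or an authenticated reverse proxy should sit in front of the page.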

Many proxy-tool incidents do not come from model call failures. They come from unprotected admin endpoints, logs, and credential files.

What to Use It With

If you only deploy CLIProxyAPI, the Management Center already covers basic maintenance needs.

If you care more about statistics and observability, you can also combine it with other tools in the CLIProxyAPI ecosystem:

CPA Usage Keeper: focused on usage syncing and SQLite storage.
CLIProxyAPI Usage Dashboard: focused on local-first usage, quota, and chart views.
CPA-Manager: a heavier management center for request monitoring, cost estimation, account inspection, and abnormal account cleanup suggestions.

A simple way to understand the division:

Management Center handles configuration and daily maintenance.
Usage Dashboard handles usage and quota visibility.
CPA-Manager handles heavier operations and inspections.

Which one to use depends on your deployment size. A personal local setup does not need the whole suite.

Usage Suggestions

If you are just starting with CLIProxyAPI, try this order:

First get CLIProxyAPI itself running and confirm the API responds normally.
Open the built-in /management.html and check whether configuration and logs can be read.
Import one account or one provider, and confirm the management UI reflects the state change.
If you need public access, add authentication and network isolation before exposing the entry point.
After the number of accounts and requests grows, add usage statistics and more complete management tools.

Do not connect every account, every provider, and every management component at once from the beginning. Proxy and account-pool projects are much easier to verify in small steps.

Conclusion

Cli-Proxy-API-Management-Center has a clear role: it is not a model, not a chat client, and not a new API gateway. It is the visual management layer for CLIProxyAPI.

When CLIProxyAPI is just a small local tool, you can ignore it. When CLIProxyAPI starts carrying multiple accounts, multiple models, and multiple client integrations, it becomes a very useful console.

The real thing to watch is the security boundary. The admin backend can modify configuration, view logs, and touch credentials. If exposed incorrectly, it carries more risk than an ordinary API endpoint. Keep it in a trusted network, protect it with authentication, and then enjoy the convenience of visual management.

CLIProxyAPI: Wrapping Codex, Claude Code, and Gemini CLI into a Unified API

Sun, 24 May 2026 10:03:33 +0800

CLIProxyAPI is a very practical, community-driven engineering project. It is not another large model, and it is not merely an API forwarder. Instead, it repackages a set of AI tools that are originally interactive, CLI-oriented, or OAuth-login-oriented into a unified API service.

It supports Gemini CLI, OpenAI Codex, Claude Code, Amp CLI, AI Studio Build, and upstream OpenAI-compatible services. In plain terms, it tries to answer this question:

I have CLI tools, subscription accounts, and OAuth login sessions. Can I connect these capabilities to my own client, scripts, IDE, or internal services just like calling a normal API?

CLIProxyAPI’s answer is yes: put a proxy layer in the middle and translate CLI capabilities from different sources into OpenAI-, Gemini-, Claude-, and Codex-compatible interfaces.

The Real Pain Point It Solves

Many AI coding tools are powerful, but their default usage patterns are not automation-friendly.

For example:

Gemini CLI can log in with an account, but your program may prefer calling an HTTP API.
Claude Code is excellent for interactive coding, but integrating it into other clients can run into protocol mismatches.
Codex CLI supports OAuth login and Responses-style capabilities, but not every upper-layer tool knows how to talk to it.
A team may have multiple accounts and need rotation, load balancing, unhealthy account removal, and quota visibility.
You may want some tools to see only an OpenAI-style interface, while the backend is actually Gemini, Claude, or Codex.

CLIProxyAPI is positioned as the protocol adaptation layer between these tools and your clients.

It hides the complex side behind the scenes: OAuth, CLI login, multiple accounts, different protocols, and different providers. On the front side, it exposes familiar interfaces such as OpenAI Chat Completions, OpenAI Responses, Gemini, Claude Messages, and Codex-related endpoints.

Capability Overview

According to the official README and documentation, CLIProxyAPI currently focuses on:

Providing OpenAI-, Gemini-, Claude-, and Codex-compatible API endpoints for CLI models.
Connecting OpenAI Codex and Claude Code through OAuth login.
Supporting streaming and non-streaming responses, plus WebSocket in some scenarios.
Supporting function calling, tool calling, and multimodal input.
Supporting multi-account rotation and load balancing for Gemini, OpenAI, and Claude.
Supporting Gemini AI Studio API keys.
Supporting account pools for AI Studio Build, Gemini CLI, Claude Code, and OpenAI Codex.
Connecting OpenAI-compatible upstreams through configuration, such as OpenRouter.
Providing a Go SDK so the proxy capability can be embedded into your own services.

The most valuable part of this kind of project is not that it supports a few more model names. It is that it packages account login, protocol translation, and request routing into one operational layer.

Who It Is For

CLIProxyAPI is best suited to four groups of users.

The first group is heavy AI coding users. You already use Codex, Claude Code, and Gemini CLI, but you want to connect them to Cursor, Cline, RooCode, Amp, internal scripts, or custom workflows.

The second group is people with multiple account pools. For example, you may have several Gemini, OpenAI, or Claude login sessions and do not want to switch manually. You want automatic rotation, balanced usage, and quick troubleshooting when an account becomes abnormal.

The third group is people building internal team gateways. The team may not want every client to separately adapt to Gemini, Claude, and Codex. Instead, it wants one middle layer that exposes a unified API.

The fourth group is people who like working with protocols. You may care how Responses, Chat Completions, Claude Messages, and Gemini v1beta interfaces can be converted between one another, or you may want to switch backends from the same client.

If you only ask AI a few questions occasionally, or only use the official apps for chat, the deployment and maintenance cost of CLIProxyAPI may feel heavy.

How It Differs from a Regular API Proxy

A typical API proxy looks like this:

Client -> Proxy API -> Upstream model API

CLIProxyAPI is closer to this:

Client -> CLIProxyAPI -> CLI / OAuth session / account pool -> model service

The difference is that it handles more than API key forwarding. It also deals with CLI tools, OAuth accounts, protocol surfaces, and model aliases.

Tools such as Codex and Claude Code are not traditional “give me one API key and I can call it stably” services. CLIProxyAPI wraps their login sessions and calling logic so external clients can access them as if they were normal APIs.

That is what makes it attractive, and also what makes it complex.

Common Misunderstandings

First, do not assume that a unified /v1/... path eliminates all protocol differences.

The CLIProxyAPI documentation specifically notes that when you need the request and response shape of a certain backend type, you should prefer provider-specific paths. For example, use /api/provider/{provider}/v1/messages for messages-style requests, /api/provider/{provider}/v1beta/models/... for Gemini model paths, and /api/provider/{provider}/v1/chat/completions for chat-completions-style requests.
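The path patterns above can be captured in a small helper so client code does not mix protocol styles by accident. This is a sketch: the function name and the style labels are made up for illustration, the provider identifier is whatever your deployment uses, and the Gemini case returns only the prefix because the rest of that path is not spelled out here.

```python
def provider_endpoint(provider: str, style: str) -> str:
    """Build a provider-specific path for one request style."""
    base = f"/api/provider/{provider}"
    if style == "messages":
        return f"{base}/v1/messages"
    if style == "chat":
        return f"{base}/v1/chat/completions"
    if style == "gemini":
        # Only the prefix; the model-specific remainder comes from the docs.
        return f"{base}/v1beta/models/"
    raise ValueError(f"unknown request style: {style!r}")
```

Raising on an unknown style keeps a typo from silently falling back to a different protocol.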

A unified entry point is convenient, but the semantics of different protocols do not disappear. Tool calling, streaming responses, multimodal input, and system message handling may all differ by backend.

Second, a model name does not uniquely identify a backend.

If multiple backends expose the same client-visible model name, the path alone may not lock the request to the backend that actually performs inference. To strictly pin a backend, use unique aliases, prefixes, or avoid exposing the same model name from multiple backends.
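A quick way to spot this situation is to list which model names are exposed by more than one backend. The sketch below works on a plain dictionary you would assemble yourself; it does not read CLIProxyAPI's configuration format, and the backend names in the usage are placeholders.

```python
from collections import defaultdict


def find_ambiguous_models(backends: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each model name exposed by several backends to those backends."""
    owners = defaultdict(list)
    for backend, models in backends.items():
        for model in models:
            owners[model].append(backend)
    return {
        model: sorted(names)
        for model, names in owners.items()
        if len(names) > 1
    }
```

Any name that appears in the result is a candidate for a unique alias or prefix.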

Third, multi-account rotation is not unlimited quota.

Rotation only spreads usage more evenly across the account pool. It cannot bypass the real limits of upstream services. Abnormal accounts, exhausted quota, risk controls, and expired OAuth sessions still need monitoring.
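The point is easier to see in a toy round-robin pool. This sketch is not CLIProxyAPI's rotation logic; the Account fields and the quota counter are simplifying assumptions used only to show that rotation skips unusable accounts and eventually runs out.

```python
from dataclasses import dataclass


@dataclass
class Account:
    name: str
    healthy: bool = True
    remaining_quota: int = 0


class AccountPool:
    """Round-robin over accounts, skipping unhealthy or exhausted ones."""

    def __init__(self, accounts):
        self._accounts = list(accounts)
        self._next = 0

    def pick(self) -> Account:
        """Return the next usable account, or raise if none remain."""
        for _ in range(len(self._accounts)):
            account = self._accounts[self._next]
            self._next = (self._next + 1) % len(self._accounts)
            if account.healthy and account.remaining_quota > 0:
                account.remaining_quota -= 1
                return account
        raise RuntimeError("no usable account: rotation cannot create quota")
```

Once every account is unhealthy or out of quota, pick() fails, which is exactly the state that monitoring needs to surface.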

Fourth, it is not a maintenance-free magic box.

Once you put it into your daily workflow, you need to care about configuration, logs, upstream account status, version upgrades, client compatibility, and security boundaries.

Management and Monitoring

The official README notes that since v6.10.0, CLIProxyAPI and CPAMC (Cli-Proxy-API-Management-Center) no longer include built-in data statistics. If you need usage statistics, you can use separate projects:

CPA Usage Keeper: syncs CLIProxyAPI data into SQLite and provides aggregation APIs and a dashboard.
CLIProxyAPI Usage Dashboard: a local-first usage and quota dashboard that can show accounts, models, time windows, and remaining Codex quota.
CPA-Manager: a fuller management center for request monitoring, cost estimation, account pool inspection, abnormal account location, and cleanup suggestions.

This suggests that CLIProxyAPI’s core is closer to a proxy and protocol layer, not an all-in-one commercial admin backend. If a team uses it, logs, monitoring, and account pool management should be considered from the beginning.

A Reasonable Way to Try It

If you want to test it, a safer order is:

Start it with the official Quick Start documentation.
Connect only one provider first, such as Gemini CLI or Codex, and confirm basic requests work.
Then test higher-risk capabilities such as streaming responses, tool calling, and multimodal input.
Confirm which endpoint the client actually uses, and avoid mixing protocol paths.
Finally add multi-account rotation, management panels, and usage statistics.

Do not connect Gemini, Codex, Claude, OpenRouter, multiple accounts, and all clients at once from the start. When something breaks, it becomes hard to tell whether the issue is authentication, protocol conversion, model naming, or the upstream account.

Think Through the Security Boundary

CLIProxyAPI can touch account login sessions, API keys, OAuth-related credentials, and request contents. If it only runs on your own machine, the risk is relatively manageable. If it is exposed to the public internet or a team intranet, authentication, access control, log redaction, and network isolation become mandatory.
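Log redaction in particular is cheap to prototype. The sketch below masks bearer tokens and a few common key fields before a line is written to a log; the field names are assumptions about what typically appears in such logs, not a list taken from CLIProxyAPI.

```python
import re

# (pattern, replacement) pairs; group 1 keeps the field name and separator.
_SECRET_PATTERNS = [
    (re.compile(r'(authorization:\s*bearer\s+)\S+', re.IGNORECASE), r'\1[REDACTED]'),
    (
        re.compile(
            r'((?:api[_-]?key|access_token|refresh_token)"?\s*[:=]\s*"?)[^\s",]+',
            re.IGNORECASE,
        ),
        r'\1[REDACTED]',
    ),
]


def redact(line: str) -> str:
    """Mask secret values in one log line while keeping its structure readable."""
    for pattern, replacement in _SECRET_PATTERNS:
        line = pattern.sub(replacement, line)
    return line
```

Pattern-based redaction is a safety net, not a guarantee; the safer default is to avoid logging raw headers and request bodies at all.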

Management endpoints especially should be limited to localhost or a trusted internal network. Do not expose management interfaces directly just to save a few minutes.

Conclusion

CLIProxyAPI’s value is that it gathers AI capabilities scattered across multiple CLIs, accounts, and protocols into one programmable API layer.

It fits heavy AI coding users, multi-account users, and internal team gateway scenarios. It is less suitable for lightweight users who want something completely plug-and-play with no maintenance.

If you are already experimenting with Codex, Claude Code, and Gemini CLI, and want to connect them to your own client or automation workflow, CLIProxyAPI is worth a serious look. Treat it as infrastructure, not as a disposable small utility.

Two Ways to Use DeepSeek Models with Codex: Local Gateway and OpenRouter BYOK

Sun, 24 May 2026 09:52:55 +0800

If you want Codex to use DeepSeek, the first instinct is usually to edit ~/.codex/config.toml:

model = "deepseek-chat"
base_url = "https://api.deepseek.com"

That idea can work in some older versions or in regular OpenAI SDK scenarios. But with the current Codex CLI, it can easily run into a lower-level mismatch: custom model providers in Codex use the OpenAI Responses protocol, while DeepSeek’s official API is mainly exposed through an OpenAI-compatible Chat Completions interface.

My local version is currently codex-cli 0.111.0. codex --help shows support for configuration entry points such as --config, --model, and --profile. The official OpenAI Codex configuration reference is also explicit: model_providers.<id>.wire_api currently supports only responses, and defaults to responses when omitted.

DeepSeek’s official docs, meanwhile, show the call path as https://api.deepseek.com/chat/completions, with examples such as client.chat.completions.create(...). So the issue is not that DeepSeek cannot be called through OpenAI-style tooling. The issue is that the request semantics Codex sends are not exactly the same as what DeepSeek’s native API understands.

That is why changing base_url directly to https://api.deepseek.com may produce symptoms such as:

The request path does not match, resulting in a 404 or an unexpected response format.
Multi-turn conversations, tool calls, or patch generation fail during parsing.
tool_calls order, message structure, or streaming event format does not line up.
The model seems able to answer a plain prompt, but starts failing once Codex does real work.

The steadier approach is to put a translation layer between Codex and DeepSeek. There are two common routes.

Method 1: Bridge DeepSeek Through a Local Gateway

A local gateway should do more than simple forwarding. Its job is to convert Responses-style requests from Codex into Chat Completions-style requests that DeepSeek can handle, then convert DeepSeek’s result back into a format Codex can consume.
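To make the conversion concrete, here is a deliberately simplified Python sketch of the request-side translation. It assumes a Responses-style body with model, optional instructions, and an input that is either a string or a list of message items with text parts; tool calls, streaming events, and the response-side conversion are left out, so this shows the shape of the problem rather than how ccx or any real gateway works.

```python
def responses_to_chat(request: dict) -> dict:
    """Convert a text-only Responses-style request into a Chat Completions body."""
    messages = []
    instructions = request.get("instructions")
    if instructions:
        messages.append({"role": "system", "content": instructions})

    items = request.get("input", [])
    if isinstance(items, str):
        items = [{"role": "user", "content": items}]

    for item in items:
        if item.get("type", "message") != "message":
            # Tool calls and other item types are dropped in this sketch.
            continue
        content = item.get("content", "")
        if isinstance(content, list):
            # Keep only text parts; images and files need real handling.
            content = "".join(
                part.get("text", "")
                for part in content
                if part.get("type") in ("input_text", "output_text")
            )
        role = item.get("role", "user")
        if role == "developer":
            role = "system"  # assumption: the upstream has no developer role
        messages.append({"role": role, "content": content})

    return {
        "model": request["model"],
        "messages": messages,
        "stream": bool(request.get("stream", False)),
    }
```

The dropped item types are where real gateways do most of their work, which is why a basic forwarder tends to break once Codex starts calling tools.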

If you use a local gateway such as ccx, the configuration idea looks roughly like this:

[profiles.deepseek-ccx]
model = "deepseek-v4-flash"
model_provider = "ccx-bridge"

[model_providers.ccx-bridge]
name = "Local CCX Gateway"
base_url = "http://localhost:3000/v1"
env_key = "DEEPSEEK_API_KEY"

Then set the DeepSeek key in your terminal and start Codex with that profile:

export DEEPSEEK_API_KEY="your-deepseek-key"
codex --profile deepseek-ccx

In PowerShell:

$env:DEEPSEEK_API_KEY="your-deepseek-key"
codex --profile deepseek-ccx

There are two details to watch.

First, base_url should point to the gateway endpoint exposed to Codex, not the official DeepSeek address. The gateway calls DeepSeek behind the scenes.

Second, the correct value for env_key depends on how the gateway handles authentication. Some gateways read the official DeepSeek key directly. Others ask you to provide a local proxy key, while storing the DeepSeek key in the gateway backend. In that case, env_key should be changed to whatever environment variable the gateway expects.

This route is local and controllable, and it is easier to reason about latency and cost. The tradeoff is that you must confirm the gateway really supports the current Responses semantics used by Codex, rather than only acting as a basic Chat Completions proxy.

Method 2: Use OpenRouter BYOK as an Online Bridge

If you do not want to run a local gateway, OpenRouter BYOK is another option. BYOK means binding your own upstream provider key to OpenRouter, which then handles routing and forwarding.

The most common mistake here is the environment variable. Codex is calling OpenRouter, so env_key should usually be OPENROUTER_API_KEY, not DEEPSEEK_API_KEY. The DeepSeek key should be added in OpenRouter’s BYOK or provider key settings.

Example configuration:

[profiles.deepseek-openrouter]
model = "deepseek/deepseek-chat"
model_provider = "openrouter"

[model_providers.openrouter]
name = "OpenRouter"
base_url = "https://openrouter.ai/api/v1"
env_key = "OPENROUTER_API_KEY"

Start it like this:

export OPENROUTER_API_KEY="your-openrouter-key"
codex --profile deepseek-openrouter

PowerShell:

$env:OPENROUTER_API_KEY="your-openrouter-key"
codex --profile deepseek-openrouter

Then add your DeepSeek provider key in the OpenRouter dashboard. OpenRouter’s BYOK documentation says provider keys are stored encrypted and used for routing to the corresponding provider.

This route saves you from maintaining a local gateway and feels more like using a regular third-party API proxy. The downside is that an online service sits in the middle, so troubleshooting may require checking Codex, OpenRouter, and DeepSeek error messages together.

Should You Keep Using the deepseek-chat Model Name?

In DeepSeek’s documentation as of May 2026, the recommended model names include deepseek-v4-flash and deepseek-v4-pro, with a note that compatibility aliases such as deepseek-chat and deepseek-reasoner will be deprecated after 2026-07-24.

For new configurations, it is better to test:

model = "deepseek-v4-flash"

If you are using OpenRouter, follow OpenRouter’s model naming format, for example:

model = "deepseek/deepseek-chat"

The actual available names depend on your gateway or OpenRouter’s model page. When the model name is wrong, errors usually look like model not found, 404, or the provider failing to find the matching endpoint.
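If you keep several configs around, a small check can flag the legacy aliases before the cutoff date mentioned above. The sketch below hard-codes the two alias names and the 2026-07-24 date from this section; the status labels are made up for illustration, and names in a gateway's or OpenRouter's own format are not covered.

```python
from datetime import date

LEGACY_ALIASES = {"deepseek-chat", "deepseek-reasoner"}
ALIAS_CUTOFF = date(2026, 7, 24)


def alias_status(model: str, today: date) -> str:
    """Classify a native DeepSeek model name relative to the alias cutoff."""
    if model not in LEGACY_ALIASES:
        return "ok"
    return "legacy-alias" if today <= ALIAS_CUTOFF else "past-cutoff"
```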

Why Directly Setting DeepSeek’s Official base_url Is Not Recommended

You can certainly try this as an experiment:

[profiles.deepseek-direct]
model = "deepseek-v4-flash"
model_provider = "deepseek"

[model_providers.deepseek]
name = "DeepSeek"
base_url = "https://api.deepseek.com"
env_key = "DEEPSEEK_API_KEY"

But this is more of a debugging experiment than a stable setup. Codex talks to custom providers through the Responses protocol, while DeepSeek’s official examples use /chat/completions. If DeepSeek or Codex adds a full compatibility layer later, direct connection may become simple. Until then, a bridge layer is more reliable.

What If Codex Still Uses OpenAI After Editing the Config?

First, confirm the config file location. The global config should be:

~/.codex/config.toml

The project-level .codex/config.toml is not the right place for machine-level provider settings such as model_provider and model_providers. The official OpenAI docs also note that project-level configuration does not override local provider and authentication fields.

If Codex still asks you to log in through the web, or appears to use the default OpenAI model, log out first:

codex logout

Some older tutorials write this as /logout inside the interactive UI. With the current CLI, running codex logout directly in the terminal is the more reliable option.

You can also run a quick check with a temporary profile:

codex --profile deepseek-ccx

Or:

codex -c model_provider=ccx-bridge -c model=deepseek-v4-flash

If that works, the config itself is readable. If it does not, check the profile name, TOML syntax, and whether the environment variable only exists in the current shell session.

Troubleshooting Checklist

401: The key is wrong, or env_key points to the wrong environment variable.
404: base_url or the model name is wrong, or a Responses request is being sent to an endpoint that only supports Chat Completions.
tool_calls, patch, or streaming parse errors: the protocol bridge is likely incomplete.
Still prompted to log in to OpenAI: run codex logout, then confirm you are using the correct profile.
PowerShell environment variable disappears in a new window: $env:... only applies to the current session. Use user environment variables if you need it to persist.
OpenRouter BYOK is not using your own DeepSeek key: check whether the provider key is bound in OpenRouter, whether the current OpenRouter API key is allowed to use it, and whether fallback is enabled.
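
The first two checklist entries can be turned into a small lookup that a wrapper script might print next to a failed request. This is only a sketch: diagnoseStatus and the Diagnosis type are hypothetical names, and the causes are copied from the checklist above rather than from any Codex or DeepSeek API.

```typescript
// Hypothetical helper: maps an HTTP status to the likely causes listed in the checklist.
type Diagnosis = { status: number; likelyCauses: string[] };

function diagnoseStatus(status: number): Diagnosis {
  switch (status) {
    case 401:
      return {
        status,
        likelyCauses: [
          "the API key is wrong",
          "env_key points to the wrong environment variable",
        ],
      };
    case 404:
      return {
        status,
        likelyCauses: [
          "base_url is wrong",
          "the model name is wrong",
          "a Responses request was sent to a Chat Completions-only endpoint",
        ],
      };
    default:
      // Anything else is outside this checklist; return no guess rather than a wrong one.
      return { status, likelyCauses: [] };
  }
}

console.log(diagnoseStatus(404).likelyCauses.join("; "));
```

Statuses that are not in the checklist return an empty list on purpose, so the helper never suggests a cause the article does not cover.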

Conclusion

Using DeepSeek with Codex through config.toml is possible. The catch is that changing only base_url is usually not enough.

The two steadier routes today are:

Use a local gateway as a protocol bridge: Codex talks to the local gateway, and the gateway talks to DeepSeek.
Use OpenRouter BYOK as an online proxy: Codex talks to OpenRouter, while the DeepSeek key is bound in the OpenRouter dashboard.

If you only want a quick test, OpenRouter is easier. If you want tighter control over keys, cost, and logs, a local gateway is better for long-term tinkering.
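
To make the "protocol bridge" idea concrete, here is a heavily simplified sketch of the translation a gateway has to perform. It assumes a Responses-style request with only model, instructions, and a plain-string input, and maps it to a Chat Completions-style messages array. The interface and function names are illustrative, and a real bridge also has to handle tool calls, streaming, and structured input, none of which are covered here.

```typescript
// Simplified request shapes; real payloads carry many more fields.
interface ResponsesRequest {
  model: string;
  instructions?: string;
  input: string; // assumption: plain text only, no structured input items
}

interface ChatMessage {
  role: "system" | "user";
  content: string;
}

interface ChatCompletionsRequest {
  model: string;
  messages: ChatMessage[];
}

// Hypothetical translation step inside a gateway.
function toChatCompletions(req: ResponsesRequest): ChatCompletionsRequest {
  const messages: ChatMessage[] = [];
  if (req.instructions !== undefined) {
    // Instructions become the system message.
    messages.push({ role: "system", content: req.instructions });
  }
  messages.push({ role: "user", content: req.input });
  return { model: req.model, messages };
}

const bridged = toChatCompletions({
  model: "deepseek-v4-flash",
  instructions: "You are a coding assistant.",
  input: "Explain this diff.",
});
console.log(JSON.stringify(bridged, null, 2));
```

Even this toy version shows why editing base_url alone is not enough: the two request bodies have different shapes, so something has to rewrite them.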

What is CodeGraph? A local code map for Claude Code, Codex, and Cursor

Sat, 23 May 2026 21:09:46 +0800

CodeGraph is a local code knowledge graph designed for AI coding tools. It indexes a project ahead of time and organizes symbol relationships, call graphs, code structure, route relationships, and related information into a queryable graph. That lets Claude Code, Codex CLI, Cursor, OpenCode, Hermes Agent, and similar tools avoid relying on grep, glob, Read, and exploratory subagents every time they need to understand a project.

It solves a very practical problem: when an AI Agent works on a large codebase, much of the cost is not spent on changing code, but on finding where the relevant code lives. If every task starts with repeated searching, reading, and filtering, it wastes tokens, time, and tool calls. CodeGraph tries to turn the repository into a local map first, so the agent can ask the map before deciding which files to read.

What pain points does it address?

AI coding tools usually work well in small projects. There are few files, search is fast, and reading files is cheap. In larger projects, common problems appear:

The agent repeatedly calls grep, find, ls, and Read just to understand one module.
Exploratory subagents read many irrelevant files, while the main task context remains unclear.
Architecture questions spend too many tokens locating files.
Before changing a function, it is unclear who calls it and what it calls.
In web projects, URL routes and handler functions are not always obvious.

CodeGraph tries to move this “find the way first” work earlier. Once the project index exists, the agent can query related symbols, callers, callees, impact scope, and code snippets directly.

Installation

The project provides cross-platform installation scripts and does not require users to prepare Node.js manually:

curl -fsSL https://raw.githubusercontent.com/colbymchenry/codegraph/main/install.sh | sh

On Windows PowerShell:

irm https://raw.githubusercontent.com/colbymchenry/codegraph/main/install.ps1 | iex

If you already have a Node environment, you can use npm directly:

npx @colbymchenry/codegraph

Or install it globally:

npm i -g @colbymchenry/codegraph

The installer detects and configures installed agents such as Claude Code, Cursor, Codex CLI, opencode, and Hermes Agent. It writes the relevant MCP server configuration and instruction files so those tools know when to call CodeGraph.

Initializing a project

After installation, build an index inside the target project:

cd your-project
codegraph init -i

This command creates a project-level knowledge graph index. The README notes that as long as a .codegraph/ directory exists in the project, agents can automatically use CodeGraph tools.

To stop using it, you can remove the global configuration:

codegraph uninstall

That removes the MCP server configuration, instructions, and permissions written by the installer. The .codegraph/ index in the project is not deleted automatically. To remove the project index, use codegraph uninit.

Why it helps agents

Tools like Claude Code, Codex CLI, and Cursor often explore before making changes: find files, read entry points, inspect references, and follow call chains. For humans this feels like browsing a project. For models, it becomes a series of tool calls and context cost.

CodeGraph turns that into index queries. An agent can first use codegraph_context to find relevant entry points, symbols, and snippets, then use codegraph_explore or other tools to read the necessary details. The benefits are:

Fewer irrelevant files read.
Fewer search tool calls.
Faster discovery of relevant code.
Clearer impact scope before edits.
Easier answers to architecture questions in large repositories.

The README reports benchmark results across seven real open source repositories comparing runs with and without CodeGraph. On average, enabling CodeGraph reduced cost, tokens, latency, and tool calls. The exact numbers depend on project size, language, question type, and agent behavior, but the direction is clear: the larger the repository, the more valuable pre-indexing becomes.

Core capabilities

1. Smart context construction

One tool call can return entry points, related symbols, and code snippets, reducing the need for the agent to launch many exploratory tasks before filtering the results. This is useful for architecture understanding, module location, and feature entry-point analysis.

2. Full-text search

CodeGraph uses FTS5 for full-text search, letting it quickly search names and text across the codebase. It does not replace every grep use case, but it gives the agent a more structured first stop.

3. Impact analysis

Before changing a function, class, method, or route, the agent can query callers, callees, and impact radius. This is especially useful for refactoring, bug fixing, and deleting old code, where missing upstream or downstream calls is the main risk.
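
As a rough illustration of what "callers" and "impact radius" mean, the sketch below walks a tiny in-memory call graph backwards from a changed function. It is not CodeGraph's implementation or query API. The CallGraph type, the function names, and the sample graph are all made up for the example.

```typescript
// Call graph as caller -> callees. All symbol names here are invented.
type CallGraph = Map<string, string[]>;

// Direct callers of a symbol.
function callersOf(graph: CallGraph, target: string): string[] {
  const result: string[] = [];
  graph.forEach((callees, caller) => {
    if (callees.indexOf(target) !== -1) result.push(caller);
  });
  return result;
}

// Everything that can reach `target` through one or more calls (breadth-first).
function impactRadius(graph: CallGraph, target: string): Set<string> {
  const impacted = new Set<string>();
  const queue: string[] = [target];
  while (queue.length > 0) {
    const current = queue.shift() as string;
    for (const caller of callersOf(graph, current)) {
      if (!impacted.has(caller)) {
        impacted.add(caller);
        queue.push(caller);
      }
    }
  }
  return impacted;
}

const sampleGraph: CallGraph = new Map([
  ["handleRequest", ["loadUser"]],
  ["loadUser", ["queryDb"]],
  ["runReport", ["queryDb"]],
]);

// Changing queryDb affects its direct callers and, transitively, handleRequest.
console.log(Array.from(impactRadius(sampleGraph, "queryDb")));
```

A pre-built index answers this kind of question with one lookup, whereas an agent without it has to rediscover each edge through search and file reads.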

4. Automatic freshness

The README says CodeGraph uses native filesystem events such as FSEvents, inotify, and ReadDirectoryChangesW, along with debounced auto-sync. In practice, the index updates as local code changes, so users do not need to rebuild it manually after every edit.
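
The debouncing idea can be sketched independently of any particular file watcher: collect changed paths, restart a quiet-period timer on every event, and re-index once per burst. The ChangeBatcher class below is a hypothetical illustration of that pattern, not CodeGraph's code, and the 200 ms delay is an arbitrary value.

```typescript
// Hypothetical debounced batcher: many rapid change events become one sync.
class ChangeBatcher {
  private pending = new Set<string>();
  private timer: ReturnType<typeof setTimeout> | undefined;
  private delayMs: number;
  private onFlush: (paths: string[]) => void;

  constructor(delayMs: number, onFlush: (paths: string[]) => void) {
    this.delayMs = delayMs;
    this.onFlush = onFlush;
  }

  // Record a changed path and restart the quiet-period timer.
  notify(path: string): void {
    this.pending.add(path);
    if (this.timer !== undefined) clearTimeout(this.timer);
    this.timer = setTimeout(() => this.flush(), this.delayMs);
  }

  // Hand the deduplicated batch to the callback and reset state.
  flush(): void {
    if (this.timer !== undefined) clearTimeout(this.timer);
    this.timer = undefined;
    if (this.pending.size === 0) return;
    const paths = Array.from(this.pending).sort();
    this.pending.clear();
    this.onFlush(paths);
  }
}

const batcher = new ChangeBatcher(200, (paths) => {
  console.log("re-index:", paths.join(", "));
});
batcher.notify("src/app.ts");
batcher.notify("src/db.ts");
batcher.notify("src/app.ts"); // a duplicate event within the same burst
```

Saving a file several times in quick succession then triggers a single re-index of the affected paths instead of one per event.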

5. Multi-language support

The project lists support for more than 19 languages, including TypeScript, JavaScript, Python, Go, Rust, Java, C#, PHP, Ruby, C, C++, Swift, Kotlin, Dart, Lua, Luau, Svelte, Liquid, and Pascal / Delphi.

That makes it suitable for multi-language repositories and full-stack projects, not just one language.

6. Web route awareness

CodeGraph also detects route files and route declarations in many web frameworks, connecting URL patterns with handler functions. The README mentions Django, Flask, FastAPI, Express, NestJS, Laravel, Rails, Spring, Gin, Axum, ASP.NET, Vapor, React Router, SvelteKit, and others.

This is practical because the real entry point of many web projects is not an obvious main function, but routes, controllers, handlers, views, or resolvers. If an agent can first understand the URL-to-handler relationship, it can understand business flow much faster.
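
A minimal sketch of a URL-to-handler lookup, assuming routes have already been extracted into a flat table. The Route shape, the ":param" pattern syntax, and the handler names are assumptions for illustration only. Real frameworks each have their own route syntax, and this does not reflect how CodeGraph stores routes.

```typescript
interface Route {
  method: string;
  pattern: string; // assumption: Express-like "/users/:id" syntax
  handler: string; // symbol name of the handler function
}

interface RouteMatch {
  handler: string;
  params: Record<string, string>;
}

// Find the first route whose method and path segments match the request.
function matchRoute(routes: Route[], method: string, path: string): RouteMatch | undefined {
  const pathParts = path.split("/").filter((part) => part.length > 0);
  for (const route of routes) {
    if (route.method !== method) continue;
    const patternParts = route.pattern.split("/").filter((part) => part.length > 0);
    if (patternParts.length !== pathParts.length) continue;

    const params: Record<string, string> = {};
    let matched = true;
    for (let i = 0; i < patternParts.length; i++) {
      const expected = patternParts[i];
      if (expected.charAt(0) === ":") {
        params[expected.slice(1)] = pathParts[i]; // capture a path parameter
      } else if (expected !== pathParts[i]) {
        matched = false;
        break;
      }
    }
    if (matched) return { handler: route.handler, params };
  }
  return undefined;
}

const routes: Route[] = [
  { method: "GET", pattern: "/users/:id", handler: "UserController.show" },
  { method: "POST", pattern: "/users", handler: "UserController.create" },
];
console.log(matchRoute(routes, "GET", "/users/42"));
```

With a table like this, "which code handles GET /users/42?" becomes a lookup instead of a search across controllers and router files.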

Local-first design

CodeGraph emphasizes being 100% local. It does not require an API key or external service. Index data is stored in a local SQLite database.

For enterprise projects, private repositories, or sensitive code, this matters. The concern with AI coding tools is often not only “can they find the code?”, but “will the code structure and index be sent elsewhere?” CodeGraph is positioned as local indexing, local querying, and local service for agents.

Of course, local indexing also means considering disk usage, indexing time, file watching, and project size. Very large repositories still require resources for initial indexing and later synchronization.

Suitable scenarios

CodeGraph is a good fit for:

Large codebases where architecture and call-chain questions are common.
Teams using Claude Code, Codex CLI, Cursor, or similar agents for code understanding and edits.
Reducing random file reads, broad searches, and repeated exploration by agents.
Analyzing impact before code changes.
Web projects with complex routing, where URL-to-handler lookup matters.
Teams that want a more stable local project index for AI agents.

For a small project with a few dozen files, normal search may be enough and CodeGraph’s advantage may not be obvious. It is most valuable in medium-to-large repositories and workflows where agents do a lot of exploration.

Things to watch out for

First, CodeGraph is not a substitute for code review or testing. It helps agents find relevant code faster, but it does not guarantee that their changes are correct.

Second, index quality affects results. If a project has complex structure, lots of generated code, mixed languages, or unignored build artifacts, the index may become noisy. Before using it seriously, check .gitignore, project layout, and indexing scope.

Third, MCP configuration and agent instructions matter. The README also warns that CodeGraph helps only when it is queried properly. If an agent ignores it and still reads many files directly, pre-indexing becomes extra overhead.

Fourth, even though it is local, permissions still matter. The installer writes agent configuration and permission lists. In team environments, review those configurations centrally.

Summary

CodeGraph can be understood simply: it gives AI agents a local map of the codebase. It does not make the model smarter; it helps the model get less lost.

When tools like Claude Code, Codex CLI, and Cursor face large repositories, the expensive part is often exploration. CodeGraph uses pre-indexed symbol relationships, call graphs, route graphs, and full-text search to handle “where is the code?” earlier, leaving more budget for understanding and editing.

If you already use AI coding tools in real projects and often see the agent read many files without finding the point, CodeGraph is worth trying. It represents an important direction for AI coding tools: not only stronger models, but better local code context for those models.

References:

GitHub project: https://github.com/colbymchenry/codegraph

Claude Code has a plugin marketplace now: what you can install, how to install it, and what to watch out for

Sat, 23 May 2026 19:03:30 +0800

anthropics/claude-plugins-official is the official Claude Code plugin directory managed by Anthropic. It is not just a normal code repository. It is a marketplace that Claude Code’s plugin system can use directly, collecting Claude Code plugins maintained or curated by Anthropic.

This repository matters because Claude Code is moving from “an AI coding command-line tool” toward “an extensible development environment.” Plugins can package Skills, Agents, Hooks, MCP servers, LSP servers, background monitors, and default settings so teams and communities can distribute them in a consistent way.

What is this repository?

The README describes it directly: it is a curated directory of high-quality Claude Code plugins.

The directory is mainly split into two parts:

/plugins: plugins developed and maintained internally by Anthropic.
/external_plugins: third-party plugins from partners and the community.

In other words, it contains both official capabilities and curated external ecosystem entries. For regular users, the direct value is that plugins can be discovered and installed through Claude Code’s /plugin system. For developers, it is a useful window into Claude Code’s plugin format and ecosystem direction.

How to install plugins

The README gives a simple installation command. You can install directly through Claude Code’s plugin system:

/plugin install {plugin-name}@claude-plugins-official

You can also open the plugin discovery entry inside Claude Code:

/plugin > Discover

The key part is @claude-plugins-official, which refers to the official plugin marketplace. According to the Claude Code documentation, claude-plugins-official is the official marketplace maintained by Anthropic and is available by default in Claude Code installations.

What does a plugin look like?

The repository README shows a standard plugin structure:

plugin-name/
├── .claude-plugin/
│   └── plugin.json
├── .mcp.json
├── commands/
├── agents/
├── skills/
└── README.md

.claude-plugin/plugin.json is the metadata file, usually declaring the plugin name, description, version, author, and related fields. Other directories are optional and depend on what the plugin provides:

skills/: instructions for skills Claude can invoke automatically.
commands/: slash commands.
agents/: custom agent definitions.
hooks/: event-triggered logic.
.mcp.json: MCP server configuration.
.lsp.json: language server configuration.
monitors/: background monitor configuration.
settings.json: default settings shipped with the plugin.

This means a Claude Code plugin is not one single kind of extension. It is a packaging format. A plugin can be a tiny command, or it can be an entire workflow for a specific stack.

What directions are already in the official directory?

The /plugins directory already covers many development scenarios, including:

LSP plugins: typescript-lsp, pyright-lsp, rust-analyzer-lsp, gopls-lsp, clangd-lsp, csharp-lsp, jdtls-lsp, kotlin-lsp, lua-lsp, php-lsp, ruby-lsp, swift-lsp.
Programming workflows: code-review, feature-dev, code-modernization, code-simplifier, commit-commands, pr-review-toolkit.
Claude Code configuration and plugin development: claude-code-setup, claude-md-management, plugin-dev, skill-creator, mcp-server-dev.
Output styles and specialized capabilities: explanatory-output-style, learning-output-style, security-guidance, session-report, math-olympiad.

The /external_plugins directory points toward more third-party tools and services, such as github, gitlab, linear, asana, firebase, playwright, terraform, context7, serena, telegram, and discord.

Together, these plugins suggest a trend: Claude Code does not only want to edit files. It also wants to connect with code intelligence, project management, cloud services, testing, infrastructure, and team collaboration tools.

Why the plugin system matters

Previously, many Claude Code customizations could live inside a project’s .claude/ directory, such as commands, agents, skills, or hooks. That works for personal workflows or one project, but it is not ideal for reuse across projects or consistent team distribution.

Plugins solve the reuse and distribution problem:

The same configuration can be installed across multiple projects.
Commands and skills are namespaced, reducing conflicts.
Plugins can be published and updated through a marketplace.
Teams can package internal best practices as standard plugins.
The community can maintain extensions for specific frameworks, languages, or services.

This resembles VS Code extensions, JetBrains plugins, or browser extensions. Once a tool has a stable plugin ecosystem, it is no longer just a single product; it starts becoming a platform.

What does it mean for developers?

If you are only a Claude Code user, the most practical use of this repository is finding plugins. For example, if you need LSP support for TypeScript, Python, Rust, or Go, you can first check whether the official directory already has the corresponding plugin. If you need PR review, commit helpers, or code modernization workflows, the official plugins are also a good starting point.

If you develop plugins, this repository is more like a reference library. You can study its directory layout, plugin.json style, README structure, and how Anthropic combines skills, agents, MCP, LSP, and hooks.

The Claude Code documentation also gives a clear guideline: use .claude/ for single-project customization, but turn it into a plugin when you want to share it with a team, reuse it across projects, version releases, or distribute it through a marketplace.

Security boundaries matter

The repository README opens with an important warning: make sure you trust a plugin before installing, updating, or using it. The reason is simple. A plugin may include MCP servers, files, scripts, or other software. Anthropic maintaining the directory does not mean every plugin will behave exactly as expected in your local environment.

In practice, it is worth doing at least a few checks:

Read the plugin homepage and README before installing.
Check whether it includes .mcp.json, hooks, executable scripts, or background monitors.
Be extra careful with plugins that access accounts, code repositories, chat tools, or cloud services.
Test plugins in a sandbox or test repository before enabling them in important projects.
In team environments, review plugin sources and versions centrally.
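
Part of that checklist can be automated as a first pass. The sketch below takes a list of file paths from a plugin directory and flags the ones worth reading before installation. flagPluginFiles is a hypothetical helper, the list of script extensions is an assumption, and a clean result does not mean a plugin is safe.

```typescript
interface ReviewFlag {
  path: string;
  reason: string;
}

// Assumption: these extensions are treated as "executable scripts" for review purposes.
const SCRIPT_EXTENSIONS = [".sh", ".ps1", ".py", ".js"];

function hasSuffix(value: string, suffix: string): boolean {
  return value.slice(-suffix.length) === suffix;
}

// Flag files that deserve a manual look before the plugin is installed.
function flagPluginFiles(paths: string[]): ReviewFlag[] {
  const flags: ReviewFlag[] = [];
  for (const path of paths) {
    if (path === ".mcp.json") {
      flags.push({ path, reason: "declares MCP servers" });
    } else if (path.indexOf("hooks/") === 0) {
      flags.push({ path, reason: "event-triggered hook" });
    } else if (path.indexOf("monitors/") === 0) {
      flags.push({ path, reason: "background monitor" });
    } else if (SCRIPT_EXTENSIONS.some((ext) => hasSuffix(path, ext))) {
      flags.push({ path, reason: "executable script" });
    }
  }
  return flags;
}

const flags = flagPluginFiles([
  ".claude-plugin/plugin.json",
  ".mcp.json",
  "hooks/pre-commit.sh",
  "scripts/setup.sh",
  "README.md",
]);
for (const flag of flags) {
  console.log(`${flag.path}: ${flag.reason}`);
}
```

The output is a reading list, not a verdict: each flagged file still needs a human to decide whether what it does is acceptable.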

AI coding plugins often have much higher privileges than ordinary editor themes. They may read project files, call external services, start local commands, or affect commit and deployment flows. Treat the trust boundary more strictly than “installing a small tool.”

Relationship with the community marketplace

The Claude Code documentation says Anthropic maintains two public plugin marketplaces:

claude-plugins-official: a curated set of plugins maintained by Anthropic.
claude-community: a community plugin directory where third-party submissions go through review.

They have different roles. Community plugins can enter the review pipeline through submission forms. The official directory is curated separately by Anthropic, with no public application process. In short, claude-plugins-official is closer to an official curated directory, while claude-community is the open community directory.

Summary

The significance of anthropics/claude-plugins-official is not merely that another GitHub repository exists. It shows Claude Code’s extension mechanism becoming platform-like: Skills, Agents, Hooks, MCP, LSP, background monitors, and default settings can now be packaged, installed, updated, and distributed.

For individual developers, the official plugin directory can lower the cost of configuring Claude Code. For teams, it offers a way to standardize internal workflows. For plugin developers, it shows the plugin structure and ecosystem direction Anthropic is endorsing.

The next thing to watch is not just any single plugin, but whether the Claude Code plugin ecosystem forms stable layers: official curated plugins, community plugins, private team marketplaces, and specialized extensions for mainstream languages, frameworks, and SaaS services. If that path works, Claude Code will look more and more like a programmable AI development platform, not just a command-line assistant.

References:

GitHub project: https://github.com/anthropics/claude-plugins-official
Claude Code plugin documentation: https://code.claude.com/docs/en/plugins

What is oh-my-pi? An AI coding assistant that connects the terminal, IDE, and debugger

Sat, 23 May 2026 19:02:20 +0800

oh-my-pi is an AI Coding Agent for the terminal and editor. It is a fork of Mario Zechner’s Pi project, extended by can1357. Its goal is not just to provide a command-line chat UI, but to connect file reading, code search, structured edits, LSP, debuggers, browsers, subagents, and multiple model providers into one coding workflow.

From the project README, it feels more like an AI coding tool layer than a simple assistant: you can use it interactively in the terminal, connect editors through ACP, or embed it in Node projects through the SDK. For people already using Claude Code, Codex CLI, Cline, Cursor, or other agent tools, the interesting part is that oh-my-pi turns many abilities that usually live in separate tools into one built-in tool surface.

What problem is it trying to solve?

For many AI coding tools, the weak point is not the model itself, but the tool interface around it. When a model wants to change code but only has rough full-file reads, fragile string replacement, and one-off shell commands, the toolchain amplifies failure.

oh-my-pi tries to reduce those common points of friction:

File reads prefer structured summaries instead of dumping whole files into context.
Search, glob, find, syntax highlighting, and token counting are implemented natively where possible, reducing dependence on external commands.
Code writes can use LSP so renames, references, and file moves behave more like IDE operations.
Debugging can use DAP tools such as lldb, dlv, and debugpy, instead of relying only on logs and guesses.
Complex tasks can be split across subagents and returned as structured results.
Edits use content anchors and previews to reduce the chance of a bad patch landing directly on disk.

These choices show that the focus is not “can the model answer?”, but “can the model reliably complete a real code change?”

Installation

The project provides several installation paths. On macOS and Linux, you can use the install script:

curl -fsSL https://omp.sh/install | sh

If you use Bun, the README recommends installing the npm package globally:

bun install -g @oh-my-pi/pi-coding-agent

On Windows PowerShell:

irm https://omp.sh/install.ps1 | iex

The README also mentions pinning versions with mise:

mise use -g github:can1357/oh-my-pi

Before installing, check the Bun version requirement. The README lists macOS, Linux, and Windows support, and requires bun >= 1.3.14.
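
Version checks like this are easy to get wrong with plain string comparison, because "1.3.9" sorts after "1.3.14" as text. A minimal numeric comparison might look like the following. meetsMinimum is a hypothetical helper that ignores pre-release tags, and you would feed it whatever version string your installed Bun reports.

```typescript
// Split "1.3.14" into [1, 3, 14]; non-numeric suffixes are ignored by parseInt.
function parseVersion(version: string): number[] {
  return version.split(".").map((part) => Number.parseInt(part, 10));
}

// True when `current` is greater than or equal to `minimum`, compared segment by segment.
function meetsMinimum(current: string, minimum: string): boolean {
  const a = parseVersion(current);
  const b = parseVersion(minimum);
  const length = Math.max(a.length, b.length);
  for (let i = 0; i < length; i++) {
    const x = i < a.length ? a[i] : 0;
    const y = i < b.length ? b[i] : 0;
    if (x !== y) return x > y;
  }
  return true;
}

console.log(meetsMinimum("1.3.9", "1.3.14"));
console.log(meetsMinimum("1.3.14", "1.3.14"));
```

The first call reports that 1.3.9 is too old and the second that 1.3.14 satisfies the requirement, which is the opposite of what a text sort would suggest for the first pair.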

Capabilities worth watching

1. Tool calling goes beyond shell commands

oh-my-pi includes tools for file reads, search, writes, edits, AST edits, browser use, task splitting, debugging, and LSP. The README mentions 32 built-in tools, 13 LSP operations, and 27 DAP operations.

That means the agent does not have to wrap everything as command-line output. Reference lookup can go through LSP, PRs and issues can be read through a unified file-like interface, and web pages or PDFs can be converted into Markdown with link structure intact before being passed to the model.

2. LSP integration is useful for real codebases

In large projects, renaming and moving files often breaks re-exports, aliased imports, barrel files, or cross-directory references. The oh-my-pi README highlights that write paths can go through LSP. For example, file renames use workspace/willRenameFiles, making edits closer to semantic IDE operations.

This is useful for everyday refactoring in TypeScript, Rust, Go, Python, and similar projects, especially in cases where manual edits are possible but easy to miss.

3. The debugger is a first-class tool

Many AI coding flows still debug by adding logs, rerunning, and reading output. oh-my-pi connects DAP debuggers to the tool surface. The README gives examples with C programs using lldb, Go services using dlv, and Python processes using debugpy.

That changes how an agent handles bugs: it can pause a process, inspect stack frames, read local variables, and then decide what to do next, instead of guessing from error text alone.

4. Hashline editing reduces patch failures

The project emphasizes Hashline, an editing approach based on content anchors. The goal is to let the model point to the content it wants to change instead of repeatedly emitting large diffs. This reduces edit failures caused by whitespace, stale context, or failed string matching.

For agent tools, this matters a lot. Even a capable model feels clumsy if the write interface keeps failing and forcing retries.
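
The general idea behind content-anchored editing can be shown in a few lines: the model names a line together with a short fingerprint of what it saw there, and the edit is rejected if the file no longer matches. This is a generic sketch of the technique, not oh-my-pi's actual Hashline format. The FNV-1a-style hash, the AnchoredEdit shape, and the function names are all assumptions.

```typescript
// FNV-1a-style hash over UTF-16 code units, used only as a short content fingerprint.
function lineHash(line: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < line.length; i++) {
    hash ^= line.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return ("00000000" + hash.toString(16)).slice(-8);
}

interface AnchoredEdit {
  line: number;         // 1-based line number the model is targeting
  expectedHash: string; // fingerprint of the content the model saw on that line
  replacement: string;
}

// Apply the edit only if the anchor still matches; otherwise refuse instead of guessing.
function applyAnchoredEdit(lines: string[], edit: AnchoredEdit): string[] {
  const current = lines[edit.line - 1];
  if (current === undefined || lineHash(current) !== edit.expectedHash) {
    throw new Error(`stale anchor at line ${edit.line}`);
  }
  const next = lines.slice();
  next[edit.line - 1] = edit.replacement;
  return next;
}

const fileLines = ["const retries = 3;", "const timeoutMs = 500;"];
const edited = applyAnchoredEdit(fileLines, {
  line: 2,
  expectedHash: lineHash("const timeoutMs = 500;"),
  replacement: "const timeoutMs = 2000;",
});
console.log(edited.join("\n"));
```

If the file changed between the read and the write, the hash no longer matches and the edit fails loudly, which is easier to recover from than a patch that silently lands in the wrong place.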

5. Subagents and workspace isolation

The README introduces a task subagent capability: a task can be split across isolated workers, and results are returned as structured objects. The project also includes workspace isolation logic for parallel tasks, branch exploration, and avoiding overlapping edits.

This fits code review, migrations, bulk fixes, and test investigation. The value is not only speed; it is also cleaner separation between different exploration paths.

6. It can inherit existing rules and configs

On first run, oh-my-pi can read rules and configuration left by other tools, including .claude, .cursor, .windsurf, .gemini, .codex, .cline, .github/copilot, and .vscode.

That is practical. Many teams have already written rules for several AI tools. Rewriting them for every new tool is expensive, so oh-my-pi tries to reuse what is already on disk.

Four entry points

The project provides four main ways to use it:

Interactive TUI: run omp directly in the terminal.
One-shot command: use omp -p to send a single prompt.
Node SDK: embed it in a Node or TypeScript project through @oh-my-pi/pi-coding-agent.
RPC / ACP: connect other programs and editors through stdio or Agent Client Protocol.

This means it is not only for individual terminal users. It also leaves room for IDEs, plugins, automation platforms, and internal tools.

Who should try it?

oh-my-pi is a good fit for:

Developers who often edit, debug, and review code in the terminal.
People already using AI Coding Agents but unhappy with file reading, patching, search, or debugging reliability.
Developers who want to connect an agent to an editor, RPC layer, or Node service.
Users who need to switch between multiple models and providers in one tool.
Tool builders interested in LSP, DAP, AST editing, and subagent workflows.

If you only want a chat-style coding assistant that works immediately, the learning curve may feel high. It is better suited to people willing to understand the toolchain and treat the agent as a configurable development environment.

What to keep in mind

First, oh-my-pi is still a fast-moving open source project. Commits are frequent, and there are many issues and pull requests, so installation and usage may change.

Second, its capabilities depend heavily on your local environment. LSP, debuggers, Bun, model-provider authentication, terminal setup, and Windows or Unix differences can all affect the experience.

Third, having many built-in tools does not mean every scenario should enable everything. In practice, it is better to enable the tools needed for the task and configure rules, permissions, and workspace boundaries clearly.

Fourth, an AI Agent can write code, but it can also change the wrong code. Even with previews and content-anchored edits, important projects still need version control, tests, and human review.

Summary

The interesting part of oh-my-pi is not that it is another AI terminal shell. It is that it reorganizes the tool layer that often holds AI coding back: file reading, search, editing, LSP, debugging, browser access, subagents, and SDK integration all sit inside one agent workflow.

It is worth watching for people who care about AI coding infrastructure, and for developers comparing different Coding Agent approaches. The competition in AI coding tools is no longer just about model answer quality. It is also about who can connect models reliably to real codebases, real debugging workflows, and real team rules. oh-my-pi is an ambitious open source attempt in that direction.

References:

GitHub project: https://github.com/can1357/oh-my-pi
Official site: https://omp.sh/
SDK documentation: https://omp.sh/docs/sdk

Bun: JavaScript runtime, package manager, test runner, and bundler in one tool

Sun, 17 May 2026 17:42:25 +0800

Bun is an open source JavaScript / TypeScript all-in-one toolchain from oven-sh.

It is not just trying to be a faster Node.js replacement. It puts the runtime, package manager, script runner, test runner, and bundler behind the same bun command. For frontend and Node.js developers, the appeal is simple: install fewer tools, wait less during installation and builds, and complete many common tasks with one command.

Project: https://github.com/oven-sh/bun

Bottom line

Bun is best for people who want to simplify the JavaScript toolchain.

It can:

Run JavaScript, TypeScript, JSX, and TSX.
Act as a Node.js-compatible runtime.
Replace npm / yarn / pnpm for package management.
Run scripts from package.json.
Execute tests.
Bundle frontend or backend code.
Use bunx to run commands from npm packages.
Provide built-in APIs such as Bun.serve, bun:sqlite, Bun.sql, Bun.redis, and Bun.s3.

Its clearest value is developer experience: fast installs, fast startup, unified commands, and TypeScript / JSX that work out of the box.

But Bun is not something every project should switch to immediately. Large Node.js applications, projects with many native extensions, and production services with strict stability requirements still need careful compatibility, build, test, and deployment validation.

What Bun is

According to the official README, Bun is an all-in-one toolkit for JavaScript and TypeScript applications. It ships as a single executable named bun.

At its core is the Bun runtime: a fast JavaScript runtime intended as a drop-in replacement for Node.js. Bun is written in Zig, built on JavaScriptCore, and optimized for startup time and memory usage.

That means you can run:

bun run index.tsx

TypeScript and JSX can run without extra setup.

The same bun command also includes:

test runner
script runner
Node.js-compatible package manager
bundler
package runner

Common commands include:

bun test
bun run start
bun install <pkg>
bunx cowsay 'Hello, world!'

A typical project may otherwise include node, npm, pnpm, tsx, jest, vitest, webpack, esbuild, and ts-node. Bun tries to absorb many of those high-frequency paths into one tool.

Installation

Bun supports Linux, macOS, and Windows on x64 and arm64.

The official install script is:

curl -fsSL https://bun.com/install | bash

Windows:

powershell -c "irm bun.sh/install.ps1 | iex"

It can also be installed through npm:

npm install -g bun

macOS Homebrew:

brew tap oven-sh/bun
brew install bun

Docker:

docker pull oven/bun
docker run --rm --init --ulimit memlock=-1:-1 oven/bun

Linux users should note the kernel requirement. The README strongly recommends Linux kernel 5.6 or newer, with 5.1 as the minimum.

Upgrade to the latest version:

bun upgrade

Upgrade to canary:

bun upgrade --canary

Canary is usually not a good choice for production unless you are validating a new feature or investigating a specific bug.

Why Bun is fast

Bun’s speed comes from several layers.

First, runtime startup is fast. Many CLI tools and development scripts are not limited by long CPU work, but by process startup, module loading, TypeScript transpilation, and dependency resolution. Bun optimizes these paths.

Second, package management is fast. bun install aims to replace npm / yarn / pnpm installation workflows. It uses a global cache and its own lockfile, which can noticeably reduce waiting time in dependency-heavy projects.

Third, TypeScript / JSX work out of the box. Many projects only need to run a .ts or .tsx script. Traditional Node.js setups need tsx, ts-node, Babel, or a build step. Bun can run these files directly and reduces glue tooling.

Fourth, built-in tools reduce process and configuration switching. Testing, scripts, bundling, and runtime execution all live in one tool.

Still, “Bun is fast” does not mean every project will be faster. Real results depend on dependency types, script logic, test framework, build configuration, Node.js API usage, and CI cache strategy.

Package management: replacing npm / yarn / pnpm

Bun can install dependencies directly:

bun install

Add a dependency:

bun add react

Add a development dependency:

bun add -d typescript

Remove a dependency:

bun remove react

Explain why a dependency exists:

bun why react

Security audit:

bun audit

When migrating from npm or pnpm, check:

Whether bun.lock is committed.
Whether CI uses bun install --frozen-lockfile.
Whether private registries and .npmrc are compatible.
Whether workspace behavior matches expectations.
Whether lifecycle scripts introduce security risk.
Whether pnpm-specific monorepo behavior can migrate smoothly.

Small projects can try it directly. Large monorepos should not switch everything at once; start with one package or a non-blocking CI job.

Running scripts and TypeScript

Bun can run scripts from package.json:

bun run start

It can also run files directly:

bun run index.ts
bun run index.tsx

This is useful for tool scripts such as:

scripts/build.ts
scripts/seed.ts
scripts/migrate.ts
scripts/check.ts

With Node.js, these often require a TypeScript loader or precompilation. Bun’s direct execution makes those scripts lighter.

If a script depends on Node.js-specific behavior, especially loaders, ESM/CJS edges, native modules, child processes, file watching, or edge APIs, it still needs testing.

Test runner

Bun includes a test runner:

`bun test`

It is suitable for small projects that want less Jest / Vitest configuration, and for moving some unit tests, tool tests, or library tests to a lighter runner.

When migrating tests, pay attention to:

expect behavior.
mock API differences.
snapshot behavior.
DOM test environment.
test discovery rules.
coverage output.
CI reporter support.

If a project deeply depends on Jest features, such as custom matchers, complex mocks, jsdom, babel-jest, or ts-jest, migration should be gradual. Let new modules use bun test while old suites remain on the existing framework.

Bundling and builds

Bun also provides a bundler:

`bun build ./src/index.ts --outdir ./dist`

It can be used for frontend bundles, backend scripts, CLI tools, and libraries. Bun’s docs also cover loaders, plugins, macros, CSS, HTML, HMR, minifier, and single-file executables.

Good early candidates:

Small frontend tools.
Node.js CLI projects.
Internal scripts.
Single-file services.
Projects that do not depend on complex webpack loaders.

Be careful with:

Complex webpack plugin chains.
Large Vite plugin setups.
Deep Babel transformations.
Special CSS / asset pipelines.
Micro-frontends and module federation.
Projects with strict requirements for hashes, chunks, and compatibility.

Bun’s bundler is attractive, but migrating build tools is usually riskier than changing the package manager. Validate it separately.

Running HTTP services

Bun provides Bun.serve for HTTP services:

Bun.serve({
  port: 3000,
  fetch(req) {
    return new Response("Hello from Bun")
  },
})

This is convenient for small APIs, internal services, webhook receivers, and edge-style services. Bun also provides APIs such as WebSockets, Workers, Streams, SQLite, PostgreSQL, Redis, S3, and TCP/UDP sockets.
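
Because the `fetch` handler receives a standard `Request` and returns a standard `Response`, it can be written as a plain function and exercised without starting a server. The sketch below assumes two illustrative routes, `/health` and `/webhook`, which are not from any real service.

```typescript
// A fetch-style handler written as a plain function so it can be tested directly.
// The routes are illustrative assumptions.
function handle(req: Request): Response {
  const { pathname } = new URL(req.url);

  if (pathname === "/health") {
    return new Response("ok");
  }
  if (pathname === "/webhook") {
    // Webhook receivers normally accept POST only.
    if (req.method !== "POST") {
      return new Response("Method Not Allowed", {
        status: 405,
        headers: { Allow: "POST" },
      });
    }
    return new Response("accepted", { status: 202 });
  }
  return new Response("Not Found", { status: 404 });
}

// Under Bun, the same function can be passed to the server:
// Bun.serve({ port: 3000, fetch: handle });
```

Keeping the handler separate from `Bun.serve` also makes it easier to move the same logic to another runtime or framework later.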

If you already use Express, Fastify, NestJS, Next.js, Hono, Elysia, or another framework, check their Bun compatibility before rewriting anything.

A practical path is:

Use Bun first for development scripts and package management.
Then try Bun for tests.
Only later evaluate whether the production runtime should move to Bun.

Runtime migration needs the most care because it directly affects production behavior.

Relationship with Node.js

One of Bun’s goals is to be a drop-in replacement for Node.js, but compatibility is not the same as complete equivalence.

The Node.js ecosystem has accumulated many subtle behaviors around:

CJS / ESM interop.
Node built-in modules.
Native extensions.
npm lifecycle scripts.
Filesystem edge behavior.
stream and Buffer details.
worker / child_process.
debugging and profiling.

Bun is improving compatibility quickly, but production migration should be judged by tests.

Practical questions:

Do your tests pass under Bun?
Do key dependencies support Bun?
Are build artifacts equivalent?
Are CI and local behavior consistent?
Do you have monitoring and rollback in production?
Are Docker images and deployment scripts stable?
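
When comparing CI and local behavior, it helps to log which runtime actually executed a script. A minimal sketch, assuming the script has access to `process.versions`: Bun adds a `bun` entry to that map and also reports a Node.js compatibility version, so the `bun` key has to be checked first. The function name is hypothetical.

```typescript
// Report which runtime is executing a script, based on a versions map.
type Versions = Record<string, string | undefined>;

function describeRuntime(versions: Versions): string {
  // Check for Bun first: Bun also fills in a `node` version for compatibility.
  if (versions.bun !== undefined) {
    return `bun ${versions.bun}`;
  }
  if (versions.node !== undefined) {
    return `node ${versions.node}`;
  }
  return "unknown";
}

// Typical use at the top of a script or CI step:
// console.log(describeRuntime(process.versions));
```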

Replacing only the package manager is lower risk. Replacing the production runtime is much higher risk.

Suitable projects

Bun fits well for:

New small JavaScript / TypeScript projects.
Internal tools and scripts.
CLI projects.
CI jobs that need faster dependency installation.
Frontend projects that want less toolchain complexity.
Scripts and services where startup speed matters.
Developers who want TypeScript to run out of the box.

Use more caution with:

Very large monorepos.
Projects deeply tied to pnpm workspace behavior.
Services with many Node.js native extensions.
Frontend builds with highly customized pipelines.
Backends that require strict production runtime consistency.
Test suites deeply tied to Jest.

A conservative but practical strategy is to treat Bun as a development tool first, not as an immediate replacement for every production runtime.

Migration advice

To try Bun in an existing project:

Install Bun locally.
Run bun install and inspect dependency installation.
Commit or temporarily keep bun.lock to avoid lockfile confusion.
Try bun run <script> on common scripts.
Migrate a small number of tests with bun test.
Add a non-blocking Bun job in CI.
If compatibility looks good, consider changing the main install flow.
Evaluate production runtime migration last.

For team projects, keep a rollback path:

Preserve the old npm / pnpm / yarn flow during migration.
Run both flows in CI for a while.
Do not change runtime, package manager, test framework, and bundler in the same change.
Split migration into small, verifiable steps.

This is slower, but it avoids mixing too many problems together.

Common commands

Install:

`curl -fsSL https://bun.com/install | bash`

Upgrade:

`bun upgrade`

Install dependencies:

`bun install`

Add a dependency:

`bun add lodash`

Run a script:

`bun run dev`

Run TypeScript directly:

`bun run scripts/build.ts`

Test:

`bun test`

Bundle:

`bun build ./src/index.ts --outdir ./dist`

Run a package command:

`bunx cowsay 'Hello, world!'`

Summary

Bun’s value is not just “faster than Node.js.” More importantly, it pulls many scattered JavaScript / TypeScript development tools into one bun command: runtime, package manager, script runner, test runner, and bundler.

For new projects and internal tools, this integrated experience can be pleasant: fast installs, fast startup, less configuration, and direct TypeScript / JSX execution. For large existing projects, Bun is better introduced through low-risk areas first, such as package installation, scripts, and selected tests, before validating builds and runtime behavior.

If slow dependency installs, fragmented configuration, and slow test startup in the Node.js toolchain often bother you, Bun is worth a serious try. But production migration still comes back to basic engineering judgment: do tests pass, are dependencies compatible, is CI stable, and can you roll back?

Godot Game Development Beginner Guide: From Nodes and Scenes to Your First 2D Game

Sun, 17 May 2026 12:37:30 +0800

Godot is an open-source game engine suitable for 2D games, independent prototypes, and medium-scale 3D projects.

Its biggest strengths are that it is lightweight, open source, fast to start, and built around a clear node-and-scene model. For beginners, Godot is usually easier to approach than Unity. For independent developers, it is also easier to keep projects small and controllable.

This guide keeps the path simple: understand nodes and scenes, learn basic GDScript, then build a small 2D game that connects input, physics, collision, UI, audio, and export.

Bottom line

Do not start by learning every engine feature.

A better path is:

Start with 2D, not 3D.
Understand nodes and scenes before complex architecture.
Learn GDScript first, not C#.
Build one small game that can start, fail, and restart.
Then add animation, audio, UI, levels, and export.

Completing one tiny game teaches more than watching many unrelated tutorials.

Who Godot is good for

Godot is a strong fit if you:

want to learn game development from zero,
want to build 2D indie games,
need fast gameplay prototypes,
do not want a heavy commercial engine workflow,
want to understand how engines organize game objects,
have some Python or JavaScript experience and want an easy scripting language.

Unity and Unreal still have advantages for large commercial pipelines, asset stores, mobile monetization SDKs, or high-end 3D visuals. But for learning and making your own games, Godot is an excellent starting point.

Installation and project creation

Godot is simple to install: download it from the official website, unzip it, and run it.

For the first project:

Use the default renderer.
Use an English project name.
Avoid paths with too many spaces or non-ASCII characters.
Use Git from the beginning.

A simple project name:

`first-godot-game`

Learn the editor areas first: Scene tree, FileSystem, Inspector, Script editor, and 2D / 3D viewport.

Core idea: nodes and scenes

The two most important Godot concepts are nodes and scenes.

Nodes are functional units:

Node2D: base 2D object.
Sprite2D: displays an image.
CollisionShape2D: collision shape.
CharacterBody2D: controllable character.
Camera2D: 2D camera.
AudioStreamPlayer: audio playback.
Label: text display.

A scene is a reusable group of nodes. A player can be a scene, an enemy can be a scene, and a level can be a scene.

For example:

Player (CharacterBody2D)
├── Sprite2D
├── CollisionShape2D
└── Camera2D

The player handles movement, Sprite2D displays the image, CollisionShape2D handles collision, and Camera2D follows the player.

What to build first

Do not start with an RPG, open world, networking, or complex platformer.

Build a tiny 2D dodging game:

The player moves up, down, left, and right.
Enemies spawn from screen edges.
Touching an enemy means failure.
Survival time becomes score.
There is a start button, game-over screen, and restart.

This small project covers movement, input, collision, spawning, UI, timers, audio, and scene reloads.

Player movement

Use CharacterBody2D for a controllable player.

Attach a script to Player:

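extends CharacterBody2D

# Movement speed in pixels per second; editable in the Inspector.
@export var speed := 300.0

func _physics_process(_delta: float) -> void:
    # get_vector combines the four actions and limits the result to length 1,
    # so diagonal movement is not faster and analog stick input keeps its strength.
    var direction := Input.get_vector("move_left", "move_right", "move_up", "move_down")

    # move_and_slide() reads the built-in velocity property and handles delta internally.
    velocity = direction * speed
    move_and_slide()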

Then define input actions:

move_left  -> A / Left
move_right -> D / Right
move_up    -> W / Up
move_down  -> S / Down

Use input actions instead of hard-coded key codes. It makes keyboard, gamepad, touch, and remapping easier later.

Collision and physics

Common 2D nodes:

CollisionShape2D: defines collision shape.
Area2D: detects overlap and triggers.
CharacterBody2D: player or controlled character.
RigidBody2D: physics-driven object.
StaticBody2D: walls, floors, fixed obstacles.

For beginners: if you control movement manually, use CharacterBody2D. If you only need to detect contact, use Area2D.

Example:

func _on_body_entered(body):
    if body.name == "Player":
        get_tree().reload_current_scene()

Later, replace direct reloads with a game_over signal. For the first project, getting the loop working is more important.

Instancing enemies

Godot scenes can be used like prefabs.

If you have Enemy.tscn, spawn it from the main scene:

@export var enemy_scene: PackedScene

func spawn_enemy():
    var enemy = enemy_scene.instantiate()
    enemy.position = Vector2(800, randf_range(50, 550))
    add_child(enemy)

Add a Timer to call spawn_enemy() repeatedly.

UI and score

Use Control nodes for UI, usually placed under a CanvasLayer so the interface does not move with the camera:

CanvasLayer: keeps UI fixed on screen.
Label: text.
Button: button.
Panel: background.
VBoxContainer / HBoxContainer: layout.

Simple survival score:

var score := 0.0

func _process(delta):
    score += delta
    $CanvasLayer/ScoreLabel.text = str(int(score))

Audio and feedback

Even simple games need feedback:

click sounds,
hit sounds,
score sounds,
failure sounds,
screen shake or flash,
button hover / pressed states.

Use AudioStreamPlayer:

`$HitSound.play()`

Functionality is only the first step. Good feedback makes the player feel the game responding.

Project organization

Start with clean folders:

res://
├── scenes/
│   ├── Main.tscn
│   ├── Player.tscn
│   └── Enemy.tscn
├── scripts/
│   ├── player.gd
│   └── enemy.gd
├── assets/
│   ├── sprites/
│   └── audio/
└── ui/
    └── GameOver.tscn

Do not dump everything into the root directory. Small games grow faster than expected.

Common beginner mistakes

Choosing the wrong node type.
Forgetting collision shapes.
Hard-coding keyboard keys.
Putting player, enemies, UI, audio, and level logic into one script.
Starting with a project that is too large.

A tiny finished game is more valuable than an abandoned big idea.

Suggested learning path

Editor basics.
Nodes and scenes.
GDScript basics.
Input actions.
2D movement.
Collision and Area2D.
Timer and spawning.
UI and score.
Audio and animation.
Export to desktop or web.

Godot vs Unity

Godot is comfortable for 2D indie games, teaching projects, and personal prototypes.

Unity still has stronger asset stores, commercial plugins, mobile ads, monetization SDKs, and mature team workflows.

Unreal is better for high-end 3D and advanced rendering.

For beginners, do not let engine choice become procrastination. The core skills are game loops, input, collision, feedback, scene organization, state management, and version control.

Completion checklist

You have really completed the first step when your game has:

player movement,
enemies or obstacles,
collision failure,
score or objective,
start screen,
game-over screen,
restart,
at least two sounds,
simple animation or visual feedback,
an exported playable build.

Summary

Godot’s entry point is not memorizing every feature. It is understanding how the engine organizes games: nodes form scenes, scenes form games, scripts drive behavior, signals connect events, and resources live in folders.

Start with a small 2D game. Once movement, collision, UI, audio, and restart work, then learn animation trees, TileMap, saving, state machines, 3D, shaders, and export optimization.

DeepSeek-TUI: Turning DeepSeek V4 into a Terminal Coding Agent

Sat, 16 May 2026 22:41:41 +0800

DeepSeek-TUI is an open source project that brings DeepSeek V4 into terminal-based development workflows. It is not just a chat wrapper. It is closer to a “command-line coding agent” like Claude Code or Codex CLI: it can read files, edit code, run commands, call tools, and keep working through tasks in a TUI.

If you already switch between an editor and a terminal, the value of this kind of tool is straightforward: you do not need to copy code back and forth into a web chat window, and you do not need to manually describe the whole project structure. You give it a task, and it can read context from the current workspace, plan steps, make changes, then return the result for your review.

It Solves the Entry Point Problem for DeepSeek

DeepSeek models already provide strong reasoning and coding capabilities, but model capability needs an engineering layer before it can land in real development workflows.

Web chat is suitable for asking questions, but not for long-running project edits. APIs are suitable for system integration, but individual developers still need to build tool calling, context management, file operations, and permission control themselves. DeepSeek-TUI tries to fill this layer: it wraps DeepSeek V4 into an Agent that can work inside the terminal.

According to the project description, its main capabilities include:

A terminal TUI;
Conversation and task execution for DeepSeek V4;
Tool calling and file operations;
1M context support;
Auto mode;
Sub-agents;
Sandboxed execution;
A persistent task queue.

Together, these features are not aimed at making the model sound more human. They are aimed at making the model easier to bring into the development environment.

A TUI Fits Long Tasks Better Than Plain CLI Text

Many AI CLI tools start with plain text interaction: enter a prompt, wait for output, then copy commands or add more context. This is simple, but longer tasks quickly become messy.

The advantage of a TUI is that it can place conversations, files, execution results, and task status in a more stable interface. For a coding Agent, that matters. A code task is rarely a single question and answer. It often includes:

Understanding the project structure;
Finding relevant files;
Editing code;
Running tests or commands;
Fixing issues based on errors;
Summarizing changes.

If the interface is only a stream of logs, it is hard for the user to see where the Agent is in the process. A TUI at least provides a better place to observe and take over.

Auto Mode Is Best for Tasks with Clear Boundaries

The Auto mode mentioned by DeepSeek-TUI is best for tasks with clear boundaries. For example: fixing a small bug, adding a script, changing a configuration, organizing a set of documents, or implementing a local feature.

These tasks have something in common: the goal is clear, the verification method is clear, and the impact scope is controllable. The Agent can inspect files, edit files, run commands, and then hand the result back to the user for confirmation.

But Auto mode should not mean unlimited permission. In real projects, file deletion, large-scale refactors, database migrations, and deployment commands should all require explicit confirmation. The efficiency of coding Agents comes from automation, but so does the risk. The more a tool can execute commands, the more it needs sandboxing, permission boundaries, and human review.
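
The idea of a permission boundary can be sketched as a small gate that decides whether a command may run automatically or needs explicit confirmation. This is a hypothetical illustration, not DeepSeek-TUI's actual rules: the pattern list and function name are assumptions.

```typescript
// Hypothetical approval gate for an agent's shell commands.
// The patterns are examples only and are far from complete.
const RISKY_PATTERNS: RegExp[] = [
  /\brm\s+-[a-z]*r/i,             // recursive deletes
  /\bgit\s+push\b.*--force/,      // history rewrites on a remote
  /\bgit\s+reset\s+--hard\b/,     // discards local changes
  /\b(drop|truncate)\s+table\b/i, // destructive SQL
  /\b(migrate|deploy)\b/i,        // migrations and deployments
];

type Decision = "auto" | "confirm";

function classifyCommand(command: string): Decision {
  return RISKY_PATTERNS.some((pattern) => pattern.test(command)) ? "confirm" : "auto";
}
```

A denylist like this is easy to bypass, so a real tool would more likely combine an allowlist of known-safe commands with sandboxing and treat everything else as needing confirmation.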

Sub-Agents Matter Because They Split Tasks

Sub-agents are not a new concept, but they are useful in coding scenarios.

A moderately complex task usually requires several kinds of work at the same time: someone reads the code, someone changes the implementation, someone checks tests, and someone organizes documentation. Traditional multi-agent systems often feel ornamental because they have no real tools or real workspace; they only discuss inside a conversation.

If sub-agents can work with the file system, command execution, and task queues, they become more like a task decomposition mechanism. For example, one sub-agent can analyze dependencies, another can modify a specific module, and the main agent can integrate the result. This can reduce the problem of putting too much unrelated information into one context.

Of course, sub-agents also add cost: more tokens, more complex state, and responsibility boundaries that are harder to track. They are better suited to medium-complexity tasks and above, not necessarily every small edit.

1M Context Is Not Magic, but It Helps with Projects

1M context sounds exaggerated, but in coding scenarios it is not just a marketing number.

The context of a real codebase is fragmented: README files, configuration files, type definitions, tests, call chains, historical conventions, and error logs can all affect one change. Longer context can reduce the problem of editing after seeing only a local fragment, and it can help the model retain more project constraints.

Still, longer context does not automatically mean better judgment. Code tasks still need retrieval, filtering, and verification. Putting an entire project into context is not necessarily better than reading the relevant files precisely. A good coding Agent should treat long context as a buffer, not as a shortcut that replaces engineering judgment.

Who It Is Best For

DeepSeek-TUI is best suited to:

Developers who want to use DeepSeek for coding tasks in the terminal;
People who do not want to build tool calling and file operation frameworks themselves;
Users familiar with Claude Code or Codex CLI who want to try a DeepSeek-based entry point;
People who need local project context instead of only asking about code snippets in a web page;
Developers who want to put AI coding workflows into a command-line environment.

If you only occasionally ask how to write a function, web chat is enough. If you want the model to participate directly in project edits, a terminal Agent becomes more meaningful.

Risks to Watch

There are three things to watch most closely with this kind of tool.

The first is permissions. As long as a tool can read and write files or execute commands, you need to know what it can access by default, whether it can delete files, whether it can access the network, and whether dangerous commands require confirmation.

The second is rollback. Before using it, it is best to keep the Git working tree clean, so every Agent change can be clearly seen through git diff. Do not let an Agent automatically edit a project while many unrelated changes are already uncommitted.
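
A clean working tree can be checked mechanically before handing a task to an Agent. The sketch below parses the text printed by `git status --porcelain`, where each line starts with two status columns and a space followed by the path. The function names are hypothetical, and running the git command itself is left to the caller.

```typescript
// Parse the output of `git status --porcelain` to find changed or untracked paths.
// An empty output means there are no staged, unstaged, or untracked changes.
function dirtyPaths(porcelainOutput: string): string[] {
  return porcelainOutput
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => line.slice(3)); // drop the two status columns and the space
}

function isCleanWorkingTree(porcelainOutput: string): boolean {
  return dirtyPaths(porcelainOutput).length === 0;
}
```

If `isCleanWorkingTree` returns false, commit or stash first, so that everything in the later `git diff` belongs to the Agent.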

The third is verification. Code written by an Agent does not mean the task is complete. Tests, builds, linting, and human review still need to remain. AI coding tools can speed up progress, but they cannot replace final engineering confirmation.

Conclusion

The significance of DeepSeek-TUI is not that it adds another chat client. It puts DeepSeek V4 into a terminal environment that is closer to real development work.

For developers, model capability is only the first step. The real experience depends on whether it can read a project, safely edit files, run verification commands, maintain state in long tasks, and let the user take over at any time.

If you want to use DeepSeek for daily code changes, project reading, and automated development tasks, DeepSeek-TUI is worth watching. The direction is also clear: AI coding tools are moving from “answering code questions” to “participating in project execution.”

Codex Mobile Remote Access: Use the ChatGPT App to Follow Coding Tasks on Your Mac

Sat, 16 May 2026 17:42:40 +0800

In mid-May 2026, OpenAI brought Codex remote access into the ChatGPT mobile app. The point is not to write code on a phone. It is to let you follow and steer Codex while it keeps working on a Mac.

Think of it as a mobile approval and monitoring surface: Codex still reads the project, runs commands, edits files, and checks test results on the computer; the phone is used to review progress, answer questions, add instructions, and approve actions.

For people who often let Codex run longer tasks, this is useful. You no longer need to sit in front of the Mac waiting for Codex to ask for approval or get stuck.

What it can do

According to OpenAI’s Codex remote connections documentation, mobile access can:

start new threads in projects on the host, or continue existing ones;
send follow-up instructions, answer questions, and steer active work;
approve commands and other actions;
review outputs, diffs, test results, terminal output, and screenshots;
receive notifications when Codex completes a task or needs attention;
switch between connected hosts and threads.

So the mobile app is not just a small chat box. It connects to the actual Codex work context on the host.

Requirements

First, you need Codex access in the ChatGPT account and workspace you want to use. The phone and Mac must use the same account and workspace.

Second, install the latest ChatGPT mobile app on iOS or Android. If Codex does not appear in the app, update ChatGPT first.

Third, the host currently needs to be a Mac that is awake, online, running the Codex App, and signed in to the same account and workspace. OpenAI’s documentation says mobile setup and device control currently require Codex App for macOS; setup is not available from Codex CLI or the IDE Extension.

Fourth, complete any required MFA, SSO, or passkey flow. In a ChatGPT workspace, an admin may also need to enable Remote Control access.

This makes the feature a mobile control layer for Codex App on macOS, not a generic remote desktop or a universal Codex connection for every environment.

Limits of Codex mobile remote access

The feature is convenient, but the limits matter.

First, it currently needs a macOS host. The phone connects to Codex App running on a Mac, not directly to Codex CLI, the IDE Extension, or any Linux / Windows development machine.

Second, the host must stay online. The Mac needs to remain awake, connected to the network, and running Codex App. If it sleeps, loses network access, or closes Codex, the remote session can disconnect.

Third, connection uses a QR-code setup flow. You start Set up Codex mobile on the Mac, scan the QR code with your phone, and finish setup in ChatGPT. It is not a simple “enter host address” flow.

Fourth, remote actions still go through approvals. You can approve commands and actions from the phone, but you should read what Codex is asking to do before confirming, especially for terminal commands, file edits, tests, and external access.

In short, it is for following up after you leave the computer. It is not a full development environment replacement and should not be treated as unattended remote control.

How to connect

Start from Codex App on the Mac:

Open Codex on the Mac.
Select Set up Codex mobile in the sidebar.
Codex enables remote access for this host and shows a QR code.
Scan the QR code with your phone to open the Codex mobile setup flow in ChatGPT.
Confirm the same ChatGPT account and workspace.
Complete any required MFA, SSO, or passkey verification.
After setup succeeds, the Mac appears in Codex on your phone.

After connection, use Settings > Connections in Codex on the Mac to manage connected devices. You can also configure whether the computer stays awake, whether Computer Use is enabled, and whether the Chrome extension is installed.

What the phone is good for

The phone is best for approvals, course corrections, and result review.

Approvals are the obvious case: Codex asks to run a command or continue an action, and you can decide from the phone.

Course correction is just as useful. If Codex misunderstood the task, chose the wrong direction, or hit a failing test, you can send a short instruction and let it continue.

Result review is the third case. You can inspect diffs, test output, terminal logs, and screenshots without returning to the computer.

The value is not “coding on a phone”; it is turning the phone into a small control surface for engineering work that still runs on the host.

Common issues

If the host does not appear on your phone, confirm that Codex App is running on the Mac, Allow other devices to connect is enabled, and both devices use the same ChatGPT account and workspace.

If the approval request does not appear, open the ChatGPT mobile app, go to Codex, scan the QR code again, or restart setup from the host. Workspace users should also confirm that Remote Control access is enabled by an admin.

If the remote session disconnects, check whether the Mac slept, lost network access, or closed Codex App.

If authentication blocks setup, complete MFA, SSO, or passkey prompts first. In enterprise environments, workspace permissions may require admin help.

Best use cases

It fits users who run longer Codex coding tasks, want to approve commands away from the desk, manage multiple projects or threads, and already use a Mac as the main development machine.

It is less useful if you mainly use Windows or Linux, only use Codex CLI or an IDE Extension, expect the phone to be an independent development environment, or work on an unstable network.

My take

Codex mobile remote access is not about moving development to a phone. It is about making the waiting time around Codex more manageable.

Previously, long Codex tasks often stopped at approval, clarification, failing tests, or direction changes. Now those moments can be handled from the ChatGPT mobile app, while the Mac continues to do the actual engineering work.

If you already use Codex heavily on a Mac, this is worth enabling. If you only ask occasional coding questions, the value will be less obvious.

Claude Code + Ollama Local Deployment Guide: Build a Free AI Coding Assistant with CC Switch

Fri, 15 May 2026 23:27:50 +0800

Claude Code has become a popular AI coding assistant recently. Its appeal is not just that it can chat about code, but that it can read a project, modify files, run commands, install dependencies, and keep fixing errors in an agent-like workflow.

The hard part is cost. Once a project grows, long context and repeated agent turns can burn through API quota quickly. If you just want to experiment, refactor small utilities, generate scripts, or work on a private local project, it is natural to ask: can Claude Code’s workflow be kept while the model runs locally?

The key tool in this setup is CC Switch. It lets Claude Code connect to the local Ollama service through an OpenAI-compatible API endpoint, so requests can be forwarded to a local model instead of the official Claude API.

What This Setup Solves

You can think of the whole setup as:

Claude Code desktop
+ CC Switch API forwarding layer
+ Ollama local model

Claude Code is still responsible for the coding workflow and project operations. CC Switch handles model provider configuration and API compatibility. Ollama runs the model locally.

This does not make a local model suddenly become Claude. Its real value is that it makes Claude Code’s agent workflow usable in lower-cost, offline, and private local scenarios.

Basic Preparation

Before you start, prepare these pieces:

Install Git.
Install Ollama.
Pull a local model suitable for coding.
Install CC Switch.
Have Claude Code available on your machine.

For the model side, you can start with coding-oriented models, such as Qwen Coder, DeepSeek Coder, or other models with decent tool-calling and code generation behavior. The larger the model, the better the result may be, but memory and GPU pressure will also rise.

If your machine only has limited memory, start with a smaller model first. Confirm that the workflow runs smoothly before trying a larger one.

Key CC Switch Configuration

After Ollama starts, its default local API address is usually:

`http://127.0.0.1:11434/v1`

In CC Switch, choose an OpenAI-compatible provider type, commonly:

`OpenAI Chat Completions`

Then point the base URL to Ollama’s local address.
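
Before wiring up CC Switch, it can help to confirm that the local endpoint answers a plain Chat Completions request. The sketch below only builds the request. The model name `qwen2.5-coder` is an assumption; replace it with a model shown by `ollama list` on your machine.

```typescript
// Build a Chat Completions request for a local OpenAI-compatible endpoint.
// The model name used below is an example, not a requirement.
interface ChatRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

function buildChatRequest(baseUrl: string, model: string, prompt: string): ChatRequest {
  return {
    // Trim trailing slashes so the path is joined exactly once.
    url: `${baseUrl.replace(/\/+$/, "")}/chat/completions`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model,
        messages: [{ role: "user", content: prompt }],
        stream: false,
      }),
    },
  };
}

const request = buildChatRequest("http://127.0.0.1:11434/v1", "qwen2.5-coder", "Say hello");
// To send it: const res = await fetch(request.url, request.init);
```

If this request fails, the problem is in Ollama or the model name rather than in CC Switch, which narrows down later troubleshooting.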

For the API key field, local Ollama normally does not need a real key, but many tools still require an environment variable or placeholder. You can use:

`ANTHROPIC_API_KEY`

or another placeholder variable accepted by your local setup.

One configuration item is worth special attention:

`"inferenceModels"="[\"haiku\",\"sonnet\",\"opus\"]"`

This means mapping Claude Code’s expected model roles to the local provider. In practice, you need to bind haiku, sonnet, and opus to the model names exposed by Ollama or CC Switch. If this mapping is wrong, Claude Code may fail to call the model or may keep falling back to an unexpected configuration.
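
The role mapping can be pictured as a lookup table that should fail loudly when a role has no local model. The sketch below is a hypothetical illustration of that idea, not CC Switch's configuration schema. The local model names are examples only.

```typescript
// Hypothetical role-to-model table; the local model names are examples only.
type Role = "haiku" | "sonnet" | "opus";
type RoleMapping = Partial<Record<Role, string>>;

const localModels: RoleMapping = {
  haiku: "qwen2.5-coder:7b",
  sonnet: "qwen2.5-coder:14b",
  // opus is intentionally left unmapped to show the failure case
};

// Return the local model for a role, or throw so a bad mapping is noticed early.
function resolveModel(role: Role, mapping: RoleMapping): string {
  const model = mapping[role];
  if (!model) {
    throw new Error(`No local model mapped for role "${role}"`);
  }
  return model;
}

// List the roles that still need a model before Claude Code is started.
function unmappedRoles(mapping: RoleMapping): Role[] {
  const roles: Role[] = ["haiku", "sonnet", "opus"];
  return roles.filter((role) => !mapping[role]);
}
```

Mapping all three roles to the same local model is a reasonable starting point on a machine that can only hold one model in memory.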

Where Claude Code Is Strong

Claude Code’s biggest advantage is not raw completion. It is the full coding workflow:

reading and understanding project structure;
locating related files based on a task;
editing code directly;
running commands and tests;
observing errors and iterating;
completing multi-step tasks in one session.

This is why many people want to keep Claude Code even when switching to a local model. A normal chat UI can generate code snippets, but it does not naturally operate inside a repository. Claude Code is closer to an executable development assistant.

What Role Ollama Plays Here

Ollama is responsible for local model runtime and management. It handles model downloading, loading, and local inference.

The advantage is clear: requests stay on your machine, repeated use does not create API bills, and you can use it when the network is limited. For private code, this is also easier to accept than sending every context window to a cloud model.

The trade-off is also clear. Local models depend heavily on your hardware and on model quality. A smaller model can handle simple edits, explanations, and script generation, but it may struggle with large cross-file refactors or subtle architectural decisions.

Where The Experience Has Boundaries

This setup should not be treated as a full replacement for Claude’s strongest cloud models.

You may run into these issues:

weaker long-context understanding;
unstable tool-calling behavior in complex tasks;
slower inference on CPU-only machines;
more hallucinated file paths or APIs;
less reliable multi-round planning;
lower success rate on large repository refactors.

So the better expectation is: use it as a free local development assistant, not as a perfect substitute for a top-tier cloud model.

Multimodal Compatibility Is Still Unstable

Some users want Claude Code to handle screenshots, UI images, diagrams, or other multimodal inputs. This part depends on the local model and the forwarding layer.

If the selected Ollama model does not support vision, or CC Switch does not translate the request format correctly, multimodal features may fail. Even with a vision model, behavior may differ from Claude’s official API.

For now, this setup is more suitable for text and code workflows. Treat multimodal support as experimental.

Who Should Try It

This setup is suitable for:

developers who want to try Claude Code’s workflow at low cost;
users who frequently write scripts, small tools, and automation snippets;
teams that want to keep code on local machines;
learners who want an AI coding assistant without constant API spend;
people testing different local coding models.

It is less suitable if you rely heavily on long context, large monorepos, strict code review quality, or complex full-project refactors.

Usage Advice

Start with small tasks.

For example:

explain a single file;
refactor a small function;
generate a shell script;
fix a simple error;
add a small feature;
write unit tests for a narrow module.

After each change, run tests or at least review the diff yourself. A local model can be useful, but you should not blindly accept every generated edit.

If the model keeps losing context, reduce the task scope. Instead of asking it to “refactor the whole project”, ask it to “refactor this function” or “add validation in this file”.

Summary

Claude Code + CC Switch + Ollama is an interesting combination. It keeps Claude Code’s agent-style development workflow while moving inference to a local model.

Its biggest strengths are lower cost, local privacy, and a smooth development workflow. Its limits are also obvious: model quality, hardware performance, long context, and tool-calling stability all affect the final experience.

If you already use Ollama and want a more practical local AI coding workflow, this setup is worth trying. Just remember to start small, verify every change, and treat the local model as an assistant rather than an automatic engineer.

Prompt-Vault: a prompt specification library for testing AI coding ability

Fri, 15 May 2026 09:00:52 +0800

w512/Prompt-Vault is a small but useful prompt repository. It does not collect magic prompts; it organizes executable coding prompts into difficulty levels so they can be used to test LLMs and coding agents.

Project: https://github.com/w512/Prompt-Vault

The repository is small, but the structure is clear: Easy, Medium, and Hard. Each Markdown file is a standalone task. The README also says these prompts are suitable for testing language models or practicing small projects.

Not a prompt scrapbook

Many prompt repositories look large but are hard to evaluate. The titles are attractive, but the prompts lack acceptance criteria.

Prompt-Vault is closer to a specification library. Each task tries to describe:

What app to build
Required features
UI style
Technical constraints
Whether it must run as a single file
Whether dependencies are allowed
Whether data should persist

This is much better for testing models than “make a nice Kanban board”, because it reveals whether the model truly understands requirements.

Easy: basic interaction

Easy/Bubble_Sort_Visualizer.md asks for a single-file index.html that visualizes bubble sort with bars, start/reset buttons, a speed slider, comparison count, and a dark theme.

It tests whether a model can connect algorithm state to UI, control animation timing, handle reset and running states, and keep the code readable.

Easy/ToDo_List.md starts from static HTML and gradually adds task creation, completed state, deletion, counters, Active / Completed stats, and localStorage.

It is a simple task, but it tests whether a model can evolve code step by step instead of dumping a messy implementation.

Medium: state and animation complexity

Medium/Sorting_Visualization.md upgrades the challenge. The same page must support Bubble Sort, Insertion Sort, Selection Sort, Merge Sort, Quick Sort, and Heap Sort.

It also needs algorithm selection, speed and size sliders, reset, start / pause, and a live stats panel.

This catches many failures: an agent may implement one bubble sort animation, but multiple algorithms plus pause/resume and stats often break state management.

Useful checks include:

Does every algorithm really sort?
Does the animation match the algorithm steps?
Can it pause and resume?
Does reset stop old animation loops?
Does changing array size break state?
Are the statistics credible?
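The first check, whether every algorithm really sorts, is easy to automate. The prompts target a browser page, so the Python below is only a sketch of the idea: `bubble_sort` stands in for whatever implementation the model generated, and the trial count and value ranges are arbitrary choices.

```python
import random
from typing import Callable, List


def bubble_sort(values: List[int]) -> List[int]:
    """Stand-in for a generated implementation; returns a sorted copy."""
    result = list(values)
    for end in range(len(result) - 1, 0, -1):
        for i in range(end):
            if result[i] > result[i + 1]:
                result[i], result[i + 1] = result[i + 1], result[i]
    return result


def really_sorts(
    sort_fn: Callable[[List[int]], List[int]], trials: int = 200, seed: int = 0
) -> bool:
    """Compare sort_fn against the built-in sorted() on random inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        data = [rng.randint(-50, 50) for _ in range(rng.randint(0, 30))]
        if sort_fn(data) != sorted(data):
            return False
    return True
```

Running the same check against all six algorithms catches the common case where the animation looks plausible but the final array is not actually sorted.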

Hard: product completeness

Hard/Kanban_Board.md asks for a complete board: default columns, custom columns, double-click rename, delete empty columns, cards with title and description, priority, deadline, drag-and-drop, search, priority filter, localStorage, footer stats, glassmorphism dark theme, and responsive horizontal scrolling.

This tests product completeness, not just one feature.

Hard/Markdown_Editor_Desktop.md asks for a Tauri 2 cross-platform Markdown editor. It includes split editing and preview, sync scrolling, live rendering, preview mode, focus mode, open/save/save-as, unsaved title markers, formatting toolbar, shortcuts, themes, font settings, Vue 3, Pinia, marked.js, prism.js, and Tauri plugins.

This is no longer a simple web prompt. It tests frontend state, Tauri plugins, filesystem permissions, IPC boundaries, and desktop packaging.

Why it is valuable

Prompt-Vault is valuable because it provides reusable evaluation samples.

If you compare models or coding agents, you can run the same prompt repeatedly and observe:

Which model follows constraints
Which model misses fewer features
Which model handles edge cases
Which output is easier to maintain
Which model is better at UI details
Which model is stable under single-file constraints

This is more reliable than “it feels smarter”.

Frontend tasks are especially useful because many failures are not syntax errors. They are missing button states, broken animation, lost persistence, wrong drag targets, or stale statistics.

How to extend it

The repository could become a stronger benchmark by adding acceptance checklists, failure cases, scoring dimensions, reference implementations, and cross-model result records.

For example, a sorting task should include checks such as “rapid Start / Reset clicks must not create multiple animation loops.” A Kanban task should specify what happens when deleting a non-empty column.

These details make the prompt useful for human review and automated agent evaluation.

Suggested use

To test an AI coding tool:

Give one prompt unchanged.
Do not add extra hints.
Run the generated result.
Check features one by one.
Record missing features and bugs.
Give one repair round.
Compare time, token cost, and final code quality.

This is closer to real development than simply checking whether a page appears.
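To keep runs comparable, it helps to record each one in the same shape. The following Python sketch is one possible record format; the field names, the tool name, and the sample bug are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PromptRun:
    tool: str
    prompt_file: str
    checklist: Dict[str, bool] = field(default_factory=dict)  # feature -> works?
    bugs: List[str] = field(default_factory=list)
    repair_rounds: int = 0

    def feature_coverage(self) -> float:
        """Fraction of checklist items that passed."""
        if not self.checklist:
            return 0.0
        return sum(self.checklist.values()) / len(self.checklist)

    def missing_features(self) -> List[str]:
        return [name for name, ok in self.checklist.items() if not ok]


# Made-up sample data for a single run.
run = PromptRun(
    tool="agent-a",
    prompt_file="Medium/Sorting_Visualization.md",
    checklist={
        "every algorithm sorts": True,
        "pause and resume": True,
        "reset stops old loops": False,
        "stats are credible": True,
    },
    bugs=["Reset during Merge Sort leaves a second loop running"],
    repair_rounds=1,
)
```

Here `run.feature_coverage()` is 0.75 and `run.missing_features()` names the one failed check, which is easier to compare across tools than a general impression.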

Summary

Prompt-Vault is a lightweight prompt specification library. It is useful for AI coding tests and for frontend practice projects.

It reminds us that a good coding prompt is not just a wish. It should define requirements, constraints, interactions, state, acceptance, and run mode.

If you compare Codex, Claude Code, Cursor, Gemini CLI, or other coding agents, this kind of leveled prompt is worth keeping.

What is Token Efficiency? DeepSeek V4, big-model planning, and small-model execution

Fri, 15 May 2026 08:59:33 +0800

The next important metric for AI coding may not be who has the strongest model, but who can complete more verifiable work with fewer tokens, lower cost, and a more stable process.

That is the value of Token Efficiency.

Many people hear Token Efficiency and think only about cheaper models, longer context, or cheaper cache hits. Those are base conditions. Real productivity comes from model division of labor, task orchestration, context budgeting, and evaluation.

In other words, Token Efficiency is not a cost-saving trick. It is an engineering method for turning tokens into output.

DeepSeek V4: productizing the split between planner and executor

The background often missing from this discussion is how DeepSeek V4 is positioned.

DeepSeek V4 is not just another stronger model. It splits the two capabilities needed for Token Efficiency into V4 Pro and V4 Flash: V4 Pro is better suited for planning, reasoning, architecture judgment, and critical review, while V4 Flash fits high-frequency execution, batch rewriting, code completion, data organization, and ordinary agent-loop nodes.

That maps directly to two roles in AI coding:

V4 Pro: planner / consultant for requirement breakdown, technical design, complex bug analysis, architecture review, and final acceptance.
V4 Flash: executor for file scanning, simple implementation, test completion, documentation, candidate generation, and repetitive work.

DeepSeek’s API documentation shows that both V4 Flash and V4 Pro support 1M context, JSON Output, Tool Calls, Chat Prefix Completion, and FIM Completion. The pricing page also prices cache-hit input separately and notes that input cache-hit prices have been reduced to one tenth of the launch price.

Together, these are why it matters for Token Efficiency: 1M context reduces compression in complex agent tasks; low cache-hit pricing lowers the cost of repeatedly loading prompts, project docs, code, and history; the Flash / Pro split solves the problem of using a flagship model for every step or an unstable small model for every step.

DeepSeek V4 should therefore be understood in three ways:

Cheap execution layer: many agent nodes can run on V4 Flash.
Usable judgment layer: key steps can still call V4 Pro.
Long-chain friendly: 1M context and cache pricing make codebases, docs, and tool history easier to keep in the usable window.

Its significance for AI coding is not that it adds another model option. It offers a realistic cost structure for the “consultant model + executor model + harness orchestration” pattern.
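The effect of cache-hit pricing is easiest to see with arithmetic. The Python sketch below uses placeholder prices per million tokens, not DeepSeek's actual rates; only the shape of the calculation matters.

```python
def request_cost(
    input_tokens: int,
    cache_hit_ratio: float,
    output_tokens: int,
    price_hit: float,
    price_miss: float,
    price_out: float,
) -> float:
    """Cost of one request; all prices are per million tokens."""
    hit_tokens = input_tokens * cache_hit_ratio
    miss_tokens = input_tokens - hit_tokens
    total = hit_tokens * price_hit + miss_tokens * price_miss + output_tokens * price_out
    return total / 1_000_000


# Placeholder prices, chosen only to show a 10x gap between hit and miss.
PRICES = dict(price_hit=0.1, price_miss=1.0, price_out=2.0)

cold = request_cost(200_000, 0.0, 2_000, **PRICES)  # nothing cached yet
warm = request_cost(200_000, 0.9, 2_000, **PRICES)  # 90% of the prompt is cached
```

With these placeholder numbers the cold request costs about 0.204 and the warm one about 0.042. That is why an agent loop that reloads the same project context every step benefits so much from a high cache-hit ratio.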

Do not let the strongest model do everything

The old approach was to pick the smartest model and let it handle requirement analysis, code, tests, and summaries end to end.

That is simple but not always efficient. Many tasks do not need frontier reasoning. Expensive models should behave more like consultants, architects, or planners that appear only at key decision points.

A better structure is:

Big models break down problems and make key decisions.
Small models execute, batch-process, and repeat edits.
Tools and harnesses manage process, state, context, and validation.
Humans define product goals, accept results, and make tradeoffs.

This prevents frontier reasoning from being wasted on mechanical execution.
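In code, the division of labor can start as a plain routing rule. This is a minimal Python sketch: the task names are adapted from the planner and executor roles described earlier, and which tasks count as planner work is an assumption you would tune for your own workflow.

```python
from enum import Enum


class Tier(Enum):
    PLANNER = "planner"    # expensive model, called only at decision points
    EXECUTOR = "executor"  # cheap model, used for bulk and repetitive work


# Assumed split; adjust for your own workflow.
PLANNER_TASKS = {
    "requirement_breakdown",
    "technical_design",
    "complex_bug_analysis",
    "architecture_review",
    "final_acceptance",
}


def choose_tier(task_kind: str) -> Tier:
    """Route a task to the planner only when it is a key decision."""
    return Tier.PLANNER if task_kind in PLANNER_TASKS else Tier.EXECUTOR
```

Defaulting unknown tasks to the executor keeps the expensive model opt-in rather than opt-out.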

Larger context is not always better

Long context matters for coding agents because code, docs, chat history, test output, and logs all consume the window. When the window fills up, compression, forgetting, and misjudgment appear.

But long context does not mean dumping everything into the model.

Token Efficiency means each task should fit inside a clear, controlled context window:

Bring only necessary files.
Include only decision-relevant documents.
Keep only the current state from history.
Give each node clear input and output.
Compress completed work into structured summaries for the next node.

Cheap context can tempt people to include noise. Noise does not make a model smarter.
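A context budget can be enforced mechanically. The Python sketch below packs the most important items first and skips anything that would exceed the budget. The four-characters-per-token estimate is a rough assumption, and a real tokenizer would give different numbers.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ContextItem:
    name: str
    text: str
    priority: int  # lower number = more important


def estimate_tokens(text: str) -> int:
    # Rough heuristic only: about 4 characters per token.
    return max(1, len(text) // 4)


def pack_context(
    items: List[ContextItem], budget_tokens: int
) -> Tuple[List[ContextItem], int]:
    """Greedily add items by priority, skipping any that do not fit."""
    chosen, used = [], 0
    for item in sorted(items, key=lambda i: i.priority):
        cost = estimate_tokens(item.text)
        if used + cost <= budget_tokens:
            chosen.append(item)
            used += cost
    return chosen, used
```

An item that does not fit is dropped rather than truncated, which keeps each node's input predictable. Summarizing oversized items would be a separate step.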

Harness matters more than a single model

Connecting Claude Code, Codex, or another coding agent to a cheap model is not enough. Small models drift in long-chain tasks unless a stronger process controls them.

A harness is a scheduling system. It decides how to split tasks, run nodes, choose models, validate results, retry failures, and pass context.

A useful orchestration system should answer:

Which tasks need planning?
Which tasks can execute directly?
Which nodes can run in parallel?
Which nodes must be serial?
Which nodes use big models or small models?
What is the context budget for each node?
What structured output does each node produce?
Who reviews and decides whether to continue?

Without this software layer, small models are merely cheap. With it, they can become leverage.

Split tasks with DAGs

A good approach is to split complex work into a directed acyclic graph.

A feature task might become:

Requirement clarification
Technical design
Task decomposition
Implementation
Test completion
Code Review
Fixes
PR submission

Each node can be an independent agent with its own role, prompt, tools, permissions, and output format. Nodes should pass structured results, not long chat transcripts.

This makes each node shorter, easier for small models, and easier to measure.
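The feature task above can be written down as a dependency map and scheduled with Python's standard `graphlib` module (Python 3.9+). In this sketch the dependency edges are an assumption: implementation and test completion are treated as parallel work after task decomposition.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# node -> set of nodes that must finish first (assumed edges)
DEPENDENCIES: Dict[str, Set[str]] = {
    "technical_design": {"requirement_clarification"},
    "task_decomposition": {"technical_design"},
    "implementation": {"task_decomposition"},
    "test_completion": {"task_decomposition"},
    "code_review": {"implementation", "test_completion"},
    "fixes": {"code_review"},
    "pr_submission": {"fixes"},
}


def execution_waves(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group nodes into waves; nodes in the same wave can run in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises CycleError if the graph is not acyclic
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = execution_waves(DEPENDENCIES)
```

Each wave answers two of the harness questions at once: nodes inside a wave can run in parallel, and the waves themselves must run in order.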

Run multiple task replicas

When tokens are cheap enough, the same task does not have to run only once.

You can run the same task with different models, prompts, or orchestrations, then pick the best result or merge useful parts. This is suitable for design proposals, copy, test cases, bug hypotheses, refactor options, and code review.

It is not suitable for tasks with external side effects, shared mutable state, or unclear acceptance criteria.

The goal is not gambling. It is collecting comparable samples that can improve orchestration, model selection, and node skills.
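Picking among replicas only works if every candidate is scored the same way. The Python sketch below ranks candidates with a scoring function and keeps the whole ranking, not just the winner. The keyword-based score and the sample texts are invented for illustration; a real score would come from tests or review.

```python
from typing import Callable, List, Tuple

REQUIRED_CHECKS = ("pause", "reset", "resize")  # hypothetical acceptance keywords


def coverage_score(text: str) -> float:
    """Toy score: how many required checks the candidate mentions."""
    lowered = text.lower()
    return float(sum(1 for check in REQUIRED_CHECKS if check in lowered))


def rank_candidates(
    candidates: List[str], score: Callable[[str], float]
) -> List[Tuple[float, str]]:
    """Return (score, candidate) pairs, best first; ties keep input order."""
    if not candidates:
        raise ValueError("need at least one candidate")
    scored = [(score(c), c) for c in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Made-up outputs from three replicas of the same test-planning task.
samples = [
    "Test reset while running.",
    "Test pause, reset, and resize during animation.",
    "Test pause and reset.",
]
ranked = rank_candidates(samples, coverage_score)
best_score, best_text = ranked[0]
```

Keeping the full ranking matters because the losing samples are also data: they show which model, prompt, or orchestration tends to miss which checks.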

Build an evaluation system

Token Efficiency cannot be judged only by price. A cheap model with a high failure rate can consume more human time and become more expensive.

Start recording:

Completion rate
Human interventions
Tool-call failure rate
Test pass rate
Review findings
Token cost per task
Time per task
Rework count
Differences between model combinations

With this data, you can decide which tasks fit small models, which require big models, and which should stay human-led.
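One number that combines several of these metrics is the cost per completed task, including the human time spent on interventions and rework. This is a minimal Python sketch; the record fields, the sample data, and the hourly rate are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TaskRecord:
    combo: str            # which model combination ran the task
    completed: bool
    token_cost: float     # money spent on tokens for this task
    human_minutes: float  # time a person spent intervening or reworking


def cost_per_completed_task(
    records: List[TaskRecord], combo: str, human_rate_per_hour: float
) -> float:
    """Total spend (tokens + human time) divided by completed tasks."""
    rows = [r for r in records if r.combo == combo]
    done = sum(1 for r in rows if r.completed)
    if done == 0:
        return float("inf")
    total = sum(
        r.token_cost + r.human_minutes / 60 * human_rate_per_hour for r in rows
    )
    return total / done


# Made-up records: a cheap combo that fails often vs. a pricier, steadier one.
records = (
    [TaskRecord("small-only", i < 2, 0.05, 30) for i in range(4)]
    + [TaskRecord("big-plan+small-exec", True, 0.5, 5) for _ in range(4)]
)
cheap = cost_per_completed_task(records, "small-only", human_rate_per_hour=60)
mixed = cost_per_completed_task(records, "big-plan+small-exec", human_rate_per_hour=60)
```

With this sample data the "cheap" combination comes out far more expensive per completed task, because failed runs and human rework dominate the token bill.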

Make business workflows atomic

Most users do not need to build a full harness today. But they can start decomposing their business workflow into atomic nodes.

Content production can become topic selection, research, outline, draft, fact check, style rewrite, SEO title, translation, and publishing check.

Software development can become requirement confirmation, technical design, data structure, API change, unit tests, implementation, migration script, documentation, and review.

Each node should have clear input, output, acceptance, and context limits. When harness tools mature, these workflows can plug in directly.

Hardware is not the first priority

Many discussions of Token Efficiency jump to local deployment and GPUs. For most people, an API should still be the first choice.

Before the economic model works, local hardware is only prepaid cost. A safer sequence is:

Use an API to validate the workflow.
Record task evaluation and cost.
Find stable high-frequency execution nodes.
Consider which nodes should be localized.
Then calculate hardware, power, maintenance, and depreciation.

For personal productivity, an API is often enough. For startups exploring inference frameworks and model boundaries, local CUDA platforms can be useful. For production workloads with clear unit economics, multi-GPU deployment becomes worth discussing.
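The last step of that sequence is simple arithmetic. The Python sketch below estimates how many months local hardware takes to pay for itself. It ignores depreciation and resale value, and every number you feed it is your own estimate.

```python
from typing import Optional


def months_to_break_even(
    hardware_cost: float,
    monthly_api_spend: float,
    monthly_power: float,
    monthly_maintenance: float,
) -> Optional[float]:
    """Months until the hardware pays for itself, or None if it never does."""
    monthly_saving = monthly_api_spend - (monthly_power + monthly_maintenance)
    if monthly_saving <= 0:
        return None  # running locally costs at least as much as the API
    return hardware_cost / monthly_saving
```

If the result is `None`, or longer than the hardware's useful life, the API remains the cheaper option for that workload.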

Summary

Token Efficiency is not replacing expensive models with cheap ones. It is redesigning the AI workflow.

Big models make key judgments, small models execute in bulk, the harness schedules and validates, and humans define goals and acceptance. Only when these layers work together can tokens reliably become productivity.

Models will get cheaper, context windows will grow, and small models will improve. The future gap may not be who calls the strongest model, but who can use the same tokens to produce more real output.

Superpowers: a skills framework that pulls coding agents back into engineering process

Fri, 15 May 2026 08:53:17 +0800

obra/superpowers is both a skills framework for coding agents and a software development methodology. Its goal is not to add another universal prompt, but to make agents follow a process: clarify goals, produce a design, write a plan, implement through TDD, then review and finish.

Project: https://github.com/obra/superpowers

At the time of writing, the GitHub API shows more than 190,000 stars, an MIT license, and recent activity. The README describes it plainly: An agentic skills framework & software development methodology that works.

What problem it solves

Many AI coding tools are not weak at writing code; they are too eager to write code.

A user says something vague, the agent edits files, and the result looks finished while boundaries, tests, and architecture remain unclear. Small tasks may survive this. Complex projects turn it into rework and technical debt.

Superpowers makes the agent enter a workflow before touching code:

When the user wants to build something, ask about the goal first.
Turn the conversation into a spec and confirm it in sections.
After design approval, write an implementation plan.
After the user says “go”, begin implementation.
During implementation, emphasize TDD, YAGNI, DRY, and code review.

None of this is new software engineering. It matters because fast agents need stronger guardrails.

Supported tools

Superpowers is not tied to a single agent. The README lists installation paths for Claude Code, Codex CLI, Codex App, Factory Droid, Gemini CLI, OpenCode, Cursor, and GitHub Copilot CLI.

That makes it more like a workflow layer across harnesses than a model-specific trick.

The base workflow

The base workflow has several stages.

First is brainstorming. Before implementation, the agent turns rough ideas into an executable design and confirms it with the user.

Second is using-git-worktrees. After design approval, it creates an isolated worktree and branch, then checks that install and test baselines are clean.

Third is writing-plans. It decomposes design into small tasks with paths, code scopes, and validation steps. The plan should be clear enough for someone without context to execute.

Fourth is execution. subagent-driven-development can dispatch tasks to subagents, while executing-plans runs them in batches. Each task should be reviewable and verifiable.

Fifth is test-driven-development: true RED-GREEN-REFACTOR. Write a failing test, confirm failure, implement minimally, confirm pass, refactor.

Sixth is requesting-code-review. Reviews happen between tasks; critical findings block progress.

Finally, finishing-a-development-branch validates tests and offers choices such as merge, PR, keep, or discard the worktree.

What is in the skills library

The skills library can be grouped by purpose.

Testing centers on test-driven-development.

Debugging includes systematic-debugging and verification-before-completion. They focus on reproduction, minimization, hypotheses, validation, and not claiming completion before verification.

Collaboration skills include:

brainstorming
writing-plans
executing-plans
dispatching-parallel-agents
requesting-code-review
receiving-code-review
using-git-worktrees
finishing-a-development-branch
subagent-driven-development

Meta skills include writing-skills and using-superpowers.

Together they give the agent engineering habits: when to ask, when to plan, when to test, and when to stop for review.

How it differs from a prompt

A normal prompt often piles rules into one system message: do not over-edit, think first, test, explain, be concise. As rules accumulate, complex tasks make the model forget or ignore some of them.

Superpowers splits rules into phase-specific workflow modules. Each skill is shorter and focused. The agent knows the current phase, complex processes become checkable, and teams can turn their own practices into reusable skills.

The lesson is not just “use a smarter model”. Give the model a repeatable way to work.

Who should use it

Superpowers is most useful for developers already using coding agents on real projects, especially when:

The task spans multiple files.
The agent should design before implementation.
TDD or validation matters.
Multiple branches or worktrees are common.
Subagents can help with implementation or review.
A team wants to encode its workflow as skills.

For a one-line config change, it may feel heavy. For multi-step development, the constraints are valuable.

Notes before using it

Do not treat it as full autopilot. It gives the agent process, but humans still own requirements, tradeoffs, and final acceptance.

TDD and review add upfront cost. For small tasks they may slow things down; for complex tasks they reduce rework.

Parallel subagents are not always better. They work when boundaries and write scopes are clear. If the requirement is still fuzzy, parallelism only multiplies confusion.

Teams must maintain skill quality. Outdated processes, vague instructions, and conflicting rules can also hurt agents.

Summary

Superpowers is valuable because it pulls coding agents away from “receive request, edit code” and back into software engineering process.

What AI coding often lacks is not generation speed but clarification, planning, verification, review, and closure. The stronger the model becomes, the less these steps should be skipped.

If you use Codex, Claude Code, Cursor, or Gemini CLI on real projects, Superpowers is worth studying. Even if you do not install it, its skill decomposition is a good reference for designing your own agent workflow.

Reject Vibe Coding: Matt Pocock's skills repo adds engineering constraints to AI coding

Fri, 15 May 2026 08:46:23 +0800

The faster AI writes code, the faster a project can lose control. The real question is not whether a model can generate functions, but whether it understands the requirement, follows the team’s language, and makes small changes inside the existing architecture.

Matt Pocock’s mattpocock/skills repository points in the opposite direction of casual vibe coding: do not let AI take over the whole development process. Put it inside mature software engineering constraints.

Project: https://github.com/mattpocock/skills

This is not about one magic prompt. It is a set of composable agent skills that turn requirement clarification, domain modeling, TDD, debugging, and architecture review into AI-friendly workflows.

Solve alignment failure first

The most common failure in AI coding is assuming the model understood the request when it merely guessed from a vague sentence.

grill-me flips the interaction. Before writing code, the agent acts like a demanding reviewer and keeps asking about branches, boundaries, and unresolved decisions.

If you ask for a login page, it should first ask:

How should password reset work?
Should third-party login be supported?
What should failed-login errors look like?
Are account lockout, CAPTCHA, or risk controls in scope?
Where should the user go after success?

This feels slower, but it prevents expensive rework later. The cheaper code generation becomes, the more costly unclear requirements become.

Write domain language into context

Another common problem is generic vocabulary. The model does not know the team’s business terms, so names and documents drift.

grill-with-docs asks questions while also checking CONTEXT.md, ADRs, and domain docs. Once terms and decisions are confirmed, they can be written back into shared context.

This is close to the “ubiquitous language” idea in domain-driven design. If a team says customer instead of user, or transaction instead of order, the model should inherit that language.

Context documents are valuable because they reduce guessing.

Use TDD to slow down generation

AI is risky because it is fast. Bad code used to take time to write; now hundreds of lines can appear in seconds. The problem is not speed itself, but lack of feedback loops.

The tdd skill brings back red-green-refactor:

Write a failing test for one behavior.
Implement only enough code to pass.
Refactor.
Continue with the next vertical slice.

The key is one behavior at a time. AI executes, while humans keep control of direction and boundaries.
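A single red-green step can be very small. The Python sketch below uses a made-up `slugify` function as the behavior under test; it illustrates the rhythm and is not taken from the skills repository.

```python
import unittest


def slugify(title: str) -> str:
    # GREEN: the smallest implementation that satisfies the one test below.
    # Punctuation handling is deliberately left for a later slice.
    return "-".join(title.lower().split())


class SlugifyTest(unittest.TestCase):
    # RED: this test is written first and fails until slugify exists.
    def test_spaces_become_hyphens(self):
        self.assertEqual(slugify("Hello World"), "hello-world")
```

Running `python -m unittest` on the file executes the test. The next slice, for example stripping punctuation, starts with a new failing test rather than a bigger implementation.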

Debug through a loop

When facing a bug, many agents guess and patch repeatedly until the code becomes messier.

diagnose asks the agent to build a feedback loop:

Reproduce the issue
Minimize the case
Form a hypothesis
Add observations or logs
Fix the cause
Add a regression test

This process is old, but it matters even more with AI. The model is good at trying things; the loop keeps it close to the root cause.

Review architecture regularly

A task passing tests does not mean the codebase is healthier. Repeated AI patches can blur module boundaries, make interfaces more complex, and make tests harder to write.

Skills such as improve-codebase-architecture ask the agent to step back and inspect the whole codebase:

Are responsibilities mixing across modules?
Which interfaces are too complex?
Which paths are hard to test?
Which names conflict with domain language?
Which duplicate logic should be merged?

This is not automatic large-scale refactoring. It is structured observation and suggested direction; humans still decide whether and how far to change.

What really needs limiting is freedom

The core idea is simple: AI coding is not about letting the model improvise freely. It is about giving it clear goals, context, tests, and stopping conditions.

Humans define the problem, architecture, tradeoffs, and acceptance criteria. AI generates code, fills in tests, repeats edits, and handles local refactors. Used well, AI amplifies capability; used poorly, it amplifies confusion.

Software engineering fundamentals did not become obsolete because AI improved. Requirement clarity, domain language, TDD, diagnosis, and architecture review are becoming more important.

More people will be able to write code. The gap will be between people who can put AI inside a maintainable, verifiable, evolving engineering system and those who cannot.

What is cc-haha? A project that turns Claude Code into a desktop workbench

Thu, 14 May 2026 22:38:04 +0800

cc-haha is a project built around a modified Claude Code workflow. Its full repository name is NanmiCoder/cc-haha. The project page says plainly that it is based on Claude Code source code leaked from the Anthropic npm registry on 2026-03-31, and that its current main form is a desktop Claude Code workbench.

Project URL: https://github.com/NanmiCoder/cc-haha

There are two important points in that description.

First, it is not Anthropic’s official Claude Code. The README also states that the original source code copyright belongs to Anthropic and that the project is only for learning and research.

Second, its focus is no longer just “run a Claude Code CLI locally.” Judging from the README and the latest release, cc-haha is more like a desktop app that brings Claude Code sessions, projects, permissions, diffs, Computer Use, remote access, and model provider configuration into one place.

What problem is it trying to solve?

Claude Code is originally terminal-oriented. Sessions, command execution, permission prompts, file edits, and context switching all happen in the terminal. That works for people who are comfortable with CLI tools, but long-term use exposes a few rough edges:

Multiple projects and sessions are hard to manage side by side.
To see what files the AI changed, you often need to switch to Git or an editor.
Permission approvals, command execution, and file diffs are spread across different surfaces.
Remote viewing from a phone or another device requires extra setup.
Connecting non-Anthropic models requires dealing with protocol compatibility.

cc-haha tries to package these pieces into a graphical workbench. It is not just a skin for Claude Code; it moves session management and local development flow control into the desktop app.

Desktop workbench: from terminal to control center

According to the README, the cc-haha desktop app brings these capabilities into a macOS / Windows app:

Multi-session workbench: manage tasks with tabs, project switching, terminal entry points, and session history.
Branch / Worktree launch: choose a repository branch for a new session and decide whether to use the current worktree or an isolated Worktree.
Right-side code changes panel: view modified files, added and removed lines, and workspace status while chatting.
Visualized code edits: inspect AI edits, diffs, and execution steps.
Permission and approval flow: review dangerous commands, tool calls, and AI questions in the desktop app.
Multiple model providers: supports Anthropic-compatible APIs, third-party models, WebSearch fallback, and local configuration.
H5 remote access: use a one-time token to connect to the current desktop session from a phone or another device.
IM integration: use Telegram, Feishu, WeChat, or DingTalk to chat remotely, switch projects, and approve permissions.
Scheduled tasks and token usage: create scheduled tasks and view local token usage trends.

These features make it closer to an “AI coding workbench” than a simple command-line replacement. It tries to put the common surfaces of AI coding into one place: chat, file changes, permissions, projects, remote access, and model configuration.

Installation and startup

Most users should download the desktop installer from Releases.

The README describes the desktop install flow as:

Go to GitHub Releases and download the macOS or Windows installer.
On first launch, configure the model provider, API key, and default model in the desktop settings.
If macOS says the app cannot be opened, follow the installation guide to handle Gatekeeper permissions.

The latest release page shows that v0.2.6 was published on 2026-05-13. That version mainly focuses on restoring secure H5 mobile access, desktop session management, file mention search, and desktop UX polish.

If you want to start the CLI from source, the README provides:

bun install
cp .env.example .env
./bin/claude-haha

That path is better for people who want to debug the lower-level CLI, server, or build their own changes. For normal use, the desktop app is more direct.

What changed in v0.2.6

The main point of v0.2.6 is that H5/LAN access was pulled back from a temporary open state into an explicit enablement and token pairing model.

Notable changes include:

H5/LAN access must be explicitly enabled locally.
QR links carry a one-time visible token.
Remote APIs, proxies, and WebSockets are no longer exposed without protection.
Settings now has a separate H5 Access page.
The desktop sidebar gained batch management for selecting and deleting sessions.
Desktop file mention search became git-first, respects ignore rules, and reduces noise from node_modules and build output.
A pure white theme was added, and bugs such as long URLs breaking chat layout and draft leakage across tabs were fixed.

This shows the project has moved beyond “it runs” and is now filling in the safety boundaries and daily UX details that a desktop product needs.

The H5 access part deserves special care. The author explicitly notes in the release that H5 is a browser access entry for individuals or trusted teams, not a public multi-tenant login system. In practice, it should not be treated as an internet-facing SaaS admin console.
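The general idea behind one-time token pairing can be sketched in a few lines of Python. This is not cc-haha's implementation: the class name, the expiry time, and the storage are assumptions. It only shows why a token that is deleted on first use is safer than an open endpoint.

```python
import hmac
import secrets
import time
from typing import Dict, Optional


class PairingTokens:
    """Issue short-lived tokens that can be redeemed exactly once."""

    def __init__(self, ttl_seconds: float = 120.0):
        self._ttl = ttl_seconds
        self._pending: Dict[str, float] = {}  # token -> expiry timestamp

    def issue(self, now: Optional[float] = None) -> str:
        now = time.time() if now is None else now
        token = secrets.token_urlsafe(32)
        self._pending[token] = now + self._ttl
        return token

    def redeem(self, presented: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        for token, expiry in list(self._pending.items()):
            # Constant-time comparison avoids leaking how much of the token matched.
            if hmac.compare_digest(token.encode(), presented.encode()):
                del self._pending[token]  # one-time: never valid twice
                return now <= expiry
        return False
```

A token embedded in a QR link would be issued when the user enables access and redeemed on the first connection; a second scan of the same link fails.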

Computer Use: letting the Agent operate the desktop

Another important selling point of cc-haha is Computer Use.

The project docs say this feature is a heavily modified version of the Computer Use implementation in the leaked Claude Code source. The official implementation depends on Anthropic’s private native modules, such as @ant/computer-use-swift and @ant/computer-use-input, which are not publicly available. cc-haha replaces the low-level operation layer with a Python bridge using public libraries such as pyautogui, mss, and pyobjc.

Computer Use supports operations such as:

Screenshot: screenshot, zoom
Mouse: click, drag, move, scroll, and read cursor position
Keyboard: type text, press keys, hold keys
Applications: open applications, switch displays
Permissions: request app access, list granted applications
Clipboard: read and write clipboard content
Other: wait, batch operations

Its workflow is a “screenshot - analyze - act” loop:

The model receives a user request.
It calls screenshot to capture the screen.
The model uses vision to identify buttons, input fields, and coordinates.
It calls click, typing, or application tools.
It screenshots again to confirm the result, then continues.
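
The loop above can be sketched in a few lines of Python. This is only an illustration of the control flow, not cc-haha's actual code: `take_screenshot`, `decide`, and `perform` are hypothetical callables standing in for the screenshot tool, the vision model, and the mouse/keyboard tools, and `max_steps` is a safety bound added here.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Action:
    kind: str                      # e.g. "click", "type", or "done"
    payload: dict = field(default_factory=dict)


def run_loop(take_screenshot: Callable[[], bytes],
             decide: Callable[[str, bytes], Action],
             perform: Callable[[Action], None],
             request: str,
             max_steps: int = 10) -> int:
    """Repeat screenshot -> analyze -> act until the model reports "done".

    Returns the number of screenshots taken. All three callables are
    hypothetical stand-ins, injected so the loop itself stays testable.
    """
    for step in range(1, max_steps + 1):
        image = take_screenshot()          # capture the current screen
        action = decide(request, image)    # the model picks the next action
        if action.kind == "done":
            return step
        perform(action)                    # click, type, open an app, ...
    raise RuntimeError("step budget exhausted before the task finished")
```

Passing the three operations in as parameters keeps the loop independent of any particular screenshot or input library, which is also where a platform-specific replacement would plug in.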

From the docs, the fully supported platform is mainly macOS, including Apple Silicon and Intel. Windows / Linux are theoretically possible, but the pyobjc app-management parts need platform-specific replacements and are not fully adapted yet.

Runtime requirements include:

Bun >= 1.1.0
Python >= 3.8
macOS Accessibility permission
macOS Screen Recording permission

This kind of feature is powerful, but it also raises permission risk. When letting AI operate desktop apps, it is better to authorize only the applications that are clearly needed and avoid leaving sensitive content open in unrelated windows.

Multi-model access through an Anthropic-compatible layer

cc-haha still communicates using the Anthropic Messages API protocol. The project docs recommend using LiteLLM as a protocol conversion proxy.

The basic structure is:

claude-code-haha ──Anthropic protocol──▶ LiteLLM Proxy ──OpenAI protocol──▶ target model API

In other words, cc-haha sends Anthropic Messages API requests, LiteLLM converts them to formats such as OpenAI Chat Completions, and then forwards them to OpenAI, DeepSeek, Ollama, or other model services.

The LiteLLM install command in the docs is:

pip install 'litellm[proxy]'

Then you can configure OpenAI, DeepSeek, Ollama, and other models in litellm_config.yaml. After the proxy starts, set these values in .env or ~/.claude/settings.json:

ANTHROPIC_AUTH_TOKEN=sk-anything
ANTHROPIC_BASE_URL=http://localhost:4000
ANTHROPIC_MODEL=gpt-4o
ANTHROPIC_DEFAULT_SONNET_MODEL=gpt-4o
ANTHROPIC_DEFAULT_HAIKU_MODEL=gpt-4o
ANTHROPIC_DEFAULT_OPUS_MODEL=gpt-4o
API_TIMEOUT_MS=3000000
DISABLE_TELEMETRY=1
CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1

There are a few practical caveats:

drop_params: true is important, because Anthropic parameters such as thinking and cache_control do not exist in the OpenAI API.
Extended Thinking is an Anthropic-specific feature and is unavailable with third-party models.
Prompt Caching will not work in the Anthropic-native way.
Tool calls must be converted from Anthropic tool_use to OpenAI function calling, so complex tool use may have compatibility issues.
Small local Ollama models may not handle this tool-heavy workflow reliably.
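
To make the first and fourth caveats concrete, here is a rough sketch of the kind of translation a conversion proxy has to do. It is not LiteLLM's implementation. It only assumes the publicly documented shapes: an Anthropic tool definition with `name`, `description`, and `input_schema`, and an OpenAI tool entry of type `function` with `parameters`. The helper names are made up for this example.

```python
# Parameters named in the caveats above that the OpenAI API does not accept.
ANTHROPIC_ONLY_PARAMS = {"thinking", "cache_control"}


def to_openai_tool(tool: dict) -> dict:
    """Map one Anthropic tool definition to an OpenAI function-calling tool.

    Only name, description, and the JSON schema are carried over, so extra
    Anthropic-only keys on the tool (such as cache_control) are left behind.
    """
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool.get("description", ""),
            "parameters": tool.get("input_schema", {"type": "object", "properties": {}}),
        },
    }


def drop_unsupported(params: dict) -> dict:
    """Remove top-level request parameters the target API would reject."""
    return {k: v for k, v in params.items() if k not in ANTHROPIC_ONLY_PARAMS}
```

The schema itself passes through unchanged, but the surrounding envelope differs, and tool results and multi-step tool conversations need the same kind of mapping in the other direction. That is where the compatibility issues with complex tool use tend to appear.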

So multi-model access can work, but that does not mean every model will feel the same. cc-haha still demands strong tool use, code understanding, and long-context ability from the model.

Who is it for?

cc-haha is better suited for:

People already familiar with Claude Code who want desktop session management.
Users who often work across multiple repositories, branches, and AI sessions.
People who want to inspect AI file changes, diffs, and workspace status in a side panel.
Users who want to experiment with Computer Use and let an Agent operate desktop apps.
People who want to connect OpenAI, DeepSeek, Ollama, or other models through an Anthropic-compatible protocol.
Users who need phone or IM-based remote viewing and permission approval.

It is less suitable for:

Users who only want the stable official Claude Code experience.
People who cannot accept the leaked-source background and copyright uncertainty.
Users who do not want to grant high system permissions to local tools.
Teams that need enterprise compliance, auditability, and official support.
Users unfamiliar with API keys, proxies, model compatibility, and local service configuration.

Risks and boundaries

An article like this cannot cover features alone. It also has to cover risk.

The origin of cc-haha means it is not an ordinary community reimplementation. The README clearly states that it is based on leaked Claude Code source code and that the original source belongs to Anthropic. This creates uncertainty around copyright, compliance, and long-term maintenance.

Computer Use, H5 remote access, IM integration, and local permission approval are also high-permission capabilities. The more convenient they are, the more clearly boundaries need to be defined:

Do not expose H5 access on untrusted networks.
Do not treat the token as a long-term public login credential.
Do not grant the Agent access to unrelated sensitive applications.
Do not casually use it in production or company compliance environments.
Do not expose third-party model proxy settings or API keys in public repositories.

If your goal is to study AI coding tool architecture, desktop workflows, and Computer Use implementation, it is a useful reference. If you want to put it into a long-term production workflow, evaluate legal, permission, security, and maintenance risks first.

Summary

The most interesting thing about cc-haha is not whether it can replicate Claude Code. It is that it pushes Claude Code-style AI coding tools toward a desktop workbench form.

Sessions, projects, Worktree, diffs, permissions, remote access, Computer Use, model providers, scheduled tasks, and token usage are all brought into one desktop experience. That suggests the next step for AI coding tools is not only stronger models, but also a more complete workflow interface.

But its boundaries are also clear: it is not an official Anthropic product, it has a sensitive source-code background, and its high-permission features require caution. A better way to view it is as a project for observing where AI coding tools may evolve, not as a careless replacement for official Claude Code.

References

GitHub repository: https://github.com/NanmiCoder/cc-haha
Latest release: https://github.com/NanmiCoder/cc-haha/releases/tag/v0.2.6
Computer Use documentation: https://github.com/NanmiCoder/cc-haha/blob/main/docs/computer-use.md
Third-party model documentation: https://github.com/NanmiCoder/cc-haha/blob/main/docs/guide/third-party-models.md

Codex /goal vs Claude Code /goal: running long tasks until they are done

Thu, 14 May 2026 22:25:31 +0800

/goal is becoming an important command in AI coding tools.

It is not about making the model write a few more lines of code. It solves a more practical problem: when a task has clear completion conditions, can the agent keep going until those conditions are met, instead of stopping after every turn and waiting for the user to say “continue”?

Codex CLI has already added an experimental /goal command in its official docs. Claude Code has also published its own /goal documentation, describing it as an automation capability that can keep working across multiple turns. The names are the same, but the product direction is not exactly the same.

What problem does `/goal` solve?

Ordinary AI coding conversations usually work as a one-turn-at-a-time loop:

The user describes a task.
The agent analyzes, edits code, and runs tests.
The agent reports the result.
The user decides what to do next.

That workflow is fine for short tasks. But for migrations, refactors, test fixes, or issue backlog cleanup, it gets fragmented. The agent may move forward a little, then stop and wait for you to type “continue”.

/goal changes the question from “what should you do next?” to “what final state counts as done?” For example:

/goal Finish the login module migration, with all auth tests passing and no lint errors

This kind of target naturally fits long tasks because it has a clear endpoint: tests pass, the build succeeds, files are split, a queue is empty, or acceptance criteria are satisfied.

Codex `/goal`: experimental and attached to the current thread

OpenAI’s Codex CLI documentation marks /goal as experimental. It is not a stable default capability and requires features.goals to be enabled first.

There are two ways to enable it:

/experimental

Or add this to config.toml:

[features]
goals = true

Once enabled, you can use it like this:

/goal Finish the migration and keep tests green

Common commands include:

/goal
/goal pause
/goal resume
/goal clear

According to OpenAI’s docs, Codex attaches the goal to the current active thread and keeps tracking that target while a larger task continues.

One detail matters here: the official wording for Codex /goal is restrained. It emphasizes setting an experimental goal for long-running work and attaching the goal to the current thread, but it does not describe, in the same level of detail as Claude Code’s docs, an independent evaluator that automatically checks every turn and starts the next one. So for now, it is better to treat Codex /goal as an experimental long-task goal mechanism, not a fully stable unattended execution mode.

Claude Code `/goal`: multi-turn execution driven by completion conditions

Claude Code’s /goal documentation is more explicit: after the user sets a completion condition, Claude keeps working across turns until that condition is met.

Example:

/goal all tests in test/auth pass and the lint step is clean

Claude Code’s mechanism is roughly:

After the current turn finishes, control is not immediately returned to the user.
A small, fast model checks whether the goal condition has already been met.
If it has not been met, Claude automatically starts the next turn.
If it has been met, the goal is cleared automatically and the completion status is recorded in the transcript.

This makes Claude Code’s /goal more like “auto-continue until the completion condition is satisfied.” It does not merely pin a target to the conversation; it gives an independent evaluation step the decision of whether to continue.
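
As a mental model, the evaluator-driven loop might look like the sketch below. It does not reflect Claude Code's internals: `run_turn` and `evaluate` are hypothetical callables standing in for one agent turn and the small evaluator model, and `max_turns` is a safety bound added here rather than something taken from the docs.

```python
from typing import Callable, Tuple


def run_goal(run_turn: Callable[[int], str],
             evaluate: Callable[[str, str], Tuple[bool, str]],
             condition: str,
             max_turns: int = 20) -> dict:
    """Start new turns until the evaluator says the condition is met.

    run_turn(n) returns the transcript of turn n; evaluate(condition,
    transcript) returns (met, reason). Both are hypothetical stand-ins.
    """
    reason = "not evaluated yet"
    for turn in range(1, max_turns + 1):
        transcript = run_turn(turn)                     # the agent works for one turn
        met, reason = evaluate(condition, transcript)   # separate, cheap check
        if met:
            return {"status": "met", "turns": turn, "reason": reason}
    # The bound was reached without the condition being satisfied.
    return {"status": "stopped", "turns": max_turns, "reason": reason}
```

The point of the structure is that the agent doing the work and the check deciding whether to continue are separate steps, so the decision to keep going does not depend on the agent judging its own progress.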

Claude Code also supports checking status directly:

/goal

The status shows the goal condition, elapsed time, evaluated turn count, token usage, and the evaluator’s latest reason.

To stop early, use:

/goal clear

stop, off, reset, none, and cancel also work as clearing aliases. After a goal is enabled, if the session is interrupted and later resumed with --resume or --continue, an active goal can be restored. However, elapsed time, turn count, and token baselines are recalculated.

The biggest difference

Both Codex and Claude Code are pushing AI coding from single-turn answers toward long-running task execution, but their /goal commands have different positioning.

Comparison	Codex CLI `/goal`	Claude Code `/goal`
Status	experimental	documented on a dedicated official page
Enablement	requires `features.goals`	usable directly in a trusted workspace
Goal scope	current active thread	current session
Common operations	set / view / pause / resume / clear	set / view / clear
Automatic evaluation	docs emphasize attachment and tracking	docs explicitly describe evaluator checks after each turn
Auto-continuation	official wording is restrained	starts the next turn automatically when conditions are unmet
Best fit	keeping a long-term target in a Codex task	letting Claude Code keep moving toward completion conditions

In short, Codex /goal is closer to “attach an experimental long-term target to the current thread.” Claude Code /goal is closer to “set a verifiable stop condition for the current session and let it keep working until satisfied.”

How to write a good `/goal`

Whichever tool you use, /goal is not a good place for vague wishes.

Not a great goal:

/goal Optimize the project a bit

A better goal:

/goal Migrate the payment module to the new API; npm test -- payment exits with code 0; git diff contains only payment-related files

A good goal usually includes three things:

A clear completed state.
An executable validation method.
Boundaries that must be respected.
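
One way to check whether a goal is concrete enough is to ask whether it could be verified by a script. The sketch below is an illustration of that idea, not part of either tool: the function names are invented, the validation command is whatever you would put in the goal, and the list of changed files is assumed to come from something like `git diff --name-only`.

```python
import subprocess
from typing import Iterable, List


def outside_boundary(changed_files: Iterable[str], allowed_prefixes: List[str]) -> List[str]:
    """Return the changed paths that fall outside every allowed prefix."""
    return [path for path in changed_files
            if not any(path.startswith(prefix) for prefix in allowed_prefixes)]


def command_succeeds(argv: List[str]) -> bool:
    """Run the validation command; exit code 0 counts as success."""
    return subprocess.run(argv, capture_output=True).returncode == 0


def goal_done(argv: List[str], changed_files: Iterable[str],
              allowed_prefixes: List[str]) -> bool:
    """Done means: validation passes and no file outside the boundary changed."""
    return command_succeeds(argv) and not outside_boundary(changed_files, allowed_prefixes)
```

If a goal cannot be expressed as a command plus a boundary like this, it is probably still too vague for unattended execution.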

If the goal is large, add a stop condition:

/goal Fix the eslint errors so that npm run lint exits with code 0; if it is still not done after 20 turns, stop and summarize the remaining problems

This matters. The stronger /goal becomes, the more it needs boundaries. Otherwise, the agent may modify too many files, run too long, consume too many tokens, or keep pushing forward on a question that should have been paused for human input.

When `/goal` is a good fit

Good fits:

Test fixes: until specific tests pass.
Code migrations: until all call sites are updated and compilation succeeds.
Batch cleanup: until a class of lint or type errors is reduced to zero.
Documentation completion: until all specified modules have documentation.
Issue queue handling: until every issue under a tag is handled or clearly classified.

Poor fits:

The requirement itself is still unclear.
The task needs frequent product judgment.
It involves high-risk deletion, data migration, or permission changes.
Acceptance can only be judged subjectively.
The task spans many unrelated modules.

A practical rule: if you can write “which command to run, what result to see, and which files must not be touched,” it is a good candidate for /goal. If you can only write “make this better,” ordinary conversation, plan mode, or human review is still safer.

What this means for AI coding tools

/goal points to a clear direction: AI coding tools are moving from interactive assistants toward continuously executable work units.

In the past, using an agent often meant staying nearby. If it got stuck, you prompted it. If tests finished, you told it to continue. If errors appeared, you issued another command. /goal compresses that interaction into a completion condition and lets the agent decide what the next turn should do.

But this also raises the bar for users. Writing prompts is no longer just describing a task; it also means defining acceptance criteria, validation commands, modification boundaries, and stop rules. In other words, the user’s job shifts from “keep telling it to continue” to “define what done means.”

The fact that both Codex and Claude Code have reached /goal shows that long-running agents are no longer only for background tasks or cloud queues. Local terminal coding tools now also need stronger autonomous progress.

Summary

Codex CLI and Claude Code both have /goal, but at this stage they should not be treated as the same feature.

Codex /goal is still experimental, requires features.goals, and is better understood as a way to maintain a long-term target in the current Codex thread. Claude Code /goal more explicitly connects completion conditions with auto-continuation, using an independent evaluator to decide whether to keep going.

For everyday development, this kind of command is best for engineering tasks with clear acceptance criteria. It does not replace product judgment or code review, but it can reduce the repetitive “continue,” “run it again,” and “fix until tests pass” loop inside long tasks.

The real skill is not memorizing the command. It is learning how to write tasks as clear, verifiable, stoppable goals.

References

OpenAI Codex CLI Slash Commands: https://developers.openai.com/codex/cli/slash-commands
Claude Code Goal documentation: https://code.claude.com/docs/en/goal

How Can Codex Use Chinese LLMs? OpenAI-Compatible APIs and the CodexBridge Approach

Wed, 13 May 2026 23:08:28 +0800

CodexBridge is a local bridge for exposing Codex CLI/SDK as an OpenAI-compatible HTTP service. With it, Codex no longer has to live only in the terminal. OpenWebUI, Cherry Studio, scripts, automation systems, or any client that supports OpenAI Chat Completions can call it.

The two core endpoints are /v1/chat/completions and /v1/models. The former handles conversations and supports both normal and SSE streaming responses. The latter lets clients discover models in the same way they read an OpenAI-style model list. For tools that already support OpenAI APIs, this usually means changing only the base URL, API key, and model name.

Project: https://github.com/begonia599/CodexBridge

What it is useful for

CodexBridge is useful when you want to plug Codex into existing AI clients or workflows. For example:

Select Codex directly in OpenWebUI or Cherry Studio.
Call local Codex from curl, Python, Node.js, or other scripts.
Let one frontend connect to OpenAI, Ollama, other compatible APIs, and Codex at the same time.
Keep Codex’s local threads, sandbox, working directory, and approval behavior.
Provide a unified /v1/chat/completions endpoint for internal tools.

It is not a new LLM, and it is not a full replacement for Codex CLI. More precisely, it is an adapter layer: Codex remains the upstream engine, while the bridge converts OpenAI-style requests into conversation input that Codex can handle.

Basic requirements

You need:

Node.js 18 or later.
Codex CLI installed and logged in.
npm, or pnpm / yarn if you prefer.

Basic source deployment:

git clone https://github.com/begonia599/CodexBridge
cd CodexBridge
npm install
cp .env.example .env
cp .env .env.local

Then edit .env or .env.local to set the API key, default model, working directory, sandbox mode, network access, and related options.

Start the HTTP service:

npm run codex:server

The default port is 8080, and it can be changed with PORT. After startup, the service exposes:

GET /health
POST /v1/chat/completions
GET /v1/models

CLI conversation mode

Besides the HTTP service, CodexBridge also includes a lightweight CLI:

npm run codex:chat

You can type natural-language messages directly. Two useful commands are:

/reset: create a new Codex thread.
/exit: exit the CLI.

The current thread ID is stored in .codex_thread.json. If this file still exists the next time the CLI starts, the previous conversation can continue.

HTTP example

A minimal request looks like this:

curl http://localhost:8080/v1/chat/completions \
  -H "content-type: application/json" \
  -H "authorization: Bearer 123321" \
  -d '{"model":"gpt-5-codex:medium","session_id":"demo","messages":[{"role":"user","content":"ls"}]}'

Key points:

The token in authorization must match CODEX_BRIDGE_API_KEY.
model can include reasoning effort, such as gpt-5-codex:medium or gpt-5-codex:high.
session_id binds the request to a conversation and allows reuse of the same Codex thread.
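
The same request can be built from Python with only the standard library. This is a minimal sketch that mirrors the curl call above. The URL, key, and model are the example values from this article, and the helper names `build_chat_request` and `send` are made up here.

```python
import json
import urllib.request


def build_chat_request(base_url: str, api_key: str, model: str,
                       session_id: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the bridge's chat completions endpoint."""
    body = {
        "model": model,                # may carry reasoning effort, e.g. "gpt-5-codex:medium"
        "session_id": session_id,      # reuses the same Codex thread across calls
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",   # must match CODEX_BRIDGE_API_KEY
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request to a running bridge and decode the JSON reply."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the service running locally, `send(build_chat_request("http://localhost:8080", "123321", "gpt-5-codex:medium", "demo", "ls"))` would perform the same call as the curl example.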

For streaming output, add stream: true:

curl -N http://localhost:8080/v1/chat/completions \
  -H "content-type: application/json" \
  -H "authorization: Bearer 123321" \
  -d '{"model":"gpt-5-codex:high","session_id":"stream","stream":true,"messages":[{"role":"user","content":"Explain step by step how to create a Node.js project"}]}'

For clients that support OpenAI streaming responses, this feels much closer to a normal chat experience.
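
On the client side, consuming the stream comes down to reading `data:` lines. The sketch below assumes the bridge emits OpenAI-style chunks, with text under `choices[].delta.content` and a final `data: [DONE]` line. That is the usual convention for OpenAI-compatible servers, but it is an assumption here and worth checking against the bridge's real output.

```python
import json
from typing import Iterable, Iterator


def iter_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style SSE lines (assumed format)."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue                          # skip blank keep-alive and comment lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":               # end-of-stream marker
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text
```

The function takes any iterable of decoded lines, so it works the same on a live HTTP response read line by line and on a saved log of the stream.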

How sessions are persisted

Session mapping is one of CodexBridge’s important features. A request can pass a session ID through these fields:

session_id
conversation_id
thread_id
user

It can also be passed through request headers:

x-session-id
session-id
x-conversation-id
x-thread-id
x-user-id

For production use, enable:

CODEX_REQUIRE_SESSION_ID=true

This requires every request to include a session ID, preventing different users or chat windows from being mixed into the same temporary context. The bridge-side mapping is saved in .codex_threads.json. Deleting this file resets the bridge mapping, while Codex’s own threads remain under ~/.codex/sessions.

If CODEX_REQUIRE_SESSION_ID=false and the request provides no session ID, the bridge expands the current messages into one-off input for Codex. This is fine for temporary calls, but not for long-running conversations.
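
The lookup described above can be pictured roughly as follows. The field and header names come from the lists in this section, but the priority order (body fields first, then headers, in the order listed) is an assumption made for this sketch and may not match the bridge's real behavior.

```python
from typing import Mapping, Optional

BODY_FIELDS = ("session_id", "conversation_id", "thread_id", "user")
HEADER_FIELDS = ("x-session-id", "session-id", "x-conversation-id",
                 "x-thread-id", "x-user-id")


def resolve_session_id(body: Mapping[str, object],
                       headers: Mapping[str, str],
                       require: bool = False) -> Optional[str]:
    """Pick a session ID from the request, or None for a one-off call.

    `require` plays the role of CODEX_REQUIRE_SESSION_ID; the search order
    is an assumption of this sketch.
    """
    for name in BODY_FIELDS:
        value = body.get(name)
        if isinstance(value, str) and value:
            return value
    lowered = {key.lower(): value for key, value in headers.items()}  # headers are case-insensitive
    for name in HEADER_FIELDS:
        value = lowered.get(name)
        if value:
            return value
    if require:
        raise ValueError("a session ID is required but none was provided")
    return None
```

Returning `None` corresponds to the one-off mode, where the messages are expanded into a single input and no thread is reused.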

Multimodal input

CodexBridge supports OpenAI-style content blocks and converts images into Codex-compatible local_image input.

Remote images can be written as:

{
  "type": "image_url",
  "image_url": {
    "url": "https://example.com/demo.png"
  }
}

Local images can be written as:

{
  "type": "local_image",
  "path": "./images/demo.png"
}

Remote resources are downloaded into a temporary directory and cleaned up after the turn. In real use, watch the request body size, especially when sending base64 images. You may need to increase CODEX_JSON_LIMIT.
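
Because base64 inflates binary data by about a third, it helps to estimate the body size before sending. The sketch below builds an `image_url` block from raw bytes using a data URL, which is the usual OpenAI-style way to inline an image. Whether the bridge accepts data URLs in this exact form is an assumption, and the helper names are made up.

```python
import base64
import json


def image_block_from_bytes(data: bytes, mime: str = "image/png") -> dict:
    """Wrap raw image bytes as an OpenAI-style image_url block (data URL)."""
    encoded = base64.b64encode(data).decode("ascii")
    return {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{encoded}"}}


def body_size(payload: dict) -> int:
    """Size in bytes of the JSON body as it would be sent."""
    return len(json.dumps(payload).encode("utf-8"))


def fits_limit(payload: dict, limit_bytes: int) -> bool:
    """Compare the body against a limit such as CODEX_JSON_LIMIT, in bytes."""
    return body_size(payload) <= limit_bytes
```

A 3 MB image becomes roughly 4 MB of base64 before any JSON overhead, so two or three screenshots in one request can already approach a 10mb limit.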

Structured output

If the client supports response_format, CodexBridge can map it to Codex’s outputSchema. This is useful when you want Codex to return a fixed JSON structure, such as a check result, summary, classification result, or automation report.

A minimal example:

{
  "model": "gpt-5-codex",
  "session_id": "lint",
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "lint_report",
      "schema": {
        "type": "object",
        "properties": {
          "summary": { "type": "string" },
          "status": {
            "type": "string",
            "enum": ["ok", "action_required"]
          }
        },
        "required": ["summary", "status"],
        "additionalProperties": false
      }
    }
  },
  "messages": [
    {
      "role": "user",
      "content": "Check lint issues under src/ and return the result as JSON"
    }
  ]
}

type: "json_schema" must include a schema, otherwise the service returns 400.
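
A client can catch that mistake before sending. The check below is a hedged sketch of the rule as described here, not the bridge's validation code. It assumes the OpenAI-style nesting used in the example, where the schema lives under `json_schema.schema`, and the function name is invented.

```python
from typing import Optional, Tuple


def check_response_format(response_format: Optional[dict]) -> Tuple[int, str]:
    """Return (HTTP status, message) the request would be expected to get."""
    if response_format is None:
        return 200, "no structured output requested"
    if response_format.get("type") != "json_schema":
        return 200, "not a json_schema request"
    schema = (response_format.get("json_schema") or {}).get("schema")
    if not isinstance(schema, dict) or not schema:
        return 400, "json_schema requires a schema"
    return 200, "ok"
```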

Key environment variables

Common configuration can be grouped as follows.

Service and authentication:

PORT=8080
CODEX_BRIDGE_API_KEY=123321
CODEX_JSON_LIMIT=10mb

Default model:

CODEX_MODEL=gpt-5-codex
CODEX_REASONING=medium

Codex runtime:

CODEX_WORKDIR=
CODEX_SANDBOX_MODE=read-only
CODEX_APPROVAL_POLICY=never
CODEX_SKIP_GIT_CHECK=true

Network access:

CODEX_NETWORK_ACCESS=false
CODEX_WEB_SEARCH=false

If the service is only used for frontend chat, keeping network access off by default is safer. Enable these switches only when Codex clearly needs to run curl, git clone, or web search.

Docker and one-line scripts

The project also provides Docker deployment for long-running service use:

docker compose up -d
docker compose logs -f codexbridge

It also provides a Linux install script:

curl -fsSL https://raw.githubusercontent.com/begonia599/CodexBridge/master/scripts/install.sh | bash

The script installs dependencies, clones or updates the repository, copies .env.example, and starts the service with Docker Compose. It requires sudo, so it is best suited to a clean server. If the machine already has a complex Node.js, Docker, or Codex setup, read the script before running it.

Common issues

Request returns 413

The request body is usually too large, often because of base64 images. Increase:

CODEX_JSON_LIMIT=20mb

API key is rejected

Check that the request header includes:

Authorization: Bearer <your CODEX_BRIDGE_API_KEY>

or use x-api-key.

Codex reports a Git repository restriction

If the working directory is not a trusted repository, Codex may trigger a check. Use this only in an environment you trust:

CODEX_SKIP_GIT_CHECK=true

Reset conversations

The bridge mapping lives in .codex_threads.json, while Codex’s own threads live in ~/.codex/sessions. Stop the service and delete the corresponding files or directories to reset them.

Recommendations

For local testing, start with the default API key and the read-only sandbox. After OpenWebUI, Cherry Studio, or scripts can call the service normally, gradually adjust CODEX_WORKDIR, CODEX_SANDBOX_MODE, CODEX_NETWORK_ACCESS, and CODEX_APPROVAL_POLICY.

For multi-user use, do at least three things:

Require session_id.
Change the default API key.
Clearly limit the working directory and sandbox permissions.

CodexBridge is valuable not because it is complex, but because it places Codex inside the existing OpenAI-compatible ecosystem. If a client can change its base URL, it can treat Codex like a normal chat model while still retaining Codex’s local threads, sandbox, and tool behavior.

Why DeepSeek Became the Cost-Saving Key in This Round of AI Coding Tools

Mon, 11 May 2026 04:59:00 +0800

In this round of AI coding tool competition, the surface battle is about model capability, plugin ecosystems, and agent automation. But once you actually use these tools, the first wall you hit is cost.

Claude Code, Codex, OpenClaw, and Superpowers are all useful, but they share one trait: once a task becomes complex, they eat tokens aggressively. They need to read the project, build a plan, call tools, summarize context, repeatedly check results, and sometimes launch multiple subtasks. The smarter the model and the more automated the workflow, the easier it is for the bill to quietly grow.

That is why DeepSeek has become important in this cycle. Not merely because it can write code, but because its long context and cache pricing happen to hit the most expensive part of AI coding tools.

Why Agent Tools Burn So Many Tokens

Traditional chat-style coding assistants usually work in question-and-answer mode. You ask how to write a function, and the model returns a code snippet. This still costs tokens, but it is relatively controllable.

Agent tools are different. They do not just answer questions. They enter the project like a temporary engineer:

scan directories and key files;
understand the requirement and existing architecture;
make a plan;
modify files;
run commands or tests;
keep fixing based on errors;
summarize what changed at the end.

During this process, the model repeatedly reads the same context. Project descriptions, code snippets, tool outputs, conversation history, plans, and error logs all get placed back into the context. Once the task is a little complex, hundreds of thousands of tokens can disappear quickly.

If you add more aggressive plugins, the cost becomes even more obvious. Some OpenCode or Claude Code enhancement tools may organize a whole agent team by default. You only wanted to change a small feature, but it may still start planning, review, execution, and retrospective steps. The task may look more “intelligent”, but the token count keeps climbing.

The Advantage of Superpowers Is On-Demand Activation

One advantage of tools like Superpowers is that they do not force a full agent workflow onto every task.

Most of the time, you can still let Claude Code, OpenCode, or Codex work in their normal mode. Only when you explicitly call a skill, such as brainstorming, planning, executing a plan, or doing a retrospective, does it enter a heavier automation flow.

That matters for cost.

AI coding should not use heavy artillery for every task. Changing one config line, checking one error, or writing a small script can be handled through ordinary conversation. Only complex refactors, cross-file changes, long-document processing, and multi-round validation deserve a full agent workflow.

The stronger the tool, the more you need to control when it triggers. Otherwise, more automation simply means more waste.

DeepSeek’s Key Advantage Is Cheap Cache Hits

One important reason DeepSeek fits these agent tools is its low cache-hit cost.

AI coding tasks contain a lot of repeated prefixes: project background, system prompts, tool instructions, file content, and earlier conversation turns often appear again in later requests. If the model service supports prompt caching, those repeated parts become much cheaper after a cache hit.

For many models, a cache hit is only somewhat cheaper than a miss, perhaps around one third of the original price. DeepSeek’s advantage is that the gap after a cache hit can be much larger. For long-context, multi-round agent workflows that repeatedly read the same project, this gap shows up directly on the bill.

In other words, DeepSeek is not necessarily the strongest answer on every single turn. But in scenarios with long tasks, many rounds, and repeated context reads, its cost structure is unusually suitable for AI coding.
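
The effect is easy to see with a little arithmetic. The function below blends the cache-hit and cache-miss prices by hit ratio. All prices used with it here are placeholders chosen for illustration, not real rates from any provider.

```python
def input_cost(total_tokens: int, hit_ratio: float,
               miss_price: float, hit_price: float) -> float:
    """Input cost for one task; prices are per one million tokens.

    hit_ratio is the share of input tokens served from cache (0.0 to 1.0).
    """
    hit_tokens = total_tokens * hit_ratio
    miss_tokens = total_tokens - hit_tokens
    return (hit_tokens * hit_price + miss_tokens * miss_price) / 1_000_000
```

Take a task that reads 2 million input tokens with an 80% hit ratio and a placeholder miss price of 1.00 per million. If a hit costs one third of a miss, the input bill is about 0.93. If a hit costs one tenth, it is about 0.56. Without any cache it would be 2.00. The higher the hit ratio of a long agent session, the more the hit price dominates the total.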

Long Context Makes Claude Code More Useful

When Claude Code or similar tools are connected to DeepSeek V4, another clear advantage is long context.

AI coding tools fear insufficient context. Once context runs short, compression becomes frequent. Once compression becomes frequent, previously read details may be lost. The model may start forgetting the project structure, constraints, or why a certain file was changed, and quality declines afterward.

DeepSeek V4’s long-context capability makes it better suited for code repositories, document batch processing, subtitle translation, and site article cleanup. Especially when connected to tools like Claude Code or OpenClaw, the right configuration can delay context compression and preserve more project detail.

That is why some tasks feel “durable” when run on DeepSeek. It may not be dazzling at every step, but it can tolerate long-running, low-cost, repeated calls.

How to Split Work Between V4 Pro and V4 Flash

DeepSeek V4 Pro and V4 Flash should not be treated as interchangeable.

For simple tasks, DeepSeek V4 Flash is usually a better fit. It is fast and cheap, and is often enough for:

subtitle translation;
document cleanup;
ordinary script generation;
small code edits;
lightweight OpenClaw tasks;
simple site content processing.

For complex tasks, consider DeepSeek V4 Pro:

large-scale refactoring;
multi-module code understanding;
complex reasoning;
long-chain agent tasks;
high-risk code changes;
engineering tasks that require stronger planning.

Many people want to attach the strongest model immediately, but that is often uneconomical. The practical way to use AI coding tools is to layer tasks: let the cheaper model handle a large amount of routine work, and reserve the expensive model for key decision points.

MiniMax, Doubao, and DeepSeek Occupy Different Positions

Among domestic models and plans, MiniMax, Doubao, Kimi, and DeepSeek each have their own place.

MiniMax’s advantage is generous quota, low price, and broad functionality. It may not be the smartest coding model, but it is cost-effective for translation, lightweight cleanup, and batch processing. For example, batch subtitle processing, format conversion, and simple proofreading are good fits for MiniMax-style plans.

Doubao’s advantage is a broader tool ecosystem: image, video, search, TTS, possible STT, and embedding can be connected together. It feels more like a comprehensive toolbox.

DeepSeek’s position is clearer: text, code, long context, and low-cost caching. It lacks a complete image generation, voice, and video ecosystem, and those weaknesses are obvious. But in AI coding and long-text agent workflows, its strengths are pronounced enough to matter.

So this is not about one tool replacing another. It is about splitting the task and using each tool where it fits.

Saving Money Is Not Just Choosing a Cheap Model

Saving money in AI coding does not mean simply switching every request to the cheapest model.

The effective methods are:

Do not start a heavy agent for simple tasks.
Do not use Pro when Flash is enough.
Use cache as much as possible for long tasks.
Keep repeated context stable, so meaningless changes do not break cache hits.
Let a cheaper model draft and batch-process first, then use a stronger model for key reviews.
Tell the agent clearly not to repeat facts or summarize the same point again and again.

The last point matters more than it looks. AI tools are prone to verbosity, and verbosity is not only a reading problem; it is also a cost problem. Putting “describe each fact once and state each opinion once” into the prompt can improve both article quality and token consumption.
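
The advice about keeping repeated context stable can be illustrated with a small experiment. It assumes the cache matches on a shared prefix between requests, which is how this article describes repeated prefixes. The exact matching rules vary by provider and are not modeled here, and the helper names are made up.

```python
import os
from typing import Iterable


def build_prompt(stable_parts: Iterable[str], volatile_parts: Iterable[str]) -> str:
    """Join stable parts first so consecutive requests share a long prefix."""
    return "\n".join(list(stable_parts) + list(volatile_parts))


def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading characters two prompts have in common."""
    return len(os.path.commonprefix([a, b]))
```

If the system prompt and project context come first and the timestamp or latest user message comes last, two consecutive requests share everything up to the volatile part. Put a timestamp at the very top instead and the shared prefix shrinks to a few characters, which is the kind of meaningless change that can break cache hits.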

What AI Coding Workflows DeepSeek Fits Best

DeepSeek is best suited for:

reading long code repositories;
lightweight multi-file edits;
batch document cleanup;
batch subtitle translation;
Hugo article cleanup;
agent plan execution;
low-cost automation with lots of repeated context.

It is not the best fit for every task. If you need especially strong frontend taste, complex product judgment, or cross-modal creation, you may still need Claude, GPT, Gemini, Doubao, or other tools.

But whenever a task is long-text, long-context, repeated-call, and cost-sensitive, DeepSeek can easily become the first choice.

Summary

In this round of AI coding tools, DeepSeek’s value is not just that a domestic model can write code. Its real value is that it addresses the most practical pain point of agent tools: long tasks are too expensive.

Tools like Claude Code, OpenClaw, and Superpowers make the development process increasingly automated, but behind that automation are massive context reads and multi-round calls. Whoever can lower this part of the cost can make AI coding go from “fun once in a while” to “affordable every day”.

DeepSeek’s long context, low cache cost, and layered use of V4 Flash / V4 Pro put it in exactly that position.

The real cost-saving key in this cycle is not avoiding good models. It is combining good models, cheap models, cache, and agent workflows properly. Once you have done that math, AI coding tools can become real productivity rather than a beautiful but expensive toy.

How to Choose AI Coding Plans: Convenience for Light Users, Flexibility for Heavy Users

Sun, 10 May 2026 08:20:58 +0800

AI coding plans have changed quickly over the past six months. Many tools have shifted from message-style pricing to usage-based pricing, generous low-cost tiers have become tighter, and some overseas services have added stricter identity checks, regional limits, and usage rules.

For developers, the question is no longer just which model is strongest. It is also about how much to spend every month, whether the quota is enough, whether the tool feels comfortable to use, and whether you can switch smoothly when a provider suddenly raises prices or changes the rules.

A practical conclusion is this: light users should buy convenience, mid-level users should buy value, and heavy users should buy flexibility. The heavier your usage, the less you should bind models and tools together in a single plan.

Four things to evaluate before choosing a plan

In the past, people usually looked at three things when choosing an AI coding plan:

Whether the model was strong enough.
Whether the response speed was stable.
Whether the usage quota was sufficient.

Now there is a fourth factor: whether the model and the tool can be separated.

The model provides reasoning ability, while the tool provides context management, file editing, agent orchestration, and workflow experience. Both matter, but it is better not to tie them together completely. For example, if you like Claude models, you can buy an official plan or connect the API to another tool. If you like a certain editor or agent environment, it is better if it can connect to different models instead of only its own.

The value here is not complexity for its own sake. It is risk reduction. AI coding is one of the fastest-changing segments in the industry. A plan that feels generous today may switch pricing in two months, and a tool that feels good today may become worse after the next model integration change. Separating models from tools gives you room to move.

Overseas plans are getting tighter

Tools such as GitHub Copilot, Cursor, Windsurf, and Claude Code are still the primary choices for many users, but the trend is clear: cheap plans with unusually high quotas are becoming harder to sustain, and usage-based billing is becoming more common.

Once services like GitHub Copilot lean more heavily on usage-based billing, the room for plan-based arbitrage becomes much smaller. For light users, these products are still convenient. But for people who frequently use agents, long context, and complex code tasks, actual consumption starts to look much closer to real API cost.

Cursor and Windsurf essentially package model capability into an IDE experience. Their strength is convenience and a mature editor workflow. Their weakness is tighter tool lock-in. Once you become dependent on their proprietary agents, indexing, and automation flow, migration costs can rise quickly.

Claude Code remains attractive in terms of experience and ecosystem attention, but overseas subscriptions, identity verification, regional restrictions, and the safety of relay services are all risks that users in China have to factor in. Third-party relay services may mix models, be unstable, expose user data, or even disappear entirely, which makes them hard to treat as long-term infrastructure for important work.

The strengths and limits of domestic plans

One advantage of domestic AI coding plans is that many of them are offered through APIs, which means they are less tightly bound to a specific tool. You can connect them to OpenCode, Cline, Continue, your own scripts, or internal agents.

The weakness is also clear: if you want model strength, high speed, and generous quota all at once, very few plans can deliver everything together.

GLM models are strong within the domestic model landscape, but throughput during peak hours may not be stable, which can make heavy tasks feel slow. Kimi is capable, but pricing and quota rules still need ongoing attention, especially whether backend quota is transparent. Models like MiniMax are friendlier in speed and allowance, which makes them suitable for light day-to-day tasks, batch jobs, and simpler coding help, though they may sit a tier lower on harder engineering reasoning. DeepSeek can be highly cost-effective when a new model is still in its promotional pricing period, but once that ends, you have to evaluate it again under normal pricing.

That is why domestic options are often better used as a model pool: different tasks use different models, instead of betting everything on one model and one plan.

Light users: choose what feels convenient and do not overbuild

If you only ask AI to tweak scripts, patch documentation, explain errors, or generate small tools once or twice a week, you probably do not need a complicated setup.

For this kind of user, convenience matters most. Cursor, Windsurf, Trae, CodeBuddy, Tongyi Lingma, GitHub Copilot, and similar tools are all worth trying. The goal is not the absolute lowest unit cost. The goal is low friction: something stable inside your editor, decent completions, and easy recovery when it makes a mistake.

Light users usually should not spend too much time building multi-layer API setups, relays, and proxy chains just to save a little money. The time cost, account risk, and debugging overhead are often more expensive than the subscription fee you save.

Mid-level users: focus on value, but also on portability

If you use AI every day for coding, project edits, test generation, and document work, then quota and actual consumption start to matter much more.

For this kind of user, it makes sense to separate the main tool from backup models. For example, one convenient IDE plan can handle daily editing, while a multi-tool API or aggregator plan can be used for longer-context and more complex agent tasks.

Three things matter most at this stage:

Whether it supports third-party tool integration.
Whether token or quota consumption is visible and understandable.
Whether overage means throttling, downgrade, shutdown, or pure usage-based billing.

If a plan looks cheap but can only be used inside its own tool, you need to count migration cost as part of the real price. If a plan costs more but can plug into multiple tools, it may be the better long-term choice.

Heavy users: do not lock models and tools together

For heavy users, flexibility is the core requirement.

When a person or team uses AI agents intensively every day, consumption grows very quickly. Repository search, long-context edits, multi-round debugging, and automated test repair can all multiply token use. Once you rely on a single plan, three problems show up easily:

The quota suddenly becomes too small.
The pricing rule suddenly changes.
A tool or model becomes temporarily unavailable.

A more stable approach is to prepare a layered setup: one primary agent tool, one or more replaceable model endpoints, one low-cost model for simple work, and one high-capability model for harder tasks. Small routine work should not always go to the most expensive model, and critical work should not rely only on the cheapest model either.

For heavy users, the ability for tools to connect to any model and for models to connect to any tool matters more than saving a few dozen dollars per month. The real expense is not the subscription itself. It is the cost of being locked into one ecosystem and having to rebuild your workflow later.

A more stable combination strategy

A relatively steady way to structure your setup looks like this:

Use a low-cost model for light tasks such as code explanations, small scripts, formatting, and simple documents.
Use a value-oriented model for mid-level tasks such as standard feature work, test completion, and refactor suggestions.
Use a stronger model for difficult tasks such as architecture changes, cross-file fixes, hard bugs, and long-context reasoning.
Keep the tool layer open by choosing tools that can connect to APIs, export configuration, and switch models.
Maintain a backup path so that when a main plan changes rules, you can switch quickly to another model or tool.

This may not be the absolute cheapest setup, but it is much more resilient. AI coding prices and quotas will keep changing. The thing worth investing in for the long term is a portable workflow, not a short-term deal that only looks unusually generous for a while.
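The layered strategy above can be sketched as a small routing table. This is only an illustration: the tier names and model identifiers are hypothetical placeholders, and a real setup would map them to whatever endpoints you have configured.

```python
from dataclasses import dataclass

# Hypothetical tiers and model identifiers, for illustration only.
# Each tier lists a primary model followed by its backup path.
TIERS = {
    "light": ["cheap-model", "backup-cheap-model"],
    "mid": ["value-model", "cheap-model"],
    "hard": ["strong-model", "value-model"],
}


@dataclass
class Task:
    description: str
    difficulty: str  # "light", "mid", or "hard"


def pick_model(task: Task, unavailable: set[str]) -> str:
    """Return the first available model for the task's tier."""
    for model in TIERS[task.difficulty]:
        if model not in unavailable:
            return model
    raise RuntimeError(f"no model available for tier {task.difficulty!r}")
```

When the primary model for a tier is listed in `unavailable` (for example after a quota or pricing change), the same call falls through to the backup without touching the rest of the workflow.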

Summary

AI coding plans should not be judged by monthly price alone. Light users should keep things simple and choose a convenient tool. Mid-level users should start paying attention to quota, consumption, and portability. Heavy users should decouple models from tools and avoid being trapped in one ecosystem.

The most useful thing to remember is that plans will change, models will change, and tools will change too. Keeping the choice in your own hands is the most important form of cost control in long-term AI coding work.

Codex vs Claude Code: How to Choose Between Two Subagent Designs

Fri, 08 May 2026 14:14:01 +0800

AI coding tools are paying more attention to subagents. This is not just feature chasing. A single agent eventually hits limits when it has to handle real engineering work.

If one agent reads code, checks logs, edits implementation, runs tests, analyzes failures, and summarizes results at the same time, the main context quickly becomes noisy. Search results, command output, test logs, and intermediate reasoning get mixed together. Later decisions become less reliable. Work also becomes hard to parallelize: exploration, implementation, verification, and review all sit on one main thread.

The purpose of subagents is to reduce that pressure. The main session stops doing everything from start to finish and becomes more like a coordinator: define goals, assign work, receive results, and merge them into the final answer. A subagent handles a local piece of work, such as exploration, implementation, verification, or review, and returns a compressed conclusion.

So a subagent is not “another copy of me.” It is a way to split tangled engineering work into clearer roles.

Shared Foundations

A mature subagent system usually needs four foundations:

Context isolation.
Role specialization.
Project and user-level configuration.
Tool and permission boundaries.

Context isolation comes first. Real repositories produce a lot of intermediate material: dozens of search hits, hundreds of test log lines, noisy command output. If all of that is poured into the main session, the main thread gets confused. A subagent can digest that local process and bring back only the signals that matter.

Role specialization is just as important. Multi-agent does not mean opening several identical models. Exploration roles should search, read, and summarize. Implementation roles should focus on local code changes. Verification roles should run checks, identify risks, and report clearly.

Tool and permission boundaries determine whether the system can be used safely. A subagent should not automatically inherit every capability of the main session. A read-only explorer does not need write access. A verifier may not need to change implementation. Background tasks and isolated worktrees need visible boundaries.
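A tool boundary of this kind can be modeled as a per-role allowlist. The sketch below is a generic illustration, not the configuration format of Codex or Claude Code; the role and tool names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Role:
    name: str
    allowed_tools: frozenset[str] = field(default_factory=frozenset)


# Hypothetical roles and tool names, chosen only to mirror the text.
EXPLORER = Role("explorer", frozenset({"read_file", "search"}))
WORKER = Role("worker", frozenset({"read_file", "search", "write_file"}))
VERIFIER = Role("verifier", frozenset({"read_file", "run_tests"}))


def can_use(role: Role, tool: str) -> bool:
    """A role may call a tool only if it is on that role's allowlist."""
    return tool in role.allowed_tools
```

With this shape, the read-only explorer and the verifier are both refused `write_file`, while the worker is allowed to use it.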

Codex and Claude Code share these concerns, but they take different routes.

Codex: Explicit Delegation

Codex’s subagent design feels restrained.

It gives you a controlled, lightweight delegation mechanism around the current main session. When to delegate, who receives the task, and when results are collected are all explicit decisions. The control flow stays in the current task.

Its traits are clear:

The main session explicitly delegates subwork.
The role set stays small.
The main session knows which agent is doing what.
Results return to the main line for final judgment.
Collaboration boundaries are transparent.

This works well for teams that care about manual orchestration, predictability, and execution determinism. You can ask an explorer to inspect a call chain, ask a worker to make a bounded change, then let the main session merge the result and decide whether to test further.

The tradeoff is that orchestration pressure still sits with the main session. The main thread must decide when to split work, how to split it, who should take it, and how to merge the result. For lightweight collaboration this is pleasant; for long-running engineering workflows it can become tiring.

Claude Code: Agents as Workstations

Claude Code takes a more platform-like route.

It treats agents as describable, selectable, configurable, memory-equipped, isolated, and background-capable objects. A subagent is not just a helper in a conversation. It is closer to a workstation in an engineering system.

The system can expose agent lists, use cases, descriptions, and tool boundaries to the model, allowing the model to decide which role should handle a turn. That makes delegation more automatic.

Several capabilities define this direction.

First, a role system. Explorer, planner, general-purpose, and verifier roles can carry usage descriptions, tool restrictions, default models, and runtime conditions. A read-only explorer can be prevented from editing files. A planner can focus on architecture. A verifier can focus on checks.

Second, inheritance and overrides. A subagent is not completely free. It inherits the larger boundary of the main session by default, but can adjust local behavior within allowed rules. The main session defines the big boundary; the agent performs local assembly inside it.

Third, memory. Memory is not just “remember a few things.” It can have scope. User memory is like long-term preference. Project memory is repository background. Local memory is environment-specific state. This lets some agents avoid relearning the project from scratch.

Fourth, background work and worktree isolation. Some verification tasks can keep running in the background, while the main thread continues. When stronger isolation is needed, an agent can work in a separate worktree, keeping the project connected but the operation space separated.

Fifth, a plugin ecosystem. If agents are first-class objects, you have to think about distribution, installation, priority, override rules, and safety. Plugin agents can enter the system, but high-risk fields such as permission mode, hooks, and MCP servers should remain guarded.

This makes Claude Code feel more like an agent runtime than a one-session collaboration tool.
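The "inheritance and overrides" idea can be illustrated with a few lines of set logic. This is a conceptual sketch under the assumption that an override may only narrow what the main session allows; it is not how Claude Code is implemented, and the tool names are hypothetical.

```python
from typing import Optional


def effective_tools(session_tools: set[str], requested: Optional[set[str]] = None) -> set[str]:
    """Inherit the session's tools by default; an override can only narrow them."""
    if requested is None:
        return set(session_tools)
    # Anything requested outside the session boundary is silently dropped.
    return session_tools & requested


session = {"read_file", "write_file", "run_tests"}
inherited = effective_tools(session)
narrowed = effective_tools(session, {"read_file", "deploy"})
```

Here `narrowed` keeps only `read_file`: the subagent asked for `deploy`, but the main session never had it, so the larger boundary still holds.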

The Difference

Codex is closer to a controlled delegation tool:

Explicit delegation.
Lightweight role set.
Clean control flow.
Subtasks centered on the current session.
Good for deterministic, human-orchestrated work.

Claude Code is closer to an engineering workstation system:

Agents are formally modeled.
Roles are more systematic.
Memory, background execution, isolation, and plugins are part of the runtime.
The model can help choose roles.
Good for long-term projects and platform-like workflows.

The real question is not which one has more features. It is whether you want a subagent to be “a helper I explicitly call” or “a long-lived workstation in the system.”

How to Choose

Choose the Codex style if you value explicit control, lightweight delegation, and safe parallelism inside the current session. It is good for code review, small changes, clearly scoped implementation tasks, and workflows where a human wants to keep the rhythm.

Choose the Claude Code style if you want systematic roles, long-term memory, background execution, worktree isolation, plugin extension, and a more complete agent runtime.

Ask two questions:

Are you comfortable with the model choosing who should do the work?
Do you need a fuller agent runtime?

If the first question makes you uncomfortable, explicit delegation is likely better. If the second answer is yes, a platform-like workstation system may fit better.

Practical Advice

Do not assume that more subagents automatically means a stronger system. Better practice is:

Give every role a clear task boundary.
Limit the tools each role can use.
Ask subagents to return conclusions, not raw logs.
Keep final decisions in the main session.
Make background tasks and worktree isolation visible.
Set clear safety boundaries for plugin agents.

The value of subagents is not quantity. It is clean division of labor, cleaner context, and more stable main-thread decisions.
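"Return conclusions, not raw logs" can be made concrete with a small reducer. The sketch assumes a hypothetical log format in which each result line starts with `PASS` or `FAIL`; real test runners differ, so treat the parsing as a placeholder.

```python
def summarize_test_log(log: str, max_failures: int = 5) -> dict:
    """Reduce a raw test log to counts plus the first few failing lines."""
    lines = log.splitlines()
    failures = [line.strip() for line in lines if line.startswith("FAIL")]
    passed = sum(1 for line in lines if line.startswith("PASS"))
    return {
        "passed": passed,
        "failed": len(failures),
        "first_failures": failures[:max_failures],
    }


# Hypothetical raw output a verifier subagent might have collected.
raw_log = "\n".join([
    "PASS test_login",
    "PASS test_logout",
    "FAIL test_refresh_token: expected 200, got 401",
    "debug: retrying connection...",
])
summary = summarize_test_log(raw_log)
```

The main session receives only `summary`, so debug noise and passing lines never enter its context.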

Summary

Codex and Claude Code solve the same problem: one agent cannot comfortably carry all real engineering work. Both recognize the importance of context isolation, role specialization, permissions, and local summarization.

Codex is more restrained, emphasizing explicit delegation and main-session control. Claude Code is more systematic, treating agents as configurable, memory-equipped, isolated, background-capable workstations that can also enter a plugin ecosystem.

The choice is not which brand wins. It is whether your workflow needs a controlled collaboration tool or a full agent runtime.

9Router: Connect Claude Code, Codex, and Cursor to One AI Router

Fri, 08 May 2026 13:41:15 +0800

9Router is a local router for AI coding tools. It lets Claude Code, Codex, Cursor, Cline, Copilot, OpenCode, OpenClaw, and similar tools connect to one OpenAI-compatible endpoint, then routes requests to different models and providers.

It is not trying to be another chat client. It sits between your AI coding tools and model providers, solving a few practical problems: incompatible API formats, manual provider switching, fast token burn from tool output, interrupted work when quotas run out, and messy multi-account configuration.

According to the project README, 9Router supports 40+ providers and 100+ models. It includes RTK Token Saver, automatic fallback, quota tracking, multi-account rotation, format translation, and request logging. The project is written in JavaScript and uses Node.js, Next.js, React, Tailwind CSS, and LowDB. It is licensed under MIT.

What It Is Good For

9Router is most useful when you use multiple AI coding tools and multiple model sources at the same time.

Examples:

Claude Code uses a subscription account.
Codex or Cursor needs a custom OpenAI endpoint.
Cline, Continue, or RooCode needs an OpenAI-compatible API.
Free providers are used for experiments.
GLM, MiniMax, or Kimi is used as a cheaper backup.
High-quality models are reserved for difficult tasks.

Without 9Router, these settings are scattered across many tools. Each tool needs its own endpoint, API key, model name, and fallback plan. 9Router centralizes that into one local routing layer.

Default local API:

http://localhost:20128/v1

Dashboard:

http://localhost:20128/dashboard

Quick Install

For local use, npm is the simplest path:

npm install -g 9router
9router

The dashboard opens locally, and the README uses 20128 as the default port.

Run from source:

git clone https://github.com/decolua/9router.git
cd 9router
cp .env.example .env
npm install
PORT=20128 NEXT_PUBLIC_BASE_URL=http://localhost:20128 npm run dev

Production mode:

npm run build
PORT=20128 HOSTNAME=0.0.0.0 NEXT_PUBLIC_BASE_URL=http://localhost:20128 npm run start

The npm package requires Node.js >=18.0.0. For VPS or Docker deployment, configure JWT_SECRET, INITIAL_PASSWORD, DATA_DIR, and API_KEY_SECRET instead of exposing defaults.

Connect Coding Tools

9Router exposes an OpenAI-compatible API, so most tools that support custom OpenAI endpoints can connect to it.

Typical configuration:

Base URL: http://localhost:20128/v1
API Key: copied from the 9Router dashboard
Model: a model name or combo name configured in 9Router

For Codex CLI:

export OPENAI_BASE_URL="http://localhost:20128"
export OPENAI_API_KEY="your-9router-api-key"

codex "your prompt"

For Cline, Continue, or RooCode, choose OpenAI Compatible and set:

Base URL: http://localhost:20128/v1
API Key: your-9router-api-key
Model: cc/claude-opus-4-7

Model names depend on connected providers. The README shows prefixes such as cc/, cx/, gh/, glm/, minimax/, kr/, and vertex/.
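For scripts rather than IDE plugins, the same three settings are enough to build a request by hand. The sketch below assumes 9Router serves the standard OpenAI-style `/chat/completions` route under `/v1`; check the README before relying on it. The API key is a placeholder and the request is built but not sent.

```python
import json
import urllib.request

BASE_URL = "http://localhost:20128/v1"  # default local API from the text
API_KEY = "your-9router-api-key"        # placeholder; copy the real key from the dashboard
MODEL = "cc/claude-opus-4-7"            # example model name from the text


def build_chat_request(prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    body = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",  # assumed OpenAI-compatible route
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_chat_request("Explain this stack trace.")
# To actually send it, a running 9Router instance is required:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Switching to a combo or a different provider prefix only means changing `MODEL`; the endpoint and key stay the same.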

RTK Token Saver

AI coding tools often burn tokens fastest on tool outputs:

git diff
git status
grep
find
ls
tree
logs
long file lists

9Router includes RTK Token Saver, which compresses these outputs before they are sent to the model. The project says this can save 20%-40% of input tokens in many requests.

The value is that you do not need to change tools or models. The routing layer removes waste before the request reaches the provider. Still, for critical logs or complete file content, test the behavior first and make sure answer quality does not drop.
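To get a feel for what output compression means, here is a deliberately simple sketch. It is not RTK Token Saver's algorithm, which this article does not describe; it only collapses consecutive duplicate lines and keeps the head and tail of long output. The function name and marker text are hypothetical.

```python
def compress_output(text: str, max_lines: int = 40) -> str:
    """Collapse repeated lines, then keep only the head and tail of long output."""
    collapsed: list[str] = []
    repeat = 0
    for line in text.splitlines():
        if collapsed and line == collapsed[-1]:
            repeat += 1  # same as the previous kept line: count it, do not keep it
            continue
        if repeat:
            collapsed.append(f"[previous line repeated {repeat} more times]")
            repeat = 0
        collapsed.append(line)
    if repeat:
        collapsed.append(f"[previous line repeated {repeat} more times]")

    if len(collapsed) <= max_lines:
        return "\n".join(collapsed)

    half = max(1, max_lines // 2)
    head, tail = collapsed[:half], collapsed[-half:]
    omitted = len(collapsed) - len(head) - len(tail)
    return "\n".join(head + [f"[{omitted} lines omitted]"] + tail)
```

Dropping the middle of a log is exactly the kind of lossy step that can hide the one line the model needed, which is why testing on critical logs first is worthwhile.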

Automatic Fallback

9Router can arrange models in priority order:

1. Subscription model
2. Cheap API
3. Free provider

When the first tier is rate-limited, out of quota, or failing, it can switch to the next one. This reduces manual switching and keeps coding sessions from stopping suddenly.

Example:

1. cc/claude-opus-4-7
2. glm/glm-5.1
3. kr/claude-sonnet-4.5

Fallback changes output consistency. Different models have different style and reasoning quality. For large refactors, protocols, migrations, or other consistency-sensitive work, prefer a fixed model and switch manually only when needed.
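The priority-order behavior can be sketched in a few lines. This is a generic illustration of the fallback pattern, not 9Router's implementation; `ProviderError` and the two stand-in provider functions are hypothetical.

```python
from typing import Callable


class ProviderError(Exception):
    """Raised by a provider call on rate limit, exhausted quota, or failure."""


def call_with_fallback(
    prompt: str,
    providers: list[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try providers in priority order; return (model_name, reply) from the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderError as exc:
            errors.append(f"{name}: {exc}")
    raise ProviderError("all providers failed: " + "; ".join(errors))


# Hypothetical stand-ins for real provider calls.
def subscription_model(prompt: str) -> str:
    raise ProviderError("rate limited")


def cheap_api(prompt: str) -> str:
    return f"reply to: {prompt}"


used, reply = call_with_fallback(
    "fix the failing test",
    [("cc/claude-opus-4-7", subscription_model), ("glm/glm-5.1", cheap_api)],
)
```

Returning the model name alongside the reply makes it visible when a session has silently moved to a different tier, which matters for the consistency-sensitive work mentioned above.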

Be Careful with Free Providers

The README highlights Kiro, OpenCode Free, and Vertex, and also notes that some old free tiers have changed or are no longer recommended.

Always confirm provider policy at the time of use:

Is it really free?
Is it region-limited?
Is third-party tool access allowed?
Can it trigger bans or rate limits?
Does the free quota expire?

9Router manages routing, not upstream terms. Be especially careful when using personal subscriptions, OAuth tokens, or free quotas with automated tools.

Local Deployment Advice

For personal use, bind to localhost. Local tools can reach it, but the internet cannot.

For VPS or LAN deployment:

Change the default login password.
Set a strong JWT_SECRET.
Set API_KEY_SECRET.
Put authentication in front of the dashboard.
Do not expose the dashboard directly to the public internet.
Require Bearer API keys for /v1/*.
Back up DATA_DIR.
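Part of this checklist can be automated before starting the service. The sketch below only checks that the four settings named in this article are present and non-blank; it does not judge secret strength, and the helper name is hypothetical.

```python
# Setting names taken from this article; the check itself is a hypothetical helper.
REQUIRED_SETTINGS = ["JWT_SECRET", "INITIAL_PASSWORD", "DATA_DIR", "API_KEY_SECRET"]


def missing_settings(env: dict[str, str]) -> list[str]:
    """Return required settings that are absent or blank, in checklist order."""
    return [key for key in REQUIRED_SETTINGS if not env.get(key, "").strip()]
```

In a startup script you might call `missing_settings(dict(os.environ))` and refuse to launch if the list is not empty.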

Docker example:

docker run -d \
  --name 9router \
  -p 20128:20128 \
  --env-file ./.env \
  -v 9router-data:/app/data \
  -v 9router-usage:/root/.9router \
  9router

Start locally first, verify providers, combos, logs, and model names, then decide whether server deployment is worth it.

Who Should Use It

9Router is a good fit if you use multiple AI coding tools, multiple providers, subscription plus free or cheap tiers, and want a central fallback policy. It is less useful if you only use one model and one tool.

Its real value is turning scattered model access into a configurable local routing layer.

Summary

9Router is a local gateway for AI coding tools. It lets Claude Code, Codex, Cursor, Cline, and similar tools talk to http://localhost:20128/v1, while it handles model selection, format translation, token compression, quota tracking, and fallback.

It is best for heavy AI coding users who already switch between providers. Start with one tool and one provider, then add accounts and combos gradually.

goose: An Open Source AI Agent with Desktop, CLI, and API

Fri, 08 May 2026 13:41:15 +0800

goose is an open source AI agent that runs on your own machine. It is not limited to code completion; it aims to cover code, research, writing, automation, data analysis, and other tasks. The README positions it as a desktop app, CLI, and API that can serve both normal users and custom workflows.

The project has moved from block/goose to the Agentic AI Foundation (AAIF) at the Linux Foundation. The current repository is:

https://github.com/aaif-goose/goose

goose is mainly written in Rust and TypeScript and uses the Apache-2.0 license. Its GitHub description says it is an open source, extensible AI agent that goes beyond code suggestions and can install, execute, edit, and test with any LLM.

What Problem It Solves

Many AI coding tools focus on suggestions or local code edits. goose takes a broader view: let an AI agent complete tasks directly on your machine.

It can be used for:

Code changes and tests.
Local automation.
Research and writing.
Data analysis.
Multi-step workflows.
Embedding through an API.
Tool extension through MCP.

If you only need IDE completion, a Copilot-style tool may be enough. goose is more useful when you want AI inside the local task execution chain.

Desktop, CLI, and API

goose has three entry points.

The desktop app supports macOS, Linux, and Windows. It is good for users who prefer a visual interface.

The CLI fits terminal workflows and local development automation.

The API lets other systems or internal tools embed goose as an agent runtime.

Personal users can start with the desktop app or CLI. Teams and workflow builders should also look at the API and custom distribution support.

Installation

The README recommends downloading the desktop app:

https://goose-docs.ai/docs/getting-started/installation

CLI install:

curl -fsSL https://github.com/aaif-goose/goose/releases/download/stable/download_cli.sh | bash

GitHub Releases provide builds for multiple platforms. The latest release checked here was v1.33.1, published on 2026-04-29, with macOS, Linux, Windows, deb, rpm, and Flatpak assets.

After installation, configure a provider from the official quickstart and test in a low-risk directory first. goose can execute local tasks, so avoid giving it broad permissions in a production repository from the start.

Providers

goose supports 15+ providers, including:

Anthropic
OpenAI
Google
Ollama
OpenRouter
Azure
Bedrock
other cloud or OpenAI-compatible providers

It can use API keys, and it can also use existing Claude, ChatGPT, or Gemini subscriptions through ACP.

ACP is important because many users already pay for subscriptions, but different tools cannot easily reuse them. goose uses ACP providers to bring those subscriptions into an agent workflow.

Provider policies change quickly. Check whether the access method is allowed, whether there are quotas, and whether it is suitable for company code or sensitive data.

MCP Extensions

goose supports Model Context Protocol extensions. The README mentions 70+ extensions.

MCP matters because an agent should not only chat and edit files. Through standard protocol servers, it can connect to documentation, databases, browsers, internal systems, search services, design tools, or project management tools.

For teams, MCP can become a safer integration layer: expose internal capabilities through explicit interfaces instead of letting the model touch every system directly.

Difference from a Coding Assistant

goose is not just a code completion tool. It is closer to a local agent runtime.

Common coding assistants focus on:

Code completion.
Code explanation.
Function generation.
Local editor edits.

goose emphasizes:

Local task execution.
Multi-step workflows.
Switchable providers.
Extensions.
Desktop and CLI.
Embeddable API.
Non-code tasks too.

This also means more complexity. You must think about model configuration, permissions, extensions, workspace scope, logs, and credentials.

Custom Distributions

The repository includes CUSTOM_DISTROS.md, which explains how to build a custom goose distribution with preconfigured providers, extensions, and branding.

This is useful for teams:

Preconfigure allowed model providers.
Connect internal MCP servers.
Set safety policies and logging.
Block disallowed external services.
Apply company branding and onboarding.

Members do not need to configure everything from scratch, and the risk of wrong provider or key setup is reduced.

Suggested Use

Start gradually:

Install the desktop app or CLI.
Configure one known-good provider.
Run simple tasks in a test directory.
Observe what it reads and executes.
Add MCP extensions.
Try larger repositories later.

Keep a few habits:

Commit important changes before agent work.
Do not store API keys in project files.
Use high-permission modes only in trusted workspaces.
Review company data and provider policy first.
Keep human review for automation results.

Who Should Use It

goose is a good fit if you want a desktop and CLI AI agent, multiple model providers, MCP integration, API embedding, or custom team distributions. It may be heavy if all you need is IDE code completion.

Summary

goose is an open source AI agent under AAIF/Linux Foundation. It provides desktop, CLI, and API entry points, supports 15+ providers, ACP subscription access, and 70+ MCP extensions.

Its value is not only writing code, but placing models, tools, extensions, and local execution into one agent framework. Start small, define permission and data boundaries, then expand usage.

How to Change the VS Code Display Language: Chinese, English, and More

Fri, 08 May 2026 13:18:57 +0800

VS Code supports many display languages. The usual approach is to install the matching language pack, then choose the display language from the Command Palette. If you want to pin VS Code to a specific language, you can also edit the locale value in argv.json.

This method works not only for Simplified Chinese, but also for English, Traditional Chinese, Japanese, Korean, French, German, Spanish, and other languages.

Install the Language Pack

If you want to switch to a non-English interface, you usually need to install a language pack first.

Open the Extensions panel in the left sidebar, or press Ctrl+Shift+X.
Search for the target language, such as Chinese, Japanese, Korean, or French.
Select the matching language pack and click Install.
Restart VS Code when prompted.

For Simplified Chinese, the common language pack is Chinese (Simplified). For Traditional Chinese, use Chinese (Traditional).

Change the Language from the Command Palette

This is the recommended method for most users.

Open the Command Palette with Ctrl+Shift+P.
Type Configure Display Language.
Select the Configure Display Language command.
Choose the language you want from the list.
Restart VS Code when prompted.

After restarting, menus, settings pages, and common prompts will use the selected language. If the target language is not listed, install its language pack from the Extensions panel first.

Set the Language Manually in argv.json

If switching from the Command Palette does not work, or if you want to explicitly lock VS Code to a language, you can edit the runtime arguments file directly.

Open the Command Palette with Ctrl+Shift+P.
Type and select Preferences: Configure Runtime Arguments.
Find or add the locale setting.
Change its value to the target language code.
Save the file and restart VS Code.

For example, switch to English:

{
  "locale": "en"
}

Switch to Simplified Chinese:

{
  "locale": "zh-cn"
}

Switch to Japanese:

{
  "locale": "ja"
}

argv.json is a JSON file, so pay attention to commas and quotation marks. If the configuration is invalid, VS Code may not read the language setting correctly.

Common Display Language Codes

Display language	locale
English (US)	`en`
Simplified Chinese	`zh-cn`
Traditional Chinese	`zh-tw`
French	`fr`
German	`de`
Italian	`it`
Spanish	`es`
Japanese	`ja`
Korean	`ko`
Russian	`ru`
Portuguese (Brazil)	`pt-br`
Turkish	`tr`
Bulgarian	`bg`
Hungarian	`hu`
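
The table above can double as a small lookup when scripting the change. The following is a minimal sketch, assuming a hypothetical helper named locale_fragment. It only renders the locale entry and does not edit argv.json itself, because that file can contain comments that a strict JSON parser would reject.

```python
import json

# Codes copied from the table above; extend the mapping as needed.
DISPLAY_LOCALES = {
    "English (US)": "en",
    "Simplified Chinese": "zh-cn",
    "Traditional Chinese": "zh-tw",
    "French": "fr",
    "German": "de",
    "Italian": "it",
    "Spanish": "es",
    "Japanese": "ja",
    "Korean": "ko",
    "Russian": "ru",
    "Portuguese (Brazil)": "pt-br",
    "Turkish": "tr",
    "Bulgarian": "bg",
    "Hungarian": "hu",
}


def locale_fragment(language: str) -> str:
    """Return a JSON object holding the locale entry for a listed language.

    locale_fragment is a hypothetical helper, not part of VS Code.
    """
    try:
        code = DISPLAY_LOCALES[language]
    except KeyError:
        raise ValueError(f"unknown display language: {language!r}") from None
    return json.dumps({"locale": code}, indent=2)


if __name__ == "__main__":
    print(locale_fragment("Japanese"))
```

Copy the printed entry into argv.json by hand, keeping any other settings that are already there.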

What to Do If the Language Does Not Change

Check the following items in order:

Confirm that the target language pack is installed.
Confirm that locale uses the correct language code. For example, Simplified Chinese is zh-cn, not zh-CN.
Fully close and reopen VS Code after changing the language.
If you edited argv.json manually, check that the JSON syntax is valid.
If the configuration is messy, remove the locale entry and choose the language again through Configure Display Language.

In most cases, Configure Display Language is the simplest option. Edit argv.json only when you need to force a specific language or the Command Palette switch does not take effect.

References

VScode: Change the VS Code display language to Simplified Chinese and switch the VS Code display language

24 Claude Code Tips: Plan Mode, Rewind, CLAUDE.md, Skills, Agents, and Plugins

Fri, 08 May 2026 08:54:14 +0800

Claude Code is not just a chat box. It is closer to a coding Agent that can enter a project directory, read and write files, run commands, and maintain context.

If you only throw a requirement at it and wait for code, problems appear quickly: unclear plans, repeated permission prompts, growing context, unsatisfactory output, no clear rollback path, and no persistent place for project rules.

Here is a set of common operations for developers getting started with Claude Code.

Start Inside the Project Directory

Claude Code works best when launched inside the project directory, not from a random terminal location.

Create a folder as the project directory, enter it, open a command line, and start Claude Code:

claude

When first entering a project, if Claude Code asks whether to trust the current folder, confirm before continuing. This lets it read files, create files, and run later operations around the current project.

A simple practice task is to ask it to create a photographer portfolio website. The task is visual enough to inspect, and it also lets you practice file generation, command execution, rewind, and later refactoring.

Use Plan Mode First

For more complex tasks, Claude Code may enter plan mode. Plan mode is meant to discuss requirements and break down steps before you approve execution.

After it writes a plan, you usually see options like:

Approve the plan and automatically allow future edit tools.
Approve the plan, but require manual approval for later edits.
Pause and continue discussing the plan with Claude Code.

If the task is clear, approve and continue. If it is not clear yet, ask it to refine the plan, such as page style, tech stack, directory structure, interactions, and acceptance criteria.

Plan mode reduces rework. If an Agent starts directly, it may quickly generate many files; if the direction is wrong, later changes can get messy.

Switch Modes With Shift + Tab

In Claude Code, Shift + Tab can switch between working modes. A common use is entering plan mode or switching into an auto-approve-edit mode.

Suggested habits:

New projects, new features, major changes: start in plan mode.
Small edits and clear fixes: execute directly.
Deletion, bulk replacement, dependency installation: keep manual approval.

In plan mode, Claude Code may ask project-detail questions. Use arrow keys to choose and Enter to confirm. After submitting feedback, it updates the plan.

Do Not Open All Permissions Blindly

When Claude Code runs commands, edits files, or starts programs, it may request permission.

Common choices include:

Allow only this time.
Allow this command type for the current session.
Reject or pause.

For local preview, dev server startup, or file inspection, approve as needed. But do not permanently use a mode that auto-approves all permissions just to save clicks.

Full automation is only suitable when the task is low-risk, clearly understood, and the project already has Git backups. For daily use, keep human approval for deletion, overwriting folders, dependency installation, networking, commits, and scripts.

Run Local Commands in Terminal Mode

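Claude Code has a terminal-command mode for running local commands without leaving the session. In current versions, starting the input with ! switches the prompt into this mode.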

For example, after generating a page, you can open an HTML file with:

start index.html

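start is a Windows command that opens a file with its default application; the filename follows the command. On macOS the equivalent is open, and on most Linux desktops it is xdg-open. This is faster than finding the file manually.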

Terminal mode is useful for:

Opening generated pages.
Listing directory contents.
Starting local development servers.
Running tests or builds.

Still, be careful with high-risk commands such as recursive deletion, moving directories, bulk overwrites, and system environment changes.

Rewind When the Result Goes Wrong

If the page or code produced by Claude Code is not what you want, and each correction makes it worse, rewind early.

Rewind can return code or conversation to a previous point. Common options include:

Rewind both code and conversation.
Rewind only conversation.
Rewind only code.
Compress earlier content into a summary.
Cancel.

When the direction is clearly wrong, it is usually better to rewind both code and conversation. That returns context and files to a cleaner state together.

Note that Claude Code rewind usually only covers files it created or changed through built-in tools. Files created through external commands may not be fully rewindable. Important projects should still use Git.

Write Long Prompts in an Editor

Do not squeeze complex requirements into one input line.

If the system supports editing a long prompt in a text editor, open the editor, write the requirement clearly, save it, and then send it to Claude Code.

Long prompts should include:

The goal.
The tech stack.
What not to do.
Which files must be kept.
How to verify completion.
Page or feature acceptance criteria.

For example, if you want Claude Code to refactor a plain HTML page into a more modern stack, do not just say “refactor it.” Explain component structure, visual preservation, responsive layout, and ask it to run a build check.
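
The checklist above can be kept as a reusable template. The following is a minimal sketch, assuming a hypothetical PromptSpec class; the field names and section titles are illustrative and are not something Claude Code requires.

```python
from dataclasses import dataclass, field


@dataclass
class PromptSpec:
    """Hypothetical container for the parts of a long prompt."""

    goal: str
    tech_stack: str
    avoid: list[str] = field(default_factory=list)
    keep_files: list[str] = field(default_factory=list)
    verification: str = ""
    acceptance: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Render the non-empty sections as plain text, one bullet per item."""
        sections = [
            ("Goal", [self.goal]),
            ("Tech stack", [self.tech_stack]),
            ("Do not", self.avoid),
            ("Keep these files", self.keep_files),
            ("How to verify", [self.verification] if self.verification else []),
            ("Acceptance criteria", self.acceptance),
        ]
        lines: list[str] = []
        for title, items in sections:
            if not items:
                continue  # leave out sections the author did not fill in
            lines.append(f"{title}:")
            lines.extend(f"- {item}" for item in items)
            lines.append("")
        return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    spec = PromptSpec(
        goal="Refactor the plain HTML portfolio page into components",
        tech_stack="React with Vite",
        avoid=["Changing the visual design"],
        verification="Run the build and report any errors",
    )
    print(spec.render())
```

Writing the prompt this way makes it obvious when a section such as verification or acceptance criteria is still empty.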

Restore Sessions After Exit

If you need to quit Claude Code midway, exit normally. Later, return to the same project directory and start again:

claude

If the previous conversation does not appear directly, use /resume inside Claude Code to pick a recent session, or start with claude --continue to reopen the most recent one.

This is useful for continuing interrupted work. But do not treat session history as the only memory. Project rules, tech stack, common commands, and notes should live in project files.

Use CLAUDE.md for Project Rules

CLAUDE.md is an important memory file for Claude Code. It usually sits at the project root and tells Claude Code project rules, tech stack, directory structure, and collaboration constraints.

You can ask Claude Code to initialize it:

/init

CLAUDE.md is good for:

Project goals.
Tech stack.
Common start, test, and build commands.
Directory notes.
Code style.
Forbidden actions.
Commit and deployment rules.

During each conversation, Claude Code can use these rules as part of the context. Think of it as a project manual.

A simple test is to add a clear rule into CLAUDE.md, then ask Claude Code something. If its answer follows the rule, it has read the project memory.
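
One way to keep CLAUDE.md complete is to check it against the topics listed above. The following is a minimal sketch, assuming the file uses Markdown headings; the section titles in RECOMMENDED_SECTIONS and the helper missing_sections are hypothetical names chosen for illustration, not a format Claude Code enforces.

```python
# Section titles are an assumption that mirrors the list above.
RECOMMENDED_SECTIONS = [
    "Project goals",
    "Tech stack",
    "Commands",
    "Directory notes",
    "Code style",
    "Forbidden actions",
    "Commit and deployment rules",
]


def missing_sections(claude_md_text: str) -> list[str]:
    """Return the recommended titles that have no matching Markdown heading."""
    headings = {
        line.lstrip("#").strip().lower()
        for line in claude_md_text.splitlines()
        if line.startswith("#")
    }
    return [title for title in RECOMMENDED_SECTIONS if title.lower() not in headings]


if __name__ == "__main__":
    sample = "# Demo project\n\n## Tech stack\nVite\n\n## Code style\nPrettier defaults\n"
    for title in missing_sections(sample):
        print(f"missing: {title}")
```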

Reference Files With @

Typing @ in the input box lets you select files or Agents and add them to the current context.

This is useful when you want Claude Code to:

Read a config file.
Modify a specific page.
Continue based on CLAUDE.md or another document.
Only inspect a specific file instead of guessing the whole project.

Compared with copying file contents into the input box, @ references are clearer and less error-prone.

View and Compress Context

After a long conversation, context grows. When it gets too long, the model may slow down or start ignoring earlier details.

Use:

/context

If context is long, compress history:

/compact

If the result is still poor, consider clearing the current context:

/clear

After clearing, Claude Code can still understand part of the project through files, CLAUDE.md, and the current directory, but it will not keep the full conversation history.

A practical habit: start a new chat after a task is done, write project rules into CLAUDE.md, and do not let temporary discussion grow forever in one chat.

Skills: Turn Repeated Work Into Instructions

Skills are reusable task instructions for Claude Code. They are not one-off prompts, but packaged workflows.

For example, if you often generate weekly reports, create a weekly-report Skill that defines:

Required input.
Output format.
Tone and structure.
What must be preserved.
What must not be invented.

Skills usually contain name, description, and detailed instructions. Once installed in the global Skills directory, Claude Code can recognize and load them for related tasks.

Good Skill candidates include:

Weekly reports.
Code review templates.
Document cleanup.
Image batch processing.
Fixed-format articles.
Project initialization flows.

If you repeatedly copy the same prompt, consider turning it into a Skill.
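
The name, description, and instructions structure mentioned above can be sketched as a small generator. This assumes a Skill is stored as a Markdown file with a short metadata header followed by the instructions; the exact file layout and the helper render_skill are assumptions, so check the current Claude Code documentation before relying on them.

```python
import json


def render_skill(name: str, description: str, instructions: str) -> str:
    """Render a Skill file body: metadata header first, instructions after.

    The header layout is an assumption for illustration.
    """
    if not name or any(ch.isspace() for ch in name):
        raise ValueError("skill name must be non-empty and contain no whitespace")
    header = [
        "---",
        f"name: {name}",
        # json.dumps quotes the text so characters such as ':' stay literal.
        f"description: {json.dumps(description)}",
        "---",
    ]
    return "\n".join(header) + "\n\n" + instructions.strip() + "\n"


if __name__ == "__main__":
    print(
        render_skill(
            "weekly-report",
            "Draft a weekly report from raw notes",
            "Ask for the raw notes first.\nNever invent results that are not in the notes.",
        )
    )
```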

Agents: Delegate Subtasks to Independent Helpers

Agents are different from Skills.

A Skill is more like an instruction manual. An Agent is more like an independent helper that can work outside the main conversation and return results.

The value of Agents is context isolation. For code inspection, you can create a read-only Agent that only reads the project and outputs a report, without modifying files. This avoids polluting the main conversation and lowers risk.

When creating an Agent, consider:

Project-level or user-level Agent.
Whether Claude Code should generate the config.
Which tools are allowed.
Which model to use.
Whether memory should be saved.
Whether the Agent prompt is clear enough.

For code-audit Agents, give read-only permissions first. Let it output a report, then decide in the main conversation whether to change code.

Plugins: Package Skills, Agents, MCP, and Hooks

Plugins are more complete capability packages. They may include:

Skills
Agents
MCP
Hooks

Compared with installing one Skill, a plugin is better for a full capability set. For example, a frontend design plugin may package visual rules, layout habits, component preferences, and related Agents together.

When installing a plugin, you may choose:

Install to the user directory, effective for all projects.
Install to the project directory, shareable with the project.
Install to a local project directory, effective only on your computer.

Use the user directory for personal common capabilities, the project directory for team conventions, and local project install for temporary testing.

Plugins Can Improve Specific Tasks

For frontend page generation, plugins can be more stable than raw prompts.

For example, for “make a photographer portfolio website,” a plain prompt may generate an acceptable page. If you explicitly use a frontend design plugin, the structure, visual hierarchy, spacing, colors, and overall finish are often better.

This does not mean plugins replace human taste. A better workflow is to let the plugin generate a stronger first draft, then refine details manually.

A More Stable Claude Code Workflow

Putting these tips together gives a steadier workflow:

Start claude inside the project directory.
Discuss requirements in plan mode first.
Confirm tech stack and acceptance criteria before approving the plan.
Keep manual approval for high-risk actions.
Use terminal mode for local preview and tests.
Rewind early when the result goes off track.
Write project rules into CLAUDE.md.
Check and compress context during long chats.
Turn repeated workflows into Skills.
Delegate inspection, research, and analysis to read-only Agents.
Use plugins for domain-specific tasks.
Always keep Git checkpoints for important projects.

This is much more stable than simply sending one requirement and waiting for generation.

Summary

Claude Code efficiency does not come only from model capability. It also comes from workflow control.

Plan mode sets direction, permission approval controls risk, rewind reduces rework, CLAUDE.md stores project rules, /context, /compact, and /clear manage context, Skills reuse fixed workflows, Agents isolate complex subtasks, and plugins package complete capabilities.

The best way to use Claude Code is to let it move tasks forward inside clear boundaries, not to hand the entire project to it at once.

opencode, Claude Code, and Codex: What's the Difference? A Guide to Open Source AI Coding Tools

Fri, 08 May 2026 08:33:37 +0800

opencode is an open source AI Coding Agent from anomalyco. Its positioning is straightforward: give developers a programmable, extensible coding assistant in the terminal that can connect to multiple model providers.

If you compare it with Claude Code and Codex, all three solve the same broad problem: bringing AI into real codebases so it can understand context, edit files, run commands, and execute tests. But their product directions are different.

opencode emphasizes open source, multi-model support, and a terminal TUI. Claude Code emphasizes Anthropic’s model ecosystem and local engineering collaboration. Codex is OpenAI’s AI coding agent, available through the terminal, IDEs, the Codex app, and cloud tasks.

Who opencode Is For

opencode is a better fit for these kinds of developers:

People who want to complete code changes, project analysis, and engineering tasks in the terminal.
People who do not want their AI Coding Agent tied to a single model provider.
People who prefer open source tools and want to audit, extend, or build on top of them.
People already comfortable with Neovim, TUIs, and command-line workflows.
People who want to eventually drive the same coding agent remotely through a desktop app, mobile app, or other clients.

Its point is not to create another chat window, but to put AI coding capability inside the terminal and project directories developers already use.

Installation

The official README provides several installation methods.

# Direct install
curl -fsSL https://opencode.ai/install | bash

# npm
npm i -g opencode-ai@latest

# Windows
scoop install opencode
choco install opencode

# macOS and Linux
brew install anomalyco/tap/opencode
brew install opencode

# Arch Linux
sudo pacman -S opencode
paru -S opencode-bin

# Other methods
mise use -g opencode
nix run nixpkgs#opencode

The official README also recommends removing versions older than 0.1.x before installing to avoid problems caused by older remnants.

The installation script chooses the installation directory by priority:

$OPENCODE_INSTALL_DIR
$XDG_BIN_DIR
$HOME/bin
$HOME/.opencode/bin

If you need to specify a path, use:

OPENCODE_INSTALL_DIR=/usr/local/bin curl -fsSL https://opencode.ai/install | bash
XDG_BIN_DIR=$HOME/.local/bin curl -fsSL https://opencode.ai/install | bash
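
The priority order above can be sketched as a lookup. This is an illustration rather than the installer's actual logic: resolve_install_dir is a hypothetical name, and the rule that $HOME/bin is used only when it already exists is an assumption.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_install_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Pick an install directory using the priority order listed above."""
    env = os.environ if env is None else env

    # 1 and 2: explicit overrides win, in this order.
    for var in ("OPENCODE_INSTALL_DIR", "XDG_BIN_DIR"):
        value = env.get(var)
        if value:
            return Path(value)

    home = Path(env.get("HOME") or str(Path.home()))

    # 3: $HOME/bin, here only if it already exists (assumption).
    user_bin = home / "bin"
    if user_bin.is_dir():
        return user_bin

    # 4: fallback directory.
    return home / ".opencode" / "bin"


if __name__ == "__main__":
    print(resolve_install_dir())
```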

The Desktop App Is Still Beta

In addition to the command-line tool, opencode also provides a desktop app, currently marked as Beta. It can be downloaded from GitHub Releases or opencode.ai/download.

The desktop app covers these platforms:

Platform	File
macOS Apple Silicon	`opencode-desktop-mac-arm64.dmg`
macOS Intel	`opencode-desktop-mac-x64.dmg`
Windows	`opencode-desktop-windows-x64.exe`
Linux	`.deb`, `.rpm`, or `.AppImage`

macOS and Windows users can also install the desktop app through package managers.

# macOS
brew install --cask opencode-desktop

# Windows
scoop bucket add extras
scoop install extras/opencode-desktop

Two Built-In Agent Modes

opencode includes two built-in Agents, switchable with the Tab key.

build is the default mode. It has full development permissions and is suitable for editing code directly, running commands, and moving engineering tasks forward.

plan is read-only mode. It is better for analyzing unfamiliar codebases, understanding project structure, and planning changes. It denies file edits by default and asks before running bash commands.

opencode also includes a general subagent for complex searches and multi-step tasks. Users can invoke it by typing @general in a message.

This design is practical: use plan to understand the project before acting, then switch to build when code needs to change. For large repositories, separating read and write permissions helps reduce mistakes.
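
The read and write split described above can be pictured as a small permission table. This only illustrates the behavior in the text; the names Decision, PERMISSIONS, and decide are hypothetical and do not reflect opencode's actual configuration format.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    ASK = "ask"
    DENY = "deny"


# Behavior as described above: build may do everything, plan denies edits
# and asks before running bash commands.
PERMISSIONS = {
    "build": {"edit": Decision.ALLOW, "bash": Decision.ALLOW},
    "plan": {"edit": Decision.DENY, "bash": Decision.ASK},
}


def decide(agent: str, action: str) -> Decision:
    """Look up what an agent may do for a given action."""
    try:
        return PERMISSIONS[agent][action]
    except KeyError:
        raise ValueError(f"unknown agent or action: {agent!r}, {action!r}") from None


if __name__ == "__main__":
    for agent in PERMISSIONS:
        for action in ("edit", "bash"):
            print(agent, action, decide(agent, action).value)
```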

What Is Codex?

Codex is OpenAI’s AI coding agent for helping developers write code, review code, fix bugs, and ship engineering tasks.

Unlike a simple code completion tool, Codex is closer to an Agent that can operate on a codebase. It can pair with you in local tools, and it can also take delegated tasks in the cloud. OpenAI’s official materials describe Codex as available through multiple surfaces, including CLI, IDEs, the Codex app, and ChatGPT/Codex cloud workflows.

For developers, Codex has several important traits:

It can read codebases, edit files, run commands, and execute tests.
It supports multiple interfaces, including terminal, IDE, app, and cloud.
It fits bug fixing, feature work, refactoring, migrations, code review, and test generation.
It is more closely tied to OpenAI accounts, models, and the Codex product ecosystem.
Cloud tasks are useful for running multiple well-scoped engineering tasks in parallel.

If opencode is more like an open terminal agent framework, Codex is more like a full AI coding workbench from OpenAI: local pairing, cloud delegation, and longer engineering workflows for teams.

Core Differences

opencode, Claude Code, and Codex are all AI coding tools, but the choice becomes clearer if you look at these dimensions.

Tool	Core Positioning	Main Advantages	Best Fit
`opencode`	Open source AI Coding Agent	Open source, multi-model, TUI, client/server architecture	Developers who want an open toolchain, replaceable models, and a terminal-first workflow
`Claude Code`	Anthropic’s command-line coding tool	Claude model experience, code understanding, long context, engineering task collaboration	Developers already using the Claude/Anthropic ecosystem who want to work on local code tasks
`Codex`	OpenAI’s AI coding agent	CLI, IDE, Codex app, cloud tasks, multi-Agent workflows	Teams already using ChatGPT/OpenAI who want both local pairing and cloud delegation

In short, opencode is about openness and replaceability, Claude Code is about the Claude ecosystem and local engineering agents, and Codex is about the OpenAI ecosystem and multi-surface collaboration.

How It Differs From Claude Code

opencode’s official FAQ directly compares it with Claude Code. The two are similar in capability, but the main differences are these.

First, opencode is a 100% open source project, hosted on GitHub and released under the MIT license.

Second, opencode is not tied to a single model provider. It recommends models provided through OpenCode Zen, but it can also work with Claude, OpenAI, Google, or local models. For developers, this means that when model cost, capability, or availability changes, you are not locked into one platform.

Third, opencode includes optional LSP support. For code completion, navigation, diagnostics, and project understanding, LSP is a very important foundation.

Fourth, opencode emphasizes TUI. It is built by Neovim users and the creators of terminal.shop, so the product focus is clearly on the terminal experience.

Fifth, opencode uses a client/server architecture. That means opencode can run on your computer while being controlled in the future by a TUI, desktop app, mobile app, or other clients. The TUI is only one possible frontend.

When to Choose opencode, Claude Code, or Codex

If you already use Claude Code or Codex, opencode does not have to replace them immediately. A better way to think about it is that opencode provides an open, model-replaceable, terminal-first option.

Consider opencode first when:

You want your AI coding tool to be as open source as possible.
You do not want your workflow tied to one model provider.
You want to test Claude, OpenAI, Google, or local models with the same tool.
You like TUI workflows and do not want a desktop or web app to interrupt your main workflow.
You care about the remote-control potential of a client/server architecture.

Consider Claude Code first when:

You mainly use Claude models.
You care about long context, code understanding, and complex engineering task collaboration.
You want to keep moving edits, tests, and refactors forward in a local repository.
You trust Anthropic’s default Claude Code product experience.

Consider Codex first when:

You already use ChatGPT or the OpenAI account ecosystem.
You want one coding agent across terminal, IDE, desktop app, and cloud tasks.
You want to delegate well-scoped bug fixes, feature work, migrations, or test generation to the cloud in parallel.
You need code review, background tasks, team collaboration, and multi-Agent workflows.

If you care more about an official end-to-end experience, default model configuration, enterprise management, and ready-made integrations, Claude Code or Codex may be easier. If you care more about control, openness, and being provider-agnostic, opencode is worth watching.

Things to Note

opencode, Claude Code, and Codex are all moving quickly. GitHub releases, installation commands, desktop app file names, model availability, and plan access can all change. Before installing or choosing a tool, check the official README, documentation, and release pages.

Also, opencode’s desktop app is still marked as Beta, so it should not be treated as the default stable production tool. For everyday engineering tasks, the terminal version is still the main entry point.

From a tooling trend perspective, opencode represents the open-toolchain direction for AI Coding Agents: replaceable models, replaceable clients, and an open core agent capability. Codex and Claude Code are closer to model companies turning coding agents into complete product surfaces. For developers, both directions will likely coexist for a long time.

References

opencode GitHub: https://github.com/anomalyco/opencode
opencode official site: https://opencode.ai
opencode docs: https://opencode.ai/docs
opencode Releases: https://github.com/anomalyco/opencode/releases
OpenAI Codex: https://openai.com/codex/
Using Codex with your ChatGPT plan: https://help.openai.com/en/articles/11369540-codex-in-chatgpt
OpenAI Codex CLI Getting Started: https://help.openai.com/en/articles/11096431-openai-codex-ci-getting-started

uv Installation Guide: Choosing Between macOS, Linux, Windows, pipx, Homebrew, and WinGet

Thu, 07 May 2026 23:23:58 +0800

uv is a Python toolchain manager from Astral. It can manage Python versions, virtual environments, dependencies, scripts, projects, and tools. There are many ways to install it. The official documentation provides standalone installer scripts and also supports PyPI, Homebrew, WinGet, Scoop, Docker, GitHub Releases, and Cargo.

If you just want a quick installation, use the official standalone installer first. If you prefer your system package manager to maintain versions, use Homebrew, WinGet, or Scoop. If you already like installing Python tools in isolated environments, use pipx.

Quick Choice

Scenario	Recommended method	Command
Quick install on macOS / Linux	Official standalone installer	`curl -LsSf https://astral.sh/uv/install.sh | sh`
macOS / Linux without curl	Official script + wget	`wget -qO- https://astral.sh/uv/install.sh | sh`
Quick install on Windows	PowerShell installer	`powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"`
Isolated Python tool install	pipx	`pipx install uv`
Temporary or traditional Python install	pip	`pip install uv`
macOS package management	Homebrew	`brew install uv`
macOS MacPorts users	MacPorts	`sudo port install uv`
Windows package management	WinGet	`winget install --id=astral-sh.uv -e`
Windows Scoop users	Scoop	`scoop install main/uv`
Rust users	Cargo	`cargo install --locked uv`

The generally recommended options are:

macOS / Linux: official standalone installer;
Windows: official PowerShell installer or WinGet;
If you already manage Python CLI tools with pipx: pipx install uv.
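
The recommendation above can be turned into a small lookup keyed by operating system. This is a minimal sketch: recommended_install_command is a hypothetical helper, and it only returns the command text from the table without running anything.

```python
import platform
from typing import Optional

UNIX_INSTALL = "curl -LsSf https://astral.sh/uv/install.sh | sh"
WINDOWS_INSTALL = (
    'powershell -ExecutionPolicy ByPass -c '
    '"irm https://astral.sh/uv/install.ps1 | iex"'
)

# Keys match the values returned by platform.system().
INSTALL_COMMANDS = {
    "Darwin": UNIX_INSTALL,
    "Linux": UNIX_INSTALL,
    "Windows": WINDOWS_INSTALL,
}


def recommended_install_command(system: Optional[str] = None) -> str:
    """Return the standalone installer command for the given OS name."""
    system = system or platform.system()
    try:
        return INSTALL_COMMANDS[system]
    except KeyError:
        raise ValueError(f"no recommendation for system: {system!r}") from None


if __name__ == "__main__":
    print(recommended_install_command())
```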

macOS and Linux: Official Installer

The most direct official method is to download the script with curl and run it with sh:

curl -LsSf https://astral.sh/uv/install.sh | sh

If the system does not have curl, use wget:

wget -qO- https://astral.sh/uv/install.sh | sh

To install a specific version, put the version number in the URL. For example, the official example uses 0.11.11:

curl -LsSf https://astral.sh/uv/0.11.11/install.sh | sh

This method fits most personal development environments. It is simple, cross-platform, and works best with uv’s official update mechanism.

The installer puts binaries such as uv and uvx under the user directory, and may modify the shell profile so the commands can be used directly in the terminal. If you do not want the installer to modify PATH, check the official installer options, such as setting UV_NO_MODIFY_PATH=1.
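
The pinned-version URL described above follows a simple pattern, which helps when a provisioning script has to install the same uv version on many machines. This is a minimal sketch: installer_url is a hypothetical helper, and the three-part version check is an assumption about how versions are written.

```python
import re
from typing import Optional

BASE_URL = "https://astral.sh/uv"


def installer_url(version: Optional[str] = None, windows: bool = False) -> str:
    """Build the installer script URL, optionally pinned to a version."""
    script = "install.ps1" if windows else "install.sh"
    if version is None:
        return f"{BASE_URL}/{script}"
    # Assumption: versions look like 0.11.11 (three dot-separated numbers).
    if not re.fullmatch(r"\d+\.\d+\.\d+", version):
        raise ValueError(f"unexpected version format: {version!r}")
    return f"{BASE_URL}/{version}/{script}"


if __name__ == "__main__":
    print(installer_url("0.11.11"))
    print(installer_url(windows=True))
```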

Windows: PowerShell Installer

The official Windows method is to run the installer script with PowerShell:

powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

To install a specific version, also put the version number in the URL:

powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/0.11.11/install.ps1 | iex"

-ExecutionPolicy ByPass relaxes the script execution policy for this one PowerShell process, so the installer script downloaded from the internet is allowed to run. As a safer habit, you can inspect the script before running it:

powershell -c "irm https://astral.sh/uv/install.ps1 | more"

If you prefer Windows package managers, WinGet or Scoop may be a better first choice.

Installing with pipx

The official documentation notes that uv is published to PyPI. If you install from PyPI, it is recommended to put it in an isolated environment, for example with pipx:

pipx install uv

This is suitable if you already use pipx as your Python CLI tool manager. It avoids mixing uv with your current project environment.

If you do not have pipx, you can also use pip directly:

pip install uv

Note that uv provides prebuilt wheels on many platforms. If a platform does not have a matching wheel, it will build from source, which requires a Rust toolchain.

My suggestion: on a personal machine, pipx install uv is cleaner than pip install uv; inside a project environment, do not install uv as a project dependency.

Homebrew, MacPorts, WinGet, and Scoop

If you prefer system package managers, uv also supports common channels.

Use Homebrew on macOS:

brew install uv

MacPorts users can use:

sudo port install uv

Use WinGet on Windows:

winget install --id=astral-sh.uv -e

Scoop users can use:

scoop install main/uv

The benefit of these methods is that version maintenance is delegated to the system package manager. The downside is that updates arrive whenever the package source publishes them, which can lag behind uv's own releases.

Docker, GitHub Releases, and Cargo

uv also provides Docker images on GitHub Container Registry:

ghcr.io/astral-sh/uv

This is suitable for CI, Dockerfiles, image builds, and temporary runtime environments. In real usage, read the official Docker integration documentation as well.

If you want to download binaries manually, use GitHub Releases. Each release page usually includes binaries for supported platforms and explains how to invoke the standalone installer with a GitHub URL.

Rust users can also install from crates.io:

cargo install --locked uv

This builds from source and requires a compatible Rust toolchain. Unless you specifically want to install from the Rust ecosystem, ordinary users do not need to choose Cargo first.

Upgrading uv

If uv was installed with the official standalone installer, use the self-update command:

uv self update

The official documentation notes that updating uv reruns the installer and may modify the shell profile. If you do not want the update to modify PATH, set:

UV_NO_MODIFY_PATH=1

If you installed uv another way, use the corresponding package manager. For example, if you installed it with pip, use:

pip install --upgrade uv

Homebrew, WinGet, Scoop, and MacPorts should also use their own upgrade commands.
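
The rule above, upgrade with the same channel you installed from, can be written as a lookup. This is a minimal sketch with a hypothetical helper upgrade_command. Only the standalone and pip entries come from the text; the other commands are assumptions based on each package manager's usual upgrade syntax and should be checked against that tool's documentation.

```python
# Upgrade command per install method. "standalone" and "pip" are taken from
# the text above; the remaining entries are assumptions.
UPGRADE_COMMANDS = {
    "standalone": "uv self update",
    "pip": "pip install --upgrade uv",
    "pipx": "pipx upgrade uv",
    "homebrew": "brew upgrade uv",
    "winget": "winget upgrade --id=astral-sh.uv -e",
    "scoop": "scoop update uv",
    "macports": "sudo port upgrade uv",
}


def upgrade_command(method: str) -> str:
    """Return the upgrade command that matches the install method."""
    key = method.strip().lower()
    if key not in UPGRADE_COMMANDS:
        raise ValueError(f"unknown install method: {method!r}")
    return UPGRADE_COMMANDS[key]


if __name__ == "__main__":
    for method in UPGRADE_COMMANDS:
        print(f"{method}: {upgrade_command(method)}")
```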

Enabling Shell Completion

uv supports shell completion. The official documentation recommends checking your current shell first:

echo $SHELL

Bash:

echo 'eval "$(uv generate-shell-completion bash)"' >> ~/.bashrc

Zsh:

echo 'eval "$(uv generate-shell-completion zsh)"' >> ~/.zshrc

fish:

echo 'uv generate-shell-completion fish | source' > ~/.config/fish/completions/uv.fish

PowerShell:

if (!(Test-Path -Path $PROFILE)) {
  New-Item -ItemType File -Path $PROFILE -Force
}
Add-Content -Path $PROFILE -Value '(& uv generate-shell-completion powershell) | Out-String | Invoke-Expression'

If you often use uvx, you can enable completion for uvx separately.

Bash:

echo 'eval "$(uvx --generate-shell-completion bash)"' >> ~/.bashrc

Zsh:

echo 'eval "$(uvx --generate-shell-completion zsh)"' >> ~/.zshrc

fish:

echo 'uvx --generate-shell-completion fish | source' > ~/.config/fish/completions/uvx.fish

PowerShell:

if (!(Test-Path -Path $PROFILE)) {
  New-Item -ItemType File -Path $PROFILE -Force
}
Add-Content -Path $PROFILE -Value '(& uvx --generate-shell-completion powershell) | Out-String | Invoke-Expression'

After configuration, restart the shell or reload the corresponding configuration file.
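
The Bash, Zsh, and fish commands above share one pattern, which can be sketched as a function that maps the value of $SHELL to the file and line to write. completion_setup is a hypothetical helper; it returns text only and does not modify any file, and PowerShell is left out because it uses a different mechanism.

```python
import os


def completion_setup(shell_path: str, tool: str = "uv") -> tuple[str, str]:
    """Return (target file, line to write) for bash, zsh, or fish.

    For bash and zsh the line is meant to be appended to the rc file;
    for fish it is the whole content of the completions file.
    """
    if tool not in ("uv", "uvx"):
        raise ValueError(f"unsupported tool: {tool!r}")
    shell = os.path.basename(shell_path)
    # uv uses a subcommand, uvx uses a flag of the same name.
    generate = "generate-shell-completion" if tool == "uv" else "--generate-shell-completion"
    if shell in ("bash", "zsh"):
        return f"~/.{shell}rc", f'eval "$({tool} {generate} {shell})"'
    if shell == "fish":
        return f"~/.config/fish/completions/{tool}.fish", f"{tool} {generate} fish | source"
    raise ValueError(f"unsupported shell: {shell!r}")


if __name__ == "__main__":
    target, line = completion_setup(os.environ.get("SHELL", "/bin/bash"))
    print(f"write to {target}:")
    print(line)
```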

Uninstalling uv

To uninstall uv, you can first clean the cache and uv-managed data:

uv cache clean
rm -r "$(uv python dir)"
rm -r "$(uv tool dir)"

Then delete the binaries.

macOS / Linux:

rm ~/.local/bin/uv ~/.local/bin/uvx

Windows:

rm $HOME\.local\bin\uv.exe
rm $HOME\.local\bin\uvx.exe
rm $HOME\.local\bin\uvw.exe

The official documentation also notes that before 0.5.0, uv installed into ~/.cargo/bin. If you upgraded from an earlier version, old binaries may still be there and need to be removed manually.

What to Do After Installation

After installation, check the version first:

uv --version

Then start with a few common tasks:

uv python install
uv venv
uv pip install requests
uvx ruff --version

For a new project, continue with:

uv init: initialize a project;
uv add: add dependencies;
uv sync: sync the environment;
uv run: run commands inside the project environment;
uvx: run Python CLI tools temporarily.

My Recommendation

On a personal development machine, prefer the official standalone installer because it follows the uv official documentation most closely and supports uv self update.

Windows users who do not want to run a remote script can use WinGet or Scoop. macOS users who prefer managing all tools with Homebrew can directly use brew install uv.

If you already manage Python CLI tools with pipx, use pipx install uv. But I do not recommend installing uv with pip install uv inside a specific project virtual environment, because that tends to mix the toolchain with project dependencies.

For CI or container builds, start with Docker and GitHub Releases, and pin versions according to your image build process.

References

uv installation documentation: https://docs.astral.sh/uv/getting-started/installation/
uv First steps: https://docs.astral.sh/uv/getting-started/first-steps/
uv Docker integration: https://docs.astral.sh/uv/guides/integration/docker/
uv GitHub Releases: https://github.com/astral-sh/uv/releases

Codex App Beginner Guide: Installation, Sandbox, Parallel Tasks, Skills, and MCP

Wed, 06 May 2026 08:41:17 +0800

Codex App can be understood as a task workspace for AI coding. It is not a traditional IDE, nor just a chat window. It brings multitasking, project management, sandbox permissions, Git, cloud execution, plugins, Skills, MCP, and automation into one interface.

If you already use Codex CLI, Claude Code, Cursor, or other coding agents, the most interesting part of Codex App is that it turns “running multiple agents in parallel” into a clearer desktop workflow.

What Codex App Is Good For

The core value of Codex App is not answering questions, but letting AI continuously execute tasks inside a project directory:

Edit code, run commands, and start development servers.
Manage multiple projects and multiple tasks.
Run long tasks locally or in the cloud.
Call plugins, Skills, and MCP for extended capabilities.
Manage changes through Git, worktree, and PR workflows.

OpenAI also positions Codex App as an interface for managing multiple coding agents. It is suitable for people who need to advance several coding tasks at once, especially frontend pages, scripts, small apps, documentation, and automation workflows.

Preparation Before Installation

Before using Codex App, it is best to prepare three basic tools:

Git
Node.js
VS Code or your preferred IDE

Codex App supports macOS and Windows. After installation, sign in with your ChatGPT account. On first launch, you can choose your main usage scenario, such as programming or daily work. Codex will preload some plugins and Skills based on your choices, and you can adjust them later in settings and the plugin marketplace.

The main features on Windows and macOS are broadly similar, but some computer automation capabilities may depend on platform and plugin support. Use whatever your current version actually displays.

Interface Structure: Projects, Tasks, and Chats

Codex App uses a classic three-column layout:

Left: projects, tasks, chat history, plugins, and automation entry points.
Middle: current chat window.
Right: files, browser, terminal, run results, and other panels.

A project usually corresponds to a local folder. You can open multiple chats inside the same project, or open several projects at once so different agents can work in parallel.

The task list shows different states:

Running: the agent is still executing.
Waiting for approval: you need to confirm permissions, networking, dependency installation, or a high-risk action.
Completed: the task has finished, and you can inspect the result or continue asking.

This is more intuitive than switching between multiple terminal windows, and it is better suited to managing several AI tasks at once.

Sandbox and Permission Control

Codex App’s permission system is built around the sandbox. By default, the current project folder becomes the agent’s main workspace.

Common permission boundaries include:

It can read and modify files inside the project directory.
It cannot freely modify files outside the project by default.
Networking or high-risk commands are restricted by default.
When elevated access is needed, it asks the user for approval.

A practical mode is “auto review”: low-risk actions are automatically allowed, while high-risk actions are still confirmed by the user. This reduces frequent pop-ups while keeping dangerous operations from happening silently.

“Full access” should be enabled cautiously. It is suitable when you know exactly what the agent needs to do and the project already has Git backups and important files have separate backups. It is not recommended as a long-term daily default.

Context, Models, and Quotas

Codex App shows the current chat’s context usage. The longer the conversation and the more history it contains, the more context the model needs to process.

Useful habits:

Start a new chat after finishing a task.
Long chats can be compressed manually, but do not treat compression as perfect memory.
For complex tasks, clearly state goals, boundaries, and acceptance criteria.
Do not dump large irrelevant logs, errors, or files into a chat all at once.

For model selection, adjust reasoning strength according to task complexity. Simple edits, writing, and repetitive tasks do not always need the strongest model. Architecture migration, difficult bugs, and cross-file refactors are better suited to stronger models.

If the interface has a fast mode, remember that it usually consumes more quota. Use it when speed matters, but not as a daily default.

Image Generation and Multimodal Inputs

Codex App can accept images and files as context, and can call image generation in suitable scenarios.

This is useful for frontend and content projects. For example, you can ask Codex to:

Fix page styles based on screenshots.
Replace unsuitable images in a webpage.
Generate product images, carousel images, or page assets.
Point out what needs to be changed from a UI screenshot.

A more efficient approach is not to say only “make it look better”, but to use screenshots and point to concrete problems, such as “the spacing in this card is too large”, “this image does not match the service scene”, or “make the map area clearer”.

Steer: Correcting Direction During Execution

Steer lets you redirect the agent while it is still executing. If the agent has already started and you realize it misunderstood the task, you do not have to wait for it to finish before correcting it.

You can use steering to insert a new instruction into the current execution flow and make Codex correct course.

Good use cases for Steer include:

The agent misunderstood the requirement.
The generated page style is clearly wrong.
The current plan is too expensive or heavy.
You need to add a key constraint temporarily.

In general, keep the default queued behavior and manually use Steer only when intervention is needed. This avoids disrupting normal tasks while still letting you pull the direction back at key moments.

Plan Mode and Built-In Browser

For complex tasks, start with plan mode. In plan mode, Codex does not immediately modify code. It first outputs a plan and may ask key questions with cards.

Tasks suitable for plan mode include:

Framework migration, such as moving a React project to Next.js.
Large refactors.
Features involving databases, authentication, or deployment.
Requirements where you have not decided the technical path.

The right panel in Codex App can open a built-in browser to preview the local development server. You can annotate the page and let Codex modify a specific UI location. This “look at the page, click the position, ask AI to change it” workflow is often better for frontend debugging than pure text descriptions.

Git, IDE, and Code Rollback

Codex App is not a full IDE. It can view code and add annotations, but manual editing is still better done in VS Code, Cursor, Windsurf, or another IDE.

Every Codex project should initialize Git early:

Ask Codex to create or check .gitignore.
Commit once after reaching a usable state.
Ensure a clean commit point before each large change.
Roll back with Git if you are not satisfied.

If you roll back only the chat history, the code will not automatically roll back. A safer approach is to return the chat to the right point, then use a Git commit hash to return the code to the corresponding state.
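
The code-side rollback can be sketched as below. This is a minimal sketch, assuming you are willing to use git reset --hard after saving a backup branch; the helper name rollback_plan and the branch name backup/before-rollback are made up for illustration. It only builds the commands so you can read them before running anything.

```python
import re

def rollback_plan(commit_hash: str, backup_branch: str = "backup/before-rollback") -> list[list[str]]:
    """Build (but do not run) the git commands for returning to a known commit."""
    # Accept abbreviated or full hex hashes only, so odd input never reaches git.
    if not re.fullmatch(r"[0-9a-f]{7,40}", commit_hash):
        raise ValueError(f"not a commit hash: {commit_hash!r}")
    return [
        ["git", "status", "--short"],             # look for uncommitted work first
        ["git", "branch", backup_branch],         # keep a pointer to the current state
        ["git", "reset", "--hard", commit_hash],  # move the branch and working tree back
    ]

# "a1b2c3d" is a placeholder hash, not a real commit.
for cmd in rollback_plan("a1b2c3d"):
    print(" ".join(cmd))
```

Keep in mind that git reset --hard discards uncommitted changes, which is why the plan checks the status and creates a backup branch first.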

Worktree: Parallel Development in Multiple Directions

git worktree is especially suitable for parallel agents in Codex App.

It creates multiple independent working directories from the same repository, each corresponding to a different branch. This lets different agents work in different folders at the same time without overwriting each other.

Typical usage:

One worktree optimizes the customer review component.
One worktree adjusts store information and map layout.
Merge both tasks back to main after completion.
Remove temporary worktrees after merging.

This is much safer than letting multiple agents modify code in the same directory. If conflicts happen, review and merge them using normal Git workflows.
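
The typical usage above maps to a handful of git commands. The sketch below only builds them as argument lists; the function name worktree_plan, the sibling-directory naming scheme, and the branch names are assumptions for illustration, not something Codex App prescribes.

```python
from pathlib import PurePosixPath

def worktree_plan(repo_dir: str, branches: list[str], base: str = "main") -> list[list[str]]:
    """Return git commands: one worktree per branch, then merge and clean up."""
    repo = PurePosixPath(repo_dir)
    # One sibling directory per branch, so agents never share a working directory.
    paths = {b: str(repo.parent / f"{repo.name}-{b}") for b in branches}
    cmds = [["git", "worktree", "add", "-b", b, paths[b], base] for b in branches]
    for b in branches:
        cmds.append(["git", "merge", b])                     # run from the main checkout
        cmds.append(["git", "worktree", "remove", paths[b]])  # drop the temporary directory
    return cmds

for cmd in worktree_plan("/work/shop", ["reviews", "store-map"]):
    print(" ".join(cmd))
```

The merge commands are meant to run in the main checkout after both agents have committed their work; if git merge reports conflicts, resolve them there before removing the worktree.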

Cloud Execution Environment

Codex can work not only on your local machine, but also in a cloud environment.

Cloud execution is suitable when:

You are away from your computer and only have a phone.
You want agents to run long tasks in the background.
The code has already been synced to GitHub and Codex needs to modify the remote repository.
You want changes reviewed and merged through PRs.

A typical flow is: push local code to GitHub, let Codex pull the repository in a cloud environment, execute the task, generate changes, then present them as a PR or diff for review.

When continuing local development, remember to pull down the latest remote changes.

Memory System: Write a Good AGENTS.md

New chats do not have complete historical memory by default. Once a project becomes complex, repeatedly explaining the background is inefficient.

The most general solution is to maintain AGENTS.md in the project root. This file can record:

Project goals and main tech stack.
Common commands.
Directory structure.
Code style and naming conventions.
Prohibited actions, such as bulk deleting files.
Test, build, and deployment rules.

You can also ask Codex to read the project and generate a first version of AGENTS.md, then review it manually. For complex projects, this file is worth maintaining.
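
If you prefer to start from an empty outline instead, a small script can lay out the headings listed above. This is only a sketch: the heading wording and the TODO placeholders are my own choice, and AGENTS.md does not require this exact layout.

```python
SECTIONS = [
    "Project goals and tech stack",
    "Common commands",
    "Directory structure",
    "Code style and naming conventions",
    "Prohibited actions",
    "Test, build, and deployment rules",
]

def agents_md_skeleton(project_name: str) -> str:
    """Render an AGENTS.md outline with one heading per topic, ready to fill in."""
    lines = [f"# AGENTS.md for {project_name}", ""]
    for title in SECTIONS:
        lines += [f"## {title}", "", "TODO", ""]
    return "\n".join(lines)

# "demo-shop" is a placeholder project name.
print(agents_md_skeleton("demo-shop"))
```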

Global rules should be used carefully. They are suitable for universal safety constraints, such as “do not recursively delete directories” or “confirm before destructive operations”. Do not put project-specific details into global rules, or they will pollute other projects.

Plugins and Automations

Plugins connect Codex to external services such as GitHub, Gmail, Google Drive, databases, and deployment platforms.

Their value is reducing copy and paste. For example, Codex can:

Check star trends for a GitHub repository.
Summarize email content and send it to you.
Run a recurring check.
Write the result as a summary.

Automations are suitable for repeated tasks. For example, checking repository data every Friday afternoon and sending an email report. Simple automation tasks usually do not require the strongest model; a lighter model is enough.

Skills: Turn Workflows Into Reusable Capabilities

Skills are “professional playbooks” for Codex. They are not one-off prompts. They package a task flow, rules, scripts, and notes so Codex can reuse them reliably later.

Common sources include:

Official Skills.
Third-party Skills.
Skills you write yourself.

Good candidates for Skills include:

Turning subtitles into illustrated notes.
Writing weekly reports in a company format.
Batch-processing images or documents.
Fixed-format code reviews.
Project initialization for a specific framework.

If you have copied and pasted the same prompt many times, it is worth turning it into a Skill.

MCP: Connect External Tools and Databases

MCP can be understood as a standardized tool protocol for large models. Through MCP, Codex can call external services to complete more concrete tasks.

For example, after connecting Supabase, Codex can:

Create database tables.
Read database schemas.
Modify backend endpoints.
Submit frontend forms to the database.
Debug problems based on database state.

This is powerful, but permissions matter. Databases, production environments, deployment platforms, and email accounts are high-risk resources. When connecting for the first time, use a test project and a low-privilege account.

Deployment Plugins

Deployment platform plugins can let Codex complete builds and releases directly, such as deploying a frontend project to Netlify.

These plugins are suitable for small websites, prototypes, internal tools, and demo projects. In real use, pay attention to:

Run a local build before deployment.
Do not write environment variables directly into code.
Check whether the page opens normally after publishing.
Keep human review for production projects.

AI can help connect the deployment flow, but deployment permissions should still be managed carefully.

Computer Automation

With supported platforms and plugin environments, Codex can also operate browsers or desktop apps, completing tasks closer to RPA.

Examples:

Open a chat app and prepare a message.
Browse a project board and summarize task status.
Generate an English brief.
Send it to a specified recipient after you confirm.
Turn the flow into a scheduled automation.

These capabilities open up many possibilities, but they also need the strictest safety boundaries. Any operation that sends messages or email, submits forms, makes payments, or deletes data should keep human confirmation.

Usage Suggestions

The right way to use Codex App is not to let it fully take over everything at once, but to break tasks down and let it execute efficiently in a controlled environment.

Recommended habits:

Initialize Git for every project.
Use plan mode for complex tasks.
Use worktree for parallel tasks.
Put project rules in AGENTS.md.
Keep human confirmation for high-risk actions.
Turn repeated workflows into Skills or automations.
Validate plugins and MCP in a test environment first.

Summary

Codex App is not “one more AI chat window”. Its focus is turning AI coding into a manageable workspace where local projects, cloud tasks, Git, worktree, plugins, Skills, MCP, and automation can connect.

The key to using it well is balancing freedom and control. Small tasks can be handed to Codex boldly. Complex tasks should start with a plan. High-risk actions must be confirmed. Used this way, Codex can become not just a code-writing assistant, but a long-term engineering tool.

How to Use DeepSeek V4 Pro in Cline

Fri, 01 May 2026 20:59:06 +0800

Cline already supports the OpenAI Compatible Provider. DeepSeek API is also compatible with OpenAI SDK-style calls, so connecting deepseek-v4-pro to Cline is not complicated: choose OpenAI Compatible, then fill in DeepSeek’s Base URL, API Key, and model name.

The steps below cover both the VS Code extension UI and Cline CLI.

Prepare a DeepSeek API Key

First, create an API Key on the DeepSeek platform.

You need three values:

Item	Value
Provider	`OpenAI Compatible`
Base URL	`https://api.deepseek.com`
Model ID	`deepseek-v4-pro`

DeepSeek’s official documentation states that the V4 series uses the existing OpenAI-compatible interface. Keep base_url as https://api.deepseek.com, and set model to deepseek-v4-pro or deepseek-v4-flash when calling it.
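
To see what those values mean outside Cline, here is a minimal sketch that builds an OpenAI-style chat request with the Python standard library and does not send it. The /chat/completions path follows the usual OpenAI-compatible layout and is an assumption here, as are the function name build_chat_request and the placeholder key.

```python
import json
import urllib.request

BASE_URL = "https://api.deepseek.com"

def build_chat_request(api_key: str, prompt: str, model: str = "deepseek-v4-pro") -> urllib.request.Request:
    """Build an OpenAI-style chat completion request without sending it."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",  # assumed path, as in OpenAI-compatible APIs
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )

# "sk-placeholder" is not a real key; nothing is sent over the network here.
req = build_chat_request("sk-placeholder", "Say hello.")
print(req.full_url, json.loads(req.data)["model"])
```

With a real key, passing the request to urllib.request.urlopen would send it. Cline does the equivalent internally once the three fields are filled in.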

Configure It in the Cline Extension

If you use the Cline extension in VS Code, configure it this way:

Open Cline from the VS Code sidebar.
Go to Cline settings or model configuration.
Select OpenAI Compatible as the provider.
Enter your DeepSeek API Key.
Set Base URL to:

`https://api.deepseek.com`

Set Model ID to:

`deepseek-v4-pro`

Save the configuration and run a simple test in Cline.

Start with a low-risk read-only task:

`Please read the current project directory structure and summarize what type of project this is. Do not modify any files.`

If Cline can read and answer normally, the model connection is working.

Configure It in Cline CLI

If you use Cline CLI, run cline provider configure openai-compatible to enter interactive configuration.

Example:

`cline provider configure openai-compatible`

Fill in:

API Key: sk-...
Base URL: https://api.deepseek.com
Model ID: deepseek-v4-pro

After configuration, test it with a read-only task:

`cline "Summarize this repository structure without changing files."`

If you want to lower cost first, you can temporarily change Model ID to:

`deepseek-v4-flash`

Then switch back to deepseek-v4-pro for complex planning, fact checking, multi-tool collaboration, or high-risk code changes.

Recommended Model Split

DeepSeek V4 Pro and Flash are better used with a clear split.

Model	Best for
`deepseek-v4-flash`	Routine code reading, small batch fixes, script generation, context summarization, low-risk frontend changes
`deepseek-v4-pro`	Architecture planning, complex bugs, cross-file refactors, fact checking, multi-tool calls, high-risk changes

For Agent tools like Cline, cost mainly comes from long context, repeated file reads, plan generation, and multi-round tool calls. If the task is light, use Flash for volume; if the task needs stronger judgment, switch to Pro.
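
If you script your agent runs, the split in the table can be reduced to a small routing rule. The task labels below are hypothetical tags that you would assign yourself; only the two model IDs come from this article.

```python
# Hypothetical task labels, loosely following the "Best for" column above.
PRO_TASKS = {
    "architecture", "complex-bug", "cross-file-refactor",
    "fact-check", "multi-tool", "high-risk",
}

def pick_model(task_type: str) -> str:
    """Route heavy task types to Pro and everything else to Flash."""
    return "deepseek-v4-pro" if task_type in PRO_TASKS else "deepseek-v4-flash"

print(pick_model("complex-bug"))
print(pick_model("code-reading"))
```

Defaulting to Flash keeps unknown or routine work cheap; anything that needs stronger judgment has to be tagged explicitly.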

How to Set Context Length

DeepSeek V4 Pro and Flash both support long context. If Cline requires a manual context window value, set it based on the 1M context listed on DeepSeek’s official model page.

In practice, do not put every file into context at the beginning. Cline reads files according to the task, and a better workflow is usually:

first ask it to inspect the directory structure;
then ask it to locate relevant files;
finally let it modify only the target files.

This saves tokens and keeps the task boundary clearer.

Common Issues

1. Model Not Found

First check that Model ID is exactly:

`deepseek-v4-pro`

Do not write DeepSeek V4 Pro, deepseek-v4, or another display name.

2. 401 or Authentication Failed

Check the API Key:

whether it was copied completely;
whether it contains extra spaces;
whether it was entered into the provider configuration Cline is currently using;
whether the DeepSeek account has available balance.
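
The model ID and key checks from the first two issues can be run locally before any request is made. A rough sketch, assuming the only valid IDs are the two named in this article; find_config_problems is a made-up helper, and it cannot detect a wrong key or an empty balance.

```python
KNOWN_MODEL_IDS = {"deepseek-v4-pro", "deepseek-v4-flash"}  # IDs named in this article

def find_config_problems(api_key: str, model_id: str) -> list[str]:
    """Return readable problems; an empty list means nothing obvious is wrong."""
    problems = []
    if api_key != api_key.strip():
        problems.append("API key has leading or trailing whitespace")
    if not api_key.strip():
        problems.append("API key is empty")
    if model_id not in KNOWN_MODEL_IDS:
        problems.append(f"unexpected model ID {model_id!r}")
    return problems

# A display name instead of the exact ID, plus stray spaces around the key.
for problem in find_config_problems(" sk-example ", "DeepSeek V4 Pro"):
    print(problem)
```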

3. Connection Failed

Check the Base URL:

`https://api.deepseek.com`

Do not append /v1/chat/completions at the end. Cline’s OpenAI Compatible Provider will construct compatible interface requests itself.
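
If the value was pasted from an API example, the endpoint suffix is easy to miss. The sketch below trims it back to the bare base URL; normalize_base_url is a hypothetical helper, not part of Cline.

```python
def normalize_base_url(url: str) -> str:
    """Reduce a pasted endpoint to the bare base URL a provider field expects."""
    cleaned = url.strip().rstrip("/")
    # Longest suffix first, so "/v1/chat/completions" is not left as "/v1".
    for suffix in ("/v1/chat/completions", "/chat/completions"):
        if cleaned.endswith(suffix):
            cleaned = cleaned[: -len(suffix)]
            break
    return cleaned

print(normalize_base_url("https://api.deepseek.com/v1/chat/completions"))
```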

4. Cline Calls Are Too Expensive

You can switch routine tasks to deepseek-v4-flash and use deepseek-v4-pro only for complex tasks.

Also, make the task description as clear as possible:

`Only modify files related to the login page. Do not refactor unrelated modules. First provide a plan, and modify code only after confirmation.`

Agent tasks are most expensive when boundaries are unclear. The clearer the boundary, the fewer files it reads, the fewer tool calls it makes, and the more controllable the cost becomes.

5. Error: reasoning_content must be passed back

If you see an error like this:

{
  "message": "400 The `reasoning_content` in the thinking mode must be passed back to the API.",
  "code": "invalid_request_error",
  "modelId": "deepseek-v4-pro"
}

This is usually not a Key, quota, or Base URL problem. It means DeepSeek V4 Pro’s thinking mode and the current client’s multi-round tool-call history are not aligned.

DeepSeek’s official documentation states:

thinking mode is enabled by default;
thinking mode returns reasoning_content;
if a tool call happens in one round, subsequent requests must pass back the reasoning_content from that assistant message;
if the client does not pass it back correctly, the API returns 400.

When Cline connects through the OpenAI Compatible Provider, this error may appear in the second round or after tool calls if the current version does not fully preserve and return DeepSeek’s reasoning_content.
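
To make the rule concrete, here is a sketch of what a client has to do when it stores an assistant reply in the conversation history. The message layout follows the OpenAI chat format; the function name and the read_file tool are hypothetical, and this is not Cline's actual code.

```python
def assistant_turn_for_history(message: dict) -> dict:
    """Copy the fields of an assistant reply that the next request should carry."""
    turn = {"role": "assistant", "content": message.get("content")}
    if message.get("tool_calls"):
        turn["tool_calls"] = message["tool_calls"]
        # A turn that made tool calls must carry its reasoning_content back.
        if "reasoning_content" in message:
            turn["reasoning_content"] = message["reasoning_content"]
    return turn

# Hand-written reply for illustration, not real API output.
reply = {
    "role": "assistant",
    "content": None,
    "reasoning_content": "I should read the file first.",
    "tool_calls": [{"id": "call_1", "type": "function",
                    "function": {"name": "read_file", "arguments": '{"path": "README.md"}'}}],
}
history = [{"role": "user", "content": "Summarize README.md"}]
history.append(assistant_turn_for_history(reply))
history.append({"role": "tool", "tool_call_id": "call_1", "content": "...file text..."})
print(sorted(history[1].keys()))
```

A client that rebuilds the assistant turn from content and tool_calls alone drops reasoning_content, which matches the situation where the second request fails with the 400 error above.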

Try this order:

Upgrade Cline to the latest version;
confirm you are using OpenAI Compatible, not the normal OpenAI provider;
if Cline supports a custom request body, try disabling thinking mode:

{
  "thinking": {
    "type": "disabled"
  }
}

if Cline does not support extra body parameters, temporarily use another model or a compatible proxy service;
switch back to deepseek-v4-pro after Cline supports passing back DeepSeek V4 reasoning_content.

Note that disabling thinking mode may reduce complex reasoning ability, but it can work around client compatibility issues where reasoning_content is not passed back.
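
In a script that calls the API directly, the same workaround is one extra field in the request body. A minimal sketch, assuming the thinking field shown above is accepted at the top level of the payload; with_thinking_disabled is a made-up helper name.

```python
import copy

def with_thinking_disabled(payload: dict) -> dict:
    """Return a copy of a request payload with thinking mode switched off."""
    updated = copy.deepcopy(payload)  # leave the caller's payload untouched
    updated["thinking"] = {"type": "disabled"}
    return updated

base = {"model": "deepseek-v4-pro", "messages": [{"role": "user", "content": "hi"}]}
print(with_thinking_disabled(base))
```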

Copyable Configuration

Provider: OpenAI Compatible
API Key: sk-your DeepSeek API Key
Base URL: https://api.deepseek.com
Model ID: deepseek-v4-pro

For low-cost mode:

Provider: OpenAI Compatible
API Key: sk-your DeepSeek API Key
Base URL: https://api.deepseek.com
Model ID: deepseek-v4-flash

Summary

There are only three key steps to calling DeepSeek V4 Pro in Cline:

choose OpenAI Compatible as the provider;
set Base URL to https://api.deepseek.com;
set Model ID to deepseek-v4-pro.

After configuration, test with a read-only task before giving it real code changes. If you often run Agent tasks, split Flash and Pro: Flash handles high-frequency lightweight work, while Pro handles complex judgment and fallback tasks.

mattpocock/skills: A Practical Skill Collection for AI Coding Agents

Fri, 01 May 2026 03:43:20 +0800

mattpocock/skills is a public collection of AI coding agent skills from Matt Pocock.

It is not a full application, nor a new chat client. It is a set of working skills that can be used by AI coding assistants. The idea is practical: break common AI coding problems into small skills that an Agent can call in the right task, instead of relying on one huge prompt every time.

If you often use Claude Code, Codex, Cursor, or similar AI coding tools, this kind of skills collection is worth watching. What really affects the AI coding experience is often not whether the model can write code, but whether it can move through the task in your preferred working style.

What Problem It Solves

AI coding assistants are powerful, but they can easily go wrong.

Common situations include:

Starting code changes before understanding the requirement
Modifying too many files at once
Producing lots of explanation but little useful action
Blindly trying things after errors
Not running tests or checks in time
Ignoring existing project patterns
Introducing unnecessary abstractions to finish a task
Writing code without truly reviewing risks afterward

These problems are not always caused by weak model capability. Often, the workflow is not constrained well enough.

The value of mattpocock/skills is that it turns these common failure modes into reusable operating methods, making the Agent behave more like an experienced engineering collaborator in different scenarios.

What Are Skills

In the AI Agent context, a skill can be understood as a reusable task instruction, working method, or professional workflow.

It does not have to be a code plugin, and it does not always need to call an external service. In many cases, a skill is simply a clear set of rules:

When to use it
What to do first
What not to do
What output is required
How to judge task completion

This is somewhat like a normal prompt template, but the granularity is closer to a task capability.

Normal prompt templates are usually copied and pasted manually by the user. Skills are better as part of an agent toolbox, allowing the Agent to choose the right workflow for the task.
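
One way to picture such a skill is as a small record with one field per rule in the list above. This is an illustrative structure only; it is not the file format used by mattpocock/skills or by any particular agent, and all names are invented.

```python
from dataclasses import dataclass

@dataclass
class Skill:
    name: str
    when_to_use: str
    first_steps: list[str]
    do_not: list[str]
    required_output: str
    done_when: str

    def render(self) -> str:
        """Render the skill as plain text that could be handed to an agent."""
        lines = [f"Skill: {self.name}", f"When to use: {self.when_to_use}", "First:"]
        lines += [f"  - {step}" for step in self.first_steps]
        lines.append("Do not:")
        lines += [f"  - {item}" for item in self.do_not]
        lines += [f"Output: {self.required_output}", f"Done when: {self.done_when}"]
        return "\n".join(lines)

review = Skill(
    name="code-review",
    when_to_use="After a change is implemented and before it is merged",
    first_steps=["Read the diff", "Read the files the diff touches"],
    do_not=["Rewrite unrelated code"],
    required_output="A list of issues, most serious first",
    done_when="Every changed file has been checked",
)
print(review.render())
```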

Why Small and Composable Matters

The README emphasizes that these skills are small and composable.

This direction matters.

If one skill tries to handle everything, it quickly becomes a new giant prompt: long, vague, and hard to maintain. The advantage of small skills is clear boundaries.

For example, one skill can focus on:

Planning first
Fixing TypeScript errors
Running tests and fixing based on results
Doing code review
Summarizing project conventions
Improving prompts
Removing unnecessary abstractions

These skills can be combined according to the task. A simple task may need only one skill, while a complex task can chain several together.

This is closer to real engineering work. You do not use the same workflow for every problem; you choose tools according to the situation.

Keeping the Engineer in Control

One important direction of this repository is keeping the engineer in control.

AI coding can easily slide into two extremes.

The first is fully manual. AI only helps write a few lines of code, while all context, planning, and verification still depend on you.

The second is fully hands-off. You throw a task to an Agent, let it change a lot of things, and then face a diff that is hard to review.

Skills help find a more stable middle position.

They let AI take on more repetitive workflow, while still constraining it with rules:

Understand the task before acting
Read relevant files before editing
Keep the modification scope controlled
Report uncertainty
Verify after changes
Do not refactor unrelated code just to show off

This does not weaken AI. It makes AI actions easier for humans to review and take over.

Alignment Problems

The first kind of AI coding failure is often alignment failure.

The user wants a very specific change, but the Agent may understand it as a larger refactor. The user only wants a bug fixed, but it changes styles along the way. The user wants existing architecture to be followed, but it introduces a new pattern.

Skills can help the Agent do several things at the start of a task:

Restate the goal
Identify the impact scope
Recognize existing implementation patterns
Provide a plan
Clarify what will not be done

This step is like an engineer’s self-check before starting work.

If the Agent cannot clearly state the task boundary and starts writing code directly, it is easy for the task to drift.

Feedback Loop Problems

AI should not write code through one-shot generation alone.

In real development, feedback loops matter:

Change a small piece
Run tests or type checks
Read the errors
Fix them
Verify again

Many Agents fail because they skip the middle feedback. They change many things at once and then summarize from intuition that “it should work.”

Skills can make the feedback loop explicit. For example, they can require the Agent to:

Run relevant checks after modification
Read error messages first if checks fail
Avoid blindly changing unrelated files
Re-verify after each round of fixes
Report final verification results

This makes AI coding more like real debugging and less like one-shot writing.
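
The loop can be written down in a few lines. The sketch below takes the check and fix steps as plain callables, so it says nothing about how a real Agent runs tests or edits code; fix_until_green and the fake error strings are invented for illustration.

```python
from typing import Callable

def fix_until_green(run_checks: Callable[[], list[str]],
                    apply_fix: Callable[[list[str]], None],
                    max_rounds: int = 3) -> tuple[bool, int]:
    """Run checks, fix reported errors, and re-verify, with at most max_rounds fixes.

    Returns (passed, fix_rounds_used).
    """
    for rounds_used in range(max_rounds + 1):
        errors = run_checks()
        if not errors:
            return True, rounds_used
        if rounds_used < max_rounds:
            apply_fix(errors)  # the fix is driven by the actual error messages
    return False, max_rounds

# Fake project state: each "fix" resolves the first pending error.
pending = ["TS2322: type mismatch", "test_login failed"]
print(fix_until_green(lambda: list(pending), lambda errs: pending.pop(0)))
```

The important part is that the loop always ends on a check, never on a fix, so the reported result reflects a real verification run.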

Architecture Control Problems

AI is good at generating abstractions, and also good at over-generating abstractions.

To complete a small requirement, it may create a service layer, helper functions, configuration objects, type wrappers, and adapters, making the code much more complex than the requirement itself.

This is especially dangerous in large projects. AI-generated abstractions often look “professional,” but they may not match existing project style and may increase maintenance cost.

Good skills remind the Agent to:

Prefer existing patterns
Avoid unnecessary new abstractions
Avoid refactoring unrelated areas
Match the change to the size of the task
Understand the code before designing structure

This reduces output that looks engineered but is actually harder to maintain.

Why Review Skills Matter

Writing code and reviewing code are different states.

When an Agent writes code, it usually tends to prove that its implementation works. It may explain why the change should work, but it does not always actively look for risks.

The purpose of a review skill is to switch the Agent’s role:

Find potential bugs
Find behavior regressions
Find missing tests
Find edge cases
Find increased complexity
Find inconsistencies with existing conventions

This matters for AI coding because AI generates code quickly. Without review, users can easily be overwhelmed by large diffs.

A good review output should list issues first, not praise the implementation first. It should help the engineer decide whether the change can be merged.

Difference from Normal Rules Files

Many AI coding tools support rules, instructions, or memory.

These files usually record long-term rules, such as:

Project tech stack
Naming conventions
Test commands
Directories not to modify
Answer style preferences

Skills are more focused on task workflow.

Rules tell the Agent “how to behave in the long term,” while skills tell the Agent “how to execute this kind of task.”

The two work best together.

For example, rules can say the project uses pnpm test, while a review skill requires checking test coverage after changes. Then the Agent knows not only the command, but also when to use it.

Suitable Scenarios

Repositories like mattpocock/skills are suitable for:

Frequent use of AI coding tools
Agents working on real codebases
Reducing out-of-scope AI edits
Making the Agent verify results more actively
Turning your engineering habits into skills
Learning how others design agent workflows
Turning temporary prompts into a maintainable skill collection

If you only occasionally ask AI to write a small function, you may not need to maintain skills.

But if you already treat AI as a long-term development partner, skills become increasingly important. They are like a reusable working method for the Agent.

How to Learn from This Repository

Even if you do not use every skill directly, you can learn several things from this repository.

First, write down failure modes.

Do not only complain when AI makes a mistake. Turn the patterns it often gets wrong into rules, so a skill can prevent them next time.

Second, keep skills short.

One skill should solve one clear problem. The shorter it is, the easier it is to call correctly and maintain.

Third, make output format clear.

If you want the Agent to list a plan first, execute next, and summarize verification results at the end, write that structure clearly. Vague requirements usually produce vague results.

Fourth, keep human handoff points.

A good skill should not let AI run too far alone. When there is uncertainty, expanded impact scope, failing tests, or a product decision, it should stop and explain the situation.

Notes for Use

First, do not turn everything into a skill.

Too many skills make the system complex, and the Agent may not know which one to choose. Start with the highest-frequency and most painful scenarios.

Second, skills need iteration.

The first version of a skill may not be good. Watch how AI actually executes it, then gradually delete, add, and rewrite.

Third, do not let skills replace engineering judgment.

Skills can improve workflow, but they cannot guarantee correct implementation. Tests, review, build checks, and human judgment still matter.

Fourth, pay attention to differences between Agents.

Claude Code, Codex, Cursor, and Copilot support instructions, skills, and rules differently. The same idea can be reused, but the specific format should be adjusted for each tool.

Reference

mattpocock/skills

Final Thought

What makes mattpocock/skills worth watching is not one magic prompt inside it, but the practical AI coding idea it demonstrates: break engineering experience into small skills, then let the Agent combine them by scenario.

As AI coding moves from occasional assistance into daily workflow, skills become important tools for constraining Agents, keeping engineers in control, and improving feedback quality.

Compound Engineering Plugin: Turning AI Coding into a Plan, Execute, Review Engineering Loop

Fri, 01 May 2026 03:15:39 +0800

Compound Engineering Plugin is an open-source AI coding workflow plugin from Every Inc.

It is not focused on “making AI write a piece of code faster.” Instead, it places AI coding inside a loop that looks more like an engineering team: plan first, implement next, review afterward, then preserve what was learned. For people who frequently use tools such as Claude Code, Codex, Cursor, and Copilot, this kind of plugin solves a workflow problem, not just a prompt problem.

AI coding tools are becoming stronger, but in real projects the hardest part is often not generating code. It is making the AI continuously follow project rules, understand task boundaries, avoid repeating mistakes, and accumulate context across multiple iterations.

What Problem It Solves

Many people use AI coding assistants in a flow like this:

Describe the requirement directly
Ask AI to modify the code
Check whether the result runs
Add more explanation after errors appear
Explain the background again in the next task

This can work for small tasks, but it easily breaks down in complex projects:

Requirements are not clarified before AI starts editing
There is no systematic review after code changes
Project conventions depend on repeated user reminders
Similar mistakes happen again next time
Multiple Agent tools lack a shared working method
Experience is not turned into reusable rules

Compound Engineering Plugin is designed for this class of problems. It splits AI coding into multiple stages, so an Agent is not only executing commands but participating in a more complete engineering process.

What Is Compound Engineering

From the project README, Compound Engineering can be understood as a method for AI-assisted software development.

It emphasizes a loop:

Plan: understand the goal, split the task, confirm the path
Execute: modify code according to the plan, run commands, handle problems
Review: check implementation quality, risks, and test coverage
Learn: preserve experience as reusable rules for future work

This loop resembles how real engineering teams work.

A reliable engineer does not receive a requirement and immediately make random changes, nor finish edits and hand them off without checking. A reliable engineer first judges the impact scope, then implements, then checks risks and test results, and finally records the pitfalls encountered along the way. AI Agents need similar constraints.
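The plan, execute, review, learn loop described above can be pictured as a small state machine. The sketch below is only an illustration of the idea, not the plugin's implementation; the `Stage` enum and `next_stage` function are hypothetical names introduced here.

```python
from enum import Enum


class Stage(Enum):
    PLAN = "plan"
    EXECUTE = "execute"
    REVIEW = "review"
    LEARN = "learn"


def next_stage(current: Stage, review_passed: bool = True) -> Stage:
    """Return the stage that follows `current` in the loop."""
    if current is Stage.PLAN:
        return Stage.EXECUTE
    if current is Stage.EXECUTE:
        return Stage.REVIEW
    if current is Stage.REVIEW:
        # A failed review sends the work back to execution for fixes.
        return Stage.LEARN if review_passed else Stage.EXECUTE
    # After learning, the next task starts again with a plan.
    return Stage.PLAN
```

The only branch is at review: problems found there lead back to execution rather than forward to learning, which is what keeps the loop from ending at "the code runs."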

Why a Plugin Is Needed

A prompt can tell AI, “Please plan before executing,” but prompts themselves are not always stable.

Once a conversation becomes long and context becomes complex, the model may skip planning, ignore rules, or become overconfident in order to finish the task. The value of a plugin is that it fixes the workflow so different Agent environments can follow similar methods.

This kind of plugin usually breaks a workflow into commands, rules, templates, or subflows. The user does not need to manually write the full prompt every time. Instead, a fixed entry point triggers a specific stage.

For example:

Ask the Agent to generate a plan first
Implement step by step according to the plan
Trigger review after edits
Return to fixing after problems are found
Write useful experience into memory or rules

This makes AI coding feel more like controlled collaboration instead of one-off chat.

Supported Agent Environments

The README mentions support for multiple AI coding environments, including:

Claude Code
Codex
Cursor
GitHub Copilot
Amp
Factory
Qwen Code

This is worth noting.

Many workflow tools are tied to one client. Once you switch tools, the rules cannot be reused. Compound Engineering Plugin is more like a cross-Agent engineering method, bringing similar planning, execution, and review workflows to different tools.

If you use multiple AI coding assistants at the same time, this unified workflow becomes more valuable. Different tools have different capabilities, but project conventions, review habits, and task decomposition methods should remain as consistent as possible.

Why the Planning Stage Matters

The value of the planning stage is to stop AI from acting too early.

In complex tasks, the truly important questions are usually:

Which files need to change?
Which modules may be affected?
What existing pattern should be followed?
Are there tests?
Where are the risks?
Should documents be read first?
Can the task be split into smaller steps?

If an Agent starts writing code before thinking through these questions, it can easily produce an implementation that looks finished but deviates from the project structure.

A plan does not need to be long. A good plan should be short, specific, and executable. Its purpose is not to create documentation, but to give the following implementation clear boundaries.

What to Avoid in Execution

When AI executes coding tasks, several problems appear easily:

Refactoring unrelated code
Overwriting existing user changes
Only handling the happy path
Ignoring error handling
Not following the existing project style
Not running necessary verification
Blindly trying things after errors

A workflow plugin cannot guarantee these problems will disappear, but it can reduce their probability through rules and staged constraints.

For example, the execution stage can require the Agent to proceed according to the plan. When it discovers something outside the plan, it should explain the risk first. When modifying shared modules, it should add tests or at least run related verification.
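One way to make "proceed according to the plan" checkable is to compare the files an Agent actually changed against the paths the plan named. The sketch below assumes the plan is available as a list of files or directories; `out_of_plan` is a hypothetical helper, not something the plugin is known to provide.

```python
from pathlib import PurePosixPath


def out_of_plan(changed_files: list[str], planned_paths: list[str]) -> list[str]:
    """Return changed files that are not covered by any planned path.

    A planned path covers a file when it is the file itself or one of
    its parent directories.
    """
    planned = [PurePosixPath(p) for p in planned_paths]
    unexpected = []
    for name in changed_files:
        path = PurePosixPath(name)
        covered = any(path == p or p in path.parents for p in planned)
        if not covered:
            unexpected.append(name)
    return unexpected
```

For example, `out_of_plan(["src/api/user.py", "README.md"], ["src/api"])` reports only `README.md`, which is the kind of change the Agent should explain before continuing.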

This is especially important in large codebases. The faster AI writes code, the more process is needed to constrain its momentum.

Why Review Matters

Many AI coding failures are not caused by code that cannot run at all. They come from detail problems:

Edge cases are not handled
State updates are inconsistent
API contracts are changed quietly
Tests do not cover key paths
Error messages are unclear
Performance or security risks are not mentioned

The review stage switches the Agent from “author mode” to “reviewer mode.”

Author mode tends to justify its own implementation. Reviewer mode should actively look for holes, regression risks, and missing tests. Separating these two stages is more reliable than asking the same response to both implement and self-review.

For users, review output is also more valuable. It helps you quickly judge whether the change is ready to merge or still needs rework.

The Meaning of Learning and Memory

The word “Compound” in the project name suggests an important idea: engineering experience should compound.

If AI fixes a mistake only for the current task and then repeats the same mistake next time, the productivity gain is limited. A better approach is to preserve useful experience:

Directory conventions in this project
Debugging methods for a class of errors
Test commands and notes
Generated files that should not be touched
Code style preferences
Common implementation patterns

These experiences can become rules, memories, documents, or templates. In later tasks, the Agent reads these accumulated notes before starting work.
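As a minimal sketch of how experience might be preserved as rules, the helper below adds a lesson to a rule list only if an equivalent rule is not already present. The in-memory list and the `add_rule` name are assumptions for illustration; the actual plugin may store rules in files with a different format.

```python
def add_rule(rules: list[str], lesson: str) -> list[str]:
    """Return `rules` with `lesson` appended unless an equivalent rule exists."""
    # Collapse whitespace so trivially different phrasings compare equal.
    normalized = " ".join(lesson.split())
    if not normalized:
        return list(rules)
    existing = {" ".join(r.split()).lower() for r in rules}
    if normalized.lower() in existing:
        return list(rules)
    return [*rules, normalized]
```

Deduplicating at write time is a small choice, but it keeps the accumulated notes short enough that an Agent can read them before each task.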

This is the key to moving AI coding from “one-off Q&A” toward “long-term collaboration.”

Suitable Scenarios

Compound Engineering Plugin is suitable for:

Long-term use of AI Agents for coding
Projects that receive many rounds of modifications
Teams that want AI to plan before implementing
Users who want review thinking after changes
Teams that want a unified AI coding workflow
People who use Claude Code, Codex, Cursor, and other tools at the same time
Teams that want to turn project experience into reusable rules

If you only occasionally ask AI to write a small script, the full workflow may feel heavy.

But if you treat AI coding assistants as daily development partners, the plan, execute, review, learn loop becomes clearly useful.

Difference from Normal Prompt Templates

Normal prompt templates usually solve “how to state the task clearly.”

For example:

Please think step by step
Please read the files first
Please keep code style consistent
Please run tests
Please summarize the changes

These prompts are useful, but they still rely on the user using them correctly every time.

Compound Engineering Plugin operates more at the workflow layer. It organizes these requirements into a repeatable process and adapts them to different Agent tools. You are not writing prompts from scratch every time; you are moving tasks through a workflow.

Simply put, a prompt template is like a reminder, while a workflow plugin is like a system.

Notes for Use

First, do not let the process become a burden.

Small tasks do not always need a full plan and long review. A good workflow should adapt to task complexity: handle simple problems quickly and use the full loop for complex ones.

Second, review cannot replace tests.

Agent review can find many problems, but it can still miss real runtime errors. Final judgment still depends on tests, type checks, build results, and human review.

Third, rules need continuous cleanup.

Preserving experience is important, but rules can become noise as they accumulate. Outdated rules, duplicate rules, and temporary experience that only applied to one task should be cleaned up regularly.

Fourth, cross-tool consistency does not mean everything is identical.

Claude Code, Codex, Cursor, Copilot, and other tools have different capabilities and interaction models. What should be unified is the working method, not necessarily every command or configuration detail.

Suitable Teams

If a team already allows AI Agents to modify real code, it is not enough to discuss only “which model is stronger.”

The more important questions are:

Does AI understand the task before editing?
Does AI follow project boundaries during editing?
Does AI actively review risks after editing?
Can AI learn from historical mistakes?
Does the team have unified Agent usage conventions?

This is where projects such as Compound Engineering Plugin matter. They move AI coding one step away from personal tricks and toward reusable team workflow.

Reference

EveryInc/compound-engineering-plugin

Final Thought

What makes Compound Engineering Plugin worth watching is not that it adds another AI coding command, but that it organizes AI coding into an engineering workflow that can improve over time.

When AI Agents start participating in real projects, planning, execution, review, and experience preservation become more important than one-off code generation.

Claude Code Hooks Mastery: An Introduction to 13 Hook Lifecycle Events and Automation Control

Fri, 01 May 2026 03:11:27 +0800

claude-code-hooks-mastery is a learning project focused on Claude Code Hooks.

It is not just a collection of scattered scripts. It explains the Claude Code hook lifecycle, configuration methods, script patterns, and common automation scenarios in one place. For people who want Claude Code to be more controllable and more like an engineering assistant, this kind of material is worth reading.

Claude Code can already read code, edit files, and run commands by default. But if you want it to automatically check permissions, block risky operations, inject project rules, run tests, or remind it of team conventions at specific moments, chat instructions alone are not stable enough. The value of hooks is that they turn “rules I need to remind the AI about every time” into executable workflow.

What Problems Hooks Solve

After using Claude Code for a while, common pain points include:

Every new session needs the same project rules repeated
You worry that it may run commands it should not run
You want checks before and after file edits
You want formatting, tests, or security scans before committing
You want team conventions as fixed workflow instead of verbal reminders
You want context before and after tool calls for logging or blocking
You want complex tasks to trigger subagents or dedicated scripts

Hooks are designed for these “automatic actions at fixed moments.”

You can think of them as event hooks in the Claude Code workflow. When a session starts, a user submits a prompt, the model is about to call a tool, a tool call finishes, or an agent is about to stop, Claude Code can run the scripts you configured.

The 13 Hook Lifecycle Events

One of the main points in the project README is that it systematically covers the 13 Claude Code hook events.

These events span multiple stages, from session startup to tool calls, and from user input to agent termination. By purpose, they can be roughly grouped as:

Session startup: initialize environment and inject project context
User input: inspect prompts, add rules, and perform auditing
Before tool calls: permission checks, command blocking, and security validation
After tool calls: log results, trigger formatting, and run verification
Task ending: summarize, clean up, notify, or save state

This lifecycle design means you do not need to put every rule into one very long prompt.

For example, permission control should happen before tool calls. Formatting checks are better after file edits. Project rule injection is better at session startup or after user input. Putting rules at the right hook point is usually more reliable than stuffing everything into a system prompt.

Where Configuration Lives

Claude Code hooks are usually configured through settings files.

Common locations include:

User-level configuration: ~/.claude/settings.json
Project-level configuration: .claude/settings.json

User-level configuration is good for personal preferences, such as general security rules, command blocking, and log paths.

Project-level configuration is better for repository-specific rules, such as which tests must run, which directories cannot be edited, how generated files are handled, and which checks are required before commit.

If you use Claude Code in a team, it is better to put project-level configuration into the repository. That way everyone opens the project with the same AI collaboration constraints instead of relying on personal memory.
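To make the shape of a project-level configuration more concrete, the sketch below builds a hooks section as a Python dict and prints it as JSON. The layout (event name, `matcher`, and a list of `command` hooks) reflects the settings format as commonly documented for Claude Code, but treat it as an assumption and check the current official reference; the two script paths are hypothetical.

```python
import json


def build_settings(blocker_script: str, formatter_script: str) -> dict:
    """Build a hooks configuration as a plain dict (schema assumed, see above)."""
    return {
        "hooks": {
            # Runs before shell commands, so it can block risky ones.
            "PreToolUse": [
                {
                    "matcher": "Bash",
                    "hooks": [{"type": "command", "command": f"uv run {blocker_script}"}],
                }
            ],
            # Runs after file edits, so it can trigger lightweight checks.
            "PostToolUse": [
                {
                    "matcher": "Edit|Write",
                    "hooks": [{"type": "command", "command": f"uv run {formatter_script}"}],
                }
            ],
        }
    }


settings = build_settings(
    ".claude/hooks/block_commands.py",    # hypothetical script path
    ".claude/hooks/format_after_edit.py", # hypothetical script path
)
print(json.dumps(settings, indent=2))
```

Using repository-relative script paths is what makes such a file safe to commit: nothing in it depends on one person's machine.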

Why Single-File Scripts Matter

The project emphasizes UV single-file scripts.

The benefit is simple deployment. A single Python file can declare dependencies and run without maintaining a complex environment for one hook. This fits hooks well because many hooks only do one small thing:

Check whether a command is allowed
Determine whether a file path is safe
Read project rules and return them to Claude
Scan output for sensitive information
Run formatting or tests after edits
Write events to logs

The smaller a hook script is, the easier it is to maintain, and the less likely it is to become a new complicated system.
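A single-file script of this kind might look like the sketch below, which writes each hook event to a log. The comment header is the inline script metadata that `uv run` reads to resolve dependencies (none are needed here). The event field names `hook_event_name` and `tool_name` are assumptions about the JSON that Claude Code passes to hooks, and `log_event` is a hypothetical helper.

```python
# /// script
# requires-python = ">=3.11"
# dependencies = []
# ///
"""Hypothetical logging hook: append one JSON line per hook event."""
import json
from datetime import datetime, timezone
from pathlib import Path


def log_event(event: dict, log_path: Path) -> None:
    """Append a compact record of `event` to a JSON Lines file."""
    record = {
        "time": datetime.now(timezone.utc).isoformat(),
        "event": event.get("hook_event_name"),  # assumed field name
        "tool": event.get("tool_name"),         # assumed field name
    }
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
```

In a real hook file, the script would end by reading the event with `json.load(sys.stdin)` and passing it to `log_event`; that part is left out here so the function can be tried on its own.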

What Automation Can Hooks Do

claude-code-hooks-mastery shows many directions. In real work, the most common ones are below.

1. Permission and Security Control

This is the most direct use of hooks.

Before Claude Code executes a command, a hook can inspect the command content. If it contains high-risk actions such as deletion, reset, cleanup, or overwrite, it can block execution or require manual confirmation.

Similar rules can apply to file paths:

Do not modify production configuration
Do not write to secret files
Do not delete migration scripts
Do not touch specific directories
Do not run unapproved network commands

Putting this protection before tool calls is more reliable than writing “do not perform dangerous operations” in a prompt.
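The decision logic of such a pre-tool-call check can be very small. The sketch below assumes the hook receives an event dict with `tool_name` and `tool_input.command` fields and that exit code 2 means "block this call"; both are assumptions to verify against the current hooks documentation. The deny patterns and protected paths are illustrative only.

```python
import re
from typing import Optional

# Illustrative patterns only; a real deny-list should be tuned per project.
DANGEROUS_PATTERNS = [
    r"\brm\s+-\w*r",              # recursive rm, e.g. rm -r, rm -rf, rm -fr
    r"\bgit\s+reset\s+--hard\b",  # discards local changes
    r"\bgit\s+clean\s+-\w*f",     # deletes untracked files
]
PROTECTED_PATH_PARTS = (".env", "secrets/", "migrations/")


def check_command(command: str) -> Optional[str]:
    """Return a readable block reason, or None if the command looks fine."""
    for pattern in DANGEROUS_PATTERNS:
        if re.search(pattern, command):
            return f"Blocked: command matches deny pattern {pattern!r}"
    for part in PROTECTED_PATH_PARTS:
        if part in command:
            return f"Blocked: command touches protected path {part!r}"
    return None


def decide(event: dict) -> tuple[int, str]:
    """Map a pre-tool-call event to (exit_code, message)."""
    if event.get("tool_name") != "Bash":
        return 0, ""
    reason = check_command(event.get("tool_input", {}).get("command", ""))
    # Exit code 2 is assumed to signal "block this tool call".
    return (2, reason) if reason else (0, "")
```

A real hook would print the message to stderr and exit with the returned code. Keeping the reason short and specific matters, because both the user and Claude Code need to understand why the call was rejected.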

2. Context Injection

Many projects have fixed background information:

Tech stack
Coding conventions
Test commands
Branching strategy
Directory structure
Prohibited actions
Rules for generated files

Telling Claude Code this manually every time is annoying and easy to forget. Hooks can automatically inject necessary context at session startup or after the user submits a prompt.

This is like giving Claude Code a project-level work manual. It does not replace the README or development documentation, but it helps AI enter the correct state before executing a task.

3. Verification After Edits

After Claude Code modifies files, hooks can automatically trigger checks.

Common actions include:

Run formatting
Run lint
Run unit tests
Check type errors
Scan generated files
Validate Markdown or JSON format

This helps reduce low-level mistakes. When AI edits multiple files, a lightweight verification pass after modification can reveal problems earlier.

However, hooks should not run heavy tasks by default. Running the full test suite after every file change can make the experience slow. A better approach is to choose checks based on file type, directory, and task risk.
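Choosing checks by file type and directory can be expressed as a simple lookup. In the sketch below, the suffix-to-command table and the skipped directories are hypothetical examples; a project would substitute its own tools.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from file suffix to lightweight check commands.
CHECKS_BY_SUFFIX = {
    ".py": ["ruff check {path}"],
    ".json": ["python -m json.tool {path}"],
}
# Directories whose contents should not trigger checks (illustrative).
SKIP_DIRS = ("node_modules", "dist", "generated")


def checks_for(path: str) -> list[str]:
    """Return the check commands to run after `path` was edited."""
    p = PurePosixPath(path)
    if any(part in SKIP_DIRS for part in p.parts):
        return []
    return [cmd.format(path=path) for cmd in CHECKS_BY_SUFFIX.get(p.suffix, [])]
```

Files with no entry in the table simply get no checks, which keeps the hook fast by default and leaves heavy verification to explicit commands or CI.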

4. Team Rule Validation

If a team already has clear conventions, some of them can be placed in hooks.

For example:

Commit message format
Code style rules
Do not directly edit certain generated files
Documentation must be updated together
API changes must update tests
Certain directories can only be generated by specific tools

This makes Claude Code more like part of the team workflow rather than an unconstrained external assistant.

Of course, hooks should not replace CI. They are better for local reminders and early blocking. Final validation should still belong to CI, review, and test systems.

5. Subagents and Dedicated Tasks

The README also mentions subagent-related content.

This type of usage is suitable for sending complex tasks into more specialized workflows. For example, the main conversation can understand the requirement, while a hook or configuration triggers dedicated checking, auditing, summarizing, or documentation tasks.

For individual users, the first useful step is not complex agent orchestration. It is better to hand repetitive, clear, low-risk actions to hooks first. More complex automation can come after the rules become stable.

Statusline and Output Styles

The project also covers statusline and output styles.

This may look like a small experience detail, but it matters for long-term Claude Code usage. A statusline can show current context, task state, environment information, or hints. Output styles can make Claude Code answers fit your working habits better.

If you collaborate with AI in the same terminal every day, these details affect efficiency. Good status hints reduce mistakes and help you quickly determine whether the current session is in the right project, branch, and environment.

Do Not Make Hooks Too Heavy

Hooks are powerful, but they are not the place to put everything.

Good rules are:

High-frequency actions should be fast
Security blocking should be clear
Output should be short
Failure reasons should be readable
Scripts should have a single responsibility
Heavy checks should be explicit commands or CI tasks

If a hook takes more than ten seconds every time, users will soon want to disable it. If a hook has vague blocking rules, both Claude Code and the user will struggle to understand what to do next.

Hooks are best for tasks with clear boundaries: allow or reject, add context, log events, run lightweight checks, and suggest the next step.

Who Should Use It

If you only occasionally ask Claude Code to edit a small piece of code, you may not need to study hooks deeply yet.

But this project is useful if you:

Use Claude Code frequently
Often let AI modify real project code
Worry about AI running dangerous commands
Want to automatically inject team rules into AI workflows
Want checks to run automatically after edits
Want to turn repeated reminders into configuration
Are building a more stable AI coding workflow

Hooks are especially meaningful in collaborative projects. They can turn part of team experience into scripts instead of relying on every person to remind AI manually.

Notes for Use

First, start with security hooks.

Compared with complex automation, command blocking, path protection, and sensitive file checks are easier to implement and immediately reduce risk.

Second, commit project-level rules carefully.

.claude/settings.json affects everyone who uses the repository. Before committing rules, make sure they do not over-restrict normal development or depend on paths that only exist on your machine.

Third, keep hook output concise.

Claude Code consumes this output. If it is too long, it pollutes the context. If it is too vague, it does not guide the next step. It is best to return only the necessary judgment and next recommendation.

Fourth, keep hooks debuggable.

When hooks increase in number, problems can come from configuration, scripts, permissions, paths, dependencies, or Claude Code itself. Clear logs make later debugging much easier.

Reference

disler/claude-code-hooks-mastery

Final Thought

The value of Claude Code Hooks is turning “rules I hope AI remembers every time” into workflows that actually execute.

If you already use Claude Code in real projects, hooks are a key step from “a coding assistant that can chat” toward “a constrained engineering collaborator.”

Claude-Mem: Adding Cross-Session Long-Term Memory to Claude Code

Fri, 01 May 2026 03:01:02 +0800

Claude-Mem is a persistent memory system for Claude Code.

It tries to solve a very specific problem: every time an AI coding assistant starts a new session, it often forgets earlier architecture decisions, past pitfalls, project preferences, and implementation context.
If a project lasts for a long time, repeatedly explaining the same background becomes a waste of time.

The idea behind Claude-Mem is to compress Claude Code conversations into memories, store them in a local database and vector store, and then retrieve them later through a search tool.

What Problem Does It Solve?

Claude Code is good at code tasks, but session context is still limited.

Common pain points include:

A new session does not know what previous sessions did
Project design decisions need to be explained repeatedly
Problems that were already debugged are easy to repeat
Long-running tasks lack continuity
Project knowledge is hard to accumulate across conversations

Claude-Mem is designed around these problems.

It is not simply saving chat logs. Instead, it compresses conversations into memory fragments that are easier to retrieve. When needed later, semantic search can bring the relevant context back.

How It Works

According to the README, Claude-Mem consists of four main parts.

The first part is hooks.

It integrates with the Claude Code session flow and captures conversation data at the right time.

The second part is a background worker.

The worker processes raw conversation content into shorter, more searchable memories.

The third part is local storage.

The project uses SQLite for structured metadata and Chroma for vector indexing. This preserves basic session information while supporting semantic retrieval.

The fourth part is mem-search.

This is the query entry point for Claude Code. When old context is needed, it can search relevant memories through this tool.

The overall flow can be understood like this:

Claude Code sessions generate content
Hooks capture session data
The worker asynchronously compresses and organizes it
Memories are written to SQLite and Chroma
Later sessions retrieve them through mem-search
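The store-then-retrieve half of this flow can be illustrated with a toy model. The sketch below is not Claude-Mem's code: it keeps summaries in an in-memory SQLite table and, in place of Chroma's vector index, ranks them with a plain word-overlap cosine score, which is a much weaker stand-in for real embeddings. `MemoryStore` and its methods are hypothetical names.

```python
import math
import re
import sqlite3
from collections import Counter


def tokens(text: str) -> Counter:
    """Lowercased word counts, used here as a crude stand-in for an embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class MemoryStore:
    def __init__(self) -> None:
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE memories (id INTEGER PRIMARY KEY, session TEXT, summary TEXT)"
        )

    def add(self, session: str, summary: str) -> None:
        """Store one compressed memory for a session."""
        self.db.execute(
            "INSERT INTO memories (session, summary) VALUES (?, ?)", (session, summary)
        )
        self.db.commit()

    def search(self, query: str, limit: int = 3) -> list[str]:
        """Return up to `limit` summaries, most similar first, skipping zero scores."""
        q = tokens(query)
        rows = self.db.execute("SELECT summary FROM memories").fetchall()
        scored = [(cosine(q, tokens(s)), s) for (s,) in rows]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [s for score, s in scored[:limit] if score > 0]
```

Word overlap only finds memories that share vocabulary with the query. The point of a vector index is to go further and match on meaning, but the overall shape of the flow (write summaries once, rank them at query time) is the same.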

When Is It Useful?

Claude-Mem is suitable for long-running projects, not one-off small tasks.

For example:

A repository is developed over many days
The code structure is complex and has a lot of background
Project conventions, naming habits, and architecture choices need to be remembered
Claude Code is often used for bug fixes, features, and documentation
You want the AI to remember why something was changed earlier

If you only ask Claude Code to make a one-line change, long-term memory is not very meaningful.
But if you treat Claude Code as a long-term collaborator, it becomes useful.

Installation and Startup

The README gives a direct installation flow:

npm install -g claude-mem
claude-mem install

Start it with:

`claude-mem start`

Check status:

`claude-mem status`

Stop it when needed:

`claude-mem stop`

Together, these commands run the memory system as a long-running local service connected to the Claude Code workflow.

How to Use `mem-search`

mem-search is the key entry point for retrieving memory.

It is not meant to replace ordinary search. It lets Claude Code query past conversations by meaning.

For example, Claude Code can search for:

Why a module was designed in a certain way
How a bug was debugged earlier
Naming rules agreed on in the project
Technical trade-offs discussed before
The background behind a refactor

This is different from simple keyword search.
If memory compression and vector indexing work well, you can retrieve semantically related content even if you do not remember the exact wording.

How Is It Different from Project Documentation?

Project documentation is good for stable conclusions.

For example:

Architecture notes
Deployment procedures
API conventions
Database structure
Development rules

Claude-Mem is better for context created during conversations.

For example:

Why a plan was rejected
How a temporary issue was worked around
The discussion behind an implementation
Project preferences not yet written into docs
Task background accumulated across multiple conversations

The two are not replacements for each other.
A good workflow is to write stable knowledge into project docs and use the memory system to help retrieve conversational context.

Things to Watch Out For

First, more long-term memory is not always better.

If every conversation is saved without distinction, later retrieval can become noisy. The most valuable memories are project decisions, implementation background, debugging history, and long-term preferences.

Second, memory cannot replace code and documentation.

Old context found by AI is only a reference. Final judgment still depends on the current code, test results, and latest requirements.

Third, pay attention to privacy and local data.

Since it stores conversation content, you should know which projects are suitable for it and which sensitive information should not enter the conversation.

Fourth, memory systems need maintenance.

As a project moves forward, old memories may become outdated. If outdated context is reused incorrectly, it can mislead later tasks.

Why This Kind of Tool Matters

AI coding tools are moving from one-off Q&A toward long-term collaboration.

In one-off Q&A, the model only needs to answer the current question.
In long-term collaboration, it needs to know project history, earlier decisions, team preferences, and pitfalls that have already been found.

This is where tools like Claude-Mem matter: they turn “remembering context” from a temporary chat capability into a local system that can be installed, run, and searched.

For real engineering projects, this is more practical than simply making the model context window longer.
Much information does not need to be stuffed into context all at once; it needs to be retrieved at the right time.

Who Should Try It?

You may want to try it if:

You use Claude Code frequently
You often work on the same project across multiple days
The project context is complex
You repeatedly explain the same background to AI
You want to preserve experience from conversations

If you only use Claude Code occasionally, or the project is small, you may not need this kind of system yet.

Reference

thedotmack/claude-mem

Final Thought

The point of Claude-Mem is not “saving chat logs.” It is helping Claude Code retrieve useful context in later tasks.

As AI coding moves from one-off tasks to long-running project collaboration, memory systems will become increasingly important.
They cannot replace documentation and tests, but they can reduce repeated explanations and make the AI feel more like an assistant that understands project history.

Getting Started with Compiling UEFI Programs: From uefi-simple to Your First .EFI

Thu, 30 Apr 2026 19:53:08 +0800

Compiling your first UEFI program is not exactly effortless. Environment setup can take time, linker errors are common, and a .EFI program does not have the same direct edit-and-run experience as an ordinary desktop application.

This article organizes the topic from a beginner’s perspective: if you only want to compile your first UEFI program, where should you start, which concepts matter first, and which pitfalls are most likely to appear?

What Is a UEFI Program?

A UEFI program is usually a .EFI file.

It is not an ordinary .exe that you double-click in Windows. It is a PE/COFF executable that runs inside the UEFI firmware environment. Common use cases include:

Boot managers
Hardware initialization tools
Firmware update tools
Pre-boot diagnostic tools
Custom boot flows

Many features you see early in the boot process are implemented as UEFI applications, drivers, or firmware services.

For beginners, there is no need to understand full firmware development immediately. The first goal is simple: compile a .EFI file that can be loaded by a UEFI Shell or emulator.

Why Not Start with EDK II?

Real UEFI development often involves EDK II.

EDK II is complete and closer to real firmware engineering, but it is not very friendly for beginners:

The project structure is complex
The build system has a learning curve
Environment variables and toolchain setup involve many details
Compiler errors are not always easy to understand
It is easy to get stuck on the environment before writing any code

If the goal is simply to get a minimal UEFI program running, a lightweight example is a better starting point.

pbatard/uefi-simple is one such project. Its goal is straightforward: provide a simple UEFI Hello World example so you can compile a .EFI file first.

What Is `uefi-simple` Good For?

uefi-simple is a good first stepping stone for UEFI beginners.

It solves three practical problems:

It gives you a minimal compilable UEFI application structure
It avoids the complexity of large firmware projects at the beginning
It lets you verify that compiling, linking, and running all work

The project supports multiple build methods, including Visual Studio 2022 and MinGW/gcc. It can also be tested with QEMU and OVMF.

In other words, you do not have to repeatedly reboot a real machine for early experiments. Running the program in an emulator first is much safer.

What to Prepare Before Starting

You need at least a few categories of tools.

The first category is the compiler toolchain.

On Windows, you can start with:

Visual Studio 2022
Or MinGW/gcc

The second category is a UEFI runtime environment.

There are two common options:

Run the .EFI file in a real machine’s UEFI Shell
Test it in a virtual environment with QEMU + OVMF

The third category is an example project.

Beginners should not start by writing build scripts from an empty directory. Using a minimal example such as uefi-simple helps avoid many build-system problems.

Basic Workflow

A minimal UEFI program workflow can be understood like this.

First, get the example project.

`git clone https://github.com/pbatard/uefi-simple.git`

Second, choose a build toolchain.

If you use Visual Studio, build with the Visual Studio solution in the project.
If you use MinGW/gcc, follow the Makefile or instructions provided by the project.

Third, generate the .EFI file.

The key point here is to confirm the target architecture. A common PC is usually x86_64, meaning a 64-bit UEFI environment.

Fourth, put the .EFI file somewhere the UEFI Shell can access.

On a real machine, this usually means preparing a FAT32 partition or USB drive.
With QEMU, you can mount a directory or disk image.

Fifth, run it in the UEFI Shell.

The result is usually a minimal output, such as a Hello World-style message.

Where Beginners Usually Get Stuck

The hardest part of compiling a UEFI program is usually not the C language itself, but the environment and linking process.

Common issues include:

Wrong compiler architecture
Wrong target format
Incomplete linker parameters
Missing UEFI entry point
Generating an ordinary executable instead of a UEFI-loadable .EFI
QEMU or OVMF not configured correctly
Secure Boot on a real machine blocking an unsigned program

Linker errors are especially easy to misread as code problems.
In many cases, the real issue is the entry function, subsystem, target architecture, or linker script.
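Two of these issues, wrong architecture and an ordinary executable instead of a UEFI image, can be checked by reading the PE/COFF header of the output file. The sketch below uses the documented PE layout: the PE header offset is stored at 0x3C, the machine type follows the signature, and the subsystem field sits 68 bytes into the optional header. `inspect_pe` is a hypothetical helper, and only a few machine and subsystem values are listed.

```python
import struct

MACHINES = {0x8664: "x86_64", 0x014C: "ia32", 0xAA64: "aarch64"}
EFI_SUBSYSTEMS = {
    10: "EFI application",
    11: "EFI boot service driver",
    12: "EFI runtime driver",
}


def inspect_pe(data: bytes) -> dict:
    """Read machine type and subsystem from a PE/COFF image held in memory."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature: not a PE/COFF image")
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
    optional_header = pe_offset + 4 + 20  # signature + 20-byte COFF header
    (subsystem,) = struct.unpack_from("<H", data, optional_header + 68)
    return {
        "machine": MACHINES.get(machine, hex(machine)),
        "subsystem": EFI_SUBSYSTEMS.get(subsystem, f"non-EFI ({subsystem})"),
        "is_efi": subsystem in EFI_SUBSYSTEMS,
    }
```

Calling it on the bytes of a freshly built file, for example `inspect_pe(Path("main.efi").read_bytes())` with a hypothetical file name, shows quickly whether the linker produced an x86_64 EFI application or something else.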

So in the first stage, do not rush into complex logic. Make sure the original example can compile and run, then change the output little by little.

Why Use QEMU + OVMF for Testing?

Testing UEFI programs on a real machine is possible, but it is not convenient at the beginner stage.

You may have to repeat this cycle:

Compile
Copy to a USB drive
Reboot
Enter the UEFI Shell
Run the program
Record the error
Return to the system and modify the code

That loop is slow.

QEMU + OVMF lets you simulate a UEFI environment directly inside the operating system. You can check much faster whether a .EFI file loads, and you avoid touching the boot entries on your real machine.
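A typical emulator setup can be sketched as two small helpers: one that gives the default boot path for x86_64 removable media inside a working directory, and one that assembles a QEMU command exposing that directory as a FAT drive. The directory name and the OVMF firmware path are assumptions that vary by system, and the project's own instructions may use different options.

```python
from pathlib import PurePosixPath


def boot_path(work_dir: str) -> str:
    """Default removable-media boot path for x86_64 UEFI inside `work_dir`."""
    return str(PurePosixPath(work_dir) / "EFI" / "BOOT" / "BOOTX64.EFI")


def qemu_command(work_dir: str, ovmf_path: str) -> list[str]:
    """Assemble a QEMU invocation that boots OVMF with `work_dir` as a FAT drive."""
    return [
        "qemu-system-x86_64",
        "-bios", ovmf_path,  # OVMF firmware image; location depends on your install
        "-net", "none",      # no network device, so the firmware does not try network boot
        "-drive", f"format=raw,file=fat:rw:{work_dir}",  # expose the directory as a FAT drive
    ]
```

After copying the compiled .EFI file to `boot_path("image")`, the list from `qemu_command("image", "OVMF.fd")` can be passed to `subprocess.run`. Both `image` and `OVMF.fd` are placeholder names.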

Once the program basically works, testing it on a real machine is much more manageable.

What Should Beginners Modify First?

If you have already compiled your first .EFI with the example project, do not jump into complex features immediately.

A better order is:

Change the output text first to confirm that recompilation really takes effect.
Try reading simple information provided by UEFI.
Understand the entry function, output protocol, and basic services.
Then consider more complex features such as file systems, graphical output, or boot entry management.

This approach makes every step verifiable.
If you change too much at once, it becomes difficult to tell whether the issue is in the code, the build process, or the runtime environment.

How Is It Different from an Ordinary C Program?

Although UEFI programs can be written in C, their runtime environment is completely different from that of ordinary C programs.

An ordinary C program usually runs inside an operating system and can rely on the standard library, file system, process model, and system calls.

A UEFI program runs before the operating system boots. It relies on services provided by UEFI firmware. Many things you are used to in normal programs are not automatically available here.

When writing UEFI programs, you need to adapt to several differences:

The entry function is different
Output works differently
Available libraries are different
Memory and file access work differently
Debugging works differently

This is why starting from a minimal example is better than writing code as if it were a normal C program.

A Practical Learning Path

For beginners, a realistic path looks like this:

Step 1: Compile uefi-simple
Step 2: Run it with QEMU + OVMF
Step 3: Modify the Hello World output
Step 4: Understand how the UEFI Shell loads .EFI
Step 5: Learn the UEFI entry function and basic output protocol
Step 6: Then read EDK II or more complete UEFI development material

The point of this path is to build a working feedback loop first.

Once you can generate a .EFI from source and see output in a UEFI environment, you have already cleared the hardest first hurdle.

Final Thought

The hard part of compiling your first UEFI program is usually not writing a bit of C code, but connecting the toolchain, link format, and runtime environment.

Do not rush into complex features.
Start with a minimal example such as uefi-simple, get a runnable .EFI first, and then gradually understand UEFI entry points, protocols, and build methods.

Claude.md Is Not Better When It Is Longer: How to Write Global Memory Files for AI Coding

Wed, 29 Apr 2026 21:07:37 +0800

I recently saw a discussion about global memory files for AI coding: after projects add files such as Claude.md or AGENTS.md, the results do not necessarily improve. In some cases, success rates may even drop while inference cost rises.

At first, this feels counterintuitive. We usually assume that if we give AI more project background, more rules, and more explanation, it should write code more accurately.
The real issue is that Claude.md is not an ordinary document. It is a global memory file that gets injected into the context on every conversation. The more it contains, the more the model has to read every time; the vaguer it is, the more judgment the model has to make; and if it contains workflows that should not always run, the model may trigger unnecessary actions in unrelated tasks.

So the hard part of writing Claude.md is not making it complete. It is deciding which pieces of information deserve to occupy context permanently.

What Claude.md Is

In AI coding tools, files such as Claude.md and AGENTS.md are essentially global memory files.

Normal conversation enters the context, but context length is limited. Once the conversation becomes long, historical content is compressed and some details are lost. A global memory file fixes important rules in place so the model can see them at the beginning of every task.

This means two things:

Content written there is harder to forget
Content written there also costs something on every task

It is not like a README that is read only when needed. It is more like a long-lived set of working constraints. Once something is placed there, it affects the model’s judgment by default.

Therefore, Claude.md is not a project introduction, not a collection of tips, and not a place to dump every development process. It should only store rules that the model is likely to violate repeatedly if it does not know them.

Why It Can Make Things Worse

A poorly written global memory file usually causes three kinds of problems.

First, it consumes context.

If Claude.md has one thousand lines, those lines sit in the model context on every task. Code, error messages, and requirements that are actually relevant to the current task may get squeezed out. Context is not free space. The larger the global rule file, the easier it is to dilute the current task.

Second, it can trigger unnecessary behavior.

For example, a global file might say:

Before every task, fully read the project directory.
After every change, run a complete end-to-end test.

These lines look responsible, but in a global memory file they become “do this for every task.” Even if the task is only changing one line of copy, the model may perform unnecessary exploration and tests because of these rules. The result is slower work, higher cost, and sometimes more interference.

Third, it increases the burden of judgment.

Statements like “keep code elegant, concise, maintainable, and extensible” sound correct, but they are weak constraints. Every time the model generates code, it has to decide what elegant or extensible means, without receiving a clear boundary.

A better approach is to write concrete prohibitions or counterexamples instead of abstract virtues. For example:

Do not add a generic abstraction for a single call site.
Do not change shared parsing logic without test coverage.
Do not put temporary scripts in the application source directory.

These rules are more specific and easier to follow.
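The three problems above can be turned into a rough self-check. The sketch below is only an illustration: the word lists, the 60-line budget, and the function name `lint_memory_file` are assumptions chosen for this example, not part of any tool.

```python
import re

# Hypothetical heuristics based on the three problems described above.
ALWAYS_RUN = re.compile(r"\b(before|after) every (task|change)\b", re.IGNORECASE)
VAGUE = re.compile(r"\b(elegant|concise|maintainable|extensible)\b", re.IGNORECASE)


def lint_memory_file(text: str, max_lines: int = 60) -> list[str]:
    """Return warnings for a global memory file; an empty list means nothing was flagged."""
    warnings = []
    non_empty = [line for line in text.splitlines() if line.strip()]
    if len(non_empty) > max_lines:
        # Problem 1: the file itself consumes context on every task.
        warnings.append(f"{len(non_empty)} non-empty lines exceed the budget of {max_lines}")
    for number, line in enumerate(text.splitlines(), start=1):
        if ALWAYS_RUN.search(line):
            # Problem 2: a workflow that will fire even for one-line changes.
            warnings.append(f"line {number}: unconditional workflow")
        if VAGUE.search(line):
            # Problem 3: an abstract virtue with no clear boundary.
            warnings.append(f"line {number}: abstract quality word")
    return warnings
```

A check like this cannot judge whether a rule is useful; it only points at lines that deserve a second look.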

What Should Go In

You can use a simple standard to decide whether something belongs in Claude.md:

If the AI will repeatedly make the same mistake without it, then it is worth writing down.

Content suitable for a global memory file usually has these traits:

It is durable
It is strongly tied to the current repository
It cannot be naturally inferred from the code structure
It clearly changes model behavior
It is preferably a constraint, prohibition, path rule, or fixed command

For example:

For all Hugo posts, only edit index.zh-cn.md and do not automatically generate other language versions.
Article front matter must include title/date/draft/tags/categories/slug/description.
Do not modify generated artifacts under public/.
On PowerShell, use scripts/deploy.ps1 for deployment.

These are not vague suggestions. They are tied to how the repository actually works. If the model does not know them, it may make mistakes; once it knows them, it can avoid real missteps.

What Should Stay Out

Many people turn Claude.md into a project manual. That is usually unnecessary.

Content that generally does not belong there includes:

Project vision and background
Large directory structure descriptions
Temporary task plans
One-off debugging steps
Abstract code quality slogans
Long workflows that are only needed in a few situations

For example, a description like “this is an e-commerce project with product, order, and user modules” helps very little with a concrete coding task. During real development, the model should rely on the current requirement, specification, code structure, and tests, not on a rough project introduction in global memory.

The same applies to directory structure. Unless a directory has a special convention, such as “shared components must be imported from this directory,” there is no need to write the entire tree into the file. The model can read the project directory itself, and a static directory description goes stale easily.

Workflows Belong in Skills or Commands

If a section says “first do this, then do that, then do the third thing,” it may not belong in Claude.md.

Long-lived workflows can be turned into skills, scripts, or commands. The benefit is that the global memory only needs to keep the name and trigger condition, while the detailed steps are loaded only when needed.

For example:

When the user asks to translate a Hugo post, use the post-translate skill.
When the user asks to deploy the site, run the hugo-rsync-deploy workflow.

This is lighter than putting the full translation and deployment processes into Claude.md. Global memory stays short, and detailed workflows live in triggerable tools.

Claude’s newer initialization flow is also moving in this direction. It does not only generate a Claude.md; it also tries to split reusable workflows into skills and fixed events into hooks. The underlying idea is clear: global memory should be an entry point, while details should be loaded on demand.

Claude.md Needs Iteration

Claude.md should not be written once and then ignored.

A better approach is to keep it short at first and let real tasks expose problems. If an error happens once, handle it manually. If the same kind of error appears two or more times, it may deserve to become a global rule.

This kind of iteration is more useful than writing a huge set of rules at the beginning. Early on, you do not know which rules are truly useful or which lines will become noise. As the project grows, collaboration increases, and the model’s behavior becomes clearer, you can gradually add the high-frequency problems.

There is also an important trend: the stronger the model, the shorter the global memory file should become.

Many requirements that once had to be written into prompts are now handled naturally by the model. Continuing to put those basic requirements into Claude.md only increases context load. Global memory should shrink as model capability improves, keeping only what is unique to this repository and cannot be inferred automatically.

A More Practical Way to Write It

When writing Claude.md, think in this order:

What special conventions does this repository have?
Which mistakes has the model made more than once?
Which directories, files, or commands must never be misused?
Which workflows should become skills, scripts, or commands instead of permanent context?
Which parts are merely introductions and can be deleted?

The final file may be only a few dozen lines. It does not need to fully explain the project. It needs to constrain behavior precisely.

A good Claude.md might look like this:

# Working Rules

- Only edit files related to the current task.
- Do not modify generated artifact directories such as public/ or resources/.
- Hugo post rewrites only process index.zh-cn.md and do not generate other language versions.
- If deployment is involved, run the Hugo build first, then execute the existing rsync script.
- When there are existing user changes, do not revert them. Continue from the current state.

It is short, but every line affects real behavior. That is the kind of content worth keeping in context permanently.

Final Thought

The value of Claude.md is not to make AI “know more.” It is to make AI “avoid fixed mistakes.”

It is not a knowledge base or project encyclopedia. It is a long-lived constraint file for AI coding.
The more specific, shorter, and closer to real mistakes it is, the more useful it becomes. The more generic, longer, and more like a project introduction it is, the more likely it is to slow the model down or even make results worse.

Treat global memory as a scarce resource, not an unlimited scratchpad. That may be the most important principle for writing a good Claude.md.

Codex Is Starting to Control the Computer. What Does That Mean for the Future?

Wed, 29 Apr 2026 11:28:25 +0800

The most important part of this Codex update is not that it added another ordinary button. It is that Codex is starting to move toward “controlling the computer.”

In the past, using AI usually meant asking questions in a chat box, copying, pasting, and then manually operating software.
Now that boundary is expanding: AI does not just answer you. It can operate desktop applications according to your goal.

In the short term, this is a new feature. In the long term, it may change how many people use computers.

What This Feature Is

Simply put, Codex’s computer use capability lets it access and operate the desktop environment.

It can do things such as:

select and control an application
receive tasks in natural language
open browsers, AI tools, local files, or other software
enter text, click buttons, and wait for results
connect multiple steps into one task
keep running in the background without requiring the user to follow every step manually

Its role is not just to write a piece of text for you, but to complete an operation flow for you.

That is the key difference between an Agent and an ordinary chatbot:
a chatbot mainly gives answers; an Agent is closer to “receiving a goal and then executing it.”

Why This Matters

In the past, much automation required you to know how to write scripts.

For example, suppose you want to complete a cross-software workflow:

open a web page
find information
copy content
pass it to another AI tool
save a file
open the local directory and check the result

To automate this traditionally, you might need browser scripts, APIs, local programs, and even window automation.

But many ordinary users do not know how to write these things.
Even if they do, it may not be worth writing a script for a temporary task.

This is where computer use matters: it pushes “script-like capability” toward natural language.

You do not necessarily need to tell it exactly where to click.
You can tell it what result you want and let it try to complete the task.

Workflows It May Change

I think the first workflows to change will not be extremely serious or high-risk work, but the tasks that are annoying, fragmented, repetitive, and not worth writing a dedicated program for.

1. Moving Information Across Software

The most typical case is moving information between applications.

Previously, you might switch back and forth between a browser, a document, a chat window, and a local folder.
In the future, you can hand this kind of task to an Agent:

find a certain kind of information
summarize it into a document
save it to a specified directory
open the result for you to review

This work is not hard, but it consumes attention.
The value of an Agent is that it absorbs these small operations.

2. Coordination Between Multiple AI Tools

Many people’s real workflow is no longer based on a single AI tool.

It may look like this:

one tool writes code
one tool researches information
one tool generates images
one tool organizes documents

Previously, these tools were connected by manual copy and paste.
In the future, an Agent can become the middle layer: it opens tools, passes context, waits for output, and organizes results.

This can turn “multiple AI tools working together” from a manual process into a semi-automated process.

3. Office Software Automation

Spreadsheets, presentations, documents, and email share one trait: they are powerful, but many operations are fragmented.

If Agents can reliably control this software, the barrier to office automation will drop noticeably.

You do not need to remember where a menu is or learn complicated shortcuts.
You only need to describe the goal, such as:

turn this spreadsheet into a monthly report
make a one-page summary from this document
combine these materials into a clearly structured explanation

The tedious button operations will gradually be hidden behind natural language.

What It Means for Ordinary Users

For ordinary users, this kind of feature may have a more direct impact than “the model got a bit smarter,” because it lowers the operation barrier, not just the knowledge barrier.

Many people can describe what they want, but they do not know where to click or how to combine features inside software.
If Agents can take over this part, using a computer may become:

I describe the goal
Agent operates the software
I check the result

That is closer to real productivity than simple chat.

Its Impact on Software

If this kind of Agent capability continues to mature, software itself will also be affected.

In the past, software design mainly served human clicking.
In the future, software may also need to serve Agent operation.

This means:

interface elements need to be clearer
operation feedback needs to be more stable
local permissions need to be more granular
software may provide interfaces better suited for Agent calls
users may care more about whether software can be operated smoothly by AI

In the long run, the boundaries between applications may become thinner.
Users may care less about “which app should I open” and more about “what task do I want to complete.”

Do Not Overhype It Yet

Of course, it is too early to hand over control completely.

This kind of capability still has several clear limitations:

stability still needs observation
complex tasks may fail in the middle
permission boundaries must be handled carefully
account, payment, and file deletion operations should not be delegated casually
quota consumption is not something you can completely ignore

So at this stage, the best use case is not letting it take over the whole computer, but letting it handle low-risk, reviewable, step-heavy tasks.

For example:

organizing materials
generating drafts
moving content across tools
opening and checking files
running semi-automated workflows that can be reviewed by a human

One Last Line

The real importance of this Codex update is that it pushes AI from “answering questions” toward “operating the environment.”

In the short term, it is a computer use feature.
In the long term, it may mark a shift in how personal computers are used.

In the future, we may spend less time remembering buttons, finding menus, and switching windows.
More often, we will describe the goal, let an Agent execute it, and then let humans make the final judgment.

Why Does a Codex Skill Exist in the Directory but Still Not Show Up?

Wed, 29 Apr 2026 11:18:00 +0800

This problem was easy to miss: several skills were already placed under ~/.codex/skills, but after opening a new Codex thread, the sidebar still showed only a small subset of them.

At first, it looked like a cache or indexing issue. The real cause was more specific: several SKILL.md files started with a UTF-8 BOM. Codex 0.111.0’s skill loader did not skip that byte sequence, so it misjudged the files as having no valid YAML front matter.

Symptom

The local directory contained these skills:

~/.codex/skills/git-commit-push/SKILL.md
~/.codex/skills/hugo-rsync-deploy/SKILL.md
~/.codex/skills/bilibili-speech-transcriber/SKILL.md
~/.codex/skills/product-cutout-normalize/SKILL.md

But after opening a new thread, the actually exposed skills were only:

bilibili-speech-transcriber
product-cutout-normalize

In other words, a file existing on disk does not mean the current session can load it successfully. Codex parses the front matter of each SKILL.md first. If parsing fails, that skill is simply left out.

Investigation

Starting a fresh session with codex exec showed a more direct error. In VS Code or other IDEs, these logs may not be visible:

failed to load skill C:\Users\knightli\.codex\skills\git-commit-push\SKILL.md: missing YAML frontmatter delimited by ---
failed to load skill C:\Users\knightli\.codex\skills\hugo-rsync-deploy\SKILL.md: missing YAML frontmatter delimited by ---

Visually, these files seemed to have a normal header:

---
name: post-rewrite
description: ...
---

The real problem was at the byte level.

The beginning of a failing file was:

EF-BB-BF-2D-2D-2D

The beginning of a file that loaded correctly was:

2D-2D-2D

2D-2D-2D is ---. The preceding EF-BB-BF is the UTF-8 BOM.
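To reproduce this kind of byte-level view on any platform, a short script is enough. This is a minimal sketch; the helper names `leading_bytes` and `has_utf8_bom` are invented for this example.

```python
import codecs


def leading_bytes(path: str, count: int = 6) -> str:
    """Return the first bytes of a file as dash-separated uppercase hex."""
    with open(path, "rb") as handle:
        return handle.read(count).hex("-").upper()


def has_utf8_bom(path: str) -> bool:
    """True when the file starts with the three-byte UTF-8 BOM (EF BB BF)."""
    with open(path, "rb") as handle:
        return handle.read(3) == codecs.BOM_UTF8
```

Running `leading_bytes` on each SKILL.md gives the same kind of output as shown above, which makes a BOM visible even when the editor hides it.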

Cause

In Codex 0.111.0, the skill loader expects the first byte of SKILL.md to be the first - in ---.

If the file starts with a UTF-8 BOM, the actual beginning becomes:

BOM + ---

So the loader thinks the file does not start with the front matter delimiter and reports:

missing YAML frontmatter delimited by ---

The skill content was not wrong, and the directory was not wrong either. A small encoding detail prevented the parser from recognizing the file.

Fix

Convert the affected SKILL.md files to UTF-8 without BOM.

In PowerShell, this can be done like this:

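# Files that failed to load; adjust the list for your own machine.
$paths = @(
  'C:\Users\knightli\.codex\skills\git-commit-push\SKILL.md',
  'C:\Users\knightli\.codex\skills\hugo-rsync-deploy\SKILL.md'
)

# UTF8Encoding($false) writes UTF-8 without a BOM.
$utf8NoBom = New-Object System.Text.UTF8Encoding($false)

foreach ($p in $paths) {
  # ReadAllText drops the BOM while decoding; WriteAllText then saves the text without one.
  $text = [IO.File]::ReadAllText($p, [Text.Encoding]::UTF8)
  [IO.File]::WriteAllText($p, $text, $utf8NoBom)
}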

After processing, the file header should change from:

EF-BB-BF-2D-2D-2D

to:

2D-2D-2D
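If PowerShell is not available, the same conversion can be sketched in Python. This version works on raw bytes and only touches files that actually start with a BOM; `strip_utf8_bom` is a hypothetical helper name.

```python
import codecs
from pathlib import Path


def strip_utf8_bom(path: Path) -> bool:
    """Remove a leading UTF-8 BOM in place. Returns True if the file was changed."""
    raw = path.read_bytes()
    if not raw.startswith(codecs.BOM_UTF8):
        return False  # already BOM-free, leave the file untouched
    path.write_bytes(raw[len(codecs.BOM_UTF8):])
    return True
```

For example, `strip_utf8_bom(Path.home() / ".codex" / "skills" / "git-commit-push" / "SKILL.md")` would fix one of the files listed earlier. Working on bytes avoids re-encoding the rest of the file.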

Verification

After restarting a Codex session, the visible skills were restored to:

git-commit-push-zh
hugo-rsync-deploy
bilibili-speech-transcriber
product-cutout-normalize

If the sidebar still shows the old list, close the current Codex sidebar or window and reopen the project. The skill list is usually loaded when the session starts, so changes made in the middle of a session may not refresh immediately.

One Last Line

This kind of issue is easy to mistake for “Codex did not re-index” or “the skill was not installed correctly.”

When troubleshooting, check these three things first:

whether SKILL.md is really in the correct directory
whether the file has valid --- front matter at the top
whether the file is UTF-8 without BOM

The key in this case was the third point: the file looked fine, but its first byte was not -, so Codex did not treat it as a valid skill.

What Is the Difference Between ~/.codex/skills and Project .codex/skills in Codex

Wed, 29 Apr 2026 11:08:00 +0800

When organizing Codex skills, people most often get stuck on two questions:

What is the difference between ~/.codex/skills and project/.codex/skills?
Why does a skill exist in the directory but not appear in the current session?

Here is the short version.

The Difference

The simplest way to remember it:

~/.codex/skills is your global skill library
project/.codex/skills is the local skill library for that repository

`~/.codex/skills`

Use it for:

skills you personally reuse across projects
general workflows that are not tied to a specific repository
workflows that clearly belong to your own habits

For example:

post-rewrite
post-translate
git-commit-push
hugo-rsync-deploy
bilibili-speech-transcriber

The key trait of this kind of skill is: it still makes sense outside the current project.

`project/.codex/skills`

Use it for:

workflows that only apply to this repository
rules tightly coupled to the current project structure, scripts, or templates
skills that should be shared by the team

For example:

a publishing workflow specific to this repository
a generation template that only works in this project
automation steps tightly bound to private project scripts

The key trait of this kind of skill is: it stops being meaningful once it leaves this repository.

When to Use Global and When to Use Project Skills

This rule of thumb is enough:

If it is about your personal habits, put it in ~/.codex/skills
If it is about repository rules, put it in project/.codex/skills
If it can be reused across projects, prefer global
If it should be shared by multiple people and evolve with the repository, prefer project-level

The Current Repository

Based on the current state:

your machine has ~/.codex/skills
this repository does not have .codex/skills

So right now, you mainly rely on global skills.

That means workflows such as post-rewrite, post-translate, and git-commit-push are currently more like part of your personal workflow, not something explicitly bundled with this repository.

Why a Skill Exists on Disk but May Not Appear in the Current Session

There are two different things here:

Existing on disk: the skill file exists in a local directory
Exposed to the session: the current session registered it into the available skill list

These are not the same thing.

So this can happen:

a skill already exists under ~/.codex/skills
but it does not appear in the list after /

This usually does not mean the skill is broken. More often, it means: the current session has not re-indexed it.

How to Make a Skill Available in the Current Session

The practical checklist is short.

1. Put It in the Right Directory

Global:

~/.codex/skills/<skill-name>/SKILL.md

Project-level:

project/.codex/skills/<skill-name>/SKILL.md

2. Make the `SKILL.md` Header Recognizable

At minimum, it needs:

---
name: your-skill-name
description: What this skill does
---
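The first two checklist items can be verified with a small script before you open a new session. This is a rough sketch, not how Codex itself validates skills: it uses a simple line scan rather than a real YAML parser, and `check_skill` is a made-up name.

```python
from pathlib import Path


def check_skill(skill_md: Path) -> list[str]:
    """Return the problems found in one SKILL.md; an empty list means it looks fine."""
    if not skill_md.is_file():
        return ["SKILL.md not found"]
    raw = skill_md.read_bytes()
    if raw.startswith(b"\xef\xbb\xbf"):
        return ["file starts with a UTF-8 BOM"]
    lines = raw.decode("utf-8").splitlines()
    if not lines or lines[0].strip() != "---":
        return ["first line is not ---"]
    # Find the closing delimiter of the front matter block.
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        return ["front matter is never closed with ---"]
    # Simplification: treat every "key: value" line as a top-level key.
    keys = {line.split(":", 1)[0].strip() for line in lines[1:end] if ":" in line}
    return [f"missing key: {key}" for key in ("name", "description") if key not in keys]
```

Looping over `Path.home().glob(".codex/skills/*/SKILL.md")` and the project-level directory with this function gives a quick report of which files are likely to be skipped.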

3. Open a New Session After Creating or Editing It

In many cases, a skill does not appear because the current session fixed its list of available skills at startup.

So if you create a skill in the middle of a session, it may already exist on disk, but this session may not recognize it.

The most reliable workflow is:

Put the skill in place
End the current session
Re-enter the project
Open a new session
Check whether it appears under /

4. Put Project Skills in Place Before Starting

If you want project/.codex/skills to be recognized more reliably, put those skills into the project before entering the repository and starting the session.

One Last Line

The shortest conclusion is:

~/.codex/skills is your personal skill library
project/.codex/skills is the repository’s local skill library
a skill existing in the directory does not mean the current session will always show it
the most common fix is to put it in the right directory, write a valid SKILL.md, and then start a new session
