<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
        <title>AI Agent on KnightLi Blog</title>
        <link>https://knightli.com/en/tags/ai-agent/</link>
        <description>Recent content in AI Agent on KnightLi Blog</description>
        <generator>Hugo -- gohugo.io</generator>
        <language>en</language>
        <lastBuildDate>Mon, 25 May 2026 00:17:32 +0800</lastBuildDate><atom:link href="https://knightli.com/en/tags/ai-agent/index.xml" rel="self" type="application/rss+xml" /><item>
        <title>What Is OpenAI Symphony? Codex Orchestration, Issue-Driven Development, and AI Agent Workflows</title>
        <link>https://knightli.com/en/2026/05/25/openai-codex-orchestration-symphony/</link>
        <pubDate>Mon, 25 May 2026 00:17:32 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/05/25/openai-codex-orchestration-symphony/</guid>
        <description>&lt;p&gt;OpenAI recently open-sourced an interesting Codex orchestration specification: &lt;strong&gt;Symphony&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;It is not another chat-based coding assistant, nor is it a complete new IDE. More precisely, Symphony is a way to orchestrate work around Codex: it turns an issue tracker similar to Linear into the control plane for coding agents, so every open task can correspond to a continuously running Agent.&lt;/p&gt;
&lt;p&gt;One line from the official article captures its direction well: in the past, engineers had to monitor multiple Codex sessions at once, continually assigning work, reviewing output, correcting course, and restarting sessions. Symphony is designed to address exactly that context-switching bottleneck.&lt;/p&gt;
&lt;h2 id=&#34;symphony-is-not-solving-code-writing-but-agent-management&#34;&gt;Symphony is not solving code writing, but Agent management
&lt;/h2&gt;&lt;p&gt;A single Codex session works well for interactive development: you give it a task, it changes the code, you review it, and then you keep asking follow-up questions. But once a team starts using multiple Agents at the same time, the problem shifts from &amp;ldquo;can the code be written?&amp;rdquo; to &amp;ldquo;who is working on what, how far along is it, and who takes over after a failure?&amp;rdquo;&lt;/p&gt;
&lt;p&gt;OpenAI&amp;rsquo;s approach is to move the center of gravity from &amp;ldquo;sessions&amp;rdquo; to &amp;ldquo;tasks&amp;rdquo;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;the issue is the real unit of work;&lt;/li&gt;
&lt;li&gt;every open issue can map to an independent Agent workspace;&lt;/li&gt;
&lt;li&gt;Symphony continuously polls the task board and decides which tasks should be started, retried, stopped, or reclaimed;&lt;/li&gt;
&lt;li&gt;Codex performs implementation, testing, commits, PR creation, status updates, and related actions inside the workspace;&lt;/li&gt;
&lt;li&gt;humans no longer micromanage every session, but instead review results, adjust goals, and maintain boundaries.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The shift behind this is important: an Agent is no longer just a tool that humans temporarily summon, but a continuously running kind of executor inside the development workflow.&lt;/p&gt;
&lt;h2 id=&#34;why-an-issue-tracker&#34;&gt;Why an issue tracker?
&lt;/h2&gt;&lt;p&gt;Because teams already use issue trackers to manage real work.&lt;/p&gt;
&lt;p&gt;Requirements, bugs, refactors, migrations, research, priorities, blockers, owners, and milestones are already recorded in Linear, GitHub Issues, or similar systems. Symphony does not reinvent a large console. Instead, it treats these existing systems as the task entry point for Agents.&lt;/p&gt;
&lt;p&gt;This has several advantages:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Work does not need to be copied from an issue into a chat window.&lt;/li&gt;
&lt;li&gt;Humans can keep creating, splitting, scheduling, and closing tasks in familiar ways.&lt;/li&gt;
&lt;li&gt;Agent state changes can be written back to the same work system, making async collaboration easier for the team.&lt;/li&gt;
&lt;li&gt;Task dependencies can naturally form a DAG, allowing unblocked tasks to move forward in parallel.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;If traditional CI is &amp;ldquo;automation after a code commit,&amp;rdquo; Symphony is closer to &amp;ldquo;automation after an issue is created.&amp;rdquo;&lt;/p&gt;
&lt;h2 id=&#34;its-core-workflow&#34;&gt;Its core workflow
&lt;/h2&gt;&lt;p&gt;A typical Symphony flow can be understood as:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;5
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;6
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;7
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;8
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;创建 issue
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  -&amp;gt; Symphony 轮询到可执行任务
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  -&amp;gt; 为该 issue 创建独立 workspace
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  -&amp;gt; 启动 Codex agent session
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  -&amp;gt; Agent 阅读任务、修改代码、运行测试
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  -&amp;gt; 创建或更新 PR
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  -&amp;gt; 写回任务状态、评论、证据和交付物
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  -&amp;gt; 人类 review、合并或要求修改
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;The official specification also emphasizes several engineering details:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;each issue uses an independent workspace to reduce cross-contamination;&lt;/li&gt;
&lt;li&gt;the orchestrator maintains retry, concurrency, and recovery state;&lt;/li&gt;
&lt;li&gt;workflow policy lives in the repository&amp;rsquo;s &lt;code&gt;WORKFLOW.md&lt;/code&gt;, so teams can version the rules that describe how Agents should handle tasks;&lt;/li&gt;
&lt;li&gt;implementations need to preserve observability, with at least structured logs;&lt;/li&gt;
&lt;li&gt;a successful state does not have to be &lt;code&gt;Done&lt;/code&gt;; it can also be an intermediate state handed to humans for review.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This shows that Symphony is not simply about &amp;ldquo;letting AI write code automatically.&amp;rdquo; It defines a runnable, recoverable, and auditable Agent work system.&lt;/p&gt;
&lt;h2 id=&#34;goal-driven-not-a-rigid-state-machine&#34;&gt;Goal-driven, not a rigid state machine
&lt;/h2&gt;&lt;p&gt;OpenAI mentions an important shift in the article: early on, they tried hard-coding many actions in the outer harness, such as committing code, running tests, and handling GitHub workflows. But as Codex became more capable, that approach started to constrain the Agent.&lt;/p&gt;
&lt;p&gt;The later direction was to give the Agent a goal, rather than encoding every step as a fixed state transition.&lt;/p&gt;
&lt;p&gt;For example, a task&amp;rsquo;s goal might be &amp;ldquo;complete the Vite migration and ensure CI passes.&amp;rdquo; The Agent can decide for itself whether it needs to change configuration, fix tests, read CI logs, handle review feedback, or even create new follow-up issues. Symphony provides boundaries, context, and the runtime framework instead of prescribing every action for the Agent.&lt;/p&gt;
&lt;p&gt;This is also where it differs from traditional automation scripts: scripts are good at repeated, deterministic processes; Symphony is aimed at engineering tasks with uncertainty.&lt;/p&gt;
&lt;h2 id=&#34;how-is-this-different-from-normal-codex-usage&#34;&gt;How is this different from normal Codex usage?
&lt;/h2&gt;&lt;p&gt;A normal Codex session is more like &amp;ldquo;a human writes code with AI&amp;rdquo;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;the human opens a session;&lt;/li&gt;
&lt;li&gt;the human describes the task;&lt;/li&gt;
&lt;li&gt;the human watches the output;&lt;/li&gt;
&lt;li&gt;the human corrects course at any time;&lt;/li&gt;
&lt;li&gt;after one task ends, the human starts the next session.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Symphony is more like &amp;ldquo;a team hands a task pool to a group of Agents&amp;rdquo;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;humans write clear issues;&lt;/li&gt;
&lt;li&gt;the system continuously discovers executable tasks;&lt;/li&gt;
&lt;li&gt;Agents make progress in isolated environments;&lt;/li&gt;
&lt;li&gt;results come back as PRs, comments, test status, videos, or analysis reports;&lt;/li&gt;
&lt;li&gt;humans review at key checkpoints.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is not about replacing engineers. It is about freeing engineers from the burden of simultaneously watching many sessions. OpenAI notes in the official article that some teams saw a significant increase in PRs merged to the main branch. But the more important point is the change in working style: the startup cost of trying an idea, launching a refactor, or validating a hypothesis becomes lower.&lt;/p&gt;
&lt;h2 id=&#34;where-does-it-fit&#34;&gt;Where does it fit?
&lt;/h2&gt;&lt;p&gt;Symphony is better suited to tasks such as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;routine feature implementation;&lt;/li&gt;
&lt;li&gt;small refactors in an existing codebase;&lt;/li&gt;
&lt;li&gt;infrastructure migrations;&lt;/li&gt;
&lt;li&gt;dependency upgrades;&lt;/li&gt;
&lt;li&gt;filling in tests;&lt;/li&gt;
&lt;li&gt;CI fixes;&lt;/li&gt;
&lt;li&gt;research followed by an implementation plan;&lt;/li&gt;
&lt;li&gt;continuing to revise a PR based on review feedback.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It is not necessarily a good fit for highly ambiguous tasks that require strong business judgment or architectural decisions. For those problems, an interactive Codex session is still more natural because humans need to stay involved throughout the process.&lt;/p&gt;
&lt;h2 id=&#34;risks-and-boundaries&#34;&gt;Risks and boundaries
&lt;/h2&gt;&lt;p&gt;Symphony is appealing, but in real adoption, teams cannot look only at the &amp;ldquo;automation&amp;rdquo; side.&lt;/p&gt;
&lt;p&gt;Several boundaries need to be made clear in advance:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;issues must be written clearly, otherwise Agents will amplify vague requirements into incorrect implementations;&lt;/li&gt;
&lt;li&gt;Agent permissions should be constrained, especially access to repositories, secrets, production environments, and third-party services;&lt;/li&gt;
&lt;li&gt;every workspace should be isolated to avoid contamination between tasks;&lt;/li&gt;
&lt;li&gt;CI, tests, lint, and review remain necessary quality gates;&lt;/li&gt;
&lt;li&gt;task status, PR links, logs, and failure reasons need to be traceable;&lt;/li&gt;
&lt;li&gt;human review cannot be skipped, especially for changes involving security, billing, data migration, and permission logic.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The official repository also positions Symphony as an engineering preview and reference implementation for trusted environments, not a finished platform that can blindly replace a development process.&lt;/p&gt;
&lt;h2 id=&#34;my-understanding-of-symphony&#34;&gt;My understanding of Symphony
&lt;/h2&gt;&lt;p&gt;The most valuable part of Symphony is not that it uses Linear, nor that the reference implementation chose Elixir. Its value is that it redefines the entry point for programming Agents.&lt;/p&gt;
&lt;p&gt;In the past, we were used to starting AI coding from a chat window. That is flexible, but once the scale grows, human attention becomes the bottleneck. Symphony puts the entry point back in the issue tracker and lets Agents work continuously around real tasks. In that sense, AI coding starts moving from a &amp;ldquo;personal productivity tool&amp;rdquo; toward &amp;ldquo;team workflow infrastructure.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;If you are already using Codex, Claude Code, Cursor Agent, or similar tools, the most important thing to notice about Symphony is not any specific implementation, but the pattern behind it:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Do not only manage Agent sessions. Manage the work that needs to be done.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This may become a key dividing line for the next stage of AI coding tools.&lt;/p&gt;
&lt;h2 id=&#34;references&#34;&gt;References
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://openai.com/zh-Hans-CN/index/open-source-codex-orchestration-symphony/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Open-source specification for Codex orchestration: Symphony&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/openai/symphony&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;openai/symphony&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
        </item>
        <item>
        <title>How browser-harness domain skills keep AI agents from repeating browser automation mistakes</title>
        <link>https://knightli.com/en/2026/05/24/browser-harness-domain-skills-summary/</link>
        <pubDate>Sun, 24 May 2026 23:43:35 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/05/24/browser-harness-domain-skills-summary/</guid>
        <description>&lt;p&gt;The most interesting part of &lt;code&gt;browser-use/browser-harness&lt;/code&gt; is not only that it lets AI agents control real Chrome. It also turns web-operation experience into reusable &lt;code&gt;domain skills&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;That matters because browser automation is rarely difficult only because of clicking buttons. Each website has its own details:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Which pages require login.&lt;/li&gt;
&lt;li&gt;Which data can be fetched directly through an API.&lt;/li&gt;
&lt;li&gt;Which buttons do not respond to normal DOM clicks.&lt;/li&gt;
&lt;li&gt;Which iframes, shadow DOM components, or popups block the flow.&lt;/li&gt;
&lt;li&gt;Which selectors are stable and which are temporary classes.&lt;/li&gt;
&lt;li&gt;Which actions involve accounts, payments, or irreversible changes and require human confirmation.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If this experience only stays in one task log, the agent will hit the same problems again next time. &lt;code&gt;domain skills&lt;/code&gt; are meant to preserve that experience so the agent does not start from zero every time it opens a site.&lt;/p&gt;
&lt;h2 id=&#34;what-domain-skills-are&#34;&gt;What domain skills are
&lt;/h2&gt;&lt;p&gt;You can think of &lt;code&gt;domain skills&lt;/code&gt; as site-operation manuals for agents.&lt;/p&gt;
&lt;p&gt;They are not ordinary user documentation, and they are not one-off scripts. They are closer to field-tested site knowledge:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Whether the site is suitable for browser automation.&lt;/li&gt;
&lt;li&gt;Which API should be used first if an API exists.&lt;/li&gt;
&lt;li&gt;Which URL should be used when the browser is necessary.&lt;/li&gt;
&lt;li&gt;Which DOM structures, aria-labels, and button behaviors have been verified.&lt;/li&gt;
&lt;li&gt;Which common approaches fail.&lt;/li&gt;
&lt;li&gt;Which scenarios should stop and ask for human intervention.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This content can be reviewed by humans and read by agents during tasks. It turns on-the-spot exploration into maintainable experience.&lt;/p&gt;
&lt;h2 id=&#34;they-are-not-about-blind-clicking&#34;&gt;They are not about blind clicking
&lt;/h2&gt;&lt;p&gt;A good browser agent should not turn every problem into opening a webpage, looking at screenshots, and clicking buttons.&lt;/p&gt;
&lt;p&gt;One important kind of experience in &lt;code&gt;domain skills&lt;/code&gt; tells the agent when not to use the browser.&lt;/p&gt;
&lt;p&gt;For sites such as ArXiv, paper search, metadata, and abstracts can be fetched directly through the Atom API or HTML meta tags. HTTP requests are usually faster, more stable, and easier to parse than opening a browser.&lt;/p&gt;
&lt;p&gt;GitHub follows a similar pattern. Repository, user, and release data should use the REST API first. File contents should use &lt;code&gt;raw.githubusercontent.com&lt;/code&gt; first. Only pages such as GitHub Trending, which do not have an equivalent API, need browser interaction.&lt;/p&gt;
&lt;p&gt;This shows that browser-harness is not based on “the browser solves everything.” It puts the browser in the right place: when APIs, HTTP, and static pages cannot solve the problem, let the agent operate a real page.&lt;/p&gt;
&lt;h2 id=&#34;they-store-site-level-knowledge&#34;&gt;They store site-level knowledge
&lt;/h2&gt;&lt;p&gt;Traditional automation scripts are usually written around one task, for example:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;Open page -&amp;gt; enter keyword -&amp;gt; click button -&amp;gt; download file
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;That script may complete the task, but the experience is scattered inside code. When the site changes, the script may fail. When the task changes, much of the experience may not be reusable.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;domain skills&lt;/code&gt; are closer to a site-level knowledge base. They care about:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Which container selector is stable in Amazon search results.&lt;/li&gt;
&lt;li&gt;Which GitHub data should go through the REST API.&lt;/li&gt;
&lt;li&gt;How LinkedIn invitation buttons differ in &lt;code&gt;aria-label&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Which Shopify Admin pages are embedded apps.&lt;/li&gt;
&lt;li&gt;Why Shopify Polaris inputs cannot always be filled with normal JS &lt;code&gt;value&lt;/code&gt; assignment.&lt;/li&gt;
&lt;li&gt;How Browser Use Cloud browser instances are created, listed, and cleaned up.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These are not steps for one task. They are decision-making knowledge that many future tasks can reuse.&lt;/p&gt;
&lt;h2 id=&#34;example-amazon-product-search&#34;&gt;Example: Amazon product search
&lt;/h2&gt;&lt;p&gt;For Amazon product search, the important part is not only how to search, but which path is more stable.&lt;/p&gt;
&lt;p&gt;A more reliable approach is to use a direct search URL instead of opening the homepage and simulating typing every time. Search results can be extracted from a container such as &lt;code&gt;[data-component-type=&amp;quot;s-search-result&amp;quot;]&lt;/code&gt;. Field extraction also has details: title, price, rating, review count, and sponsored status each have more stable DOM sources.&lt;/p&gt;
&lt;p&gt;This kind of experience is valuable for an agent. Without it, the agent may guess buttons from screenshots and repeatedly try selectors. With it, the agent can go directly to a more stable extraction path.&lt;/p&gt;
&lt;p&gt;More importantly, a skill can record traps. For example, some selectors that look usable may misread sponsored results or cross-sell areas. You only learn that from field testing.&lt;/p&gt;
&lt;h2 id=&#34;example-linkedin-invitation-management&#34;&gt;Example: LinkedIn invitation management
&lt;/h2&gt;&lt;p&gt;LinkedIn is closer to a real account workflow, and the risk is higher.&lt;/p&gt;
&lt;p&gt;On the invitation manager page, the Accept and Ignore buttons use different &lt;code&gt;aria-label&lt;/code&gt; formats. You cannot simply derive one from the other. Some invitation cards even render Accept as an &lt;code&gt;&amp;lt;a&amp;gt;&lt;/code&gt; element rather than a &lt;code&gt;&amp;lt;button&amp;gt;&lt;/code&gt;, and ordinary CDP clicks may not trigger the accept action.&lt;/p&gt;
&lt;p&gt;This shows that real web automation does not end when an element is located. Button labels, event binding, soft navigation, and component implementation all affect whether an action really works.&lt;/p&gt;
&lt;p&gt;For an agent, this experience also has a safety meaning. Operations involving social accounts, invitations, messages, and posting should not be fully delegated. A skill can record the path and traps, but accepting invitations in bulk, sending content externally, or changing account details should keep human confirmation.&lt;/p&gt;
&lt;h2 id=&#34;example-shopify-admin&#34;&gt;Example: Shopify Admin
&lt;/h2&gt;&lt;p&gt;Shopify Admin shows another issue: backend systems are often not one page, but a combination of embedded apps and complex components.&lt;/p&gt;
&lt;p&gt;Many Shopify apps run inside iframes. Polaris React inputs, Web Components, and embedded apps all behave differently. Some inputs cannot be filled with &lt;code&gt;element.value = ...&lt;/code&gt;; they need CDP keystrokes that are closer to real keyboard input.&lt;/p&gt;
&lt;p&gt;The value of this kind of skill is that it lets the agent first identify what kind of UI it is looking at, then choose the right operation method.&lt;/p&gt;
&lt;p&gt;Shopify experience also emphasizes “do not use the browser if you do not have to”:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;For read-only product and inventory data, use the Storefront API first.&lt;/li&gt;
&lt;li&gt;If an Admin API token exists, use the Admin API first.&lt;/li&gt;
&lt;li&gt;For theme code editing, use Shopify CLI first.&lt;/li&gt;
&lt;li&gt;Use the browser only when there is no API, the change is rare, or you are exploring the admin.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That is a mature tool-selection logic for agents.&lt;/p&gt;
&lt;h2 id=&#34;example-browser-use-cloud&#34;&gt;Example: Browser Use Cloud
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;domain skills&lt;/code&gt; do not only serve webpage clicking. They can also record API experience around browser runtimes.&lt;/p&gt;
&lt;p&gt;Browser Use Cloud experience can record how to create cloud browsers through REST APIs, list running browsers, clean up zombie browsers, and obtain &lt;code&gt;liveUrl&lt;/code&gt; and &lt;code&gt;cdpUrl&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;This means a skill is not limited to “how to click a button.” Any recurring task with a stable method can become a skill:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;API call patterns.&lt;/li&gt;
&lt;li&gt;Authentication header format.&lt;/li&gt;
&lt;li&gt;Request and response structure.&lt;/li&gt;
&lt;li&gt;Verified status codes.&lt;/li&gt;
&lt;li&gt;Common failure modes.&lt;/li&gt;
&lt;li&gt;Resource cleanup and recycling methods.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For agents, all of these are reusable capabilities.&lt;/p&gt;
&lt;h2 id=&#34;why-this-is-more-reliable-than-ad-hoc-reasoning&#34;&gt;Why this is more reliable than ad-hoc reasoning
&lt;/h2&gt;&lt;p&gt;Many people expect a large model to understand the webpage by itself every time. In real tasks, relying only on ad-hoc reasoning is unstable.&lt;/p&gt;
&lt;p&gt;The reasons are simple:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Web UI changes often.&lt;/li&gt;
&lt;li&gt;The same button may have multiple implementations.&lt;/li&gt;
&lt;li&gt;Visible does not mean clickable.&lt;/li&gt;
&lt;li&gt;Clickable does not mean the action really worked.&lt;/li&gt;
&lt;li&gt;Some tasks should use APIs instead of browsers.&lt;/li&gt;
&lt;li&gt;Some operations require human confirmation and should not be decided by the model alone.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Writing these experiences into files brings several benefits:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Humans can review them.&lt;/li&gt;
&lt;li&gt;Wrong experience can be corrected.&lt;/li&gt;
&lt;li&gt;Site knowledge can accumulate over time.&lt;/li&gt;
&lt;li&gt;New agents can inherit old experience.&lt;/li&gt;
&lt;li&gt;Temporary task discoveries can become long-term knowledge.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is more stable than putting everything into a prompt or chat context.&lt;/p&gt;
&lt;h2 id=&#34;how-teams-can-use-it&#34;&gt;How teams can use it
&lt;/h2&gt;&lt;p&gt;In a team, &lt;code&gt;domain skills&lt;/code&gt; can become a lightweight automation knowledge base.&lt;/p&gt;
&lt;p&gt;Useful content to record includes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Post-login paths in internal systems.&lt;/li&gt;
&lt;li&gt;Report export flows.&lt;/li&gt;
&lt;li&gt;Common popup handling.&lt;/li&gt;
&lt;li&gt;Which buttons require human confirmation.&lt;/li&gt;
&lt;li&gt;Which pages have API alternatives.&lt;/li&gt;
&lt;li&gt;Which selectors were tested and found reliable.&lt;/li&gt;
&lt;li&gt;Which tasks agents are not allowed to run automatically.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This knowledge does not need to be complete at the beginning. A practical path is to start with low-risk, frequent, reversible workflows: read-only tasks, downloads, organization, and checks. Once the flow is stable, turn the experience into a skill.&lt;/p&gt;
&lt;p&gt;For team managers, skill files also make automation boundaries visible. You can inspect what the agent knows, what it can do, and where it should stop.&lt;/p&gt;
&lt;h2 id=&#34;boundaries-to-keep&#34;&gt;Boundaries to keep
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;domain skills&lt;/code&gt; can improve an agent’s success rate, but they should not fully automate high-risk operations.&lt;/p&gt;
&lt;p&gt;Several boundaries matter:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Do not record passwords, Cookie, token, customer data, or sensitive internal URLs.&lt;/li&gt;
&lt;li&gt;Keep human confirmation for payments, deletion, bulk submission, account changes, and external publishing.&lt;/li&gt;
&lt;li&gt;Record verification date and scope.&lt;/li&gt;
&lt;li&gt;Allow skills to expire after site changes and require revalidation.&lt;/li&gt;
&lt;li&gt;Do not make bypassing risk controls or platform limits a goal.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In other words, domain skills make agents steadier. They do not give agents unlimited permission.&lt;/p&gt;
&lt;h2 id=&#34;conclusion&#34;&gt;Conclusion
&lt;/h2&gt;&lt;p&gt;The &lt;code&gt;domain skills&lt;/code&gt; mechanism in browser-harness shows one thing: AI browser automation cannot rely only on the model improvising at runtime.&lt;/p&gt;
&lt;p&gt;A usable browser agent needs at least three layers:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Low-level control: screenshots, clicks, input, downloads, CDP, HTTP.&lt;/li&gt;
&lt;li&gt;Site-level knowledge: API priority, stable selectors, component traps, login boundaries.&lt;/li&gt;
&lt;li&gt;Human safety rules: do not give credentials to the model, confirm high-risk actions, and do not write sensitive information into skills.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;code&gt;domain skills&lt;/code&gt; fill the second layer. They let an agent enter a web task with verified experience instead of rediscovering everything every time.&lt;/p&gt;
&lt;p&gt;References:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;browser-harness domain skills: &lt;a class=&#34;link&#34; href=&#34;https://github.com/browser-use/browser-harness/tree/main/agent-workspace/domain-skills&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://github.com/browser-use/browser-harness/tree/main/agent-workspace/domain-skills&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Amazon product-search skill: &lt;a class=&#34;link&#34; href=&#34;https://github.com/browser-use/browser-harness/blob/main/agent-workspace/domain-skills/amazon/product-search.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://github.com/browser-use/browser-harness/blob/main/agent-workspace/domain-skills/amazon/product-search.md&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;ArXiv scraping skill: &lt;a class=&#34;link&#34; href=&#34;https://github.com/browser-use/browser-harness/blob/main/agent-workspace/domain-skills/arxiv/scraping.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://github.com/browser-use/browser-harness/blob/main/agent-workspace/domain-skills/arxiv/scraping.md&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;GitHub scraping skill: &lt;a class=&#34;link&#34; href=&#34;https://github.com/browser-use/browser-harness/blob/main/agent-workspace/domain-skills/github/scraping.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://github.com/browser-use/browser-harness/blob/main/agent-workspace/domain-skills/github/scraping.md&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;LinkedIn invitation-manager skill: &lt;a class=&#34;link&#34; href=&#34;https://github.com/browser-use/browser-harness/blob/main/agent-workspace/domain-skills/linkedin/invitation-manager.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://github.com/browser-use/browser-harness/blob/main/agent-workspace/domain-skills/linkedin/invitation-manager.md&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Shopify admin skill: &lt;a class=&#34;link&#34; href=&#34;https://github.com/browser-use/browser-harness/blob/main/agent-workspace/domain-skills/shopify-admin/README.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://github.com/browser-use/browser-harness/blob/main/agent-workspace/domain-skills/shopify-admin/README.md&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Browser Use Cloud skill: &lt;a class=&#34;link&#34; href=&#34;https://github.com/browser-use/browser-harness/blob/main/agent-workspace/domain-skills/browser-use-cloud/cloud.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://github.com/browser-use/browser-harness/blob/main/agent-workspace/domain-skills/browser-use-cloud/cloud.md&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
        </item>
        <item>
        <title>browser-harness, Playwright, and Puppeteer: which browser automation tool should you choose?</title>
        <link>https://knightli.com/en/2026/05/24/browser-harness-playwright-puppeteer-comparison/</link>
        <pubDate>Sun, 24 May 2026 17:51:28 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/05/24/browser-harness-playwright-puppeteer-comparison/</guid>
        <description>&lt;p&gt;In browser automation and automated testing, &lt;code&gt;Playwright&lt;/code&gt; and &lt;code&gt;Puppeteer&lt;/code&gt; are two of the most commonly compared tools. Both can control browsers, click pages, extract content, generate screenshots or PDFs, and both are closely related to Chrome DevTools Protocol.&lt;/p&gt;
&lt;p&gt;Once &lt;code&gt;browser-use/browser-harness&lt;/code&gt; is added to the picture, the question is no longer simply “which testing framework is stronger.” It becomes a comparison between two kinds of tools:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;Playwright&lt;/code&gt; / &lt;code&gt;Puppeteer&lt;/code&gt;: tools for engineers to write deterministic scripts.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;browser-harness&lt;/code&gt;: a tool for AI agents to operate real browsers.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The first group fits testing, scraping, and engineered automation. The second is closer to a browser control layer for agents such as Claude Code, Codex CLI, and Gemini.&lt;/p&gt;
&lt;h2 id=&#34;the-relationship-between-playwright-and-puppeteer&#34;&gt;The relationship between Playwright and Puppeteer
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;Puppeteer&lt;/code&gt; was originally launched by the Google Chrome team and naturally focuses on Chromium and Chrome automation. Its API is concise, the ecosystem is mature, and it is especially convenient for screenshots, PDF generation, page scraping, and lightweight automation around Chrome.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;Playwright&lt;/code&gt; is maintained by Microsoft, and its team has deep historical links to early Puppeteer work. It absorbed many lessons from Puppeteer and added stronger cross-browser support, auto-waiting, context isolation, test reports, and debugging tools.&lt;/p&gt;
&lt;p&gt;In short:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;If you only need lightweight Chrome-based tasks, &lt;code&gt;Puppeteer&lt;/code&gt; is still very pleasant to use.&lt;/li&gt;
&lt;li&gt;If you are doing cross-browser E2E tests, complex SPA automation, or team-level test engineering, &lt;code&gt;Playwright&lt;/code&gt; is usually the better fit.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;core-differences&#34;&gt;Core differences
&lt;/h2&gt;&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;Dimension&lt;/th&gt;
          &lt;th&gt;Puppeteer&lt;/th&gt;
          &lt;th&gt;Playwright&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;Maintainer&lt;/td&gt;
          &lt;td&gt;Google&lt;/td&gt;
          &lt;td&gt;Microsoft&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Browser support&lt;/td&gt;
          &lt;td&gt;Mainly Chrome / Chromium&lt;/td&gt;
          &lt;td&gt;Chromium, Firefox, WebKit&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Language support&lt;/td&gt;
          &lt;td&gt;Mainly JavaScript / TypeScript&lt;/td&gt;
          &lt;td&gt;JavaScript / TypeScript, Python, Java, .NET&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Auto-waiting&lt;/td&gt;
          &lt;td&gt;More explicit waiting&lt;/td&gt;
          &lt;td&gt;Strong Locator and auto-waiting&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Context isolation&lt;/td&gt;
          &lt;td&gt;Supported, but less central&lt;/td&gt;
          &lt;td&gt;Excellent BrowserContext workflow&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Tooling&lt;/td&gt;
          &lt;td&gt;Simple, mature, foundational&lt;/td&gt;
          &lt;td&gt;Codegen, Trace Viewer, reports&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Typical use&lt;/td&gt;
          &lt;td&gt;Chrome automation, screenshots, PDF, lightweight scraping&lt;/td&gt;
          &lt;td&gt;Cross-browser E2E tests, complex frontend automation&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id=&#34;browser-support&#34;&gt;Browser support
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;Puppeteer&lt;/code&gt; is strongest with Chrome. It integrates tightly with Chromium. If your goal is to control Chrome, generate PDFs, take screenshots, or run simple scraping tasks, Puppeteer has a low mental overhead.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;Playwright&lt;/code&gt; is stronger for cross-browser work. It natively supports Chromium, Firefox, and WebKit. WebKit is especially important because many Safari-related issues cannot be detected through Chrome alone. For applications that need coverage across desktop, mobile, and multiple browser engines, Playwright is the better main tool.&lt;/p&gt;
&lt;p&gt;This is the first decision boundary: if you only care about Chrome, Puppeteer is fine. If you are serious about cross-browser testing, choose Playwright first.&lt;/p&gt;
&lt;h2 id=&#34;auto-waiting-and-stability&#34;&gt;Auto-waiting and stability
&lt;/h2&gt;&lt;p&gt;The most annoying part of browser automation is often not “how to click,” but whether the page is ready. An element may not be attached to the DOM, may be covered, may still be animating, or may still be disabled.&lt;/p&gt;
&lt;p&gt;In Puppeteer, you often write:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-javascript&#34; data-lang=&#34;javascript&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;kr&#34;&gt;await&lt;/span&gt; &lt;span class=&#34;nx&#34;&gt;page&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;nx&#34;&gt;waitForSelector&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;#submit-btn&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;);&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;kr&#34;&gt;await&lt;/span&gt; &lt;span class=&#34;nx&#34;&gt;page&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;nx&#34;&gt;click&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;#submit-btn&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;);&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;This works, but engineers must think through the waiting logic themselves. The more complex the page, the more likely the script will accumulate &lt;code&gt;waitForSelector&lt;/code&gt;, &lt;code&gt;waitForTimeout&lt;/code&gt;, and retry logic.&lt;/p&gt;
&lt;p&gt;Playwright’s Locator and auto-waiting mechanism is more complete:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-javascript&#34; data-lang=&#34;javascript&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;kr&#34;&gt;await&lt;/span&gt; &lt;span class=&#34;nx&#34;&gt;page&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;nx&#34;&gt;locator&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;#submit-btn&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;).&lt;/span&gt;&lt;span class=&#34;nx&#34;&gt;click&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;();&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Before clicking, Playwright checks whether the element is visible, actionable, stable, and not covered, then retries within a reasonable time. This matters a lot for modern React, Vue, and Next.js applications with heavy asynchronous rendering, and it reduces flaky tests.&lt;/p&gt;
&lt;h2 id=&#34;multi-account-workflows-and-context-isolation&#34;&gt;Multi-account workflows and context isolation
&lt;/h2&gt;&lt;p&gt;If you need to simulate multiple users, or let many tasks share one browser process while isolating Cookie, LocalStorage, and Session, &lt;code&gt;BrowserContext&lt;/code&gt; matters.&lt;/p&gt;
&lt;p&gt;Puppeteer also supports context isolation, but Playwright makes it a core capability. You can quickly create multiple independent contexts inside one browser instance. Each context behaves like a clean browser environment without repeatedly starting full browser processes.&lt;/p&gt;
&lt;p&gt;This is useful for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Multi-account concurrent tests.&lt;/li&gt;
&lt;li&gt;Multi-role workflow tests.&lt;/li&gt;
&lt;li&gt;Ecommerce, messaging, and collaborative document scenarios.&lt;/li&gt;
&lt;li&gt;Scraping tasks that need isolated Cookie and login state.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;tooling-differences&#34;&gt;Tooling differences
&lt;/h2&gt;&lt;p&gt;Playwright is the more engineering-oriented option. It includes many tools used in test development:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;codegen&lt;/code&gt;: operate on a webpage and generate scripts automatically.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Trace Viewer&lt;/code&gt;: replay screenshots, DOM snapshots, network requests, and console logs after failures.&lt;/li&gt;
&lt;li&gt;Test Runner: assertions, parallelism, retries, reports, and project matrices.&lt;/li&gt;
&lt;li&gt;Locator: element selection by text, role, label, test id, and CSS.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Puppeteer is more like a lightweight browser control library. It is not bloated, its API is direct, and it is easy to embed in scripts, server-side jobs, and custom automation flows.&lt;/p&gt;
&lt;p&gt;If you are building an enterprise-grade test system, Playwright’s tooling saves a lot of work. If you only need a Node.js script to convert webpages to PDFs or take scheduled screenshots, Puppeteer may be cleaner.&lt;/p&gt;
&lt;h2 id=&#34;where-browser-harness-fits&#34;&gt;Where browser-harness fits
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;browser-harness&lt;/code&gt; is not the same kind of tool as Playwright or Puppeteer.&lt;/p&gt;
&lt;p&gt;Playwright and Puppeteer mostly assume that humans write scripts. Engineers choose selectors, waiting conditions, assertions, and exception handling. They pursue determinism: the same script should produce the same result under the same page state.&lt;/p&gt;
&lt;p&gt;browser-harness mostly assumes that an AI agent operates the browser. Its goal is not to provide a huge high-level API, but to connect to real Chrome through CDP and expose screenshots, coordinate clicks, DOM, network requests, and helpers to the agent. The agent can observe the page, decide the next step, add helpers when capabilities are missing, and turn site experience into skills.&lt;/p&gt;
&lt;p&gt;That makes it better for open-ended tasks:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Log in to a backend and download invoices.&lt;/li&gt;
&lt;li&gt;Fill a group of forms in an internal system.&lt;/li&gt;
&lt;li&gt;Handle OA or SaaS pages that change often.&lt;/li&gt;
&lt;li&gt;Explore a page according to a user goal instead of running a fixed script.&lt;/li&gt;
&lt;li&gt;Give tools such as Claude Code and Codex CLI browser operation capability.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;three-way-comparison&#34;&gt;Three-way comparison
&lt;/h2&gt;&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;Dimension&lt;/th&gt;
          &lt;th&gt;Puppeteer&lt;/th&gt;
          &lt;th&gt;Playwright&lt;/th&gt;
          &lt;th&gt;browser-harness&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;Target user&lt;/td&gt;
          &lt;td&gt;Engineers&lt;/td&gt;
          &lt;td&gt;Engineers and test teams&lt;/td&gt;
          &lt;td&gt;AI Agent&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Main goal&lt;/td&gt;
          &lt;td&gt;Control Chrome&lt;/td&gt;
          &lt;td&gt;Stable cross-browser automation&lt;/td&gt;
          &lt;td&gt;Let agents operate real browsers&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Script style&lt;/td&gt;
          &lt;td&gt;Hand-written JS/TS automation&lt;/td&gt;
          &lt;td&gt;Scripts plus test framework&lt;/td&gt;
          &lt;td&gt;User gives a goal, agent executes steps&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Element targeting&lt;/td&gt;
          &lt;td&gt;CSS, XPath, DOM API&lt;/td&gt;
          &lt;td&gt;Locator, text, role, CSS&lt;/td&gt;
          &lt;td&gt;Screenshots, coordinates, DOM, CDP&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Waiting&lt;/td&gt;
          &lt;td&gt;More manual control&lt;/td&gt;
          &lt;td&gt;Strong auto-waiting&lt;/td&gt;
          &lt;td&gt;Agent observes and adjusts&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Browser environment&lt;/td&gt;
          &lt;td&gt;Usually automated browser&lt;/td&gt;
          &lt;td&gt;Usually test browser&lt;/td&gt;
          &lt;td&gt;Often real Chrome&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Best fit&lt;/td&gt;
          &lt;td&gt;Chrome scripts, screenshots, PDF, lightweight scraping&lt;/td&gt;
          &lt;td&gt;E2E tests, cross-browser validation, complex SPA&lt;/td&gt;
          &lt;td&gt;AI assistants, open web tasks, real-account workflows&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id=&#34;code-feel&#34;&gt;Code feel
&lt;/h2&gt;&lt;p&gt;Puppeteer feels closer to directly controlling Chrome:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt; 1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 5
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 6
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 7
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 8
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 9
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;10
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;11
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;12
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-javascript&#34; data-lang=&#34;javascript&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;kr&#34;&gt;const&lt;/span&gt; &lt;span class=&#34;nx&#34;&gt;puppeteer&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;nx&#34;&gt;require&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;puppeteer&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;);&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;kr&#34;&gt;async&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;()&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;kr&#34;&gt;const&lt;/span&gt; &lt;span class=&#34;nx&#34;&gt;browser&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;kr&#34;&gt;await&lt;/span&gt; &lt;span class=&#34;nx&#34;&gt;puppeteer&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;nx&#34;&gt;launch&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;();&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;kr&#34;&gt;const&lt;/span&gt; &lt;span class=&#34;nx&#34;&gt;page&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;kr&#34;&gt;await&lt;/span&gt; &lt;span class=&#34;nx&#34;&gt;browser&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;nx&#34;&gt;newPage&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;();&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;kr&#34;&gt;await&lt;/span&gt; &lt;span class=&#34;nx&#34;&gt;page&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;kr&#34;&gt;goto&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;https://example.com&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;);&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;kr&#34;&gt;await&lt;/span&gt; &lt;span class=&#34;nx&#34;&gt;page&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;nx&#34;&gt;waitForSelector&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;#submit-btn&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;);&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;kr&#34;&gt;await&lt;/span&gt; &lt;span class=&#34;nx&#34;&gt;page&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;nx&#34;&gt;click&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;#submit-btn&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;);&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;kr&#34;&gt;await&lt;/span&gt; &lt;span class=&#34;nx&#34;&gt;browser&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;nx&#34;&gt;close&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;();&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;p&#34;&gt;})();&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Playwright emphasizes Locator and auto-waiting:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt; 1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 5
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 6
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 7
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 8
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 9
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;10
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;11
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-javascript&#34; data-lang=&#34;javascript&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;kr&#34;&gt;const&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt; &lt;span class=&#34;nx&#34;&gt;chromium&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;nx&#34;&gt;require&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;playwright&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;);&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;kr&#34;&gt;async&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;()&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;kr&#34;&gt;const&lt;/span&gt; &lt;span class=&#34;nx&#34;&gt;browser&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;kr&#34;&gt;await&lt;/span&gt; &lt;span class=&#34;nx&#34;&gt;chromium&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;nx&#34;&gt;launch&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;();&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;kr&#34;&gt;const&lt;/span&gt; &lt;span class=&#34;nx&#34;&gt;page&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;kr&#34;&gt;await&lt;/span&gt; &lt;span class=&#34;nx&#34;&gt;browser&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;nx&#34;&gt;newPage&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;();&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;kr&#34;&gt;await&lt;/span&gt; &lt;span class=&#34;nx&#34;&gt;page&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;kr&#34;&gt;goto&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;https://example.com&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;);&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;kr&#34;&gt;await&lt;/span&gt; &lt;span class=&#34;nx&#34;&gt;page&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;nx&#34;&gt;locator&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;#submit-btn&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;).&lt;/span&gt;&lt;span class=&#34;nx&#34;&gt;click&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;();&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;kr&#34;&gt;await&lt;/span&gt; &lt;span class=&#34;nx&#34;&gt;browser&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;nx&#34;&gt;close&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;();&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;p&#34;&gt;})();&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;browser-harness feels completely different. You usually do not write a full script. You give a goal inside an agent environment:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;Open the admin panel, download last month’s invoice, and organize it for reimbursement.
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;The agent then repeatedly uses browser-harness to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Take screenshots and understand the current page.&lt;/li&gt;
&lt;li&gt;Click a coordinate or locate an element.&lt;/li&gt;
&lt;li&gt;Enter text, upload files, and download files.&lt;/li&gt;
&lt;li&gt;Decide how to close popups.&lt;/li&gt;
&lt;li&gt;Add helper code when something is missing.&lt;/li&gt;
&lt;li&gt;Turn reusable flows into domain skills.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is not the style of traditional test scripts. It is the workflow of a browser agent.&lt;/p&gt;
&lt;h2 id=&#34;how-to-choose&#34;&gt;How to choose
&lt;/h2&gt;&lt;p&gt;Choose &lt;code&gt;Puppeteer&lt;/code&gt; when:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The project mainly runs in Node.js.&lt;/li&gt;
&lt;li&gt;You only need Chrome or Chromium.&lt;/li&gt;
&lt;li&gt;The task is screenshot, PDF generation, simple page scraping, or lightweight automation.&lt;/li&gt;
&lt;li&gt;You want a simple API, fewer dependencies, and more manual control.&lt;/li&gt;
&lt;li&gt;You rely deeply on Chrome DevTools Protocol.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Choose &lt;code&gt;Playwright&lt;/code&gt; when:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;You are building standard UI automation or E2E tests.&lt;/li&gt;
&lt;li&gt;You need Chromium, Firefox, and WebKit coverage.&lt;/li&gt;
&lt;li&gt;Your team’s main language may be Python, Java, or C#.&lt;/li&gt;
&lt;li&gt;The page is a complex SPA with many asynchronous states and potential flaky tests.&lt;/li&gt;
&lt;li&gt;You need codegen, Trace Viewer, test reports, and parallel testing.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Choose &lt;code&gt;browser-harness&lt;/code&gt; when:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;You are building or using AI agents.&lt;/li&gt;
&lt;li&gt;You want the model to operate a real browser like a human.&lt;/li&gt;
&lt;li&gt;The task steps are not fixed and require page-by-page judgment.&lt;/li&gt;
&lt;li&gt;The target site changes often, or has many popups, iframes, and shadow DOM.&lt;/li&gt;
&lt;li&gt;You want real web workflows handled by Claude Code, Codex CLI, or similar tools.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;conclusion&#34;&gt;Conclusion
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;Playwright&lt;/code&gt; and &lt;code&gt;Puppeteer&lt;/code&gt; are browser automation tools whose core goal is to let humans write reliable scripts. Puppeteer is lighter and closer to Chrome. Playwright is more complete and better for cross-browser testing and complex frontend applications.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;browser-harness&lt;/code&gt; is a different direction. It is not designed to replace Playwright or Puppeteer for tests. It is designed to let AI agents control real browsers. It gives up some traditional script determinism in exchange for stronger adaptability in open-ended tasks.&lt;/p&gt;
&lt;p&gt;So the answer is not to pick only one. Choose by task layer:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Test engineering: prefer Playwright.&lt;/li&gt;
&lt;li&gt;Lightweight Chrome scripts: Puppeteer fits well.&lt;/li&gt;
&lt;li&gt;AI agents doing work on the web: look at browser-harness.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;References:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;browser-use/browser-harness: &lt;a class=&#34;link&#34; href=&#34;https://github.com/browser-use/browser-harness&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://github.com/browser-use/browser-harness&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Playwright documentation: &lt;a class=&#34;link&#34; href=&#34;https://playwright.dev/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://playwright.dev/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Puppeteer documentation: &lt;a class=&#34;link&#34; href=&#34;https://pptr.dev/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://pptr.dev/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Chrome DevTools Protocol: &lt;a class=&#34;link&#34; href=&#34;https://chromedevtools.github.io/devtools-protocol/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://chromedevtools.github.io/devtools-protocol/&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
        </item>
        <item>
        <title>What is browser-harness? A browser automation tool that lets AI agents control real Chrome</title>
        <link>https://knightli.com/en/2026/05/24/browser-use-browser-harness-ai-agent-browser-automation/</link>
        <pubDate>Sun, 24 May 2026 17:19:54 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/05/24/browser-use-browser-harness-ai-agent-browser-automation/</guid>
        <description>&lt;p&gt;&lt;code&gt;browser-use/browser-harness&lt;/code&gt; is a browser control tool for AI agents. Its goal is not to build another heavy automation framework, but to connect large language models directly to real Chrome through CDP, so they can browse pages, click, take screenshots, download files, upload files, and fill forms.&lt;/p&gt;
&lt;p&gt;The README describes the project as a thin, editable CDP harness for letting LLMs connect to a real browser. When a task lacks a helper, the agent can add code during execution and turn reusable experience into domain skills.&lt;/p&gt;
&lt;p&gt;This is worth watching because the browser is still the entry point for many real workflows: admin panels, SaaS dashboards, ecommerce sites, recruiting platforms, CRMs, reimbursement systems, cloud consoles, and document platforms. Many of them do not expose stable APIs, or their API permissions are harder to obtain than webpage access. Giving an agent reliable browser control is a way to fill that last mile of automation.&lt;/p&gt;
&lt;h2 id=&#34;what-browser-harness-is&#34;&gt;What browser-harness is
&lt;/h2&gt;&lt;p&gt;Structurally, browser-harness is closer to a browser runtime for agents than a browser extension for manual users.&lt;/p&gt;
&lt;p&gt;Its core ideas are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Connect directly to Chrome or Chromium.&lt;/li&gt;
&lt;li&gt;Control pages through a CDP WebSocket.&lt;/li&gt;
&lt;li&gt;Let agents combine screenshots, coordinate clicks, DOM inspection, network requests, and raw CDP.&lt;/li&gt;
&lt;li&gt;Put task-specific helpers in &lt;code&gt;agent-workspace/agent_helpers.py&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Store site-specific experience in &lt;code&gt;agent-workspace/domain-skills/&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Keep the core thin instead of turning it into a large automation platform.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The README says the core architecture is roughly four core files and about 1,000 lines of code, covering &lt;code&gt;install.md&lt;/code&gt;, &lt;code&gt;SKILL.md&lt;/code&gt;, &lt;code&gt;src/browser_harness/&lt;/code&gt;, &lt;code&gt;agent-workspace/agent_helpers.py&lt;/code&gt;, and &lt;code&gt;agent-workspace/domain-skills/&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The point is not to ship built-in support for every website. The point is to give the agent an operation layer close enough to a real browser, so it can fill in missing capabilities for the task at hand.&lt;/p&gt;
&lt;h2 id=&#34;how-it-differs-from-traditional-browser-automation&#34;&gt;How it differs from traditional browser automation
&lt;/h2&gt;&lt;p&gt;Traditional browser automation usually revolves around testing frameworks such as Playwright, Selenium, or Puppeteer. They are good for deterministic scripts: open a page, locate an element, click it, and assert the result.&lt;/p&gt;
&lt;p&gt;browser-harness targets a different kind of work. A user gives a goal, and the agent explores the page, judges the state, handles popups, adds helpers, and reuses site knowledge. It emphasizes adaptation during interaction.&lt;/p&gt;
&lt;p&gt;The difference can be summarized like this:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Playwright is better when humans write scripts and agents run them.&lt;/li&gt;
&lt;li&gt;browser-harness is better when agents look at the page and act step by step.&lt;/li&gt;
&lt;li&gt;Traditional automation favors fixed flows.&lt;/li&gt;
&lt;li&gt;browser-harness favors open-ended tasks.&lt;/li&gt;
&lt;li&gt;Traditional scripts often depend on selectors.&lt;/li&gt;
&lt;li&gt;browser-harness encourages screenshots first, visible UI actions next, and DOM or CDP when needed.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This does not mean it replaces Playwright. For stable tests, Playwright is still more mature. browser-harness is valuable because it turns real webpages into an environment an agent can operate, especially when page structure is complex, steps are not fixed, and situational judgment matters.&lt;/p&gt;
&lt;h2 id=&#34;why-real-chrome-matters&#34;&gt;Why real Chrome matters
&lt;/h2&gt;&lt;p&gt;Many browser-agent tools use isolated headless browsers. That is simple to deploy and good for batch jobs, but it does not always reuse the user’s real working environment: login state, extensions, history, bookmarks, and daily browser setup.&lt;/p&gt;
&lt;p&gt;browser-harness supports local Chrome and the Browser Use cloud browser. For local browsers, it offers two approaches:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Use &lt;code&gt;chrome://inspect/#remote-debugging&lt;/code&gt; to allow the current Chrome instance to be connected.&lt;/li&gt;
&lt;li&gt;Start an isolated profile with &lt;code&gt;--remote-debugging-port=9222 --user-data-dir=...&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you want an agent to help with tasks inside real accounts, the docs lean toward the first approach because it reuses everyday Chrome login state, extensions, and bookmarks. For unattended automation, or when you do not want popups to interrupt work, an isolated profile or cloud browser is usually safer.&lt;/p&gt;
&lt;p&gt;The trade-off is clear: real Chrome is closer to the user’s workflow, but the security boundary is more sensitive. An isolated browser is easier to control, but login and environment setup must be handled again.&lt;/p&gt;
&lt;h2 id=&#34;editable-helpers-and-domain-skills&#34;&gt;Editable helpers and domain skills
&lt;/h2&gt;&lt;p&gt;The most interesting part of browser-harness is that it designs “what the agent learns” into the project structure.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;agent-workspace/agent_helpers.py&lt;/code&gt; stores helpers that are created during tasks. For example, if an agent needs to upload a file and the existing tools are not enough, it can add a stable upload helper. The next time it sees a similar page, it does not have to start from scratch.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;agent-workspace/domain-skills/&lt;/code&gt; stores site-level experience. The README mentions areas such as LinkedIn outreach, Amazon ordering, and reimbursement systems. The project recommends letting agents generate these skills from real tasks instead of hand-writing them, because they should reflect actual page behavior.&lt;/p&gt;
&lt;p&gt;This fits browser automation well. The hard part is often not “how to click a button,” but:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;How a website redirects after login.&lt;/li&gt;
&lt;li&gt;Which popups block the main flow.&lt;/li&gt;
&lt;li&gt;Which selectors are stable and which are temporary class names.&lt;/li&gt;
&lt;li&gt;How uploads, downloads, iframes, shadow DOM, and cross-origin components behave.&lt;/li&gt;
&lt;li&gt;What hidden waits and asynchronous states exist in a specific backend.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If this knowledge only stays in one run log, it is quickly lost. Turning it into domain skills gives the agent a chance to improve over time.&lt;/p&gt;
&lt;h2 id=&#34;suitable-scenarios&#34;&gt;Suitable scenarios
&lt;/h2&gt;&lt;p&gt;browser-harness is better suited for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Operating real web admin panels for users.&lt;/li&gt;
&lt;li&gt;Completing repeated flows in systems without APIs.&lt;/li&gt;
&lt;li&gt;Personal or enterprise web tasks that depend heavily on login state.&lt;/li&gt;
&lt;li&gt;Complex interactions where screenshots are needed to judge page state.&lt;/li&gt;
&lt;li&gt;Agents that need to add tools and site knowledge while running.&lt;/li&gt;
&lt;li&gt;Multiple sub-agents each using an isolated browser.&lt;/li&gt;
&lt;li&gt;Researching browser-agent runtime design.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Concrete examples include organizing web tables, submitting internal forms, downloading invoices, uploading files, handling reimbursement workflows, checking order status, configuring SaaS dashboards, and extracting information from logged-in pages.&lt;/p&gt;
&lt;p&gt;If the task is only to fetch static pages, a browser may not be needed. The project’s own &lt;code&gt;SKILL.md&lt;/code&gt; also notes that static pages can often be fetched through HTTP in bulk. Browsers should be reserved for tasks that truly need page state, login state, and interaction.&lt;/p&gt;
&lt;h2 id=&#34;risks-to-watch&#34;&gt;Risks to watch
&lt;/h2&gt;&lt;p&gt;Letting an AI agent control real Chrome is powerful, but risky.&lt;/p&gt;
&lt;p&gt;First, the permission boundary must be clear. Real Chrome may contain email, payment dashboards, cloud consoles, company systems, and personal accounts. Once an agent can operate the browser, it effectively has access to part of those webpage permissions.&lt;/p&gt;
&lt;p&gt;Second, do not hand credentials to the model. For login pages, payment verification, and second confirmations, the user should handle the sensitive step. The agent can wait for login to finish, but it should not read or enter passwords, verification codes, or payment details from screenshots.&lt;/p&gt;
&lt;p&gt;Third, automation is not the same as delegation. Many web tasks look simple but may involve risk controls, mistaken clicks, data deletion, bulk submissions, or irreversible operations. Start with read-only, low-risk, reversible workflows.&lt;/p&gt;
&lt;p&gt;Fourth, domain skills should not leak private data. Site knowledge can be shared, but account names, internal URLs, customer data, coordinate logs, and one-off task details should not be written into skills.&lt;/p&gt;
&lt;p&gt;Fifth, choose the browser connection mode carefully. Reusing daily Chrome is convenient when login state matters. For long-running automation, an isolated profile or cloud browser is more controllable.&lt;/p&gt;
&lt;h2 id=&#34;why-it-matters-for-ai-agent-tools&#34;&gt;Why it matters for AI agent tools
&lt;/h2&gt;&lt;p&gt;browser-harness represents a pragmatic direction for agent tooling: build less platform, and give the model a direct interface to the real environment.&lt;/p&gt;
&lt;p&gt;Many agents fail at two ends. On one end, the model can reason but cannot touch the real page. On the other, automation frameworks are powerful but require humans to hard-code the flow. browser-harness tries to connect the two: the browser holds real-world state, while the agent observes, decides, and adds tools.&lt;/p&gt;
&lt;p&gt;That is also the meaning of a self-improving harness. It does not mean the agent magically becomes smarter. It means reusable operation experience is placed into the project structure, so the next task can avoid some of the same detours.&lt;/p&gt;
&lt;p&gt;For developers, its value is mainly in three areas:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A browser control layer for personal agents.&lt;/li&gt;
&lt;li&gt;A reference for studying browser automation and agent workflows.&lt;/li&gt;
&lt;li&gt;An experimental framework for turning web workflows into reusable skills.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It is not the answer to every browser automation problem, but it points in a clear direction: when agents truly help people do work, the tool layer should not only call APIs. It should also understand and operate the web interfaces people use every day.&lt;/p&gt;
&lt;h2 id=&#34;conclusion&#34;&gt;Conclusion
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;browser-use/browser-harness&lt;/code&gt; is interesting not because it wraps many advanced features, but because it brings several key browser-agent questions into focus: real Chrome, CDP, screenshot-driven control, editable helpers, site skill accumulation, and user permission boundaries.&lt;/p&gt;
&lt;p&gt;If you are writing stable end-to-end tests, Playwright or Selenium is still a better fit. If you want agents such as Codex or Claude Code to handle real webpage tasks, browser-harness offers an entry point that matches how agents work.&lt;/p&gt;
&lt;p&gt;In practice, start with low-risk tasks: let it read pages, take screenshots, and extract information first. Then gradually try clicking and submitting. Once it can reliably understand page state, you can consider giving it longer workflows.&lt;/p&gt;
&lt;p&gt;References:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;GitHub project: &lt;a class=&#34;link&#34; href=&#34;https://github.com/browser-use/browser-harness&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://github.com/browser-use/browser-harness&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;README: &lt;a class=&#34;link&#34; href=&#34;https://github.com/browser-use/browser-harness/blob/main/README.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://github.com/browser-use/browser-harness/blob/main/README.md&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Installation guide: &lt;a class=&#34;link&#34; href=&#34;https://github.com/browser-use/browser-harness/blob/main/install.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://github.com/browser-use/browser-harness/blob/main/install.md&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Usage guide: &lt;a class=&#34;link&#34; href=&#34;https://github.com/browser-use/browser-harness/blob/main/SKILL.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://github.com/browser-use/browser-harness/blob/main/SKILL.md&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
        </item>
        <item>
        <title>GitHub AI Open Source Project Categories: From Coding Agent to RAG Knowledge Bases</title>
        <link>https://knightli.com/en/2026/05/21/github-ai-projects-site-statistics/</link>
        <pubDate>Thu, 21 May 2026 08:53:13 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/05/21/github-ai-projects-site-statistics/</guid>
        <description>&lt;p&gt;This page groups GitHub AI projects by application direction, covering AI coding and Coding Agents, agent skills and workflows, RAG and knowledge bases, multimodal creation, local models and inference, vertical applications and automation, and AI application development infrastructure. New projects can be added later using the same structure.&lt;/p&gt;
&lt;h2 id=&#34;category-summary&#34;&gt;Category Summary
&lt;/h2&gt;&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;Category&lt;/th&gt;
          &lt;th style=&#34;text-align: right&#34;&gt;Projects&lt;/th&gt;
          &lt;th&gt;Who Should Start Here&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;AI Coding and Coding Agents&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;22&lt;/td&gt;
          &lt;td&gt;Users who often work with Claude Code, Codex, Cursor, terminal agents, or repository automation&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Agent Skills and Workflows&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;7&lt;/td&gt;
          &lt;td&gt;Users who want to standardize AI coding, research, or content workflows&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;RAG, Knowledge Bases, and Memory&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;7&lt;/td&gt;
          &lt;td&gt;Users who need document retrieval, knowledge bases, long-term memory, web crawling, or structured extraction&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Vertical Applications and Automation&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;7&lt;/td&gt;
          &lt;td&gt;Users looking at finance, trading, Xianyu monitoring, desktop control, browser automation, and other applied scenarios&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Multimodal and Content Creation&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;5&lt;/td&gt;
          &lt;td&gt;Users working on images, video, transcription, prompt libraries, and content distribution&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;AI Application Development Infrastructure&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;5&lt;/td&gt;
          &lt;td&gt;Developers building AI apps, browser automation, or Prompt/MCP toolchains&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Local Models and Inference&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;1&lt;/td&gt;
          &lt;td&gt;Users interested in local DeepSeek, inference engines, and hardware adaptation&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The distribution shows several high-frequency directions in current AI open source projects: AI coding tools dominate, followed by agent workflows, RAG knowledge bases, and concrete application scenarios. Pure model inference projects are fewer here because much local deployment content is organized around models, GPUs, or deployment plans rather than a single GitHub project.&lt;/p&gt;
&lt;h2 id=&#34;ai-coding-and-coding-agents&#34;&gt;AI Coding and Coding Agents
&lt;/h2&gt;&lt;p&gt;This group focuses on code understanding, code modification, engineering workflows, and terminal agents. It is the largest group, with &lt;strong&gt;22&lt;/strong&gt; projects.&lt;/p&gt;
&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;Project&lt;/th&gt;
          &lt;th&gt;Article&lt;/th&gt;
          &lt;th&gt;GitHub&lt;/th&gt;
          &lt;th&gt;Core Use&lt;/th&gt;
          &lt;th&gt;Best For&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;Ralph&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/04/27/ralph-autonomous-agent-loop-claude-code-amp/&#34; &gt;Ralph: turning Claude Code and Amp into an autonomous development loop&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/snarktank/ralph&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;snarktank/ralph&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;Drive Claude Code / Amp through PRD, planning, execution, and review loops&lt;/td&gt;
          &lt;td&gt;Users who want a straighter agent coding process&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Claude-Mem&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/05/01/claude-mem-persistent-memory-for-claude-code/&#34; &gt;Claude-Mem: long-term cross-session memory for Claude Code&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/thedotmack/claude-mem&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;thedotmack/claude-mem&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;Add cross-session memory to Claude Code&lt;/td&gt;
          &lt;td&gt;Heavy Claude Code users&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Claude Code Hooks Mastery&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/05/01/claude-code-hooks-mastery-guide/&#34; &gt;Claude Code Hooks Mastery: getting started with 13 hooks lifecycle stages&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/disler/claude-code-hooks-mastery&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;disler/claude-code-hooks-mastery&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;Learn Claude Code hooks lifecycle and automation control&lt;/td&gt;
          &lt;td&gt;Users who want to customize Claude Code workflows&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Compound Engineering Plugin&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/05/01/compound-engineering-plugin-ai-coding-workflow/&#34; &gt;Compound Engineering Plugin: turning AI coding into planning, execution, and review loops&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/EveryInc/compound-engineering-plugin&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;EveryInc/compound-engineering-plugin&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;Split AI coding into planning, execution, and review cycles&lt;/td&gt;
          &lt;td&gt;Users who care about engineering discipline in AI coding&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;free-claude-code&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/05/01/free-claude-code-anthropic-compatible-proxy/&#34; &gt;free-claude-code: connecting Claude Code to OpenRouter, DeepSeek, and local models&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/Alishahryar1/free-claude-code&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Alishahryar1/free-claude-code&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;Use a proxy to connect Claude Code to different model backends&lt;/td&gt;
          &lt;td&gt;Users who want to reduce Claude Code cost&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Hermes Agent&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/04/12/hermes-agent-intro-guide-vs-openclaw/&#34; &gt;What is Hermes Agent: overview, strengths, quick start, and OpenClaw comparison&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/NousResearch/hermes-agent&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;NousResearch/hermes-agent&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;Local agent framework with tool calling and task execution&lt;/td&gt;
          &lt;td&gt;Users who want to run local agents&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;OpenHarness&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/04/12/openharness-basic-functions/&#34; &gt;What OpenHarness can do as an open source agent harness&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/HKUDS/OpenHarness&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;HKUDS/OpenHarness&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;Agent harness and multi-agent execution framework&lt;/td&gt;
          &lt;td&gt;Users researching agent orchestration&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;CodexBridge&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/05/13/codexbridge-openai-compatible-api/&#34; &gt;Using Codex with domestic LLMs: OpenAI-compatible APIs and CodexBridge&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/begonia599/CodexBridge&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;begonia599/CodexBridge&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;Connect Codex to OpenAI-compatible model APIs&lt;/td&gt;
          &lt;td&gt;Users who want Codex with domestic models&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;ccx&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/05/13/ccx-ai-api-proxy-gateway/&#34; &gt;Using CCX to manage OpenAI-compatible APIs for Codex and domestic models&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/BenedictKing/ccx&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;BenedictKing/ccx&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;Manage API proxies for Claude, Codex, Gemini, and more&lt;/td&gt;
          &lt;td&gt;Multi-model switching users&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;cc-haha&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/05/14/cc-haha-claude-code-desktop-workbench/&#34; &gt;cc-haha: a desktop workspace for Claude Code&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/NanmiCoder/cc-haha&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;NanmiCoder/cc-haha&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;Desktop workspace and Computer Use entry for Claude Code&lt;/td&gt;
          &lt;td&gt;Claude Code users who prefer a GUI&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;DeepSeek-TUI&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/05/16/deepseek-tui-terminal-coding-agent/&#34; &gt;DeepSeek-TUI: turning DeepSeek V4 into a terminal coding agent&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/Hmbown/DeepSeek-TUI&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Hmbown/DeepSeek-TUI&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;Run a DeepSeek coding agent in the terminal&lt;/td&gt;
          &lt;td&gt;DeepSeek and command-line users&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Open Design&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/05/18/open-design-open-source-claude-design-alternative/&#34; &gt;Open Design: turning Claude Code and Codex into AI design tools&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/nexu-io/open-design&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;nexu-io/open-design&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;Bring Claude Code / Codex into design generation&lt;/td&gt;
          &lt;td&gt;Users who want agents for design prototypes&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;agentmemory&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/05/19/agentmemory-persistent-memory-ai-coding-agents/&#34; &gt;agentmemory: persistent memory for Claude Code, Codex, and Cursor&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/rohitg00/agentmemory&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;rohitg00/agentmemory&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;Add persistent memory to coding agents&lt;/td&gt;
          &lt;td&gt;Developers maintaining long-running projects&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Graphify&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/05/21/safishamsi-graphify-ai-code-knowledge-graph/&#34; &gt;Graphify: turning a codebase into an AI-queryable knowledge graph&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/safishamsi/graphify&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;safishamsi/graphify&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;Convert a codebase into a knowledge graph to reduce repeated file reads&lt;/td&gt;
          &lt;td&gt;Large-codebase users&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;oh-my-pi&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/05/23/oh-my-pi-ai-coding-agent-terminal-ide-lsp-debugger/&#34; &gt;What is oh-my-pi? An AI coding assistant that connects terminal, IDE, and debugger&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/can1357/oh-my-pi&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;can1357/oh-my-pi&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;Connect terminal, IDE, LSP, and debugger as a local AI coding console&lt;/td&gt;
          &lt;td&gt;Developers who want to unify CLI and IDE workflows&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Claude Plugins Official&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/05/23/claude-plugins-official-claude-code-plugin-directory/&#34; &gt;Claude Code now has a plugin directory: what to install, how to install it, and what to watch&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/anthropics/claude-plugins-official&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;anthropics/claude-plugins-official&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;Official Claude Code plugin directory and installation entry point&lt;/td&gt;
          &lt;td&gt;Users who want to extend Claude Code&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;CodeGraph&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/05/23/codegraph-local-code-knowledge-graph-ai-coding-agent/&#34; &gt;What is CodeGraph? A local code map for Claude Code, Codex, and Cursor&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/colbymchenry/codegraph&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;colbymchenry/codegraph&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;Generate local indexes and relationship graphs to help Coding Agents understand projects&lt;/td&gt;
          &lt;td&gt;Developers maintaining medium-to-large codebases&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;CC Switch&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/05/06/cc-switch-ai-cli-manager/&#34; &gt;CC Switch: managing Claude Code, Codex, Gemini CLI, and OpenClaw in one desktop tool&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/farion1231/cc-switch&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;farion1231/cc-switch&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;Manage multiple AI CLI tools and account/config switching&lt;/td&gt;
          &lt;td&gt;Users of multiple CLI tools&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Warp&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/05/07/warpdotdev-warp-open-source-agentic-terminal/&#34; &gt;Warp open source: from terminal to Agentic Development Environment&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/warpdotdev/warp&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;warpdotdev/warp&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;Agentic terminal and development environment&lt;/td&gt;
          &lt;td&gt;Heavy terminal users&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;opencode&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/05/08/opencode-open-source-ai-coding-agent/&#34; &gt;opencode vs Claude Code vs Codex: open source AI coding tools guide&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/anomalyco/opencode&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;anomalyco/opencode&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;Open source AI coding agent&lt;/td&gt;
          &lt;td&gt;Users looking for Claude Code / Codex alternatives&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;9Router&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/05/08/9router-ai-coding-router-token-saver/&#34; &gt;9Router: connecting Claude Code, Codex, and Cursor to one AI router&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/decolua/9router&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;decolua/9router&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;AI coding model routing and token cost control&lt;/td&gt;
          &lt;td&gt;Multi-tool, multi-model users&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;goose&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/05/08/goose-open-source-ai-agent-desktop-cli-api/&#34; &gt;goose: an open source AI Agent across desktop, CLI, and API&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/aaif-goose/goose&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;aaif-goose/goose&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;Open source agent across desktop, CLI, and API&lt;/td&gt;
          &lt;td&gt;Users who want a general agent workspace&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id=&#34;agent-skills-and-workflows&#34;&gt;Agent Skills and Workflows
&lt;/h2&gt;&lt;p&gt;This group focuses on turning AI capabilities into repeatable skills, processes, and specifications. It includes &lt;strong&gt;7&lt;/strong&gt; projects.&lt;/p&gt;
&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;Project&lt;/th&gt;
          &lt;th&gt;Article&lt;/th&gt;
          &lt;th&gt;GitHub&lt;/th&gt;
          &lt;th&gt;Core Use&lt;/th&gt;
          &lt;th&gt;Best For&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;mattpocock/skills&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/05/15/matt-pocock-skills-ai-engineering-workflow/&#34; &gt;Rejecting Vibe Coding: Matt Pocock&amp;rsquo;s skills repo adds engineering constraints to AI coding&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/mattpocock/skills&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;mattpocock/skills&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;Use skills to constrain AI coding workflows&lt;/td&gt;
          &lt;td&gt;Users who want engineering discipline for agents&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Superpowers&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/05/15/obra-superpowers-agentic-skills-framework/&#34; &gt;Superpowers: bringing coding agents back into engineering workflows&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/obra/superpowers&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;obra/superpowers&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;Agentic skills framework and software development methodology&lt;/td&gt;
          &lt;td&gt;Users who want systematic coding-agent workflows&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Prompt-Vault&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/05/15/prompt-vault-coding-prompt-benchmark/&#34; &gt;Prompt-Vault: a prompt specification library for testing AI coding ability&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/w512/Prompt-Vault&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;w512/Prompt-Vault&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;Collect prompt specs for testing AI coding ability&lt;/td&gt;
          &lt;td&gt;Model and tool evaluators&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;web-video-presentation&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/05/15/web-video-presentation-agent-skill/&#34; &gt;web-video-presentation: an agent skill for turning articles into recordable web videos&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/ConardLi/garden-skills&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;ConardLi/garden-skills&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;Turn articles into recordable web videos&lt;/td&gt;
          &lt;td&gt;Content creators and automation users&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;nuwa-skill&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/04/22/nuwa-skill-distill-how-someone-thinks/&#34; &gt;nuwa-skill: making &amp;ldquo;distilling a person&amp;rdquo; into an executable workflow&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/alchaincyf/nuwa-skill&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;alchaincyf/nuwa-skill&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;Recreate a person&amp;rsquo;s expression and thinking flow with a skill&lt;/td&gt;
          &lt;td&gt;Users building style-based agents&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Scientific Agent Skills&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/05/17/scientific-agent-skills/&#34; &gt;Scientific Agent Skills: giving research workflows to AI agents&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/K-Dense-AI/scientific-agent-skills&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;K-Dense-AI/scientific-agent-skills&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;Skill collection for scientific workflows&lt;/td&gt;
          &lt;td&gt;Researchers, data analysts, and technical writers&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;easy-vibe&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/05/16/easy-vibe-vibe-coding-learning-map/&#34; &gt;easy-vibe: a learning map for Vibe Coding beginners&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/datawhalechina/easy-vibe&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;datawhalechina/easy-vibe&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;Learning map for Vibe Coding&lt;/td&gt;
          &lt;td&gt;AI coding beginners&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id=&#34;rag-knowledge-bases-and-memory&#34;&gt;RAG, Knowledge Bases, and Memory
&lt;/h2&gt;&lt;p&gt;This group addresses document retrieval, knowledge base construction, long-term memory, and structured extraction. It includes &lt;strong&gt;7&lt;/strong&gt; projects.&lt;/p&gt;
&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;Project&lt;/th&gt;
          &lt;th&gt;Article&lt;/th&gt;
          &lt;th&gt;GitHub&lt;/th&gt;
          &lt;th&gt;Core Use&lt;/th&gt;
          &lt;th&gt;Best For&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;LangExtract&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/05/01/google-langextract-llm-structured-data-extraction/&#34; &gt;Google LangExtract: extracting structured data from long text with LLMs&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/google/langextract&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;google/langextract&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;Extract structured information from long text&lt;/td&gt;
          &lt;td&gt;Information extraction and data processing users&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;qmd&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/05/01/qmd-markdown-search-for-ai-agents/&#34; &gt;qmd: local Markdown document search for AI agents&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/tobi/qmd&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;tobi/qmd&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;Local Markdown document search&lt;/td&gt;
          &lt;td&gt;Users managing knowledge in Markdown&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Firecrawl&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/04/15/firecrawl-ai-web-data-api/&#34; &gt;Firecrawl: web search, crawling, and interaction API for AI agents&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/firecrawl/firecrawl&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;firecrawl/firecrawl&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;Web crawling, search, and structured data entry point&lt;/td&gt;
          &lt;td&gt;RAG and agent data-ingestion users&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;RAGFlow&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/04/15/ragflow-rag-engine-guide/&#34; &gt;RAGFlow: features and usage of an open source RAG engine&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/infiniflow/ragflow&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;infiniflow/ragflow&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;Open source RAG engine&lt;/td&gt;
          &lt;td&gt;Enterprise knowledge base and document Q&amp;amp;A users&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;OpenHuman&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/05/15/openhuman-open-source-personal-ai-agent/&#34; &gt;OpenHuman: the desktop route for open source personal AI agents&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/tinyhumansai/openhuman&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;tinyhumansai/openhuman&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;Local-first personal AI agent and memory layer&lt;/td&gt;
          &lt;td&gt;Users who want to integrate personal data&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;OpenKB&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/05/17/openkb-llm-knowledge-base/&#34; &gt;OpenKB: compiling documents into continuously updated LLM knowledge bases&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/VectifyAI/OpenKB&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;VectifyAI/OpenKB&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;Compile documents into updatable knowledge bases&lt;/td&gt;
          &lt;td&gt;Documentation knowledge-base maintainers&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;PageIndex&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/05/20/vectifyai-pageindex-vectorless-rag/&#34; &gt;PageIndex: reasoning-style RAG document indexing without vector databases&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/VectifyAI/PageIndex&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;VectifyAI/PageIndex&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;Reasoning-style document indexing without vector databases&lt;/td&gt;
          &lt;td&gt;Users watching new RAG approaches&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id=&#34;multimodal-and-content-creation&#34;&gt;Multimodal and Content Creation
&lt;/h2&gt;&lt;p&gt;This group covers image, video, transcription, and content distribution scenarios. It includes &lt;strong&gt;5&lt;/strong&gt; projects.&lt;/p&gt;
&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;Project&lt;/th&gt;
          &lt;th&gt;Article&lt;/th&gt;
          &lt;th&gt;GitHub&lt;/th&gt;
          &lt;th&gt;Core Use&lt;/th&gt;
          &lt;th&gt;Best For&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;rembg&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/04/19/rembg-background-removal-notes/&#34; &gt;rembg: local image background removal tool&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/danielgatis/rembg&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;danielgatis/rembg&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;Local image background removal&lt;/td&gt;
          &lt;td&gt;E-commerce, design, and image-processing users&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;awesome-gpt-image-2-prompts&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/05/02/awesome-gpt-image-2-prompts-case-index/&#34; &gt;GPT-Image 2 prompt library: e-commerce, posters, portraits, and UI&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/EvoLinkAI/awesome-gpt-image-2-prompts&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;EvoLinkAI/awesome-gpt-image-2-prompts&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;GPT-Image 2 prompts and case library&lt;/td&gt;
          &lt;td&gt;AI art and prompt users&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;faster-whisper&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/05/01/faster-whisper-speech-to-text/&#34; &gt;faster-whisper: a faster Whisper transcription engine&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/SYSTRAN/faster-whisper&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;SYSTRAN/faster-whisper&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;High-performance speech-to-text&lt;/td&gt;
          &lt;td&gt;Subtitle, transcription, and speech-processing users&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Pixelle-Video&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/05/07/pixelle-video-ai-short-video-engine/&#34; &gt;Pixelle-Video: an open source AI engine for generating short videos from one topic&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/AIDC-AI/Pixelle-Video&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;AIDC-AI/Pixelle-Video&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;One-topic short-video generation workflow&lt;/td&gt;
          &lt;td&gt;Short-video and AIGC creators&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;AiToEarn&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/05/19/aitoearn-ai-content-marketing-agent/&#34; &gt;Too many content platforms? AiToEarn uses AI agents to help creators save effort&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/yikart/AiToEarn&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;yikart/AiToEarn&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;Multi-platform content distribution and creator automation&lt;/td&gt;
          &lt;td&gt;Content operators and creators&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id=&#34;local-models-and-inference&#34;&gt;Local Models and Inference
&lt;/h2&gt;&lt;p&gt;This group focuses on local model runtime and inference experiments. It currently has fewer projects, with &lt;strong&gt;1&lt;/strong&gt; project.&lt;/p&gt;
&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;Project&lt;/th&gt;
          &lt;th&gt;Article&lt;/th&gt;
          &lt;th&gt;GitHub&lt;/th&gt;
          &lt;th&gt;Core Use&lt;/th&gt;
          &lt;th&gt;Best For&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;ds4&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/05/11/deepseek-v4-flash-ds4-metal/&#34; &gt;Running DeepSeek 4 locally: Antirez ds4 on Apple Silicon Mac&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/antirez/ds4&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;antirez/ds4&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;Experiment with running DeepSeek 4 on Apple Silicon&lt;/td&gt;
          &lt;td&gt;Local model and inference experiment users&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id=&#34;vertical-applications-and-automation&#34;&gt;Vertical Applications and Automation
&lt;/h2&gt;&lt;p&gt;This group applies agents or AI capabilities to finance, trading, browsers, desktops, e-commerce monitoring, and other concrete scenarios. It includes &lt;strong&gt;7&lt;/strong&gt; projects.&lt;/p&gt;
&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;Project&lt;/th&gt;
          &lt;th&gt;Article&lt;/th&gt;
          &lt;th&gt;GitHub&lt;/th&gt;
          &lt;th&gt;Core Use&lt;/th&gt;
          &lt;th&gt;Best For&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;TradingAgents-CN&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/05/01/tradingagents-cn-multi-agent-financial-research-framework/&#34; &gt;TradingAgents-CN: a multi-agent financial trading research framework for Chinese users&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/hsliuping/TradingAgents-CN&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;hsliuping/TradingAgents-CN&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;Multi-agent financial trading research framework&lt;/td&gt;
          &lt;td&gt;Quant, finance, and agent researchers&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;FinceptTerminal&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/05/01/finceptterminal-open-source-financial-terminal/&#34; &gt;FinceptTerminal: open source financial terminal, quant research, and AI Agent workspace&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/Fincept-Corporation/FinceptTerminal&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Fincept-Corporation/FinceptTerminal&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;Financial terminal, quant research, and AI agent workspace&lt;/td&gt;
          &lt;td&gt;Financial analysis and quant users&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Anthropic financial-services&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/05/16/anthropic-financial-services-agent-templates/&#34; &gt;Anthropic financial-services: reusable templates for financial agent scenarios&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/anthropics/financial-services&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;anthropics/financial-services&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;Financial services agent templates&lt;/td&gt;
          &lt;td&gt;Users building financial AI solutions&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;ai-goofish-monitor&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/05/17/ai-goofish-monitor/&#34; &gt;ai-goofish-monitor: open source AI monitoring system for Xianyu products&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/Usagi-org/ai-goofish-monitor&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Usagi-org/ai-goofish-monitor&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;AI product monitoring and Xianyu automation&lt;/td&gt;
          &lt;td&gt;Second-hand marketplace monitoring users&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;CloakBrowser&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/05/19/cloakbrowser-stealth-chromium-browser-automation/&#34; &gt;CloakBrowser: a more human-like browser for Playwright and Puppeteer&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/CloakHQ/CloakBrowser&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;CloakHQ/CloakBrowser&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;More human-like browser automation environment&lt;/td&gt;
          &lt;td&gt;Browser automation and agent operation scenarios&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;UI-TARS-desktop&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/05/19/ui-tars-desktop-multimodal-ai-agent-stack/&#34; &gt;Let AI operate the computer? UI-TARS-desktop connects desktop, browser, and tools&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/bytedance/UI-TARS-desktop&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;bytedance/UI-TARS-desktop&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;Desktop, browser, and tool operation agent&lt;/td&gt;
          &lt;td&gt;Users who want AI to operate computers&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;AI-Trader&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/05/19/ai-trader-agent-native-trading-platform/&#34; &gt;What is AI-Trader: a platform for AI agents to publish trading signals and run simulations&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/HKUDS/AI-Trader&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;HKUDS/AI-Trader&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;AI agent trading signals and simulated trading platform&lt;/td&gt;
          &lt;td&gt;Financial agent and trading researchers&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id=&#34;ai-application-development-infrastructure&#34;&gt;AI Application Development Infrastructure
&lt;/h2&gt;&lt;p&gt;This group provides foundational components for building AI applications and agent toolchains. It includes &lt;strong&gt;5&lt;/strong&gt; projects.&lt;/p&gt;
&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;Project&lt;/th&gt;
          &lt;th&gt;Article&lt;/th&gt;
          &lt;th&gt;GitHub&lt;/th&gt;
          &lt;th&gt;Core Use&lt;/th&gt;
          &lt;th&gt;Best For&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;Prompt Optimizer&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/05/01/prompt-optimizer-prompt-engineering-tool/&#34; &gt;Prompt Optimizer: open source prompt optimization, testing, and MCP tools&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/linshenkx/prompt-optimizer&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;linshenkx/prompt-optimizer&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;Prompt optimization, testing, and MCP tools&lt;/td&gt;
          &lt;td&gt;Prompt engineering and app-tuning users&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Playwright CLI&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/04/12/playwright-cli-getting-started/&#34; &gt;Playwright CLI basics: installation, skills, sessions, and common commands&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/microsoft/playwright-cli&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;microsoft/playwright-cli&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;Browser automation CLI for coding agents&lt;/td&gt;
          &lt;td&gt;Agent users who need browser operation&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Vercel AI SDK&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/05/17/vercel-ai-sdk-typescript-agent-toolkit/&#34; &gt;What is Vercel AI SDK? A unified toolkit for TypeScript AI apps&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/vercel/ai&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;vercel/ai&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;TypeScript AI application SDK&lt;/td&gt;
          &lt;td&gt;Front-end and full-stack developers&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;CLIProxyAPI&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/05/24/cliproxyapi-cli-to-api-gateway/&#34; &gt;CLIProxyAPI: wrapping Codex, Claude Code, and Gemini CLI into unified APIs&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/router-for-me/CLIProxyAPI&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;router-for-me/CLIProxyAPI&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;Wrap multiple AI CLIs and OAuth login states as compatible APIs&lt;/td&gt;
          &lt;td&gt;Users who want unified access to Codex, Claude Code, and Gemini CLI&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;CLIProxyAPI Management Center&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/05/24/cliproxyapi-management-center/&#34; &gt;CLIProxyAPI Management Center: a visual admin console for CLIProxyAPI&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/router-for-me/Cli-Proxy-API-Management-Center&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;router-for-me/Cli-Proxy-API-Management-Center&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;Web admin UI for CLIProxyAPI configuration, accounts, logs, and OAuth&lt;/td&gt;
          &lt;td&gt;Users running CLIProxyAPI as a team gateway or account pool&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
</description>
        </item>
        <item>
        <title>Google I/O 2026 Summary: Gemini 3.5, Omni, Antigravity, and System-Level Agents</title>
        <link>https://knightli.com/en/2026/05/21/google-io-2026-gemini-agentic-ai-summary/</link>
        <pubDate>Thu, 21 May 2026 00:07:06 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/05/21/google-io-2026-gemini-agentic-ai-summary/</guid>
        <description>&lt;p&gt;The main line of Google I/O 2026 is clear: Google is moving Gemini from &amp;ldquo;model&amp;rdquo; and &amp;ldquo;chat assistant&amp;rdquo; into a fuller Agent ecosystem. It is not only answering questions. It is entering Search, Android, developer tools, video creation, shopping, Workspace, hardware, and enterprise platforms to help users complete longer task chains.&lt;/p&gt;
&lt;p&gt;This article summarizes the main Google I/O 2026 announcements from official releases and a developer perspective. For real development, always follow the official Google, Android Developers, and Gemini API documentation.&lt;/p&gt;
&lt;h2 id=&#34;one-sentence-summary&#34;&gt;One-Sentence Summary
&lt;/h2&gt;&lt;p&gt;The keyword for Google I/O 2026 is &lt;code&gt;agentic Gemini era&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Google announced or strengthened several lines:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;Gemini 3.5 Flash&lt;/code&gt;: speed, action capability, and Agent workflows.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Gemini Omni&lt;/code&gt;: creating content from any input, starting with video creation and editing.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Gemini app&lt;/code&gt;: moving from chat assistant to proactive, always-on, task-capable personal Agent.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Google Antigravity 2.0&lt;/code&gt;: evolving from an AI coding tool into an Agent-first development platform.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Gemini API Managed Agents&lt;/code&gt;: creating hosted Agents through APIs that can reason, use tools, and execute code.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Google AI Studio&lt;/code&gt;: expanding to mobile, native Android support, and project export to Antigravity.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Search&lt;/code&gt;, &lt;code&gt;Shopping&lt;/code&gt;, &lt;code&gt;YouTube&lt;/code&gt;, &lt;code&gt;Workspace&lt;/code&gt;, and &lt;code&gt;Android&lt;/code&gt;: all gaining stronger Gemini and Agent capabilities.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In other words, Google is no longer only showing &amp;ldquo;how smart the model is.&amp;rdquo; It is showing how models enter products, tools, and systems to actually execute tasks for users.&lt;/p&gt;
&lt;h2 id=&#34;gemini-35-flash-from-prompt-to-action&#34;&gt;Gemini 3.5 Flash: From Prompt to Action
&lt;/h2&gt;&lt;p&gt;Gemini 3.5 is Google&amp;rsquo;s new model family at I/O 2026, with &lt;code&gt;Gemini 3.5 Flash&lt;/code&gt; as the first public focus.&lt;/p&gt;
&lt;p&gt;Google does not position it as simply a &amp;ldquo;faster chat model,&amp;rdquo; but as a high-speed engine for real Agent workflows. Google&amp;rsquo;s developer article describes 3.5 Flash as combining frontier intelligence and high speed to support the shift from prompt to action.&lt;/p&gt;
&lt;p&gt;Its main significance:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Optimized for Agent and coding scenarios.&lt;/li&gt;
&lt;li&gt;Supports longer task chains and tool use.&lt;/li&gt;
&lt;li&gt;Available through Antigravity, Gemini API, Google AI Studio, Android Studio, Gemini Enterprise, and other entry points.&lt;/li&gt;
&lt;li&gt;Better suited for applications that need fast responses, multi-turn execution, and frequent tool calls.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For developers, Gemini 3.5 Flash is not just another model option. It is one of the default engines for Google&amp;rsquo;s new Agent toolchain.&lt;/p&gt;
&lt;h2 id=&#34;gemini-omni-video-and-world-model-capabilities&#34;&gt;Gemini Omni: Video and World-Model Capabilities
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;Gemini Omni&lt;/code&gt; is another core I/O 2026 announcement. Google describes it as creating content from any input, with the current focus starting from video.&lt;/p&gt;
&lt;p&gt;Its highlights fall into three areas:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Multimodal input: text, images, video, audio, and more can be used as references.&lt;/li&gt;
&lt;li&gt;Video editing: users can modify video over multiple turns with natural language instead of stopping after one generation.&lt;/li&gt;
&lt;li&gt;World understanding: it emphasizes consistency in physics, scenes, actions, narrative, and audiovisual output.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This means AI video tools are moving from &amp;ldquo;enter one prompt to generate a clip&amp;rdquo; toward &amp;ldquo;revise step by step as if talking to an editor.&amp;rdquo; For creators, the real value is not one-shot generation, but a controllable, traceable, and iterative editing process.&lt;/p&gt;
&lt;h2 id=&#34;gemini-app-from-chat-assistant-to-always-on-personal-agent&#34;&gt;Gemini App: From Chat Assistant to Always-On Personal Agent
&lt;/h2&gt;&lt;p&gt;Google is also pushing Gemini app in a more Agent-like direction. Official posts describe Gemini app as becoming more proactive, offering daily briefs and always-on assistance.&lt;/p&gt;
&lt;p&gt;Key points include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;Gemini 3.5 Flash&lt;/code&gt; entering Gemini app.&lt;/li&gt;
&lt;li&gt;A new UI and more dynamic interaction.&lt;/li&gt;
&lt;li&gt;Personal AI Agent concepts such as &lt;code&gt;Gemini Spark&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Proactive daily briefs that organize what users need to know each day.&lt;/li&gt;
&lt;li&gt;More emphasis on 24/7 background assistance instead of waiting for the user to start every chat.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is the part that affects ordinary users most. Gemini used to feel more like a &amp;ldquo;you ask, I answer&amp;rdquo; assistant. After I/O 2026, Google wants it to feel more like a personal Agent that follows up on tasks, proactively reminds users, and works across products.&lt;/p&gt;
&lt;h2 id=&#34;antigravity-20-developer-tools-become-agent-first&#34;&gt;Antigravity 2.0: Developer Tools Become Agent-First
&lt;/h2&gt;&lt;p&gt;One of the most important developer-side announcements is &lt;code&gt;Google Antigravity 2.0&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Google positions Antigravity as an agent-first development platform. After I/O 2026, it is not only helping developers write code. It is meant to help developers move from ideas and prototypes to Agent orchestration and production delivery.&lt;/p&gt;
&lt;p&gt;Core changes listed by Google include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Antigravity 2.0 standalone desktop app.&lt;/li&gt;
&lt;li&gt;Multi-Agent parallel orchestration.&lt;/li&gt;
&lt;li&gt;Dynamic subagents.&lt;/li&gt;
&lt;li&gt;Background scheduled tasks.&lt;/li&gt;
&lt;li&gt;Integration with Google AI Studio, Android, Firebase, and related ecosystems.&lt;/li&gt;
&lt;li&gt;Antigravity CLI for terminal users.&lt;/li&gt;
&lt;li&gt;Antigravity SDK for custom Agent behavior and deployment.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This shows that AI coding tools are entering the next stage after &amp;ldquo;code completion / conversational generation&amp;rdquo;: developers will manage multiple executable Agents, not just one chat window.&lt;/p&gt;
&lt;h2 id=&#34;gemini-api-managed-agents-hosting-agents-as-api-capabilities&#34;&gt;Gemini API Managed Agents: Hosting Agents as API Capabilities
&lt;/h2&gt;&lt;p&gt;Google also introduced &lt;code&gt;Managed Agents in the Gemini API&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;According to the official description, these Agents can be created with a single API call. They can reason, use tools, and execute code in an isolated Linux environment, supported by the Antigravity agent harness.&lt;/p&gt;
&lt;p&gt;This matters to developers:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;You do not need to build the full Agent runtime yourself.&lt;/li&gt;
&lt;li&gt;You can get a persistent, isolated execution environment.&lt;/li&gt;
&lt;li&gt;Multi-turn interactions can preserve files and state.&lt;/li&gt;
&lt;li&gt;Agents can be extended with markdown skills, custom instructions, and templates.&lt;/li&gt;
&lt;li&gt;They are available through Interactions API and Google AI Studio.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If this line matures, Agent platforms will increasingly look like cloud services: developers will not only call models, but call Agents with state, tools, execution environments, and security boundaries.&lt;/p&gt;
&lt;h2 id=&#34;google-ai-studio-from-prompt-playground-to-app-generation-entry-point&#34;&gt;Google AI Studio: From Prompt Playground to App Generation Entry Point
&lt;/h2&gt;&lt;p&gt;At I/O 2026, Google AI Studio also moves further.&lt;/p&gt;
&lt;p&gt;Key changes include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Google AI Studio mobile app for capturing ideas and generating prototypes on mobile.&lt;/li&gt;
&lt;li&gt;Workspace API integration, making it easier for Agents to access Google Workspace.&lt;/li&gt;
&lt;li&gt;Project export to Antigravity, carrying context into local development and production work.&lt;/li&gt;
&lt;li&gt;Native Android support, allowing users to build Android apps from prompts.&lt;/li&gt;
&lt;li&gt;Google Play Console integration to publish apps to test tracks.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This turns AI Studio from &amp;ldquo;a place to tune prompts and test models&amp;rdquo; into an entry point from idea to app. Its relationship with Antigravity is clearer too: AI Studio is good for fast ideation and generation, while Antigravity is better for continued development, orchestration, debugging, and delivery.&lt;/p&gt;
&lt;h2 id=&#34;android-and-appfunctions-key-interfaces-for-mobile-agents&#34;&gt;Android and AppFunctions: Key Interfaces for Mobile Agents
&lt;/h2&gt;&lt;p&gt;Android system-level Agents are worth watching on their own, but they need to be understood through accurate interfaces and product boundaries.&lt;/p&gt;
&lt;p&gt;The most important current piece is Android&amp;rsquo;s official &lt;code&gt;AppFunctions&lt;/code&gt;. The official documentation describes AppFunctions as an Android platform API with Jetpack libraries that lets apps expose their capabilities to agents, assistants, and other authorized callers. It also simplifies Android MCP integration.&lt;/p&gt;
&lt;p&gt;Its significance is that mobile automation no longer has to rely only on screenshots, OCR, simulated taps, and UI control positioning.&lt;/p&gt;
&lt;p&gt;Traditional mobile automation looks like:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Recognize the screen.&lt;/li&gt;
&lt;li&gt;Find the button.&lt;/li&gt;
&lt;li&gt;Simulate a tap.&lt;/li&gt;
&lt;li&gt;Wait for the page to change.&lt;/li&gt;
&lt;li&gt;Retry after errors.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The AppFunctions direction is:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Apps declare what they can do.&lt;/li&gt;
&lt;li&gt;Agents call those capabilities with authorization.&lt;/li&gt;
&lt;li&gt;The system handles permissions, call boundaries, and security constraints.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This will affect Android app design. Future apps will not only need human-facing UIs, but also core capabilities designed as Agent-callable interfaces.&lt;/p&gt;
&lt;h2 id=&#34;search-shopping-and-content-products-are-becoming-agentic-too&#34;&gt;Search, Shopping, and Content Products Are Becoming Agentic Too
&lt;/h2&gt;&lt;p&gt;Google I/O 2026 changes are not limited to models and developer tools. Search and consumer products are changing at the same time.&lt;/p&gt;
&lt;p&gt;Official I/O summaries mention:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Search entering a new AI Search stage.&lt;/li&gt;
&lt;li&gt;Information agents appearing in Search.&lt;/li&gt;
&lt;li&gt;Gemini Spark and Daily Brief entering Gemini app.&lt;/li&gt;
&lt;li&gt;Universal Cart making shopping carts smarter.&lt;/li&gt;
&lt;li&gt;Ask YouTube enabling conversational queries and navigation over video content.&lt;/li&gt;
&lt;li&gt;Gemini capabilities expanding to more products and form factors.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These announcements show that Google&amp;rsquo;s Agent direction is not a single product. It is spreading horizontally across search, video, shopping, productivity, mobile, and hardware scenarios.&lt;/p&gt;
&lt;h2 id=&#34;practical-impact-for-developers&#34;&gt;Practical Impact for Developers
&lt;/h2&gt;&lt;p&gt;The biggest impact of Google I/O 2026 for developers is not &amp;ldquo;another model.&amp;rdquo; It is that the development target is changing.&lt;/p&gt;
&lt;p&gt;Developers used to mainly build:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Apps.&lt;/li&gt;
&lt;li&gt;Websites.&lt;/li&gt;
&lt;li&gt;APIs.&lt;/li&gt;
&lt;li&gt;Plugins.&lt;/li&gt;
&lt;li&gt;Automation scripts.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Next, they will also build:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;App capabilities callable by Agents.&lt;/li&gt;
&lt;li&gt;Multi-Agent workflows.&lt;/li&gt;
&lt;li&gt;Stateful tool execution environments.&lt;/li&gt;
&lt;li&gt;Auditable automation flows.&lt;/li&gt;
&lt;li&gt;Human-in-the-loop confirmation mechanisms.&lt;/li&gt;
&lt;li&gt;Integrations with MCP, AppFunctions, Workspace API, Playwright, Firebase, and other tools.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Software will increasingly look like a set of capabilities, not only a set of interfaces. Products that expose their capabilities clearly, reliably, and safely to Agents will be more likely to enter users&amp;rsquo; automation task chains.&lt;/p&gt;
&lt;h2 id=&#34;impact-on-mobile-automation&#34;&gt;Impact on Mobile Automation
&lt;/h2&gt;&lt;p&gt;Mobile automation will gradually move from &amp;ldquo;GUI first&amp;rdquo; to &amp;ldquo;API first, GUI as fallback.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;In the short term, screenshot recognition, OCR, simulated taps, and browser automation still matter because many older apps have no standard interface.&lt;/p&gt;
&lt;p&gt;In the long term, if Android AppFunctions, MCP, and system-level permission models mature, stable task execution will lean toward:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;First calling capabilities declared by apps.&lt;/li&gt;
&lt;li&gt;Then calling system interfaces when needed.&lt;/li&gt;
&lt;li&gt;Then using GUI automation as a fallback.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This will change RPA, mobile Agents, testing tools, and app ecosystems. Apps that expose capabilities are easier for system-level Agents to call. Apps that do not may still only be operated by the old &amp;ldquo;look at screen, tap screen&amp;rdquo; approach.&lt;/p&gt;
&lt;h2 id=&#34;security-permissions-and-auditing-become-hard-requirements&#34;&gt;Security, Permissions, and Auditing Become Hard Requirements
&lt;/h2&gt;&lt;p&gt;The stronger Agents become, the higher the risk.&lt;/p&gt;
&lt;p&gt;If an Agent can execute tasks across apps, make payments, change settings, access files, and read context, it needs clear security boundaries:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Permission levels.&lt;/li&gt;
&lt;li&gt;Explicit user authorization.&lt;/li&gt;
&lt;li&gt;Secondary confirmation for sensitive actions.&lt;/li&gt;
&lt;li&gt;Sandbox isolation.&lt;/li&gt;
&lt;li&gt;Operation logs.&lt;/li&gt;
&lt;li&gt;Reversibility and rollback.&lt;/li&gt;
&lt;li&gt;Enterprise auditing and compliance.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is why Google emphasizes isolated environments for hosted Agents, permission requirements for AppFunctions, enterprise platforms, and controlled deployment. The future of Agents is not &amp;ldquo;do anything without limits,&amp;rdquo; but executable, traceable, and governable behavior inside security boundaries.&lt;/p&gt;
&lt;h2 id=&#34;summary&#34;&gt;Summary
&lt;/h2&gt;&lt;p&gt;The main content of Google I/O 2026 can be summarized in one sentence: Google is turning Gemini into an Agent platform spanning models, apps, systems, developer tools, and hardware.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;Gemini 3.5 Flash&lt;/code&gt; provides speed and action capability. &lt;code&gt;Gemini Omni&lt;/code&gt; pushes multimodal creation toward video and world understanding. &lt;code&gt;Gemini app&lt;/code&gt; becomes a proactive personal assistant. &lt;code&gt;Antigravity 2.0&lt;/code&gt; and &lt;code&gt;Managed Agents&lt;/code&gt; push developer tools toward Agent-native development. &lt;code&gt;AppFunctions&lt;/code&gt; lets Android apps begin exposing capabilities to intelligent agents.&lt;/p&gt;
&lt;p&gt;For developers, the next thing to watch is not only model parameters, but how to structure application capabilities, connect to Agent toolchains, design permissions and auditing, and make products safely and reliably callable in a system-level Agent ecosystem.&lt;/p&gt;
&lt;p&gt;References:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://blog.google/innovation-and-ai/technology/developers-tools/google-io-2026-collection/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Google Blog: Google I/O 2026 news and announcements&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://blog.google/innovation-and-ai/technology/developers-tools/google-io-2026-developer-highlights/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Google Blog: I/O 2026 developer highlights&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://blog.google/innovation-and-ai/products/gemini-app/next-evolution-gemini-app/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Google Blog: The Gemini app becomes more agentic&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://developer.android.com/ai/appfunctions?hl=zh-cn&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Android Developers: AppFunctions overview&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
        </item>
        <item>
        <title>What Is PageIndex? A Reasoning-Based RAG Document Index Without Vector Databases</title>
        <link>https://knightli.com/en/2026/05/20/vectifyai-pageindex-vectorless-rag/</link>
        <pubDate>Wed, 20 May 2026 23:51:37 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/05/20/vectifyai-pageindex-vectorless-rag/</guid>
        <description>&lt;p&gt;&lt;code&gt;VectifyAI/PageIndex&lt;/code&gt; is an interesting RAG project. Instead of starting with &amp;ldquo;build another vector database,&amp;rdquo; it first organizes long documents into a tree structure similar to a table of contents, then lets an LLM perform reasoning-based retrieval along that tree.&lt;/p&gt;
&lt;p&gt;Project link: &lt;a class=&#34;link&#34; href=&#34;https://github.com/VectifyAI/PageIndex&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;VectifyAI/PageIndex&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;At the time of writing, the GitHub page shows about 31.8k stars and 2.7k forks, with an MIT license. The README positions it as &lt;code&gt;Vectorless, Reasoning-based RAG&lt;/code&gt;: RAG without a vector database, based on reasoning.&lt;/p&gt;
&lt;h2 id=&#34;what-problem-it-tries-to-solve&#34;&gt;What Problem It Tries to Solve
&lt;/h2&gt;&lt;p&gt;The common path for traditional RAG is: chunk the document, vectorize the chunks, store them in a vector database, then retrieve passages by similarity search. This approach is simple, general, and mature, but it often runs into several problems with long professional documents:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Similarity is not the same as true relevance.&lt;/li&gt;
&lt;li&gt;Document structure is broken apart by chunking, and section relationships are lost.&lt;/li&gt;
&lt;li&gt;Retrieval results are hard to explain, making it difficult to say why a passage was selected.&lt;/li&gt;
&lt;li&gt;For financial reports, regulatory filings, legal documents, and technical manuals, questions often require reasoning across sections.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;PageIndex takes the opposite route: first organize the document into a semantic tree, then let the model search it like a human reading a table of contents, jumping into sections, and narrowing down to details.&lt;/p&gt;
&lt;h2 id=&#34;the-basic-pageindex-workflow&#34;&gt;The Basic PageIndex Workflow
&lt;/h2&gt;&lt;p&gt;The README describes PageIndex retrieval in two steps:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Generate a &lt;code&gt;Table-of-Contents&lt;/code&gt;-like tree index for the document.&lt;/li&gt;
&lt;li&gt;Perform reasoning-based retrieval through tree search.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This tree is not just a file directory. It is a document structure designed for LLM use. Nodes can contain titles, page ranges, summaries, child nodes, and other metadata. When answering a question, the model does not need to face a pile of fragmented chunks immediately. It can first decide which section to enter, then continue searching downward.&lt;/p&gt;
&lt;p&gt;This method is better suited to documents that are well structured but very long, such as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Financial reports and SEC filings.&lt;/li&gt;
&lt;li&gt;Regulatory and compliance documents.&lt;/li&gt;
&lt;li&gt;Academic textbooks and papers.&lt;/li&gt;
&lt;li&gt;Legal documents.&lt;/li&gt;
&lt;li&gt;Technical manuals and product documentation.&lt;/li&gt;
&lt;li&gt;Large PDFs that exceed the model context window.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;how-it-differs-from-traditional-vector-rag&#34;&gt;How It Differs From Traditional Vector RAG
&lt;/h2&gt;&lt;p&gt;PageIndex&amp;rsquo;s main selling points can be summarized in five areas.&lt;/p&gt;
&lt;p&gt;First, it does not require a Vector DB. It relies on document structure and LLM reasoning to locate content, rather than only using vector similarity search.&lt;/p&gt;
&lt;p&gt;Second, it does not use traditional chunking. Documents are organized by natural sections instead of fixed-length text fragments.&lt;/p&gt;
&lt;p&gt;Third, explainability is stronger. The retrieval path can map back to pages, sections, and tree nodes, making it easier to trace than &amp;ldquo;this text was hit by vector similarity.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Fourth, retrieval is context-aware. The question, conversation history, and domain background can all affect the tree search path.&lt;/p&gt;
&lt;p&gt;Fifth, it is closer to how human experts read documents. People usually do not cut an entire document into small chunks and calculate similarity; they first inspect the table of contents, locate sections, and then read details.&lt;/p&gt;
&lt;p&gt;This does not mean vector databases have no value. A more accurate view is that PageIndex fits scenarios where &amp;ldquo;semantic similarity is not enough, and structure plus reasoning need to participate&amp;rdquo; in long-document retrieval.&lt;/p&gt;
&lt;h2 id=&#34;how-to-run-it-locally&#34;&gt;How to Run It Locally
&lt;/h2&gt;&lt;p&gt;The README provides a local self-hosting path. First install dependencies:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;pip3 install --upgrade -r requirements.txt
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Then create a &lt;code&gt;.env&lt;/code&gt; file in the project root and write your LLM API key. The project supports multiple models through &lt;code&gt;LiteLLM&lt;/code&gt;:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nv&#34;&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;your_openai_key_here
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Generate a PageIndex structure for a PDF:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;python3 run_pageindex.py --pdf_path /path/to/your/document.pdf
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Markdown is also supported:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;python3 run_pageindex.py --md_path /path/to/your/document.md
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Common optional parameters include:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;5
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;6
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;7
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;--model
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;--toc-check-pages
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;--max-pages-per-node
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;--max-tokens-per-node
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;--if-add-node-id
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;--if-add-node-summary
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;--if-add-doc-description
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;The README also notes that the local open-source version uses standard PDF parsing. For complex PDFs, the project&amp;rsquo;s cloud service provides enhanced OCR, tree building, and retrieval pipelines.&lt;/p&gt;
&lt;h2 id=&#34;agentic-vectorless-rag-example&#34;&gt;Agentic Vectorless RAG Example
&lt;/h2&gt;&lt;p&gt;The project also provides an agentic vectorless RAG example using self-hosted PageIndex and OpenAI Agents SDK. Install the optional dependency and run it:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;pip3 install openai-agents
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;python3 examples/agentic_vectorless_rag_demo.py
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;The value of this example is that it pushes PageIndex from &amp;ldquo;generate a document tree&amp;rdquo; to &amp;ldquo;let an Agent use the document tree for retrieval.&amp;rdquo; If you are building an enterprise knowledge base, financial report Q&amp;amp;A, regulatory Q&amp;amp;A, or technical documentation Agent, this example is more worth running than only reading the README.&lt;/p&gt;
&lt;h2 id=&#34;cloud-service-mcp-and-api&#34;&gt;Cloud Service, MCP, and API
&lt;/h2&gt;&lt;p&gt;PageIndex is not just a GitHub repo. The project page also lists several entry points:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Self-hosting: run the open-source code locally, suitable for experiments and controlled deployments.&lt;/li&gt;
&lt;li&gt;Chat Platform: a ChatGPT-style document analysis platform.&lt;/li&gt;
&lt;li&gt;MCP / API: useful for integrating with existing Agents or automation workflows.&lt;/li&gt;
&lt;li&gt;Enterprise: for private or on-premises deployment.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This shows that its positioning is not a simple demo. It aims to turn &amp;ldquo;reasoning-based document retrieval&amp;rdquo; into an integrable document intelligence infrastructure.&lt;/p&gt;
&lt;h2 id=&#34;suitable-scenarios&#34;&gt;Suitable Scenarios
&lt;/h2&gt;&lt;p&gt;PageIndex is suitable for tasks such as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Long PDF Q&amp;amp;A.&lt;/li&gt;
&lt;li&gt;Financial reports, annual reports, prospectuses, and regulatory filing analysis.&lt;/li&gt;
&lt;li&gt;Legal and compliance document retrieval.&lt;/li&gt;
&lt;li&gt;Technical manual Q&amp;amp;A.&lt;/li&gt;
&lt;li&gt;Multi-section textbook or paper retrieval.&lt;/li&gt;
&lt;li&gt;Enterprise knowledge bases that need explainable retrieval paths.&lt;/li&gt;
&lt;li&gt;Providing structured document context to Agents.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If your material is short, has little structure, or is just a normal FAQ, traditional embedding + vector DB may already be enough. PageIndex&amp;rsquo;s advantages are more likely to appear in long documents, strong structure, professional domains, and questions that require reasoning.&lt;/p&gt;
&lt;h2 id=&#34;things-to-watch&#34;&gt;Things to Watch
&lt;/h2&gt;&lt;p&gt;First, PageIndex still depends on LLMs. Tree building, summaries, and retrieval quality are affected by model capability, prompts, and document parsing quality.&lt;/p&gt;
&lt;p&gt;Second, the local version uses standard PDF parsing. Complex scanned documents, chart-heavy PDFs, or messy layouts may require OCR and stronger preprocessing.&lt;/p&gt;
&lt;p&gt;Third, vectorless does not mean zero cost. Tree building itself also consumes model calls and time, especially for large-scale document collections.&lt;/p&gt;
&lt;p&gt;Fourth, PageIndex is more like a document structure indexing and reasoning retrieval framework. It does not directly replace every RAG stack. In production, it may also be combined with vector retrieval, keyword retrieval, permission control, caching, and audit systems.&lt;/p&gt;
&lt;h2 id=&#34;summary&#34;&gt;Summary
&lt;/h2&gt;&lt;p&gt;What makes PageIndex interesting is that it shifts RAG from &amp;ldquo;text similarity retrieval&amp;rdquo; toward &amp;ldquo;document structure + LLM reasoning.&amp;rdquo; For long and professional documents, this direction is worth watching.&lt;/p&gt;
&lt;p&gt;If you are building enterprise document Q&amp;amp;A, financial report analysis, regulatory retrieval, or technical manual Agents, PageIndex is a new RAG architecture reference: give documents structure first, then let the model reason along that structure, instead of breaking everything into chunks and putting it all into a vector database from the beginning.&lt;/p&gt;
&lt;p&gt;References:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/VectifyAI/PageIndex&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;GitHub: VectifyAI/PageIndex&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
        </item>
        <item>
        <title>Gemini 3.5 Is Here: Flash Leads as Google Focuses on Agents and Long-Running Tasks</title>
        <link>https://knightli.com/en/2026/05/20/google-gemini-3-5-flash-agent-coding/</link>
        <pubDate>Wed, 20 May 2026 22:51:31 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/05/20/google-gemini-3-5-flash-agent-coding/</guid>
        <description>&lt;p&gt;Google officially released the Gemini 3.5 series on May 20, 2026. The first model available is Gemini 3.5 Flash. Its positioning is not just chat, but agents, code generation, and long-running complex task execution.&lt;/p&gt;
&lt;p&gt;The message is clear: Google wants Gemini 3.5 to answer questions, but also to plan, execute, check results, and keep work moving across multi-step workflows.&lt;/p&gt;
&lt;h2 id=&#34;gemini-35-flash-comes-first&#34;&gt;Gemini 3.5 Flash Comes First
&lt;/h2&gt;&lt;p&gt;Gemini 3.5 Flash is already available to several groups:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;General users can try it in the Gemini app and AI Mode in Google Search.&lt;/li&gt;
&lt;li&gt;Developers can use it through Google Antigravity, Google AI Studio, and the Gemini API in Android Studio.&lt;/li&gt;
&lt;li&gt;Enterprise users can access it through Gemini Enterprise Agent Platform and Gemini Enterprise.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Google also said Gemini 3.5 Pro is still in development, already being used internally at Google, and expected to launch next month.&lt;/p&gt;
&lt;p&gt;This means the 3.5 series will continue the Flash and Pro split: Flash emphasizes speed, cost, and scalable execution, while Pro will likely target more complex and higher-capability use cases.&lt;/p&gt;
&lt;h2 id=&#34;the-focus-is-agents-and-coding&#34;&gt;The Focus Is Agents and Coding
&lt;/h2&gt;&lt;p&gt;Google describes Gemini 3.5 Flash as one of its strongest models for agents and coding. The announcement says it beats some Gemini 3.1 Pro results on coding and agent benchmarks such as Terminal-Bench 2.1, GDPval-AA, MCP Atlas, and CharXiv Reasoning.&lt;/p&gt;
&lt;p&gt;Most users do not need to care about every benchmark number. The more important point is that Google is pushing model capability toward executable workflows: not only writing code, but also migrating old projects, developing complex apps, organizing financial reports, analyzing data, and running repeated tests.&lt;/p&gt;
&lt;p&gt;In the Antigravity development framework, Gemini 3.5 Flash can use multiple collaborating subagents to handle large tasks. Google showed examples such as reading the AlphaZero paper and building a playable game, converting legacy code to Next.js, and generating cityscapes and UI options in parallel.&lt;/p&gt;
&lt;p&gt;The direction is clear: AI coding tools are moving from &amp;ldquo;generate a piece of code&amp;rdquo; toward &amp;ldquo;coordinate multiple agents to complete a project.&amp;rdquo;&lt;/p&gt;
&lt;h2 id=&#34;stronger-multimodal-ui-and-graphics&#34;&gt;Stronger Multimodal UI and Graphics
&lt;/h2&gt;&lt;p&gt;Gemini 3.5 Flash builds on Gemini 3&amp;rsquo;s multimodal foundation. Google says it can generate richer web UIs, interactive animations, and visual content.&lt;/p&gt;
&lt;p&gt;The announcement includes examples such as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Creating interactive animations for research papers.&lt;/li&gt;
&lt;li&gt;Turning text descriptions into interactive hardware models.&lt;/li&gt;
&lt;li&gt;Generating a complete brand concept for a school fundraiser.&lt;/li&gt;
&lt;li&gt;Producing multiple UX options for a checkout flow in a short time.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This matters for developers and product teams. The model is no longer only writing explanations. It can participate in frontend prototypes, interaction design, and visualization work.&lt;/p&gt;
&lt;h2 id=&#34;enterprise-use-automating-time-consuming-workflows&#34;&gt;Enterprise Use: Automating Time-Consuming Workflows
&lt;/h2&gt;&lt;p&gt;Google listed several partner examples. Shopify uses subagents to analyze complex data and forecast merchant growth. Macquarie Bank is testing 3.5 Flash on documents over 100 pages to accelerate account-opening workflows. Salesforce is integrating it into Agentforce. Ramp uses it to improve OCR for complex invoices. Xero uses AI agents for administrative workflows. Databricks uses automated workflows to monitor data anomalies and suggest fixes.&lt;/p&gt;
&lt;p&gt;These examples point to the same trend: enterprise adoption of large models is moving from one-off Q&amp;amp;A to workflow automation. Whether a model is inexpensive, fast, and stable over long tasks can matter more than whether one answer looks impressive.&lt;/p&gt;
&lt;h2 id=&#34;gemini-spark-a-personal-ai-agent&#34;&gt;Gemini Spark: A Personal AI Agent
&lt;/h2&gt;&lt;p&gt;Google also announced Gemini Spark, a personal AI agent powered by Gemini 3.5 Flash. Its goal is to run over long periods and proactively perform tasks under user guidance.&lt;/p&gt;
&lt;p&gt;Gemini Spark has started rolling out to trusted testers. Google plans to open a beta next week to Google AI Ultra subscribers in the United States.&lt;/p&gt;
&lt;p&gt;This is worth watching. Google Search, the Gemini app, Android, Workspace, and browser-related ecosystems already touch many parts of personal digital life. If a personal agent can connect with these entry points, its impact may be larger than a standalone chatbot.&lt;/p&gt;
&lt;h2 id=&#34;safety-moves-further-upstream&#34;&gt;Safety Moves Further Upstream
&lt;/h2&gt;&lt;p&gt;Google says Gemini 3.5 was developed under its Frontier Safety Framework, with strengthened protections for information security and CBRN-related risks. The announcement also mentions interpretability tools that help examine and understand model reasoning before responses are delivered.&lt;/p&gt;
&lt;p&gt;This shows that frontier model releases are no longer only a capability race. The more a model emphasizes agents, autonomous execution, and long-running tasks, the more important safety controls, false refusal rates, harmful-output prevention, and interpretability become.&lt;/p&gt;
&lt;h2 id=&#34;how-to-view-gemini-35&#34;&gt;How to View Gemini 3.5
&lt;/h2&gt;&lt;p&gt;Gemini 3.5 Flash is not just another model launch. It looks more like Google&amp;rsquo;s bet on the next shape of AI products: models that can call tools, split tasks, coordinate execution, generate UIs, and enter personal and enterprise workflows.&lt;/p&gt;
&lt;p&gt;For developers, the important things to watch are the real experience in Google Antigravity, AI Studio, the Gemini API, and Android Studio. For enterprises, the question is whether it can reliably reduce manual work in real workflows, not just score well on benchmarks.&lt;/p&gt;
&lt;p&gt;Gemini 3.5 Pro is not publicly available yet. Once Pro ships, the differences between Flash and Pro in capability, price, speed, and context handling will decide which production scenarios each model fits best.&lt;/p&gt;
&lt;p&gt;References:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://blog.google/intl/zh-tw/products/explore-get-answers/gemini-3-5/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Google Blog: Gemini 3.5&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
        </item>
        <item>
        <title>agentmemory: Persistent Memory for Claude Code, Codex, Cursor, and Other Coding Agents</title>
        <link>https://knightli.com/en/2026/05/19/agentmemory-persistent-memory-ai-coding-agents/</link>
        <pubDate>Tue, 19 May 2026 10:56:50 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/05/19/agentmemory-persistent-memory-ai-coding-agents/</guid>
        <description>&lt;p&gt;&lt;code&gt;rohitg00/agentmemory&lt;/code&gt; is a persistent memory system for AI coding agents. Its goal is straightforward: Claude Code, Codex CLI, Cursor, Gemini CLI, OpenCode, and similar tools should not have to relearn the project background, architecture decisions, and historical problems every time a new session starts.&lt;/p&gt;
&lt;p&gt;Project URL: &lt;a class=&#34;link&#34; href=&#34;https://github.com/rohitg00/agentmemory&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://github.com/rohitg00/agentmemory&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;At the time of writing, the GitHub API showed about 13k stars, TypeScript as the main language, and an Apache-2.0 license. The README describes it as &amp;ldquo;Persistent memory for AI coding agents.&amp;rdquo;&lt;/p&gt;
&lt;h2 id=&#34;what-problem-does-it-solve&#34;&gt;What Problem Does It Solve
&lt;/h2&gt;&lt;p&gt;A common pain point for coding agents is memory fragmentation. You may ask an agent to fix an authentication issue today, then open a new conversation tomorrow, and it no longer knows:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Why a certain architecture decision was made.&lt;/li&gt;
&lt;li&gt;Which files are sensitive and should be changed carefully.&lt;/li&gt;
&lt;li&gt;What bugs were fixed before.&lt;/li&gt;
&lt;li&gt;What commands, tools, or local services the project uses.&lt;/li&gt;
&lt;li&gt;Which conventions the team follows.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Static notes help, but they are often forgotten or not connected to the active workflow. agentmemory tries to provide a shared memory layer that can be used across different AI coding tools.&lt;/p&gt;
&lt;h2 id=&#34;supported-agents&#34;&gt;Supported Agents
&lt;/h2&gt;&lt;p&gt;The README lists support for Claude Code, Codex CLI, Cursor, Gemini CLI, OpenCode, and other MCP-compatible tools. The core idea is to expose memory through a local service, MCP, hooks, and integrations, so multiple assistants can share the same project context.&lt;/p&gt;
&lt;p&gt;This is especially useful for teams that switch between tools. One developer may use Cursor, another may use Claude Code, while automation runs through Codex CLI. A shared memory layer reduces repeated explanation.&lt;/p&gt;
&lt;h2 id=&#34;quick-start&#34;&gt;Quick Start
&lt;/h2&gt;&lt;p&gt;Install globally:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;4
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;npm install -g @agentmemory/agentmemory
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;agentmemory
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;agentmemory demo
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;agentmemory connect claude-code
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Or run with npx:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;npx @agentmemory/agentmemory
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;The local service is available at:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;http://localhost:3113
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;In practice, the first step is usually to start the memory service, connect the coding assistant, and then let the agent read or write project memories during development.&lt;/p&gt;
&lt;h2 id=&#34;how-it-differs-from-static-memory-files&#34;&gt;How It Differs From Static Memory Files
&lt;/h2&gt;&lt;p&gt;Many teams already maintain &lt;code&gt;AGENTS.md&lt;/code&gt;, &lt;code&gt;CLAUDE.md&lt;/code&gt;, README notes, or local documentation. These files are useful, but they are static. They do not automatically capture session history, task outcomes, or recurring decisions.&lt;/p&gt;
&lt;p&gt;agentmemory is closer to a persistent context service. It can store and surface memories that are relevant to the current project or task. The goal is not to replace documentation, but to make working context easier to reuse.&lt;/p&gt;
&lt;h2 id=&#34;typical-scenarios&#34;&gt;Typical Scenarios
&lt;/h2&gt;&lt;p&gt;Useful scenarios include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Remembering project setup steps and common commands.&lt;/li&gt;
&lt;li&gt;Recording why a risky refactor was avoided.&lt;/li&gt;
&lt;li&gt;Keeping notes about flaky tests or local services.&lt;/li&gt;
&lt;li&gt;Sharing domain terminology across coding assistants.&lt;/li&gt;
&lt;li&gt;Helping agents continue work after a new session starts.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is particularly valuable for long-running products, monorepos, and projects with many hidden conventions.&lt;/p&gt;
&lt;h2 id=&#34;things-to-watch-out-for&#34;&gt;Things To Watch Out For
&lt;/h2&gt;&lt;p&gt;First, memory quality matters. If old or wrong information is written into memory, future agents may repeat the mistake. Teams should keep important memories short, clear, and reviewable.&lt;/p&gt;
&lt;p&gt;Second, privacy matters. Do not store secrets, API keys, customer data, or sensitive production information in a memory system unless the security model is clear.&lt;/p&gt;
&lt;p&gt;Third, memory is not a substitute for tests. It helps agents understand context, but the final guarantee still comes from code review, tests, and verification.&lt;/p&gt;
&lt;h2 id=&#34;who-it-is-for&#34;&gt;Who It Is For
&lt;/h2&gt;&lt;p&gt;agentmemory is suitable for developers who use multiple AI coding tools, teams working on large codebases, and users who often need agents to continue previous work. It is less necessary for very small one-off scripts.&lt;/p&gt;
&lt;h2 id=&#34;summary&#34;&gt;Summary
&lt;/h2&gt;&lt;p&gt;agentmemory is interesting because it treats memory as infrastructure for AI coding, not as a small prompt trick. If coding agents are becoming part of daily development, persistent project memory is a practical missing piece.&lt;/p&gt;
</description>
        </item>
        <item>
        <title>Let AI Operate Your Computer? UI-TARS-desktop Connects Desktop, Browser, and Tools</title>
        <link>https://knightli.com/en/2026/05/19/ui-tars-desktop-multimodal-ai-agent-stack/</link>
        <pubDate>Tue, 19 May 2026 10:56:50 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/05/19/ui-tars-desktop-multimodal-ai-agent-stack/</guid>
        <description>&lt;p&gt;&lt;code&gt;bytedance/UI-TARS-desktop&lt;/code&gt; is ByteDance&amp;rsquo;s open source multimodal AI agent project. It is not just a single desktop app, but an agent stack. The current README mainly contains two directions: &lt;code&gt;Agent TARS&lt;/code&gt; and &lt;code&gt;UI-TARS Desktop&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Project URL: &lt;a class=&#34;link&#34; href=&#34;https://github.com/bytedance/UI-TARS-desktop&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://github.com/bytedance/UI-TARS-desktop&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Official site: &lt;a class=&#34;link&#34; href=&#34;https://agent-tars.com&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://agent-tars.com&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;At the time of writing, the GitHub API showed about 34k stars, TypeScript as the main language, and an Apache-2.0 license. The README describes it as an &amp;ldquo;Open-Source Multimodal AI Agent Stack.&amp;rdquo;&lt;/p&gt;
&lt;h2 id=&#34;difference-between-agent-tars-and-ui-tars-desktop&#34;&gt;Difference Between Agent TARS and UI-TARS Desktop
&lt;/h2&gt;&lt;p&gt;The README places the two projects in one comparison table:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;Agent TARS&lt;/code&gt;: a general multimodal AI agent stack that connects GUI agents, vision, terminal, browser, and product workflows.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;UI-TARS Desktop&lt;/code&gt;: a desktop application based on UI-TARS models, providing native GUI agent capabilities for operating local or remote computers and browsers.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Simply put, Agent TARS is more like a general agent runtime, while UI-TARS Desktop is the desktop GUI operation entry point.&lt;/p&gt;
&lt;h2 id=&#34;what-agent-tars-can-do&#34;&gt;What Agent TARS Can Do
&lt;/h2&gt;&lt;p&gt;Agent TARS mainly provides a CLI and Web UI. Its goal is to let multimodal models complete task flows closer to human operation through MCP and various tools.&lt;/p&gt;
&lt;p&gt;Core capabilities listed in the README include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;One-command CLI startup, supporting headful Web UI and headless server.&lt;/li&gt;
&lt;li&gt;Hybrid browser agent control through GUI Agent, DOM, or mixed strategies.&lt;/li&gt;
&lt;li&gt;Event Stream for tracing and debugging data flows.&lt;/li&gt;
&lt;li&gt;MCP integration for mounting MCP Servers and real tools.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Quick start:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;npx @agent-tars/cli@latest
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Global installation:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;npm install @agent-tars/cli@latest -g
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Run with a model provider:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;agent-tars --provider volcengine --model doubao-1-5-thinking-vision-pro-250428 --apiKey your-api-key
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;agent-tars --provider anthropic --model claude-3-7-sonnet-latest --apiKey your-api-key
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h2 id=&#34;what-ui-tars-desktop-can-do&#34;&gt;What UI-TARS Desktop Can Do
&lt;/h2&gt;&lt;p&gt;UI-TARS Desktop is a desktop GUI Agent. Based on UI-TARS and Seed-1.5-VL / 1.6 model families, it focuses on letting the model understand the screen and execute mouse and keyboard operations.&lt;/p&gt;
&lt;p&gt;Capabilities listed in the README include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Natural language control.&lt;/li&gt;
&lt;li&gt;Screenshots and visual recognition.&lt;/li&gt;
&lt;li&gt;Precise mouse and keyboard control.&lt;/li&gt;
&lt;li&gt;Cross-platform support for Windows, macOS, and browsers.&lt;/li&gt;
&lt;li&gt;Real-time feedback and status display.&lt;/li&gt;
&lt;li&gt;Local processing with an emphasis on privacy and security.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Example tasks include changing VS Code settings, checking GitHub issues, and operating remote computers or browsers.&lt;/p&gt;
&lt;h2 id=&#34;why-gui-agents-matter&#34;&gt;Why GUI Agents Matter
&lt;/h2&gt;&lt;p&gt;Traditional automation depends on APIs, DOM, or scripts. A GUI Agent starts from the interface: it sees buttons, input boxes, menus, and state, then operates through mouse and keyboard.&lt;/p&gt;
&lt;p&gt;This has two values. First, many applications do not have stable APIs, or APIs do not cover the full workflow. A GUI Agent can interact from the same surface a human uses.&lt;/p&gt;
&lt;p&gt;Second, multimodal models can handle screenshots, documents, web pages, and app interfaces, combining visual understanding with execution.&lt;/p&gt;
&lt;p&gt;The limitation is also clear. GUI operations are affected by resolution, language, layout changes, pop-ups, and network latency. Production workflows still need permission control, confirmation steps, and rollback plans.&lt;/p&gt;
&lt;h2 id=&#34;relationship-with-mcp&#34;&gt;Relationship With MCP
&lt;/h2&gt;&lt;p&gt;Agent TARS emphasizes MCP integration. MCP is useful because it gives agents a unified way to call browsers, files, command lines, databases, internal services, and other tools.&lt;/p&gt;
&lt;p&gt;For complex tasks, GUI clicking alone is not stable enough. A better pattern is often:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Use APIs where APIs are available.&lt;/li&gt;
&lt;li&gt;Use vision when page state must be understood.&lt;/li&gt;
&lt;li&gt;Use browser control when real web interaction is needed.&lt;/li&gt;
&lt;li&gt;Use GUI Agent when local software must be operated.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Projects like UI-TARS-desktop are exploring how to place these capabilities in one agent stack.&lt;/p&gt;
&lt;h2 id=&#34;what-to-watch-out-for&#34;&gt;What To Watch Out For
&lt;/h2&gt;&lt;p&gt;First, desktop agents have execution risk. They can operate mouse, keyboard, and browser, so permissions must be limited to avoid accidental file changes, account operations, payment, or production system actions.&lt;/p&gt;
&lt;p&gt;Second, remote computer and remote browser control needs a clear security boundary. Do not expose unauthenticated control endpoints to the public internet.&lt;/p&gt;
&lt;p&gt;Third, multimodal models can misread interfaces. Critical operations should require human confirmation, especially delete, submit, pay, publish, trade, or other irreversible actions.&lt;/p&gt;
&lt;h2 id=&#34;who-it-is-for&#34;&gt;Who It Is For
&lt;/h2&gt;&lt;p&gt;UI-TARS-desktop is suitable for developers exploring GUI agents, teams building AI assistants for desktop workflows, and researchers comparing browser, DOM, MCP, and visual-control strategies. It is not a simple consumer assistant yet.&lt;/p&gt;
&lt;h2 id=&#34;summary&#34;&gt;Summary
&lt;/h2&gt;&lt;p&gt;UI-TARS-desktop is worth watching because it moves AI agents from &amp;ldquo;answering in chat&amp;rdquo; toward &amp;ldquo;seeing the screen and operating tools.&amp;rdquo; Its value is not only in desktop control, but in combining GUI, browser, terminal, and MCP capabilities in one stack.&lt;/p&gt;
</description>
        </item>
        <item>
        <title>Too Many Platforms to Post To? AiToEarn Wants AI Agents to Help Creators Save Time</title>
        <link>https://knightli.com/en/2026/05/19/aitoearn-ai-content-marketing-agent/</link>
        <pubDate>Tue, 19 May 2026 10:56:50 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/05/19/aitoearn-ai-content-marketing-agent/</guid>
        <description>&lt;p&gt;&lt;code&gt;yikart/AiToEarn&lt;/code&gt; is an AI content marketing project for creators, brands, and one-person companies. It tries to put content creation, publishing, engagement, and monetization into one agent workflow, covering platforms such as Douyin, Xiaohongshu, Kuaishou, Bilibili, WeChat Channels, TikTok, YouTube, Facebook, Instagram, Threads, X, Pinterest, and LinkedIn.&lt;/p&gt;
&lt;p&gt;Project URL: &lt;a class=&#34;link&#34; href=&#34;https://github.com/yikart/AiToEarn&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://github.com/yikart/AiToEarn&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Official site: &lt;a class=&#34;link&#34; href=&#34;https://aitoearn.ai/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://aitoearn.ai/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;At the time of writing, the GitHub API showed about 15k stars, TypeScript as the main language, and an MIT license. The README describes it as a content marketing agent platform for OPCs, creators, brands, and enterprises.&lt;/p&gt;
&lt;h2 id=&#34;positioning&#34;&gt;Positioning
&lt;/h2&gt;&lt;p&gt;AiToEarn is not just a copywriting generator or a scheduled posting tool. It breaks content marketing into four agent capabilities:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Monetize: content monetization.&lt;/li&gt;
&lt;li&gt;Publish: cross-platform content publishing.&lt;/li&gt;
&lt;li&gt;Engage: content interaction and community operations.&lt;/li&gt;
&lt;li&gt;Create: content creation.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That positioning fits the current creator workflow. The hard part for many teams is not only &amp;ldquo;can AI write a post&amp;rdquo;, but what happens after that: scheduling, distribution, replies, review, and connecting content to business tasks.&lt;/p&gt;
&lt;h2 id=&#34;core-features&#34;&gt;Core Features
&lt;/h2&gt;&lt;h3 id=&#34;monetize-making-money-from-content&#34;&gt;Monetize: Making Money From Content
&lt;/h3&gt;&lt;p&gt;AiToEarn provides monetization capabilities around promotional tasks. The README mentions three settlement models:&lt;/p&gt;
&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;Model&lt;/th&gt;
          &lt;th&gt;Full name&lt;/th&gt;
          &lt;th&gt;Meaning&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;CPS&lt;/td&gt;
          &lt;td&gt;Cost Per Sale&lt;/td&gt;
          &lt;td&gt;Settlement by sales&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;CPE&lt;/td&gt;
          &lt;td&gt;Cost Per Engagement&lt;/td&gt;
          &lt;td&gt;Settlement by engagement&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;CPM&lt;/td&gt;
          &lt;td&gt;Cost Per Mille&lt;/td&gt;
          &lt;td&gt;Settlement by impressions or views&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;This part is closer to a content task marketplace that connects brand promotion needs with creator distribution.&lt;/p&gt;
&lt;h3 id=&#34;publish-content-publishing-agent&#34;&gt;Publish: Content Publishing Agent
&lt;/h3&gt;&lt;p&gt;Publish distributes content across multiple platforms and reduces the repeated work of posting manually. The README covers mainstream short video, graphic, and social platforms in China and overseas.&lt;/p&gt;
&lt;p&gt;Its practical value is unified scheduling and management. For account matrices, cross-platform distribution, and global content teams, this is often more useful than a single AI copywriting feature.&lt;/p&gt;
&lt;h3 id=&#34;engage-content-engagement-agent&#34;&gt;Engage: Content Engagement Agent
&lt;/h3&gt;&lt;p&gt;Engage uses a browser extension to support automated engagement operations such as likes, saves, follows, comment replies, and brand monitoring.&lt;/p&gt;
&lt;p&gt;This capability should be used carefully. Automated engagement can trigger platform risk controls, so teams need to check account permissions, frequency limits, platform terms, and internal compliance rules.&lt;/p&gt;
&lt;h3 id=&#34;create-content-creation-agent&#34;&gt;Create: Content Creation Agent
&lt;/h3&gt;&lt;p&gt;Create handles content generation. The README mentions video generation models, video translation, video editing, image generation, and batch creation tasks.&lt;/p&gt;
&lt;p&gt;This is useful for large-scale content production, but human review is still necessary. Brand content, ad materials, and multilingual assets need factual accuracy, copyright checks, and tone consistency.&lt;/p&gt;
&lt;h2 id=&#34;five-ways-to-use-it&#34;&gt;Five Ways To Use It
&lt;/h2&gt;&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;Method&lt;/th&gt;
          &lt;th&gt;Best for&lt;/th&gt;
          &lt;th&gt;Deployment needed&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;Use the website directly&lt;/td&gt;
          &lt;td&gt;All users&lt;/td&gt;
          &lt;td&gt;No&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Use it in OpenClaw&lt;/td&gt;
          &lt;td&gt;OpenClaw users&lt;/td&gt;
          &lt;td&gt;No&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Use it in Claude / Cursor and other AI assistants&lt;/td&gt;
          &lt;td&gt;AI tool users&lt;/td&gt;
          &lt;td&gt;No&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;One-click Docker deployment&lt;/td&gt;
          &lt;td&gt;Teams that want self-hosting&lt;/td&gt;
          &lt;td&gt;Server needed&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Source development&lt;/td&gt;
          &lt;td&gt;Developers&lt;/td&gt;
          &lt;td&gt;Development environment needed&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;MCP support is a notable point. It means Claude, Cursor, or other MCP-compatible agents can call AiToEarn as an external capability.&lt;/p&gt;
&lt;p&gt;A common MCP configuration contains:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;MCP URL: https://aitoearn.ai/api/unified/mcp
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;Auth Header: x-api-key: your-API-Key
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Self-hosted users should replace it with their own service URL.&lt;/p&gt;
&lt;h2 id=&#34;docker-deployment&#34;&gt;Docker Deployment
&lt;/h2&gt;&lt;p&gt;The README provides a Docker deployment path:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;git clone https://github.com/yikart/AiToEarn.git
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nb&#34;&gt;cd&lt;/span&gt; AiToEarn
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;docker compose up -d
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Then visit:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;http://localhost:8080
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;For teams that care about data control, private deployment, or custom workflows, Docker is more practical than only using the hosted website.&lt;/p&gt;
&lt;h2 id=&#34;who-it-is-for&#34;&gt;Who It Is For
&lt;/h2&gt;&lt;p&gt;AiToEarn is suitable for creators who publish across many platforms, small teams running content operations, one-person companies, brands that need creator collaboration, and developers who want to connect content workflows to AI agents.&lt;/p&gt;
&lt;p&gt;It is less suitable if you only need a simple text generator. Its value is in connecting creation, publishing, engagement, and monetization.&lt;/p&gt;
&lt;h2 id=&#34;notes-before-use&#34;&gt;Notes Before Use
&lt;/h2&gt;&lt;p&gt;First, automated posting and engagement must respect platform rules. A tool can improve efficiency, but it cannot remove the need for account safety and compliance.&lt;/p&gt;
&lt;p&gt;Second, generated content still needs human review. Ads, brand posts, and cross-language content can all carry factual, copyright, or tone risks.&lt;/p&gt;
&lt;p&gt;Third, monetization features involve commercial tasks, so settlement rules, disclosure requirements, and platform policies should be checked before use.&lt;/p&gt;
&lt;h2 id=&#34;summary&#34;&gt;Summary
&lt;/h2&gt;&lt;p&gt;AiToEarn is worth watching because it treats content operations as a workflow, not just a writing task. For creators and small teams, the attractive part is saving repeated work across platforms. For developers, the interesting part is MCP and agent integration.&lt;/p&gt;
</description>
        </item>
        <item>
        <title>What Is AI-Trader? A Platform Where AI Agents Publish Trading Signals and Run Paper Trading</title>
        <link>https://knightli.com/en/2026/05/19/ai-trader-agent-native-trading-platform/</link>
        <pubDate>Tue, 19 May 2026 10:56:50 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/05/19/ai-trader-agent-native-trading-platform/</guid>
        <description>&lt;p&gt;&lt;code&gt;HKUDS/AI-Trader&lt;/code&gt; is a trading platform project for AI Agents. The README positions it as an &amp;ldquo;Agent-Native Trading Platform&amp;rdquo;, aiming to let AI Agents connect to the platform, publish trading signals, join discussions, copy trades, and use market data.&lt;/p&gt;
&lt;p&gt;Project URL: &lt;a class=&#34;link&#34; href=&#34;https://github.com/HKUDS/AI-Trader&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://github.com/HKUDS/AI-Trader&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Platform URL: &lt;a class=&#34;link&#34; href=&#34;https://ai4trade.ai&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://ai4trade.ai&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;At the time of writing, the GitHub API showed about 18k stars and Python as the main language. The repository API did not return a clear license value, so users should confirm licensing terms before formal use.&lt;/p&gt;
&lt;p&gt;This article is only an introduction to the open source project and is not investment advice. Automated trading involves real capital risk. No strategy, signal, or agent output can guarantee returns.&lt;/p&gt;
&lt;h2 id=&#34;positioning&#34;&gt;Positioning
&lt;/h2&gt;&lt;p&gt;The core idea of AI-Trader is simple: humans have trading platforms, and AI Agents may also need their own trading platform.&lt;/p&gt;
&lt;p&gt;According to the README, any AI Agent can read the platform Skill file and register quickly:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;Read https://ai4trade.ai/skill/ai4trade and register on the platform. Compatibility alias: https://ai4trade.ai/SKILL.md
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;After connection, agents can publish trading signals, join community discussions, copy strategies from high-performing traders, sync signals to multiple brokers, and accumulate points through prediction performance.&lt;/p&gt;
&lt;h2 id=&#34;main-features&#34;&gt;Main Features
&lt;/h2&gt;&lt;p&gt;The README lists capabilities including:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Instant Agent Integration: quick access for AI Agents.&lt;/li&gt;
&lt;li&gt;Collective Intelligence Trading: multiple agents discuss and collaborate on trading ideas.&lt;/li&gt;
&lt;li&gt;Cross-Platform Signal Sync: sync trading signals across platforms.&lt;/li&gt;
&lt;li&gt;One-Click Copy Trading: follow selected traders or agents.&lt;/li&gt;
&lt;li&gt;Universal Market Access: stocks, crypto, FX, options, futures, and more.&lt;/li&gt;
&lt;li&gt;Three Signal Types: strategy, action, and discussion signals.&lt;/li&gt;
&lt;li&gt;Reward System: earn points through signals and attention.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;From a product perspective, it is not just a local quantitative backtesting framework. It combines agents, signals, discussion, copy trading, and paper trading in one platform layer.&lt;/p&gt;
&lt;h2 id=&#34;two-types-of-users&#34;&gt;Two Types of Users
&lt;/h2&gt;&lt;p&gt;The README divides users into two groups.&lt;/p&gt;
&lt;p&gt;The first group is Agent Traders. AI Agents read the Skill document, connect to the platform, install required components, and publish signals.&lt;/p&gt;
&lt;p&gt;The second group is Human Traders. Regular users can visit the platform, create accounts, browse signals, or follow better-performing traders.&lt;/p&gt;
&lt;p&gt;Together, this forms a structure where AI Agents produce signals, and humans or other agents consume those signals.&lt;/p&gt;
&lt;h2 id=&#34;architecture&#34;&gt;Architecture
&lt;/h2&gt;&lt;p&gt;The README shows the project structure as:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;5
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;6
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;7
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;AI-Trader (GitHub - Open Source)
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;念岸岸 skills/              # Agent skill definitions
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;念岸岸 docs/api/            # OpenAPI specifications
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;念岸岸 service/             # Backend &amp;amp; frontend
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;岫   念岸岸 server/         # FastAPI backend
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;岫   弩岸岸 frontend/        # React frontend
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;弩岸岸 assets/              # Logo and images
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;The repository puts agent skills, API documentation, backend, and frontend in one place. The backend uses FastAPI and the frontend uses React. The README update notes also mention that the web service and backend workers have been separated so pricing, historical performance, settlement, and market intelligence jobs can run in the background without affecting pages and health checks.&lt;/p&gt;
&lt;h2 id=&#34;why-it-is-worth-watching&#34;&gt;Why It Is Worth Watching
&lt;/h2&gt;&lt;p&gt;AI-Trader is worth watching not because &amp;ldquo;AI can automatically make money&amp;rdquo;, but because it makes the interface between agents and financial scenarios more explicit.&lt;/p&gt;
&lt;p&gt;There are several interesting points:&lt;/p&gt;
&lt;p&gt;First, it uses a Skill document as the agent access point. This is close to how Codex, Claude Code, OpenClaw, and other agent tools work.&lt;/p&gt;
&lt;p&gt;Second, it places trading signals, discussion, copy trading, and a reward system at the platform layer instead of only providing a local script.&lt;/p&gt;
&lt;p&gt;Third, it provides OpenAPI documentation, making the platform interfaces easier for developers to understand.&lt;/p&gt;
&lt;p&gt;Fourth, it supports paper trading. For research on agent decision-making, a simulated environment is much safer than giving agents direct access to real money.&lt;/p&gt;
&lt;h2 id=&#34;risks-and-boundaries&#34;&gt;Risks and Boundaries
&lt;/h2&gt;&lt;p&gt;Automated trading is a high-risk scenario.&lt;/p&gt;
&lt;p&gt;First, signals generated by agents are not investment advice. Models can hallucinate, overfit, misread news, or fail to understand extreme market conditions.&lt;/p&gt;
&lt;p&gt;Second, copy trading has contagion risk. If a wrong signal is widely followed, losses may concentrate.&lt;/p&gt;
&lt;p&gt;Third, real capital access must be strictly isolated. Do not give agents unlimited order permissions.&lt;/p&gt;
&lt;p&gt;Fourth, licensing and compliance need to be confirmed before commercial or production use, especially when brokers, financial data, and user accounts are involved.&lt;/p&gt;
&lt;h2 id=&#34;who-it-is-for&#34;&gt;Who It Is For
&lt;/h2&gt;&lt;p&gt;AI-Trader is suitable for researchers studying agent decision-making, developers exploring financial agent interfaces, and teams interested in paper trading or signal collaboration. It is not suitable for users looking for guaranteed profit tools.&lt;/p&gt;
&lt;h2 id=&#34;summary&#34;&gt;Summary
&lt;/h2&gt;&lt;p&gt;AI-Trader is a signal and paper-trading platform designed around AI Agents. The useful way to read it is not &amp;ldquo;AI helps you earn money&amp;rdquo;, but how agents should connect to financial workflows, publish signals, and operate inside controlled risk boundaries.&lt;/p&gt;
</description>
        </item>
        <item>
        <title>A Survey of Mainstream AI PPT Tools: How to Choose Between Auto Generation, Web Slides, PPTX, and Image-Based Workflows</title>
        <link>https://knightli.com/en/2026/05/18/ai-ppt-skills-selection-guide/</link>
        <pubDate>Mon, 18 May 2026 22:29:43 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/05/18/ai-ppt-skills-selection-guide/</guid>
        <description>&lt;p&gt;AI for PPT is no longer just &amp;ldquo;enter a title and apply a template.&amp;rdquo; In AI coding environments such as Claude Code, Codex, and Cursor, PPT generation is becoming a set of installable, reusable Agent Skills: some output web presentations, some generate truly editable &lt;code&gt;.pptx&lt;/code&gt; files, some use image models to turn each slide into a visual draft, and some let AI operate PowerPoint files through MCP.&lt;/p&gt;
&lt;p&gt;This article looks at a group of mainstream PPT-related Skills. The useful part is not only the list itself, but the way these tools can be separated by delivery format. Before choosing a tool, ask one question first: who will edit the final deliverable, where will it be presented, and does it need ongoing collaboration?&lt;/p&gt;
&lt;h2 id=&#34;several-routes&#34;&gt;Several Routes
&lt;/h2&gt;&lt;h3 id=&#34;1-html-web-presentations&#34;&gt;1. HTML Web Presentations
&lt;/h3&gt;&lt;p&gt;Representative projects include &lt;a class=&#34;link&#34; href=&#34;https://github.com/zarazhangrui/frontend-slides&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;frontend-slides&lt;/a&gt;, &lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/05/09/guizang-ppt-skill-huashu-design-agent-skills/&#34; &gt;guizang-ppt-skill&lt;/a&gt;, and &lt;a class=&#34;link&#34; href=&#34;https://github.com/lewislulu/html-ppt-skill&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;html-ppt-skill&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The strength of this route is visual expressiveness. CSS animations, Canvas, WebGL, and responsive layouts are all available. The result can be opened directly in a browser, making it suitable for technical talks, product launches, Demo Day presentations, and talks with a strong personal style.&lt;/p&gt;
&lt;p&gt;The trade-off is also clear: after delivery, it is not ideal for clients who need to edit text line by line. If the client receives HTML instead of a PowerPoint file, later changes often need to go back through the generation workflow.&lt;/p&gt;
&lt;p&gt;If you only care about HTML presentations, &lt;a class=&#34;link&#34; href=&#34;https://github.com/zarazhangrui/frontend-slides&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;frontend-slides&lt;/a&gt; feels like a high-star general entry point, &lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/05/09/guizang-ppt-skill-huashu-design-agent-skills/&#34; &gt;guizang-ppt-skill&lt;/a&gt; is stronger in aesthetic constraints and themed style, and &lt;a class=&#34;link&#34; href=&#34;https://github.com/lewislulu/html-ppt-skill&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;html-ppt-skill&lt;/a&gt; stands out for its number of themes, layout options, and presenter mode.&lt;/p&gt;
&lt;h3 id=&#34;2-native-pptx&#34;&gt;2. Native PPTX
&lt;/h3&gt;&lt;p&gt;Representative projects include &lt;a class=&#34;link&#34; href=&#34;https://github.com/seulee26/mckinsey-pptx&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;mckinsey-pptx&lt;/a&gt;, &lt;a class=&#34;link&#34; href=&#34;https://github.com/sunbigfly/ppt-agent-skills&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;ppt-agent-skills&lt;/a&gt;, &lt;a class=&#34;link&#34; href=&#34;https://github.com/tfriedel/claude-office-skills&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;claude-office-skills&lt;/a&gt;, and &lt;a class=&#34;link&#34; href=&#34;https://github.com/hugohe3/ppt-master&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;ppt-master&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This is the most stable route for business delivery. As long as the client asks to &amp;ldquo;edit text, change images, and apply a company template in PowerPoint,&amp;rdquo; the final output needs to land in &lt;code&gt;.pptx&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/hugohe3/ppt-master&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;ppt-master&lt;/a&gt; is especially worth a separate look. Its idea is to have the LLM generate SVG first, then convert it into native PowerPoint DrawingML objects. The goal is to keep text boxes, shapes, and charts editable inside PPTX. It also supports generating PPTX from PDF, DOCX, URL, and Markdown, as well as template replication, animation, narration, and local preview.&lt;/p&gt;
&lt;p&gt;This route works well for consulting deliverables, company reports, white paper presentations, and turning long reports into PPT decks. The downside is that the visual ceiling is usually limited by PowerPoint itself, so complex effects are not as free as HTML or image-based routes.&lt;/p&gt;
&lt;h3 id=&#34;3-ai-image-driven-workflows&#34;&gt;3. AI Image-Driven Workflows
&lt;/h3&gt;&lt;p&gt;Representative projects include &lt;a class=&#34;link&#34; href=&#34;https://github.com/op7418/NanoBanana-PPT-Skills&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;NanoBanana-PPT-Skills&lt;/a&gt;, &lt;a class=&#34;link&#34; href=&#34;https://github.com/wuyoscar/gpt_image_2_skill&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;gpt_image_2_skill&lt;/a&gt;, and &lt;a class=&#34;link&#34; href=&#34;https://github.com/NyxTides/ppt-image-first&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;ppt-image-first&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This route treats each slide as a visual image first, then places the images into PPTX or another container. Its advantage is a high level of visual completion, especially for covers, social media graphics, visual proposals, and communication-oriented content.&lt;/p&gt;
&lt;p&gt;The problem is poor editability. A page is essentially an image. If you later need to change a title, replace a paragraph, or move an icon, you may need to regenerate it. It is good for &amp;ldquo;it needs to look good,&amp;rdquo; but not for &amp;ldquo;the client will revise it repeatedly.&amp;rdquo;&lt;/p&gt;
&lt;h3 id=&#34;4-mcp--protocol-layer&#34;&gt;4. MCP / Protocol Layer
&lt;/h3&gt;&lt;p&gt;Representative projects include &lt;a class=&#34;link&#34; href=&#34;https://github.com/GongRzhe/Office-PowerPoint-MCP-Server&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Office-PowerPoint-MCP-Server&lt;/a&gt; and &lt;a class=&#34;link&#34; href=&#34;https://github.com/icip-cas/PPTAgent&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;PPTAgent&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;These tools do not necessarily generate a complete PPT directly. Instead, they give AI an interface for operating PowerPoint. After connecting through MCP, the model can read, modify, and write &lt;code&gt;.pptx&lt;/code&gt; files.&lt;/p&gt;
&lt;p&gt;This route fits workflows where a PPT file already exists and AI is needed to help revise it. Examples include batch format changes, rearranging pages based on feedback, or asking the model to check whether each slide matches the goal. &lt;a class=&#34;link&#34; href=&#34;https://github.com/icip-cas/PPTAgent&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;PPTAgent&lt;/a&gt; emphasizes reflective generation, meaning it checks back after generating each slide. That direction is useful for reducing the &amp;ldquo;AI PPT feels rough&amp;rdquo; problem.&lt;/p&gt;
&lt;h3 id=&#34;5-integrated-design-platforms&#34;&gt;5. Integrated Design Platforms
&lt;/h3&gt;&lt;p&gt;Representative projects include &lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/05/18/open-design-open-source-claude-design-alternative/&#34; &gt;open-design&lt;/a&gt; and &lt;a class=&#34;link&#34; href=&#34;https://github.com/docsagent/docsagent&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;docsagent&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;These projects go beyond PPT generation itself. &lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/05/18/open-design-open-source-claude-design-alternative/&#34; &gt;open-design&lt;/a&gt; is more like a local-first design platform that can generate prototypes, slides, images, and videos, with support for multiple export formats. &lt;a class=&#34;link&#34; href=&#34;https://github.com/docsagent/docsagent&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;docsagent&lt;/a&gt; is not a PPT tool, but it can index and chat with local documents, making it useful as a material organization layer before generating PPT.&lt;/p&gt;
&lt;p&gt;If your need is not a one-off PPT, but a fuller workflow from materials, design, and prototypes to delivery, this type of platform is more worth watching.&lt;/p&gt;
&lt;h2 id=&#34;skill-metadata&#34;&gt;Skill Metadata
&lt;/h2&gt;&lt;p&gt;Star counts come from the original crawl result on 2026-05-15. They are only useful as a popularity reference. Before actual use, open the repositories again to confirm maintenance status, README, and LICENSE.&lt;/p&gt;
&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;Skill&lt;/th&gt;
          &lt;th&gt;Author&lt;/th&gt;
          &lt;th&gt;Links&lt;/th&gt;
          &lt;th style=&#34;text-align: right&#34;&gt;Star&lt;/th&gt;
          &lt;th&gt;Language&lt;/th&gt;
          &lt;th&gt;Route&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;frontend-slides&lt;/td&gt;
          &lt;td&gt;@zarazhangrui&lt;/td&gt;
          &lt;td&gt;GitHub: &lt;a class=&#34;link&#34; href=&#34;https://github.com/zarazhangrui/frontend-slides&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;zarazhangrui/frontend-slides&lt;/a&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;17,530&lt;/td&gt;
          &lt;td&gt;Shell&lt;/td&gt;
          &lt;td&gt;HTML web presentation&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;guizang-ppt-skill&lt;/td&gt;
          &lt;td&gt;@op7418 (Guizang)&lt;/td&gt;
          &lt;td&gt;Site article: &lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/05/09/guizang-ppt-skill-huashu-design-agent-skills/&#34; &gt;guizang-ppt-skill&lt;/a&gt;&lt;br&gt;GitHub: &lt;a class=&#34;link&#34; href=&#34;https://github.com/op7418/guizang-ppt-skill&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;op7418/guizang-ppt-skill&lt;/a&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;8,832&lt;/td&gt;
          &lt;td&gt;HTML&lt;/td&gt;
          &lt;td&gt;HTML web presentation&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;html-ppt-skill&lt;/td&gt;
          &lt;td&gt;@lewislulu&lt;/td&gt;
          &lt;td&gt;GitHub: &lt;a class=&#34;link&#34; href=&#34;https://github.com/lewislulu/html-ppt-skill&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;lewislulu/html-ppt-skill&lt;/a&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;3,834&lt;/td&gt;
          &lt;td&gt;HTML/CSS/JS&lt;/td&gt;
          &lt;td&gt;HTML web presentation&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;mckinsey-pptx&lt;/td&gt;
          &lt;td&gt;@seulee26&lt;/td&gt;
          &lt;td&gt;GitHub: &lt;a class=&#34;link&#34; href=&#34;https://github.com/seulee26/mckinsey-pptx&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;seulee26/mckinsey-pptx&lt;/a&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;426&lt;/td&gt;
          &lt;td&gt;Python&lt;/td&gt;
          &lt;td&gt;Native PPTX&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;ppt-agent-skills&lt;/td&gt;
          &lt;td&gt;@sunbigfly&lt;/td&gt;
          &lt;td&gt;GitHub: &lt;a class=&#34;link&#34; href=&#34;https://github.com/sunbigfly/ppt-agent-skills&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;sunbigfly/ppt-agent-skills&lt;/a&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;714&lt;/td&gt;
          &lt;td&gt;Python&lt;/td&gt;
          &lt;td&gt;Native PPTX&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;claude-office-skills&lt;/td&gt;
          &lt;td&gt;@tfriedel&lt;/td&gt;
          &lt;td&gt;GitHub: &lt;a class=&#34;link&#34; href=&#34;https://github.com/tfriedel/claude-office-skills&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;tfriedel/claude-office-skills&lt;/a&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;631&lt;/td&gt;
          &lt;td&gt;Python&lt;/td&gt;
          &lt;td&gt;Native PPTX&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;ppt-master&lt;/td&gt;
          &lt;td&gt;@hugohe3&lt;/td&gt;
          &lt;td&gt;GitHub: &lt;a class=&#34;link&#34; href=&#34;https://github.com/hugohe3/ppt-master&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;hugohe3/ppt-master&lt;/a&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;16,626&lt;/td&gt;
          &lt;td&gt;Python&lt;/td&gt;
          &lt;td&gt;Native PPTX&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;NanoBanana-PPT-Skills&lt;/td&gt;
          &lt;td&gt;@op7418 (Guizang)&lt;/td&gt;
          &lt;td&gt;GitHub: &lt;a class=&#34;link&#34; href=&#34;https://github.com/op7418/NanoBanana-PPT-Skills&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;op7418/NanoBanana-PPT-Skills&lt;/a&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;2,668&lt;/td&gt;
          &lt;td&gt;Python&lt;/td&gt;
          &lt;td&gt;AI image-driven&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;gpt_image_2_skill&lt;/td&gt;
          &lt;td&gt;@wuyoscar&lt;/td&gt;
          &lt;td&gt;GitHub: &lt;a class=&#34;link&#34; href=&#34;https://github.com/wuyoscar/gpt_image_2_skill&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;wuyoscar/gpt_image_2_skill&lt;/a&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;2,102&lt;/td&gt;
          &lt;td&gt;Python&lt;/td&gt;
          &lt;td&gt;AI image-driven&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;ppt-image-first&lt;/td&gt;
          &lt;td&gt;@NyxTides&lt;/td&gt;
          &lt;td&gt;GitHub: &lt;a class=&#34;link&#34; href=&#34;https://github.com/NyxTides/ppt-image-first&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;NyxTides/ppt-image-first&lt;/a&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;799&lt;/td&gt;
          &lt;td&gt;Python&lt;/td&gt;
          &lt;td&gt;AI image-driven&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Office-PowerPoint-MCP-Server&lt;/td&gt;
          &lt;td&gt;@GongRzhe&lt;/td&gt;
          &lt;td&gt;GitHub: &lt;a class=&#34;link&#34; href=&#34;https://github.com/GongRzhe/Office-PowerPoint-MCP-Server&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;GongRzhe/Office-PowerPoint-MCP-Server&lt;/a&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;1,708&lt;/td&gt;
          &lt;td&gt;Python&lt;/td&gt;
          &lt;td&gt;MCP / protocol layer&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;PPTAgent&lt;/td&gt;
          &lt;td&gt;@icip-cas&lt;/td&gt;
          &lt;td&gt;GitHub: &lt;a class=&#34;link&#34; href=&#34;https://github.com/icip-cas/PPTAgent&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;icip-cas/PPTAgent&lt;/a&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;4,354&lt;/td&gt;
          &lt;td&gt;Python&lt;/td&gt;
          &lt;td&gt;MCP / protocol layer&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;open-design&lt;/td&gt;
          &lt;td&gt;@nexu-io&lt;/td&gt;
          &lt;td&gt;Site article: &lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/05/18/open-design-open-source-claude-design-alternative/&#34; &gt;open-design&lt;/a&gt;&lt;br&gt;GitHub: &lt;a class=&#34;link&#34; href=&#34;https://github.com/nexu-io/open-design&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;nexu-io/open-design&lt;/a&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;40,822&lt;/td&gt;
          &lt;td&gt;TypeScript&lt;/td&gt;
          &lt;td&gt;Integrated design platform&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;docsagent&lt;/td&gt;
          &lt;td&gt;@docsagent&lt;/td&gt;
          &lt;td&gt;GitHub: &lt;a class=&#34;link&#34; href=&#34;https://github.com/docsagent/docsagent&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;docsagent/docsagent&lt;/a&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;687&lt;/td&gt;
          &lt;td&gt;TypeScript&lt;/td&gt;
          &lt;td&gt;Integrated design platform&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id=&#34;how-to-choose&#34;&gt;How to Choose
&lt;/h2&gt;&lt;p&gt;If the client needs to continue editing, prioritize the native PPTX route, especially &lt;a class=&#34;link&#34; href=&#34;https://github.com/hugohe3/ppt-master&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;ppt-master&lt;/a&gt;, &lt;a class=&#34;link&#34; href=&#34;https://github.com/seulee26/mckinsey-pptx&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;mckinsey-pptx&lt;/a&gt;, and &lt;a class=&#34;link&#34; href=&#34;https://github.com/sunbigfly/ppt-agent-skills&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;ppt-agent-skills&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;If you are presenting yourself and visual expression matters more than later editing, prioritize the HTML route, especially &lt;a class=&#34;link&#34; href=&#34;https://github.com/zarazhangrui/frontend-slides&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;frontend-slides&lt;/a&gt;, &lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/05/09/guizang-ppt-skill-huashu-design-agent-skills/&#34; &gt;guizang-ppt-skill&lt;/a&gt;, and &lt;a class=&#34;link&#34; href=&#34;https://github.com/lewislulu/html-ppt-skill&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;html-ppt-skill&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;If the goal is a poster-like, cover-like, or shareable visual, prioritize the image route, such as &lt;a class=&#34;link&#34; href=&#34;https://github.com/NyxTides/ppt-image-first&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;ppt-image-first&lt;/a&gt;, &lt;a class=&#34;link&#34; href=&#34;https://github.com/wuyoscar/gpt_image_2_skill&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;gpt_image_2_skill&lt;/a&gt;, and &lt;a class=&#34;link&#34; href=&#34;https://github.com/op7418/NanoBanana-PPT-Skills&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;NanoBanana-PPT-Skills&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;If you already have a PPT file and only want AI to help read, edit, and rearrange it, look at the MCP route.&lt;/p&gt;
&lt;p&gt;For explicit scenarios such as academic talks, marketing, translation, or compressing long reports into slides, you can also look for vertical Skills instead of forcing a general-purpose PPT generator to do everything.&lt;/p&gt;
&lt;h2 id=&#34;final-notes&#34;&gt;Final Notes
&lt;/h2&gt;&lt;p&gt;Open source projects should not be judged by Star count alone. Before actual use, confirm three things:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Whether the LICENSE allows your use case.&lt;/li&gt;
&lt;li&gt;Whether the generated output meets delivery requirements, especially editability.&lt;/li&gt;
&lt;li&gt;Whether the cost is acceptable, including model calls, image generation, large-context models, and possible cloud service fees.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These tools change quickly. Star counts will change, and project maintenance status will change too. But the selection logic is relatively stable: decide the delivery format first, then look at specific tools. Whether a PPT is for speaking, editing, or viewing often narrows the choices by more than half.&lt;/p&gt;
</description>
        </item>
        <item>
        <title>wx-cli Explained: Query Local WeChat Chat History from the Command Line</title>
        <link>https://knightli.com/en/2026/05/18/wx-cli-wechat-local-data-command-line-tool/</link>
        <pubDate>Mon, 18 May 2026 21:02:21 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/05/18/wx-cli-wechat-local-data-command-line-tool/</guid>
        <description>&lt;p&gt;&lt;code&gt;wx-cli&lt;/code&gt; is a local WeChat data command-line tool written in Rust. Its goal is to let you query your own WeChat sessions, chat history, contacts, group members, favorites, Moments, official account articles, attachments, and statistics from the terminal.&lt;/p&gt;
&lt;p&gt;It is not a cloud-based WeChat sync service, and it is not a chatbot. It is closer to a local read-only data retrieval layer: WeChat still runs on your machine, the data still stays on your machine, and &lt;code&gt;wx-cli&lt;/code&gt; decrypts, caches, and queries local databases on demand before returning YAML or JSON output for humans or agents.&lt;/p&gt;
&lt;p&gt;Two points make this project worth watching. First, it turns local WeChat data access into a cross-platform CLI. Second, it explicitly considers AI Agent workflows for tools such as Claude Code, Cursor, and Codex, providing a &lt;code&gt;SKILL.md&lt;/code&gt; file and structured output with &lt;code&gt;meta&lt;/code&gt; fields.&lt;/p&gt;
&lt;h2 id=&#34;what-wx-cli-can-do&#34;&gt;What wx-cli Can Do
&lt;/h2&gt;&lt;p&gt;According to the project README, &lt;code&gt;wx-cli&lt;/code&gt; covers a fairly complete set of features:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;View recent sessions and unread sessions.&lt;/li&gt;
&lt;li&gt;Query chat history for a contact or group.&lt;/li&gt;
&lt;li&gt;Search keywords across the whole local database.&lt;/li&gt;
&lt;li&gt;View newly arrived messages.&lt;/li&gt;
&lt;li&gt;Query contacts, group members, and group nicknames.&lt;/li&gt;
&lt;li&gt;Query favorites.&lt;/li&gt;
&lt;li&gt;Query Moments notifications, timelines, and post bodies.&lt;/li&gt;
&lt;li&gt;Query official account article pushes.&lt;/li&gt;
&lt;li&gt;List and extract image attachments from chats.&lt;/li&gt;
&lt;li&gt;Generate chat statistics.&lt;/li&gt;
&lt;li&gt;Export chat history as Markdown or JSON.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These capabilities make it more than a &amp;ldquo;chat history search&amp;rdquo; tool. It turns local WeChat data into a searchable, analyzable, and exportable personal knowledge source.&lt;/p&gt;
&lt;h2 id=&#34;why-it-fits-ai-agents&#34;&gt;Why It Fits AI Agents
&lt;/h2&gt;&lt;p&gt;Many CLI tools are designed only for people: their output is just a block of text. &lt;code&gt;wx-cli&lt;/code&gt; clearly takes agent consumption into account.&lt;/p&gt;
&lt;p&gt;The README notes that commands such as &lt;code&gt;history&lt;/code&gt;, &lt;code&gt;search&lt;/code&gt;, &lt;code&gt;sessions&lt;/code&gt;, &lt;code&gt;unread&lt;/code&gt;, &lt;code&gt;new-messages&lt;/code&gt;, &lt;code&gt;stats&lt;/code&gt;, and &lt;code&gt;attachments&lt;/code&gt; include &lt;code&gt;meta&lt;/code&gt; information. That metadata contains result status, unknown shards, the latest timestamp in matched data, the latest session timestamp, and similar fields.&lt;/p&gt;
&lt;p&gt;This is useful for agents. AI does not only need to know &amp;ldquo;what was found&amp;rdquo;; it also needs to know whether the result is fresh, whether messages may be missing, and whether it should run &lt;code&gt;init&lt;/code&gt; again. For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;status&lt;/code&gt; can indicate whether the result is &lt;code&gt;ok&lt;/code&gt; or &lt;code&gt;possibly_stale&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;unknown_shards&lt;/code&gt; can indicate whether there are database shards for which the daemon currently has no key.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;chat_latest_timestamp&lt;/code&gt; tells the agent the latest message time in the matched data.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;session_last_timestamp&lt;/code&gt; helps determine whether the local session record is clearly newer than the query result.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This kind of metadata reduces AI misjudgment and makes tools such as Claude Code, Cursor, and Codex more reliable when working with WeChat data.&lt;/p&gt;
&lt;h2 id=&#34;installation&#34;&gt;Installation
&lt;/h2&gt;&lt;p&gt;The project recommends cross-platform installation via npm:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;npm install -g @jackwener/wx-cli
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;It also supports curl installation on macOS / Linux:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;curl -fsSL https://raw.githubusercontent.com/jackwener/wx-cli/main/install.sh &lt;span class=&#34;p&#34;&gt;|&lt;/span&gt; bash
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;On Windows, run this in an administrator PowerShell:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-powershell&#34; data-lang=&#34;powershell&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nb&#34;&gt;irm &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;https&lt;/span&gt;&lt;span class=&#34;err&#34;&gt;:&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;//&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;raw&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;py&#34;&gt;githubusercontent&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;com&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;/&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;jackwener&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;/&lt;/span&gt;&lt;span class=&#34;nb&#34;&gt;wx-cli&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;/&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;main&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;/&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;install&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;py&#34;&gt;ps1&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;|&lt;/span&gt; &lt;span class=&#34;nb&#34;&gt;iex
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;If you want to build from source, you can use Rust directly:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;git clone git@github.com:jackwener/wx-cli.git &lt;span class=&#34;o&#34;&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class=&#34;nb&#34;&gt;cd&lt;/span&gt; wx-cli
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;cargo build --release
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;The build artifact is &lt;code&gt;target/release/wx&lt;/code&gt;, or &lt;code&gt;wx.exe&lt;/code&gt; on Windows.&lt;/p&gt;
&lt;h2 id=&#34;relationship-with-agent-skills&#34;&gt;Relationship with Agent Skills
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;wx-cli&lt;/code&gt; also provides a Skill for AI Agents. You can install it into Claude Code, Cursor, Codex, and other Skills-compatible environments through the skills CLI:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;npx skills add jackwener/wx-cli
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Install it globally:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;npx skills add jackwener/wx-cli -g
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;After installation, the agent reads the repository&amp;rsquo;s &lt;code&gt;SKILL.md&lt;/code&gt; and learns how to install, initialize, and call &lt;code&gt;wx-cli&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;That means you can ask an agent to help with local information organization tasks such as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Find keywords discussed in a group chat during a specific period.&lt;/li&gt;
&lt;li&gt;Summarize recent unread messages.&lt;/li&gt;
&lt;li&gt;Export recent chat history from a specific session.&lt;/li&gt;
&lt;li&gt;Search official account article links.&lt;/li&gt;
&lt;li&gt;Analyze posting statistics in a group chat.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The premise is unchanged: the data must be your own WeChat data on your own machine.&lt;/p&gt;
&lt;h2 id=&#34;basic-usage&#34;&gt;Basic Usage
&lt;/h2&gt;&lt;p&gt;Before initialization, keep WeChat running. Requirements differ by platform.&lt;/p&gt;
&lt;p&gt;On Linux:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;sudo wx init
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;On Windows, use an administrator PowerShell:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-powershell&#34; data-lang=&#34;powershell&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;wx&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;init&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;macOS is more involved. The README explains that, with the default path, you first need to ad-hoc sign WeChat so the tool can scan process memory. After re-signing, you also need to clean old TCC authorization records, otherwise permissions such as screen capture, video calls, and microphone access may look enabled while actually being denied. The project documentation also warns that re-signing may cause macOS to repeatedly prompt for access to other apps&amp;rsquo; data.&lt;/p&gt;
&lt;p&gt;After initialization, verify the setup with:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;wx sessions
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;If you can see recent sessions, the basic path is working. The daemon starts automatically on the first call.&lt;/p&gt;
&lt;h2 id=&#34;common-command-examples&#34;&gt;Common Command Examples
&lt;/h2&gt;&lt;p&gt;View recent sessions:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;wx sessions
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;View unread sessions:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;wx unread
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Show only human unread sessions while filtering out official accounts and folded entries:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;wx unread --filter private,group
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;View recent chat history for a session:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;wx &lt;span class=&#34;nb&#34;&gt;history&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;张三&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Fetch more history:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;wx &lt;span class=&#34;nb&#34;&gt;history&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;张三&amp;#34;&lt;/span&gt; -n &lt;span class=&#34;m&#34;&gt;2000&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Query a group chat by time range:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;wx &lt;span class=&#34;nb&#34;&gt;history&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;AI群&amp;#34;&lt;/span&gt; --since 2026-04-01 --until 2026-04-15
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Search the whole database:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;wx search &lt;span class=&#34;s2&#34;&gt;&amp;#34;关键词&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Search for a keyword inside a group:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;wx search &lt;span class=&#34;s2&#34;&gt;&amp;#34;会议&amp;#34;&lt;/span&gt; --in &lt;span class=&#34;s2&#34;&gt;&amp;#34;工作群&amp;#34;&lt;/span&gt; --since 2026-01-01
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Export chat history:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;wx &lt;span class=&#34;nb&#34;&gt;export&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;张三&amp;#34;&lt;/span&gt; --format markdown -o chat.md
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;wx &lt;span class=&#34;nb&#34;&gt;export&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;AI群&amp;#34;&lt;/span&gt; --since 2026-01-01 --format json
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;These commands are well suited for scripts or agents, especially when combined with &lt;code&gt;--json&lt;/code&gt;.&lt;/p&gt;
&lt;h2 id=&#34;moments-and-official-account-articles&#34;&gt;Moments and Official Account Articles
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;wx-cli&lt;/code&gt; does not only query chats.&lt;/p&gt;
&lt;p&gt;Moments-related commands are split into notifications and posts:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;wx sns-notifications
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;wx sns-feed
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;wx sns-search &lt;span class=&#34;s2&#34;&gt;&amp;#34;关键词&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Note that Moments data only covers content that has appeared locally. The WeChat client downloads data on demand; if something has never appeared locally, the tool cannot retrieve it out of thin air.&lt;/p&gt;
&lt;p&gt;Official account articles are queried through separate commands:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;4
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;wx biz-articles
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;wx biz-articles --account &lt;span class=&#34;s2&#34;&gt;&amp;#34;返朴&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;wx biz-articles --since 2026-05-01 --until 2026-05-10
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;wx biz-articles --json &lt;span class=&#34;p&#34;&gt;|&lt;/span&gt; jq &lt;span class=&#34;s1&#34;&gt;&amp;#39;.[].url&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;It returns fields such as account name, title, URL, summary, cover image, and timestamp. This is useful for people who organize references, collect articles, or build local knowledge bases.&lt;/p&gt;
&lt;h2 id=&#34;attachment-extraction&#34;&gt;Attachment Extraction
&lt;/h2&gt;&lt;p&gt;Image attachments in WeChat chats are usually not ordinary readable image files. They are often &lt;code&gt;.dat&lt;/code&gt; files under &lt;code&gt;xwechat_files/&amp;lt;wxid&amp;gt;/msg/attach/...&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;wx-cli&lt;/code&gt; provides a two-step flow:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;wx attachments &lt;span class=&#34;s2&#34;&gt;&amp;#34;张三&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;wx attachments &lt;span class=&#34;s2&#34;&gt;&amp;#34;AI群&amp;#34;&lt;/span&gt; --kind image -n &lt;span class=&#34;m&#34;&gt;100&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;After getting an &lt;code&gt;attachment_id&lt;/code&gt;, extract it:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;wx extract &amp;lt;attachment_id&amp;gt; -o ~/Desktop/photo.jpg
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;The output report includes fields such as &lt;code&gt;md5&lt;/code&gt;, &lt;code&gt;dat_path&lt;/code&gt;, &lt;code&gt;dat_size&lt;/code&gt;, &lt;code&gt;output&lt;/code&gt;, &lt;code&gt;format&lt;/code&gt;, and &lt;code&gt;decoder&lt;/code&gt;. The README says it supports decoding modes such as legacy XOR, V1 fixed-AES, and V2 AES + XOR, while image key extraction differs across platforms.&lt;/p&gt;
&lt;p&gt;This capability is powerful, but it requires extra care: only process your own data, and do not use it for unauthorized data access.&lt;/p&gt;
&lt;h2 id=&#34;why-the-daemon-architecture-matters&#34;&gt;Why the Daemon Architecture Matters
&lt;/h2&gt;&lt;p&gt;The performance story of &lt;code&gt;wx-cli&lt;/code&gt; comes from its daemon.&lt;/p&gt;
&lt;p&gt;The README describes the structure roughly as:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;5
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;wx (CLI) ──Unix socket──▶ wx-daemon (background process)
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;                              │
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;                    ┌─────────┴──────────┐
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;               DBCache               contact cache
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;           (mtime-aware reuse)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;After the first decryption, the daemon persists database and mtime information under &lt;code&gt;~/.wx-cli/cache/&lt;/code&gt;. If the database file mtime has not changed, later calls can reuse the cache without decrypting everything again.&lt;/p&gt;
&lt;p&gt;This is important for command-line queries and agent loops. An agent may query several sessions, search multiple keywords, then run statistics and exports. If every call had to rescan and decrypt everything, the experience would be poor. The daemon cache makes it feel closer to a local query service.&lt;/p&gt;
&lt;h2 id=&#34;a-brief-look-at-the-principle&#34;&gt;A Brief Look at the Principle
&lt;/h2&gt;&lt;p&gt;The project README explains the principle directly: WeChat 4.x encrypts local databases with SQLCipher 4, and WCDB caches the derived raw key in process memory.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;wx-cli&lt;/code&gt; uses platform-specific methods to scan WeChat process memory, match key patterns, extract the key, and then let the daemon decrypt and cache databases on demand.&lt;/p&gt;
&lt;p&gt;The underlying mechanism differs by platform:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;macOS uses the Mach VM API.&lt;/li&gt;
&lt;li&gt;Linux uses &lt;code&gt;/proc/&amp;lt;pid&amp;gt;/mem&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Windows uses &lt;code&gt;VirtualQueryEx&lt;/code&gt; and &lt;code&gt;ReadProcessMemory&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These details explain why initialization usually requires elevated permissions, and why macOS involves signing and privacy authorization.&lt;/p&gt;
&lt;h2 id=&#34;boundaries-and-risks&#34;&gt;Boundaries and Risks
&lt;/h2&gt;&lt;p&gt;Tools like this must start with boundaries.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;wx-cli&lt;/code&gt; README disclaimer is clear: the tool is only for learning and research, for decrypting your own WeChat data, and it requires users to comply with applicable laws and regulations. It must not be used for unauthorized data access.&lt;/p&gt;
&lt;p&gt;In practice, it is also wise to keep these points in mind:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Use it only on your own computer and your own WeChat account.&lt;/li&gt;
&lt;li&gt;Do not casually upload exported chat history to cloud models.&lt;/li&gt;
&lt;li&gt;When using an agent to analyze chat history, first confirm the API provider and cross-border data risks.&lt;/li&gt;
&lt;li&gt;After exporting Markdown / JSON, pay attention to file permissions and backup locations.&lt;/li&gt;
&lt;li&gt;On company or shared devices, confirm compliance and authorization first.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;A local tool does not mean there is no privacy risk. It reduces the default path for data to leave your machine, but if you hand the output to a cloud model, cloud drive, or third-party script, the risk comes back.&lt;/p&gt;
&lt;h2 id=&#34;who-it-is-for&#34;&gt;Who It Is For
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;wx-cli&lt;/code&gt; fits these scenarios:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Quickly search your own historical WeChat messages locally.&lt;/li&gt;
&lt;li&gt;Export a session as Markdown or JSON.&lt;/li&gt;
&lt;li&gt;Analyze posting activity in a group chat over a time range.&lt;/li&gt;
&lt;li&gt;Let Claude Code, Cursor, Codex, and similar agents organize local WeChat material.&lt;/li&gt;
&lt;li&gt;Collect official account article links into a local knowledge base.&lt;/li&gt;
&lt;li&gt;Study WeChat&amp;rsquo;s local database structure and decryption flow.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It is less suitable for these scenarios:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;You want cloud-based WeChat sync.&lt;/li&gt;
&lt;li&gt;You want to bypass someone else&amp;rsquo;s device or account permissions.&lt;/li&gt;
&lt;li&gt;You want a point-and-click GUI and do not want to touch the command line.&lt;/li&gt;
&lt;li&gt;You do not want to deal with macOS permissions, Windows administrator rights, or Linux sudo.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;summary&#34;&gt;Summary
&lt;/h2&gt;&lt;p&gt;The value of &lt;code&gt;wx-cli&lt;/code&gt; is not merely &amp;ldquo;searching WeChat chat history from the command line.&amp;rdquo; More precisely, it turns local WeChat data into a local data source that can be queried, exported, and consumed by agents.&lt;/p&gt;
&lt;p&gt;Its daemon architecture solves repeated decryption and query performance issues; the &lt;code&gt;meta&lt;/code&gt; wrapper helps AI Agents judge whether results are fresh; and &lt;code&gt;SKILL.md&lt;/code&gt; lets tools such as Claude Code, Cursor, and Codex understand how to install and use it.&lt;/p&gt;
&lt;p&gt;If you often need to find information in WeChat, organize group chats, export records, or build a personal knowledge base, &lt;code&gt;wx-cli&lt;/code&gt; is worth watching. But one bottom line should always remain clear: only process your own data, and manage exported results carefully.&lt;/p&gt;
&lt;h2 id=&#34;references&#34;&gt;References
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/jackwener/wx-cli&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;jackwener/wx-cli GitHub repository&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
        </item>
        <item>
        <title>Anthropic Founder’s Playbook Explained: How Claude Helps Startup Teams Move Faster</title>
        <link>https://knightli.com/en/2026/05/18/claude-founders-playbook-ai-startup/</link>
        <pubDate>Mon, 18 May 2026 18:02:58 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/05/18/claude-founders-playbook-ai-startup/</guid>
        <description>&lt;p&gt;Anthropic published The Founder’s Playbook on the official Claude blog, aimed at founders. Its core question is direct: how can an AI-native startup move faster from insight to product, launch, and scale?&lt;/p&gt;
&lt;p&gt;The playbook is not simply a feature list for Claude. It breaks the startup journey into four stages: Idea, MVP, Launch, and Scale. The point is not to let AI replace founders&amp;rsquo; judgment, but to hand repetitive work such as market research, copy drafts, code scaffolding, operations workflows, and sales materials to Claude first, so founders can spend more time on judgment, taste, trade-offs, and trust.&lt;/p&gt;
&lt;h2 id=&#34;what-this-playbook-is-about&#34;&gt;What this playbook is about
&lt;/h2&gt;&lt;p&gt;AI startups increasingly face a kind of compression race: product cycles are shorter, competitors are more numerous, and users expect speed and quality at the same time. Work that once required a multi-person team can now often be drafted by AI first, then reviewed, corrected, and advanced by the founding team.&lt;/p&gt;
&lt;p&gt;Anthropic&amp;rsquo;s framework is clear: do not try to make the entire company &amp;ldquo;AI-powered&amp;rdquo; on day one. Instead, find one process that is time-consuming, repetitive, and low in creative density. Let Claude generate the first draft, script, research summary, or execution checklist. Founders remain responsible for defining goals, calibrating direction, judging quality, and connecting useful output to real business work.&lt;/p&gt;
&lt;h2 id=&#34;stage-1-idea&#34;&gt;Stage 1: Idea
&lt;/h2&gt;&lt;p&gt;The Idea stage is not about coming up with a cool concept. It is about validating whether the idea deserves further investment.&lt;/p&gt;
&lt;p&gt;Claude can help founders at this stage by mapping markets, summarizing user pain points, comparing competitor positioning, proposing possible wedges, and turning vague ideas into clearer value propositions.&lt;/p&gt;
&lt;p&gt;But the most important part is still human judgment. AI can help you see more possibilities faster, but it cannot take responsibility for whether a market truly has strong demand. Founders still need to talk to real users, observe whether they are willing to change existing workflows, and see whether they are willing to pay.&lt;/p&gt;
&lt;h2 id=&#34;stage-2-mvp&#34;&gt;Stage 2: MVP
&lt;/h2&gt;&lt;p&gt;The MVP stage is where Claude Code can be especially useful.&lt;/p&gt;
&lt;p&gt;For small teams, the scarcest resource is often not ideas, but the speed of turning ideas into something users can try. Claude Code can help generate scaffolding, write scripts, fill in components, check edge cases, and produce technical plan notes, helping teams get to a testable version faster.&lt;/p&gt;
&lt;p&gt;The key is not asking AI to write a perfect product in one pass. It is reducing the friction from zero to first version. Founders and engineers still need to review architecture, security, data handling, and user experience, but they do not need to spend as much time on mechanical first drafts.&lt;/p&gt;
&lt;h2 id=&#34;stage-3-launch&#34;&gt;Stage 3: Launch
&lt;/h2&gt;&lt;p&gt;The Launch stage tests narrative, distribution, and feedback speed.&lt;/p&gt;
&lt;p&gt;Many startup teams underestimate how complex a launch can be: website copy, product demos, emails, social media content, user interviews, sales scripts, investor updates. Every item needs to clearly explain why this product is needed now.&lt;/p&gt;
&lt;p&gt;Claude can act as a high-frequency collaborator here: generating different positioning variants, rewriting introductions for different user groups, simulating user questions, organizing the launch rhythm, and turning early feedback into the next round of product and market actions.&lt;/p&gt;
&lt;h2 id=&#34;stage-4-scale&#34;&gt;Stage 4: Scale
&lt;/h2&gt;&lt;p&gt;The Scale stage shifts the focus from &amp;ldquo;building it&amp;rdquo; to &amp;ldquo;growing repeatably.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Once a company has stable users and revenue, the founding team gets pulled into operations, sales, support, data analysis, and internal coordination. Agent-like capabilities such as Claude Cowork are better suited to more complete tasks: conducting market research, designing campaigns, organizing fundraising strategy, summarizing growth metrics, or turning an operations process into repeatable steps.&lt;/p&gt;
&lt;p&gt;This is also where the difference between AI-native companies and traditional software companies begins to appear. The real change is not simply that employees use AI tools. It is that company processes are designed around AI collaboration from the beginning: which tasks require humans to define standards, which tasks should be drafted by AI first, which outputs must be reviewed, and which workflows can become reusable templates.&lt;/p&gt;
&lt;h2 id=&#34;what-claude-code-claude-cowork-and-chat-are-best-for&#34;&gt;What Claude Code, Claude Cowork, and Chat are best for
&lt;/h2&gt;&lt;p&gt;Based on the official blog post, Anthropic wants founders to think about Claude across three kinds of use cases.&lt;/p&gt;
&lt;p&gt;Claude Code is more engineering-oriented. It is suited for writing code, generating scripts, analyzing edge cases, producing component specs, and drafting technical documentation. It helps move ideas toward something that can run.&lt;/p&gt;
&lt;p&gt;Claude Cowork is closer to a delegatable work agent. It fits tasks that require continued execution, such as market research, campaign design, fundraising strategy, and operations analysis. It helps push a relatively complete business task through a first pass.&lt;/p&gt;
&lt;p&gt;Claude Chat is better suited for founder judgment moments: thinking through go-to-market strategy, stress-testing product positioning, comparing roadmap priorities, and refining key narratives. It is not an execution machine, but a thinking partner that can support rapid iteration.&lt;/p&gt;
&lt;h2 id=&#34;what-is-actually-useful-for-startup-teams&#34;&gt;What is actually useful for startup teams
&lt;/h2&gt;&lt;p&gt;The value of this playbook is not that it tells founders &amp;ldquo;AI is important.&amp;rdquo; That is no longer new.&lt;/p&gt;
&lt;p&gt;Its more useful contribution is shifting AI use from scattered tool calls into a company-building method. Each stage has different bottlenecks, and each bottleneck can be broken into parts where AI can participate.&lt;/p&gt;
&lt;p&gt;At the Idea stage, AI expands the search space. At the MVP stage, it compresses implementation time. At the Launch stage, it accelerates messaging and distribution experiments. At the Scale stage, it helps turn processes into repeatable workflows.&lt;/p&gt;
&lt;p&gt;This logic is especially important for small teams. Small teams do not have enough people to cover every function, but they can use AI to create a first version of a capability, then spend limited human energy on the parts that most require judgment and relationship building.&lt;/p&gt;
&lt;h2 id=&#34;pitfalls-to-watch-for&#34;&gt;Pitfalls to watch for
&lt;/h2&gt;&lt;p&gt;The first pitfall is treating AI-generated output as a conclusion. Market research, competitor analysis, user personas, and growth strategies all need to be validated against real data and user feedback.&lt;/p&gt;
&lt;p&gt;The second pitfall is underestimating review cost. AI can significantly reduce the cost of first drafts, but code quality, legal risk, brand expression, commercial promises, and security issues still need human accountability.&lt;/p&gt;
&lt;p&gt;The third pitfall is automating too early. A process that has not yet worked manually should not be handed to an agent for automatic execution. A steadier approach is to let AI participate in one small part of the workflow, observe output quality, and then gradually expand the scope.&lt;/p&gt;
&lt;h2 id=&#34;summary&#34;&gt;Summary
&lt;/h2&gt;&lt;p&gt;The signal from Anthropic&amp;rsquo;s Founder’s Playbook is clear: the advantage of an AI-native startup is not merely that it can use AI to write code. It is that from day one, AI becomes a collaboration layer across product, engineering, marketing, sales, and operations.&lt;/p&gt;
&lt;p&gt;For founders, the most practical starting point is not building a grand AI workflow. It is choosing one task that consumes too much time, repeats too often, and slows progress the most, then letting Claude produce the first version. Real competitiveness comes from human founders&amp;rsquo; control over direction, quality, and trust, and from whether the team can embed this collaboration pattern into everyday work.&lt;/p&gt;
&lt;h2 id=&#34;references&#34;&gt;References
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://claude.com/blog/the-founders-playbook&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;The founder’s playbook for the age of AI&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
        </item>
        <item>
        <title>What Is Vercel AI SDK? A Unified Toolkit for TypeScript Developers Building AI Apps</title>
        <link>https://knightli.com/en/2026/05/17/vercel-ai-sdk-typescript-agent-toolkit/</link>
        <pubDate>Sun, 17 May 2026 23:07:38 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/05/17/vercel-ai-sdk-typescript-agent-toolkit/</guid>
        <description>&lt;p&gt;&lt;code&gt;vercel/ai&lt;/code&gt; is the open-source AI SDK maintained by Vercel.&lt;/p&gt;
&lt;p&gt;Its positioning is clear: it gives TypeScript developers a unified toolkit for building AI applications and AI Agents. It comes from the team behind Next.js, but it is not limited to Next.js. It also supports React, Svelte, Vue, Angular, and runtimes such as Node.js.&lt;/p&gt;
&lt;p&gt;Project link: &lt;a class=&#34;link&#34; href=&#34;https://github.com/vercel/ai&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://github.com/vercel/ai&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;If you are building a chat app, AI writing tool, RAG application, tool-calling Agent, streaming interface, or a product that needs to connect multiple model providers behind one application, Vercel AI SDK is worth a close look.&lt;/p&gt;
&lt;h2 id=&#34;the-core-problem-it-solves&#34;&gt;The Core Problem It Solves
&lt;/h2&gt;&lt;p&gt;When building AI apps today, one of the biggest headaches is not whether you can call a model. It is that different model providers have different APIs, streaming formats, tool-calling conventions, error behavior, and frontend state-management needs.&lt;/p&gt;
&lt;p&gt;For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;OpenAI has its own SDK and response formats.&lt;/li&gt;
&lt;li&gt;Anthropic has its own message structure.&lt;/li&gt;
&lt;li&gt;Google, xAI, Mistral, DeepSeek, Groq, and others all differ.&lt;/li&gt;
&lt;li&gt;Streaming output requires chunk handling.&lt;/li&gt;
&lt;li&gt;Tool calling requires structured requests initiated by the model.&lt;/li&gt;
&lt;li&gt;Chat UI also needs messages, loading states, cancellation, retry, and error display.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If every provider gets its own handwritten adapter, the project becomes complex very quickly.&lt;/p&gt;
&lt;p&gt;Vercel AI SDK tries to hide those differences behind a unified API. Developers write the app against one interface and connect different models through Providers.&lt;/p&gt;
&lt;h2 id=&#34;unified-provider-architecture&#34;&gt;Unified Provider Architecture
&lt;/h2&gt;&lt;p&gt;One key feature of Vercel AI SDK is that it is provider-agnostic. It is not tied to one model vendor.&lt;/p&gt;
&lt;p&gt;It can access OpenAI, Anthropic, Google, and other model providers through a unified API. The project README also notes that AI SDK uses Vercel AI Gateway by default, making it easier to reach multiple mainstream providers.&lt;/p&gt;
&lt;p&gt;That is useful in real engineering projects.&lt;/p&gt;
&lt;p&gt;Many AI products eventually depend on more than one model:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Some tasks need strong reasoning models.&lt;/li&gt;
&lt;li&gt;Some tasks need cheap, fast models.&lt;/li&gt;
&lt;li&gt;Some tasks require multimodal models.&lt;/li&gt;
&lt;li&gt;Some tasks require long context.&lt;/li&gt;
&lt;li&gt;Some tasks require local or private deployment.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;A unified provider architecture makes model switching, gray releases, cost control, and fallback strategies easier.&lt;/p&gt;
&lt;h2 id=&#34;streaming-output-is-key-to-frontend-ux&#34;&gt;Streaming Output Is Key to Frontend UX
&lt;/h2&gt;&lt;p&gt;One major UX difference between AI apps and traditional APIs is that responses can be long.&lt;/p&gt;
&lt;p&gt;If users must wait for a full answer before seeing anything, chat tools, writing tools, and coding assistants feel slow. Streaming output lets text appear gradually, so users see progress sooner.&lt;/p&gt;
&lt;p&gt;Vercel AI SDK provides fairly complete abstractions for streaming generation. Developers do not need to handle low-level event streams from scratch. They can use the SDK&amp;rsquo;s generation and streaming APIs to connect model output to frontend UI.&lt;/p&gt;
&lt;p&gt;This is especially convenient for Next.js and React applications.&lt;/p&gt;
&lt;p&gt;An AI chat interface looks simple, but in practice it must handle:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Message lists.&lt;/li&gt;
&lt;li&gt;User input.&lt;/li&gt;
&lt;li&gt;Server requests.&lt;/li&gt;
&lt;li&gt;Streaming token display.&lt;/li&gt;
&lt;li&gt;Loading states.&lt;/li&gt;
&lt;li&gt;Error states.&lt;/li&gt;
&lt;li&gt;Stopping generation.&lt;/li&gt;
&lt;li&gt;Regeneration.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These are exactly the kinds of repetitive work AI SDK tries to reduce.&lt;/p&gt;
&lt;h2 id=&#34;tool-calling-and-agent-scenarios&#34;&gt;Tool Calling and Agent Scenarios
&lt;/h2&gt;&lt;p&gt;As AI apps move from &amp;ldquo;chatting&amp;rdquo; to &amp;ldquo;doing things&amp;rdquo;, tool calling becomes increasingly important.&lt;/p&gt;
&lt;p&gt;The model may need to call external functions instead of only returning natural language:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Query a database.&lt;/li&gt;
&lt;li&gt;Search documents.&lt;/li&gt;
&lt;li&gt;Call business APIs.&lt;/li&gt;
&lt;li&gt;Read order status.&lt;/li&gt;
&lt;li&gt;Generate charts.&lt;/li&gt;
&lt;li&gt;Create calendar events.&lt;/li&gt;
&lt;li&gt;Modify project files.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Vercel AI SDK supports tool-calling capabilities, allowing developers to define tools, parameters, and execution logic, then let the model request those tools when appropriate.&lt;/p&gt;
&lt;p&gt;This is one reason it has evolved from a &amp;ldquo;chat UI SDK&amp;rdquo; into a broader toolkit for AI apps and Agents.&lt;/p&gt;
&lt;p&gt;Still, tool calling is not magic. Real projects must also handle:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Parameter validation.&lt;/li&gt;
&lt;li&gt;Permission boundaries.&lt;/li&gt;
&lt;li&gt;Tool-call logs.&lt;/li&gt;
&lt;li&gt;Idempotency.&lt;/li&gt;
&lt;li&gt;Timeouts and retries.&lt;/li&gt;
&lt;li&gt;Human confirmation.&lt;/li&gt;
&lt;li&gt;Restrictions for sensitive actions.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;AI SDK can help with interfaces and flow, but developers still need to design the safety boundaries.&lt;/p&gt;
&lt;h2 id=&#34;ui-integration&#34;&gt;UI Integration
&lt;/h2&gt;&lt;p&gt;Vercel AI SDK is friendly to frontend frameworks.&lt;/p&gt;
&lt;p&gt;It provides not only core generation APIs, but also abstractions around chat, completion, message state, and streaming UI. For teams using Next.js and React, this can remove a lot of boilerplate.&lt;/p&gt;
&lt;p&gt;But it is not only for Vercel deployments.&lt;/p&gt;
&lt;p&gt;If your project is built with TypeScript, or your backend runs on Node.js, AI SDK can still serve as the model-calling and streaming layer. Whether you deploy to Vercel depends on your architecture, team habits, and infrastructure choices.&lt;/p&gt;
&lt;h2 id=&#34;skill-for-coding-agents&#34;&gt;Skill for Coding Agents
&lt;/h2&gt;&lt;p&gt;The &lt;code&gt;vercel/ai&lt;/code&gt; README includes an interesting suggestion: if you use coding agents such as Claude Code or Cursor, you can add the AI SDK skill to your repository.&lt;/p&gt;
&lt;p&gt;The example command is:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;npx skills add vercel/ai
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;This shows that Vercel understands AI SDK users are not only human developers, but also coding agents.&lt;/p&gt;
&lt;p&gt;When an agent modifies a project that uses AI SDK, a dedicated skill in the repository can help it understand SDK conventions, common APIs, project structure, and best practices, reducing the chance of messy code changes.&lt;/p&gt;
&lt;p&gt;This direction is worth watching.&lt;/p&gt;
&lt;p&gt;In the future, open-source projects may provide not only README files and docs, but also structured skill instructions for AI coding agents. For complex SDKs, that could become a new developer-experience entry point.&lt;/p&gt;
&lt;h2 id=&#34;suitable-projects&#34;&gt;Suitable Projects
&lt;/h2&gt;&lt;p&gt;Vercel AI SDK is a good fit for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;AI chat apps based on Next.js or React.&lt;/li&gt;
&lt;li&gt;Writing, Q&amp;amp;A, support, and coding assistants that need streaming output.&lt;/li&gt;
&lt;li&gt;AI products that need multiple model providers.&lt;/li&gt;
&lt;li&gt;Teams building quick RAG or document Q&amp;amp;A prototypes.&lt;/li&gt;
&lt;li&gt;Apps that need tool calling, function calling, or lightweight Agent capabilities.&lt;/li&gt;
&lt;li&gt;Teams already using TypeScript and Node.js.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It is especially suitable for frontend and full-stack developers. The hard part of many AI apps is not only calling a model, but turning model output into a stable, smooth, interactive product experience.&lt;/p&gt;
&lt;h2 id=&#34;what-it-is-not-for&#34;&gt;What It Is Not For
&lt;/h2&gt;&lt;p&gt;If your project is mainly a Python backend, deep-learning training workflow, model fine-tuning system, or low-level inference service, Vercel AI SDK may not be the core tool.&lt;/p&gt;
&lt;p&gt;It is an application-layer SDK, not a model-training framework.&lt;/p&gt;
&lt;p&gt;If you need to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Train your own model.&lt;/li&gt;
&lt;li&gt;Manage GPU inference clusters.&lt;/li&gt;
&lt;li&gt;Run low-level batch inference.&lt;/li&gt;
&lt;li&gt;Deeply control tokenizer behavior, KV cache, quantization, and inference engines.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Then you should look at PyTorch, vLLM, SGLang, TensorRT-LLM, llama.cpp, or cloud inference services.&lt;/p&gt;
&lt;p&gt;Vercel AI SDK is closer to the application layer that connects model capabilities to products.&lt;/p&gt;
&lt;h2 id=&#34;what-to-watch-for&#34;&gt;What to Watch For
&lt;/h2&gt;&lt;p&gt;First, do not assume a unified API means all providers are identical.&lt;/p&gt;
&lt;p&gt;Different providers still differ in capabilities, context length, tool-calling formats, streaming details, error types, and pricing. A unified SDK reduces engineering friction, but it does not erase model differences.&lt;/p&gt;
&lt;p&gt;Second, control costs.&lt;/p&gt;
&lt;p&gt;Once an AI app is online, streaming chats, retries, tool calls, RAG retrieval, and multi-model fallbacks can all increase cost. Rate limits, caching, logs, and budget monitoring are necessary.&lt;/p&gt;
&lt;p&gt;Third, handle safety boundaries.&lt;/p&gt;
&lt;p&gt;If a model can call tools, you must restrict what those tools can do. Do not let the model directly execute high-risk operations, and do not expose secrets, database write permissions, or production operations to it without controls.&lt;/p&gt;
&lt;p&gt;Fourth, keep observability.&lt;/p&gt;
&lt;p&gt;When an AI app fails, frontend errors are not enough. You need to know the user input, selected model, tool calls, response time, token usage, error type, and final output.&lt;/p&gt;
&lt;h2 id=&#34;summary&#34;&gt;Summary
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;vercel/ai&lt;/code&gt; is not a new model, and it is not just a chat component.&lt;/p&gt;
&lt;p&gt;It is closer to infrastructure for TypeScript AI application development: unified Providers, streaming output, tool calling, frontend state management, and Agent scenarios all live inside one open-source SDK.&lt;/p&gt;
&lt;p&gt;For teams already using Next.js, React, TypeScript, and Node.js, it can significantly reduce the engineering cost of going from &amp;ldquo;the model API runs&amp;rdquo; to &amp;ldquo;the product experience works&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;But it is not a universal layer. Model selection, permission design, cost control, logging, monitoring, and business safety still belong to the developer.&lt;/p&gt;
&lt;p&gt;If you want to build AI applications rather than train models, Vercel AI SDK is a toolkit worth trying early.&lt;/p&gt;
&lt;h2 id=&#34;references&#34;&gt;References
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/vercel/ai&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;vercel/ai GitHub repository&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://ai-sdk.dev/docs/introduction&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;AI SDK Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://vercel.com/blog/introducing-the-vercel-ai-sdk/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Vercel: Introducing the Vercel AI SDK&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
        </item>
        <item>
        <title>Midjourney May 2026 Update: Conversational Mode, AI-Assisted Development, and SREF Organization</title>
        <link>https://knightli.com/en/2026/05/17/midjourney-2026-05-office-hours-conversational-mode/</link>
        <pubDate>Sun, 17 May 2026 20:20:51 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/05/17/midjourney-2026-05-office-hours-conversational-mode/</guid>
        <description>&lt;p&gt;The most important signal from Midjourney&amp;rsquo;s May 14, 2026 Office Hours is not a single model parameter. It is that the product is continuing to move from &amp;ldquo;type a prompt and generate an image&amp;rdquo; toward a more conversational, organized, and iterative creative system.&lt;/p&gt;
&lt;p&gt;The information comes from a Japanese summary of Midjourney&amp;rsquo;s recent Q&amp;amp;A, covering conversational mode, AI-assisted development, website redesign, SREF and tag organization, Omni-reference, multi-character consistency, and how the team itself uses Midjourney.&lt;/p&gt;
&lt;p&gt;In one sentence: Midjourney is making image generation feel more like a creative system that can be discussed with, organized, and iterated over.&lt;/p&gt;
&lt;h2 id=&#34;conversational-mode-is-becoming-more-important&#34;&gt;Conversational mode is becoming more important
&lt;/h2&gt;&lt;p&gt;The most direct change is Conversational Mode.&lt;/p&gt;
&lt;p&gt;In the past, using Midjourney still depended heavily on parameters and fixed syntax. You had to remember rules for aspect ratio, image references, style references, model parameters, and then write them into prompts or adjust them in the interface.&lt;/p&gt;
&lt;p&gt;The direction of the new conversational mode is to let users describe these settings in more natural language.&lt;/p&gt;
&lt;p&gt;For example, users can specify by voice or text:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Default parameters.&lt;/li&gt;
&lt;li&gt;Aspect ratio, such as &lt;code&gt;16:9&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Image references.&lt;/li&gt;
&lt;li&gt;Style references, or &lt;code&gt;--sref&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Omni-reference in V7.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This shows Midjourney is not only improving generation quality. It is also reducing the operational cost of parameters.&lt;/p&gt;
&lt;p&gt;For ordinary users, the biggest change is that they do not have to memorize commands all the time. For heavy users, if conversational mode becomes stable enough, it may become the main entry point for adjusting generation settings with natural language.&lt;/p&gt;
&lt;h2 id=&#34;ai-assisted-development-is-changing-midjourneys-iteration-speed&#34;&gt;AI-assisted development is changing Midjourney&amp;rsquo;s iteration speed
&lt;/h2&gt;&lt;p&gt;Another interesting point is that the Midjourney team is using AI-assisted development at large scale internally.&lt;/p&gt;
&lt;p&gt;The source notes that the team can now fix small bugs, interface friction, and workflow issues much faster. There was even an example where a product bug was identified during a user call, fixed in real time with AI assistance, reviewed, and deployed quickly.&lt;/p&gt;
&lt;p&gt;This is more interesting than simply saying &amp;ldquo;AI helps engineers write code.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;It shows that AI development tools are starting to influence how AI products themselves iterate:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;User feedback can enter the fix pipeline faster.&lt;/li&gt;
&lt;li&gt;Small experience issues are easier to address.&lt;/li&gt;
&lt;li&gt;Engineers can spend more energy on architecture, review, design decisions, and testing.&lt;/li&gt;
&lt;li&gt;Product teams can clean up edge cases more frequently.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Midjourney has many creative paths, parameter combinations, mobile experiences, search features, and organization workflows. Many issues are not about the core model failing to generate images, but about an entry point being awkward, an operation taking one extra step, or an edge state being unpleasant.&lt;/p&gt;
&lt;p&gt;AI-assisted development is especially good at accelerating these many small improvements.&lt;/p&gt;
&lt;h2 id=&#34;the-website-redesign-is-about-workflow-not-removing-features&#34;&gt;The website redesign is about workflow, not removing features
&lt;/h2&gt;&lt;p&gt;The Office Hours also mentioned a large website redesign.&lt;/p&gt;
&lt;p&gt;The goal is not to remove complex features, but to make the creative flow more intuitive, make onboarding easier, and organize tools and features more clearly.&lt;/p&gt;
&lt;p&gt;That matters.&lt;/p&gt;
&lt;p&gt;Midjourney&amp;rsquo;s problem is not a lack of features. As features grow, entry points, collections, organization, references, exploration, and reuse become more complex. For light users, the hard question is &amp;ldquo;where do I start?&amp;rdquo; For heavy users, the hard question is &amp;ldquo;how do I manage many styles, references, and experiment results?&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Possible rollout strategies include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Offering old and new interfaces in parallel.&lt;/li&gt;
&lt;li&gt;Starting with an alpha test.&lt;/li&gt;
&lt;li&gt;Moving gradually to avoid disrupting heavy users.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These strategies suggest the team understands that Midjourney is not just a casual image toy. Many users have already integrated it into real creative workflows, so interface changes cannot casually break existing habits.&lt;/p&gt;
&lt;h2 id=&#34;sref-styles-and-tags-remain-pain-points&#34;&gt;SREF, styles, and tags remain pain points
&lt;/h2&gt;&lt;p&gt;SREF and style organization were among the most interesting topics in the Q&amp;amp;A.&lt;/p&gt;
&lt;p&gt;Users want better organization systems, especially for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Random SREF.&lt;/li&gt;
&lt;li&gt;Style references.&lt;/li&gt;
&lt;li&gt;Saved aesthetics.&lt;/li&gt;
&lt;li&gt;Tags and colored tags.&lt;/li&gt;
&lt;li&gt;Stronger filtering, grouping, and reuse.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;But the team also raised a question: if the current folder system already lets one image belong to multiple folders, supports unlimited folders, and offers filtering and sorting, what exactly do tags provide that folders cannot?&lt;/p&gt;
&lt;p&gt;That question is practical.&lt;/p&gt;
&lt;p&gt;Many products add tags because users say they want tags. But a poorly designed tag system becomes another messy classification layer. If folders, tags, favorites, search, filters, projects, and style libraries have unclear boundaries, the system becomes harder to manage.&lt;/p&gt;
&lt;p&gt;So the Midjourney team wants concrete workflow examples: in which scenario do users need tags? Why are folders not enough? Is it for combining styles quickly, reusing across projects, filtering by theme, color tone, photography style, or character relationship?&lt;/p&gt;
&lt;p&gt;For Midjourney, the organization system may become as important as the generation model. Once users create long-term projects, the hard part is not generating one image, but managing thousands of images, hundreds of style directions, and repeated experiments.&lt;/p&gt;
&lt;h2 id=&#34;omni-reference-points-toward-more-complex-character-control&#34;&gt;Omni-reference points toward more complex character control
&lt;/h2&gt;&lt;p&gt;The source also mentioned that future Omni-reference / subject reference systems may support multiple character references at once and better separation of different subjects.&lt;/p&gt;
&lt;p&gt;This maps directly to a long-running pain point in AI image generation: character consistency and multi-character relationships.&lt;/p&gt;
&lt;p&gt;Keeping one character consistent is already difficult. Multiple characters are harder. Common problems include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Character A&amp;rsquo;s traits leaking onto character B.&lt;/li&gt;
&lt;li&gt;Identity confusion between multiple people.&lt;/li&gt;
&lt;li&gt;Clothing, hair, and facial features changing across images.&lt;/li&gt;
&lt;li&gt;Reference images influencing the whole style too strongly instead of controlling only the subject.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If Omni-reference can handle subject separation better, Midjourney becomes more useful for comics, storyboards, advertising visuals, character design, game concept art, and continuous narratives.&lt;/p&gt;
&lt;p&gt;This is one of the areas worth watching after V7.&lt;/p&gt;
&lt;h2 id=&#34;midjourney-is-rethinking-prompts&#34;&gt;Midjourney is rethinking prompts
&lt;/h2&gt;&lt;p&gt;The summary includes a useful idea: language is an imperfect compression layer for imagination.&lt;/p&gt;
&lt;p&gt;That sentence explains Midjourney&amp;rsquo;s product direction well.&lt;/p&gt;
&lt;p&gt;Many users assume AI image generation is mainly about writing longer and more precise prompts. But in real creative work, image references, style references, moodboards, SREF, variations, regeneration, and post-processing are often more useful than a very long text prompt.&lt;/p&gt;
&lt;p&gt;Team member Duncan&amp;rsquo;s workflow reflects this. He reportedly treats Midjourney as a sketchbook, combining moodboards, SREF, short prompts, high &lt;code&gt;--r&lt;/code&gt; regeneration, strong and subtle variations, Photoshop retouching, and external upscaling workflows.&lt;/p&gt;
&lt;p&gt;This shows mature Midjourney users do not work only through &amp;ldquo;magic prompts.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;A more realistic process is:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Use a small amount of language to set direction.&lt;/li&gt;
&lt;li&gt;Use image references to provide visual context.&lt;/li&gt;
&lt;li&gt;Use SREF to narrow the style.&lt;/li&gt;
&lt;li&gt;Use many variations to explore the space.&lt;/li&gt;
&lt;li&gt;Use human taste to select results.&lt;/li&gt;
&lt;li&gt;Use external tools for post-processing.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Prompts still matter, but they are not everything.&lt;/p&gt;
&lt;h2 id=&#34;what-this-means-for-users&#34;&gt;What this means for users
&lt;/h2&gt;&lt;p&gt;If you only generate images occasionally, the most direct impact is that conversational mode should become easier to use. In the future, you may be able to describe desired aspect ratio, references, style, and parameters more naturally instead of memorizing commands.&lt;/p&gt;
&lt;p&gt;If you are a heavy user, three areas deserve attention.&lt;/p&gt;
&lt;p&gt;First, organization.&lt;/p&gt;
&lt;p&gt;How SREF, styles, folders, favorites, and tags evolve will directly affect long-term creative efficiency.&lt;/p&gt;
&lt;p&gt;Second, the website redesign.&lt;/p&gt;
&lt;p&gt;If the new interface can connect exploration, organization, reuse, and export, Midjourney will feel more like a professional creative tool instead of a single generator.&lt;/p&gt;
&lt;p&gt;Third, character and subject reference.&lt;/p&gt;
&lt;p&gt;If Omni-reference can reliably handle multiple characters and subject separation, Midjourney becomes better suited for continuous projects rather than only single images.&lt;/p&gt;
&lt;h2 id=&#34;summary&#34;&gt;Summary
&lt;/h2&gt;&lt;p&gt;The key point from Midjourney&amp;rsquo;s May 2026 Office Hours is not one flashy parameter. It is that the product is continuing to evolve toward a creative system.&lt;/p&gt;
&lt;p&gt;Conversational mode lowers the input barrier. AI-assisted development increases iteration speed. The website redesign aims to reorganize workflows. SREF and tag discussions point to long-term asset management. Omni-reference relates to character consistency and complex subject control.&lt;/p&gt;
&lt;p&gt;For AI image generation tools, model capability is obviously important. But once generation quality reaches a certain level, what determines whether users stay long term is often workflow, organization, controllability, and iteration speed.&lt;/p&gt;
&lt;p&gt;Midjourney is filling in those pieces.&lt;/p&gt;
&lt;h2 id=&#34;references&#34;&gt;References
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://note.com/akisuke0925/n/nc9e099d9c77f&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Midjourney 最新ニュース（2026年5月14 日）｜アキスケ&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
        </item>
        <item>
        <title>How OpenClaw Creator Peter Steinberger Sees AI Software Development: From OpenClaw to Closed-Loop Coding</title>
        <link>https://knightli.com/en/2026/05/17/peter-steinberger-ai-software-development/</link>
        <pubDate>Sun, 17 May 2026 20:02:26 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/05/17/peter-steinberger-ai-software-development/</guid>
        <description>&lt;p&gt;Peter Steinberger&amp;rsquo;s career is a useful lens for understanding what is changing in AI software development.&lt;/p&gt;
&lt;p&gt;He is not a newcomer who suddenly became visible because of AI. Before OpenClaw, he was already the founder of PSPDFKit, a company focused on PDF rendering, document processing, and developer tools. Products like that are hard to win with concept packaging alone. They have to deal with performance, compatibility, API design, enterprise customers, and long-term maintenance.&lt;/p&gt;
&lt;p&gt;So when Steinberger later built OpenClaw with AI tools and shared views around AI agents, personal automation, and AI coding, the point was not simply that &amp;ldquo;one person wrote a lot of code.&amp;rdquo; The more interesting part is how he combined years of software engineering experience with a new generation of AI coding agents and rethought the development process.&lt;/p&gt;
&lt;h2 id=&#34;ai-coding-is-not-a-magic-button&#34;&gt;AI coding is not a magic button
&lt;/h2&gt;&lt;p&gt;Discussions about AI coding often fall into two extremes.&lt;/p&gt;
&lt;p&gt;One side says AI can already write code, so programmers are almost obsolete.&lt;/p&gt;
&lt;p&gt;The other side says AI-generated code is unreliable, so real engineering still has to be hand-written by people.&lt;/p&gt;
&lt;p&gt;Steinberger&amp;rsquo;s experience points to a third view: AI changes the unit of operation in software development, but it does not remove engineering judgment.&lt;/p&gt;
&lt;p&gt;In the past, developers mainly worked around editing code. Requirements breakdown, architecture decisions, implementation, testing, and bug fixing all revolved around manual code changes.&lt;/p&gt;
&lt;p&gt;Once AI coding agents enter the workflow, developers increasingly manage an execution system:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Explain the goal.&lt;/li&gt;
&lt;li&gt;Provide context.&lt;/li&gt;
&lt;li&gt;Set boundaries.&lt;/li&gt;
&lt;li&gt;Let the agent modify code.&lt;/li&gt;
&lt;li&gt;Run tests and checks.&lt;/li&gt;
&lt;li&gt;Iterate based on results.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is not simply handing the keyboard to a model. It is moving humans from &amp;ldquo;typing every line&amp;rdquo; toward &amp;ldquo;defining direction, designing feedback, and judging results.&amp;rdquo;&lt;/p&gt;
&lt;h2 id=&#34;why-he-is-skeptical-of-calling-it-vibe-coding&#34;&gt;Why he is skeptical of calling it vibe coding
&lt;/h2&gt;&lt;p&gt;One phrase that often appears around Steinberger is &lt;code&gt;vibe coding&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The term originally described a new style of development: developers describe ideas in natural language, let AI generate large amounts of code, then keep adjusting based on runtime results and feedback.&lt;/p&gt;
&lt;p&gt;But Steinberger is not entirely sold on the phrase. Public coverage has noted that he sees &lt;code&gt;vibe coding&lt;/code&gt; as potentially dismissive, implying that AI-assisted development is just &amp;ldquo;generating by feel&amp;rdquo; while ignoring the skill, judgment, and experience behind it.&lt;/p&gt;
&lt;p&gt;That criticism makes sense.&lt;/p&gt;
&lt;p&gt;Effective AI coding is not about typing a casual sentence and trusting the model&amp;rsquo;s output. It requires:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Breaking vague requirements into executable tasks.&lt;/li&gt;
&lt;li&gt;Detecting when the model misunderstands the goal.&lt;/li&gt;
&lt;li&gt;Designing tests and acceptance criteria.&lt;/li&gt;
&lt;li&gt;Judging whether the code structure will remain maintainable.&lt;/li&gt;
&lt;li&gt;Knowing when to stop generating and switch to human review.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In other words, AI reduces the friction of writing code, but it does not reduce the responsibility of understanding the system.&lt;/p&gt;
&lt;h2 id=&#34;the-loop-is-the-key&#34;&gt;The loop is the key
&lt;/h2&gt;&lt;p&gt;One idea often associated with Steinberger&amp;rsquo;s interviews and writing is the importance of the loop.&lt;/p&gt;
&lt;p&gt;Letting AI generate code is open-loop.&lt;/p&gt;
&lt;p&gt;Letting AI generate code, run it, read errors, fix problems, and run tests again is closer to closed-loop development.&lt;/p&gt;
&lt;p&gt;That difference matters.&lt;/p&gt;
&lt;p&gt;Open-loop generation easily creates software that looks usable on the surface. The page opens, features appear to exist, and there is plenty of code. But once it enters a real environment, problems with state management, permissions, exception handling, edge cases, and deployment quickly appear.&lt;/p&gt;
&lt;p&gt;Closed-loop development means output must be constrained by feedback. The simplest loop is:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Write down the goal clearly.&lt;/li&gt;
&lt;li&gt;Let AI modify the code.&lt;/li&gt;
&lt;li&gt;Automatically run tests, type checks, lint, or a build.&lt;/li&gt;
&lt;li&gt;Feed errors back to AI.&lt;/li&gt;
&lt;li&gt;Repeat until it passes.&lt;/li&gt;
&lt;li&gt;Let a human review the critical path.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This is where AI software development can truly improve efficiency. Not because the model gets everything right the first time, but because it can participate quickly in a cycle of generation, validation, and repair.&lt;/p&gt;
&lt;h2 id=&#34;more-experience-makes-ai-more-useful&#34;&gt;More experience makes AI more useful
&lt;/h2&gt;&lt;p&gt;One of the easiest misconceptions about AI coding is that experience no longer matters.&lt;/p&gt;
&lt;p&gt;Steinberger&amp;rsquo;s case suggests the opposite: experience becomes more important, but its role changes.&lt;/p&gt;
&lt;p&gt;An experienced engineer is better at deciding:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Which tasks are suitable for an agent.&lt;/li&gt;
&lt;li&gt;Which modules need tests first.&lt;/li&gt;
&lt;li&gt;Which changes are too risky for broad AI refactoring.&lt;/li&gt;
&lt;li&gt;Which generated code merely looks plausible.&lt;/li&gt;
&lt;li&gt;Which problems should be solved through architecture rather than more patches.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;AI can generate many candidate solutions. The more candidates you have, the more judgment you need. An inexperienced person may be impressed by &amp;ldquo;it runs.&amp;rdquo; An experienced engineer asks: can it be maintained? Can it scale? Does it break a security boundary? Can we debug it when something goes wrong?&lt;/p&gt;
&lt;p&gt;That is why AI coding agents do not turn software engineering into pure chat. They outsource part of the execution work while amplifying planning, review, validation, and trade-off decisions.&lt;/p&gt;
&lt;h2 id=&#34;openclaw-matters-beyond-the-project-itself&#34;&gt;OpenClaw matters beyond the project itself
&lt;/h2&gt;&lt;p&gt;OpenClaw drew attention not only because it is an open-source AI agent, and not only because it grew quickly.&lt;/p&gt;
&lt;p&gt;It is also a signal: developers increasingly want AI to do more than answer questions. They want it to connect to real tools and perform real actions.&lt;/p&gt;
&lt;p&gt;Traditional chatbots stay inside the chat box. They can explain code, write drafts, and give advice, but people still need to copy, paste, open software, and run commands.&lt;/p&gt;
&lt;p&gt;The agent direction connects models to tools:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;File systems.&lt;/li&gt;
&lt;li&gt;Browsers.&lt;/li&gt;
&lt;li&gt;Terminals.&lt;/li&gt;
&lt;li&gt;Email.&lt;/li&gt;
&lt;li&gt;Calendars.&lt;/li&gt;
&lt;li&gt;Third-party services.&lt;/li&gt;
&lt;li&gt;Project repositories.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Once models can use those tools, the boundaries of software development shift. AI is no longer just code completion. It can participate in project reading, task decomposition, file editing, test execution, PR preparation, and workflow automation.&lt;/p&gt;
&lt;p&gt;That is also why Steinberger&amp;rsquo;s move to OpenAI drew attention. He represents not just a single developer story, but a product direction: personal agents moving from demos into everyday work.&lt;/p&gt;
&lt;h2 id=&#34;what-this-means-for-ordinary-developers&#34;&gt;What this means for ordinary developers
&lt;/h2&gt;&lt;p&gt;For ordinary developers, Steinberger&amp;rsquo;s experience is not something everyone can copy directly.&lt;/p&gt;
&lt;p&gt;Not everyone can manage multiple agents at once. Not every project is suited to heavy AI generation. Not every team accepts a workflow of &amp;ldquo;generate first, iterate quickly.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;But several lessons are useful.&lt;/p&gt;
&lt;p&gt;First, write tasks clearly.&lt;/p&gt;
&lt;p&gt;AI is sensitive to vague goals. If you say &amp;ldquo;optimize this,&amp;rdquo; it may change style, structure, features, and logic. If you say &amp;ldquo;change the login failure message from English to Chinese without altering the authentication flow,&amp;rdquo; the result is usually more controllable.&lt;/p&gt;
&lt;p&gt;Second, standardize validation commands.&lt;/p&gt;
&lt;p&gt;If a project has no tests, no build command, and no lint, AI has trouble forming a loop. Even basic commands like &lt;code&gt;npm test&lt;/code&gt;, &lt;code&gt;go test ./...&lt;/code&gt;, &lt;code&gt;pytest&lt;/code&gt;, or &lt;code&gt;hugo&lt;/code&gt; are better than relying only on visual inspection.&lt;/p&gt;
&lt;p&gt;Third, control the scope of changes.&lt;/p&gt;
&lt;p&gt;Having AI handle one module, one bug, or one page at a time is usually more reliable than asking it to &amp;ldquo;refactor the whole project.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Fourth, keep human review.&lt;/p&gt;
&lt;p&gt;For authentication, payments, permissions, data deletion, deployment scripts, database migrations, and security configuration, do not lower the review bar just because the code was generated by AI.&lt;/p&gt;
&lt;p&gt;Fifth, review prompts and failure patterns.&lt;/p&gt;
&lt;p&gt;If AI often misunderstands a certain type of task, write those constraints into project rules, agent instructions, or skill files. AI coding capability comes not only from the model, but also from the work environment you build around it.&lt;/p&gt;
&lt;h2 id=&#34;where-ai-software-development-is-going&#34;&gt;Where AI software development is going
&lt;/h2&gt;&lt;p&gt;Steinberger&amp;rsquo;s story suggests that AI software development is moving from &amp;ldquo;helping write code&amp;rdquo; toward &amp;ldquo;organizing software production workflows.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Early AI coding tools were mainly useful for function completion, error explanation, and template generation. The shift now is that agents can work across files, call tools, run checks, and continue fixing based on feedback.&lt;/p&gt;
&lt;p&gt;This points to several trends.&lt;/p&gt;
&lt;p&gt;First, the productivity ceiling for individual developers will rise.&lt;/p&gt;
&lt;p&gt;One person can push more prototypes, scripts, internal tools, and small products. But higher output does not automatically mean higher quality. The faster code is generated, the more validation matters.&lt;/p&gt;
&lt;p&gt;Second, project structure becomes more important.&lt;/p&gt;
&lt;p&gt;The clearer the code, tests, and documentation, the easier it is for AI to make correct changes. Messy projects are hard for humans and hard for AI.&lt;/p&gt;
&lt;p&gt;Third, software engineers will look more like workflow designers.&lt;/p&gt;
&lt;p&gt;In the future, what matters will not only be whether someone knows a programming language, but whether they can organize requirements, context, tools, tests, deployment, and permissions into a controlled loop.&lt;/p&gt;
&lt;p&gt;Fourth, security boundaries become more sensitive.&lt;/p&gt;
&lt;p&gt;If an agent can do things, it can also do the wrong things. If it can read files, run commands, and access services, then permissions, audit, and rollback become infrastructure for AI development environments.&lt;/p&gt;
&lt;h2 id=&#34;summary&#34;&gt;Summary
&lt;/h2&gt;&lt;p&gt;The most valuable part of Peter Steinberger&amp;rsquo;s view of AI software development is not how much code AI generated. It is the development posture he demonstrates.&lt;/p&gt;
&lt;p&gt;Humans are no longer only typing line by line inside an editor. They are designing goals, managing agents, building feedback loops, reviewing results, and adjusting the system. Code remains important, but it is no longer the only center of labor.&lt;/p&gt;
&lt;p&gt;If traditional software development emphasized &amp;ldquo;writing the code correctly,&amp;rdquo; AI software development increasingly emphasizes &amp;ldquo;making the system continuously produce verifiably correct results.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;This is not just about lowering the engineering barrier. It changes the shape of engineering ability: from manual implementation toward task decomposition, context management, tool orchestration, automated validation, and final judgment.&lt;/p&gt;
&lt;h2 id=&#34;references&#34;&gt;References
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://techcrunch.com/2026/02/25/openclaw-creators-advice-to-ai-builders-is-to-be-more-playful-and-allow-yourself-time-to-improve/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;TechCrunch: OpenClaw creator&amp;rsquo;s advice to AI builders&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://builtin.com/articles/openclaw-founder-to-openai-analysis&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Built In: What Is OpenAI Getting From the OpenClaw Deal?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://podwise.ai/dashboard/episodes/7026858&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;The Pragmatic Engineer: The creator of Clawd: I ship code I don&amp;rsquo;t read&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://www.teamday.ai/ai/steinberger-openclaw-builders-unscripted-openai&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;TeamDay: Peter Steinberger: Building OpenClaw as a Solo Dev&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
        </item>
        <item>
        <title>Google Gemini Spark Leak: A 24/7 Gemini Agent May Be Coming</title>
        <link>https://knightli.com/en/2026/05/17/google-gemini-spark-ai-agent-leak/</link>
        <pubDate>Sun, 17 May 2026 11:58:08 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/05/17/google-gemini-spark-ai-agent-leak/</guid>
        <description>&lt;p&gt;Google has not officially released &lt;code&gt;Gemini Spark&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Current information about it mainly comes from internal Gemini Web test screens, community screenshots, TestingCatalog reporting, and 36Kr / Xinzhiyuan&amp;rsquo;s summary of related leaks. The consistent picture is that &lt;code&gt;Gemini Spark BETA&lt;/code&gt; may be an always-on AI Agent that Google is preparing. Its positioning is no longer just a chat assistant, but an &amp;ldquo;everyday AI agent&amp;rdquo; that can handle email, online tasks, and multi-step workflows in the background.&lt;/p&gt;
&lt;p&gt;So the boundary should be clear first: this is a leak analysis, not an official Google announcement. All features, naming, and launch timing still need to be confirmed by Google.&lt;/p&gt;
&lt;h2 id=&#34;bottom-line&#34;&gt;Bottom line
&lt;/h2&gt;&lt;p&gt;Based on currently exposed information, Gemini Spark has three key points:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;It may be a 24-hour online Agent inside the Gemini system, not a normal chat model.&lt;/li&gt;
&lt;li&gt;It may use broader personal context, including Google apps, chat history, tasks, logged-in websites, and location.&lt;/li&gt;
&lt;li&gt;Its risks are as large as its appeal, because it may involve information sharing, remote browser data, purchases, and third-party service calls.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;If Google really launches Spark, Gemini&amp;rsquo;s role will change from &amp;ldquo;AI that answers questions&amp;rdquo; to &amp;ldquo;AI that continuously handles tasks for you.&amp;rdquo;&lt;/p&gt;
&lt;h2 id=&#34;what-gemini-spark-is&#34;&gt;What Gemini Spark is
&lt;/h2&gt;&lt;p&gt;TestingCatalog reported on May 14, 2026 that Google is testing &lt;code&gt;Gemini Spark BETA&lt;/code&gt; inside Gemini Web. The exposed welcome text describes it as an everyday AI agent that can help users 24/7 with inbox, online tasks, and more multi-step work.&lt;/p&gt;
&lt;p&gt;The 36Kr / Xinzhiyuan article also says that after Spark was uncovered, the outside world saw a &amp;ldquo;full-time Agent&amp;rdquo; direction: it can stay on standby all day, process inboxes, execute online tasks, and may even involve purchases and information sharing.&lt;/p&gt;
&lt;p&gt;This means Spark is not simply a new model name. It looks more like a Gemini product-layer upgrade: bringing Gemini out of the conversation window and into users&amp;rsquo; email, web, calendar, tasks, and cross-app workflows.&lt;/p&gt;
&lt;h2 id=&#34;how-it-may-work&#34;&gt;How it may work
&lt;/h2&gt;&lt;p&gt;According to the hidden onboarding text disclosed by TestingCatalog, Gemini Spark may gather context from multiple sources, including:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Connected Apps.&lt;/li&gt;
&lt;li&gt;skills.&lt;/li&gt;
&lt;li&gt;chats.&lt;/li&gt;
&lt;li&gt;tasks.&lt;/li&gt;
&lt;li&gt;Websites the user has logged into.&lt;/li&gt;
&lt;li&gt;Personal intelligence.&lt;/li&gt;
&lt;li&gt;location.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This information would help Spark understand what the user wants to complete and call the necessary context while executing tasks. The text also says that, to complete some actions, Gemini may share necessary information with third parties, such as names, contact details, files, preferences, and information the user may consider sensitive.&lt;/p&gt;
&lt;p&gt;If these descriptions prove accurate, Spark will work more like a context-aware agent system than a one-shot Q&amp;amp;A assistant. It will not only look at the current prompt, but may combine long-term preferences, connected apps, browser state, and task history.&lt;/p&gt;
&lt;h2 id=&#34;why-it-matters&#34;&gt;Why it matters
&lt;/h2&gt;&lt;p&gt;The key to Gemini Spark is not one more chat entry point. It is that Google has a natural ecosystem entry point.&lt;/p&gt;
&lt;p&gt;OpenAI and Anthropic can build powerful Agents, but they do not naturally own the full chain of Gmail, Calendar, Drive, Chrome, Android, and Workspace. If Google connects Spark into these products, users will not need to build many extra workflows before letting an Agent enter daily work.&lt;/p&gt;
&lt;p&gt;This may bring three changes.&lt;/p&gt;
&lt;p&gt;First, Gemini may move from passive Q&amp;amp;A to active execution. Users may no longer only ask &amp;ldquo;summarize this email&amp;rdquo;; they may ask it to continuously organize the inbox, track tasks, and take follow-up actions.&lt;/p&gt;
&lt;p&gt;Second, Agents will rely more on personal context. The more it understands your email, calendar, files, browser state, and preferences, the more useful the result may be.&lt;/p&gt;
&lt;p&gt;Third, permission boundaries will become more sensitive. Doing more also means users need to know more clearly when it can act, how far it can go, and whether confirmation is required.&lt;/p&gt;
&lt;h2 id=&#34;where-the-risks-are&#34;&gt;Where the risks are
&lt;/h2&gt;&lt;p&gt;Several details in the onboarding text disclosed by TestingCatalog are worth watching.&lt;/p&gt;
&lt;p&gt;First, Spark is experimental. Even if it launches, it should not be treated as a fully mature system that needs no supervision.&lt;/p&gt;
&lt;p&gt;Second, although the system is designed to ask for permission before sensitive operations, the text also warns that it may share information or complete purchases without asking.&lt;/p&gt;
&lt;p&gt;Third, to maintain session continuity, Gemini will save remote browser data, such as login details and remote code execution data. Users can clear these data in Settings and can also disable Connected Apps and Personal intelligence.&lt;/p&gt;
&lt;p&gt;Taken together, these points show that Spark&amp;rsquo;s product direction is aggressive: it wants to be an Agent that can truly execute tasks, not only generate suggestions. But the closer it gets to real execution, the more it needs strict permissioning, auditing, confirmation, and rollback mechanisms.&lt;/p&gt;
&lt;h2 id=&#34;relationship-with-remy-and-ai-ultra&#34;&gt;Relationship with Remy and AI Ultra
&lt;/h2&gt;&lt;p&gt;TestingCatalog says Spark may be a renamed version of the agentic Gemini upgrade previously codenamed &lt;code&gt;Remy&lt;/code&gt;, and may also relate to the Gemini Agent direction for Google AI Ultra subscribers.&lt;/p&gt;
&lt;p&gt;If this clue is correct, Spark may not be a brand-new project from nowhere. It may be Google repackaging previously higher-end or more closed Agent capabilities and preparing to bring them to a wider audience.&lt;/p&gt;
&lt;p&gt;36Kr / Xinzhiyuan also describes it as an upgrade from &amp;ldquo;Remy&amp;rdquo; to &amp;ldquo;Spark&amp;rdquo;: Gemini Agent is no longer just a feature, but is moving toward a 24/7 digital life manager.&lt;/p&gt;
&lt;p&gt;But this is still a judgment based on leaked information. Whether Google will use &lt;code&gt;Spark&lt;/code&gt; as the official name, whether it will be limited to AI Ultra, and whether a lighter subscription tier will appear all need official confirmation.&lt;/p&gt;
&lt;h2 id=&#34;mcp-skills-and-the-tool-ecosystem&#34;&gt;MCP, skills, and the tool ecosystem
&lt;/h2&gt;&lt;p&gt;The same batch of community screenshots also showed model selector entries such as &lt;code&gt;MCP Tool Testing&lt;/code&gt;. The 36Kr article suggests this may hint that the new Gemini will natively support MCP third-party tool integration, with Thinking mode also being rebuilt.&lt;/p&gt;
&lt;p&gt;This clue becomes more interesting when viewed together with Spark.&lt;/p&gt;
&lt;p&gt;If Spark were only a chat assistant, skills and MCP would matter less. But if Spark is a long-running Agent, it needs to reliably call tools, access web pages, execute tasks, read and write context, and deliver results to users.&lt;/p&gt;
&lt;p&gt;In other words, Spark may not be a single feature. It may be part of Google&amp;rsquo;s Agent tool ecosystem: the model handles understanding and planning, while skills / MCP / connected apps handle execution and expansion.&lt;/p&gt;
&lt;h2 id=&#34;what-it-means-for-ordinary-users&#34;&gt;What it means for ordinary users
&lt;/h2&gt;&lt;p&gt;If Gemini Spark really launches, ordinary users may see these direct changes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Email is not only summarized, but can be categorized, followed up, and turned into tasks.&lt;/li&gt;
&lt;li&gt;Web tasks are not only suggested, but may be continuously executed in a remote browser.&lt;/li&gt;
&lt;li&gt;Calendar, location, preferences, and previous chats become long-term Agent context.&lt;/li&gt;
&lt;li&gt;Purchases, bookings, form filling, and similar actions may enter the AI execution range.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This sounds convenient, but users will need new habits: not only checking what the AI says, but also what it is preparing to do, what it has already done, whether it can be undone, and whether there is a record.&lt;/p&gt;
&lt;p&gt;Future AI Agent experience may depend not only on model intelligence, but also on clear permission prompts, inspectable task logs, and recovery from mistakes.&lt;/p&gt;
&lt;h2 id=&#34;what-it-means-for-developers-and-teams&#34;&gt;What it means for developers and teams
&lt;/h2&gt;&lt;p&gt;For developers, Spark matters because Google may be moving Agents from &amp;ldquo;demo products&amp;rdquo; toward real workflow platforms.&lt;/p&gt;
&lt;p&gt;If Spark can reliably connect Google apps, third-party tools, and browser state, developers will care about:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Whether APIs or extension mechanisms are open.&lt;/li&gt;
&lt;li&gt;Whether MCP or skills can be connected by third parties.&lt;/li&gt;
&lt;li&gt;Whether enterprise admins can control permissions, data retention, and audits.&lt;/li&gt;
&lt;li&gt;Whether Agent execution failures have traceable logs.&lt;/li&gt;
&lt;li&gt;Whether sandboxing, approval flows, and sensitive-operation confirmation are supported.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For teams, Spark may first enter high-frequency scenarios such as Gmail, Calendar, Docs, Drive, and Chrome. It may not be suitable for fully automated high-risk work at the beginning, but it is a strong fit for inbox triage, meeting follow-up, document organization, market research, and lightweight operations tasks.&lt;/p&gt;
&lt;h2 id=&#34;how-to-read-it-now&#34;&gt;How to read it now
&lt;/h2&gt;&lt;p&gt;This story is best understood as &amp;ldquo;high-confidence direction, low-certainty details.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;The high-confidence direction is that Google is pushing Gemini Agents to be more proactive, longer-running, and more deeply integrated with its ecosystem. The Gemini Web test text reported by TestingCatalog, community screenshots, and 36Kr&amp;rsquo;s summary of multiple leaks all point in the same direction.&lt;/p&gt;
&lt;p&gt;The low-certainty details are the official name, launch timing, permission rules, subscription tiers, supported regions, API availability, and whether it will really be called Gemini Spark.&lt;/p&gt;
&lt;p&gt;The safest view for now:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Do not treat Spark as an already released official product.&lt;/li&gt;
&lt;li&gt;Treat it as a strong signal for Google&amp;rsquo;s next AI Agent direction.&lt;/li&gt;
&lt;li&gt;Wait for official explanations around permissions, privacy, third-party data sharing, and remote browser data storage.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;summary&#34;&gt;Summary
&lt;/h2&gt;&lt;p&gt;If &lt;code&gt;Gemini Spark&lt;/code&gt; eventually launches, it may be a key step in Gemini&amp;rsquo;s move from chat assistant to always-on Agent. It is not just a model swap; it places Gemini into Google&amp;rsquo;s ecosystem of email, web, tasks, location, personal intelligence, and third-party services.&lt;/p&gt;
&lt;p&gt;Its potential is large: more proactive, closer to real workflows, and easier to distribute to many users through Google&amp;rsquo;s ecosystem. Its risks are just as large: once AI can share information, save browser state, make purchases, and call third-party services, permission boundaries must be extremely clear.&lt;/p&gt;
&lt;p&gt;So the most important question about Gemini Spark is not &amp;ldquo;how smart is it&amp;rdquo;, but how Google plans to make a 24-hour online AI Agent controllable, auditable, and trustworthy.&lt;/p&gt;
&lt;p&gt;References:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://www.testingcatalog.com/google-prepares-gemini-spark-ai-agent-ahead-of-i-o-launch/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;TestingCatalog: Google prepares Gemini Spark AI Agent ahead of I/O launch&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://36kr.com/p/3810432812162816&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;36Kr: Gemini 3.5 Pro leaked, coding reportedly catches up with GPT-5.5&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
        </item>
        <item>
        <title>Gemini 3.5 Pro Leak: Codenamed Cappuccino, Google Tries to Regain Momentum in Coding and Agents</title>
        <link>https://knightli.com/en/2026/05/17/gemini-35-pro-cappuccino-spark-leak/</link>
        <pubDate>Sun, 17 May 2026 11:47:27 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/05/17/gemini-35-pro-cappuccino-spark-leak/</guid>
        <description>&lt;p&gt;Google has not officially released &lt;code&gt;Gemini 3.5 Pro&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;What we can see so far mainly comes from developer community screenshots, anonymous benchmarks, leakers, and media reports. On May 15, 2026, 36Kr / Xinzhiyuan reported that a next-generation Gemini checkpoint may be internally codenamed &lt;code&gt;Cappuccino&lt;/code&gt;, and that related models have already surfaced in communities and benchmark platforms.&lt;/p&gt;
&lt;p&gt;This information should not be treated as an official launch, but it points in a clear direction: Google is trying to address two gaps at once, coding and reasoning on one side, and always-on AI agents on the other.&lt;/p&gt;
&lt;h2 id=&#34;bottom-line&#34;&gt;Bottom line
&lt;/h2&gt;&lt;p&gt;This leak can be read in three layers:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;code&gt;Gemini 3.5 Pro&lt;/code&gt; has not been officially released, and &lt;code&gt;Cappuccino&lt;/code&gt; looks more like an internal checkpoint or candidate build.&lt;/li&gt;
&lt;li&gt;The leaked information suggests the new Gemini is improving in code generation, SVG / interactive web generation, and multimodal output.&lt;/li&gt;
&lt;li&gt;Google&amp;rsquo;s parallel test of &lt;code&gt;Gemini Spark&lt;/code&gt; may matter more than the model itself, because it points to a 24-hour personal AI agent.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;In other words, this is not just a &amp;ldquo;model benchmark&amp;rdquo; story. It looks more like a product roadmap signal ahead of Google I/O: the model needs to catch up with GPT-5.5, while the agent layer needs to capture user workflows.&lt;/p&gt;
&lt;h2 id=&#34;what-cappuccino-is&#34;&gt;What Cappuccino is
&lt;/h2&gt;&lt;p&gt;The 36Kr article says a post from Lentils indicates that the &lt;code&gt;Gemini 3.5 Pro&lt;/code&gt; checkpoint codenamed &lt;code&gt;Cappuccino&lt;/code&gt; has started to appear. The community had been discussing &lt;code&gt;Gemini 3.2&lt;/code&gt; only hours earlier, but the latest leak jumped directly to &lt;code&gt;3.5&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;If that naming is ultimately accurate, Google may want to frame the next Gemini as a larger version jump rather than a routine point release.&lt;/p&gt;
&lt;p&gt;For now, &lt;code&gt;Cappuccino&lt;/code&gt; should still be treated as a leaked internal codename. It does not mean Google has publicly launched the final model, and it does not guarantee that the final release name will be &lt;code&gt;Gemini 3.5 Pro&lt;/code&gt;.&lt;/p&gt;
&lt;h2 id=&#34;why-coding-is-the-focus&#34;&gt;Why coding is the focus
&lt;/h2&gt;&lt;p&gt;The most discussed part of the leak is the new Gemini&amp;rsquo;s coding ability.&lt;/p&gt;
&lt;p&gt;According to community screenshots and benchmark claims cited by 36Kr, the new model appears stronger at:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Generating SVG and visual components.&lt;/li&gt;
&lt;li&gt;Generating interactive web apps.&lt;/li&gt;
&lt;li&gt;Handling animation, 3D, adjustable control panels, and other complex frontend outputs.&lt;/li&gt;
&lt;li&gt;Improving logical reasoning and code generation.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The article also cites Abacus.AI CEO Bindu Reddy as saying that &lt;code&gt;3.2 Flash&lt;/code&gt; is close to &lt;code&gt;GPT-5.5&lt;/code&gt; in coding and reasoning while being much cheaper. Other media sources reportedly believe the new Gemini roughly reaches the &lt;code&gt;GPT-5.5&lt;/code&gt; tier overall, but may not represent a qualitative leap.&lt;/p&gt;
&lt;p&gt;That is why the phrase &amp;ldquo;matches GPT-5.5&amp;rdquo; needs caution. It is more of a relative judgment from different leaks and anonymous tests than an official Google benchmark result.&lt;/p&gt;
&lt;h2 id=&#34;why-google-needs-to-catch-up-in-coding&#34;&gt;Why Google needs to catch up in coding
&lt;/h2&gt;&lt;p&gt;AI coding has moved from developer tooling into the center of foundation model competition.&lt;/p&gt;
&lt;p&gt;OpenAI has Codex, and Anthropic has Claude Code. They serve engineers, but they also bring product managers, designers, and operators into workflows where natural language can produce runnable products.&lt;/p&gt;
&lt;p&gt;By comparison, Google has Gemini and Antigravity, but it has not formed the same default entry point in developer mindshare. The 36Kr article also notes that Antigravity has not truly broken through externally, and that pricing, quota reminders, and experience stability have drawn community discussion.&lt;/p&gt;
&lt;p&gt;So if the new Gemini needs to prove itself, coding is the most direct battlefield. The question is not only whether it can write code, but whether it can reliably produce complete interfaces, understand complex requirements, call tools, fix errors, and fit into real development workflows.&lt;/p&gt;
&lt;h2 id=&#34;spark-may-matter-more-than-35-pro&#34;&gt;Spark may matter more than 3.5 Pro
&lt;/h2&gt;&lt;p&gt;In the same wave of leaks, &lt;code&gt;Gemini Spark BETA&lt;/code&gt; also surfaced.&lt;/p&gt;
&lt;p&gt;According to TestingCatalog and other sources, Spark is positioned like an always-on AI agent: it can process inboxes, execute online tasks, manage multi-step workflows, and connect context from Google apps, skill modules, chat history, scheduled tasks, logged-in websites, and location data.&lt;/p&gt;
&lt;p&gt;That means Spark is not a normal chat entry point. It may be a system that stays online, continuously reads context, and performs tasks for users.&lt;/p&gt;
&lt;p&gt;Its appeal is obvious: if Google can connect Gmail, Calendar, Chrome, Android, Workspace, and Gemini, Spark will have a distribution advantage that OpenAI and Anthropic cannot easily copy.&lt;/p&gt;
&lt;p&gt;The risk is just as obvious. The 36Kr article mentions wording around Spark saying it may share information or complete purchases without asking. Even if the system is designed to request permission before sensitive operations, this kind of agent still raises privacy, authorization-boundary, and accidental-action risks.&lt;/p&gt;
&lt;h2 id=&#34;what-this-means-for-ordinary-users&#34;&gt;What this means for ordinary users
&lt;/h2&gt;&lt;p&gt;If you are a regular Gemini user, the most important part of this leak is not the model name. It is three shifts.&lt;/p&gt;
&lt;p&gt;First, Google may continue to strengthen the ability to produce complete results. Users have often complained that Gemini can be lazy with visual generation, SVG, and frontend pages. If the new model can produce several complete options in one pass, the experience will improve noticeably.&lt;/p&gt;
&lt;p&gt;Second, coding ability may continue to move into lighter models. The leak repeatedly mentions Flash improvements in coding, reasoning, and interactive generation, which means complex tasks may not always require Pro models in the future.&lt;/p&gt;
&lt;p&gt;Third, agents will become more proactive. If Spark launches, Gemini may no longer just answer questions. It may start taking over email, web tasks, purchases, calendars, and cross-app workflows over longer periods.&lt;/p&gt;
&lt;p&gt;That is good for efficiency, but it creates a new challenge for permission management.&lt;/p&gt;
&lt;h2 id=&#34;what-this-means-for-developers&#34;&gt;What this means for developers
&lt;/h2&gt;&lt;p&gt;Developers should watch two issues more closely.&lt;/p&gt;
&lt;p&gt;The first is tooling. The 36Kr article says community screenshots showed an unreleased entry called &lt;code&gt;MCP Tool Testing&lt;/code&gt; in the model selector. If Gemini natively supports MCP or third-party tool testing, it will be easier to connect it to developers&amp;rsquo; own toolchains.&lt;/p&gt;
&lt;p&gt;The second is cost and stability. Even if the new Gemini matches GPT-5.5 on some benchmarks, developers will ultimately judge three things: actual code quality, context stability, and whether pricing and quotas are predictable.&lt;/p&gt;
&lt;p&gt;The past year of AI coding tool competition has shown that model capability is only the ticket in. What keeps developers is whether the tool can reliably edit code, run tests, read context, and handle edge cases in daily projects.&lt;/p&gt;
&lt;h2 id=&#34;how-to-read-this-news-now&#34;&gt;How to read this news now
&lt;/h2&gt;&lt;p&gt;This story is best understood as &amp;ldquo;strong signal, weak confirmation.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;The strong signal is that multiple community clues point to Google preparing a stronger new Gemini and a more proactive Gemini Spark Agent.&lt;/p&gt;
&lt;p&gt;The weak confirmation is that &lt;code&gt;Gemini 3.5 Pro&lt;/code&gt; has not been officially released, &lt;code&gt;Cappuccino&lt;/code&gt; remains a leaked codename, and claims that it &amp;ldquo;matches GPT-5.5&amp;rdquo; still need validation through official Google benchmarks, third-party tests, and real user experience.&lt;/p&gt;
&lt;p&gt;The safest view for now:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Do not treat it as a released product.&lt;/li&gt;
&lt;li&gt;Treat it as an early preview of Google&amp;rsquo;s next Gemini direction.&lt;/li&gt;
&lt;li&gt;Watch whether I/O or later official events confirm the model name, API availability, pricing, context window, tool calling, and agent permission boundaries.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;summary&#34;&gt;Summary
&lt;/h2&gt;&lt;p&gt;The exposure of &lt;code&gt;Gemini 3.5 Pro / Cappuccino&lt;/code&gt; suggests Google may be preparing a stronger next-generation Gemini push. It is not trying to fix one isolated capability, but a whole AI workflow: the model needs to write code better, generate interfaces, and handle complex reasoning, while Spark pushes Gemini toward an always-on agent.&lt;/p&gt;
&lt;p&gt;But before an official release, all benchmarks and screenshots remain clues. What will decide whether Gemini 3.5 Pro can regain momentum is not whether the codename sounds good, but whether it can reliably win in real development, real office work, and real multi-step tasks.&lt;/p&gt;
&lt;p&gt;References:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://m.36kr.com/p/3810432812162816&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;36Kr: Gemini 3.5 Pro leaked, coding performance reportedly catches up with GPT-5.5&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://www.testingcatalog.com/google-prepares-gemini-spark-ai-agent-ahead-of-i-o-launch/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;TestingCatalog: Google prepares Gemini Spark AI agent ahead of I/O launch&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://x.com/alexeheath/status/2054747125616169229&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;X: Alex Heath on the new Gemini and GPT-5.5&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://x.com/Lentils80/status/2054628116094501377&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;X: Lentils on Gemini 3.5 / Cappuccino&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
        </item>
        <item>
        <title>easy-vibe: A Learning Map for Vibe Coding Beginners</title>
        <link>https://knightli.com/en/2026/05/16/easy-vibe-vibe-coding-learning-map/</link>
        <pubDate>Sat, 16 May 2026 22:44:43 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/05/16/easy-vibe-vibe-coding-learning-map/</guid>
        <description>&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/datawhalechina/easy-vibe&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;easy-vibe&lt;/a&gt; is an open source Vibe Coding learning project from Datawhale. It is not aimed at developers who are already fluent with AI coding tools. It is aimed at students, product managers, designers, operators, indie developers, and technical hobbyists who are just starting with Vibe Coding.&lt;/p&gt;
&lt;p&gt;The value of this project is not that it lists another batch of AI tools. It turns &amp;ldquo;how to start building projects with AI&amp;rdquo; into a learning path that is easier to understand. For many beginners, the hard part is not knowing that Claude Code, Cursor, MCP, or Agents exist. The hard part is knowing what to learn first, how to practice, and when to move into more advanced tools.&lt;/p&gt;
&lt;h2 id=&#34;beginners-need-a-path-most&#34;&gt;Beginners Need a Path Most
&lt;/h2&gt;&lt;p&gt;Vibe Coding has become popular in recent years, but it is not very friendly to beginners.&lt;/p&gt;
&lt;p&gt;On the surface, as long as you can describe a requirement, you can ask AI to write code. In reality, as soon as the task becomes slightly more complex, problems appear: the requirement is unclear, the model edits the wrong file, the project structure is confusing, errors are hard to handle, dependencies fail to install, prompts become messier, and the workflow falls back to &amp;ldquo;copy code into a chat box&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;So getting started with Vibe Coding cannot only mean learning &amp;ldquo;how to write prompts&amp;rdquo;. It needs to solve several things:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;How to split an idea into executable tasks;&lt;/li&gt;
&lt;li&gt;How to let AI understand a project structure;&lt;/li&gt;
&lt;li&gt;How to read code generated by the model;&lt;/li&gt;
&lt;li&gt;How to handle errors and iterate;&lt;/li&gt;
&lt;li&gt;How to use the terminal and local development environment;&lt;/li&gt;
&lt;li&gt;How to move from web chat to real AI coding tools.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is where easy-vibe matters: it tries to organize these topics into a learning route, instead of leaving beginners lost among tools, tutorials, and terminology.&lt;/p&gt;
&lt;h2 id=&#34;it-is-a-roadmap-not-a-single-tutorial&#34;&gt;It Is a Roadmap, Not a Single Tutorial
&lt;/h2&gt;&lt;p&gt;According to the project description, easy-vibe covers basic tutorials, interactive exercises, visual content, RAG, terminal tools, AI coding tools, and more advanced topics such as Claude Code, MCP, Skills, and Agent Teams.&lt;/p&gt;
&lt;p&gt;This structure is suitable for beginners because AI coding is not a single skill. It is a combination of abilities:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Describing requirements;&lt;/li&gt;
&lt;li&gt;Splitting tasks;&lt;/li&gt;
&lt;li&gt;Reading projects;&lt;/li&gt;
&lt;li&gt;Asking the model to edit code;&lt;/li&gt;
&lt;li&gt;Running and verifying results;&lt;/li&gt;
&lt;li&gt;Iterating based on errors;&lt;/li&gt;
&lt;li&gt;Turning repeated workflows into tools or skills.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;If you only learn one tool, it is easy to be constrained by that tool&amp;rsquo;s interface. Switch models, editors, or CLIs, and the workflow becomes unclear again. A roadmap helps build the working method first, then places tools where they belong.&lt;/p&gt;
&lt;h2 id=&#34;especially-useful-for-non-programmers&#34;&gt;Especially Useful for Non-Programmers
&lt;/h2&gt;&lt;p&gt;The biggest appeal of Vibe Coding is that it lets non-professional programmers build prototypes.&lt;/p&gt;
&lt;p&gt;Product managers can turn product ideas into interactive demos. Designers can validate interaction logic. Operators can write internal tools. Students can quickly build course projects. Founders can validate demand early. These people do not necessarily need to become full-time engineers in the traditional sense, but they do need a method for &amp;ldquo;letting AI help me turn ideas into working things&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;This is also why easy-vibe fits the Chinese community. Many Chinese users already know AI can write code, but they still lack systematic beginner materials. Development environment, prompts, project structure, debugging methods, and Agent tools are easier to learn when explained clearly in Chinese and paired with exercises.&lt;/p&gt;
&lt;p&gt;For these users, the most important thing is not to learn a complex framework immediately. It is to complete a full loop first: propose a requirement, generate a project, run it, find problems, keep modifying, and finally get a usable version.&lt;/p&gt;
&lt;h2 id=&#34;the-advanced-part-moves-toward-real-ai-development-workflows&#34;&gt;The Advanced Part Moves Toward Real AI Development Workflows
&lt;/h2&gt;&lt;p&gt;The Claude Code, MCP, Skills, and Agent Teams mentioned in easy-vibe are no longer just beginner concepts.&lt;/p&gt;
&lt;p&gt;Claude Code represents terminal coding Agents: the model can enter a local project, read files, edit code, and run commands. MCP solves tool and data source integration, so the model is not trapped in a chat box. Skills preserve reusable workflows, such as fixed project generation, document organization, test checks, or content production processes. Agent Teams further split tasks across multiple agents.&lt;/p&gt;
&lt;p&gt;These topics may feel distant for beginners, but they are worth understanding early. The direction of Vibe Coding is already clear: from &amp;ldquo;let AI write a piece of code&amp;rdquo; to &amp;ldquo;let AI participate in a complete project workflow&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;If a learning route stops at prompts, it will quickly fall behind tool evolution. On the other hand, if every advanced concept is thrown at beginners immediately, they will not know where to start. The useful part of easy-vibe is that it places these topics on a gradual upgrade path.&lt;/p&gt;
&lt;h2 id=&#34;two-mistakes-to-avoid&#34;&gt;Two Mistakes to Avoid
&lt;/h2&gt;&lt;p&gt;The first mistake is thinking that Vibe Coding means you can ignore code entirely.&lt;/p&gt;
&lt;p&gt;AI can generate a lot, but the user still needs to judge whether the result is correct. At minimum, you need to understand the project structure, know how to run it, and roughly know where an error is happening. Even if you do not write complex code, you still need basic engineering common sense.&lt;/p&gt;
&lt;p&gt;The second mistake is thinking that more advanced tools are always better.&lt;/p&gt;
&lt;p&gt;Beginners do not necessarily need Claude Code, MCP, or multiple Agents at the start. A better order is to first build a feedback loop with simple projects, then gradually introduce the terminal, version control, testing, tool calling, and automated workflows. Tools should match task complexity; otherwise they look powerful but have no clear use.&lt;/p&gt;
&lt;h2 id=&#34;how-to-use-it&#34;&gt;How to Use It
&lt;/h2&gt;&lt;p&gt;If you are just starting with Vibe Coding, you can use easy-vibe as a learning checklist.&lt;/p&gt;
&lt;p&gt;Start with basic concepts and simple exercises. Do not rush to chase every tool. Build a small project, such as a personal homepage, data dashboard, form tool, automation script, or knowledge base demo. During the process, observe where AI helps and where you still need to confirm things yourself.&lt;/p&gt;
&lt;p&gt;Once you can complete small projects consistently, move into more complex topics:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Use terminal tools to work with local projects;&lt;/li&gt;
&lt;li&gt;Use Git to manage each change;&lt;/li&gt;
&lt;li&gt;Use RAG to connect your own materials;&lt;/li&gt;
&lt;li&gt;Use MCP to connect external tools;&lt;/li&gt;
&lt;li&gt;Use Skills to solidify repeated workflows;&lt;/li&gt;
&lt;li&gt;Use Agent Teams to split complex tasks.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Learning Vibe Coding this way is not just learning to ask AI. It is learning to put AI into your own workflow.&lt;/p&gt;
&lt;h2 id=&#34;conclusion&#34;&gt;Conclusion
&lt;/h2&gt;&lt;p&gt;easy-vibe is best seen as a Chinese learning map for Vibe Coding. It organizes scattered AI coding concepts, tools, and exercises into a route that helps beginners move from &amp;ldquo;I heard AI can write code&amp;rdquo; to &amp;ldquo;I can build a project with AI&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;The real value of Vibe Coding is not that it lets people skip all learning. It lowers the threshold from idea to prototype. You still need to understand requirements, organize tasks, verify results, and control risks. But many repetitive, tedious, and blocking steps can be handled with AI assistance.&lt;/p&gt;
&lt;p&gt;If you want a systematic entry point into AI coding, without getting trapped immediately in tool names and complex engineering setup, easy-vibe is a good place to start.&lt;/p&gt;
</description>
        </item>
        <item>
        <title>Anthropic financial-services: Reusable Templates for Financial Agents</title>
        <link>https://knightli.com/en/2026/05/16/anthropic-financial-services-agent-templates/</link>
        <pubDate>Sat, 16 May 2026 22:43:08 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/05/16/anthropic-financial-services-agent-templates/</guid>
        <description>&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/anthropics/financial-services&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;anthropics/financial-services&lt;/a&gt; is a reference project from Anthropic for the financial services industry. It is not a single application, but a set of examples that can be studied and reused separately: Agents, Plugins, Skills, MCP connectors, and prompts and integration patterns designed around financial workflows.&lt;/p&gt;
&lt;p&gt;This project is worth watching not because it provides a &amp;ldquo;universal financial assistant&amp;rdquo;, but because it breaks common AI implementation problems in finance into more concrete components: what kind of Agent each role needs, which data sources need to be connected, which tasks can be automated, and which steps still require human judgment.&lt;/p&gt;
&lt;h2 id=&#34;it-is-more-like-a-showroom-for-financial-agents&#34;&gt;It Is More Like a Showroom for Financial Agents
&lt;/h2&gt;&lt;p&gt;When companies talk about AI Agents, the discussion can easily stay abstract: reading files, querying data, writing reports, and calling tools. Once the scenario enters finance, the questions become much more specific.&lt;/p&gt;
&lt;p&gt;Investment banking analysts need to organize company materials, generate transaction briefs, and compare comparable companies. Equity research needs to read filings, follow news, perform valuation, and analyze risks. Private equity and asset management teams need to screen deals, write memos, and track portfolio companies. Wealth management needs to place client profiles, market information, and investment advice within a compliance framework.&lt;/p&gt;
&lt;p&gt;These scenarios cannot be handled by a generic chat box alone. They require roles, processes, data sources, output formats, and permission boundaries. The value of this Anthropic repository is that it turns multiple typical financial services roles and tasks into Agent templates that can be used as references.&lt;/p&gt;
&lt;h2 id=&#34;why-provide-agents-plugins-skills-and-mcp-together&#34;&gt;Why Provide Agents, Plugins, Skills, and MCP Together
&lt;/h2&gt;&lt;p&gt;Judging from the project structure, Anthropic did not only provide a set of prompts. It provides several kinds of components at the same time. This maps to several layers of enterprise Agent implementation.&lt;/p&gt;
&lt;p&gt;Agents are more like work units for roles or tasks. They define what the agent should do, how it should do it, when to call tools, and how to produce output.&lt;/p&gt;
&lt;p&gt;Plugins are more like external capability extensions. Financial work rarely happens only inside the model. It often needs to connect databases, document systems, market data, CRM, research libraries, and internal workflow systems.&lt;/p&gt;
&lt;p&gt;Skills are reusable professional capability packages. Fixed analysis frameworks, report structures, checklists, and data processing methods can be turned into skills instead of being rewritten as prompts every time.&lt;/p&gt;
&lt;p&gt;MCP connectors solve tool integration and context standardization. For enterprises, the more tools there are, the more they need a relatively unified way to connect them. Otherwise every system needs separate adaptation, and maintenance cost rises quickly.&lt;/p&gt;
&lt;p&gt;Only when these pieces are combined does the result begin to resemble a real enterprise AI workflow.&lt;/p&gt;
&lt;h2 id=&#34;why-finance-is-a-good-industry-for-agent-examples&#34;&gt;Why Finance Is a Good Industry for Agent Examples
&lt;/h2&gt;&lt;p&gt;Financial services is a good industry for showing Agents because it has three traits at the same time.&lt;/p&gt;
&lt;p&gt;First, information density is high. Financial work relies heavily on filings, announcements, meeting notes, research reports, trading data, client records, and regulatory documents. If a model only relies on general knowledge, it quickly becomes ineffective. It must connect to real data sources.&lt;/p&gt;
&lt;p&gt;Second, output formats are stable. Investment memos, company profiles, KYC documents, research summaries, client briefings, and fund operation reports all have relatively fixed structures. This makes it easier for Agents to form verifiable workflows.&lt;/p&gt;
&lt;p&gt;Third, risk boundaries are clear. Finance has strict requirements for compliance, auditability, permissions, and traceability. AI cannot casually provide investment advice or bypass approval processes. This forces Agent design to become more engineering-driven: keep references, separate facts from inferences, record tool calls, and limit executable actions.&lt;/p&gt;
&lt;p&gt;That means this project is not only for financial companies. Any team building enterprise Agents can use it to observe how Anthropic decomposes industry scenarios.&lt;/p&gt;
&lt;h2 id=&#34;what-typical-workflows-it-covers&#34;&gt;What Typical Workflows It Covers
&lt;/h2&gt;&lt;p&gt;According to the project description, the repository covers several financial services areas, including:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Investment banking;&lt;/li&gt;
&lt;li&gt;Equity research;&lt;/li&gt;
&lt;li&gt;Private equity;&lt;/li&gt;
&lt;li&gt;Wealth management;&lt;/li&gt;
&lt;li&gt;Fund operations;&lt;/li&gt;
&lt;li&gt;KYC and compliance-related workflows.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These workflows have one thing in common: they all require a lot of reading, organizing, comparison, and structured document generation. The best role for AI here is not to make decisions directly, but to reduce the time spent on information processing and document production.&lt;/p&gt;
&lt;p&gt;For example, in investment banking, an Agent can help organize target company information, extract key financial metrics, and generate a first draft of a transaction summary. In research, it can read filings and news first, then list key changes and open questions. In KYC, it can help check whether materials are complete and whether there are unusual signals.&lt;/p&gt;
&lt;p&gt;The final judgment should still belong to professionals. The Agent&amp;rsquo;s role is closer to assistant, analyst, and workflow accelerator.&lt;/p&gt;
&lt;h2 id=&#34;what-it-suggests-for-enterprise-adoption&#34;&gt;What It Suggests for Enterprise Adoption
&lt;/h2&gt;&lt;p&gt;The most useful part of this repository is that it turns &amp;ldquo;model capability&amp;rdquo; into &amp;ldquo;business components&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;Internal AI projects often run into the same problem: model demos look impressive, but once they are connected to real business, they are hard to reuse. One team writes one set of prompts, another team writes another. One system connects a database, another builds its own interface. Security and audit requirements are scattered everywhere.&lt;/p&gt;
&lt;p&gt;A steadier approach is to split capabilities into several types of assets:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Role-oriented Agents;&lt;/li&gt;
&lt;li&gt;Process-oriented Skills;&lt;/li&gt;
&lt;li&gt;MCP connectors for system integration;&lt;/li&gt;
&lt;li&gt;Execution rules for permissions and audit;&lt;/li&gt;
&lt;li&gt;Templates and checklists for business output.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The benefit is that the enterprise does not restart from &amp;ldquo;building a chatbot&amp;rdquo; every time. It gradually accumulates maintainable AI workflow assets.&lt;/p&gt;
&lt;h2 id=&#34;compliance-and-responsibility-boundaries-cannot-be-ignored&#34;&gt;Compliance and Responsibility Boundaries Cannot Be Ignored
&lt;/h2&gt;&lt;p&gt;The easiest misunderstanding around financial Agents is treating &amp;ldquo;can generate analysis&amp;rdquo; as &amp;ldquo;can replace decisions&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;In financial services, AI output should usually be treated as supporting material. It can organize facts, draft documents, highlight risks, and complete files, but it cannot bypass investment research, risk control, legal, compliance, and suitability requirements. Especially when investment advice, trading decisions, asset allocation, or identity checks are involved, human approval and responsibility chains must remain.&lt;/p&gt;
&lt;p&gt;That is why enterprise Agents cannot be evaluated only by answer quality. They must also be evaluated by:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Whether data sources are reliable;&lt;/li&gt;
&lt;li&gt;Whether references and evidence are traceable;&lt;/li&gt;
&lt;li&gt;Whether tool calls are recorded;&lt;/li&gt;
&lt;li&gt;Whether sensitive data is restricted;&lt;/li&gt;
&lt;li&gt;Whether output has human confirmation;&lt;/li&gt;
&lt;li&gt;Whether wrong results can be discovered and rolled back.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If these questions are not solved, the more automated the Agent becomes, the larger the risk radius becomes.&lt;/p&gt;
&lt;h2 id=&#34;conclusion&#34;&gt;Conclusion
&lt;/h2&gt;&lt;p&gt;anthropics/financial-services is more like a financial Agent reference implementation than an out-of-the-box financial product. It shows one way Anthropic thinks about enterprise AI adoption: do not build only generic chat assistants; organize Agents around specific roles, specific workflows, specific data sources, and specific permission boundaries.&lt;/p&gt;
&lt;p&gt;For financial institutions, it can serve as a reference for designing internal AI workflows. For developers, it is a sample for observing enterprise Agent architecture: Agents handle roles and tasks, Skills preserve professional processes, Plugins and MCP connect external systems, and the model eventually enters real business workflows.&lt;/p&gt;
&lt;p&gt;If early AI tools solved &amp;ldquo;how to make models answer questions&amp;rdquo;, projects like this care more about &amp;ldquo;how to let models participate in work within controlled boundaries&amp;rdquo;. That is where enterprise Agents become truly difficult.&lt;/p&gt;
</description>
        </item>
        <item>
        <title>DeepSeek-TUI: Turning DeepSeek V4 into a Terminal Coding Agent</title>
        <link>https://knightli.com/en/2026/05/16/deepseek-tui-terminal-coding-agent/</link>
        <pubDate>Sat, 16 May 2026 22:41:41 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/05/16/deepseek-tui-terminal-coding-agent/</guid>
        <description>&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/Hmbown/DeepSeek-TUI&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;DeepSeek-TUI&lt;/a&gt; is an open source project that brings DeepSeek V4 into terminal-based development workflows. It is not just a chat wrapper. It is closer to a &amp;ldquo;command-line coding agent&amp;rdquo; like Claude Code or Codex CLI: it can read files, edit code, run commands, call tools, and keep working through tasks in a TUI.&lt;/p&gt;
&lt;p&gt;If you already switch between an editor and a terminal, the value of this kind of tool is straightforward: you do not need to copy code back and forth into a web chat window, and you do not need to manually describe the whole project structure. You give it a task, and it can read context from the current workspace, plan steps, make changes, then return the result for your review.&lt;/p&gt;
&lt;h2 id=&#34;it-solves-the-entry-point-problem-for-deepseek&#34;&gt;It Solves the Entry Point Problem for DeepSeek
&lt;/h2&gt;&lt;p&gt;DeepSeek models already provide strong reasoning and coding capabilities, but model capability needs an engineering layer before it can land in real development workflows.&lt;/p&gt;
&lt;p&gt;Web chat is suitable for asking questions, but not for long-running project edits. APIs are suitable for system integration, but individual developers still need to build tool calling, context management, file operations, and permission control themselves. DeepSeek-TUI tries to fill this layer: it wraps DeepSeek V4 into an Agent that can work inside the terminal.&lt;/p&gt;
&lt;p&gt;According to the project description, its main capabilities include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A terminal TUI;&lt;/li&gt;
&lt;li&gt;Conversation and task execution for DeepSeek V4;&lt;/li&gt;
&lt;li&gt;Tool calling and file operations;&lt;/li&gt;
&lt;li&gt;1M context support;&lt;/li&gt;
&lt;li&gt;Auto mode;&lt;/li&gt;
&lt;li&gt;Sub-agents;&lt;/li&gt;
&lt;li&gt;Sandboxed execution;&lt;/li&gt;
&lt;li&gt;A persistent task queue.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Together, these features are not aimed at making the model sound more human. They are aimed at making the model easier to bring into the development environment.&lt;/p&gt;
&lt;h2 id=&#34;a-tui-fits-long-tasks-better-than-plain-cli-text&#34;&gt;A TUI Fits Long Tasks Better Than Plain CLI Text
&lt;/h2&gt;&lt;p&gt;Many AI CLI tools start with plain text interaction: enter a prompt, wait for output, then copy commands or add more context. This is simple, but longer tasks quickly become messy.&lt;/p&gt;
&lt;p&gt;The advantage of a TUI is that it can place conversations, files, execution results, and task status in a more stable interface. For a coding Agent, that matters. A code task is rarely a single question and answer. It often includes:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Understanding the project structure;&lt;/li&gt;
&lt;li&gt;Finding relevant files;&lt;/li&gt;
&lt;li&gt;Editing code;&lt;/li&gt;
&lt;li&gt;Running tests or commands;&lt;/li&gt;
&lt;li&gt;Fixing issues based on errors;&lt;/li&gt;
&lt;li&gt;Summarizing changes.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;If the interface is only a stream of logs, it is hard for the user to see where the Agent is in the process. A TUI at least provides a better place to observe and take over.&lt;/p&gt;
&lt;h2 id=&#34;auto-mode-is-best-for-tasks-with-clear-boundaries&#34;&gt;Auto Mode Is Best for Tasks with Clear Boundaries
&lt;/h2&gt;&lt;p&gt;The Auto mode mentioned by DeepSeek-TUI is best for tasks with clear boundaries. For example: fixing a small bug, adding a script, changing a configuration, organizing a set of documents, or implementing a local feature.&lt;/p&gt;
&lt;p&gt;These tasks have something in common: the goal is clear, the verification method is clear, and the impact scope is controllable. The Agent can inspect files, edit files, run commands, and then hand the result back to the user for confirmation.&lt;/p&gt;
&lt;p&gt;But Auto mode should not mean unlimited permission. In real projects, file deletion, large-scale refactors, database migrations, and deployment commands should all require explicit confirmation. The efficiency of coding Agents comes from automation, but so does the risk. The more a tool can execute commands, the more it needs sandboxing, permission boundaries, and human review.&lt;/p&gt;
&lt;h2 id=&#34;sub-agents-matter-because-they-split-tasks&#34;&gt;Sub-Agents Matter Because They Split Tasks
&lt;/h2&gt;&lt;p&gt;Sub-agents are not a new concept, but they are useful in coding scenarios.&lt;/p&gt;
&lt;p&gt;A moderately complex task usually requires several kinds of work at the same time: someone reads the code, someone changes the implementation, someone checks tests, and someone organizes documentation. Traditional multi-agent systems often feel ornamental because they have no real tools or real workspace; they only discuss inside a conversation.&lt;/p&gt;
&lt;p&gt;If sub-agents can work with the file system, command execution, and task queues, they become more like a task decomposition mechanism. For example, one sub-agent can analyze dependencies, another can modify a specific module, and the main agent can integrate the result. This can reduce the problem of putting too much unrelated information into one context.&lt;/p&gt;
&lt;p&gt;Of course, sub-agents also add cost: more tokens, more complex state, and responsibility boundaries that are harder to track. They are better suited to medium-complexity tasks and above, not necessarily every small edit.&lt;/p&gt;
&lt;h2 id=&#34;1m-context-is-not-magic-but-it-helps-with-projects&#34;&gt;1M Context Is Not Magic, but It Helps with Projects
&lt;/h2&gt;&lt;p&gt;1M context sounds exaggerated, but in coding scenarios it is not just a marketing number.&lt;/p&gt;
&lt;p&gt;The context of a real codebase is fragmented: README files, configuration files, type definitions, tests, call chains, historical conventions, and error logs can all affect one change. Longer context can reduce the problem of editing after seeing only a local fragment, and it can help the model retain more project constraints.&lt;/p&gt;
&lt;p&gt;Still, longer context does not automatically mean better judgment. Code tasks still need retrieval, filtering, and verification. Putting an entire project into context is not necessarily better than reading the relevant files precisely. A good coding Agent should treat long context as a buffer, not as a shortcut that replaces engineering judgment.&lt;/p&gt;
&lt;h2 id=&#34;who-it-is-best-for&#34;&gt;Who It Is Best For
&lt;/h2&gt;&lt;p&gt;DeepSeek-TUI is better suited to several groups:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Developers who want to use DeepSeek for coding tasks in the terminal;&lt;/li&gt;
&lt;li&gt;People who do not want to build tool calling and file operation frameworks themselves;&lt;/li&gt;
&lt;li&gt;Users familiar with Claude Code or Codex CLI who want to try a DeepSeek-based entry point;&lt;/li&gt;
&lt;li&gt;People who need local project context instead of only asking about code snippets in a web page;&lt;/li&gt;
&lt;li&gt;Developers who want to put AI coding workflows into a command-line environment.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you only occasionally ask how to write a function, web chat is enough. If you want the model to participate directly in project edits, a terminal Agent becomes more meaningful.&lt;/p&gt;
&lt;h2 id=&#34;risks-to-watch&#34;&gt;Risks to Watch
&lt;/h2&gt;&lt;p&gt;There are three things to watch most closely with this kind of tool.&lt;/p&gt;
&lt;p&gt;The first is permissions. As long as a tool can read and write files or execute commands, you need to know what it can access by default, whether it can delete files, whether it can access the network, and whether dangerous commands require confirmation.&lt;/p&gt;
&lt;p&gt;The second is rollback. Before using it, it is best to keep the Git working tree clean, so every Agent change can be clearly seen through &lt;code&gt;git diff&lt;/code&gt;. Do not let an Agent automatically edit a project while many unrelated changes are already uncommitted.&lt;/p&gt;
&lt;p&gt;The third is verification. Code written by an Agent does not mean the task is complete. Tests, builds, linting, and human review still need to remain. AI coding tools can speed up progress, but they cannot replace final engineering confirmation.&lt;/p&gt;
&lt;h2 id=&#34;conclusion&#34;&gt;Conclusion
&lt;/h2&gt;&lt;p&gt;The significance of DeepSeek-TUI is not that it adds another chat client. It puts DeepSeek V4 into a terminal environment that is closer to real development work.&lt;/p&gt;
&lt;p&gt;For developers, model capability is only the first step. The real experience depends on whether it can read a project, safely edit files, run verification commands, maintain state in long tasks, and let the user take over at any time.&lt;/p&gt;
&lt;p&gt;If you want to use DeepSeek for daily code changes, project reading, and automated development tasks, DeepSeek-TUI is worth watching. The direction is also clear: AI coding tools are moving from &amp;ldquo;answering code questions&amp;rdquo; to &amp;ldquo;participating in project execution.&amp;rdquo;&lt;/p&gt;
</description>
        </item>
        <item>
        <title>How Did AI Agents Evolve? A Complete 2022-2026 Five-Generation Timeline</title>
        <link>https://knightli.com/en/2026/05/16/ai-agent-evolution-2022-2026/</link>
        <pubDate>Sat, 16 May 2026 19:19:52 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/05/16/ai-agent-evolution-2022-2026/</guid>
        <description>&lt;p&gt;AI Agents did not appear overnight.&lt;/p&gt;
&lt;p&gt;At the end of 2022, ChatGPT was still mainly a chat window. By 2026, agents had begun to gain tool calling, file operations, computer control, long-term memory, remote collaboration, and persistent execution. In four years, they moved from &amp;ldquo;models that answer questions&amp;rdquo; toward &amp;ldquo;digital workers that can move tasks forward.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;If we look at the timeline, AI Agents have roughly gone through five generations. Each generation solved the previous one&amp;rsquo;s core limitation, while creating new bubbles and new safety problems.&lt;/p&gt;
&lt;h2 id=&#34;overview-five-generations-of-agents&#34;&gt;Overview: five generations of Agents
&lt;/h2&gt;&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;Stage&lt;/th&gt;
          &lt;th&gt;Time&lt;/th&gt;
          &lt;th&gt;Keyword&lt;/th&gt;
          &lt;th&gt;Capability shift&lt;/th&gt;
          &lt;th&gt;Core problem&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;Generation 0&lt;/td&gt;
          &lt;td&gt;Late 2022 - early 2023&lt;/td&gt;
          &lt;td&gt;Chat box&lt;/td&gt;
          &lt;td&gt;Generates text, but cannot act&lt;/td&gt;
          &lt;td&gt;Model and real world are disconnected&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Generation 1&lt;/td&gt;
          &lt;td&gt;Mid-2023 - late 2023&lt;/td&gt;
          &lt;td&gt;Tool calling&lt;/td&gt;
          &lt;td&gt;Outputs structured calls, connects APIs and RAG&lt;/td&gt;
          &lt;td&gt;Open-loop execution and task drift&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Generation 2&lt;/td&gt;
          &lt;td&gt;Late 2023 - 2024&lt;/td&gt;
          &lt;td&gt;Engineered workflows&lt;/td&gt;
          &lt;td&gt;Planning, state, reflection, and multi-agent collaboration&lt;/td&gt;
          &lt;td&gt;Workflows are easy to copy; low-code bubble&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Generation 3&lt;/td&gt;
          &lt;td&gt;2024 - 2025&lt;/td&gt;
          &lt;td&gt;Computer Use&lt;/td&gt;
          &lt;td&gt;Sees screens, clicks, and operates GUIs&lt;/td&gt;
          &lt;td&gt;Permission, safety, and misoperation risks&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Generation 4&lt;/td&gt;
          &lt;td&gt;2025 - 2026&lt;/td&gt;
          &lt;td&gt;MCP / Skills / persistence&lt;/td&gt;
          &lt;td&gt;Tool networks, long-term context, and professional skills&lt;/td&gt;
          &lt;td&gt;Persistent execution expands the risk radius&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Generation 5 preview&lt;/td&gt;
          &lt;td&gt;After 2026&lt;/td&gt;
          &lt;td&gt;Loops and world models&lt;/td&gt;
          &lt;td&gt;Stronger memory, validation, and physical action&lt;/td&gt;
          &lt;td&gt;Governance becomes harder&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id=&#34;late-2022-generation-0-the-chatgpt-chat-box-era&#34;&gt;Late 2022: Generation 0, the ChatGPT chat-box era
&lt;/h2&gt;&lt;p&gt;Generation 0 begins with the release of ChatGPT on November 30, 2022.&lt;/p&gt;
&lt;p&gt;This generation was not yet a real Agent. It had strong language generation ability, but it was mostly trapped in a chat box. It could write Python code, but not run it on your computer. It could plan a trip, but not book tickets. It could tell you how to edit a file, but not enter the file system and make the change.&lt;/p&gt;
&lt;p&gt;Its capability boundary was clear:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;understand natural language;&lt;/li&gt;
&lt;li&gt;generate articles, answers, code, and plans;&lt;/li&gt;
&lt;li&gt;no active access to fresh data;&lt;/li&gt;
&lt;li&gt;no stable access to internal company knowledge;&lt;/li&gt;
&lt;li&gt;no external action;&lt;/li&gt;
&lt;li&gt;no long-term task state.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The core issue was the break between model capability and the real world. It could think and speak, but not act.&lt;/p&gt;
&lt;p&gt;This stage also produced the first bubble: prompt engineers, prompt template markets, prompt courses, and prompt certifications. Early models were indeed sensitive to prompts, but the market mistook a temporary patch for a long-term moat.&lt;/p&gt;
&lt;p&gt;As GPT-4-level models, system prompts, function calling, and better product defaults matured, many prompt templates lost scarcity. This pattern would repeat: a new capability creates a middle layer; the next generation internalizes it; the middle layer evaporates.&lt;/p&gt;
&lt;h2 id=&#34;mid-2023-generation-1-tool-calling-wakes-up&#34;&gt;Mid-2023: Generation 1, tool calling wakes up
&lt;/h2&gt;&lt;p&gt;The keyword for Generation 1 is tool calling.&lt;/p&gt;
&lt;p&gt;In June 2023, OpenAI released &lt;code&gt;function calling&lt;/code&gt;. Developers could describe function names, purposes, parameter types, and &lt;code&gt;JSON Schema&lt;/code&gt;. After understanding a user request, the model could output a structured JSON call instead of ordinary natural language, and an external system would execute it.&lt;/p&gt;
&lt;p&gt;The architectural significance was large: the model started moving from a brain that only talks to a brain that can drive external tools.&lt;/p&gt;
&lt;p&gt;Key capabilities included:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;choosing tools based on user intent;&lt;/li&gt;
&lt;li&gt;outputting structured arguments;&lt;/li&gt;
&lt;li&gt;calling external APIs;&lt;/li&gt;
&lt;li&gt;feeding API results back into the model;&lt;/li&gt;
&lt;li&gt;using RAG to access external knowledge;&lt;/li&gt;
&lt;li&gt;forming early personas through plugins and knowledge bases.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;At the same time, &lt;code&gt;RAG&lt;/code&gt; and vector databases became popular. They addressed the model&amp;rsquo;s lack of fresh information, private enterprise materials, and internal knowledge. The system retrieved relevant document chunks, injected them into context, and let the model answer from those materials.&lt;/p&gt;
&lt;p&gt;The basic Agent structure became:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;who you are: system prompt and persona;&lt;/li&gt;
&lt;li&gt;what you know: knowledge base, RAG, private documents;&lt;/li&gt;
&lt;li&gt;what you can do: function calling, plugins, external APIs.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The most dramatic bubble of this generation was AutoGPT. It showed an attractive idea: the user gives a broad goal, and AI breaks it down, searches, writes files, evaluates, loops, and stops when it believes the work is done.&lt;/p&gt;
&lt;p&gt;But AutoGPT quickly exposed the problem. It lacked state constraints, stopping conditions, and reliable feedback. Tasks drifted, APIs were called with bad arguments again and again, and bills could be burned by huge numbers of model calls. The lesson was simple: tools plus an infinite loop do not make a production-grade Agent.&lt;/p&gt;
&lt;h2 id=&#34;late-2023-to-2024-generation-2-engineered-workflows&#34;&gt;Late 2023 to 2024: Generation 2, engineered workflows
&lt;/h2&gt;&lt;p&gt;AutoGPT&amp;rsquo;s failure taught the industry that models cannot simply be left to improvise. Complex tasks need structure.&lt;/p&gt;
&lt;p&gt;Generation 2 is about engineered workflows. An Agent became not just one model call, but a software system with state, control flow, and evaluation.&lt;/p&gt;
&lt;p&gt;Key capabilities included:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;task planning: breaking large goals into steps;&lt;/li&gt;
&lt;li&gt;state management: tracking where work stands;&lt;/li&gt;
&lt;li&gt;reflection and revision: generating, reviewing, and improving;&lt;/li&gt;
&lt;li&gt;tool orchestration: switching between tools;&lt;/li&gt;
&lt;li&gt;human-in-the-loop: asking for confirmation at key points;&lt;/li&gt;
&lt;li&gt;multi-agent collaboration: dividing roles.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;A typical pattern is &lt;code&gt;ReAct&lt;/code&gt;, or &lt;code&gt;Reasoning + Acting&lt;/code&gt;. The model reasons, calls a tool, observes the result, and then reasons again. The Agent no longer acts blindly; each step has auditable logic and feedback.&lt;/p&gt;
&lt;p&gt;Common &lt;code&gt;agentic workflow&lt;/code&gt; patterns emerged:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;reflection: generate, review, revise;&lt;/li&gt;
&lt;li&gt;tool use: choose search, databases, code execution, and enterprise APIs;&lt;/li&gt;
&lt;li&gt;planning: decompose goals and track state;&lt;/li&gt;
&lt;li&gt;multi-agent collaboration: product, developer, tester, reviewer roles.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The value of Generation 2 was putting model capability inside a controllable process. A well-designed workflow can sometimes make a smaller model produce more stable results than a single large-model call.&lt;/p&gt;
&lt;p&gt;This generation also produced the low-code Agent platform bubble. Many tools used drag-and-drop interfaces to combine prompts, RAG, plugins, and flows. They lowered the building barrier, but if a workflow can be copied cheaply, the platform itself has a weak moat.&lt;/p&gt;
&lt;p&gt;Low-code tools can capture early demand, but a demand window is not a defensible wall.&lt;/p&gt;
&lt;h2 id=&#34;2024-to-2025-generation-3-computer-use-reaches-real-interfaces&#34;&gt;2024 to 2025: Generation 3, Computer Use reaches real interfaces
&lt;/h2&gt;&lt;p&gt;The keyword for Generation 3 is &lt;code&gt;Computer Use&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Earlier tool calling relied mostly on APIs. What an Agent could do depended on what developers had connected. But many real-world apps do not have clean APIs, or their APIs are incomplete, closed, or inconsistent.&lt;/p&gt;
&lt;p&gt;Computer Use lets models look at screens, click, and operate GUIs. The general computer interface itself becomes a tool.&lt;/p&gt;
&lt;p&gt;Key capabilities included:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;recognizing screen content;&lt;/li&gt;
&lt;li&gt;clicking buttons, typing text, switching windows;&lt;/li&gt;
&lt;li&gt;operating web and desktop software;&lt;/li&gt;
&lt;li&gt;reading repositories, editing files, running tests;&lt;/li&gt;
&lt;li&gt;inspecting terminal output and errors;&lt;/li&gt;
&lt;li&gt;behaving more like a real engineering assistant.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This pushed Agents from &amp;ldquo;using connected tools&amp;rdquo; toward &amp;ldquo;operating software like a person.&amp;rdquo; It also made coding agents closer to real workflows: read a project, change code, run tests, and continue from errors.&lt;/p&gt;
&lt;p&gt;But the trust boundary expanded. If AI operates a computer, it can click the wrong button, delete the wrong file, submit the wrong form, or be manipulated by webpage text, documents, and UI instructions. Prompt injection becomes a file-operation, permission, and system-safety problem.&lt;/p&gt;
&lt;p&gt;Vibe coding debates also concentrated in this stage. Fast AI-generated projects feel exciting, but without tests, evaluation, permissions, and deployment boundaries, fast prototypes can become fast incidents.&lt;/p&gt;
&lt;p&gt;Generation 3&amp;rsquo;s lesson: the closer an Agent gets to real operations, the more it needs sandboxing, approvals, rollback, and least privilege.&lt;/p&gt;
&lt;h2 id=&#34;2025-to-2026-generation-4-mcp-skills-and-persistent-digital-workers&#34;&gt;2025 to 2026: Generation 4, MCP, Skills, and persistent digital workers
&lt;/h2&gt;&lt;p&gt;Generation 4 is about persistence, connection, memory, and specialization.&lt;/p&gt;
&lt;p&gt;The focus is not only stronger single tasks. Agents start to have long-term context, tool networks, professional skills, and a sense of time. They become less like helpers in one chat and more like digital workers that can continue working.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;MCP&lt;/code&gt; addresses tool connection. It lets Agents connect to file systems, databases, browsers, design tools, project management tools, and enterprise systems in a more standardized way. Once the protocol stabilizes, many &amp;ldquo;tool-connection middle layer&amp;rdquo; products get compressed.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;Skills&lt;/code&gt; address professional method. Tools tell an Agent what it can do; skills tell it how to do the work. A good skill is not just a prompt. It packages domain workflows, constraints, checks, common pitfalls, and tool-call order.&lt;/p&gt;
&lt;p&gt;Key capabilities included:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;long-term memory: storing preferences, project rules, and history;&lt;/li&gt;
&lt;li&gt;project context: understanding repositories, docs, and work rules;&lt;/li&gt;
&lt;li&gt;tool networks: connecting through MCP, APIs, browsers, and file systems;&lt;/li&gt;
&lt;li&gt;professional skills: packaging task methods through Skills;&lt;/li&gt;
&lt;li&gt;persistent execution: waiting, waking, reminding, and following up;&lt;/li&gt;
&lt;li&gt;remote collaboration: users can return from different devices to approve and steer.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This generation starts to feel like an employee:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;identity and responsibility boundaries;&lt;/li&gt;
&lt;li&gt;long-term context;&lt;/li&gt;
&lt;li&gt;professional work methods;&lt;/li&gt;
&lt;li&gt;time awareness;&lt;/li&gt;
&lt;li&gt;tool permissions;&lt;/li&gt;
&lt;li&gt;ability to continue work without being watched.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;But the more it resembles an employee, the more its risk radius resembles an employee&amp;rsquo;s. Persistent execution, local data access, secrets, tool calls, and task handling move security from the edge to the center.&lt;/p&gt;
&lt;p&gt;One point matters especially: text is also an attack surface. If an Agent reads and follows Markdown, documentation, skill packs, or webpages, malicious text can change its behavior. Prompt injection becomes a supply-chain, permission, and execution-safety problem.&lt;/p&gt;
&lt;p&gt;Generation 4&amp;rsquo;s lesson: persistent Agents need governance, not just capability.&lt;/p&gt;
&lt;h2 id=&#34;after-2026-generation-5-preview-loops-internal-memory-and-world-models&#34;&gt;After 2026: Generation 5 preview, loops, internal memory, and world models
&lt;/h2&gt;&lt;p&gt;Generation 5 is not established history yet. It is an extrapolation from the previous four years.&lt;/p&gt;
&lt;p&gt;The first direction is more complete closed loops.&lt;/p&gt;
&lt;p&gt;A mature Agent needs at least three loops:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;execution loop: verify after each action, rollback, revise, and retry if needed;&lt;/li&gt;
&lt;li&gt;time loop: track long-term goals across multiple wake cycles;&lt;/li&gt;
&lt;li&gt;cognitive loop: know what is certain, what is guessed, and what is outdated.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The second direction is internal memory.&lt;/p&gt;
&lt;p&gt;Most memory so far is outside the model: RAG, vector stores, chat logs, local files, and &lt;code&gt;memory.md&lt;/code&gt;. If future model architectures support persistent state across sessions, Agent memory systems may be rebuilt.&lt;/p&gt;
&lt;p&gt;The third direction is world models.&lt;/p&gt;
&lt;p&gt;Many Agents today are still reactive: observe, respond, observe again. High-risk tasks require the model to simulate consequences. Before changing a database script, it should think about data loss, rollback failure, and compatibility issues, not learn only after an accident.&lt;/p&gt;
&lt;p&gt;The fourth direction is embodiment.&lt;/p&gt;
&lt;p&gt;Earlier generations mainly happened in digital space: APIs, screens, files, browsers, and enterprise tools. The next step may extend Agent action into the physical world, including robots, device control, industrial systems, and standardized physical interfaces.&lt;/p&gt;
&lt;p&gt;Generation 5 will need to solve not only how Agents execute tasks, but how they understand consequences, manage long-term state, and stay reliable inside a larger risk radius.&lt;/p&gt;
&lt;h2 id=&#34;six-patterns-behind-the-timeline&#34;&gt;Six patterns behind the timeline
&lt;/h2&gt;&lt;p&gt;First, base-model capability remains the ceiling. An Agent is not magic outside the model; it is a way to release model capability through engineering systems.&lt;/p&gt;
&lt;p&gt;Second, engineered architecture amplifies model capability. Planning, verification, reflection, revision, evaluation, and permission control are closer to deliverable work than one-shot generation.&lt;/p&gt;
&lt;p&gt;Third, open protocols reshape value distribution. Once MCP, Skills, and project-context standards stabilize, competition shifts from &amp;ldquo;who connected the tool first&amp;rdquo; to &amp;ldquo;who accumulated real domain capability.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Fourth, the hidden main line of Agent evolution is expanding human-machine trust. From trusting text, to API calls, to workflows, to computer operations, to persistent execution, each generation pushes the risk radius outward.&lt;/p&gt;
&lt;p&gt;Fifth, every generation&amp;rsquo;s accidents become the next generation&amp;rsquo;s rules. AutoGPT&amp;rsquo;s loops pushed structured orchestration; vibe coding failures pushed evaluation-driven development; production deletions pushed least privilege and sandboxing; skill poisoning pushed supply-chain safety.&lt;/p&gt;
&lt;p&gt;Sixth, the Agent ecosystem repeatedly booms and collapses. New capabilities create temporary middle layers, and model or platform internalization later removes them. Mistaking a time window for a moat is dangerous.&lt;/p&gt;
&lt;h2 id=&#34;the-real-moat&#34;&gt;The real moat
&lt;/h2&gt;&lt;p&gt;The real moat in AI Agents is not packaging a new capability first.&lt;/p&gt;
&lt;p&gt;More reliable moats include three things.&lt;/p&gt;
&lt;p&gt;First, vertical depth. Do you truly understand an industry&amp;rsquo;s workflow, risks, exceptions, and responsibility boundaries? General models can learn concepts, but they may not replace hard-earned domain execution experience.&lt;/p&gt;
&lt;p&gt;Second, a data flywheel. Can you collect high-quality feedback from real usage and improve workflows, evaluation, fine-tuning, and product decisions?&lt;/p&gt;
&lt;p&gt;Third, user trust. Will users hand you higher-value, longer-running, riskier work, or only treat you as a one-off tool?&lt;/p&gt;
&lt;p&gt;If a platform or base model absorbs a capability, the products that still retain process, feedback, responsibility boundaries, and trust are more likely to survive. Many others are temporary bubbles.&lt;/p&gt;
&lt;h2 id=&#34;final-note&#34;&gt;Final note
&lt;/h2&gt;&lt;p&gt;From 2022 to 2026, AI Agent evolution was not &amp;ldquo;models getting better at chatting.&amp;rdquo; It was &amp;ldquo;humans becoming willing to hand more work to AI.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;A mature Agent is not the system most eager to execute automatically. It is the system that knows when to execute, when to verify, when to pause, and when to ask a human.&lt;/p&gt;
&lt;p&gt;To judge whether an Agent product has long-term value, ask one question: when the next model or platform builds this capability in, what remains?&lt;/p&gt;
&lt;p&gt;If the answer is domain workflow, real data, verifiable results, and user trust, there may be long-term value.&lt;/p&gt;
</description>
        </item>
        <item>
        <title>Gemini 3.5 Pro Leaks: Google Wants Spark Agent to Win Back the AI Coding Entry Point</title>
        <link>https://knightli.com/en/2026/05/15/gemini-35-pro-spark-agent-ai-coding-race/</link>
        <pubDate>Fri, 15 May 2026 23:45:34 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/05/15/gemini-35-pro-spark-agent-ai-coding-race/</guid>
        <description>&lt;p&gt;Gemini 3.5 Pro has not been officially released yet, but leaks around it are already heating up.&lt;/p&gt;
&lt;p&gt;The current round of information revolves around several keywords: Gemini 3.5 Pro, the codename Cappuccino, Gemini Spark, AI coding, and MCP tool integration. Together, they point in one direction: Google is not just preparing another chat model update. It wants to reconnect models, tools, Agents, and Google ecosystem entry points.&lt;/p&gt;
&lt;p&gt;Before an official release, all of this should still be treated as leaked information. The more important signal is not one screenshot or one benchmark claim, but the gaps Google may be trying to close next.&lt;/p&gt;
&lt;h2 id=&#34;why-gemini-35-pro-matters&#34;&gt;Why Gemini 3.5 Pro Matters
&lt;/h2&gt;&lt;p&gt;Based on the exposed information, Gemini 3.5 Pro may be a jump in naming.&lt;/p&gt;
&lt;p&gt;People were still discussing Gemini 3.2 earlier, and then Gemini 3.5 Pro appeared in leaks. If the naming is real, Google likely wants to tell a bigger version story in the next release rather than ship a routine minor update.&lt;/p&gt;
&lt;p&gt;The leaked highlights mainly fall into three areas:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;continued improvements in coding and reasoning;&lt;/li&gt;
&lt;li&gt;stronger SVG, interactive page, animation, and 3D generation;&lt;/li&gt;
&lt;li&gt;a new Agent product, Gemini Spark, potentially moving to the front stage.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;None of these directions is surprising. Gemini has long emphasized multimodality, and Google has very strong distribution channels. The real question is whether it can catch up with OpenAI and Anthropic in developer tools and Agent workflows.&lt;/p&gt;
&lt;h2 id=&#34;coding-is-the-lesson-google-most-needs-to-catch-up-on&#34;&gt;Coding Is The Lesson Google Most Needs To Catch Up On
&lt;/h2&gt;&lt;p&gt;In 2026, coding is no longer just a model benchmark item. It has become one of the most direct product entry points.&lt;/p&gt;
&lt;p&gt;The reason is simple: AI coding tools are used frequently and generate a large amount of feedback data. Developers ask models to read code, modify code, run tests, and fix bugs every day. These interactions naturally push the next generation of models and tooling forward.&lt;/p&gt;
&lt;p&gt;Over the past year, Claude Code has gained strong mindshare among developers, while OpenAI has kept strengthening the connection between Codex and ChatGPT. Google has products such as Antigravity, but its external presence has not been as strong.&lt;/p&gt;
&lt;p&gt;That is why Gemini 3.5 Pro is being watched closely. If it only becomes better at chatting or answering faster, the impact is limited. If it truly improves code understanding, cross-file editing, tool calling, and long-running task execution, it may change developer workflows.&lt;/p&gt;
&lt;h2 id=&#34;gemini-spark-may-be-the-bigger-variable&#34;&gt;Gemini Spark May Be The Bigger Variable
&lt;/h2&gt;&lt;p&gt;More aggressive than the model itself is the rumored Gemini Spark.&lt;/p&gt;
&lt;p&gt;According to the leaks, Spark is not positioned as a normal chat assistant, but as an always-on AI Agent. It may connect to email, calendars, web pages, tasks, account state, and personal context to help users handle multi-step workflows.&lt;/p&gt;
&lt;p&gt;This kind of product has a large imagination space. For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;automatically organizing an inbox;&lt;/li&gt;
&lt;li&gt;following up on tasks for the user;&lt;/li&gt;
&lt;li&gt;performing actions on web pages;&lt;/li&gt;
&lt;li&gt;handling cross-application workflows;&lt;/li&gt;
&lt;li&gt;arranging daily matters based on personal preferences.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;But the risks are just as obvious. If an always-on Agent can access login state, browser data, files, location, and third-party services, it must answer several questions: when must the user confirm an action? Which operations must be blocked from automation? Will data be shared with third parties? How are remote browsers and credentials isolated?&lt;/p&gt;
&lt;p&gt;So the real question for Spark is not just whether it can get work done. It is whether Google can make permissions, auditing, confirmation flows, and user control clear enough.&lt;/p&gt;
&lt;h2 id=&#34;what-mcp-tool-integration-suggests&#34;&gt;What MCP Tool Integration Suggests
&lt;/h2&gt;&lt;p&gt;The leaks also mention that the new Gemini selector may include MCP-related models or testing entries.&lt;/p&gt;
&lt;p&gt;If this ships, it suggests Google is also pushing models from a question-answering system toward a tool operating system. The model will no longer only generate text. It will need to call external tools, access business systems, read and write files, run commands, and maintain task state across multiple steps.&lt;/p&gt;
&lt;p&gt;This direction is consistent with OpenAI and Anthropic. Whoever makes tool calling more reliable will have an easier time embedding AI into real workflows.&lt;/p&gt;
&lt;p&gt;But MCP integration itself is not the finish line. The hard part is stability:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;can the model choose the right tool;&lt;/li&gt;
&lt;li&gt;are the parameters reliable;&lt;/li&gt;
&lt;li&gt;can it recover after failure;&lt;/li&gt;
&lt;li&gt;are permission boundaries clear;&lt;/li&gt;
&lt;li&gt;can users trace every step.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If these questions are not solved, more tools also mean a larger surface for mistakes.&lt;/p&gt;
&lt;h2 id=&#34;multimodality-is-still-googles-strong-card&#34;&gt;Multimodality Is Still Google&amp;rsquo;s Strong Card
&lt;/h2&gt;&lt;p&gt;The place where Google has the best chance to differentiate is still multimodality.&lt;/p&gt;
&lt;p&gt;Based on exposed SVG, interactive page, animation, and visual generation examples, Gemini may continue to strengthen its ability to generate interactive content from prompts. Compared with simply writing a piece of code, this is closer to product prototyping: the user describes an idea, and the model directly produces an operable, adjustable, previewable interface.&lt;/p&gt;
&lt;p&gt;This path fits Google well. It can build on Gemini&amp;rsquo;s multimodal strengths and also connect with Android, Chrome, Workspace, Search, Ads, and Cloud.&lt;/p&gt;
&lt;p&gt;If Google wants to avoid competing only on &amp;ldquo;whose coding model is stronger&amp;rdquo;, it may put more emphasis on a more complete multimodal Agent system.&lt;/p&gt;
&lt;h2 id=&#34;the-three-companies-are-splitting-into-different-playbooks&#34;&gt;The Three Companies Are Splitting Into Different Playbooks
&lt;/h2&gt;&lt;p&gt;The current model race is no longer just a leaderboard race.&lt;/p&gt;
&lt;p&gt;OpenAI&amp;rsquo;s advantage lies in product iteration and distribution speed. Codex, ChatGPT, enterprise tools, and APIs are becoming more tightly connected.&lt;/p&gt;
&lt;p&gt;Anthropic&amp;rsquo;s advantage lies in developer mindshare and code model quality. Claude Code has already become the default AI coding entry point for many people.&lt;/p&gt;
&lt;p&gt;Google&amp;rsquo;s advantage is ecosystem access. Gmail, Docs, Chrome, Android, Search, YouTube, Maps, and Cloud services form a huge personal and enterprise data network. If Agents can safely connect to these entry points, Google may move from a &amp;ldquo;model chaser&amp;rdquo; to a &amp;ldquo;workflow entry point controller&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;That is why Gemini Spark is worth watching. It does not necessarily need to rank first on every benchmark. If it enters daily workflows, it may still build its own moat.&lt;/p&gt;
&lt;h2 id=&#34;how-regular-users-should-read-this&#34;&gt;How Regular Users Should Read This
&lt;/h2&gt;&lt;p&gt;For regular users, there is no need to be pulled around by every leak in the short term.&lt;/p&gt;
&lt;p&gt;The more practical things to watch are:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Whether Gemini 3.5 Pro&amp;rsquo;s coding ability truly improves, especially in complex repositories, long context, and tool calling.&lt;/li&gt;
&lt;li&gt;Whether Gemini Spark is safe by default, with clear confirmation and traceable records before sensitive operations.&lt;/li&gt;
&lt;li&gt;Whether Google gives clear pricing, quotas, and enterprise permission management, rather than only showing demos.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Pretty screenshots alone do not mean much. Whether it can reliably enter real workflows is the dividing line for this round of AI Agent products.&lt;/p&gt;
&lt;h2 id=&#34;what-it-means-for-developers&#34;&gt;What It Means For Developers
&lt;/h2&gt;&lt;p&gt;Developers should care less about &amp;ldquo;which model won&amp;rdquo; and more about whether their workflow is portable.&lt;/p&gt;
&lt;p&gt;Claude Code, Codex, Gemini, Antigravity, Cursor, Windsurf, and many other tools are all competing for the entry point. If every process is locked into one platform, future changes in cost, quota, model policy, or permission rules will make migration painful.&lt;/p&gt;
&lt;p&gt;A safer approach is:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;keep standard Git workflows for important projects;&lt;/li&gt;
&lt;li&gt;always inspect diffs after automated edits;&lt;/li&gt;
&lt;li&gt;use tests and CI as backstops for key tasks;&lt;/li&gt;
&lt;li&gt;do not hand production credentials to opaque Agents;&lt;/li&gt;
&lt;li&gt;when open protocols can connect tools, prefer replaceable options.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Models will keep getting stronger, but engineering discipline will not become obsolete.&lt;/p&gt;
&lt;h2 id=&#34;summary&#34;&gt;Summary
&lt;/h2&gt;&lt;p&gt;The Gemini 3.5 Pro leaks suggest that Google is accelerating its effort to catch up in AI coding and Agent entry points. Model improvements are only one part of the story; always-on Agents such as Gemini Spark may be the larger strategic move.&lt;/p&gt;
&lt;p&gt;But the more a system can &amp;ldquo;do things automatically&amp;rdquo; for users, the more it needs strict permission boundaries and verifiable workflows. For Google, the real challenge is not only catching up with GPT-5.5 or Claude. It is combining strong models, safety mechanisms, and ecosystem entry points into a trustworthy daily workflow.&lt;/p&gt;
&lt;p&gt;If Google pulls that off, Gemini may not need to top every leaderboard to regain some initiative in AI entry points.&lt;/p&gt;
</description>
        </item>
        <item>
        <title>OpenHuman Quick Read: The Desktop Route for an Open-Source Personal AI Agent</title>
        <link>https://knightli.com/en/2026/05/15/openhuman-open-source-personal-ai-agent/</link>
        <pubDate>Fri, 15 May 2026 14:52:31 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/05/15/openhuman-open-source-personal-ai-agent/</guid>
        <description>&lt;p&gt;OpenHuman is an open-source personal AI Agent project from tinyhumansai. Its goal is not to build yet another chat window, but to place a desktop app, personal memory, third-party integrations, voice, coding tools, and a local knowledge base into the same agent harness, so AI can understand your daily work context faster.&lt;/p&gt;
&lt;p&gt;The project README positions it as &amp;ldquo;Personal AI super intelligence,&amp;rdquo; and the official site emphasizes private, simple, and extremely powerful. That claim is ambitious, but it is more useful to break it down: the part of OpenHuman that deserves attention is its attempt to make &amp;ldquo;personal context&amp;rdquo; the product core, instead of leaving model calls, plugin configuration, and document retrieval for users to assemble themselves.&lt;/p&gt;
&lt;p&gt;At the time this article was checked, the GitHub repository had about 7.8k stars and 629 forks. The latest release was &lt;code&gt;OpenHuman v0.53.43&lt;/code&gt;, dated May 13, 2026. The project is still in Early Beta, and the README clearly warns that it is under active development, so rough edges should be expected.&lt;/p&gt;
&lt;h2 id=&#34;what-problem-is-it-trying-to-solve&#34;&gt;What Problem Is It Trying to Solve?
&lt;/h2&gt;&lt;p&gt;The problem with many AI assistants is not that the model is too weak, but that the context is too cold. Every time, you have to explain the project background, recent emails, calendar, code repositories, documents, tasks, and preferences again. Once you move across Gmail, Notion, GitHub, Slack, Calendar, Drive, Linear, Jira, and similar systems, the information is scattered across different tools.&lt;/p&gt;
&lt;p&gt;OpenHuman&amp;rsquo;s approach is to connect those data sources first, then use automatic fetching, compression, summarization, and a local knowledge base to build a personal memory layer that can keep updating. The agent then remembers more than the current conversation; it can form long-term context around your workflow.&lt;/p&gt;
&lt;p&gt;This is also the biggest difference between it and a normal chatbot. Chatbots often work around prompts; OpenHuman is closer to a desktop personal operating-system entry point, trying to prepackage connectors, memory, tools, and model routing.&lt;/p&gt;
&lt;h2 id=&#34;main-capabilities&#34;&gt;Main Capabilities
&lt;/h2&gt;&lt;p&gt;Core capabilities listed in the OpenHuman README include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A desktop-first UI and a short onboarding path, without requiring users to start from terminal configuration.&lt;/li&gt;
&lt;li&gt;A desktop mascot with a &amp;ldquo;face&amp;rdquo; that can speak, respond to the environment, and participate in Google Meet.&lt;/li&gt;
&lt;li&gt;118+ third-party integrations covering Gmail, Notion, GitHub, Slack, Stripe, Calendar, Drive, Linear, Jira, and other tools.&lt;/li&gt;
&lt;li&gt;An automatic fetching mechanism: the project description mentions traversing active connections every 20 minutes and pulling new data into the memory tree.&lt;/li&gt;
&lt;li&gt;Memory Tree: compresses connected data and activity information into Markdown blocks and stores them in local SQLite.&lt;/li&gt;
&lt;li&gt;Obsidian-compatible vault: writes knowledge blocks as &lt;code&gt;.md&lt;/code&gt; files so users can open, browse, and edit them with Obsidian.&lt;/li&gt;
&lt;li&gt;Built-in search, web scraping, coding tools, file system access, git, lint, test, grep, voice input and output, and other capabilities.&lt;/li&gt;
&lt;li&gt;Model routing: routes requests to different model types according to the task.&lt;/li&gt;
&lt;li&gt;TokenJuice: compresses token usage before tool results, web pages, email bodies, and search results enter the LLM.&lt;/li&gt;
&lt;li&gt;Optional Ollama support for local AI workloads.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These capabilities sound broad, but the real focus can be reduced to two points: reducing configuration and plugin assembly, and turning your personal data into memory that an agent can search, compress, and continuously update.&lt;/p&gt;
&lt;h2 id=&#34;installation&#34;&gt;Installation
&lt;/h2&gt;&lt;p&gt;The project provides a website download entry point and terminal installation commands.&lt;/p&gt;
&lt;p&gt;macOS or Linux x64:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;curl -fsSL https://raw.githubusercontent.com/tinyhumansai/openhuman/main/scripts/install.sh &lt;span class=&#34;p&#34;&gt;|&lt;/span&gt; bash
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Windows:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-powershell&#34; data-lang=&#34;powershell&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nb&#34;&gt;irm &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;https&lt;/span&gt;&lt;span class=&#34;err&#34;&gt;:&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;//&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;raw&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;py&#34;&gt;githubusercontent&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;com&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;/&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;tinyhumansai&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;/&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;openhuman&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;/&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;main&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;/&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;scripts&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;/&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;install&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;py&#34;&gt;ps1&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;|&lt;/span&gt; &lt;span class=&#34;nb&#34;&gt;iex
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;If this is your daily primary machine, it is better to download the installer from the official site first, or at least open and inspect the install script before deciding whether to execute a remote script directly. OpenHuman touches email, documents, code repositories, calendars, and local file permissions, so installation and authorization deserve more caution than a small ordinary utility.&lt;/p&gt;
&lt;h2 id=&#34;open-source-and-technical-stack&#34;&gt;Open Source and Technical Stack
&lt;/h2&gt;&lt;p&gt;The OpenHuman repository uses the GPL-3.0 license. The language breakdown shows Rust as the main language, followed by TypeScript, with JavaScript, Shell, CSS, and PowerShell also present. The README&amp;rsquo;s contribution notes require Node.js 24+, pnpm 10.10.0, Rust 1.93.0, CMake, and platform-specific desktop build dependencies.&lt;/p&gt;
&lt;p&gt;The rough local development path is:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;4
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;git submodule update --init --recursive
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;pnpm install
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;pnpm dev
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;pnpm --filter openhuman-app dev:app
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Before submitting changes, focused checks are recommended, for example:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;pnpm typecheck
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;pnpm format:check
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;cargo check -p openhuman --lib
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Judging from the repository structure, this is not a lightweight script project. It is a full product-style repository containing a desktop app, frontend, Rust backend, docs, tests, examples, and build scripts.&lt;/p&gt;
&lt;h2 id=&#34;why-memory-tree-and-the-obsidian-vault-matter&#34;&gt;Why Memory Tree and the Obsidian Vault Matter
&lt;/h2&gt;&lt;p&gt;The concept most worth examining in OpenHuman is Memory Tree. The README says it standardizes connected data into Markdown chunks of up to about 3k tokens, scores them, folds them into a hierarchical summary tree, and stores them in local SQLite. The same content also enters an Obsidian-compatible vault.&lt;/p&gt;
&lt;p&gt;This route has several advantages:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Users can directly see the agent&amp;rsquo;s knowledge base instead of only trusting black-box memory.&lt;/li&gt;
&lt;li&gt;Markdown files are convenient for search, backup, version control, and manual revision.&lt;/li&gt;
&lt;li&gt;SQLite is suitable for local indexing and fast queries.&lt;/li&gt;
&lt;li&gt;Hierarchical summaries are better suited to long-term context compression than a flat pile of documents.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;But it also has practical challenges: whether data sync is stable, whether summaries drop key details, whether permission boundaries are clear enough, whether deletion and undo are complete, and whether different connectors&amp;rsquo; semantics can be handled consistently. These are not solved by one README phrase like &amp;ldquo;remembers everything&amp;rdquo;; they require long-term use and auditing.&lt;/p&gt;
&lt;h2 id=&#34;tokenjuice-a-middle-layer-for-cost-and-latency&#34;&gt;TokenJuice: A Middle Layer for Cost and Latency
&lt;/h2&gt;&lt;p&gt;OpenHuman also emphasizes TokenJuice. Its role is to compress web pages, emails, search results, and tool-call results before they enter the model. Examples include converting HTML to Markdown, shortening long URLs, and removing some unnecessary characters. The README claims this can reduce cost and latency, with up to 80% lower token usage.&lt;/p&gt;
&lt;p&gt;The direction is reasonable. In agent systems, the truly expensive part is often not one chat turn, but background fetching, tool calls, search, web parsing, and long-context injection. Cleaning data before handing it to the model is usually steadier than directly stuffing raw content into context.&lt;/p&gt;
&lt;p&gt;However, a compression layer also creates new questions: it decides which information is kept and which is discarded. If you use it for contracts, bills, medical records, compliance material, or production incident logs, you cannot look only at token savings. You also need traceability, original-text review, and compression-error control.&lt;/p&gt;
&lt;h2 id=&#34;privacy-a-selling-point-and-an-audit-focus&#34;&gt;Privacy: A Selling Point and an Audit Focus
&lt;/h2&gt;&lt;p&gt;One of OpenHuman&amp;rsquo;s selling points is privacy. The official site mentions that local AI models can handle low-level tasks, and the README emphasizes that workflow data stays on device, is encrypted locally, and is treated as yours.&lt;/p&gt;
&lt;p&gt;This design direction is attractive because once a personal AI Agent connects to Gmail, Drive, Calendar, Slack, and GitHub, it touches the most sensitive work data. Compared with a fully cloud-based assistant, a local-first memory layer and a visible Markdown vault at least give users more sense of control.&lt;/p&gt;
&lt;p&gt;But the full picture matters: OpenHuman also mentions one subscription, 30+ providers, model routing, ElevenLabs TTS, OAuth integrations, and other capabilities. That means it is not a purely offline tool. To evaluate privacy seriously, you need to check what each connector, each kind of model call, and each voice or search capability sends, and where it sends it.&lt;/p&gt;
&lt;h2 id=&#34;who-should-pay-attention&#34;&gt;Who Should Pay Attention?
&lt;/h2&gt;&lt;p&gt;OpenHuman is currently more suitable for three groups:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Users who want a personal AI control desk rather than a single-purpose chatbot.&lt;/li&gt;
&lt;li&gt;Developers willing to try an Early Beta and accept changing features and rough edges.&lt;/li&gt;
&lt;li&gt;People interested in local memory, Obsidian workflows, agent connectors, and context compression.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;If you only want a stable, lightweight offline assistant with very simple privacy boundaries, it may be too heavy right now. If you want to study how the next generation of personal AI Agents might integrate desktop apps, connectors, memory, and tools, OpenHuman is an open-source sample worth tracking.&lt;/p&gt;
&lt;p&gt;My suggestion is to first treat it as a &amp;ldquo;product-style open-source experiment&amp;rdquo;: watch release cadence, issue quality, connector permissions, data export capability, deletion mechanisms, and readability of the local vault. The key question for personal AI is not only whether it can answer questions, but whether it can carry your context for the long term in a transparent and controllable way.&lt;/p&gt;
&lt;h2 id=&#34;references&#34;&gt;References
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/tinyhumansai/openhuman&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;tinyhumansai/openhuman&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://tinyhumans.ai/openhuman&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;OpenHuman official site&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://tinyhumans.gitbook.io/openhuman-docs&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;OpenHuman Docs&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
        </item>
        <item>
        <title>What is Token Efficiency? DeepSeek V4, big-model planning, and small-model execution</title>
        <link>https://knightli.com/en/2026/05/15/token-efficiency-agent-orchestration/</link>
        <pubDate>Fri, 15 May 2026 08:59:33 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/05/15/token-efficiency-agent-orchestration/</guid>
        <description>&lt;p&gt;The next important metric for AI coding may not be who has the strongest model, but who can complete more verifiable work with fewer tokens, lower cost, and a more stable process.&lt;/p&gt;
&lt;p&gt;That is the value of Token Efficiency.&lt;/p&gt;
&lt;p&gt;Many people hear Token Efficiency and think only about cheaper models, longer context, or cheaper cache hits. Those are base conditions. Real productivity comes from model division of labor, task orchestration, context budgeting, and evaluation.&lt;/p&gt;
&lt;p&gt;In other words, Token Efficiency is not a cost-saving trick. It is an engineering method for turning tokens into output.&lt;/p&gt;
&lt;h2 id=&#34;deepseek-v4-productizing-the-split-between-planner-and-executor&#34;&gt;DeepSeek V4: productizing the split between planner and executor
&lt;/h2&gt;&lt;p&gt;The missing background in this topic is the positioning of DeepSeek V4.&lt;/p&gt;
&lt;p&gt;DeepSeek V4 is not just another stronger model. It splits the two capabilities needed for Token Efficiency into &lt;code&gt;V4 Pro&lt;/code&gt; and &lt;code&gt;V4 Flash&lt;/code&gt;: &lt;code&gt;V4 Pro&lt;/code&gt; is better suited for planning, reasoning, architecture judgment, and critical review, while &lt;code&gt;V4 Flash&lt;/code&gt; fits high-frequency execution, batch rewriting, code completion, data organization, and ordinary agent-loop nodes.&lt;/p&gt;
&lt;p&gt;That maps directly to two roles in AI coding:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;V4 Pro&lt;/code&gt;: planner / consultant for requirement breakdown, technical design, complex bug analysis, architecture review, and final acceptance.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;V4 Flash&lt;/code&gt;: executor for file scanning, simple implementation, test completion, documentation, candidate generation, and repetitive work.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;DeepSeek&amp;rsquo;s API documentation shows that both &lt;code&gt;V4 Flash&lt;/code&gt; and &lt;code&gt;V4 Pro&lt;/code&gt; support &lt;code&gt;1M&lt;/code&gt; context, JSON Output, Tool Calls, Chat Prefix Completion, and FIM Completion. The pricing page also prices cache-hit input separately and notes that input cache-hit prices have been reduced to one tenth of the launch price.&lt;/p&gt;
&lt;p&gt;Together, these are why it matters for Token Efficiency: &lt;code&gt;1M&lt;/code&gt; context reduces compression in complex agent tasks; low cache-hit pricing lowers the cost of repeatedly loading prompts, project docs, code, and history; the &lt;code&gt;Flash / Pro&lt;/code&gt; split solves the problem of using a flagship model for every step or an unstable small model for every step.&lt;/p&gt;
&lt;p&gt;DeepSeek V4 should therefore be understood in three ways:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Cheap execution layer&lt;/strong&gt;: many agent nodes can run on &lt;code&gt;V4 Flash&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Usable judgment layer&lt;/strong&gt;: key steps can still call &lt;code&gt;V4 Pro&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Long-chain friendly&lt;/strong&gt;: &lt;code&gt;1M&lt;/code&gt; context and cache pricing make codebases, docs, and tool history easier to keep in the usable window.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Its significance for AI coding is not just another model option. It offers a realistic cost structure for the &amp;ldquo;consultant model + executor model + harness orchestration&amp;rdquo; pattern.&lt;/p&gt;
&lt;h2 id=&#34;do-not-let-the-strongest-model-do-everything&#34;&gt;Do not let the strongest model do everything
&lt;/h2&gt;&lt;p&gt;The old approach was to pick the smartest model and let it handle requirement analysis, code, tests, and summaries end to end.&lt;/p&gt;
&lt;p&gt;That is simple but not always efficient. Many tasks do not need frontier reasoning. Expensive models should behave more like consultants, architects, or planners that appear only at key decision points.&lt;/p&gt;
&lt;p&gt;A better structure is:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Big models break down problems and make key decisions.&lt;/li&gt;
&lt;li&gt;Small models execute, batch-process, and repeat edits.&lt;/li&gt;
&lt;li&gt;Tools and harnesses manage process, state, context, and validation.&lt;/li&gt;
&lt;li&gt;Humans define product goals, accept results, and make tradeoffs.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This prevents frontier reasoning from being wasted on mechanical execution.&lt;/p&gt;
&lt;h2 id=&#34;context-is-not-always-better-when-larger&#34;&gt;Context is not always better when larger
&lt;/h2&gt;&lt;p&gt;Long context matters for coding agents because code, docs, chat history, test output, and logs all consume the window. When the window fills up, compression, forgetting, and misjudgment appear.&lt;/p&gt;
&lt;p&gt;But long context does not mean dumping everything into the model.&lt;/p&gt;
&lt;p&gt;Token Efficiency means each task should fit inside a clear, controlled context window:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Bring only necessary files.&lt;/li&gt;
&lt;li&gt;Include only decision-relevant documents.&lt;/li&gt;
&lt;li&gt;Keep only the current state from history.&lt;/li&gt;
&lt;li&gt;Give each node clear input and output.&lt;/li&gt;
&lt;li&gt;Compress completed work into structured summaries for the next node.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Cheap context can tempt people to include noise. Noise does not make a model smarter.&lt;/p&gt;
&lt;h2 id=&#34;harness-matters-more-than-a-single-model&#34;&gt;Harness matters more than a single model
&lt;/h2&gt;&lt;p&gt;Connecting Claude Code, Codex, or another coding agent to a cheap model is not enough. Small models drift in long-chain tasks unless a stronger process controls them.&lt;/p&gt;
&lt;p&gt;A harness is a scheduling system. It decides how to split tasks, run nodes, choose models, validate results, retry failures, and pass context.&lt;/p&gt;
&lt;p&gt;A useful orchestration system should answer:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Which tasks need planning?&lt;/li&gt;
&lt;li&gt;Which tasks can execute directly?&lt;/li&gt;
&lt;li&gt;Which nodes can run in parallel?&lt;/li&gt;
&lt;li&gt;Which nodes must be serial?&lt;/li&gt;
&lt;li&gt;Which nodes use big models or small models?&lt;/li&gt;
&lt;li&gt;What is the context budget for each node?&lt;/li&gt;
&lt;li&gt;What structured output does each node produce?&lt;/li&gt;
&lt;li&gt;Who reviews and decides whether to continue?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Without this software layer, small models are merely cheap. With it, they can become leverage.&lt;/p&gt;
&lt;h2 id=&#34;split-tasks-with-dags&#34;&gt;Split tasks with DAGs
&lt;/h2&gt;&lt;p&gt;A good approach is to split complex work into a directed acyclic graph.&lt;/p&gt;
&lt;p&gt;A feature task might become:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Requirement clarification&lt;/li&gt;
&lt;li&gt;Technical design&lt;/li&gt;
&lt;li&gt;Task decomposition&lt;/li&gt;
&lt;li&gt;Implementation&lt;/li&gt;
&lt;li&gt;Test completion&lt;/li&gt;
&lt;li&gt;Code Review&lt;/li&gt;
&lt;li&gt;Fixes&lt;/li&gt;
&lt;li&gt;PR submission&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Each node can be an independent agent with its own role, prompt, tools, permissions, and output format. Nodes should pass structured results, not long chat transcripts.&lt;/p&gt;
&lt;p&gt;This makes each node shorter, easier for small models, and easier to measure.&lt;/p&gt;
&lt;h2 id=&#34;run-multiple-task-replicas&#34;&gt;Run multiple task replicas
&lt;/h2&gt;&lt;p&gt;When tokens are cheap enough, the same task does not have to run only once.&lt;/p&gt;
&lt;p&gt;You can run the same task with different models, prompts, or orchestrations, then pick the best result or merge useful parts. This is suitable for design proposals, copy, test cases, bug hypotheses, refactor options, and code review.&lt;/p&gt;
&lt;p&gt;It is not suitable for tasks with external side effects, shared mutable state, or unclear acceptance criteria.&lt;/p&gt;
&lt;p&gt;The goal is not gambling. It is collecting comparable samples that can improve orchestration, model selection, and node skills.&lt;/p&gt;
&lt;h2 id=&#34;build-an-evaluation-system&#34;&gt;Build an evaluation system
&lt;/h2&gt;&lt;p&gt;Token Efficiency cannot be judged only by price. A cheap model with a high failure rate can consume more human time and become more expensive.&lt;/p&gt;
&lt;p&gt;Start recording:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Completion rate&lt;/li&gt;
&lt;li&gt;Human interventions&lt;/li&gt;
&lt;li&gt;Tool-call failure rate&lt;/li&gt;
&lt;li&gt;Test pass rate&lt;/li&gt;
&lt;li&gt;Review findings&lt;/li&gt;
&lt;li&gt;Token cost per task&lt;/li&gt;
&lt;li&gt;Time per task&lt;/li&gt;
&lt;li&gt;Rework count&lt;/li&gt;
&lt;li&gt;Differences between model combinations&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;With this data, you can decide which tasks fit small models, which require big models, and which should stay human-led.&lt;/p&gt;
&lt;h2 id=&#34;make-business-workflows-atomic&#34;&gt;Make business workflows atomic
&lt;/h2&gt;&lt;p&gt;Most users do not need to build a full harness today. But they can start decomposing their business workflow into atomic nodes.&lt;/p&gt;
&lt;p&gt;Content production can become topic selection, research, outline, draft, fact check, style rewrite, SEO title, translation, and publishing check.&lt;/p&gt;
&lt;p&gt;Software development can become requirement confirmation, technical design, data structure, API change, unit tests, implementation, migration script, documentation, and review.&lt;/p&gt;
&lt;p&gt;Each node should have clear input, output, acceptance, and context limits. When harness tools mature, these workflows can plug in directly.&lt;/p&gt;
&lt;h2 id=&#34;hardware-is-not-the-first-priority&#34;&gt;Hardware is not the first priority
&lt;/h2&gt;&lt;p&gt;Many discussions of Token Efficiency jump to local deployment and GPUs. For most people, API should still be the first choice.&lt;/p&gt;
&lt;p&gt;Before the economic model works, local hardware is only prepaid cost. A safer sequence is:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Use API to validate the workflow.&lt;/li&gt;
&lt;li&gt;Record task evaluation and cost.&lt;/li&gt;
&lt;li&gt;Find stable high-frequency execution nodes.&lt;/li&gt;
&lt;li&gt;Consider which nodes should be localized.&lt;/li&gt;
&lt;li&gt;Then calculate hardware, power, maintenance, and depreciation.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;For personal productivity, API is often enough. For startups exploring inference frameworks and model boundaries, local CUDA platforms can be useful. For production workloads with clear unit economics, multi-GPU deployment becomes worth discussing.&lt;/p&gt;
&lt;h2 id=&#34;summary&#34;&gt;Summary
&lt;/h2&gt;&lt;p&gt;Token Efficiency is not replacing expensive models with cheap ones. It is redesigning the AI workflow.&lt;/p&gt;
&lt;p&gt;Big models make key judgments, small models execute in bulk, the harness schedules and validates, and humans define goals and acceptance. Only when these layers work together can tokens reliably become productivity.&lt;/p&gt;
&lt;p&gt;Models will get cheaper, context windows will grow, and small models will improve. The future gap may not be who calls the strongest model, but who can use the same tokens to produce more real output.&lt;/p&gt;
</description>
        </item>
        <item>
        <title>Superpowers: a skills framework that pulls coding agents back into engineering process</title>
        <link>https://knightli.com/en/2026/05/15/obra-superpowers-agentic-skills-framework/</link>
        <pubDate>Fri, 15 May 2026 08:53:17 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/05/15/obra-superpowers-agentic-skills-framework/</guid>
        <description>&lt;p&gt;&lt;code&gt;obra/superpowers&lt;/code&gt; is both a skills framework for coding agents and a software development methodology. Its goal is not to add another universal prompt, but to make agents follow a process: clarify goals, produce a design, write a plan, implement through TDD, then review and finish.&lt;/p&gt;
&lt;p&gt;Project: &lt;a class=&#34;link&#34; href=&#34;https://github.com/obra/superpowers&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://github.com/obra/superpowers&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;At the time of writing, the GitHub API shows more than 190,000 stars, an MIT license, and recent activity. The README describes it plainly: &lt;code&gt;An agentic skills framework &amp;amp; software development methodology that works.&lt;/code&gt;&lt;/p&gt;
&lt;h2 id=&#34;what-problem-it-solves&#34;&gt;What problem it solves
&lt;/h2&gt;&lt;p&gt;Many AI coding tools are not weak at writing code; they are too eager to write code.&lt;/p&gt;
&lt;p&gt;A user says something vague, the agent edits files, and the result looks finished while boundaries, tests, and architecture remain unclear. Small tasks may survive this. Complex projects turn it into rework and technical debt.&lt;/p&gt;
&lt;p&gt;Superpowers makes the agent enter a workflow before touching code:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;When the user wants to build something, ask about the goal first.&lt;/li&gt;
&lt;li&gt;Turn the conversation into a spec and confirm it in sections.&lt;/li&gt;
&lt;li&gt;After design approval, write an implementation plan.&lt;/li&gt;
&lt;li&gt;After the user says &amp;ldquo;go&amp;rdquo;, begin implementation.&lt;/li&gt;
&lt;li&gt;During implementation, emphasize TDD, YAGNI, DRY, and code review.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This is not new software engineering. It is important because fast agents need stronger guardrails.&lt;/p&gt;
&lt;h2 id=&#34;supported-tools&#34;&gt;Supported tools
&lt;/h2&gt;&lt;p&gt;Superpowers is not tied to a single agent. The README lists installation paths for Claude Code, Codex CLI, Codex App, Factory Droid, Gemini CLI, OpenCode, Cursor, and GitHub Copilot CLI.&lt;/p&gt;
&lt;p&gt;That makes it more like a workflow layer across harnesses than a model-specific trick.&lt;/p&gt;
&lt;h2 id=&#34;the-base-workflow&#34;&gt;The base workflow
&lt;/h2&gt;&lt;p&gt;The base workflow has several stages.&lt;/p&gt;
&lt;p&gt;First is &lt;code&gt;brainstorming&lt;/code&gt;. Before implementation, the agent turns rough ideas into an executable design and confirms it with the user.&lt;/p&gt;
&lt;p&gt;Second is &lt;code&gt;using-git-worktrees&lt;/code&gt;. After design approval, it creates an isolated worktree and branch, then checks that install and test baselines are clean.&lt;/p&gt;
&lt;p&gt;Third is &lt;code&gt;writing-plans&lt;/code&gt;. It decomposes design into small tasks with paths, code scopes, and validation steps. The plan should be clear enough for someone without context to execute.&lt;/p&gt;
&lt;p&gt;Fourth is execution. &lt;code&gt;subagent-driven-development&lt;/code&gt; can dispatch tasks to subagents, while &lt;code&gt;executing-plans&lt;/code&gt; runs them in batches. Each task should be reviewable and verifiable.&lt;/p&gt;
&lt;p&gt;Fifth is &lt;code&gt;test-driven-development&lt;/code&gt;: true RED-GREEN-REFACTOR. Write a failing test, confirm failure, implement minimally, confirm pass, refactor.&lt;/p&gt;
&lt;p&gt;Sixth is &lt;code&gt;requesting-code-review&lt;/code&gt;. Reviews happen between tasks; critical findings block progress.&lt;/p&gt;
&lt;p&gt;Finally, &lt;code&gt;finishing-a-development-branch&lt;/code&gt; validates tests and offers choices such as merge, PR, keep, or discard the worktree.&lt;/p&gt;
&lt;h2 id=&#34;what-is-in-the-skills-library&#34;&gt;What is in the skills library
&lt;/h2&gt;&lt;p&gt;The skills library can be grouped by purpose.&lt;/p&gt;
&lt;p&gt;Testing centers on &lt;code&gt;test-driven-development&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Debugging includes &lt;code&gt;systematic-debugging&lt;/code&gt; and &lt;code&gt;verification-before-completion&lt;/code&gt;. They focus on reproduction, minimization, hypotheses, validation, and not claiming completion before verification.&lt;/p&gt;
&lt;p&gt;Collaboration skills include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;brainstorming&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;writing-plans&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;executing-plans&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;dispatching-parallel-agents&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;requesting-code-review&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;receiving-code-review&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;using-git-worktrees&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;finishing-a-development-branch&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;subagent-driven-development&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Meta skills include &lt;code&gt;writing-skills&lt;/code&gt; and &lt;code&gt;using-superpowers&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Together they give the agent engineering habits: when to ask, when to plan, when to test, and when to stop for review.&lt;/p&gt;
&lt;h2 id=&#34;how-it-differs-from-a-prompt&#34;&gt;How it differs from a prompt
&lt;/h2&gt;&lt;p&gt;A normal prompt often piles rules into one system message: do not over-edit, think first, test, explain, be concise. As rules accumulate, complex tasks make the model forget or ignore some of them.&lt;/p&gt;
&lt;p&gt;Superpowers splits rules into phase-specific workflow modules. Each skill is shorter and focused. The agent knows the current phase, complex processes become checkable, and teams can turn their own practices into reusable skills.&lt;/p&gt;
&lt;p&gt;The lesson is not just &amp;ldquo;use a smarter model&amp;rdquo;. Give the model a repeatable way to work.&lt;/p&gt;
&lt;h2 id=&#34;who-should-use-it&#34;&gt;Who should use it
&lt;/h2&gt;&lt;p&gt;Superpowers is most useful for developers already using coding agents on real projects, especially when:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The task spans multiple files.&lt;/li&gt;
&lt;li&gt;The agent should design before implementation.&lt;/li&gt;
&lt;li&gt;TDD or validation matters.&lt;/li&gt;
&lt;li&gt;Multiple branches or worktrees are common.&lt;/li&gt;
&lt;li&gt;Subagents can help with implementation or review.&lt;/li&gt;
&lt;li&gt;A team wants to encode its workflow as skills.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For a one-line config change, it may feel heavy. For multi-step development, the constraints are valuable.&lt;/p&gt;
&lt;h2 id=&#34;notes-before-using-it&#34;&gt;Notes before using it
&lt;/h2&gt;&lt;p&gt;Do not treat it as full autopilot. It gives the agent process, but humans still own requirements, tradeoffs, and final acceptance.&lt;/p&gt;
&lt;p&gt;TDD and review add upfront cost. For small tasks they may slow things down; for complex tasks they reduce rework.&lt;/p&gt;
&lt;p&gt;Parallel subagents are not always better. They work when boundaries and write scopes are clear. If the requirement is still fuzzy, parallelism only multiplies confusion.&lt;/p&gt;
&lt;p&gt;Teams must maintain skill quality. Outdated processes, vague instructions, and conflicting rules can also hurt agents.&lt;/p&gt;
&lt;h2 id=&#34;summary&#34;&gt;Summary
&lt;/h2&gt;&lt;p&gt;Superpowers is valuable because it pulls coding agents away from &amp;ldquo;receive request, edit code&amp;rdquo; and back into software engineering process.&lt;/p&gt;
&lt;p&gt;AI coding often lacks not generation speed, but clarification, planning, verification, review, and closure. The stronger the model becomes, the less these steps should be skipped.&lt;/p&gt;
&lt;p&gt;If you use Codex, Claude Code, Cursor, or Gemini CLI on real projects, Superpowers is worth studying. Even if you do not install it, its skill decomposition is a good reference for designing your own agent workflow.&lt;/p&gt;
</description>
        </item>
        <item>
        <title>Codex /goal vs Claude Code /goal: running long tasks until they are done</title>
        <link>https://knightli.com/en/2026/05/14/codex-goal-vs-claude-code-goal/</link>
        <pubDate>Thu, 14 May 2026 22:25:31 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/05/14/codex-goal-vs-claude-code-goal/</guid>
        <description>&lt;p&gt;&lt;code&gt;/goal&lt;/code&gt; is becoming an important command in AI coding tools.&lt;/p&gt;
&lt;p&gt;It is not about making the model write a few more lines of code. It solves a more practical problem: when a task has clear completion conditions, can the agent keep going until those conditions are met, instead of stopping after every turn and waiting for the user to say &amp;ldquo;continue&amp;rdquo;?&lt;/p&gt;
&lt;p&gt;Codex CLI has already added an experimental &lt;code&gt;/goal&lt;/code&gt; command in its official docs. Claude Code has also published its own &lt;code&gt;/goal&lt;/code&gt; documentation, describing it as an automation capability that can keep working across multiple turns. The names are the same, but the product direction is not exactly the same.&lt;/p&gt;
&lt;h2 id=&#34;what-problem-does-goal-solve&#34;&gt;What problem does &lt;code&gt;/goal&lt;/code&gt; solve?
&lt;/h2&gt;&lt;p&gt;Ordinary AI coding conversations usually work as a one-turn-at-a-time loop:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The user describes a task.&lt;/li&gt;
&lt;li&gt;The agent analyzes, edits code, and runs tests.&lt;/li&gt;
&lt;li&gt;The agent reports the result.&lt;/li&gt;
&lt;li&gt;The user decides what to do next.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;That workflow is fine for short tasks. But for migrations, refactors, test fixes, or issue backlog cleanup, it gets fragmented. The agent may move forward a little, then stop and wait for you to type &amp;ldquo;continue&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;/goal&lt;/code&gt; changes the question from &amp;ldquo;what should you do next?&amp;rdquo; to &amp;ldquo;what final state counts as done?&amp;rdquo; For example:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;/goal 完成登录模块迁移，所有 auth 测试通过，lint 无报错
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;This kind of target naturally fits long tasks because it has a clear endpoint: tests pass, the build succeeds, files are split, a queue is empty, or acceptance criteria are satisfied.&lt;/p&gt;
&lt;h2 id=&#34;codex-goal-experimental-and-attached-to-the-current-thread&#34;&gt;Codex &lt;code&gt;/goal&lt;/code&gt;: experimental and attached to the current thread
&lt;/h2&gt;&lt;p&gt;OpenAI&amp;rsquo;s Codex CLI documentation marks &lt;code&gt;/goal&lt;/code&gt; as experimental. It is not a stable default capability and requires &lt;code&gt;features.goals&lt;/code&gt; to be enabled first.&lt;/p&gt;
&lt;p&gt;There are two ways to enable it:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;/experimental
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Or add this to &lt;code&gt;config.toml&lt;/code&gt;:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-toml&#34; data-lang=&#34;toml&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;p&#34;&gt;[&lt;/span&gt;&lt;span class=&#34;nx&#34;&gt;features&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nx&#34;&gt;goals&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;kc&#34;&gt;true&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Once enabled, you can use it like this:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;/goal Finish the migration and keep tests green
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Common commands include:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;4
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;/goal
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;/goal pause
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;/goal resume
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;/goal clear
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;According to OpenAI&amp;rsquo;s docs, Codex attaches the goal to the current active thread and keeps tracking that target while a larger task continues.&lt;/p&gt;
&lt;p&gt;One detail matters here: the official wording for Codex &lt;code&gt;/goal&lt;/code&gt; is restrained. It emphasizes setting an experimental goal for long-running work and attaching the goal to the current thread, but it does not describe, in the same level of detail as Claude Code&amp;rsquo;s docs, an independent evaluator that automatically checks every turn and starts the next one. So for now, it is better to treat Codex &lt;code&gt;/goal&lt;/code&gt; as an experimental long-task goal mechanism, not a fully stable unattended execution mode.&lt;/p&gt;
&lt;h2 id=&#34;claude-code-goal-multi-turn-execution-driven-by-completion-conditions&#34;&gt;Claude Code &lt;code&gt;/goal&lt;/code&gt;: multi-turn execution driven by completion conditions
&lt;/h2&gt;&lt;p&gt;Claude Code&amp;rsquo;s &lt;code&gt;/goal&lt;/code&gt; documentation is more explicit: after the user sets a completion condition, Claude keeps working across turns until that condition is met.&lt;/p&gt;
&lt;p&gt;Example:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;/goal all tests in test/auth pass and the lint step is clean
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Claude Code&amp;rsquo;s mechanism is roughly:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;After the current turn finishes, control is not immediately returned to the user.&lt;/li&gt;
&lt;li&gt;A small, fast model checks whether the goal condition has already been met.&lt;/li&gt;
&lt;li&gt;If it has not been met, Claude automatically starts the next turn.&lt;/li&gt;
&lt;li&gt;If it has been met, the goal is cleared automatically and the completion status is recorded in the transcript.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This makes Claude Code&amp;rsquo;s &lt;code&gt;/goal&lt;/code&gt; more like &amp;ldquo;auto-continue until the completion condition is satisfied.&amp;rdquo; It does not merely pin a target to the conversation; it gives an independent evaluation step the decision of whether to continue.&lt;/p&gt;
&lt;p&gt;Claude Code also supports checking status directly:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;/goal
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;The status shows the goal condition, elapsed time, evaluated turn count, token usage, and the evaluator&amp;rsquo;s latest reason.&lt;/p&gt;
&lt;p&gt;To stop early, use:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;/goal clear
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;&lt;code&gt;stop&lt;/code&gt;, &lt;code&gt;off&lt;/code&gt;, &lt;code&gt;reset&lt;/code&gt;, &lt;code&gt;none&lt;/code&gt;, and &lt;code&gt;cancel&lt;/code&gt; also work as clearing aliases. After a goal is enabled, if the session is interrupted and later resumed with &lt;code&gt;--resume&lt;/code&gt; or &lt;code&gt;--continue&lt;/code&gt;, an active goal can be restored. However, elapsed time, turn count, and token baselines are recalculated.&lt;/p&gt;
&lt;h2 id=&#34;the-biggest-difference&#34;&gt;The biggest difference
&lt;/h2&gt;&lt;p&gt;Both Codex and Claude Code are pushing AI coding from single-turn answers toward long-running task execution, but their &lt;code&gt;/goal&lt;/code&gt; commands have different positioning.&lt;/p&gt;
&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;Comparison&lt;/th&gt;
          &lt;th&gt;Codex CLI &lt;code&gt;/goal&lt;/code&gt;&lt;/th&gt;
          &lt;th&gt;Claude Code &lt;code&gt;/goal&lt;/code&gt;&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;Status&lt;/td&gt;
          &lt;td&gt;experimental&lt;/td&gt;
          &lt;td&gt;documented on a dedicated official page&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Enablement&lt;/td&gt;
          &lt;td&gt;requires &lt;code&gt;features.goals&lt;/code&gt;&lt;/td&gt;
          &lt;td&gt;usable directly in a trusted workspace&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Goal scope&lt;/td&gt;
          &lt;td&gt;current active thread&lt;/td&gt;
          &lt;td&gt;current session&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Common operations&lt;/td&gt;
          &lt;td&gt;set / view / pause / resume / clear&lt;/td&gt;
          &lt;td&gt;set / view / clear&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Automatic evaluation&lt;/td&gt;
          &lt;td&gt;docs emphasize attachment and tracking&lt;/td&gt;
          &lt;td&gt;docs explicitly describe evaluator checks after each turn&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Auto-continuation&lt;/td&gt;
          &lt;td&gt;official wording is restrained&lt;/td&gt;
          &lt;td&gt;starts the next turn automatically when conditions are unmet&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Best fit&lt;/td&gt;
          &lt;td&gt;keeping a long-term target in a Codex task&lt;/td&gt;
          &lt;td&gt;letting Claude Code keep moving toward completion conditions&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;In short, Codex &lt;code&gt;/goal&lt;/code&gt; is closer to &amp;ldquo;attach an experimental long-term target to the current thread.&amp;rdquo; Claude Code &lt;code&gt;/goal&lt;/code&gt; is closer to &amp;ldquo;set a verifiable stop condition for the current session and let it keep working until satisfied.&amp;rdquo;&lt;/p&gt;
&lt;h2 id=&#34;how-to-write-a-good-goal&#34;&gt;How to write a good &lt;code&gt;/goal&lt;/code&gt;
&lt;/h2&gt;&lt;p&gt;Whichever tool you use, &lt;code&gt;/goal&lt;/code&gt; is not a good place for vague wishes.&lt;/p&gt;
&lt;p&gt;Not a great goal:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;/goal 把项目优化一下
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;A better goal:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;/goal 将 payment 模块迁移到新 API，npm test -- payment 退出码为 0，git diff 只包含 payment 相关文件
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;A good goal usually includes three things:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;A clear completed state.&lt;/li&gt;
&lt;li&gt;An executable validation method.&lt;/li&gt;
&lt;li&gt;Boundaries that must be respected.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;If the goal is large, add a stop condition:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;/goal 修复 eslint 报错，npm run lint 退出码为 0；如果超过 20 轮仍未完成，停止并总结剩余问题
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;This matters. The stronger &lt;code&gt;/goal&lt;/code&gt; becomes, the more it needs boundaries. Otherwise, the agent may modify too many files, run too long, consume too many tokens, or keep pushing forward on a question that should have been paused for human input.&lt;/p&gt;
&lt;h2 id=&#34;when-goal-is-a-good-fit&#34;&gt;When &lt;code&gt;/goal&lt;/code&gt; is a good fit
&lt;/h2&gt;&lt;p&gt;Good fits:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Test fixes: until specific tests pass.&lt;/li&gt;
&lt;li&gt;Code migrations: until all call sites are updated and compilation succeeds.&lt;/li&gt;
&lt;li&gt;Batch cleanup: until a class of lint or type errors is reduced to zero.&lt;/li&gt;
&lt;li&gt;Documentation completion: until all specified modules have documentation.&lt;/li&gt;
&lt;li&gt;Issue queue handling: until every issue under a tag is handled or clearly classified.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Poor fits:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The requirement itself is still unclear.&lt;/li&gt;
&lt;li&gt;The task needs frequent product judgment.&lt;/li&gt;
&lt;li&gt;It involves high-risk deletion, data migration, or permission changes.&lt;/li&gt;
&lt;li&gt;Acceptance can only be judged subjectively.&lt;/li&gt;
&lt;li&gt;The task spans many unrelated modules.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;A practical rule: if you can write &amp;ldquo;which command to run, what result to see, and which files must not be touched,&amp;rdquo; it is a good candidate for &lt;code&gt;/goal&lt;/code&gt;. If you can only write &amp;ldquo;make this better,&amp;rdquo; ordinary conversation, plan mode, or human review is still safer.&lt;/p&gt;
&lt;h2 id=&#34;what-this-means-for-ai-coding-tools&#34;&gt;What this means for AI coding tools
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;/goal&lt;/code&gt; points to a clear direction: AI coding tools are moving from interactive assistants toward continuously executable work units.&lt;/p&gt;
&lt;p&gt;In the past, using an agent often meant staying nearby. If it got stuck, you prompted it. If tests finished, you told it to continue. If errors appeared, you issued another command. &lt;code&gt;/goal&lt;/code&gt; compresses that interaction into a completion condition and lets the agent decide what the next turn should do.&lt;/p&gt;
&lt;p&gt;But this also raises the bar for users. Writing prompts is no longer just describing a task; it also means defining acceptance criteria, validation commands, modification boundaries, and stop rules. In other words, the user&amp;rsquo;s job shifts from &amp;ldquo;keep telling it to continue&amp;rdquo; to &amp;ldquo;define what done means.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;The fact that both Codex and Claude Code have reached &lt;code&gt;/goal&lt;/code&gt; shows that long-running agents are no longer only for background tasks or cloud queues. Local terminal coding tools now also need stronger autonomous progress.&lt;/p&gt;
&lt;h2 id=&#34;summary&#34;&gt;Summary
&lt;/h2&gt;&lt;p&gt;Codex CLI and Claude Code both have &lt;code&gt;/goal&lt;/code&gt;, but at this stage they should not be treated as the same feature.&lt;/p&gt;
&lt;p&gt;Codex &lt;code&gt;/goal&lt;/code&gt; is still experimental, requires &lt;code&gt;features.goals&lt;/code&gt;, and is better understood as a way to maintain a long-term target in the current Codex thread. Claude Code &lt;code&gt;/goal&lt;/code&gt; more explicitly connects completion conditions with auto-continuation, using an independent evaluator to decide whether to keep going.&lt;/p&gt;
&lt;p&gt;For everyday development, this kind of command is best for engineering tasks with clear acceptance criteria. It does not replace product judgment or code review, but it can reduce the repetitive &amp;ldquo;continue,&amp;rdquo; &amp;ldquo;run it again,&amp;rdquo; and &amp;ldquo;fix until tests pass&amp;rdquo; loop inside long tasks.&lt;/p&gt;
&lt;p&gt;The real skill is not memorizing the command. It is learning how to write tasks as clear, verifiable, stoppable goals.&lt;/p&gt;
&lt;h2 id=&#34;references&#34;&gt;References
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;OpenAI Codex CLI Slash Commands: &lt;a class=&#34;link&#34; href=&#34;https://developers.openai.com/codex/cli/slash-commands&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://developers.openai.com/codex/cli/slash-commands&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Claude Code Goal documentation: &lt;a class=&#34;link&#34; href=&#34;https://code.claude.com/docs/en/goal&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://code.claude.com/docs/en/goal&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
        </item>
        <item>
        <title>Why DeepSeek Became the Cost-Saving Key in This Round of AI Coding Tools</title>
        <link>https://knightli.com/en/2026/05/11/deepseek-ai-coding-cost-saving/</link>
        <pubDate>Mon, 11 May 2026 04:59:00 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/05/11/deepseek-ai-coding-cost-saving/</guid>
        <description>&lt;p&gt;In this round of AI coding tool competition, the surface battle is about model capability, plugin ecosystems, and agent automation. But once you actually use these tools, the first wall you hit is cost.&lt;/p&gt;
&lt;p&gt;Claude Code, Codex, OpenClaw, and Superpowers are all useful, but they share one trait: once a task becomes complex, they eat tokens aggressively. They need to read the project, build a plan, call tools, summarize context, repeatedly check results, and sometimes launch multiple subtasks. The smarter the model and the more automated the workflow, the easier it is for the bill to quietly grow.&lt;/p&gt;
&lt;p&gt;That is why DeepSeek has become important in this cycle. Not merely because it can write code, but because its long context and cache pricing happen to hit the most expensive part of AI coding tools.&lt;/p&gt;
&lt;h2 id=&#34;why-agent-tools-burn-so-many-tokens&#34;&gt;Why Agent Tools Burn So Many Tokens
&lt;/h2&gt;&lt;p&gt;Traditional chat-style coding assistants usually work in question-and-answer mode. You ask how to write a function, and the model returns a code snippet. This still costs tokens, but it is relatively controllable.&lt;/p&gt;
&lt;p&gt;Agent tools are different. They do not just answer questions. They enter the project like a temporary engineer:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;scan directories and key files;&lt;/li&gt;
&lt;li&gt;understand the requirement and existing architecture;&lt;/li&gt;
&lt;li&gt;make a plan;&lt;/li&gt;
&lt;li&gt;modify files;&lt;/li&gt;
&lt;li&gt;run commands or tests;&lt;/li&gt;
&lt;li&gt;keep fixing based on errors;&lt;/li&gt;
&lt;li&gt;summarize what changed at the end.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;During this process, the model repeatedly reads the same context. Project descriptions, code snippets, tool outputs, conversation history, plans, and error logs all get placed back into the context. Once the task is a little complex, hundreds of thousands of tokens can disappear quickly.&lt;/p&gt;
&lt;p&gt;If you add more aggressive plugins, the cost becomes even more obvious. Some OpenCode or Claude Code enhancement tools may organize a whole agent team by default. You only wanted to change a small feature, but it may still start planning, review, execution, and retrospective steps. The task may look more &amp;ldquo;intelligent&amp;rdquo;, but the token count keeps climbing.&lt;/p&gt;
&lt;h2 id=&#34;the-advantage-of-superpowers-is-on-demand-activation&#34;&gt;The Advantage of Superpowers Is On-Demand Activation
&lt;/h2&gt;&lt;p&gt;One advantage of tools like Superpowers is that they do not force a full agent workflow onto every task.&lt;/p&gt;
&lt;p&gt;Most of the time, you can still let Claude Code, OpenCode, or Codex work in their normal mode. Only when you explicitly call a skill, such as brainstorming, planning, executing a plan, or doing a retrospective, does it enter a heavier automation flow.&lt;/p&gt;
&lt;p&gt;That matters for cost.&lt;/p&gt;
&lt;p&gt;AI coding should not use heavy artillery for every task. Changing one config line, checking one error, or writing a small script can be handled through ordinary conversation. Only complex refactors, cross-file changes, long-document processing, and multi-round validation deserve a full agent workflow.&lt;/p&gt;
&lt;p&gt;The stronger the tool, the more you need to control when it triggers. Otherwise, more automation simply means more waste.&lt;/p&gt;
&lt;h2 id=&#34;deepseeks-key-advantage-is-cheap-cache-hits&#34;&gt;DeepSeek&amp;rsquo;s Key Advantage Is Cheap Cache Hits
&lt;/h2&gt;&lt;p&gt;One important reason DeepSeek fits these agent tools is its low cache-hit cost.&lt;/p&gt;
&lt;p&gt;AI coding tasks contain a lot of repeated prefixes: project background, system prompts, tool instructions, file content, and earlier conversation turns often appear again in later requests. If the model service supports prompt caching, those repeated parts become much cheaper after a cache hit.&lt;/p&gt;
&lt;p&gt;For many models, a cache hit is only somewhat cheaper than a miss, perhaps around one third of the original price. DeepSeek&amp;rsquo;s advantage is that the gap after a cache hit can be much larger. For long-context, multi-round agent workflows that repeatedly read the same project, this gap shows up directly on the bill.&lt;/p&gt;
&lt;p&gt;In other words, DeepSeek is not necessarily the strongest answer on every single turn. But in scenarios with long tasks, many rounds, and repeated context reads, its cost structure is unusually suitable for AI coding.&lt;/p&gt;
&lt;h2 id=&#34;long-context-makes-claude-code-more-useful&#34;&gt;Long Context Makes Claude Code More Useful
&lt;/h2&gt;&lt;p&gt;When Claude Code or similar tools are connected to DeepSeek V4, another clear advantage is long context.&lt;/p&gt;
&lt;p&gt;AI coding tools fear insufficient context. Once context runs short, compression becomes frequent. Once compression becomes frequent, previously read details may be lost. The model may start forgetting the project structure, constraints, or why a certain file was changed, and quality declines afterward.&lt;/p&gt;
&lt;p&gt;DeepSeek V4&amp;rsquo;s long-context capability makes it better suited for code repositories, document batch processing, subtitle translation, and site article cleanup. Especially when connected to tools like Claude Code or OpenClaw, the right configuration can delay context compression and preserve more project detail.&lt;/p&gt;
&lt;p&gt;That is why some tasks feel &amp;ldquo;durable&amp;rdquo; when run on DeepSeek. It may not be dazzling at every step, but it can tolerate long-running, low-cost, repeated calls.&lt;/p&gt;
&lt;h2 id=&#34;how-to-split-work-between-v4-pro-and-v4-flash&#34;&gt;How to Split Work Between V4 Pro and V4 Flash
&lt;/h2&gt;&lt;p&gt;DeepSeek V4 Pro and V4 Flash should not be mixed casually.&lt;/p&gt;
&lt;p&gt;For simple tasks, &lt;code&gt;DeepSeek V4 Flash&lt;/code&gt; is usually a better fit. It is fast and cheap, and is often enough for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;subtitle translation;&lt;/li&gt;
&lt;li&gt;document cleanup;&lt;/li&gt;
&lt;li&gt;ordinary script generation;&lt;/li&gt;
&lt;li&gt;small code edits;&lt;/li&gt;
&lt;li&gt;lightweight OpenClaw tasks;&lt;/li&gt;
&lt;li&gt;simple site content processing.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For complex tasks, consider &lt;code&gt;DeepSeek V4 Pro&lt;/code&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;large-scale refactoring;&lt;/li&gt;
&lt;li&gt;multi-module code understanding;&lt;/li&gt;
&lt;li&gt;complex reasoning;&lt;/li&gt;
&lt;li&gt;long-chain agent tasks;&lt;/li&gt;
&lt;li&gt;high-risk code changes;&lt;/li&gt;
&lt;li&gt;engineering tasks that require stronger planning.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Many people want to attach the strongest model immediately, but that is often uneconomical. The practical way to use AI coding tools is to layer tasks: let the cheaper model handle a large amount of routine work, and reserve the expensive model for key decision points.&lt;/p&gt;
&lt;h2 id=&#34;minimax-doubao-and-deepseek-occupy-different-positions&#34;&gt;MiniMax, Doubao, and DeepSeek Occupy Different Positions
&lt;/h2&gt;&lt;p&gt;Among domestic models and plans, MiniMax, Doubao, Kimi, and DeepSeek each have their own place.&lt;/p&gt;
&lt;p&gt;MiniMax&amp;rsquo;s advantage is generous quota, low price, and broad functionality. It may not be the smartest coding model, but it is cost-effective for translation, lightweight cleanup, and batch processing. For example, batch subtitle processing, format conversion, and simple proofreading are good fits for MiniMax-style plans.&lt;/p&gt;
&lt;p&gt;Doubao&amp;rsquo;s advantage is a broader tool ecosystem: image, video, search, TTS, possible STT, and embedding can be connected together. It feels more like a comprehensive toolbox.&lt;/p&gt;
&lt;p&gt;DeepSeek&amp;rsquo;s position is clearer: text, code, long context, and low-cost caching. It lacks a complete image generation, voice, and video ecosystem, and its weaknesses are obvious. But in AI coding and long-text agent workflows, its strengths are long enough to matter.&lt;/p&gt;
&lt;p&gt;So this is not about one tool replacing another. It is about splitting the task and using each tool where it fits.&lt;/p&gt;
&lt;h2 id=&#34;saving-money-is-not-just-choosing-a-cheap-model&#34;&gt;Saving Money Is Not Just Choosing a Cheap Model
&lt;/h2&gt;&lt;p&gt;Saving money in AI coding does not mean simply switching every request to the cheapest model.&lt;/p&gt;
&lt;p&gt;The effective methods are:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Do not start a heavy agent for simple tasks.&lt;/li&gt;
&lt;li&gt;Do not use Pro when Flash is enough.&lt;/li&gt;
&lt;li&gt;Use cache as much as possible for long tasks.&lt;/li&gt;
&lt;li&gt;Keep repeated context stable, so meaningless changes do not break cache hits.&lt;/li&gt;
&lt;li&gt;Let a cheaper model draft and batch-process first, then use a stronger model for key reviews.&lt;/li&gt;
&lt;li&gt;Tell the agent clearly not to repeat facts or summarize the same point again and again.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The last point matters more than it looks. AI tools are prone to verbosity, and verbosity is not only a reading problem; it is also a cost problem. Putting &amp;ldquo;describe each fact once and state each opinion once&amp;rdquo; into the prompt can improve both article quality and token consumption.&lt;/p&gt;
&lt;h2 id=&#34;what-ai-coding-workflows-deepseek-fits-best&#34;&gt;What AI Coding Workflows DeepSeek Fits Best
&lt;/h2&gt;&lt;p&gt;DeepSeek is best suited for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;reading long code repositories;&lt;/li&gt;
&lt;li&gt;lightweight multi-file edits;&lt;/li&gt;
&lt;li&gt;batch document cleanup;&lt;/li&gt;
&lt;li&gt;batch subtitle translation;&lt;/li&gt;
&lt;li&gt;Hugo article cleanup;&lt;/li&gt;
&lt;li&gt;agent plan execution;&lt;/li&gt;
&lt;li&gt;low-cost automation with lots of repeated context.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It is not the best fit for every task. If you need especially strong frontend taste, complex product judgment, or cross-modal creation, you may still need Claude, GPT, Gemini, Doubao, or other tools.&lt;/p&gt;
&lt;p&gt;But whenever a task is long-text, long-context, repeated-call, and cost-sensitive, DeepSeek can easily become the first choice.&lt;/p&gt;
&lt;h2 id=&#34;summary&#34;&gt;Summary
&lt;/h2&gt;&lt;p&gt;In this round of AI coding tools, DeepSeek&amp;rsquo;s value is not just that a domestic model can write code. Its real value is that it addresses the most practical pain point of agent tools: long tasks are too expensive.&lt;/p&gt;
&lt;p&gt;Tools like Claude Code, OpenClaw, and Superpowers make the development process increasingly automated, but behind that automation are massive context reads and multi-round calls. Whoever can lower this part of the cost can make AI coding go from &amp;ldquo;fun once in a while&amp;rdquo; to &amp;ldquo;affordable every day&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;DeepSeek&amp;rsquo;s long context, low cache cost, and layered use of V4 Flash / V4 Pro put it in exactly that position.&lt;/p&gt;
&lt;p&gt;The real cost-saving key in this cycle is not avoiding good models. It is combining good models, cheap models, cache, and agent workflows properly. Once you understand that bill, AI coding tools can become real productivity rather than a beautiful but expensive toy.&lt;/p&gt;
</description>
        </item>
        <item>
        <title>goose: An Open Source AI Agent with Desktop, CLI, and API</title>
        <link>https://knightli.com/en/2026/05/08/goose-open-source-ai-agent-desktop-cli-api/</link>
        <pubDate>Fri, 08 May 2026 13:41:15 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/05/08/goose-open-source-ai-agent-desktop-cli-api/</guid>
        <description>&lt;p&gt;goose is an open source AI agent that runs on your own machine. It is not limited to code completion; it aims to cover code, research, writing, automation, data analysis, and other tasks. The README positions it as a desktop app, CLI, and API that can serve both normal users and custom workflows.&lt;/p&gt;
&lt;p&gt;The project has moved from &lt;code&gt;block/goose&lt;/code&gt; to the Agentic AI Foundation (AAIF) at the Linux Foundation. The current repository is:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;https://github.com/aaif-goose/goose
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;goose is mainly written in Rust and TypeScript and uses the Apache-2.0 license. Its GitHub description says it is an open source, extensible AI agent that goes beyond code suggestions and can install, execute, edit, and test with any LLM.&lt;/p&gt;
&lt;h2 id=&#34;what-problem-it-solves&#34;&gt;What Problem It Solves
&lt;/h2&gt;&lt;p&gt;Many AI coding tools focus on suggestions or local code edits. goose takes a broader view: let an AI agent complete tasks directly on your machine.&lt;/p&gt;
&lt;p&gt;It can be used for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Code changes and tests.&lt;/li&gt;
&lt;li&gt;Local automation.&lt;/li&gt;
&lt;li&gt;Research and writing.&lt;/li&gt;
&lt;li&gt;Data analysis.&lt;/li&gt;
&lt;li&gt;Multi-step workflows.&lt;/li&gt;
&lt;li&gt;Embedding through an API.&lt;/li&gt;
&lt;li&gt;Tool extension through MCP.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you only need IDE completion, a Copilot-style tool may be enough. goose is more useful when you want AI inside the local task execution chain.&lt;/p&gt;
&lt;h2 id=&#34;desktop-cli-and-api&#34;&gt;Desktop, CLI, and API
&lt;/h2&gt;&lt;p&gt;goose has three entry points.&lt;/p&gt;
&lt;p&gt;The desktop app supports macOS, Linux, and Windows. It is good for users who prefer a visual interface.&lt;/p&gt;
&lt;p&gt;The CLI fits terminal workflows and local development automation.&lt;/p&gt;
&lt;p&gt;The API lets other systems or internal tools embed goose as an agent runtime.&lt;/p&gt;
&lt;p&gt;Personal users can start with the desktop app or CLI. Teams and workflow builders should also look at the API and custom distribution support.&lt;/p&gt;
&lt;h2 id=&#34;installation&#34;&gt;Installation
&lt;/h2&gt;&lt;p&gt;The README recommends downloading the desktop app:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;https://goose-docs.ai/docs/getting-started/installation
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;CLI install:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;curl -fsSL https://github.com/aaif-goose/goose/releases/download/stable/download_cli.sh &lt;span class=&#34;p&#34;&gt;|&lt;/span&gt; bash
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;GitHub Releases provide builds for multiple platforms. The latest release checked here was &lt;code&gt;v1.33.1&lt;/code&gt;, published on 2026-04-29, with macOS, Linux, Windows, deb, rpm, and Flatpak assets.&lt;/p&gt;
&lt;p&gt;After installation, configure a provider from the official quickstart and test in a low-risk directory first. goose can execute local tasks, so avoid giving it broad permissions in a production repository from the start.&lt;/p&gt;
&lt;h2 id=&#34;providers&#34;&gt;Providers
&lt;/h2&gt;&lt;p&gt;goose supports 15+ providers, including:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Anthropic&lt;/li&gt;
&lt;li&gt;OpenAI&lt;/li&gt;
&lt;li&gt;Google&lt;/li&gt;
&lt;li&gt;Ollama&lt;/li&gt;
&lt;li&gt;OpenRouter&lt;/li&gt;
&lt;li&gt;Azure&lt;/li&gt;
&lt;li&gt;Bedrock&lt;/li&gt;
&lt;li&gt;other cloud or OpenAI-compatible providers&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It can use API keys, and it can also use existing Claude, ChatGPT, or Gemini subscriptions through ACP.&lt;/p&gt;
&lt;p&gt;ACP is important because many users already pay for subscriptions, but different tools cannot easily reuse them. goose uses ACP providers to bring those subscriptions into an agent workflow.&lt;/p&gt;
&lt;p&gt;Provider policies change quickly. Check whether the access method is allowed, whether there are quotas, and whether it is suitable for company code or sensitive data.&lt;/p&gt;
&lt;h2 id=&#34;mcp-extensions&#34;&gt;MCP Extensions
&lt;/h2&gt;&lt;p&gt;goose supports Model Context Protocol extensions. The README mentions 70+ extensions.&lt;/p&gt;
&lt;p&gt;MCP matters because an agent should not only chat and edit files. Through standard protocol servers, it can connect to documentation, databases, browsers, internal systems, search services, design tools, or project management tools.&lt;/p&gt;
&lt;p&gt;For teams, MCP can become a safer integration layer: expose internal capabilities through explicit interfaces instead of letting the model touch every system directly.&lt;/p&gt;
&lt;h2 id=&#34;difference-from-a-coding-assistant&#34;&gt;Difference from a Coding Assistant
&lt;/h2&gt;&lt;p&gt;goose is not just a code completion tool. It is closer to a local agent runtime.&lt;/p&gt;
&lt;p&gt;Common coding assistants focus on:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Code completion.&lt;/li&gt;
&lt;li&gt;Code explanation.&lt;/li&gt;
&lt;li&gt;Function generation.&lt;/li&gt;
&lt;li&gt;Local editor edits.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;goose emphasizes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Local task execution.&lt;/li&gt;
&lt;li&gt;Multi-step workflows.&lt;/li&gt;
&lt;li&gt;Switchable providers.&lt;/li&gt;
&lt;li&gt;Extensions.&lt;/li&gt;
&lt;li&gt;Desktop and CLI.&lt;/li&gt;
&lt;li&gt;Embeddable API.&lt;/li&gt;
&lt;li&gt;Non-code tasks too.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This also means more complexity. You must think about model configuration, permissions, extensions, workspace scope, logs, and credentials.&lt;/p&gt;
&lt;h2 id=&#34;custom-distributions&#34;&gt;Custom Distributions
&lt;/h2&gt;&lt;p&gt;The repository includes &lt;code&gt;CUSTOM_DISTROS.md&lt;/code&gt;, which explains how to build a custom goose distribution with preconfigured providers, extensions, and branding.&lt;/p&gt;
&lt;p&gt;This is useful for teams:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Preconfigure allowed model providers.&lt;/li&gt;
&lt;li&gt;Connect internal MCP servers.&lt;/li&gt;
&lt;li&gt;Set safety policies and logging.&lt;/li&gt;
&lt;li&gt;Block disallowed external services.&lt;/li&gt;
&lt;li&gt;Apply company branding and onboarding.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Members do not need to configure everything from scratch, and the risk of wrong provider or key setup is reduced.&lt;/p&gt;
&lt;h2 id=&#34;suggested-use&#34;&gt;Suggested Use
&lt;/h2&gt;&lt;p&gt;Start gradually:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Install the desktop app or CLI.&lt;/li&gt;
&lt;li&gt;Configure one known-good provider.&lt;/li&gt;
&lt;li&gt;Run simple tasks in a test directory.&lt;/li&gt;
&lt;li&gt;Observe what it reads and executes.&lt;/li&gt;
&lt;li&gt;Add MCP extensions.&lt;/li&gt;
&lt;li&gt;Try larger repositories later.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Keep a few habits:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Commit important changes before agent work.&lt;/li&gt;
&lt;li&gt;Do not store API keys in project files.&lt;/li&gt;
&lt;li&gt;Use high-permission modes only in trusted workspaces.&lt;/li&gt;
&lt;li&gt;Review company data and provider policy first.&lt;/li&gt;
&lt;li&gt;Keep human review for automation results.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;who-should-use-it&#34;&gt;Who Should Use It
&lt;/h2&gt;&lt;p&gt;goose is a good fit if you want a desktop and CLI AI agent, multiple model providers, MCP integration, API embedding, or custom team distributions. It may be heavy if all you need is IDE code completion.&lt;/p&gt;
&lt;h2 id=&#34;summary&#34;&gt;Summary
&lt;/h2&gt;&lt;p&gt;goose is an open source AI agent under AAIF/Linux Foundation. It provides desktop, CLI, and API entry points, supports 15+ providers, ACP subscription access, and 70+ MCP extensions.&lt;/p&gt;
&lt;p&gt;Its value is not only writing code, but placing models, tools, extensions, and local execution into one agent framework. Start small, define permission and data boundaries, then expand usage.&lt;/p&gt;
&lt;h2 id=&#34;references&#34;&gt;References
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/aaif-goose/goose&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;goose GitHub repository&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://goose-docs.ai/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;goose documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://goose-docs.ai/docs/getting-started/installation&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;goose installation guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://aaif.io/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Agentic AI Foundation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
        </item>
        <item>
        <title>24 Claude Code Tips: Plan Mode, Rewind, CLAUDE.md, Skills, Agents, and Plugins</title>
        <link>https://knightli.com/en/2026/05/08/claude-code-24-tips-plan-rewind-skills-agents/</link>
        <pubDate>Fri, 08 May 2026 08:54:14 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/05/08/claude-code-24-tips-plan-rewind-skills-agents/</guid>
        <description>&lt;p&gt;Claude Code is not just a chat box. It is closer to a coding Agent that can enter a project directory, read and write files, run commands, and maintain context.&lt;/p&gt;
&lt;p&gt;If you only throw a requirement at it and wait for code, problems appear quickly: unclear plans, repeated permission prompts, growing context, unsatisfactory output, no clear rollback path, and no persistent place for project rules.&lt;/p&gt;
&lt;p&gt;Here is a set of common operations for developers getting started with Claude Code.&lt;/p&gt;
&lt;h2 id=&#34;start-inside-the-project-directory&#34;&gt;Start Inside the Project Directory
&lt;/h2&gt;&lt;p&gt;Claude Code works best when launched inside the project directory, not from a random terminal location.&lt;/p&gt;
&lt;p&gt;Create a folder as the project directory, enter it, open a command line, and start Claude Code:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;claude
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;When first entering a project, if Claude Code asks whether to trust the current folder, confirm before continuing. This lets it read files, create files, and run later operations around the current project.&lt;/p&gt;
&lt;p&gt;A simple practice task is to ask it to create a photographer portfolio website. The task is visual enough to inspect, and it also lets you practice file generation, command execution, rewind, and later refactoring.&lt;/p&gt;
&lt;h2 id=&#34;use-plan-mode-first&#34;&gt;Use Plan Mode First
&lt;/h2&gt;&lt;p&gt;For more complex tasks, Claude Code may enter plan mode. Plan mode is meant to discuss requirements and break down steps before you approve execution.&lt;/p&gt;
&lt;p&gt;After it writes a plan, you usually see options like:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Approve the plan and automatically allow future edit tools.&lt;/li&gt;
&lt;li&gt;Approve the plan, but require manual approval for later edits.&lt;/li&gt;
&lt;li&gt;Pause and continue discussing the plan with Claude Code.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If the task is clear, approve and continue. If it is not clear yet, ask it to refine the plan, such as page style, tech stack, directory structure, interactions, and acceptance criteria.&lt;/p&gt;
&lt;p&gt;Plan mode reduces rework. If an Agent starts directly, it may quickly generate many files; if the direction is wrong, later changes can get messy.&lt;/p&gt;
&lt;h2 id=&#34;switch-modes-with-shift--tab&#34;&gt;Switch Modes With Shift + Tab
&lt;/h2&gt;&lt;p&gt;In Claude Code, &lt;code&gt;Shift + Tab&lt;/code&gt; can switch between working modes. A common use is entering plan mode or switching into an auto-approve-edit mode.&lt;/p&gt;
&lt;p&gt;Suggested habits:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;New projects, new features, major changes: start in plan mode.&lt;/li&gt;
&lt;li&gt;Small edits and clear fixes: execute directly.&lt;/li&gt;
&lt;li&gt;Deletion, bulk replacement, dependency installation: keep manual approval.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In plan mode, Claude Code may ask project-detail questions. Use arrow keys to choose and Enter to confirm. After submitting feedback, it updates the plan.&lt;/p&gt;
&lt;h2 id=&#34;do-not-open-all-permissions-blindly&#34;&gt;Do Not Open All Permissions Blindly
&lt;/h2&gt;&lt;p&gt;When Claude Code runs commands, edits files, or starts programs, it may request permission.&lt;/p&gt;
&lt;p&gt;Common choices include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Allow only this time.&lt;/li&gt;
&lt;li&gt;Allow this command type for the current session.&lt;/li&gt;
&lt;li&gt;Reject or pause.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For local preview, dev server startup, or file inspection, approve as needed. But do not permanently use a mode that auto-approves all permissions just to save clicks.&lt;/p&gt;
&lt;p&gt;Full automation is only suitable when the task is low-risk, clearly understood, and the project already has Git backups. For daily use, keep human approval for deletion, overwriting folders, dependency installation, networking, commits, and scripts.&lt;/p&gt;
&lt;h2 id=&#34;run-local-commands-in-terminal-mode&#34;&gt;Run Local Commands in Terminal Mode
&lt;/h2&gt;&lt;p&gt;Claude Code can enter a terminal-command mode to run local commands.&lt;/p&gt;
&lt;p&gt;For example, after generating a page, you can open an HTML file with:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;start index.html
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;&lt;code&gt;start&lt;/code&gt; is a Windows command for opening a file, followed by the filename. This is faster than finding the file manually.&lt;/p&gt;
&lt;p&gt;Terminal mode is useful for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Opening generated pages.&lt;/li&gt;
&lt;li&gt;Listing directory contents.&lt;/li&gt;
&lt;li&gt;Starting local development servers.&lt;/li&gt;
&lt;li&gt;Running tests or builds.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Still, be careful with high-risk commands such as recursive deletion, moving directories, bulk overwrites, and system environment changes.&lt;/p&gt;
&lt;h2 id=&#34;rewind-when-the-result-goes-wrong&#34;&gt;Rewind When the Result Goes Wrong
&lt;/h2&gt;&lt;p&gt;If the page or code produced by Claude Code is not what you want, and each correction makes it worse, rewind early.&lt;/p&gt;
&lt;p&gt;Rewind can return code or conversation to a previous point. Common options include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Rewind both code and conversation.&lt;/li&gt;
&lt;li&gt;Rewind only conversation.&lt;/li&gt;
&lt;li&gt;Rewind only code.&lt;/li&gt;
&lt;li&gt;Compress earlier content into a summary.&lt;/li&gt;
&lt;li&gt;Cancel.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;When the direction is clearly wrong, it is usually better to rewind both code and conversation. That returns context and files to a cleaner state together.&lt;/p&gt;
&lt;p&gt;Note that Claude Code rewind usually only covers files it created or changed through built-in tools. Files created through external commands may not be fully rewindable. Important projects should still use Git.&lt;/p&gt;
&lt;h2 id=&#34;write-long-prompts-in-an-editor&#34;&gt;Write Long Prompts in an Editor
&lt;/h2&gt;&lt;p&gt;Do not squeeze complex requirements into one input line.&lt;/p&gt;
&lt;p&gt;If the system supports editing a long prompt in a text editor, open the editor, write the requirement clearly, save it, and then send it to Claude Code.&lt;/p&gt;
&lt;p&gt;Long prompts should include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The goal.&lt;/li&gt;
&lt;li&gt;The tech stack.&lt;/li&gt;
&lt;li&gt;What not to do.&lt;/li&gt;
&lt;li&gt;Which files must be kept.&lt;/li&gt;
&lt;li&gt;How to verify completion.&lt;/li&gt;
&lt;li&gt;Page or feature acceptance criteria.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For example, if you want Claude Code to refactor a plain HTML page into a more modern stack, do not just say &amp;ldquo;refactor it.&amp;rdquo; Explain component structure, visual preservation, responsive layout, and ask it to run a build check.&lt;/p&gt;
&lt;h2 id=&#34;restore-sessions-after-exit&#34;&gt;Restore Sessions After Exit
&lt;/h2&gt;&lt;p&gt;If you need to quit Claude Code midway, exit normally. Later, return to the same project directory and start again:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;claude
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;If previous records do not appear directly, use history-related commands to view and load recent sessions.&lt;/p&gt;
&lt;p&gt;This is useful for continuing interrupted work. But do not treat session history as the only memory. Project rules, tech stack, common commands, and notes should live in project files.&lt;/p&gt;
&lt;h2 id=&#34;use-claudemd-for-project-rules&#34;&gt;Use CLAUDE.md for Project Rules
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;CLAUDE.md&lt;/code&gt; is an important memory file for Claude Code. It usually sits at the project root and tells Claude Code project rules, tech stack, directory structure, and collaboration constraints.&lt;/p&gt;
&lt;p&gt;You can ask Claude Code to initialize it:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;/init
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;&lt;code&gt;CLAUDE.md&lt;/code&gt; is good for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Project goals.&lt;/li&gt;
&lt;li&gt;Tech stack.&lt;/li&gt;
&lt;li&gt;Common start, test, and build commands.&lt;/li&gt;
&lt;li&gt;Directory notes.&lt;/li&gt;
&lt;li&gt;Code style.&lt;/li&gt;
&lt;li&gt;Forbidden actions.&lt;/li&gt;
&lt;li&gt;Commit and deployment rules.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;During each conversation, Claude Code can use these rules as part of the context. Think of it as a project manual.&lt;/p&gt;
&lt;p&gt;A simple test is to add a clear rule into &lt;code&gt;CLAUDE.md&lt;/code&gt;, then ask Claude Code something. If its answer follows the rule, it has read the project memory.&lt;/p&gt;
&lt;h2 id=&#34;reference-files-with-&#34;&gt;Reference Files With @
&lt;/h2&gt;&lt;p&gt;Typing &lt;code&gt;@&lt;/code&gt; in the input box lets you select files or Agents and add them to the current context.&lt;/p&gt;
&lt;p&gt;This is useful when you want Claude Code to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Read a config file.&lt;/li&gt;
&lt;li&gt;Modify a specific page.&lt;/li&gt;
&lt;li&gt;Continue based on &lt;code&gt;CLAUDE.md&lt;/code&gt; or another document.&lt;/li&gt;
&lt;li&gt;Only inspect a specific file instead of guessing the whole project.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Compared with copying file contents into the input box, &lt;code&gt;@&lt;/code&gt; references are clearer and less error-prone.&lt;/p&gt;
&lt;h2 id=&#34;view-and-compress-context&#34;&gt;View and Compress Context
&lt;/h2&gt;&lt;p&gt;After a long conversation, context grows. When it gets too long, the model may slow down or start ignoring earlier details.&lt;/p&gt;
&lt;p&gt;Use:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;/context
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;If context is long, compress history:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;/compact
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;If the result is still poor, consider clearing the current context:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;/clear
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;After clearing, Claude Code can still understand part of the project through files, &lt;code&gt;CLAUDE.md&lt;/code&gt;, and the current directory, but it will not keep the full conversation history.&lt;/p&gt;
&lt;p&gt;A practical habit: start a new chat after a task is done, write project rules into &lt;code&gt;CLAUDE.md&lt;/code&gt;, and do not let temporary discussion grow forever in one chat.&lt;/p&gt;
&lt;h2 id=&#34;skills-turn-repeated-work-into-instructions&#34;&gt;Skills: Turn Repeated Work Into Instructions
&lt;/h2&gt;&lt;p&gt;Skills are reusable task instructions for Claude Code. They are not one-off prompts, but packaged workflows.&lt;/p&gt;
&lt;p&gt;For example, if you often generate weekly reports, create a weekly-report Skill that defines:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Required input.&lt;/li&gt;
&lt;li&gt;Output format.&lt;/li&gt;
&lt;li&gt;Tone and structure.&lt;/li&gt;
&lt;li&gt;What must be preserved.&lt;/li&gt;
&lt;li&gt;What must not be invented.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Skills usually contain &lt;code&gt;name&lt;/code&gt;, &lt;code&gt;description&lt;/code&gt;, and detailed instructions. Once installed in the global Skills directory, Claude Code can recognize and load them for related tasks.&lt;/p&gt;
&lt;p&gt;Good Skill candidates include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Weekly reports.&lt;/li&gt;
&lt;li&gt;Code review templates.&lt;/li&gt;
&lt;li&gt;Document cleanup.&lt;/li&gt;
&lt;li&gt;Image batch processing.&lt;/li&gt;
&lt;li&gt;Fixed-format articles.&lt;/li&gt;
&lt;li&gt;Project initialization flows.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you repeatedly copy the same prompt, consider turning it into a Skill.&lt;/p&gt;
&lt;h2 id=&#34;agents-delegate-subtasks-to-independent-helpers&#34;&gt;Agents: Delegate Subtasks to Independent Helpers
&lt;/h2&gt;&lt;p&gt;Agents are different from Skills.&lt;/p&gt;
&lt;p&gt;A Skill is more like an instruction manual. An Agent is more like an independent helper that can work outside the main conversation and return results.&lt;/p&gt;
&lt;p&gt;The value of Agents is context isolation. For code inspection, you can create a read-only Agent that only reads the project and outputs a report, without modifying files. This avoids polluting the main conversation and lowers risk.&lt;/p&gt;
&lt;p&gt;When creating an Agent, consider:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Project-level or user-level Agent.&lt;/li&gt;
&lt;li&gt;Whether Claude Code should generate the config.&lt;/li&gt;
&lt;li&gt;Which tools are allowed.&lt;/li&gt;
&lt;li&gt;Which model to use.&lt;/li&gt;
&lt;li&gt;Whether memory should be saved.&lt;/li&gt;
&lt;li&gt;Whether the Agent prompt is clear enough.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For code-audit Agents, give read-only permissions first. Let it output a report, then decide in the main conversation whether to change code.&lt;/p&gt;
&lt;h2 id=&#34;plugins-package-skills-agents-mcp-and-hooks&#34;&gt;Plugins: Package Skills, Agents, MCP, and Hooks
&lt;/h2&gt;&lt;p&gt;Plugins are more complete capability packages. They may include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Skills&lt;/li&gt;
&lt;li&gt;Agents&lt;/li&gt;
&lt;li&gt;MCP&lt;/li&gt;
&lt;li&gt;Hooks&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Compared with installing one Skill, a plugin is better for a full capability set. For example, a frontend design plugin may package visual rules, layout habits, component preferences, and related Agents together.&lt;/p&gt;
&lt;p&gt;When installing a plugin, you may choose:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Install to the user directory, effective for all projects.&lt;/li&gt;
&lt;li&gt;Install to the project directory, shareable with the project.&lt;/li&gt;
&lt;li&gt;Install to a local project directory, effective only on your computer.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Use the user directory for personal common capabilities, the project directory for team conventions, and local project install for temporary testing.&lt;/p&gt;
&lt;h2 id=&#34;plugins-can-improve-specific-tasks&#34;&gt;Plugins Can Improve Specific Tasks
&lt;/h2&gt;&lt;p&gt;For frontend page generation, plugins can be more stable than raw prompts.&lt;/p&gt;
&lt;p&gt;For example, for &amp;ldquo;make a photographer portfolio website,&amp;rdquo; a plain prompt may generate an acceptable page. If you explicitly use a frontend design plugin, the structure, visual hierarchy, spacing, colors, and overall finish are often better.&lt;/p&gt;
&lt;p&gt;This does not mean plugins replace human taste. A better workflow is to let the plugin generate a stronger first draft, then refine details manually.&lt;/p&gt;
&lt;h2 id=&#34;a-more-stable-claude-code-workflow&#34;&gt;A More Stable Claude Code Workflow
&lt;/h2&gt;&lt;p&gt;Putting these tips together gives a steadier workflow:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Start &lt;code&gt;claude&lt;/code&gt; inside the project directory.&lt;/li&gt;
&lt;li&gt;Discuss requirements in plan mode first.&lt;/li&gt;
&lt;li&gt;Confirm tech stack and acceptance criteria before approving the plan.&lt;/li&gt;
&lt;li&gt;Keep manual approval for high-risk actions.&lt;/li&gt;
&lt;li&gt;Use terminal mode for local preview and tests.&lt;/li&gt;
&lt;li&gt;Rewind early when the result goes off track.&lt;/li&gt;
&lt;li&gt;Write project rules into &lt;code&gt;CLAUDE.md&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Check and compress context during long chats.&lt;/li&gt;
&lt;li&gt;Turn repeated workflows into Skills.&lt;/li&gt;
&lt;li&gt;Delegate inspection, research, and analysis to read-only Agents.&lt;/li&gt;
&lt;li&gt;Use plugins for domain-specific tasks.&lt;/li&gt;
&lt;li&gt;Always keep Git checkpoints for important projects.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This is much more stable than simply sending one requirement and waiting for generation.&lt;/p&gt;
&lt;h2 id=&#34;summary&#34;&gt;Summary
&lt;/h2&gt;&lt;p&gt;Claude Code efficiency does not come only from model capability. It also comes from workflow control.&lt;/p&gt;
&lt;p&gt;Plan mode sets direction, permission approval controls risk, rewind reduces rework, &lt;code&gt;CLAUDE.md&lt;/code&gt; stores project rules, &lt;code&gt;/context&lt;/code&gt;, &lt;code&gt;/compact&lt;/code&gt;, and &lt;code&gt;/clear&lt;/code&gt; manage context, Skills reuse fixed workflows, Agents isolate complex subtasks, and plugins package complete capabilities.&lt;/p&gt;
&lt;p&gt;The best way to use Claude Code is to let it move tasks forward inside clear boundaries, not to hand the entire project to it at once.&lt;/p&gt;
</description>
        </item>
        <item>
        <title>opencode, Claude Code, and Codex: What&#39;s the Difference? A Guide to Open Source AI Coding Tools</title>
        <link>https://knightli.com/en/2026/05/08/opencode-open-source-ai-coding-agent/</link>
        <pubDate>Fri, 08 May 2026 08:33:37 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/05/08/opencode-open-source-ai-coding-agent/</guid>
        <description>&lt;p&gt;&lt;code&gt;opencode&lt;/code&gt; is an open source AI Coding Agent from anomalyco. Its positioning is straightforward: give developers a programmable, extensible coding assistant in the terminal that can connect to multiple model providers.&lt;/p&gt;
&lt;p&gt;If you compare it with &lt;code&gt;Claude Code&lt;/code&gt; and &lt;code&gt;Codex&lt;/code&gt;, all three solve the same broad problem: bringing AI into real codebases so it can understand context, edit files, run commands, and execute tests. But their product directions are different.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;opencode&lt;/code&gt; emphasizes open source, multi-model support, and a terminal TUI. &lt;code&gt;Claude Code&lt;/code&gt; emphasizes Anthropic&amp;rsquo;s model ecosystem and local engineering collaboration. &lt;code&gt;Codex&lt;/code&gt; is OpenAI&amp;rsquo;s AI coding agent, available through the terminal, IDEs, the Codex app, and cloud tasks.&lt;/p&gt;
&lt;h2 id=&#34;who-opencode-is-for&#34;&gt;Who opencode Is For
&lt;/h2&gt;&lt;p&gt;opencode is a better fit for these kinds of developers:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;People who want to complete code changes, project analysis, and engineering tasks in the terminal.&lt;/li&gt;
&lt;li&gt;People who do not want their AI Coding Agent tied to a single model provider.&lt;/li&gt;
&lt;li&gt;People who prefer open source tools and want to audit, extend, or build on top of them.&lt;/li&gt;
&lt;li&gt;People already comfortable with Neovim, TUIs, and command-line workflows.&lt;/li&gt;
&lt;li&gt;People who want to eventually drive the same coding agent remotely through a desktop app, mobile app, or other clients.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Its point is not to create another chat window, but to put AI coding capability inside the terminal and project directories developers already use.&lt;/p&gt;
&lt;h2 id=&#34;installation&#34;&gt;Installation
&lt;/h2&gt;&lt;p&gt;The official README provides several installation methods.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt; 1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 5
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 6
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 7
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 8
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 9
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;10
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;11
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;12
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;13
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;14
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;15
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;16
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;17
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;18
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;19
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;20
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;21
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# Direct install&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;curl -fsSL https://opencode.ai/install &lt;span class=&#34;p&#34;&gt;|&lt;/span&gt; bash
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# npm&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;npm i -g opencode-ai@latest
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# Windows&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;scoop install opencode
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;choco install opencode
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# macOS and Linux&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;brew install anomalyco/tap/opencode
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;brew install opencode
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# Arch Linux&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;sudo pacman -S opencode
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;paru -S opencode-bin
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# Other methods&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;mise use -g opencode
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;nix run nixpkgs#opencode
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;The official README also recommends removing versions older than 0.1.x before installing to avoid problems caused by older remnants.&lt;/p&gt;
&lt;p&gt;The installation script chooses the installation directory by priority:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;code&gt;$OPENCODE_INSTALL_DIR&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;$XDG_BIN_DIR&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;$HOME/bin&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;$HOME/.opencode/bin&lt;/code&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;If you need to specify a path, use:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nv&#34;&gt;OPENCODE_INSTALL_DIR&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;/usr/local/bin curl -fsSL https://opencode.ai/install &lt;span class=&#34;p&#34;&gt;|&lt;/span&gt; bash
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nv&#34;&gt;XDG_BIN_DIR&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;nv&#34;&gt;$HOME&lt;/span&gt;/.local/bin curl -fsSL https://opencode.ai/install &lt;span class=&#34;p&#34;&gt;|&lt;/span&gt; bash
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h2 id=&#34;the-desktop-app-is-still-beta&#34;&gt;The Desktop App Is Still Beta
&lt;/h2&gt;&lt;p&gt;In addition to the command-line tool, opencode also provides a desktop app, currently marked as Beta. It can be downloaded from GitHub Releases or &lt;code&gt;opencode.ai/download&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The desktop app covers these platforms:&lt;/p&gt;
&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;Platform&lt;/th&gt;
          &lt;th&gt;File&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;macOS Apple Silicon&lt;/td&gt;
          &lt;td&gt;&lt;code&gt;opencode-desktop-mac-arm64.dmg&lt;/code&gt;&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;macOS Intel&lt;/td&gt;
          &lt;td&gt;&lt;code&gt;opencode-desktop-mac-x64.dmg&lt;/code&gt;&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Windows&lt;/td&gt;
          &lt;td&gt;&lt;code&gt;opencode-desktop-windows-x64.exe&lt;/code&gt;&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Linux&lt;/td&gt;
          &lt;td&gt;&lt;code&gt;.deb&lt;/code&gt;, &lt;code&gt;.rpm&lt;/code&gt;, or &lt;code&gt;.AppImage&lt;/code&gt;&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;macOS and Windows users can also install the desktop app through package managers.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;5
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;6
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# macOS&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;brew install --cask opencode-desktop
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# Windows&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;scoop bucket add extras
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;scoop install extras/opencode-desktop
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h2 id=&#34;two-built-in-agent-modes&#34;&gt;Two Built-In Agent Modes
&lt;/h2&gt;&lt;p&gt;opencode includes two built-in Agents, switchable with the &lt;code&gt;Tab&lt;/code&gt; key.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;build&lt;/code&gt; is the default mode. It has full development permissions and is suitable for editing code directly, running commands, and moving engineering tasks forward.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;plan&lt;/code&gt; is read-only mode. It is better for analyzing unfamiliar codebases, understanding project structure, and planning changes. It denies file edits by default and asks before running bash commands.&lt;/p&gt;
&lt;p&gt;opencode also includes a &lt;code&gt;general&lt;/code&gt; subagent for complex searches and multi-step tasks. Users can invoke it by typing &lt;code&gt;@general&lt;/code&gt; in a message.&lt;/p&gt;
&lt;p&gt;This design is practical: use &lt;code&gt;plan&lt;/code&gt; to understand the project before acting, then switch to &lt;code&gt;build&lt;/code&gt; when code needs to change. For large repositories, separating read and write permissions helps reduce mistakes.&lt;/p&gt;
&lt;h2 id=&#34;what-is-codex&#34;&gt;What Is Codex?
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;Codex&lt;/code&gt; is OpenAI&amp;rsquo;s AI coding agent for helping developers write code, review code, fix bugs, and ship engineering tasks.&lt;/p&gt;
&lt;p&gt;Unlike a simple code completion tool, Codex is closer to an Agent that can operate on a codebase. It can pair with you in local tools, and it can also take delegated tasks in the cloud. OpenAI&amp;rsquo;s official materials describe Codex as available through multiple surfaces, including CLI, IDEs, the Codex app, and ChatGPT/Codex cloud workflows.&lt;/p&gt;
&lt;p&gt;For developers, Codex has several important traits:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;It can read codebases, edit files, run commands, and execute tests.&lt;/li&gt;
&lt;li&gt;It supports multiple interfaces, including terminal, IDE, app, and cloud.&lt;/li&gt;
&lt;li&gt;It fits bug fixing, feature work, refactoring, migrations, code review, and test generation.&lt;/li&gt;
&lt;li&gt;It is more closely tied to OpenAI accounts, models, and the Codex product ecosystem.&lt;/li&gt;
&lt;li&gt;Cloud tasks are useful for running multiple well-scoped engineering tasks in parallel.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If opencode is more like an open terminal agent framework, Codex is more like a full AI coding workbench from OpenAI: local pairing, cloud delegation, and longer engineering workflows for teams.&lt;/p&gt;
&lt;h2 id=&#34;core-differences&#34;&gt;Core Differences
&lt;/h2&gt;&lt;p&gt;opencode, Claude Code, and Codex are all AI coding tools, but the choice becomes clearer if you look at these dimensions.&lt;/p&gt;
&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;Tool&lt;/th&gt;
          &lt;th&gt;Core Positioning&lt;/th&gt;
          &lt;th&gt;Main Advantages&lt;/th&gt;
          &lt;th&gt;Best Fit&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;opencode&lt;/code&gt;&lt;/td&gt;
          &lt;td&gt;Open source AI Coding Agent&lt;/td&gt;
          &lt;td&gt;Open source, multi-model, TUI, client/server architecture&lt;/td&gt;
          &lt;td&gt;Developers who want an open toolchain, replaceable models, and a terminal-first workflow&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;Claude Code&lt;/code&gt;&lt;/td&gt;
          &lt;td&gt;Anthropic&amp;rsquo;s command-line coding tool&lt;/td&gt;
          &lt;td&gt;Claude model experience, code understanding, long context, engineering task collaboration&lt;/td&gt;
          &lt;td&gt;Developers already using the Claude/Anthropic ecosystem who want to work on local code tasks&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;Codex&lt;/code&gt;&lt;/td&gt;
          &lt;td&gt;OpenAI&amp;rsquo;s AI coding agent&lt;/td&gt;
          &lt;td&gt;CLI, IDE, Codex app, cloud tasks, multi-Agent workflows&lt;/td&gt;
          &lt;td&gt;Teams already using ChatGPT/OpenAI who want both local pairing and cloud delegation&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;In short, opencode is about openness and replaceability, Claude Code is about the Claude ecosystem and local engineering agents, and Codex is about the OpenAI ecosystem and multi-surface collaboration.&lt;/p&gt;
&lt;h2 id=&#34;how-it-differs-from-claude-code&#34;&gt;How It Differs From Claude Code
&lt;/h2&gt;&lt;p&gt;opencode&amp;rsquo;s official FAQ directly compares it with Claude Code. The two are similar in capability, but the main differences are these.&lt;/p&gt;
&lt;p&gt;First, opencode is a 100% open source project, hosted on GitHub and released under the MIT license.&lt;/p&gt;
&lt;p&gt;Second, opencode is not tied to a single model provider. It recommends models provided through OpenCode Zen, but it can also work with Claude, OpenAI, Google, or local models. For developers, this means that when model cost, capability, or availability changes, you are not locked into one platform.&lt;/p&gt;
&lt;p&gt;Third, opencode includes optional LSP support. For code completion, navigation, diagnostics, and project understanding, LSP is a very important foundation.&lt;/p&gt;
&lt;p&gt;Fourth, opencode emphasizes TUI. It is built by Neovim users and the creators of terminal.shop, so the product focus is clearly on the terminal experience.&lt;/p&gt;
&lt;p&gt;Fifth, opencode uses a client/server architecture. That means opencode can run on your computer while being controlled in the future by a TUI, desktop app, mobile app, or other clients. The TUI is only one possible frontend.&lt;/p&gt;
&lt;h2 id=&#34;when-to-choose-opencode-claude-code-or-codex&#34;&gt;When to Choose opencode, Claude Code, or Codex
&lt;/h2&gt;&lt;p&gt;If you already use Claude Code or Codex, opencode does not have to replace them immediately. A better way to think about it is that opencode provides an open, model-replaceable, terminal-first option.&lt;/p&gt;
&lt;p&gt;Consider opencode first when:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;You want your AI coding tool to be as open source as possible.&lt;/li&gt;
&lt;li&gt;You do not want your workflow tied to one model provider.&lt;/li&gt;
&lt;li&gt;You want to test Claude, OpenAI, Google, or local models with the same tool.&lt;/li&gt;
&lt;li&gt;You like TUI workflows and do not want a desktop or web app to interrupt your main workflow.&lt;/li&gt;
&lt;li&gt;You care about the remote-control potential of a client/server architecture.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Consider Claude Code first when:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;You mainly use Claude models.&lt;/li&gt;
&lt;li&gt;You care about long context, code understanding, and complex engineering task collaboration.&lt;/li&gt;
&lt;li&gt;You want to keep moving edits, tests, and refactors forward in a local repository.&lt;/li&gt;
&lt;li&gt;You trust Anthropic&amp;rsquo;s default Claude Code product experience.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Consider Codex first when:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;You already use ChatGPT or the OpenAI account ecosystem.&lt;/li&gt;
&lt;li&gt;You want one coding agent across terminal, IDE, desktop app, and cloud tasks.&lt;/li&gt;
&lt;li&gt;You want to delegate well-scoped bug fixes, feature work, migrations, or test generation to the cloud in parallel.&lt;/li&gt;
&lt;li&gt;You need code review, background tasks, team collaboration, and multi-Agent workflows.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you care more about an official end-to-end experience, default model configuration, enterprise management, and ready-made integrations, Claude Code or Codex may be easier. If you care more about control, openness, and being provider-agnostic, opencode is worth watching.&lt;/p&gt;
&lt;h2 id=&#34;things-to-note&#34;&gt;Things to Note
&lt;/h2&gt;&lt;p&gt;opencode, Claude Code, and Codex are all moving quickly. GitHub releases, installation commands, desktop app file names, model availability, and plan access can all change. Before installing or choosing a tool, check the official README, documentation, and release pages.&lt;/p&gt;
&lt;p&gt;Also, opencode&amp;rsquo;s desktop app is still marked as Beta, so it should not be treated as the default stable production tool. For everyday engineering tasks, the terminal version is still the main entry point.&lt;/p&gt;
&lt;p&gt;From a tooling trend perspective, opencode represents the open-toolchain direction for AI Coding Agents: replaceable models, replaceable clients, and an open core agent capability. Codex and Claude Code are closer to model companies turning coding agents into complete product surfaces. For developers, both directions will likely coexist for a long time.&lt;/p&gt;
&lt;h2 id=&#34;references&#34;&gt;References
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;opencode GitHub: &lt;a class=&#34;link&#34; href=&#34;https://github.com/anomalyco/opencode&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://github.com/anomalyco/opencode&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;opencode official site: &lt;a class=&#34;link&#34; href=&#34;https://opencode.ai&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://opencode.ai&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;opencode docs: &lt;a class=&#34;link&#34; href=&#34;https://opencode.ai/docs&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://opencode.ai/docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;opencode Releases: &lt;a class=&#34;link&#34; href=&#34;https://github.com/anomalyco/opencode/releases&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://github.com/anomalyco/opencode/releases&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;OpenAI Codex: &lt;a class=&#34;link&#34; href=&#34;https://openai.com/codex/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://openai.com/codex/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Using Codex with your ChatGPT plan: &lt;a class=&#34;link&#34; href=&#34;https://help.openai.com/en/articles/11369540-codex-in-chatgpt&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://help.openai.com/en/articles/11369540-codex-in-chatgpt&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;OpenAI Codex CLI Getting Started: &lt;a class=&#34;link&#34; href=&#34;https://help.openai.com/en/articles/11096431-openai-codex-ci-getting-started&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://help.openai.com/en/articles/11096431-openai-codex-ci-getting-started&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
        </item>
        <item>
        <title>Warp Open Source: From Terminal to Agentic Development Environment</title>
        <link>https://knightli.com/en/2026/05/07/warpdotdev-warp-open-source-agentic-terminal/</link>
        <pubDate>Thu, 07 May 2026 20:15:08 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/05/07/warpdotdev-warp-open-source-agentic-terminal/</guid>
        <description>&lt;p&gt;&lt;code&gt;warpdotdev/warp&lt;/code&gt; is the open-source client repository for Warp. Warp now describes itself as an &amp;ldquo;agentic development environment, born out of the terminal&amp;rdquo;: it starts from the terminal, but brings AI coding agents, codebase indexing, task management, and development workflows into one environment.&lt;/p&gt;
&lt;p&gt;This is not an ordinary open-source terminal emulator repository. It is closer to an answer to a larger question: as agents such as Claude Code, Codex, and Gemini CLI become common, should the terminal itself become a development environment for scheduling, observing, and managing agents?&lt;/p&gt;
&lt;p&gt;Warp&amp;rsquo;s answer is yes.&lt;/p&gt;
&lt;h2 id=&#34;current-state-of-the-repository&#34;&gt;Current State of the Repository
&lt;/h2&gt;&lt;p&gt;As of May 7, 2026, &lt;code&gt;warpdotdev/warp&lt;/code&gt; is a public repository. GitHub shows roughly 56k stars and 4.1k forks. The README says the Warp client code is now open source and welcomes community contributions.&lt;/p&gt;
&lt;p&gt;The main language is Rust. GitHub&amp;rsquo;s language breakdown shows Rust at over 98%, which matches Warp&amp;rsquo;s positioning: it is not a web wrapper, but a cross-platform native development tool.&lt;/p&gt;
&lt;p&gt;Several README details matter:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Warp is an agentic development environment, born out of the terminal.&lt;/li&gt;
&lt;li&gt;It can use its built-in coding agent and can also connect to external CLI agents such as Claude Code, Codex, and Gemini CLI.&lt;/li&gt;
&lt;li&gt;OpenAI is the founding sponsor of the newly open-sourced Warp repository.&lt;/li&gt;
&lt;li&gt;The agentic management workflows in the repository are powered by GPT models.&lt;/li&gt;
&lt;li&gt;Warp UI framework crates use the MIT license, while the rest of the code uses AGPL v3.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This shows that Warp&amp;rsquo;s open source move is not merely publishing a terminal. It is operating the project as an experiment ground for agent workflows.&lt;/p&gt;
&lt;h2 id=&#34;warp-is-more-than-a-terminal&#34;&gt;Warp Is More Than a Terminal
&lt;/h2&gt;&lt;p&gt;Traditional terminals mainly do three things:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;start a shell;&lt;/li&gt;
&lt;li&gt;run commands;&lt;/li&gt;
&lt;li&gt;display output.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Warp&amp;rsquo;s earlier differentiation was making the terminal feel more modern: command blocks, completion, history, collaboration, UI-style interactions, and cross-platform polish. Now the focus has moved further toward organizing development around AI agents.&lt;/p&gt;
&lt;p&gt;From the README, Warp no longer only emphasizes &amp;ldquo;a better terminal.&amp;rdquo; It emphasizes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;built-in coding agents;&lt;/li&gt;
&lt;li&gt;external CLI agent support;&lt;/li&gt;
&lt;li&gt;issue triage;&lt;/li&gt;
&lt;li&gt;spec writing;&lt;/li&gt;
&lt;li&gt;PR review;&lt;/li&gt;
&lt;li&gt;contributor coordination;&lt;/li&gt;
&lt;li&gt;observable agent sessions.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In other words, Warp wants to turn the terminal from &amp;ldquo;where you type commands&amp;rdquo; into &amp;ldquo;where you work with multiple agents.&amp;rdquo;&lt;/p&gt;
&lt;h2 id=&#34;oz-and-open-source-project-management&#34;&gt;Oz and Open-Source Project Management
&lt;/h2&gt;&lt;p&gt;The README mentions &lt;code&gt;Oz&lt;/code&gt; several times.&lt;/p&gt;
&lt;p&gt;Warp&amp;rsquo;s contribution overview shows thousands of Oz agents working on issue triage, specs, implementation, and PR review. This is interesting because it extends AI agents from &amp;ldquo;helping one person write code&amp;rdquo; to &amp;ldquo;helping manage open-source collaboration.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;The hardest part of many open-source projects is not writing code, but maintenance:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;too many issues, not enough classification;&lt;/li&gt;
&lt;li&gt;bugs and feature requests mixed together;&lt;/li&gt;
&lt;li&gt;new contributors unsure which tasks are approachable;&lt;/li&gt;
&lt;li&gt;PR review pressure;&lt;/li&gt;
&lt;li&gt;maintainers struggling to follow every community thread.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Warp&amp;rsquo;s idea is to let agents take on part of the project management and collaboration work first. The README also mentions &lt;code&gt;Oz for OSS&lt;/code&gt;, a maintainer-facing program for bringing similar agentic open-source management workflows to other repositories.&lt;/p&gt;
&lt;p&gt;This suggests that Warp&amp;rsquo;s ambition is not only the terminal product itself, but also a new model of open-source maintenance in the AI era.&lt;/p&gt;
&lt;h2 id=&#34;repository-structure-and-tech-stack&#34;&gt;Repository Structure and Tech Stack
&lt;/h2&gt;&lt;p&gt;From the repository structure, Warp is a large Rust project.&lt;/p&gt;
&lt;p&gt;The root contains:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;app/&lt;/code&gt;: main application code.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;crates/&lt;/code&gt;: core Rust crates.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;assets/&lt;/code&gt;: resource files.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;command-signatures-v2/&lt;/code&gt;: command signature related content.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;docker/&lt;/code&gt;, &lt;code&gt;script/&lt;/code&gt;, &lt;code&gt;resources/&lt;/code&gt;, &lt;code&gt;specs/&lt;/code&gt;, and other engineering directories.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;.claude/&lt;/code&gt;, &lt;code&gt;.warp/&lt;/code&gt;, &lt;code&gt;.agents/skills&lt;/code&gt;, and other agent-related configuration.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;code&gt;WARP.md&lt;/code&gt; gives more engineering detail. It describes Warp as a Rust-based terminal emulator using an in-house UI framework called &lt;code&gt;WarpUI&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The major modules can be roughly understood as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;app/&lt;/code&gt;: terminal emulation, shell management, AI integration, Drive, authentication, settings, workspace, and sessions.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;crates/warp_core/&lt;/code&gt;: core utilities and platform abstraction.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;crates/editor/&lt;/code&gt;: text editing functionality.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;crates/warpui/&lt;/code&gt; and &lt;code&gt;crates/warpui_core/&lt;/code&gt;: the in-house UI framework.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;crates/ipc/&lt;/code&gt;: inter-process communication.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;crates/graphql/&lt;/code&gt;: GraphQL client and schema.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;code&gt;WARP.md&lt;/code&gt; also mentions architectural features such as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;an Entity-Handle system;&lt;/li&gt;
&lt;li&gt;a modular workspace structure;&lt;/li&gt;
&lt;li&gt;macOS, Windows, Linux, and WASM targets;&lt;/li&gt;
&lt;li&gt;AI integration, including Agent Mode, context awareness, and codebase indexing;&lt;/li&gt;
&lt;li&gt;Warp Drive cloud sync.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This complexity is closer to a full IDE than a lightweight traditional terminal.&lt;/p&gt;
&lt;h2 id=&#34;local-build-commands&#34;&gt;Local Build Commands
&lt;/h2&gt;&lt;p&gt;The README gives a concise local build flow:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;./script/bootstrap
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;./script/run
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;./script/presubmit
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Where:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;./script/bootstrap&lt;/code&gt; performs platform-specific initialization.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;./script/run&lt;/code&gt; builds and runs Warp.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;./script/presubmit&lt;/code&gt; runs formatting, clippy, tests, and other pre-submit checks.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;code&gt;WARP.md&lt;/code&gt; also lists more detailed commands:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;5
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;cargo run
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;cargo bundle --bin warp
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;cargo nextest run --no-fail-fast --workspace --exclude command-signatures-v2
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;cargo fmt
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;cargo clippy --workspace --all-targets --all-features --tests -- -D warnings
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;If you want to contribute to Warp, &lt;code&gt;./script/presubmit&lt;/code&gt; is effectively required.&lt;/p&gt;
&lt;h2 id=&#34;contribution-flow&#34;&gt;Contribution Flow
&lt;/h2&gt;&lt;p&gt;Warp&amp;rsquo;s contribution flow is not simply &amp;ldquo;open a PR.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;The README describes a lightweight process from issue to PR:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Search existing issues first.&lt;/li&gt;
&lt;li&gt;If there is no duplicate, file a bug or feature request.&lt;/li&gt;
&lt;li&gt;Maintainers review the issue and may add readiness labels.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ready-to-spec&lt;/code&gt; means the design can be expanded into a spec.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ready-to-implement&lt;/code&gt; means the design is clear enough to start an implementation PR.&lt;/li&gt;
&lt;li&gt;Contributors can pick up labeled issues.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This process fits a large open-source project. It separates ideas, design, and implementation, reducing the risk that contributors spend time building in the wrong direction.&lt;/p&gt;
&lt;p&gt;It also fits AI agents well. An agent can organize issues, draft specs, add tests, and then move into implementation. Warp itself uses this pattern to demonstrate agentic project management.&lt;/p&gt;
&lt;h2 id=&#34;license-mit--agpl-v3&#34;&gt;License: MIT + AGPL v3
&lt;/h2&gt;&lt;p&gt;Warp uses a dual license structure.&lt;/p&gt;
&lt;p&gt;The README says:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;the Warp UI framework, namely the &lt;code&gt;warpui_core&lt;/code&gt; and &lt;code&gt;warpui&lt;/code&gt; crates, uses the MIT license;&lt;/li&gt;
&lt;li&gt;the rest of the repository uses AGPL v3.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This matters. AGPL v3 has stronger open-source requirements for network services and distribution. If you are learning, researching, or contributing, it is usually straightforward. But if you want to use Warp code in a commercial product or closed-source derivative, you need to read the license carefully and consult legal advice if necessary.&lt;/p&gt;
&lt;p&gt;In short, Warp is open source, but not &amp;ldquo;take it and close-source it freely&amp;rdquo; open source.&lt;/p&gt;
&lt;h2 id=&#34;why-it-is-worth-watching&#34;&gt;Why It Is Worth Watching
&lt;/h2&gt;&lt;p&gt;First, Warp brings the terminal, agents, and project management together.&lt;/p&gt;
&lt;p&gt;Many AI coding tools are still CLI tools or editor plugins. Warp starts from the terminal entry point and tries to unify agent tasks, code execution, command output, PR workflows, and team collaboration.&lt;/p&gt;
&lt;p&gt;Second, Warp&amp;rsquo;s open-source approach is a good place to observe agent workflows.&lt;/p&gt;
&lt;p&gt;It does not only publish code. It also exposes contribution overviews, agent sessions, issue triage, and spec workflows. For anyone studying how AI can participate in open-source collaboration, the repository itself is a sample.&lt;/p&gt;
&lt;p&gt;Third, Warp is a complex Rust desktop application.&lt;/p&gt;
&lt;p&gt;If you want to study Rust GUI, terminal emulation, cross-platform apps, GraphQL clients, cloud sync, and AI integration, the repository has a lot to read. But it is not a small project, so new contributors should read the docs and issue process first.&lt;/p&gt;
&lt;p&gt;Fourth, Warp supports both a built-in agent and a &amp;ldquo;bring your own CLI agent&amp;rdquo; approach.&lt;/p&gt;
&lt;p&gt;This is realistic. Developers will not use only one agent. Claude Code, Codex, Gemini CLI, OpenCode, OpenClaw, and similar tools are likely to coexist. If Warp can become a workbench for them, it becomes more valuable than a single-purpose terminal.&lt;/p&gt;
&lt;h2 id=&#34;who-should-care&#34;&gt;Who Should Care
&lt;/h2&gt;&lt;p&gt;If you are a normal terminal user, Warp matters because the terminal may be changing from a command-line tool into an AI workbench.&lt;/p&gt;
&lt;p&gt;If you are a heavy AI coding agent user, Warp is worth watching because it tries to manage multiple agents rather than act as another chat entry point.&lt;/p&gt;
&lt;p&gt;If you maintain open-source projects, the Oz for OSS direction is worth attention. It explores agent-based issue triage, PR review, community collaboration, and contributor onboarding.&lt;/p&gt;
&lt;p&gt;If you are a Rust developer, Warp is a real large-scale desktop application worth studying for UI organization, terminal internals, cloud sync, AI integration, and cross-platform code.&lt;/p&gt;
&lt;p&gt;If you only want a terminal that can replace your current one immediately, it is better to download the stable release first, then decide whether to study the source. Building from source is more suitable for contributors and deep users.&lt;/p&gt;
&lt;h2 id=&#34;short-take&#34;&gt;Short Take
&lt;/h2&gt;&lt;p&gt;The point of Warp going open source is not merely &amp;ldquo;a modern terminal became open source.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;More precisely, Warp is trying to upgrade the terminal into an agentic development environment: the terminal connects the shell, codebase, command execution, agents, issues, PRs, and collaboration flow.&lt;/p&gt;
&lt;p&gt;As AI coding agents keep growing, the entry point of the development environment may change. In the past, the IDE dominated the developer experience while the terminal ran commands. Now the terminal may become the center of agent collaboration. The Warp repository is exploring that possibility.&lt;/p&gt;
&lt;h2 id=&#34;related-links&#34;&gt;Related Links
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;GitHub repository: &lt;a class=&#34;link&#34; href=&#34;https://github.com/warpdotdev/warp&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://github.com/warpdotdev/warp&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Warp website: &lt;a class=&#34;link&#34; href=&#34;https://www.warp.dev&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://www.warp.dev&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Warp documentation: &lt;a class=&#34;link&#34; href=&#34;https://docs.warp.dev&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://docs.warp.dev&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Warp build overview: &lt;a class=&#34;link&#34; href=&#34;https://build.warp.dev&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://build.warp.dev&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;WARP.md: &lt;a class=&#34;link&#34; href=&#34;https://github.com/warpdotdev/warp/blob/master/WARP.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://github.com/warpdotdev/warp/blob/master/WARP.md&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;CONTRIBUTING.md: &lt;a class=&#34;link&#34; href=&#34;https://github.com/warpdotdev/warp/blob/master/CONTRIBUTING.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://github.com/warpdotdev/warp/blob/master/CONTRIBUTING.md&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
        </item>
        <item>
        <title>Hermes &#43; Qwen3.6: A Low-Cost Local Agent Deployment</title>
        <link>https://knightli.com/en/2026/05/04/hermes-qwen36-local-agent/</link>
        <pubDate>Mon, 04 May 2026 06:40:30 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/05/04/hermes-qwen36-local-agent/</guid>
        <description>&lt;p&gt;This article documents a local Agent deployment plan: run a Qwen3.6 GGUF model with &lt;code&gt;llama.cpp&lt;/code&gt; inside WSL2, then connect Hermes Agent to the local OpenAI-compatible API. This gives you a long-running local AI assistant on your own computer, without paying by online service Token usage.&lt;/p&gt;
&lt;p&gt;This setup is suitable for users who want to try local AI Agents while keeping data private and controllable over the long term. It can be used for daily Q&amp;amp;A, writing, coding assistance, document organization, and simple automation tasks. The larger the model, the higher the VRAM requirement. The original example uses Qwen3.6-27B, and 24GB VRAM is more stable. If your VRAM is smaller, choose a smaller model or a lower quantization.&lt;/p&gt;
&lt;h2 id=&#34;architecture&#34;&gt;Architecture
&lt;/h2&gt;&lt;p&gt;The overall chain is simple:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Install WSL2 and Ubuntu 24.04 on Windows.&lt;/li&gt;
&lt;li&gt;Install CUDA Toolkit inside WSL2 and compile &lt;code&gt;llama.cpp&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Download the Qwen3.6 GGUF model.&lt;/li&gt;
&lt;li&gt;Start a local model service with &lt;code&gt;llama-server&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Install Hermes Agent and configure it to &lt;code&gt;http://localhost:8080/v1&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Optional: write a startup script so the model service starts automatically when WSL2 opens.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Hermes provides the Agent capability, while Qwen3.6 provides the local LLM capability. Together, they turn the computer into a private local AI assistant.&lt;/p&gt;
&lt;h2 id=&#34;install-wsl2-and-ubuntu&#34;&gt;Install WSL2 and Ubuntu
&lt;/h2&gt;&lt;p&gt;Run in an administrator Windows PowerShell window:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-powershell&#34; data-lang=&#34;powershell&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;wsl&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;-&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;-install&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;wsl&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;-&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;-set-default-version&lt;/span&gt; &lt;span class=&#34;mf&#34;&gt;2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;After rebooting, install Ubuntu 24.04:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-powershell&#34; data-lang=&#34;powershell&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;wsl&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;-&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;-install&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;-d&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Ubuntu&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;-&lt;/span&gt;&lt;span class=&#34;mf&#34;&gt;24.04&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;After installation, Ubuntu prompts you to set a username and password. Once inside Ubuntu, first check whether the NVIDIA GPU is visible in WSL2:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;nvidia-smi
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;If the GPU cannot be detected, update the NVIDIA driver on Windows first. WSL2 inherits the Windows driver, but CUDA Toolkit still needs to be installed separately inside WSL2.&lt;/p&gt;
&lt;h2 id=&#34;install-python-and-basic-tools&#34;&gt;Install Python and Basic Tools
&lt;/h2&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;sudo apt update &lt;span class=&#34;o&#34;&gt;&amp;amp;&amp;amp;&lt;/span&gt; sudo apt install -y python3-pip python3-venv
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;You also need build tools, Git, and CMake:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;sudo apt install -y cmake build-essential git
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h2 id=&#34;compile-llamacpp&#34;&gt;Compile llama.cpp
&lt;/h2&gt;&lt;p&gt;Clone the repository:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;git clone https://github.com/ggerganov/llama.cpp
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nb&#34;&gt;cd&lt;/span&gt; llama.cpp
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;If CUDA is already available in WSL2, compile directly:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;cmake -B build -DGGML_CUDA&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;ON -DCMAKE_CUDA_ARCHITECTURES&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;m&#34;&gt;89&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;cmake --build build -j&lt;span class=&#34;k&#34;&gt;$(&lt;/span&gt;nproc&lt;span class=&#34;k&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;&lt;code&gt;CMAKE_CUDA_ARCHITECTURES=89&lt;/code&gt; is suitable for Ada GPUs, such as RTX 40 series cards. Adjust it according to your actual GPU architecture.&lt;/p&gt;
&lt;p&gt;If compilation reports that CUDA Toolkit is missing, install CUDA Toolkit inside WSL2 first:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;4
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;sudo dpkg -i cuda-keyring_1.1-1_all.deb
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;sudo apt update
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;sudo apt install -y cuda-toolkit-12-8
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Configure environment variables:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;4
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nb&#34;&gt;export&lt;/span&gt; &lt;span class=&#34;nv&#34;&gt;PATH&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;/usr/local/cuda-12.8/bin:&lt;span class=&#34;nv&#34;&gt;$PATH&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nb&#34;&gt;export&lt;/span&gt; &lt;span class=&#34;nv&#34;&gt;LD_LIBRARY_PATH&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;/usr/local/cuda-12.8/lib64:&lt;span class=&#34;nv&#34;&gt;$LD_LIBRARY_PATH&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nb&#34;&gt;echo&lt;/span&gt; &lt;span class=&#34;s1&#34;&gt;&amp;#39;export PATH=/usr/local/cuda-12.8/bin:$PATH&amp;#39;&lt;/span&gt; &amp;gt;&amp;gt; ~/.bashrc
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nb&#34;&gt;echo&lt;/span&gt; &lt;span class=&#34;s1&#34;&gt;&amp;#39;export LD_LIBRARY_PATH=/usr/local/cuda-12.8/lib64:$LD_LIBRARY_PATH&amp;#39;&lt;/span&gt; &amp;gt;&amp;gt; ~/.bashrc
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Then rebuild:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;4
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nb&#34;&gt;cd&lt;/span&gt; ~/llama.cpp
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;rm -rf build
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;cmake -B build -DGGML_CUDA&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;ON -DCMAKE_CUDA_ARCHITECTURES&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;m&#34;&gt;89&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;cmake --build build -j&lt;span class=&#34;k&#34;&gt;$(&lt;/span&gt;nproc&lt;span class=&#34;k&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h2 id=&#34;download-the-qwen36-gguf-model&#34;&gt;Download the Qwen3.6 GGUF Model
&lt;/h2&gt;&lt;p&gt;The example uses &lt;code&gt;Qwen3.6-27B-UD-Q4_K_XL.gguf&lt;/code&gt; from &lt;code&gt;unsloth/Qwen3.6-27B-GGUF&lt;/code&gt;:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;hf download unsloth/Qwen3.6-27B-GGUF &lt;span class=&#34;se&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;Qwen3.6-27B-UD-Q4_K_XL.gguf &lt;span class=&#34;se&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;--local-dir ~/models/
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;The file is about 17GB. If Hugging Face is slow, use a mirror such as ModelScope. Do not force a 27B model if your VRAM is insufficient; use a smaller model or lower quantization.&lt;/p&gt;
&lt;h2 id=&#34;start-the-local-model-service&#34;&gt;Start the Local Model Service
&lt;/h2&gt;&lt;p&gt;Start &lt;code&gt;llama-server&lt;/code&gt; with your own model file name:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt; 1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 5
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 6
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 7
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 8
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 9
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;10
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;~/llama.cpp/build/bin/llama-server &lt;span class=&#34;se&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;--model ~/models/Qwen3.6-27B-UD-Q4_K_XL.gguf &lt;span class=&#34;se&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;--n-gpu-layers &lt;span class=&#34;m&#34;&gt;99&lt;/span&gt; &lt;span class=&#34;se&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;--ctx-size &lt;span class=&#34;m&#34;&gt;32768&lt;/span&gt; &lt;span class=&#34;se&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;--flash-attn on &lt;span class=&#34;se&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;--temp 1.0 &lt;span class=&#34;se&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;--top-p 0.95 &lt;span class=&#34;se&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;--top-k &lt;span class=&#34;m&#34;&gt;20&lt;/span&gt; &lt;span class=&#34;se&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;--presence-penalty 1.5 &lt;span class=&#34;se&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;--port &lt;span class=&#34;m&#34;&gt;8080&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;After startup, open this in a Windows browser:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;http://localhost:8080
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;For Hermes Agent or other OpenAI-compatible clients, the API endpoint is usually:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;http://localhost:8080/v1
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h2 id=&#34;thinking-mode-tradeoff&#34;&gt;Thinking Mode Tradeoff
&lt;/h2&gt;&lt;p&gt;Qwen3.6 may enable Thinking mode by default. It is suitable for complex reasoning, complicated coding problems, and multi-step analysis, but it is slower.&lt;/p&gt;
&lt;p&gt;To disable Thinking mode, stop the service and add &lt;code&gt;--chat-template-kwargs&lt;/code&gt;:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt; 1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 5
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 6
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 7
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 8
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 9
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;10
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;11
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;~/llama.cpp/build/bin/llama-server &lt;span class=&#34;se&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;--model ~/models/Qwen3.6-27B-UD-Q4_K_XL.gguf &lt;span class=&#34;se&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;--n-gpu-layers &lt;span class=&#34;m&#34;&gt;99&lt;/span&gt; &lt;span class=&#34;se&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;--ctx-size &lt;span class=&#34;m&#34;&gt;32768&lt;/span&gt; &lt;span class=&#34;se&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;--flash-attn on &lt;span class=&#34;se&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;--temp 1.0 &lt;span class=&#34;se&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;--top-p 0.95 &lt;span class=&#34;se&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;--top-k &lt;span class=&#34;m&#34;&gt;20&lt;/span&gt; &lt;span class=&#34;se&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;--presence-penalty 1.5 &lt;span class=&#34;se&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;--chat-template-kwargs &lt;span class=&#34;s1&#34;&gt;&amp;#39;{&amp;#34;enable_thinking&amp;#34;:false}&amp;#39;&lt;/span&gt; &lt;span class=&#34;se&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;--port &lt;span class=&#34;m&#34;&gt;8080&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;After disabling Thinking, simple Q&amp;amp;A, writing, code completion, and code explanation become faster. For complex algorithm design, difficult debugging, and architecture analysis, Thinking mode is still recommended.&lt;/p&gt;
&lt;h2 id=&#34;install-hermes-agent&#34;&gt;Install Hermes Agent
&lt;/h2&gt;&lt;p&gt;Keep &lt;code&gt;llama-server&lt;/code&gt; running, then open a new WSL2 terminal and install Hermes Agent:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh &lt;span class=&#34;p&#34;&gt;|&lt;/span&gt; bash
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;The installer handles dependencies such as Python, Node.js, ripgrep, and ffmpeg. When configuring the model endpoint, choose a custom endpoint:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;URL: http://localhost:8080/v1
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;API Key: 12345678
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;Model: auto-detect
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;For a local &lt;code&gt;llama-server&lt;/code&gt;, the API Key can be any placeholder value. After configuration, you can connect Telegram, WeChat, QQ, Discord, and other chat tools, allowing Hermes Agent to call the local model and execute tasks from those entry points.&lt;/p&gt;
&lt;h2 id=&#34;auto-start-the-model-service&#34;&gt;Auto-Start the Model Service
&lt;/h2&gt;&lt;p&gt;You can write a startup script so the model service starts automatically when a WSL2 terminal opens.&lt;/p&gt;
&lt;p&gt;Create the script:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt; 1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 5
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 6
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 7
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 8
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 9
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;10
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;11
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;12
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;13
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;14
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;15
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;16
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;17
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;18
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;19
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;cat &amp;gt; ~/start-llm.sh &lt;span class=&#34;s&#34;&gt;&amp;lt;&amp;lt; &amp;#39;EOF&amp;#39;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;s&#34;&gt;#!/bin/bash
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;s&#34;&gt;echo &amp;#34;Starting Qwen3.6-27B llama-server...&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;s&#34;&gt;~/llama.cpp/build/bin/llama-server \
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;s&#34;&gt;--model ~/models/Qwen3.6-27B-UD-Q4_K_XL.gguf \
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;s&#34;&gt;--n-gpu-layers 99 \
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;s&#34;&gt;--ctx-size 65536 \
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;s&#34;&gt;--flash-attn on \
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;s&#34;&gt;--temp 1.0 \
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;s&#34;&gt;--top-p 0.95 \
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;s&#34;&gt;--top-k 20 \
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;s&#34;&gt;--presence-penalty 1.5 \
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;s&#34;&gt;--port 8080 \
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;s&#34;&gt;--host 0.0.0.0 &amp;amp;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;s&#34;&gt;echo &amp;#34;llama-server started, PID: $!&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;s&#34;&gt;echo &amp;#34;API: http://localhost:8080/v1&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;s&#34;&gt;echo &amp;#34;Chat UI: http://localhost:8080&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;s&#34;&gt;EOF&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;chmod +x ~/start-llm.sh
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Write it into &lt;code&gt;.bashrc&lt;/code&gt;:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;4
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nb&#34;&gt;echo&lt;/span&gt; &lt;span class=&#34;s1&#34;&gt;&amp;#39;# Auto-start llama-server&amp;#39;&lt;/span&gt; &amp;gt;&amp;gt; ~/.bashrc
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nb&#34;&gt;echo&lt;/span&gt; &lt;span class=&#34;s1&#34;&gt;&amp;#39;if ! pgrep -f &amp;#34;llama-server&amp;#34; &amp;gt; /dev/null 2&amp;gt;&amp;amp;1; then&amp;#39;&lt;/span&gt; &amp;gt;&amp;gt; ~/.bashrc
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nb&#34;&gt;echo&lt;/span&gt; &lt;span class=&#34;s1&#34;&gt;&amp;#39;    ~/start-llm.sh&amp;#39;&lt;/span&gt; &amp;gt;&amp;gt; ~/.bashrc
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nb&#34;&gt;echo&lt;/span&gt; &lt;span class=&#34;s1&#34;&gt;&amp;#39;fi&amp;#39;&lt;/span&gt; &amp;gt;&amp;gt; ~/.bashrc
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Each time you open a WSL2 terminal, it will start &lt;code&gt;llama-server&lt;/code&gt; if it is not already running. If it is running, it skips startup and avoids duplicate processes.&lt;/p&gt;
&lt;h2 id=&#34;notes&#34;&gt;Notes
&lt;/h2&gt;&lt;ol&gt;
&lt;li&gt;27B models require substantial VRAM; 24GB VRAM is more stable. Use a smaller model if VRAM is limited.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;--ctx-size 65536&lt;/code&gt; significantly increases VRAM and RAM pressure. If unstable, reduce it to &lt;code&gt;32768&lt;/code&gt; or lower.&lt;/li&gt;
&lt;li&gt;Both CUDA Toolkit in WSL2 and the Windows GPU driver must work properly. Either side can cause CUDA compilation or runtime failures.&lt;/li&gt;
&lt;li&gt;Hermes Agent calls the local service through an OpenAI-compatible API. The key is that &lt;code&gt;http://localhost:8080/v1&lt;/code&gt; responds correctly.&lt;/li&gt;
&lt;li&gt;If accessing from a phone or another device, handle Windows Firewall, LAN addresses, and security isolation. Do not expose the local model service directly to the public internet.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id=&#34;related-links&#34;&gt;Related Links
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;Original article: &lt;a class=&#34;link&#34; href=&#34;https://www.freedidi.com/24036.html&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Hermes + Qwen3.6：本地最强 Agent 组合！零成本、无限 Token，太香了！&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;llama.cpp: &lt;a class=&#34;link&#34; href=&#34;https://github.com/ggerganov/llama.cpp&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;ggerganov/llama.cpp&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Hermes Agent: &lt;a class=&#34;link&#34; href=&#34;https://github.com/NousResearch/hermes-agent&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;NousResearch/hermes-agent&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Qwen3.6 GGUF example: &lt;a class=&#34;link&#34; href=&#34;https://huggingface.co/unsloth/Qwen3.6-27B-GGUF&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;unsloth/Qwen3.6-27B-GGUF&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
        </item>
        <item>
        <title>How to Use DeepSeek V4 Pro in Cline</title>
        <link>https://knightli.com/en/2026/05/01/use-deepseek-v4-pro-in-cline/</link>
        <pubDate>Fri, 01 May 2026 20:59:06 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/05/01/use-deepseek-v4-pro-in-cline/</guid>
        <description>&lt;p&gt;Cline already supports the OpenAI Compatible Provider.
DeepSeek API is also compatible with OpenAI SDK-style calls, so connecting &lt;code&gt;deepseek-v4-pro&lt;/code&gt; to Cline is not complicated: choose OpenAI Compatible, then fill in DeepSeek&amp;rsquo;s Base URL, API Key, and model name.&lt;/p&gt;
&lt;p&gt;The steps below cover both the VS Code extension UI and Cline CLI.&lt;/p&gt;
&lt;h2 id=&#34;prepare-a-deepseek-api-key&#34;&gt;Prepare a DeepSeek API Key
&lt;/h2&gt;&lt;p&gt;First, create an API Key on the DeepSeek platform.&lt;/p&gt;
&lt;p&gt;You need three values:&lt;/p&gt;
&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;Item&lt;/th&gt;
          &lt;th&gt;Value&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;Provider&lt;/td&gt;
          &lt;td&gt;&lt;code&gt;OpenAI Compatible&lt;/code&gt;&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Base URL&lt;/td&gt;
          &lt;td&gt;&lt;code&gt;https://api.deepseek.com&lt;/code&gt;&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Model ID&lt;/td&gt;
          &lt;td&gt;&lt;code&gt;deepseek-v4-pro&lt;/code&gt;&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;DeepSeek&amp;rsquo;s official documentation states that the V4 series uses the existing OpenAI-compatible interface. Keep &lt;code&gt;base_url&lt;/code&gt; as &lt;code&gt;https://api.deepseek.com&lt;/code&gt;, and set &lt;code&gt;model&lt;/code&gt; to &lt;code&gt;deepseek-v4-pro&lt;/code&gt; or &lt;code&gt;deepseek-v4-flash&lt;/code&gt; when calling it.&lt;/p&gt;
&lt;h2 id=&#34;configure-it-in-the-cline-extension&#34;&gt;Configure It in the Cline Extension
&lt;/h2&gt;&lt;p&gt;If you use the Cline extension in VS Code, configure it this way:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Open Cline from the VS Code sidebar.&lt;/li&gt;
&lt;li&gt;Go to Cline settings or model configuration.&lt;/li&gt;
&lt;li&gt;Select &lt;code&gt;OpenAI Compatible&lt;/code&gt; as the provider.&lt;/li&gt;
&lt;li&gt;Enter your DeepSeek API Key.&lt;/li&gt;
&lt;li&gt;Set Base URL to:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;https://api.deepseek.com
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;ol start=&#34;6&#34;&gt;
&lt;li&gt;Set Model ID to:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;deepseek-v4-pro
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;ol start=&#34;7&#34;&gt;
&lt;li&gt;Save the configuration and run a simple test in Cline.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Start with a low-risk read-only task:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;Please read the current project directory structure and summarize what type of project this is. Do not modify any files.
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;If Cline can read and answer normally, the model connection is working.&lt;/p&gt;
&lt;h2 id=&#34;configure-it-in-cline-cli&#34;&gt;Configure It in Cline CLI
&lt;/h2&gt;&lt;p&gt;If you use Cline CLI, run &lt;code&gt;cline provider configure openai-compatible&lt;/code&gt; to enter interactive configuration.&lt;/p&gt;
&lt;p&gt;Example:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;cline provider configure openai-compatible
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Fill in:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;API Key: sk-...
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;Base URL: https://api.deepseek.com
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;Model ID: deepseek-v4-pro
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;After configuration, test it with a read-only task:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;cline &lt;span class=&#34;s2&#34;&gt;&amp;#34;Summarize this repository structure without changing files.&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;If you want to lower cost first, you can temporarily change Model ID to:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;deepseek-v4-flash
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Then switch back to &lt;code&gt;deepseek-v4-pro&lt;/code&gt; for complex planning, fact checking, multi-tool collaboration, or high-risk code changes.&lt;/p&gt;
&lt;h2 id=&#34;recommended-model-split&#34;&gt;Recommended Model Split
&lt;/h2&gt;&lt;p&gt;DeepSeek V4 Pro and Flash are better used with a clear split.&lt;/p&gt;
&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;Model&lt;/th&gt;
          &lt;th&gt;Best for&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;deepseek-v4-flash&lt;/code&gt;&lt;/td&gt;
          &lt;td&gt;Routine code reading, small batch fixes, script generation, context summarization, low-risk frontend changes&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;deepseek-v4-pro&lt;/code&gt;&lt;/td&gt;
          &lt;td&gt;Architecture planning, complex bugs, cross-file refactors, fact checking, multi-tool calls, high-risk changes&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;For Agent tools like Cline, cost mainly comes from long context, repeated file reads, plan generation, and multi-round tool calls.
If the task is light, use Flash for volume; if the task needs stronger judgment, switch to Pro.&lt;/p&gt;
&lt;h2 id=&#34;how-to-set-context-length&#34;&gt;How to Set Context Length
&lt;/h2&gt;&lt;p&gt;DeepSeek V4 Pro and Flash both support long context.
If Cline requires a manual context window value, you can understand it according to the 1M context listed on DeepSeek&amp;rsquo;s official model page.&lt;/p&gt;
&lt;p&gt;In practice, do not put every file into context at the beginning.
Cline reads files according to the task, and a better workflow is usually:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;first ask it to inspect the directory structure;&lt;/li&gt;
&lt;li&gt;then ask it to locate relevant files;&lt;/li&gt;
&lt;li&gt;finally let it modify only the target files.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This saves tokens and keeps the task boundary clearer.&lt;/p&gt;
&lt;h2 id=&#34;common-issues&#34;&gt;Common Issues
&lt;/h2&gt;&lt;h3 id=&#34;1-model-not-found&#34;&gt;1. Model Not Found
&lt;/h3&gt;&lt;p&gt;First check that Model ID is exactly:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;deepseek-v4-pro
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Do not write &lt;code&gt;DeepSeek V4 Pro&lt;/code&gt;, &lt;code&gt;deepseek-v4&lt;/code&gt;, or another display name.&lt;/p&gt;
&lt;h3 id=&#34;2-401-or-authentication-failed&#34;&gt;2. 401 or Authentication Failed
&lt;/h3&gt;&lt;p&gt;Check the API Key:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;whether it was copied completely;&lt;/li&gt;
&lt;li&gt;whether it contains extra spaces;&lt;/li&gt;
&lt;li&gt;whether it was entered into the provider configuration Cline is currently using;&lt;/li&gt;
&lt;li&gt;whether the DeepSeek account has available balance.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;3-connection-failed&#34;&gt;3. Connection Failed
&lt;/h3&gt;&lt;p&gt;Check the Base URL:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;https://api.deepseek.com
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Do not append &lt;code&gt;/v1/chat/completions&lt;/code&gt; at the end.
Cline&amp;rsquo;s OpenAI Compatible Provider will construct compatible interface requests itself.&lt;/p&gt;
&lt;h3 id=&#34;4-cline-calls-are-too-expensive&#34;&gt;4. Cline Calls Are Too Expensive
&lt;/h3&gt;&lt;p&gt;You can switch routine tasks to &lt;code&gt;deepseek-v4-flash&lt;/code&gt; and use &lt;code&gt;deepseek-v4-pro&lt;/code&gt; only for complex tasks.&lt;/p&gt;
&lt;p&gt;Also, make the task description as clear as possible:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;Only modify files related to the login page. Do not refactor unrelated modules. First provide a plan, and modify code only after confirmation.
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Agent tasks are most expensive when boundaries are unclear.
The clearer the boundary, the fewer files it reads, the fewer tool calls it makes, and the more controllable the cost becomes.&lt;/p&gt;
&lt;h3 id=&#34;5-error-reasoning_content-must-be-passed-back&#34;&gt;5. Error: reasoning_content must be passed back
&lt;/h3&gt;&lt;p&gt;If you see an error like this:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;5
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-json&#34; data-lang=&#34;json&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;nt&#34;&gt;&amp;#34;message&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;400 The `reasoning_content` in the thinking mode must be passed back to the API.&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;nt&#34;&gt;&amp;#34;code&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;invalid_request_error&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;nt&#34;&gt;&amp;#34;modelId&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;deepseek-v4-pro&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;This is usually not a Key, quota, or Base URL problem. It means DeepSeek V4 Pro&amp;rsquo;s thinking mode and the current client&amp;rsquo;s multi-round tool-call history are not aligned.&lt;/p&gt;
&lt;p&gt;DeepSeek&amp;rsquo;s official documentation states:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;thinking mode is &lt;code&gt;enabled&lt;/code&gt; by default;&lt;/li&gt;
&lt;li&gt;thinking mode returns &lt;code&gt;reasoning_content&lt;/code&gt;;&lt;/li&gt;
&lt;li&gt;if a tool call happens in one round, subsequent requests must pass back the &lt;code&gt;reasoning_content&lt;/code&gt; from that assistant message;&lt;/li&gt;
&lt;li&gt;if the client does not pass it back correctly, the API returns 400.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;When Cline connects through the OpenAI Compatible Provider, this error may appear in the second round or after tool calls if the current version does not fully preserve and return DeepSeek&amp;rsquo;s &lt;code&gt;reasoning_content&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Try this order:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Upgrade Cline to the latest version;&lt;/li&gt;
&lt;li&gt;confirm you are using &lt;code&gt;OpenAI Compatible&lt;/code&gt;, not the normal &lt;code&gt;OpenAI&lt;/code&gt; provider;&lt;/li&gt;
&lt;li&gt;if Cline supports a custom request body, try disabling thinking mode:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;5
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-json&#34; data-lang=&#34;json&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;nt&#34;&gt;&amp;#34;thinking&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;nt&#34;&gt;&amp;#34;type&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;disabled&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;ol start=&#34;4&#34;&gt;
&lt;li&gt;if Cline does not support extra body parameters, temporarily use another model or a compatible proxy service;&lt;/li&gt;
&lt;li&gt;switch back to &lt;code&gt;deepseek-v4-pro&lt;/code&gt; after Cline supports passing back DeepSeek V4 &lt;code&gt;reasoning_content&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Note that disabling thinking mode may reduce complex reasoning ability, but it can work around client compatibility issues where &lt;code&gt;reasoning_content&lt;/code&gt; is not passed back.&lt;/p&gt;
&lt;h2 id=&#34;copyable-configuration&#34;&gt;Copyable Configuration
&lt;/h2&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;4
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;Provider: OpenAI Compatible
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;API Key: sk-your DeepSeek API Key
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;Base URL: https://api.deepseek.com
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;Model ID: deepseek-v4-pro
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;For low-cost mode:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;4
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;Provider: OpenAI Compatible
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;API Key: sk-your DeepSeek API Key
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;Base URL: https://api.deepseek.com
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;Model ID: deepseek-v4-flash
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h2 id=&#34;summary&#34;&gt;Summary
&lt;/h2&gt;&lt;p&gt;There are only three key steps to calling DeepSeek V4 Pro in Cline:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;choose &lt;code&gt;OpenAI Compatible&lt;/code&gt; as the provider;&lt;/li&gt;
&lt;li&gt;set Base URL to &lt;code&gt;https://api.deepseek.com&lt;/code&gt;;&lt;/li&gt;
&lt;li&gt;set Model ID to &lt;code&gt;deepseek-v4-pro&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;After configuration, test with a read-only task before giving it real code changes.
If you often run Agent tasks, split Flash and Pro: Flash handles high-frequency lightweight work, while Pro handles complex judgment and fallback tasks.&lt;/p&gt;
&lt;p&gt;References:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://docs.cline.bot/provider-config/openai-compatible&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Cline Docs: OpenAI Compatible Provider&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://docs.cline.bot/provider-config/overview&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Cline Docs: Provider Configuration&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://api-docs.deepseek.com/news/news202605&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;DeepSeek API Docs: News&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://api-docs.deepseek.com/quick_start/pricing/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;DeepSeek API Docs: Models &amp;amp; Pricing&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
        </item>
        <item>
        <title>How DeepSeek V4 Price Cuts Rewrite the Cost Model for AI Agents</title>
        <link>https://knightli.com/en/2026/05/01/deepseek-v4-price-cuts-ai-agent-economics/</link>
        <pubDate>Fri, 01 May 2026 19:47:47 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/05/01/deepseek-v4-price-cuts-ai-agent-economics/</guid>
        <description>&lt;p&gt;DeepSeek V4 did not arrive with an especially loud launch.
There was no major event, nor a benchmark story that instantly crushed every competitor.
But a few days later, the part that truly affects the industry became visible: repeated price cuts.&lt;/p&gt;
&lt;p&gt;The point of this change is not that &amp;ldquo;the model got a little stronger&amp;rdquo;, but that &amp;ldquo;usage cost has been pushed into another tier&amp;rdquo;.
When token prices become low enough that an ordinary Agent task can finish for a few cents or a couple of yuan, the business logic behind many Coding Plans and Token Plans needs to be reconsidered.&lt;/p&gt;
&lt;h2 id=&#34;launch-day-was-not-explosive&#34;&gt;Launch Day Was Not Explosive
&lt;/h2&gt;&lt;p&gt;The first wave of feedback to DeepSeek V4 was not especially heated.
Many people expected it to deliver the kind of shock R1 did: across-the-board benchmark leadership, validation of domestic compute, and simultaneous breakthroughs in multimodal and Agent capabilities.
After the actual release, however, it looked more like a steady upgrade.&lt;/p&gt;
&lt;p&gt;V4 Pro is indeed a strong model, especially in coding, math, long context, and agentic coding.
But it is not the kind of product that instantly makes every peer model look outdated.
So on launch day, the discussion felt a little awkward: people wanted to praise it, but it was hard to find a sufficiently explosive angle.&lt;/p&gt;
&lt;p&gt;The real turning point was not launch day, but the price adjustments that followed.&lt;/p&gt;
&lt;h2 id=&#34;successive-price-cuts-are-the-key&#34;&gt;Successive Price Cuts Are the Key
&lt;/h2&gt;&lt;p&gt;After DeepSeek V4 was released, prices started to move downward.
According to DeepSeek&amp;rsquo;s official pricing page and the information summarized in the source article, the rough prices at that time were:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;DeepSeek V4 Flash: about 1 yuan per 1 million input tokens; about 0.02 yuan per 1 million tokens after a cache hit;&lt;/li&gt;
&lt;li&gt;DeepSeek V4 Pro: about 3 yuan per 1 million input tokens; about 0.025 yuan per 1 million tokens after a cache hit;&lt;/li&gt;
&lt;li&gt;the cache-hit input price across the model family dropped to one tenth of the launch price;&lt;/li&gt;
&lt;li&gt;V4 Pro was once in a 75% discount period, extended until May 31, 2026 at 23:59.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The API prices in US dollars make the difference easier to see:&lt;/p&gt;
&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;Model&lt;/th&gt;
          &lt;th style=&#34;text-align: right&#34;&gt;Cached input&lt;/th&gt;
          &lt;th style=&#34;text-align: right&#34;&gt;Non-cached input&lt;/th&gt;
          &lt;th style=&#34;text-align: right&#34;&gt;Output&lt;/th&gt;
          &lt;th style=&#34;text-align: right&#34;&gt;Context&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;deepseek-v4-flash&lt;/code&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;$0.0028 / 1M tokens&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;$0.14 / 1M tokens&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;$0.28 / 1M tokens&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;1M&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;deepseek-v4-pro&lt;/code&gt; promotional price&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;$0.003625 / 1M tokens&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;$0.435 / 1M tokens&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;$0.87 / 1M tokens&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;1M&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;deepseek-v4-pro&lt;/code&gt; regular price&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;$0.0145 / 1M tokens&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;$1.74 / 1M tokens&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;$3.48 / 1M tokens&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;1M&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Two details matter here.&lt;/p&gt;
&lt;p&gt;First, V4 Pro&amp;rsquo;s $0.435 / $0.87 is a promotional price, not the long-term regular price.
In DeepSeek&amp;rsquo;s official notes, this 75% discount was extended until May 31, 2026 at 15:59 UTC.&lt;/p&gt;
&lt;p&gt;Second, cache-hit pricing is the key variable in the Agent cost model.
Flash&amp;rsquo;s cached input price is as low as $0.0028 / 1M tokens, while Pro&amp;rsquo;s promotional cached input price is $0.003625 / 1M tokens.
That means repeated project context, tool definitions, system prompts, and historical summaries no longer need to be charged at the full input price.&lt;/p&gt;
&lt;p&gt;The most important thing about this pricing is that it makes the token cost of many tasks &amp;ldquo;insensitive&amp;rdquo;.
In the past, developers worried that one Agent task would consume a large amount of context, repeatedly read and write code, and call tools frequently.
Now, as long as the cache hit rate is high enough, the cost can be pushed very low.&lt;/p&gt;
&lt;h2 id=&#34;price-comparison-with-gpt-and-claude&#34;&gt;Price Comparison With GPT and Claude
&lt;/h2&gt;&lt;p&gt;DeepSeek&amp;rsquo;s own prices alone do not fully convey the gap.
The contrast becomes much clearer when placed next to common closed-source models from the same period.&lt;/p&gt;
&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;Model&lt;/th&gt;
          &lt;th style=&#34;text-align: right&#34;&gt;Input&lt;/th&gt;
          &lt;th style=&#34;text-align: right&#34;&gt;Cached input&lt;/th&gt;
          &lt;th style=&#34;text-align: right&#34;&gt;Output&lt;/th&gt;
          &lt;th&gt;Best fit&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;deepseek-v4-flash&lt;/code&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;$0.14 / M&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;$0.0028 / M&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;$0.28 / M&lt;/td&gt;
          &lt;td&gt;High-frequency Agents, routine coding, batch tasks&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;deepseek-v4-pro&lt;/code&gt; promotional price&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;$0.435 / M&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;$0.003625 / M&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;$0.87 / M&lt;/td&gt;
          &lt;td&gt;Complex coding, planning, fact checking&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;deepseek-v4-pro&lt;/code&gt; regular price&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;$1.74 / M&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;$0.0145 / M&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;$3.48 / M&lt;/td&gt;
          &lt;td&gt;Pro cost baseline after the promotion&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;GPT-5.5&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;$5 / M&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;$0.50 / M&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;$30 / M&lt;/td&gt;
          &lt;td&gt;High-quality complex tasks, general reasoning&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;GPT-5.4&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;$2.50 / M&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;$0.25 / M&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;$15 / M&lt;/td&gt;
          &lt;td&gt;Mid-range choice for programming and professional tasks&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;GPT-5.4 mini&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;$0.75 / M&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;$0.075 / M&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;$4.50 / M&lt;/td&gt;
          &lt;td&gt;Lower-cost general and subtask model&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Claude Opus 4.7&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;$5 / M&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;$0.50 / M&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;$25 / M&lt;/td&gt;
          &lt;td&gt;High-quality writing, complex reasoning, long tasks&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Claude Sonnet 4.6&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;$3 / M&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;$0.30 / M&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;$15 / M&lt;/td&gt;
          &lt;td&gt;Programming, Agents, general work&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Claude Haiku 4.5&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;$1 / M&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;$0.10 / M&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;$5 / M&lt;/td&gt;
          &lt;td&gt;Lightweight tasks, summarization, classification&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The most striking number in this table is output price.
Agents do not only read context; they also keep generating plans, patches, explanations, logs, and next actions.
If there is a lot of output, DeepSeek V4 Pro&amp;rsquo;s promotional $0.87 / M becomes dramatically cheaper than GPT-5.5&amp;rsquo;s $30 / M or Claude Sonnet 4.6&amp;rsquo;s $15 / M.&lt;/p&gt;
&lt;p&gt;Even at V4 Pro&amp;rsquo;s regular output price of $3.48 / M, it is still clearly below GPT-5.4, GPT-5.5, and Claude Sonnet / Opus.
If the task can be handled by Flash, the output price drops further to $0.28 / M.&lt;/p&gt;
&lt;p&gt;The cached input gap is even more extreme.
DeepSeek V4 Flash&amp;rsquo;s cached input price is $0.0028 / M, while GPT-5.5 and Claude Opus 4.7 are both $0.50 / M.
These are not in the same order of magnitude.
For Agents that repeatedly read the same code repository, this gap matters more than it does in ordinary chat.&lt;/p&gt;
&lt;h2 id=&#34;why-agent-tasks-are-especially-affected&#34;&gt;Why Agent Tasks Are Especially Affected
&lt;/h2&gt;&lt;p&gt;AI Agents are different from ordinary chat.
Ordinary chat is usually a question-and-answer flow with relatively limited input context.
Agent tasks repeatedly read project files, generate plans, call tools, inspect results, and then modify code again.&lt;/p&gt;
&lt;p&gt;These tasks have two traits:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;large token consumption;&lt;/li&gt;
&lt;li&gt;lots of repeated context.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The second point is crucial.
In a code project, the model repeatedly reads the same files, directory structure, error logs, and modification results.
If the platform supports cache hits, the cost of repeated input drops sharply.&lt;/p&gt;
&lt;p&gt;The source article mentioned a real experience: connecting DeepSeek V4 Pro and Flash to a Claude Code-like tool, asking it to pull a prompt repository and turn it into a local search site.
The task was completed, with a total cost of roughly a little over 0.8 yuan, and Pro reached a cache hit rate of 98.7%.&lt;/p&gt;
&lt;p&gt;This example illustrates a practical issue: the more an Agent task resembles &amp;ldquo;repeated work around the same project&amp;rdquo;, the more valuable cache hits become.
If generating a website, fixing a bug, or changing a frontend costs only a few cents to a few yuan, subscription plans become less attractive.&lt;/p&gt;
&lt;p&gt;We can estimate the gap with a simplified task.
Assume one coding agent task includes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;500,000 input tokens, of which 80% can hit cache;&lt;/li&gt;
&lt;li&gt;50,000 output tokens;&lt;/li&gt;
&lt;li&gt;no tool calls, search costs, or platform markup included, only model token cost.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The rough costs are:&lt;/p&gt;
&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;Model&lt;/th&gt;
          &lt;th style=&#34;text-align: right&#34;&gt;Estimated cost&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;DeepSeek V4 Flash&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;about $0.03&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;DeepSeek V4 Pro promotional price&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;about $0.09&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;DeepSeek V4 Pro regular price&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;about $0.36&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;GPT-5.4 mini&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;about $0.30&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;GPT-5.4&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;about $1.01&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;GPT-5.5&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;about $1.75&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Claude Sonnet 4.6&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;about $1.11&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Claude Opus 4.7&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;about $1.65&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;This estimate does not mean DeepSeek is better for every task.
Model quality, tool-call stability, long-context retrieval ability, coding style, and factual reliability all need separate evaluation.
But from a cost perspective, DeepSeek V4 pushes the marginal cost of &amp;ldquo;letting the Agent run a few more rounds&amp;rdquo; very low.
That will encourage developers to design longer workflows, more frequent self-checks, and more candidate solutions instead of worrying about the token bill every time.&lt;/p&gt;
&lt;h2 id=&#34;the-difference-between-coding-plans-and-token-plans&#34;&gt;The Difference Between Coding Plans and Token Plans
&lt;/h2&gt;&lt;p&gt;Many AI products now offer two types of plans: Coding Plans and Token Plans.&lt;/p&gt;
&lt;p&gt;The rough difference is:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Coding Plans are usually mainly for programming;&lt;/li&gt;
&lt;li&gt;Token Plans usually cover more capabilities, such as STT, TTS, image generation, search, embedding, and RAG;&lt;/li&gt;
&lt;li&gt;STT means speech to text;&lt;/li&gt;
&lt;li&gt;TTS means text to speech;&lt;/li&gt;
&lt;li&gt;Coding Plans often restrict users to programming scenarios, while other capabilities still require separate purchases.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;From a business perspective, a Coding Plan is more like a buffet.
Users pay a fixed fee in advance, while the vendor bets that most people will not use up the quota.
Some users consume more, others consume less, and the platform can still make money on average.&lt;/p&gt;
&lt;p&gt;But if pay-as-you-go token prices are low enough, users start calculating: why do I have to buy a plan?
If the real monthly usage cost is only a few yuan or a dozen yuan, a 40-yuan or 200-yuan plan may no longer be worthwhile.&lt;/p&gt;
&lt;h2 id=&#34;why-price-cuts-challenge-the-subscription-model&#34;&gt;Why Price Cuts Challenge the Subscription Model
&lt;/h2&gt;&lt;p&gt;Subscription plans rely on one premise: users feel that each individual use is expensive, or they do not want to calculate the cost of every call.
When token prices are high, a plan feels reassuring.
When token prices are almost negligible, pay-as-you-go becomes more natural.&lt;/p&gt;
&lt;p&gt;DeepSeek V4&amp;rsquo;s price cut effectively reveals the underlying cost:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Agent tasks can be very cheap;&lt;/li&gt;
&lt;li&gt;long context is not necessarily too expensive to use;&lt;/li&gt;
&lt;li&gt;cache hits can reduce cost significantly;&lt;/li&gt;
&lt;li&gt;ordinary developers do not necessarily need a fixed subscription;&lt;/li&gt;
&lt;li&gt;the model entry point can shift from a &amp;ldquo;plan platform&amp;rdquo; to a &amp;ldquo;low-cost API&amp;rdquo;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This will make platforms built around Coding Plans uncomfortable.
If users find pay-as-you-go calls cheaper and freer, they have less reason to be locked into one platform&amp;rsquo;s subscription.&lt;/p&gt;
&lt;h2 id=&#34;how-to-choose-between-flash-and-pro&#34;&gt;How to Choose Between Flash and Pro
&lt;/h2&gt;&lt;p&gt;A practical way to use DeepSeek V4 is to split work between Flash and Pro.&lt;/p&gt;
&lt;p&gt;Flash is suitable for high-frequency, lightweight, repeatable tasks:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;fixing bugs;&lt;/li&gt;
&lt;li&gt;writing frontend code;&lt;/li&gt;
&lt;li&gt;writing scripts;&lt;/li&gt;
&lt;li&gt;routine code understanding;&lt;/li&gt;
&lt;li&gt;processing ordinary information in long context;&lt;/li&gt;
&lt;li&gt;running large numbers of subtasks.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Flash is cheap, fast, and also supports very long context.
For everyday coding agents, many tasks do not need Pro from the start.&lt;/p&gt;
&lt;p&gt;Pro is better for complex judgment and fallback work:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;multi-round planning;&lt;/li&gt;
&lt;li&gt;complex Agent workflows;&lt;/li&gt;
&lt;li&gt;multiple function calls;&lt;/li&gt;
&lt;li&gt;fact checking;&lt;/li&gt;
&lt;li&gt;financial research;&lt;/li&gt;
&lt;li&gt;content production that requires stronger knowledge and judgment;&lt;/li&gt;
&lt;li&gt;high-risk code changes.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;A reasonable setup is: Flash handles volume, Pro handles fallback.
Start ordinary tasks with Flash, then switch to Pro for long-horizon planning, complex judgment, fact checking, or multi-tool collaboration.
This keeps cost under control while preserving model quality.&lt;/p&gt;
&lt;h2 id=&#34;why-deepseek-can-price-this-way&#34;&gt;Why DeepSeek Can Price This Way
&lt;/h2&gt;&lt;p&gt;DeepSeek has a different business structure from many large platforms.
It does not have e-commerce, social networking, short video, cloud computing, phones, cars, office suites, operating systems, browsers, or a large enterprise SaaS ecosystem.&lt;/p&gt;
&lt;p&gt;That means it does not need to lock users into a complete platform.
It can simply sell text model capability: use cheap text models here, and call any other capability elsewhere.&lt;/p&gt;
&lt;p&gt;Large platforms usually think differently.
If you buy their Coding Plan or Token Plan, you are pulled into their cloud, search, image generation, voice, database, and developer-tool ecosystem.
The plan is not merely selling the model; it is competing for the user entry point.&lt;/p&gt;
&lt;p&gt;DeepSeek&amp;rsquo;s approach is more direct: push text model prices down and try to become the default model entry point for Agents.
Once the default entry point is occupied, many developers and toolchains will naturally adapt around it.&lt;/p&gt;
&lt;h2 id=&#34;open-models-and-the-default-entry-point&#34;&gt;Open Models and the Default Entry Point
&lt;/h2&gt;&lt;p&gt;If DeepSeek V4 keeps an open model route, third-party cloud vendors and platforms may deploy it themselves and provide services.
For DeepSeek, that is both distribution and potential diversion.&lt;/p&gt;
&lt;p&gt;This is where a low-price official API matters.
If the official price is already low enough, other platforms will struggle to offer an obvious price advantage even if they can deploy the model.
Users will tend to use the default, cheap, stable entry point directly.&lt;/p&gt;
&lt;p&gt;This is especially true for Agent tools.
Agent tasks depend on long context, caching, tool calls, and stable throughput.
Once a model is cheap enough in these scenarios, it has a chance to become the default option.&lt;/p&gt;
&lt;h2 id=&#34;coding-plans-are-still-not-useless&#34;&gt;Coding Plans Are Still Not Useless
&lt;/h2&gt;&lt;p&gt;This does not mean Coding Plans will disappear immediately.
They still fit some users.&lt;/p&gt;
&lt;p&gt;If some users are truly heavy users who max out their quota every day, a fixed subscription may still be economical.
Just like a buffet, if nobody could ever eat enough to get their money&amp;rsquo;s worth, users would not buy it.&lt;/p&gt;
&lt;p&gt;The problem is that most users are not that kind of extremely high-frequency user.
Low-frequency users, lightweight developers, and people who occasionally write scripts or modify projects are better suited to pay-as-you-go.
After DeepSeek lowers pay-as-you-go costs, the appeal of plans weakens.&lt;/p&gt;
&lt;p&gt;The future is more likely to become a layered choice:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;heavy high-frequency users keep buying Coding Plans;&lt;/li&gt;
&lt;li&gt;ordinary users move to low-cost APIs;&lt;/li&gt;
&lt;li&gt;Agent tools automatically choose Flash / Pro according to the task;&lt;/li&gt;
&lt;li&gt;platform plans need to provide more non-model value, such as workflows, IDE integration, deployment, team management, and security auditing.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;summary&#34;&gt;Summary
&lt;/h2&gt;&lt;p&gt;DeepSeek V4 did not create its biggest impact through benchmarks.
What truly changed industry expectations was the price reduction that followed.&lt;/p&gt;
&lt;p&gt;When input tokens and cache-hit pricing are pushed very low, the cost of using AI Agents changes.
Long context, code-project analysis, and multi-round tool calls that used to look expensive may now become everyday costs of a few cents to a few yuan.&lt;/p&gt;
&lt;p&gt;This directly challenges the business logic of Coding Plans and Token Plans.
If users can pay by usage, freely combine models and tools, and keep costs low enough, they may not want to be tied to a specific platform plan.&lt;/p&gt;
&lt;p&gt;What DeepSeek V4 truly touches this time is not only the ranking of model capability, but the cost structure of AI Agents and the battle for the default entry point.&lt;/p&gt;
&lt;p&gt;References:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://api-docs.deepseek.com/quick_start/pricing/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;DeepSeek API Docs: Models &amp;amp; Pricing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://openai.com/api/pricing/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;OpenAI API Pricing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://platform.claude.com/docs/en/about-claude/pricing&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Anthropic Claude API Pricing&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
        </item>
        <item>
        <title>NVIDIA Releases Nemotron 3 Nano Omni: An Open Omnimodal Reasoning Model for Agents</title>
        <link>https://knightli.com/en/2026/05/01/nvidia-nemotron-3-nano-omni-multimodal-agents/</link>
        <pubDate>Fri, 01 May 2026 12:07:15 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/05/01/nvidia-nemotron-3-nano-omni-multimodal-agents/</guid>
        <description>&lt;p&gt;NVIDIA has released &lt;code&gt;Nemotron 3 Nano Omni&lt;/code&gt;, an open omnimodal reasoning model designed for agent workflows.
Its focus is not simply text question answering, but putting language, vision, and audio into the same reasoning framework so the model can handle inputs that are closer to real work.&lt;/p&gt;
&lt;p&gt;In positioning, &lt;code&gt;Nemotron 3 Nano Omni&lt;/code&gt; looks more like a foundation model prepared for AI Agents.
It can understand information from screens, documents, images, speech, and video, then turn that information into actionable reasoning results.
This kind of capability fits computer operation, document intelligence, video understanding, voice interaction, customer service, education, and enterprise process automation.&lt;/p&gt;
&lt;h2 id=&#34;model-specs&#34;&gt;Model Specs
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;Nemotron 3 Nano Omni&lt;/code&gt; uses a MoE architecture.
The key specs NVIDIA lists are:&lt;/p&gt;
&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;Item&lt;/th&gt;
          &lt;th&gt;Information&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;Model name&lt;/td&gt;
          &lt;td&gt;&lt;code&gt;Nemotron 3 Nano Omni&lt;/code&gt;&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Architecture&lt;/td&gt;
          &lt;td&gt;MoE&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Parameter scale&lt;/td&gt;
          &lt;td&gt;30B total / 3B active&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Modalities&lt;/td&gt;
          &lt;td&gt;Text, image, audio, video&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Context length&lt;/td&gt;
          &lt;td&gt;256K tokens&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;License&lt;/td&gt;
          &lt;td&gt;Apache 2.0&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Main deployment direction&lt;/td&gt;
          &lt;td&gt;AI Agents, multimodal reasoning, enterprise agents&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The most notable point here is &lt;code&gt;30B-A3B&lt;/code&gt;.
It means the model has about 30B total parameters, but only activates about 3B parameters during each inference step.
This is a tradeoff between capability and inference cost: the model keeps a larger expert capacity while using only part of it at runtime.&lt;/p&gt;
&lt;p&gt;That said, MoE &lt;code&gt;active params&lt;/code&gt; does not mean VRAM can be estimated as if this were only a 3B model.
A full deployment still needs to account for expert weights, KV cache, vision and audio encoder modules, context length, and inference framework overhead.&lt;/p&gt;
&lt;h2 id=&#34;it-is-not-solving-a-single-modality-problem&#34;&gt;It Is Not Solving a Single-Modality Problem
&lt;/h2&gt;&lt;p&gt;Traditional large language models mainly process text.
Multimodal models add image understanding.
&lt;code&gt;Nemotron 3 Nano Omni&lt;/code&gt; has a broader target: it emphasizes omnimodal input, meaning text, images, audio, and video are all brought into a unified reasoning process.&lt;/p&gt;
&lt;p&gt;This matters a lot for agents.
Real agent tasks are often not &amp;ldquo;take a piece of text and generate another piece of text&amp;rdquo;; they are more like:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;reading buttons, tables, and windows on a screen;&lt;/li&gt;
&lt;li&gt;parsing PDFs, screenshots, charts, and webpages;&lt;/li&gt;
&lt;li&gt;listening to spoken instructions or meeting recordings;&lt;/li&gt;
&lt;li&gt;understanding actions, scenes, and timing in video;&lt;/li&gt;
&lt;li&gt;combining those signals into the next operation.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If a model can only handle one modality, an Agent needs extra glue between multiple specialized models.
The value of an omnimodal model is reducing that integration cost and letting the same model directly process more complex environmental inputs.&lt;/p&gt;
&lt;h2 id=&#34;built-for-computer-operation-and-document-intelligence&#34;&gt;Built for Computer Operation and Document Intelligence
&lt;/h2&gt;&lt;p&gt;NVIDIA specifically notes that &lt;code&gt;Nemotron 3 Nano Omni&lt;/code&gt; can be used for computer-operation tasks.
These tasks usually require the model to understand user interfaces:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;what controls are on the screen;&lt;/li&gt;
&lt;li&gt;what state the current window is in;&lt;/li&gt;
&lt;li&gt;which button or menu is the next target;&lt;/li&gt;
&lt;li&gt;what the content in tables, dialogs, and input boxes means.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is also one of the hard-to-avoid capabilities when AI Agents move into real deployment.
If an agent is going to help people operate office software, browsers, enterprise backends, or developer tools, it has to understand the interface, not just read API docs.&lt;/p&gt;
&lt;p&gt;Document intelligence follows a similar logic.
Enterprise materials often mix text, tables, images, scanned pages, and charts.
An omnimodal model can put all of that content into the same context for understanding, making it suitable for contract review, report analysis, invoice processing, knowledge-base QA, and process automation.&lt;/p&gt;
&lt;h2 id=&#34;audio-and-video-bring-agents-closer-to-real-scenarios&#34;&gt;Audio and Video Bring Agents Closer to Real Scenarios
&lt;/h2&gt;&lt;p&gt;Audio and video inputs can noticeably expand the range of agent applications.&lt;/p&gt;
&lt;p&gt;Audio scenarios include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;meeting recording summaries;&lt;/li&gt;
&lt;li&gt;customer service call analysis;&lt;/li&gt;
&lt;li&gt;voice command understanding;&lt;/li&gt;
&lt;li&gt;education and training content organization.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Video scenarios include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;instructional video understanding;&lt;/li&gt;
&lt;li&gt;security and industrial inspection;&lt;/li&gt;
&lt;li&gt;screen recording analysis;&lt;/li&gt;
&lt;li&gt;operation workflow review;&lt;/li&gt;
&lt;li&gt;temporal reasoning in multi-step tasks.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If these tasks rely only on text transcription, a lot of visual and timing information is lost.
An omnimodal model can directly combine voice, frames, and textual clues, giving Agents a more complete sense of their environment.&lt;/p&gt;
&lt;h2 id=&#34;deployment-and-ecosystem&#34;&gt;Deployment and Ecosystem
&lt;/h2&gt;&lt;p&gt;NVIDIA is placing &lt;code&gt;Nemotron 3 Nano Omni&lt;/code&gt; inside an open ecosystem, and the model uses the Apache 2.0 license.
That matters for developers and enterprises because it lowers the licensing barrier for experimentation, integration, and secondary development.&lt;/p&gt;
&lt;p&gt;From NVIDIA&amp;rsquo;s introduction, this model is also closely tied to its inference ecosystem.
For enterprise users, real deployment usually raises questions like:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;whether it can run efficiently on NVIDIA GPUs;&lt;/li&gt;
&lt;li&gt;whether it supports long context and multimodal input;&lt;/li&gt;
&lt;li&gt;whether it can connect to existing Agent frameworks;&lt;/li&gt;
&lt;li&gt;whether it can process internal documents, audio/video, and UI screenshots;&lt;/li&gt;
&lt;li&gt;whether it can be deployed in private environments.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;NVIDIA emphasizes that the model has a clear throughput advantage and says it can reach up to 9x the throughput of comparable open omnimodal reasoning models.
The real value of that number still depends on the specific hardware, context length, input modalities, and inference framework.
But the direction is clear: NVIDIA wants to bring open multimodal models and its inference infrastructure together into enterprise Agent scenarios.&lt;/p&gt;
&lt;h2 id=&#34;suitable-use-cases&#34;&gt;Suitable Use Cases
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;Nemotron 3 Nano Omni&lt;/code&gt; is better suited to tasks such as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Agents that need to understand text, images, audio, and video at the same time;&lt;/li&gt;
&lt;li&gt;enterprise document intelligence and knowledge-base QA;&lt;/li&gt;
&lt;li&gt;computer operation based on screenshots or web interfaces;&lt;/li&gt;
&lt;li&gt;multimodal analysis of meetings, customer service, and teaching content;&lt;/li&gt;
&lt;li&gt;video understanding, workflow review, and temporal reasoning;&lt;/li&gt;
&lt;li&gt;teams that require open licensing and private deployment.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It is not necessarily a fit for every regular user.
If the task is local chat, code completion, or simple QA, a single-modality language model may be lighter, faster, and more resource-efficient.
The value of &lt;code&gt;Nemotron 3 Nano Omni&lt;/code&gt; mainly appears in complex input and multimodal Agent workflows.&lt;/p&gt;
&lt;h2 id=&#34;what-this-means-for-ai-agents&#34;&gt;What This Means for AI Agents
&lt;/h2&gt;&lt;p&gt;For AI Agents to truly enter work scenarios, they cannot only write text.
They need to understand interfaces, speech, documents, and changes in video, then turn that information into the next action.&lt;/p&gt;
&lt;p&gt;That is where &lt;code&gt;Nemotron 3 Nano Omni&lt;/code&gt; matters.
It is not simply making the model larger; it is unifying the many kinds of input Agents face into one reasoning model.
This can make it easier for developers to build agents for real tasks instead of building only around chat windows.&lt;/p&gt;
&lt;p&gt;From this angle, the point of NVIDIA&amp;rsquo;s release is not just &amp;ldquo;another multimodal model&amp;rdquo;.
It is part of a continuing effort to connect open models, GPU inference, enterprise Agents, and private deployment.
What will be worth watching next is how it performs in concrete Agent frameworks, enterprise workflows, and local deployments.&lt;/p&gt;
&lt;p&gt;References:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://blogs.nvidia.cn/blog/nemotron-3-nano-omni-multimodal-ai-agents/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;NVIDIA Technical Blog: NVIDIA Nemotron 3 Nano Omni&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
        </item>
        <item>
        <title>FinceptTerminal: An Open-Source Financial Terminal, Quant Research, and AI Agent Workbench</title>
        <link>https://knightli.com/en/2026/05/01/finceptterminal-open-source-financial-terminal/</link>
        <pubDate>Fri, 01 May 2026 03:47:18 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/05/01/finceptterminal-open-source-financial-terminal/</guid>
        <description>&lt;p&gt;&lt;code&gt;FinceptTerminal&lt;/code&gt; is an open-source financial terminal project from Fincept Corporation.&lt;/p&gt;
&lt;p&gt;Based on the README, it is not a simple market quote panel. It is a comprehensive desktop platform for financial analysis, quant research, trading workflows, and AI Agents. Version 4 is built with C++20 and Qt6 as a native desktop application, while embedding the Python ecosystem for analytics, scripting, machine learning, and financial modeling.&lt;/p&gt;
&lt;p&gt;If we need a comparison, it is closer to an open-source financial research workbench: connecting data sources on one side, and handling charts, portfolios, quant research, trading, intelligence analysis, and automated workflows on the other.&lt;/p&gt;
&lt;p&gt;One thing should be made clear first: tools like this can be used for research, analysis, education, and internal tool building, but no output should be treated directly as investment advice. Financial markets are risky, and data, models, strategies, and execution all require independent verification.&lt;/p&gt;
&lt;h2 id=&#34;what-problem-does-it-solve&#34;&gt;What problem does it solve?
&lt;/h2&gt;&lt;p&gt;Financial research is often scattered across many tools:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Market data lives in one application&lt;/li&gt;
&lt;li&gt;Research code lives in Jupyter&lt;/li&gt;
&lt;li&gt;Charts live in another tool&lt;/li&gt;
&lt;li&gt;Portfolio analysis lives in spreadsheets&lt;/li&gt;
&lt;li&gt;Trading records live in brokerage systems&lt;/li&gt;
&lt;li&gt;News and intelligence live in the browser&lt;/li&gt;
&lt;li&gt;AI analysis lives in a chat window&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This approach works, but collaboration and reproducibility are difficult.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;FinceptTerminal&lt;/code&gt; tries to integrate these capabilities into one desktop terminal, so users can complete data access, analysis, modeling, visualization, Agent collaboration, and trading-related workflows in the same environment.&lt;/p&gt;
&lt;p&gt;Its goal is not to replace every professional system, but to provide an extensible open-source foundation for a financial terminal.&lt;/p&gt;
&lt;h2 id=&#34;technical-architecture&#34;&gt;Technical architecture
&lt;/h2&gt;&lt;p&gt;The README mentions that v4 uses C++20 and Qt6.&lt;/p&gt;
&lt;p&gt;This means it is not a pure web panel, but a native desktop application. For a financial terminal, native applications have several advantages:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;More stable UI responsiveness&lt;/li&gt;
&lt;li&gt;Better fit for complex windows and multi-panel layouts&lt;/li&gt;
&lt;li&gt;Easier access to local files and system resources&lt;/li&gt;
&lt;li&gt;Ability to embed high-performance components&lt;/li&gt;
&lt;li&gt;Better suited for long-running desktop workflows&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;At the same time, the project also embeds Python.&lt;/p&gt;
&lt;p&gt;This is important. In financial research and quant analysis, Python is one of the de facto mainstream languages. Data analysis, machine learning, statistics, backtesting, charting, and financial modeling all rely heavily on the Python ecosystem. C++/Qt handles the application framework and desktop experience, while Python handles research and extensibility. That is a very practical combination.&lt;/p&gt;
&lt;h2 id=&#34;data-connectors&#34;&gt;Data connectors
&lt;/h2&gt;&lt;p&gt;The README says the project provides 100+ data connectors.&lt;/p&gt;
&lt;p&gt;The value of a financial terminal depends heavily on data access. Without data, even the best UI and models are just an empty shell.&lt;/p&gt;
&lt;p&gt;These connectors can usually cover different sources:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Market quotes&lt;/li&gt;
&lt;li&gt;Macroeconomic data&lt;/li&gt;
&lt;li&gt;Company financials&lt;/li&gt;
&lt;li&gt;News and intelligence&lt;/li&gt;
&lt;li&gt;Exchange data&lt;/li&gt;
&lt;li&gt;Crypto asset data&lt;/li&gt;
&lt;li&gt;Research data sources&lt;/li&gt;
&lt;li&gt;Internal or custom APIs&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For users, data connectors reduce the workflow of &amp;ldquo;download CSV, clean it manually, then import it again&amp;rdquo;, making analysis closer to real-time and automation.&lt;/p&gt;
&lt;p&gt;That said, the quality, licensing, latency, coverage, and cost of financial data are all critical. Before using any data source, its license and usage boundaries need to be confirmed.&lt;/p&gt;
&lt;h2 id=&#34;ai-agents-module&#34;&gt;AI Agents module
&lt;/h2&gt;&lt;p&gt;The project emphasizes AI Agents, which is also where it differs from traditional financial terminals.&lt;/p&gt;
&lt;p&gt;Traditional terminals are mostly human-operated interfaces: people look at data and make judgments. With AI Agents, the tool can take on more assistant-style work:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Summarize market information&lt;/li&gt;
&lt;li&gt;Explain financial reports and announcements&lt;/li&gt;
&lt;li&gt;Generate research summaries&lt;/li&gt;
&lt;li&gt;Help filter data&lt;/li&gt;
&lt;li&gt;Assist with analysis scripts&lt;/li&gt;
&lt;li&gt;Organize trading or research workflows&lt;/li&gt;
&lt;li&gt;Pass context across modules&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This does not mean AI can replace analysts or traders.&lt;/p&gt;
&lt;p&gt;A more reasonable position is this: AI Agents help reduce repetitive organization work and provide preliminary analysis and interactive queries, but important conclusions still require data validation, model validation, and human judgment.&lt;/p&gt;
&lt;h2 id=&#34;quant-research-capabilities&#34;&gt;Quant research capabilities
&lt;/h2&gt;&lt;p&gt;FinceptTerminal is also aimed at quant research.&lt;/p&gt;
&lt;p&gt;Quant research usually includes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Data cleaning&lt;/li&gt;
&lt;li&gt;Factor construction&lt;/li&gt;
&lt;li&gt;Strategy hypotheses&lt;/li&gt;
&lt;li&gt;Backtesting&lt;/li&gt;
&lt;li&gt;Risk assessment&lt;/li&gt;
&lt;li&gt;Portfolio optimization&lt;/li&gt;
&lt;li&gt;Trading cost estimation&lt;/li&gt;
&lt;li&gt;Result visualization&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If a terminal can integrate data connections, Python analysis, charts, and workflows, it can be very useful for quant research. Researchers can move step by step from data to strategy validation in one environment.&lt;/p&gt;
&lt;p&gt;However, the biggest danger in quant research is something that &amp;ldquo;looks effective.&amp;rdquo; If a strategy does not strictly handle out-of-sample validation, trading costs, slippage, survivorship bias, overfitting, and data leakage, even a beautiful backtest is unreliable.&lt;/p&gt;
&lt;p&gt;So this kind of tool should be treated as a research platform, not an automatic money-making machine.&lt;/p&gt;
&lt;h2 id=&#34;quantlib-and-financial-modeling&#34;&gt;QuantLib and financial modeling
&lt;/h2&gt;&lt;p&gt;The README mentions QuantLib-related capabilities.&lt;/p&gt;
&lt;p&gt;QuantLib is a common open-source library in financial engineering. It is often used for interest rates, bonds, options, derivatives pricing, curve construction, risk calculation, and related areas.&lt;/p&gt;
&lt;p&gt;This means FinceptTerminal is not only about viewing stock quotes. It also tries to cover more professional financial modeling scenarios.&lt;/p&gt;
&lt;p&gt;These capabilities are suitable for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Learning financial engineering&lt;/li&gt;
&lt;li&gt;Experiments in derivatives pricing&lt;/li&gt;
&lt;li&gt;Curve and risk metric calculation&lt;/li&gt;
&lt;li&gt;Portfolio risk analysis&lt;/li&gt;
&lt;li&gt;Research model prototyping&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;But financial modeling itself has a high barrier. Model parameters, market assumptions, data sources, and pricing logic all affect the results. A tool can reduce operating costs, but it cannot replace professional judgment.&lt;/p&gt;
&lt;h2 id=&#34;node-workflows&#34;&gt;Node workflows
&lt;/h2&gt;&lt;p&gt;The README also mentions node-based workflows.&lt;/p&gt;
&lt;p&gt;Node workflows are suitable for breaking complex tasks into visual processes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Read data&lt;/li&gt;
&lt;li&gt;Clean data&lt;/li&gt;
&lt;li&gt;Run models&lt;/li&gt;
&lt;li&gt;Generate charts&lt;/li&gt;
&lt;li&gt;Trigger AI analysis&lt;/li&gt;
&lt;li&gt;Output reports&lt;/li&gt;
&lt;li&gt;Send notifications&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For financial scenarios, this approach has two advantages.&lt;/p&gt;
&lt;p&gt;First, the process becomes visible. Complex analysis is no longer hidden only inside a pile of scripts, and users can see how data flows.&lt;/p&gt;
&lt;p&gt;Second, it is suitable for automation. Repetitive research processes can be saved, reused, and adjusted.&lt;/p&gt;
&lt;p&gt;If these workflows can be combined with Python scripts, data connectors, Agents, and reporting systems, this kind of node workflow can become a very valuable module inside a financial terminal.&lt;/p&gt;
&lt;h2 id=&#34;trading-and-portfolio-management&#34;&gt;Trading and portfolio management
&lt;/h2&gt;&lt;p&gt;The project also mentions trading and portfolio-related capabilities.&lt;/p&gt;
&lt;p&gt;This is the area that requires the most caution.&lt;/p&gt;
&lt;p&gt;Portfolio management can help users understand asset exposure, returns, drawdowns, volatility, correlation, and risk concentration. Trading modules may involve orders, accounts, execution, and records.&lt;/p&gt;
&lt;p&gt;But whenever real trading is involved, the following must be considered:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Data latency&lt;/li&gt;
&lt;li&gt;Order execution risk&lt;/li&gt;
&lt;li&gt;API permissions&lt;/li&gt;
&lt;li&gt;Trading costs&lt;/li&gt;
&lt;li&gt;Slippage&lt;/li&gt;
&lt;li&gt;Liquidity&lt;/li&gt;
&lt;li&gt;Risk control limits&lt;/li&gt;
&lt;li&gt;Auditing and logs&lt;/li&gt;
&lt;li&gt;Accidental strategy triggers&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Trading features in development and research environments should not be equated with production-grade trading systems. Before connecting to live trading, strict testing, permission isolation, risk control mechanisms, and manual review are required.&lt;/p&gt;
&lt;h2 id=&#34;how-is-it-different-from-bloomberg-terminal&#34;&gt;How is it different from Bloomberg Terminal?
&lt;/h2&gt;&lt;p&gt;Many financial terminal projects are compared with Bloomberg Terminal.&lt;/p&gt;
&lt;p&gt;But the positioning is different.&lt;/p&gt;
&lt;p&gt;The value of Bloomberg Terminal is not only its software interface. It also includes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Data coverage&lt;/li&gt;
&lt;li&gt;Data licensing&lt;/li&gt;
&lt;li&gt;News network&lt;/li&gt;
&lt;li&gt;Trading ecosystem&lt;/li&gt;
&lt;li&gt;Customer support&lt;/li&gt;
&lt;li&gt;Financial institution workflows&lt;/li&gt;
&lt;li&gt;Long-accumulated industry trust&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;FinceptTerminal is more like an open-source financial terminal framework and research platform. Its strengths are extensibility, customization, localization, and integration with Python and AI workflows.&lt;/p&gt;
&lt;p&gt;It should not be understood simply as a free replacement for Bloomberg.&lt;/p&gt;
&lt;p&gt;A more reasonable view is this: if you want to study how financial terminals are built, or if you want to build your own financial analysis workbench, FinceptTerminal provides an open-source starting point.&lt;/p&gt;
&lt;h2 id=&#34;licensing-and-commercial-boundaries&#34;&gt;Licensing and commercial boundaries
&lt;/h2&gt;&lt;p&gt;The README mentions that the project uses AGPL and a commercial licensing model.&lt;/p&gt;
&lt;p&gt;AGPL has explicit requirements for network services and derivative works. If you only use it for learning, research, or personal experiments, it is usually not a big issue. But if you plan to turn it into a commercial product, internal platform, or external service, you need to read the license carefully.&lt;/p&gt;
&lt;p&gt;Financial tools often enter internal enterprise systems. In that case, open-source licenses, commercial licenses, data licenses, and model licenses all need to be reviewed together, instead of only asking whether the code can run.&lt;/p&gt;
&lt;h2 id=&#34;who-should-pay-attention&#34;&gt;Who should pay attention?
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;FinceptTerminal&lt;/code&gt; is suitable for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Developers interested in financial terminal architecture&lt;/li&gt;
&lt;li&gt;People doing quant research or financial engineering experiments&lt;/li&gt;
&lt;li&gt;People who want to embed Python analysis into desktop tools&lt;/li&gt;
&lt;li&gt;People exploring AI Agent + finance workflows&lt;/li&gt;
&lt;li&gt;Teams building internal financial analysis platforms&lt;/li&gt;
&lt;li&gt;People learning C++/Qt financial application development&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you only want to watch quotes for a few stocks, ordinary market software may be simpler.&lt;/p&gt;
&lt;p&gt;If you want to understand how a financial terminal integrates data, charts, models, Agents, trading, and workflows, this project is more worth studying.&lt;/p&gt;
&lt;h2 id=&#34;things-to-watch-when-using-it&#34;&gt;Things to watch when using it
&lt;/h2&gt;&lt;p&gt;First, distinguish research from trading.&lt;/p&gt;
&lt;p&gt;Research environments can tolerate experiments and failure. Trading environments cannot. Do not connect a research tool to real accounts before it has been verified.&lt;/p&gt;
&lt;p&gt;Second, take data licensing seriously.&lt;/p&gt;
&lt;p&gt;Financial data cannot simply be scraped and used commercially. Different data sources have different licensing terms, especially market data, news, financial statements, and exchange data.&lt;/p&gt;
&lt;p&gt;Third, do not blindly trust AI Agents.&lt;/p&gt;
&lt;p&gt;AI can help organize information, but financial conclusions must return to data, models, risk, and factual validation.&lt;/p&gt;
&lt;p&gt;Fourth, pay attention to security.&lt;/p&gt;
&lt;p&gt;If a tool connects to accounts, API keys, trading interfaces, or internal data, key management, permission isolation, logs, and network boundaries must be handled properly.&lt;/p&gt;
&lt;p&gt;Fifth, understand the open-source license.&lt;/p&gt;
&lt;p&gt;AGPL has important implications for commercial use and service deployment. Before productization, licensing issues should be handled first.&lt;/p&gt;
&lt;h2 id=&#34;reference&#34;&gt;Reference
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/Fincept-Corporation/FinceptTerminal&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Fincept-Corporation/FinceptTerminal&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;final-thought&#34;&gt;Final thought
&lt;/h2&gt;&lt;p&gt;What makes &lt;code&gt;FinceptTerminal&lt;/code&gt; worth watching is that it puts financial terminals, Python quant research, AI Agents, data connectors, and node workflows into the same open-source desktop platform concept.&lt;/p&gt;
&lt;p&gt;It is better suited as a starting point for financial technology research and internal tool building than as a finished product that can directly replace professional financial terminals or live trading systems.&lt;/p&gt;
</description>
        </item>
        <item>
        <title>mattpocock/skills: A Practical Skill Collection for AI Coding Agents</title>
        <link>https://knightli.com/en/2026/05/01/mattpocock-skills-ai-agent-coding-workflows/</link>
        <pubDate>Fri, 01 May 2026 03:43:20 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/05/01/mattpocock-skills-ai-agent-coding-workflows/</guid>
        <description>&lt;p&gt;&lt;code&gt;mattpocock/skills&lt;/code&gt; is a public collection of AI coding agent skills from Matt Pocock.&lt;/p&gt;
&lt;p&gt;It is not a full application, nor a new chat client. It is a set of working skills that can be used by AI coding assistants. The idea is practical: break common AI coding problems into small skills that an Agent can call in the right task, instead of relying on one huge prompt every time.&lt;/p&gt;
&lt;p&gt;If you often use Claude Code, Codex, Cursor, or similar AI coding tools, this kind of skills collection is worth watching. What really affects the AI coding experience is often not whether the model can write code, but whether it can move through the task in your preferred working style.&lt;/p&gt;
&lt;h2 id=&#34;what-problem-it-solves&#34;&gt;What Problem It Solves
&lt;/h2&gt;&lt;p&gt;AI coding assistants are powerful, but they can easily go wrong.&lt;/p&gt;
&lt;p&gt;Common situations include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Starting code changes before understanding the requirement&lt;/li&gt;
&lt;li&gt;Modifying too many files at once&lt;/li&gt;
&lt;li&gt;Producing lots of explanation but little useful action&lt;/li&gt;
&lt;li&gt;Blindly trying things after errors&lt;/li&gt;
&lt;li&gt;Not running tests or checks in time&lt;/li&gt;
&lt;li&gt;Ignoring existing project patterns&lt;/li&gt;
&lt;li&gt;Introducing unnecessary abstractions to finish a task&lt;/li&gt;
&lt;li&gt;Writing code without truly reviewing risks afterward&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These problems are not always caused by weak model capability. Often, the workflow is not constrained well enough.&lt;/p&gt;
&lt;p&gt;The value of &lt;code&gt;mattpocock/skills&lt;/code&gt; is that it turns these common failure modes into reusable operating methods, making the Agent behave more like an experienced engineering collaborator in different scenarios.&lt;/p&gt;
&lt;h2 id=&#34;what-are-skills&#34;&gt;What Are Skills
&lt;/h2&gt;&lt;p&gt;In the AI Agent context, a skill can be understood as a reusable task instruction, working method, or professional workflow.&lt;/p&gt;
&lt;p&gt;It does not have to be a code plugin, and it does not always need to call an external service. In many cases, a skill is simply a clear set of rules:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;When to use it&lt;/li&gt;
&lt;li&gt;What to do first&lt;/li&gt;
&lt;li&gt;What not to do&lt;/li&gt;
&lt;li&gt;What output is required&lt;/li&gt;
&lt;li&gt;How to judge task completion&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is somewhat like a normal prompt template, but the granularity is closer to a task capability.&lt;/p&gt;
&lt;p&gt;Normal prompt templates are usually copied and pasted manually by the user. Skills are better as part of an agent toolbox, allowing the Agent to choose the right workflow for the task.&lt;/p&gt;
&lt;h2 id=&#34;why-small-and-composable-matters&#34;&gt;Why Small and Composable Matters
&lt;/h2&gt;&lt;p&gt;The README emphasizes that these skills are small and composable.&lt;/p&gt;
&lt;p&gt;This direction matters.&lt;/p&gt;
&lt;p&gt;If one skill tries to handle everything, it quickly becomes a new giant prompt: long, vague, and hard to maintain. The advantage of small skills is clear boundaries.&lt;/p&gt;
&lt;p&gt;For example, one skill can focus on:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Planning first&lt;/li&gt;
&lt;li&gt;Fixing TypeScript errors&lt;/li&gt;
&lt;li&gt;Running tests and fixing based on results&lt;/li&gt;
&lt;li&gt;Doing code review&lt;/li&gt;
&lt;li&gt;Summarizing project conventions&lt;/li&gt;
&lt;li&gt;Improving prompts&lt;/li&gt;
&lt;li&gt;Removing unnecessary abstractions&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These skills can be combined according to the task. A simple task may need only one skill, while a complex task can chain several together.&lt;/p&gt;
&lt;p&gt;This is closer to real engineering work. You do not use the same workflow for every problem; you choose tools according to the situation.&lt;/p&gt;
&lt;h2 id=&#34;keeping-the-engineer-in-control&#34;&gt;Keeping the Engineer in Control
&lt;/h2&gt;&lt;p&gt;One important direction of this repository is keeping the engineer in control.&lt;/p&gt;
&lt;p&gt;AI coding can easily slide into two extremes.&lt;/p&gt;
&lt;p&gt;The first is fully manual. AI only helps write a few lines of code, while all context, planning, and verification still depend on you.&lt;/p&gt;
&lt;p&gt;The second is fully hands-off. You throw a task to an Agent, let it change a lot of things, and then face a diff that is hard to review.&lt;/p&gt;
&lt;p&gt;Skills help find a more stable middle position.&lt;/p&gt;
&lt;p&gt;They let AI take on more repetitive workflow, while still constraining it with rules:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Understand the task before acting&lt;/li&gt;
&lt;li&gt;Read relevant files before editing&lt;/li&gt;
&lt;li&gt;Keep the modification scope controlled&lt;/li&gt;
&lt;li&gt;Report uncertainty&lt;/li&gt;
&lt;li&gt;Verify after changes&lt;/li&gt;
&lt;li&gt;Do not refactor unrelated code just to show off&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This does not weaken AI. It makes AI actions easier for humans to review and take over.&lt;/p&gt;
&lt;h2 id=&#34;alignment-problems&#34;&gt;Alignment Problems
&lt;/h2&gt;&lt;p&gt;The first kind of AI coding failure is often alignment failure.&lt;/p&gt;
&lt;p&gt;The user wants a very specific change, but the Agent may understand it as a larger refactor. The user only wants a bug fixed, but it changes styles along the way. The user wants existing architecture to be followed, but it introduces a new pattern.&lt;/p&gt;
&lt;p&gt;Skills can help the Agent do several things at the start of a task:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Restate the goal&lt;/li&gt;
&lt;li&gt;Identify the impact scope&lt;/li&gt;
&lt;li&gt;Recognize existing implementation patterns&lt;/li&gt;
&lt;li&gt;Provide a plan&lt;/li&gt;
&lt;li&gt;Clarify what will not be done&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This step is like an engineer’s self-check before starting work.&lt;/p&gt;
&lt;p&gt;If the Agent cannot clearly state the task boundary and starts writing code directly, it is easy for the task to drift.&lt;/p&gt;
&lt;h2 id=&#34;feedback-loop-problems&#34;&gt;Feedback Loop Problems
&lt;/h2&gt;&lt;p&gt;AI should not write code through one-shot generation alone.&lt;/p&gt;
&lt;p&gt;In real development, feedback loops matter:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Change a small piece&lt;/li&gt;
&lt;li&gt;Run tests or type checks&lt;/li&gt;
&lt;li&gt;Read the errors&lt;/li&gt;
&lt;li&gt;Fix them&lt;/li&gt;
&lt;li&gt;Verify again&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Many Agents fail because they skip the middle feedback. They change many things at once and then summarize from intuition that “it should work.”&lt;/p&gt;
&lt;p&gt;Skills can make the feedback loop explicit. For example, they can require the Agent to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Run relevant checks after modification&lt;/li&gt;
&lt;li&gt;Read error messages first if checks fail&lt;/li&gt;
&lt;li&gt;Avoid blindly changing unrelated files&lt;/li&gt;
&lt;li&gt;Re-verify after each round of fixes&lt;/li&gt;
&lt;li&gt;Report final verification results&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This makes AI coding more like real debugging and less like one-shot writing.&lt;/p&gt;
&lt;h2 id=&#34;architecture-control-problems&#34;&gt;Architecture Control Problems
&lt;/h2&gt;&lt;p&gt;AI is good at generating abstractions, and also good at over-generating abstractions.&lt;/p&gt;
&lt;p&gt;To complete a small requirement, it may create a service layer, helper functions, configuration objects, type wrappers, and adapters, making the code much more complex than the requirement itself.&lt;/p&gt;
&lt;p&gt;This is especially dangerous in large projects. AI-generated abstractions often look “professional,” but they may not match existing project style and may increase maintenance cost.&lt;/p&gt;
&lt;p&gt;Good skills remind the Agent to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Prefer existing patterns&lt;/li&gt;
&lt;li&gt;Avoid unnecessary new abstractions&lt;/li&gt;
&lt;li&gt;Avoid refactoring unrelated areas&lt;/li&gt;
&lt;li&gt;Match the change to the size of the task&lt;/li&gt;
&lt;li&gt;Understand the code before designing structure&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This reduces output that looks engineered but is actually harder to maintain.&lt;/p&gt;
&lt;h2 id=&#34;why-review-skills-matter&#34;&gt;Why Review Skills Matter
&lt;/h2&gt;&lt;p&gt;Writing code and reviewing code are different states.&lt;/p&gt;
&lt;p&gt;When an Agent writes code, it usually tends to prove that its implementation works. It may explain why the change should work, but it does not always actively look for risks.&lt;/p&gt;
&lt;p&gt;The purpose of a review skill is to switch the Agent’s role:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Find potential bugs&lt;/li&gt;
&lt;li&gt;Find behavior regressions&lt;/li&gt;
&lt;li&gt;Find missing tests&lt;/li&gt;
&lt;li&gt;Find edge cases&lt;/li&gt;
&lt;li&gt;Find increased complexity&lt;/li&gt;
&lt;li&gt;Find inconsistencies with existing conventions&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This matters for AI coding because AI generates code quickly. Without review, users can easily be overwhelmed by large diffs.&lt;/p&gt;
&lt;p&gt;A good review output should list issues first, not praise the implementation first. It should help the engineer decide whether the change can be merged.&lt;/p&gt;
&lt;h2 id=&#34;difference-from-normal-rules-files&#34;&gt;Difference from Normal Rules Files
&lt;/h2&gt;&lt;p&gt;Many AI coding tools support rules, instructions, or memory.&lt;/p&gt;
&lt;p&gt;These files usually record long-term rules, such as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Project tech stack&lt;/li&gt;
&lt;li&gt;Naming conventions&lt;/li&gt;
&lt;li&gt;Test commands&lt;/li&gt;
&lt;li&gt;Directories not to modify&lt;/li&gt;
&lt;li&gt;Answer style preferences&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Skills are more focused on task workflow.&lt;/p&gt;
&lt;p&gt;Rules tell the Agent “how to behave in the long term,” while skills tell the Agent “how to execute this kind of task.”&lt;/p&gt;
&lt;p&gt;The two work best together.&lt;/p&gt;
&lt;p&gt;For example, rules can say the project uses &lt;code&gt;pnpm test&lt;/code&gt;, while a review skill requires checking test coverage after changes. Then the Agent knows not only the command, but also when to use it.&lt;/p&gt;
&lt;h2 id=&#34;suitable-scenarios&#34;&gt;Suitable Scenarios
&lt;/h2&gt;&lt;p&gt;Repositories like &lt;code&gt;mattpocock/skills&lt;/code&gt; are suitable for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Frequent use of AI coding tools&lt;/li&gt;
&lt;li&gt;Agents working on real codebases&lt;/li&gt;
&lt;li&gt;Reducing out-of-scope AI edits&lt;/li&gt;
&lt;li&gt;Making the Agent verify results more actively&lt;/li&gt;
&lt;li&gt;Turning your engineering habits into skills&lt;/li&gt;
&lt;li&gt;Learning how others design agent workflows&lt;/li&gt;
&lt;li&gt;Turning temporary prompts into a maintainable skill collection&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you only occasionally ask AI to write a small function, you may not need to maintain skills.&lt;/p&gt;
&lt;p&gt;But if you already treat AI as a long-term development partner, skills become increasingly important. They are like a reusable working method for the Agent.&lt;/p&gt;
&lt;h2 id=&#34;how-to-learn-from-this-repository&#34;&gt;How to Learn from This Repository
&lt;/h2&gt;&lt;p&gt;Even if you do not use every skill directly, you can learn several things from this repository.&lt;/p&gt;
&lt;p&gt;First, write down failure modes.&lt;/p&gt;
&lt;p&gt;Do not only complain when AI makes a mistake. Turn the patterns it often gets wrong into rules, so a skill can prevent them next time.&lt;/p&gt;
&lt;p&gt;Second, keep skills short.&lt;/p&gt;
&lt;p&gt;One skill should solve one clear problem. The shorter it is, the easier it is to call correctly and maintain.&lt;/p&gt;
&lt;p&gt;Third, make output format clear.&lt;/p&gt;
&lt;p&gt;If you want the Agent to list a plan first, execute next, and summarize verification results at the end, write that structure clearly. Vague requirements usually produce vague results.&lt;/p&gt;
&lt;p&gt;Fourth, keep human handoff points.&lt;/p&gt;
&lt;p&gt;A good skill should not let AI run too far alone. When there is uncertainty, expanded impact scope, failing tests, or a product decision, it should stop and explain the situation.&lt;/p&gt;
&lt;h2 id=&#34;notes-for-use&#34;&gt;Notes for Use
&lt;/h2&gt;&lt;p&gt;First, do not turn everything into a skill.&lt;/p&gt;
&lt;p&gt;Too many skills make the system complex, and the Agent may not know which one to choose. Start with the highest-frequency and most painful scenarios.&lt;/p&gt;
&lt;p&gt;Second, skills need iteration.&lt;/p&gt;
&lt;p&gt;The first version of a skill may not be good. Watch how AI actually executes it, then gradually delete, add, and rewrite.&lt;/p&gt;
&lt;p&gt;Third, do not let skills replace engineering judgment.&lt;/p&gt;
&lt;p&gt;Skills can improve workflow, but they cannot guarantee correct implementation. Tests, review, build checks, and human judgment still matter.&lt;/p&gt;
&lt;p&gt;Fourth, pay attention to differences between Agents.&lt;/p&gt;
&lt;p&gt;Claude Code, Codex, Cursor, and Copilot support instructions, skills, and rules differently. The same idea can be reused, but the specific format should be adjusted for each tool.&lt;/p&gt;
&lt;h2 id=&#34;reference&#34;&gt;Reference
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/mattpocock/skills&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;mattpocock/skills&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;final-thought&#34;&gt;Final Thought
&lt;/h2&gt;&lt;p&gt;What makes &lt;code&gt;mattpocock/skills&lt;/code&gt; worth watching is not one magic prompt inside it, but the practical AI coding idea it demonstrates: break engineering experience into small skills, then let the Agent combine them by scenario.&lt;/p&gt;
&lt;p&gt;As AI coding moves from occasional assistance into daily workflow, skills become important tools for constraining Agents, keeping engineers in control, and improving feedback quality.&lt;/p&gt;
</description>
        </item>
        <item>
        <title>free-claude-code: Connecting Claude Code to OpenRouter, DeepSeek, and Local Models Through a Proxy</title>
        <link>https://knightli.com/en/2026/05/01/free-claude-code-anthropic-compatible-proxy/</link>
        <pubDate>Fri, 01 May 2026 03:41:49 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/05/01/free-claude-code-anthropic-compatible-proxy/</guid>
        <description>&lt;p&gt;&lt;code&gt;free-claude-code&lt;/code&gt; is an Anthropic-compatible proxy for &lt;code&gt;Claude Code&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Its idea is not to crack Claude Code, nor to provide an official free Claude service. Instead, it starts a local proxy service that looks like an Anthropic API, then forwards requests from Claude Code to other model backends. The README mentions backends such as NVIDIA NIM, OpenRouter, DeepSeek, LM Studio, llama.cpp, and Ollama.&lt;/p&gt;
&lt;p&gt;In simple terms, it solves this problem: you like the terminal experience of Claude Code, but want to send model requests to another provider or a local model.&lt;/p&gt;
&lt;h2 id=&#34;what-problem-it-solves&#34;&gt;What Problem It Solves
&lt;/h2&gt;&lt;p&gt;Claude Code has an interaction model that works well for development tasks.&lt;/p&gt;
&lt;p&gt;It can read code, edit files, run commands, and move tasks forward based on project context inside the terminal. But many users may not always want to use the same model backend:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;They want to try different models on OpenRouter&lt;/li&gt;
&lt;li&gt;They want to use models such as DeepSeek to reduce cost&lt;/li&gt;
&lt;li&gt;They want to route requests to local Ollama&lt;/li&gt;
&lt;li&gt;They want to run local models through LM Studio or llama.cpp&lt;/li&gt;
&lt;li&gt;They want one proxy entry point in the development environment&lt;/li&gt;
&lt;li&gt;They want to compare different models inside the Claude Code workflow&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;code&gt;free-claude-code&lt;/code&gt; is positioned as a compatibility layer between Claude Code and these model services.&lt;/p&gt;
&lt;p&gt;Claude Code still sends requests in an Anthropic-like style, while the proxy adapts those requests to different backends.&lt;/p&gt;
&lt;h2 id=&#34;how-it-works&#34;&gt;How It Works
&lt;/h2&gt;&lt;p&gt;You can think of it as three layers:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The frontend is Claude Code&lt;/li&gt;
&lt;li&gt;The middle layer is the &lt;code&gt;free-claude-code&lt;/code&gt; proxy&lt;/li&gt;
&lt;li&gt;The backend is OpenRouter, DeepSeek, a local model, or another model service&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Claude Code believes it is accessing an Anthropic-compatible API.&lt;/p&gt;
&lt;p&gt;After the proxy receives a request, it selects a target provider according to configuration, transforms the necessary fields, and returns the response to Claude Code.&lt;/p&gt;
&lt;p&gt;The benefit of this structure is that you do not need to modify Claude Code itself, and you do not need every model service to natively support Claude Code. As long as the proxy can align the interfaces, more models can be connected to the same workflow.&lt;/p&gt;
&lt;h2 id=&#34;supported-backends&#34;&gt;Supported Backends
&lt;/h2&gt;&lt;p&gt;The README lists these directions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;NVIDIA NIM&lt;/li&gt;
&lt;li&gt;OpenRouter&lt;/li&gt;
&lt;li&gt;DeepSeek&lt;/li&gt;
&lt;li&gt;LM Studio&lt;/li&gt;
&lt;li&gt;llama.cpp&lt;/li&gt;
&lt;li&gt;Ollama&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These backends represent different usage styles.&lt;/p&gt;
&lt;p&gt;OpenRouter is more like a model aggregation entry point, useful for testing different commercial and open-source models.&lt;/p&gt;
&lt;p&gt;DeepSeek is suitable for people who care about Chinese ability, coding ability, and cost.&lt;/p&gt;
&lt;p&gt;LM Studio, llama.cpp, and Ollama are more local-model oriented. They are suitable for running models on your own machine or inside an intranet, reducing dependence on external APIs and making offline experiments easier.&lt;/p&gt;
&lt;p&gt;NVIDIA NIM is more oriented toward enterprise and GPU inference deployment scenarios.&lt;/p&gt;
&lt;h2 id=&#34;why-an-anthropic-compatible-proxy&#34;&gt;Why an Anthropic-Compatible Proxy
&lt;/h2&gt;&lt;p&gt;Claude Code was originally designed around Anthropic interfaces and model conventions.&lt;/p&gt;
&lt;p&gt;If you want to connect it to other models, the most direct problem is interface mismatch:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Request fields differ&lt;/li&gt;
&lt;li&gt;Model names differ&lt;/li&gt;
&lt;li&gt;Streaming formats differ&lt;/li&gt;
&lt;li&gt;Tool use is represented differently&lt;/li&gt;
&lt;li&gt;Error response formats differ&lt;/li&gt;
&lt;li&gt;Token and context limits differ&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is where the proxy layer is useful.&lt;/p&gt;
&lt;p&gt;It keeps the interface seen by Claude Code close to the Anthropic shape, then adapts to the backend. For users, after configuring the proxy once, they can test different models inside the same Claude Code workflow.&lt;/p&gt;
&lt;h2 id=&#34;suitable-scenarios&#34;&gt;Suitable Scenarios
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;free-claude-code&lt;/code&gt; is suitable for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Using the Claude Code terminal workflow&lt;/li&gt;
&lt;li&gt;Testing non-Anthropic models in Claude Code&lt;/li&gt;
&lt;li&gt;Reducing model calling costs&lt;/li&gt;
&lt;li&gt;Connecting Claude Code to OpenRouter&lt;/li&gt;
&lt;li&gt;Connecting to compatible model services such as DeepSeek&lt;/li&gt;
&lt;li&gt;Running local models through Ollama, LM Studio, or llama.cpp&lt;/li&gt;
&lt;li&gt;Giving a team one unified model proxy entry point&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you only use official Claude Code normally and have no special needs around providers, cost, or local deployment, you may not need this type of proxy.&lt;/p&gt;
&lt;p&gt;But if you often compare models, or want Claude Code to connect to local and third-party models, this type of tool is useful.&lt;/p&gt;
&lt;h2 id=&#34;difference-from-directly-using-openrouter-or-ollama&#34;&gt;Difference from Directly Using OpenRouter or Ollama
&lt;/h2&gt;&lt;p&gt;Using OpenRouter, Ollama, or LM Studio directly usually means chatting with a model or calling it through an API.&lt;/p&gt;
&lt;p&gt;The point of &lt;code&gt;free-claude-code&lt;/code&gt; is not to replace those services, but to connect them to the Claude Code development workflow.&lt;/p&gt;
&lt;p&gt;The difference is:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;You still use the Claude Code terminal experience&lt;/li&gt;
&lt;li&gt;AI can execute tasks around a code repository&lt;/li&gt;
&lt;li&gt;The model backend can be changed to another provider&lt;/li&gt;
&lt;li&gt;Local models can enter the Claude Code workflow&lt;/li&gt;
&lt;li&gt;Configuration is centralized in the proxy layer instead of changed in each tool&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So it is more like a bridge than a new chat client.&lt;/p&gt;
&lt;h2 id=&#34;notes-about-local-models&#34;&gt;Notes About Local Models
&lt;/h2&gt;&lt;p&gt;Connecting Claude Code to local models is attractive, but there are real limitations.&lt;/p&gt;
&lt;p&gt;First, model capability differs.&lt;/p&gt;
&lt;p&gt;Claude Code tasks are usually not just chat. They include understanding code, planning modifications, editing files, and handling command output. Smaller local models may not complete these tasks reliably.&lt;/p&gt;
&lt;p&gt;Second, context window matters.&lt;/p&gt;
&lt;p&gt;Code tasks need a lot of context. If the model context is too small, it may fail to read full files, miss constraints, or lose background across multi-turn tasks.&lt;/p&gt;
&lt;p&gt;Third, tool use compatibility matters.&lt;/p&gt;
&lt;p&gt;Claude Code workflows depend on tool calls and structured behavior. Even if a backend model can chat, it may not follow tool-use protocols well.&lt;/p&gt;
&lt;p&gt;Fourth, speed and hardware matter.&lt;/p&gt;
&lt;p&gt;Local model speed depends on machine configuration, quantization, and model size. If code tasks respond too slowly, the experience drops noticeably.&lt;/p&gt;
&lt;p&gt;So local models are better for experiments, low-risk tasks, and specific scenarios. For truly complex coding tasks, choose carefully according to model capability.&lt;/p&gt;
&lt;h2 id=&#34;usage-boundaries&#34;&gt;Usage Boundaries
&lt;/h2&gt;&lt;p&gt;Projects like this are easy to misunderstand from the title, so the boundaries should be clear.&lt;/p&gt;
&lt;p&gt;First, it is not an official free Claude Code quota.&lt;/p&gt;
&lt;p&gt;It only forwards Claude Code requests to other model backends. When using OpenRouter, DeepSeek, NVIDIA NIM, or other APIs, you still need to follow the pricing, quotas, and terms of the corresponding services.&lt;/p&gt;
&lt;p&gt;Second, it is not a tool for bypassing authorization.&lt;/p&gt;
&lt;p&gt;When using any proxy tool, you should follow the licenses and terms of Claude Code, model providers, and the project itself. Do not interpret it as a way to avoid official restrictions.&lt;/p&gt;
&lt;p&gt;Third, the proxy handles your request content.&lt;/p&gt;
&lt;p&gt;Code, command output, and project context may pass through the proxy and backend services. When deploying, consider logs, keys, network boundaries, and privacy. For company code or sensitive projects, use a controlled environment.&lt;/p&gt;
&lt;p&gt;Fourth, model performance varies greatly.&lt;/p&gt;
&lt;p&gt;The same Claude Code operation may behave very differently after switching models. Do not assume every model can replace Claude.&lt;/p&gt;
&lt;h2 id=&#34;relationship-with-proxies-such-as-litellm&#34;&gt;Relationship with Proxies Such as LiteLLM
&lt;/h2&gt;&lt;p&gt;Conceptually, &lt;code&gt;free-claude-code&lt;/code&gt; belongs to the category of compatible interface proxies.&lt;/p&gt;
&lt;p&gt;The shared goal of such tools is to reduce coupling between upper-level applications and lower-level model services. The upper-level application faces a relatively unified interface, while backend providers can be switched by configuration.&lt;/p&gt;
&lt;p&gt;Different projects focus on different areas. Some are general model gateways, some focus on OpenAI-compatible APIs, and some specifically adapt tools such as Claude Code.&lt;/p&gt;
&lt;p&gt;What makes &lt;code&gt;free-claude-code&lt;/code&gt; worth noting is that it puts Claude Code directly at the center, rather than building a generic chat proxy.&lt;/p&gt;
&lt;h2 id=&#34;suitable-users&#34;&gt;Suitable Users
&lt;/h2&gt;&lt;p&gt;It is better suited to users who are comfortable tinkering:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Familiar with Claude Code&lt;/li&gt;
&lt;li&gt;Know how to configure API keys and model providers&lt;/li&gt;
&lt;li&gt;Understand proxy service startup and environment variables&lt;/li&gt;
&lt;li&gt;Can troubleshoot network, port, model name, and streaming issues&lt;/li&gt;
&lt;li&gt;Want to compare different models on coding tasks&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you only want something that works out of the box, the official configuration is usually simpler.&lt;/p&gt;
&lt;p&gt;If you are willing to set up a proxy, switch models, tune parameters, and let Claude Code enter more model environments, this project is worth studying.&lt;/p&gt;
&lt;h2 id=&#34;reference&#34;&gt;Reference
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/Alishahryar1/free-claude-code&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Alishahryar1/free-claude-code&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;final-thought&#34;&gt;Final Thought
&lt;/h2&gt;&lt;p&gt;The value of &lt;code&gt;free-claude-code&lt;/code&gt; is not in the word “free,” but in the bridge it builds between Claude Code and more model backends.&lt;/p&gt;
&lt;p&gt;When you want to keep the Claude Code development experience while testing OpenRouter, DeepSeek, local models, or enterprise inference services, an Anthropic-compatible proxy like this becomes useful.&lt;/p&gt;
</description>
        </item>
        <item>
        <title>Compound Engineering Plugin: Turning AI Coding into a Plan, Execute, Review Engineering Loop</title>
        <link>https://knightli.com/en/2026/05/01/compound-engineering-plugin-ai-coding-workflow/</link>
        <pubDate>Fri, 01 May 2026 03:15:39 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/05/01/compound-engineering-plugin-ai-coding-workflow/</guid>
        <description>&lt;p&gt;&lt;code&gt;Compound Engineering Plugin&lt;/code&gt; is an open-source AI coding workflow plugin from Every Inc.&lt;/p&gt;
&lt;p&gt;It is not focused on “making AI write a piece of code faster.” Instead, it places AI coding inside a loop that looks more like an engineering team: plan first, implement next, review afterward, then preserve what was learned. For people who frequently use tools such as Claude Code, Codex, Cursor, and Copilot, this kind of plugin solves a workflow problem, not just a prompt problem.&lt;/p&gt;
&lt;p&gt;AI coding tools are becoming stronger, but in real projects the hardest part is often not generating code. It is making the AI continuously follow project rules, understand task boundaries, avoid repeating mistakes, and accumulate context across multiple iterations.&lt;/p&gt;
&lt;h2 id=&#34;what-problem-it-solves&#34;&gt;What Problem It Solves
&lt;/h2&gt;&lt;p&gt;Many people use AI coding assistants in a flow like this:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Describe the requirement directly&lt;/li&gt;
&lt;li&gt;Ask AI to modify the code&lt;/li&gt;
&lt;li&gt;Check whether the result runs&lt;/li&gt;
&lt;li&gt;Add more explanation after errors appear&lt;/li&gt;
&lt;li&gt;Explain the background again in the next task&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This can work for small tasks, but it easily breaks down in complex projects:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Requirements are not clarified before AI starts editing&lt;/li&gt;
&lt;li&gt;There is no systematic review after code changes&lt;/li&gt;
&lt;li&gt;Project conventions depend on repeated user reminders&lt;/li&gt;
&lt;li&gt;Similar mistakes happen again next time&lt;/li&gt;
&lt;li&gt;Multiple Agent tools lack a shared working method&lt;/li&gt;
&lt;li&gt;Experience is not turned into reusable rules&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;code&gt;Compound Engineering Plugin&lt;/code&gt; is designed for this class of problems. It splits AI coding into multiple stages, so an Agent is not only executing commands but participating in a more complete engineering process.&lt;/p&gt;
&lt;h2 id=&#34;what-is-compound-engineering&#34;&gt;What Is Compound Engineering
&lt;/h2&gt;&lt;p&gt;From the project README, Compound Engineering can be understood as a method for AI-assisted software development.&lt;/p&gt;
&lt;p&gt;It emphasizes a loop:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Plan: understand the goal, split the task, confirm the path&lt;/li&gt;
&lt;li&gt;Execute: modify code according to the plan, run commands, handle problems&lt;/li&gt;
&lt;li&gt;Review: check implementation quality, risks, and test coverage&lt;/li&gt;
&lt;li&gt;Learn: preserve experience as reusable rules for future work&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This loop resembles how real engineering teams work.&lt;/p&gt;
&lt;p&gt;A reliable engineer does not receive a requirement and immediately make random changes, nor does he finish edits and hand them off without checking. He first judges the impact scope, then implements, then checks risks and test results, and finally records the traps he stepped into. AI Agents need similar constraints.&lt;/p&gt;
&lt;h2 id=&#34;why-a-plugin-is-needed&#34;&gt;Why a Plugin Is Needed
&lt;/h2&gt;&lt;p&gt;A prompt can tell AI, “Please plan before executing,” but prompts themselves are not always stable.&lt;/p&gt;
&lt;p&gt;Once a conversation becomes long and context becomes complex, the model may skip planning, ignore rules, or become overconfident in order to finish the task. The value of a plugin is that it fixes the workflow so different Agent environments can follow similar methods.&lt;/p&gt;
&lt;p&gt;This kind of plugin usually breaks a workflow into commands, rules, templates, or subflows. The user does not need to manually write the full prompt every time. Instead, a fixed entry point triggers a specific stage.&lt;/p&gt;
&lt;p&gt;For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Ask the Agent to generate a plan first&lt;/li&gt;
&lt;li&gt;Implement step by step according to the plan&lt;/li&gt;
&lt;li&gt;Trigger review after edits&lt;/li&gt;
&lt;li&gt;Return to fixing after problems are found&lt;/li&gt;
&lt;li&gt;Write useful experience into memory or rules&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This makes AI coding feel more like controlled collaboration instead of one-off chat.&lt;/p&gt;
&lt;h2 id=&#34;supported-agent-environments&#34;&gt;Supported Agent Environments
&lt;/h2&gt;&lt;p&gt;The README mentions support for multiple AI coding environments, including:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Claude Code&lt;/li&gt;
&lt;li&gt;Codex&lt;/li&gt;
&lt;li&gt;Cursor&lt;/li&gt;
&lt;li&gt;GitHub Copilot&lt;/li&gt;
&lt;li&gt;Amp&lt;/li&gt;
&lt;li&gt;Factory&lt;/li&gt;
&lt;li&gt;Qwen Code&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is worth noting.&lt;/p&gt;
&lt;p&gt;Many workflow tools are tied to one client. Once you switch tools, the rules cannot be reused. &lt;code&gt;Compound Engineering Plugin&lt;/code&gt; is more like a cross-Agent engineering method, bringing similar planning, execution, and review workflows to different tools.&lt;/p&gt;
&lt;p&gt;If you use multiple AI coding assistants at the same time, this unified workflow becomes more valuable. Different tools have different capabilities, but project conventions, review habits, and task decomposition methods should remain as consistent as possible.&lt;/p&gt;
&lt;h2 id=&#34;why-the-planning-stage-matters&#34;&gt;Why the Planning Stage Matters
&lt;/h2&gt;&lt;p&gt;The value of the planning stage is to stop AI from acting too early.&lt;/p&gt;
&lt;p&gt;In complex tasks, the truly important questions are usually:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Which files need to change?&lt;/li&gt;
&lt;li&gt;Which modules may be affected?&lt;/li&gt;
&lt;li&gt;What existing pattern should be followed?&lt;/li&gt;
&lt;li&gt;Are there tests?&lt;/li&gt;
&lt;li&gt;Where are the risks?&lt;/li&gt;
&lt;li&gt;Should documents be read first?&lt;/li&gt;
&lt;li&gt;Can the task be split into smaller steps?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If an Agent starts writing code before thinking through these questions, it can easily produce an implementation that looks finished but deviates from the project structure.&lt;/p&gt;
&lt;p&gt;A plan does not need to be long. A good plan should be short, specific, and executable. Its purpose is not to create documentation, but to give the following implementation clear boundaries.&lt;/p&gt;
&lt;h2 id=&#34;what-to-avoid-in-execution&#34;&gt;What to Avoid in Execution
&lt;/h2&gt;&lt;p&gt;When AI executes coding tasks, several problems appear easily:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Refactoring unrelated code&lt;/li&gt;
&lt;li&gt;Overwriting existing user changes&lt;/li&gt;
&lt;li&gt;Only handling the happy path&lt;/li&gt;
&lt;li&gt;Ignoring error handling&lt;/li&gt;
&lt;li&gt;Not following the existing project style&lt;/li&gt;
&lt;li&gt;Not running necessary verification&lt;/li&gt;
&lt;li&gt;Blindly trying things after errors&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;A workflow plugin cannot guarantee these problems will disappear, but it can reduce their probability through rules and staged constraints.&lt;/p&gt;
&lt;p&gt;For example, the execution stage can require the Agent to proceed according to the plan. When it discovers something outside the plan, it should explain the risk first. When modifying shared modules, it should add tests or at least run related verification.&lt;/p&gt;
&lt;p&gt;This is especially important in large codebases. The faster AI writes code, the more process is needed to constrain its momentum.&lt;/p&gt;
&lt;h2 id=&#34;why-review-matters&#34;&gt;Why Review Matters
&lt;/h2&gt;&lt;p&gt;Many AI coding failures are not caused by code that cannot run at all. They come from detail problems:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Edge cases are not handled&lt;/li&gt;
&lt;li&gt;State updates are inconsistent&lt;/li&gt;
&lt;li&gt;API contracts are changed quietly&lt;/li&gt;
&lt;li&gt;Tests do not cover key paths&lt;/li&gt;
&lt;li&gt;Error messages are unclear&lt;/li&gt;
&lt;li&gt;Performance or security risks are not mentioned&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The review stage switches the Agent from “author mode” to “reviewer mode.”&lt;/p&gt;
&lt;p&gt;Author mode tends to justify its own implementation. Reviewer mode should actively look for holes, regression risks, and missing tests. Separating these two stages is more reliable than asking the same response to both implement and self-review.&lt;/p&gt;
&lt;p&gt;For users, review output is also more valuable. It helps you quickly judge whether the change is ready to merge or still needs rework.&lt;/p&gt;
&lt;h2 id=&#34;the-meaning-of-learning-and-memory&#34;&gt;The Meaning of Learning and Memory
&lt;/h2&gt;&lt;p&gt;The word “Compound” in the project name suggests an important idea: engineering experience should compound.&lt;/p&gt;
&lt;p&gt;If AI fixes a mistake only for the current task and then repeats the same mistake next time, the productivity gain is limited. A better approach is to preserve useful experience:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Directory conventions in this project&lt;/li&gt;
&lt;li&gt;Debugging methods for a class of errors&lt;/li&gt;
&lt;li&gt;Test commands and notes&lt;/li&gt;
&lt;li&gt;Generated files that should not be touched&lt;/li&gt;
&lt;li&gt;Code style preferences&lt;/li&gt;
&lt;li&gt;Common implementation patterns&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These experiences can become rules, memories, documents, or templates. In later tasks, the Agent reads these accumulated notes before starting work.&lt;/p&gt;
&lt;p&gt;This is the key to moving AI coding from “one-off Q&amp;amp;A” toward “long-term collaboration.”&lt;/p&gt;
&lt;h2 id=&#34;suitable-scenarios&#34;&gt;Suitable Scenarios
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;Compound Engineering Plugin&lt;/code&gt; is suitable for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Long-term use of AI Agents for coding&lt;/li&gt;
&lt;li&gt;Projects that receive many rounds of modifications&lt;/li&gt;
&lt;li&gt;Teams that want AI to plan before implementing&lt;/li&gt;
&lt;li&gt;Users who want review thinking after changes&lt;/li&gt;
&lt;li&gt;Teams that want a unified AI coding workflow&lt;/li&gt;
&lt;li&gt;People who use Claude Code, Codex, Cursor, and other tools at the same time&lt;/li&gt;
&lt;li&gt;Teams that want to turn project experience into reusable rules&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you only occasionally ask AI to write a small script, the full workflow may feel heavy.&lt;/p&gt;
&lt;p&gt;But if you treat AI coding assistants as daily development partners, the plan, execute, review, learn loop becomes clearly useful.&lt;/p&gt;
&lt;h2 id=&#34;difference-from-normal-prompt-templates&#34;&gt;Difference from Normal Prompt Templates
&lt;/h2&gt;&lt;p&gt;Normal prompt templates usually solve “how to state the task clearly.”&lt;/p&gt;
&lt;p&gt;For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Please think step by step&lt;/li&gt;
&lt;li&gt;Please read the files first&lt;/li&gt;
&lt;li&gt;Please keep code style consistent&lt;/li&gt;
&lt;li&gt;Please run tests&lt;/li&gt;
&lt;li&gt;Please summarize the changes&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These prompts are useful, but they still rely on the user using them correctly every time.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;Compound Engineering Plugin&lt;/code&gt; operates more at the workflow layer. It organizes these requirements into a repeatable process and adapts them to different Agent tools. You are not writing prompts from scratch every time; you are moving tasks through a workflow.&lt;/p&gt;
&lt;p&gt;Simply put, a prompt template is like a reminder, while a workflow plugin is like a system.&lt;/p&gt;
&lt;h2 id=&#34;notes-for-use&#34;&gt;Notes for Use
&lt;/h2&gt;&lt;p&gt;First, do not let the process become a burden.&lt;/p&gt;
&lt;p&gt;Small tasks do not always need a full plan and long review. A good workflow should adapt to task complexity: handle simple problems quickly and use the full loop for complex ones.&lt;/p&gt;
&lt;p&gt;Second, review cannot replace tests.&lt;/p&gt;
&lt;p&gt;Agent review can find many problems, but it can still miss real runtime errors. Final judgment still depends on tests, type checks, build results, and human review.&lt;/p&gt;
&lt;p&gt;Third, rules need continuous cleanup.&lt;/p&gt;
&lt;p&gt;Preserving experience is important, but rules can become noise as they accumulate. Outdated rules, duplicate rules, and temporary experience that only applied to one task should be cleaned up regularly.&lt;/p&gt;
&lt;p&gt;Fourth, cross-tool consistency does not mean everything is identical.&lt;/p&gt;
&lt;p&gt;Claude Code, Codex, Cursor, Copilot, and other tools have different capabilities and interaction models. What should be unified is the working method, not necessarily every command or configuration detail.&lt;/p&gt;
&lt;h2 id=&#34;suitable-teams&#34;&gt;Suitable Teams
&lt;/h2&gt;&lt;p&gt;If a team already allows AI Agents to modify real code, it is not enough to discuss only “which model is stronger.”&lt;/p&gt;
&lt;p&gt;The more important questions are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Does AI understand the task before editing?&lt;/li&gt;
&lt;li&gt;Does AI follow project boundaries during editing?&lt;/li&gt;
&lt;li&gt;Does AI actively review risks after editing?&lt;/li&gt;
&lt;li&gt;Can AI learn from historical mistakes?&lt;/li&gt;
&lt;li&gt;Does the team have unified Agent usage conventions?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is where projects such as &lt;code&gt;Compound Engineering Plugin&lt;/code&gt; matter. They move AI coding one step away from personal tricks and toward reusable team workflow.&lt;/p&gt;
&lt;h2 id=&#34;reference&#34;&gt;Reference
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/EveryInc/compound-engineering-plugin&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;EveryInc/compound-engineering-plugin&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;final-thought&#34;&gt;Final Thought
&lt;/h2&gt;&lt;p&gt;What makes &lt;code&gt;Compound Engineering Plugin&lt;/code&gt; worth watching is not that it adds another AI coding command, but that it organizes AI coding into an engineering workflow that can improve over time.&lt;/p&gt;
&lt;p&gt;When AI Agents start participating in real projects, planning, execution, review, and experience preservation become more important than one-off code generation.&lt;/p&gt;
</description>
        </item>
        <item>
        <title>TradingAgents-CN: A Multi-Agent Financial Trading Research Framework for Chinese Users</title>
        <link>https://knightli.com/en/2026/05/01/tradingagents-cn-multi-agent-financial-research-framework/</link>
        <pubDate>Fri, 01 May 2026 03:14:15 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/05/01/tradingagents-cn-multi-agent-financial-research-framework/</guid>
        <description>&lt;p&gt;&lt;code&gt;TradingAgents-CN&lt;/code&gt; is a multi-agent financial trading research framework for Chinese users.&lt;/p&gt;
&lt;p&gt;Its goal is not to give a simple answer such as “which stock should I buy.” Instead, it uses multiple AI Agents to simulate a more complete financial analysis team: one role looks at fundamentals, another looks at technicals, another follows news and sentiment, while others handle risk and final decisions. For people studying LLM + Agent + financial analysis, this kind of project is a good experimental entry point.&lt;/p&gt;
&lt;p&gt;One thing should be clear first: tools like this are suitable for learning, research, and auxiliary analysis. They should not be treated as real trading advice. Financial markets involve risk, and model outputs can be wrong, delayed, or overconfident.&lt;/p&gt;
&lt;h2 id=&#34;what-problem-it-solves&#34;&gt;What Problem It Solves
&lt;/h2&gt;&lt;p&gt;Normal chat models can also analyze stocks.&lt;/p&gt;
&lt;p&gt;You can directly ask, “Help me analyze whether a company is worth buying.” The model may return an answer that looks complete. But this approach has several problems:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The analysis chain is not transparent&lt;/li&gt;
&lt;li&gt;Different dimensions are easily mixed together&lt;/li&gt;
&lt;li&gt;There is no clear role division&lt;/li&gt;
&lt;li&gt;There is little collision between positive and negative views&lt;/li&gt;
&lt;li&gt;Risk warnings may become formulaic&lt;/li&gt;
&lt;li&gt;It is hard to reproduce the same analysis workflow&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;code&gt;TradingAgents-CN&lt;/code&gt; breaks financial analysis into multiple roles. Different Agents are responsible for different perspectives, and the final analysis is formed through collaboration, discussion, and summarization.&lt;/p&gt;
&lt;p&gt;This is closer to a real investment research workflow. An investment judgment usually does not rely on one news item or one technical indicator. It needs company fundamentals, market environment, price movement, capital sentiment, policy risk, and position control.&lt;/p&gt;
&lt;h2 id=&#34;what-multi-agent-analysis-means&#34;&gt;What Multi-Agent Analysis Means
&lt;/h2&gt;&lt;p&gt;Multi-agent analysis is not simply asking several models to speak in turn.&lt;/p&gt;
&lt;p&gt;The more valuable approach is assigning clear responsibilities to different Agents. For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Market analysis Agent: focuses on market trends, price changes, and the market environment&lt;/li&gt;
&lt;li&gt;Fundamental analysis Agent: focuses on business, financial data, and long-term value&lt;/li&gt;
&lt;li&gt;News analysis Agent: focuses on announcements, news, public sentiment, and event impact&lt;/li&gt;
&lt;li&gt;Technical analysis Agent: focuses on trends, indicators, support and resistance, and trading signals&lt;/li&gt;
&lt;li&gt;Risk management Agent: focuses on volatility, drawdown, positions, and uncertainty&lt;/li&gt;
&lt;li&gt;Decision Agent: combines different views and forms a final judgment&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This structure reduces the problem of a single model trying to “say everything in one breath.”&lt;/p&gt;
&lt;p&gt;When different roles analyze the same target, the system can present multi-dimensional judgments more easily and expose disagreements more naturally. For learners, this is more useful than reading only a summary.&lt;/p&gt;
&lt;h2 id=&#34;why-a-chinese-version-is-needed&#34;&gt;Why a Chinese Version Is Needed
&lt;/h2&gt;&lt;p&gt;Financial analysis is deeply connected to language and market context.&lt;/p&gt;
&lt;p&gt;Chinese users care about different data sources, market habits, stock names, trading systems, news expressions, and financial terms compared with English environments. Using an English framework directly often creates several problems:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Chinese stock names and codes are not handled smoothly&lt;/li&gt;
&lt;li&gt;A-share, Hong Kong stock, and US stock contexts are mixed&lt;/li&gt;
&lt;li&gt;Chinese financial news is not understood stably&lt;/li&gt;
&lt;li&gt;Domestic data sources are inconvenient to access&lt;/li&gt;
&lt;li&gt;Output style does not match Chinese users’ reading habits&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The value of &lt;code&gt;TradingAgents-CN&lt;/code&gt; is that it adapts the multi-agent financial analysis workflow for Chinese users. It makes it easier for Chinese users to set up, run, and understand the entire trading analysis experiment process.&lt;/p&gt;
&lt;h2 id=&#34;what-it-can-be-used-for&#34;&gt;What It Can Be Used For
&lt;/h2&gt;&lt;p&gt;This project is more suitable for research and auxiliary analysis than for automatic order execution.&lt;/p&gt;
&lt;p&gt;Suitable uses include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Learning how multi-agent systems collaborate&lt;/li&gt;
&lt;li&gt;Studying LLM performance in financial analysis&lt;/li&gt;
&lt;li&gt;Organizing stock information from multiple perspectives&lt;/li&gt;
&lt;li&gt;Comparing different models on investment research tasks&lt;/li&gt;
&lt;li&gt;Building your own financial analysis Agent prototype&lt;/li&gt;
&lt;li&gt;Reviewing historical information and risk points for a target&lt;/li&gt;
&lt;li&gt;Practicing how to break investment research workflows into executable tasks&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you are learning quantitative trading, financial engineering, AI Agent systems, or LLM application development, this kind of project can help you understand the engineering structure behind an “AI investment research assistant.”&lt;/p&gt;
&lt;h2 id=&#34;what-it-is-not-suitable-for&#34;&gt;What It Is Not Suitable For
&lt;/h2&gt;&lt;p&gt;It is not suitable as a guaranteed profit tool.&lt;/p&gt;
&lt;p&gt;It is especially not suitable for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Buying or selling with full position based directly on output&lt;/li&gt;
&lt;li&gt;Replacing your own risk judgment with model conclusions&lt;/li&gt;
&lt;li&gt;Treating short-term price predictions as certain results&lt;/li&gt;
&lt;li&gt;Ignoring transaction costs, slippage, and liquidity&lt;/li&gt;
&lt;li&gt;Connecting to a real account without backtesting&lt;/li&gt;
&lt;li&gt;Replacing a long-term investment strategy with one analysis result&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;LLMs are good at organizing information, generating explanations, and simulating reasoning workflows, but they do not naturally have stable market prediction ability. Financial markets contain strong noise, sudden events, and behavioral games. Model output can only be one reference material.&lt;/p&gt;
&lt;h2 id=&#34;difference-from-normal-quant-frameworks&#34;&gt;Difference from Normal Quant Frameworks
&lt;/h2&gt;&lt;p&gt;Traditional quantitative frameworks focus more on data, factors, backtesting, portfolio optimization, and trading execution.&lt;/p&gt;
&lt;p&gt;For example, you may define strategy rules such as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Moving average breakout&lt;/li&gt;
&lt;li&gt;Momentum factor&lt;/li&gt;
&lt;li&gt;Value factor&lt;/li&gt;
&lt;li&gt;Volatility filter&lt;/li&gt;
&lt;li&gt;Stop loss and take profit&lt;/li&gt;
&lt;li&gt;Position management&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Then you use historical data to backtest strategy performance.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;TradingAgents-CN&lt;/code&gt; is more of an “agent analysis framework.” It focuses on how multiple LLM Agents collaborate around financial tasks, how to simulate investment research discussions, and how to organize news, fundamentals, technicals, and risk judgment.&lt;/p&gt;
&lt;p&gt;The two are not replacements for each other.&lt;/p&gt;
&lt;p&gt;A more realistic usage is: traditional quant systems handle verifiable rules and backtesting, while Agent systems handle information organization, report generation, viewpoint comparison, and decision support. Whether it can enter real trading still requires rigorous backtesting, risk control, and human review.&lt;/p&gt;
&lt;h2 id=&#34;difference-from-directly-asking-chatgpt&#34;&gt;Difference from Directly Asking ChatGPT
&lt;/h2&gt;&lt;p&gt;Directly asking a model has the lowest barrier, but the process is loose.&lt;/p&gt;
&lt;p&gt;You ask once, it answers once. Change the wording, and the conclusion may change. It is hard to ensure that it analyzes from the same dimensions every time, and hard to make it consistently play multiple mutually checking roles.&lt;/p&gt;
&lt;p&gt;The value of &lt;code&gt;TradingAgents-CN&lt;/code&gt; is that it structures the analysis process:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Roles are clearer&lt;/li&gt;
&lt;li&gt;Steps are more reproducible&lt;/li&gt;
&lt;li&gt;Information sources are easier to organize&lt;/li&gt;
&lt;li&gt;Viewpoint collision is more natural&lt;/li&gt;
&lt;li&gt;Risk checks can be handled separately&lt;/li&gt;
&lt;li&gt;Output looks more like the result of an investment research workflow&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is useful for learning and research. You can observe how different Agents affect the final conclusion, replace models, adjust prompts, modify role division, and compare how results change.&lt;/p&gt;
&lt;h2 id=&#34;risks-to-watch&#34;&gt;Risks to Watch
&lt;/h2&gt;&lt;p&gt;First, data quality.&lt;/p&gt;
&lt;p&gt;Financial analysis depends heavily on data. If market data, financial reports, news, or announcements are incomplete or delayed, even a fluent Agent analysis may be built on the wrong foundation.&lt;/p&gt;
&lt;p&gt;Second, model hallucination.&lt;/p&gt;
&lt;p&gt;LLMs may fabricate facts, misunderstand data meaning, or treat old information as new. When specific stocks are involved, you must verify against data sources.&lt;/p&gt;
&lt;p&gt;Third, over-explanation.&lt;/p&gt;
&lt;p&gt;Models are good at giving explanations that sound reasonable, but market price changes may not actually be caused by the reasons listed. Do not mistake post-hoc explanation for causal proof.&lt;/p&gt;
&lt;p&gt;Fourth, the gap between backtesting and live trading.&lt;/p&gt;
&lt;p&gt;Even if a strategy performs well on historical data, real trading still involves slippage, fees, liquidity, suspensions, limit-up and limit-down rules, and extreme market conditions.&lt;/p&gt;
&lt;p&gt;Fifth, license and commercial boundaries.&lt;/p&gt;
&lt;p&gt;The README mentions that the project uses a mixed license. Personal learning, research, and commercial use may have different conditions. If you plan to put it into a commercial product or service, read the project license carefully first.&lt;/p&gt;
&lt;h2 id=&#34;who-should-study-it&#34;&gt;Who Should Study It
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;TradingAgents-CN&lt;/code&gt; is suitable for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Developers who want to learn AI Agent architecture&lt;/li&gt;
&lt;li&gt;People studying LLM financial analysis capability&lt;/li&gt;
&lt;li&gt;Quant traders who want to add natural-language analysis&lt;/li&gt;
&lt;li&gt;Teams building investment research support tools&lt;/li&gt;
&lt;li&gt;People interested in how multi-role collaboration affects decisions&lt;/li&gt;
&lt;li&gt;Users who want to experiment with trading Agents in a Chinese environment&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If your goal is only to get a simple buy/sell suggestion, this project is not the best way to use it. What is more worth studying is its workflow, roles, collaboration, and risk control, not the conclusion of one output.&lt;/p&gt;
&lt;h2 id=&#34;possible-extensions&#34;&gt;Possible Extensions
&lt;/h2&gt;&lt;p&gt;Frameworks like this have many possible extension directions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Connect more reliable data sources&lt;/li&gt;
&lt;li&gt;Add local model support&lt;/li&gt;
&lt;li&gt;Add a backtesting module&lt;/li&gt;
&lt;li&gt;Refine rules for A-shares, Hong Kong stocks, and US stocks&lt;/li&gt;
&lt;li&gt;Add industry analysis Agents&lt;/li&gt;
&lt;li&gt;Add portfolio management and position control&lt;/li&gt;
&lt;li&gt;Improve report citations and data traceability&lt;/li&gt;
&lt;li&gt;Combine Agent conclusions with traditional quant signals&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;A truly valuable financial AI system usually does not let the model decide everything alone. It embeds the model into a workflow that is verifiable, traceable, and risk-controlled.&lt;/p&gt;
&lt;h2 id=&#34;reference&#34;&gt;Reference
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/hsliuping/TradingAgents-CN&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;hsliuping/TradingAgents-CN&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;final-thought&#34;&gt;Final Thought
&lt;/h2&gt;&lt;p&gt;What makes &lt;code&gt;TradingAgents-CN&lt;/code&gt; worth watching is not whether it can predict the next candlestick, but that it breaks financial analysis into a multi-agent collaboration workflow.&lt;/p&gt;
&lt;p&gt;It is more reasonable to treat it as a learning and research tool than as an automatic money-making machine.&lt;/p&gt;
</description>
        </item>
        <item>
        <title>qmd: Local Markdown Document Search for AI Agents</title>
        <link>https://knightli.com/en/2026/05/01/qmd-markdown-search-for-ai-agents/</link>
        <pubDate>Fri, 01 May 2026 03:12:57 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/05/01/qmd-markdown-search-for-ai-agents/</guid>
        <description>&lt;p&gt;&lt;code&gt;qmd&lt;/code&gt; is a search tool for local Markdown documents, with AI Agents as its main target users.&lt;/p&gt;
&lt;p&gt;It solves a specific problem: when a project contains many &lt;code&gt;.md&lt;/code&gt; documents, AI coding assistants often do not know which file to read, which section to cite, or which instructions are current. Full-text grep can find keywords, but it does not understand meaning well. Putting all documentation into the context wastes window space and easily introduces irrelevant content.&lt;/p&gt;
&lt;p&gt;The idea behind &lt;code&gt;qmd&lt;/code&gt; is to index Markdown documents first, then return the most relevant snippets through a search interface for AI to use. It can be used as a command-line tool, integrated through an SDK, or exposed as an MCP Server for clients that support MCP.&lt;/p&gt;
&lt;h2 id=&#34;what-problem-it-solves&#34;&gt;What Problem It Solves
&lt;/h2&gt;&lt;p&gt;Real projects usually have more than one or two README files.&lt;/p&gt;
&lt;p&gt;You may have:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Architecture notes&lt;/li&gt;
&lt;li&gt;API documentation&lt;/li&gt;
&lt;li&gt;Development conventions&lt;/li&gt;
&lt;li&gt;Deployment procedures&lt;/li&gt;
&lt;li&gt;Architecture decision records&lt;/li&gt;
&lt;li&gt;Troubleshooting notes&lt;/li&gt;
&lt;li&gt;Requirement documents&lt;/li&gt;
&lt;li&gt;AI usage instructions&lt;/li&gt;
&lt;li&gt;Toolchain notes and reminders&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Humans can browse documents through directories, but AI Agents need a clear retrieval entry point. Otherwise, they may:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Read the wrong document&lt;/li&gt;
&lt;li&gt;Miss key constraints&lt;/li&gt;
&lt;li&gt;Use outdated instructions&lt;/li&gt;
&lt;li&gt;Put irrelevant content into context&lt;/li&gt;
&lt;li&gt;Invent rules in answers based on experience&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is where &lt;code&gt;qmd&lt;/code&gt; is useful. It turns local Markdown documents into a searchable knowledge source, so AI can search first when it needs context, then answer or act based on matched snippets.&lt;/p&gt;
&lt;h2 id=&#34;search-approach&#34;&gt;Search Approach
&lt;/h2&gt;&lt;p&gt;The README says &lt;code&gt;qmd&lt;/code&gt; combines several retrieval methods:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;BM25 keyword search&lt;/li&gt;
&lt;li&gt;Vector search&lt;/li&gt;
&lt;li&gt;LLM reranking&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;BM25 is good for clear keywords. If you search for a function name, configuration key, error code, or file name, it is usually direct and effective.&lt;/p&gt;
&lt;p&gt;Vector search is better for semantic questions. For example, if you ask “how does this project handle permission validation,” the documentation may not contain that exact phrase, but it may contain related descriptions about authentication, access control, and role checks.&lt;/p&gt;
&lt;p&gt;LLM reranking is used to reorder candidate results. The first two steps find potentially relevant content, and the model then judges which snippets best match the current question.&lt;/p&gt;
&lt;p&gt;This combination is more suitable for AI Agents than plain keyword search, because Agent questions are often task intentions rather than fixed keywords.&lt;/p&gt;
&lt;h2 id=&#34;why-markdown&#34;&gt;Why Markdown
&lt;/h2&gt;&lt;p&gt;Markdown is the most common documentation format in development projects.&lt;/p&gt;
&lt;p&gt;It is simple enough to store in Git and structured enough to include headings, lists, code blocks, links, and tables. For AI, Markdown is also easier to parse than PDFs, web snapshots, or screenshots.&lt;/p&gt;
&lt;p&gt;Because &lt;code&gt;qmd&lt;/code&gt; focuses on Markdown, it can process developer documentation more directly:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Split content by headings and paragraphs&lt;/li&gt;
&lt;li&gt;Preserve code blocks&lt;/li&gt;
&lt;li&gt;Preserve document paths&lt;/li&gt;
&lt;li&gt;Return snippets suitable for citation&lt;/li&gt;
&lt;li&gt;Let the Agent know which document an answer comes from&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is more stable than asking AI to randomly scan a repository, and it saves more context than putting every document into a prompt at once.&lt;/p&gt;
&lt;h2 id=&#34;three-entry-points&#34;&gt;Three Entry Points
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;qmd&lt;/code&gt; provides three entry points: CLI, SDK, and MCP Server.&lt;/p&gt;
&lt;h3 id=&#34;1-cli&#34;&gt;1. CLI
&lt;/h3&gt;&lt;p&gt;The CLI is suitable for direct terminal use and for scripts.&lt;/p&gt;
&lt;p&gt;You can index a documentation directory and then search related content with commands. For developers, the CLI is the easiest way to validate the tool: first see whether it can find the correct documents, then consider integrating it into more complex workflows.&lt;/p&gt;
&lt;p&gt;This kind of tool is useful inside local projects. For example, before changing code you can search design documents; before debugging, search troubleshooting notes; before writing an API, search API conventions.&lt;/p&gt;
&lt;h3 id=&#34;2-sdk&#34;&gt;2. SDK
&lt;/h3&gt;&lt;p&gt;The SDK is suitable for integrating &lt;code&gt;qmd&lt;/code&gt; into your own tools.&lt;/p&gt;
&lt;p&gt;If you are building an internal development assistant, documentation Q&amp;amp;A system, code review bot, or project knowledge base, you can call the search capability through the SDK instead of asking users to run commands directly.&lt;/p&gt;
&lt;p&gt;The SDK gives more control over:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Search directories&lt;/li&gt;
&lt;li&gt;Query content&lt;/li&gt;
&lt;li&gt;Number of returned results&lt;/li&gt;
&lt;li&gt;Result format&lt;/li&gt;
&lt;li&gt;Whether to pass results to a model for summarization&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This fits scenarios that need deeper integration.&lt;/p&gt;
&lt;h3 id=&#34;3-mcp-server&#34;&gt;3. MCP Server
&lt;/h3&gt;&lt;p&gt;MCP is the most valuable entry point for AI Agents.&lt;/p&gt;
&lt;p&gt;Through MCP Server, clients that support MCP can call &lt;code&gt;qmd&lt;/code&gt; as a document search tool. This lets an Agent search local Markdown documents before acting, instead of guessing project rules.&lt;/p&gt;
&lt;p&gt;A typical workflow could be:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The user asks AI to modify a feature&lt;/li&gt;
&lt;li&gt;AI calls &lt;code&gt;qmd&lt;/code&gt; to search related design documents&lt;/li&gt;
&lt;li&gt;&lt;code&gt;qmd&lt;/code&gt; returns the most relevant Markdown snippets&lt;/li&gt;
&lt;li&gt;AI modifies code based on those document constraints&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This is more natural than manually pasting all rules into a new session, and it is better suited to long-term projects.&lt;/p&gt;
&lt;h2 id=&#34;suitable-scenarios&#34;&gt;Suitable Scenarios
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;qmd&lt;/code&gt; is suitable for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Projects with many Markdown documents&lt;/li&gt;
&lt;li&gt;AI Agents that often need to look up project rules&lt;/li&gt;
&lt;li&gt;Teams that want AI answers to cite local documents&lt;/li&gt;
&lt;li&gt;Documentation spread across multiple directories&lt;/li&gt;
&lt;li&gt;Reusing the same retrieval capability across CLI, SDK, and MCP&lt;/li&gt;
&lt;li&gt;Reducing AI coding assistants’ tendency to guess project conventions&lt;/li&gt;
&lt;li&gt;Connecting local knowledge bases to Claude Desktop, Claude Code, or other MCP clients&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If your project only has one short README, directly asking AI to read the file is enough.&lt;/p&gt;
&lt;p&gt;But if the documentation has grown to dozens or hundreds of files, or if you want the Agent to search documents before acting, this type of indexing tool becomes meaningful.&lt;/p&gt;
&lt;h2 id=&#34;difference-from-grep&#34;&gt;Difference from grep
&lt;/h2&gt;&lt;p&gt;Tools such as &lt;code&gt;grep&lt;/code&gt; and &lt;code&gt;rg&lt;/code&gt; are excellent for exact search.&lt;/p&gt;
&lt;p&gt;If you know you need &lt;code&gt;DATABASE_URL&lt;/code&gt;, &lt;code&gt;authMiddleware&lt;/code&gt;, &lt;code&gt;404&lt;/code&gt;, or &lt;code&gt;docker compose&lt;/code&gt;, keyword search is usually the fastest.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;qmd&lt;/code&gt; is better when you do not know the exact words.&lt;/p&gt;
&lt;p&gt;For example, you may ask:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;What is the release process for this project?&lt;/li&gt;
&lt;li&gt;What conventions apply when adding a new API?&lt;/li&gt;
&lt;li&gt;Was the caching strategy documented before?&lt;/li&gt;
&lt;li&gt;Which documents should AI read before changing code?&lt;/li&gt;
&lt;li&gt;Where is the design background for a module?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These questions usually require semantic retrieval rather than matching one word. The BM25 + vector + reranking combination in &lt;code&gt;qmd&lt;/code&gt; is intended to make these questions find the right context more easily.&lt;/p&gt;
&lt;h2 id=&#34;relationship-with-rag&#34;&gt;Relationship with RAG
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;qmd&lt;/code&gt; can be seen as a lightweight RAG component for Markdown documents.&lt;/p&gt;
&lt;p&gt;It does not try to build a full Q&amp;amp;A system for you. It focuses on one step: finding relevant document snippets. How those snippets are used afterward can be handled by CLI, SDK, an MCP client, or your own Agent workflow.&lt;/p&gt;
&lt;p&gt;This positioning is practical. Many projects do not need a large knowledge base system; they only need AI to search local documents more accurately and quickly, then bring the results back into the current task.&lt;/p&gt;
&lt;h2 id=&#34;notes-for-use&#34;&gt;Notes for Use
&lt;/h2&gt;&lt;p&gt;First, documentation quality still matters.&lt;/p&gt;
&lt;p&gt;A retrieval tool can only find existing content. If the documents are outdated, duplicated, or contradictory, AI may still receive wrong context. Before connecting &lt;code&gt;qmd&lt;/code&gt; to an Agent, clean up the key documents first.&lt;/p&gt;
&lt;p&gt;Second, do not make the index scope too broad.&lt;/p&gt;
&lt;p&gt;Indexing every Markdown file in the repository is not always better. Dependency documentation, temporary notes, and old draft solutions can pollute results. A better approach is to define which directories are trusted documentation sources.&lt;/p&gt;
&lt;p&gt;Third, search results should preserve sources.&lt;/p&gt;
&lt;p&gt;When AI uses document snippets, it should know which file and section they came from. This makes human review traceable and reduces the risk of “this looks like a document conclusion, but it is only a model summary.”&lt;/p&gt;
&lt;p&gt;Fourth, do not replace human judgment completely.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;qmd&lt;/code&gt; can improve context recall quality, but it is not a replacement for the source of truth. Important changes still require current code, test results, and the latest requirements.&lt;/p&gt;
&lt;h2 id=&#34;suitable-teams&#34;&gt;Suitable Teams
&lt;/h2&gt;&lt;p&gt;If your team has already started putting AI Agents into daily development workflows, tools like &lt;code&gt;qmd&lt;/code&gt; can be valuable.&lt;/p&gt;
&lt;p&gt;They are especially suitable for teams that:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Write a lot of documentation&lt;/li&gt;
&lt;li&gt;Have a long project history&lt;/li&gt;
&lt;li&gt;Need both new people and AI to quickly understand context&lt;/li&gt;
&lt;li&gt;Maintain architecture decision records&lt;/li&gt;
&lt;li&gt;Have many Markdown convention documents&lt;/li&gt;
&lt;li&gt;Want AI to check rules before modifying code&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Its goal is not to make AI all-knowing. It is to make AI guess less and look things up more.&lt;/p&gt;
&lt;h2 id=&#34;reference&#34;&gt;Reference
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/tobi/qmd&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;tobi/qmd&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;final-thought&#34;&gt;Final Thought
&lt;/h2&gt;&lt;p&gt;The value of &lt;code&gt;qmd&lt;/code&gt; is that it turns local Markdown documents into a search entry point that AI Agents can reliably call.&lt;/p&gt;
&lt;p&gt;When project documentation moves from “instructions for humans” to “a context source searchable by both humans and AI,” AI coding assistants can follow project rules more easily.&lt;/p&gt;
</description>
        </item>
        <item>
        <title>Claude Code Hooks Mastery: An Introduction to 13 Hook Lifecycle Events and Automation Control</title>
        <link>https://knightli.com/en/2026/05/01/claude-code-hooks-mastery-guide/</link>
        <pubDate>Fri, 01 May 2026 03:11:27 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/05/01/claude-code-hooks-mastery-guide/</guid>
        <description>&lt;p&gt;&lt;code&gt;claude-code-hooks-mastery&lt;/code&gt; is a learning project focused on &lt;code&gt;Claude Code Hooks&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;It is not just a collection of scattered scripts. It explains the Claude Code hook lifecycle, configuration methods, script patterns, and common automation scenarios in one place. For people who want Claude Code to be more controllable and more like an engineering assistant, this kind of material is worth reading.&lt;/p&gt;
&lt;p&gt;Claude Code can already read code, edit files, and run commands by default. But if you want it to automatically check permissions, block risky operations, inject project rules, run tests, or remind it of team conventions at specific moments, chat instructions alone are not stable enough. The value of hooks is that they turn “rules I need to remind the AI about every time” into executable workflow.&lt;/p&gt;
&lt;h2 id=&#34;what-problems-hooks-solve&#34;&gt;What Problems Hooks Solve
&lt;/h2&gt;&lt;p&gt;After using Claude Code for a while, common pain points include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Every new session needs the same project rules repeated&lt;/li&gt;
&lt;li&gt;You worry that it may run commands it should not run&lt;/li&gt;
&lt;li&gt;You want checks before and after file edits&lt;/li&gt;
&lt;li&gt;You want formatting, tests, or security scans before committing&lt;/li&gt;
&lt;li&gt;You want team conventions as fixed workflow instead of verbal reminders&lt;/li&gt;
&lt;li&gt;You want context before and after tool calls for logging or blocking&lt;/li&gt;
&lt;li&gt;You want complex tasks to trigger subagents or dedicated scripts&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Hooks are designed for these “automatic actions at fixed moments.”&lt;/p&gt;
&lt;p&gt;You can think of them as event hooks in the Claude Code workflow. When a session starts, a user submits a prompt, the model is about to call a tool, a tool call finishes, or an agent is about to stop, Claude Code can run the scripts you configured.&lt;/p&gt;
&lt;h2 id=&#34;the-13-hook-lifecycle-events&#34;&gt;The 13 Hook Lifecycle Events
&lt;/h2&gt;&lt;p&gt;One of the main points in the project README is that it systematically covers the 13 Claude Code hook events.&lt;/p&gt;
&lt;p&gt;These events span multiple stages, from session startup to tool calls, and from user input to agent termination. By purpose, they can be roughly grouped as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Session startup: initialize environment and inject project context&lt;/li&gt;
&lt;li&gt;User input: inspect prompts, add rules, and perform auditing&lt;/li&gt;
&lt;li&gt;Before tool calls: permission checks, command blocking, and security validation&lt;/li&gt;
&lt;li&gt;After tool calls: log results, trigger formatting, and run verification&lt;/li&gt;
&lt;li&gt;Task ending: summarize, clean up, notify, or save state&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This lifecycle design means you do not need to put every rule into one very long prompt.&lt;/p&gt;
&lt;p&gt;For example, permission control should happen before tool calls. Formatting checks are better after file edits. Project rule injection is better at session startup or after user input. Putting rules at the right hook point is usually more reliable than stuffing everything into a system prompt.&lt;/p&gt;
&lt;h2 id=&#34;where-configuration-lives&#34;&gt;Where Configuration Lives
&lt;/h2&gt;&lt;p&gt;Claude Code hooks are usually configured through settings files.&lt;/p&gt;
&lt;p&gt;Common locations include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;User-level configuration: &lt;code&gt;~/.claude/settings.json&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Project-level configuration: &lt;code&gt;.claude/settings.json&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;User-level configuration is good for personal preferences, such as general security rules, command blocking, and log paths.&lt;/p&gt;
&lt;p&gt;Project-level configuration is better for repository-specific rules, such as which tests must run, which directories cannot be edited, how generated files are handled, and which checks are required before commit.&lt;/p&gt;
&lt;p&gt;If you use Claude Code in a team, it is better to put project-level configuration into the repository. That way everyone opens the project with the same AI collaboration constraints instead of relying on personal memory.&lt;/p&gt;
&lt;h2 id=&#34;why-single-file-scripts-matter&#34;&gt;Why Single-File Scripts Matter
&lt;/h2&gt;&lt;p&gt;The project emphasizes &lt;code&gt;UV&lt;/code&gt; single-file scripts.&lt;/p&gt;
&lt;p&gt;The benefit is simple deployment. A single Python file can declare dependencies and run without maintaining a complex environment for one hook. This fits hooks well because many hooks only do one small thing:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Check whether a command is allowed&lt;/li&gt;
&lt;li&gt;Determine whether a file path is safe&lt;/li&gt;
&lt;li&gt;Read project rules and return them to Claude&lt;/li&gt;
&lt;li&gt;Scan output for sensitive information&lt;/li&gt;
&lt;li&gt;Run formatting or tests after edits&lt;/li&gt;
&lt;li&gt;Write events to logs&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The smaller a hook script is, the easier it is to maintain, and the less likely it is to become a new complicated system.&lt;/p&gt;
&lt;h2 id=&#34;what-automation-can-hooks-do&#34;&gt;What Automation Can Hooks Do
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;claude-code-hooks-mastery&lt;/code&gt; shows many directions. In real work, the most common ones are below.&lt;/p&gt;
&lt;h3 id=&#34;1-permission-and-security-control&#34;&gt;1. Permission and Security Control
&lt;/h3&gt;&lt;p&gt;This is the most direct use of hooks.&lt;/p&gt;
&lt;p&gt;Before Claude Code executes a command, a hook can inspect the command content. If it contains high-risk actions such as deletion, reset, cleanup, or overwrite, it can block execution or require manual confirmation.&lt;/p&gt;
&lt;p&gt;Similar rules can apply to file paths:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Do not modify production configuration&lt;/li&gt;
&lt;li&gt;Do not write to secret files&lt;/li&gt;
&lt;li&gt;Do not delete migration scripts&lt;/li&gt;
&lt;li&gt;Do not touch specific directories&lt;/li&gt;
&lt;li&gt;Do not run unapproved network commands&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Putting this protection before tool calls is more reliable than writing “do not perform dangerous operations” in a prompt.&lt;/p&gt;
&lt;h3 id=&#34;2-context-injection&#34;&gt;2. Context Injection
&lt;/h3&gt;&lt;p&gt;Many projects have fixed background information:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Tech stack&lt;/li&gt;
&lt;li&gt;Coding conventions&lt;/li&gt;
&lt;li&gt;Test commands&lt;/li&gt;
&lt;li&gt;Branching strategy&lt;/li&gt;
&lt;li&gt;Directory structure&lt;/li&gt;
&lt;li&gt;Prohibited actions&lt;/li&gt;
&lt;li&gt;Rules for generated files&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Telling Claude Code this manually every time is annoying and easy to forget. Hooks can automatically inject necessary context at session startup or after the user submits a prompt.&lt;/p&gt;
&lt;p&gt;This is like giving Claude Code a project-level work manual. It does not replace the README or development documentation, but it helps AI enter the correct state before executing a task.&lt;/p&gt;
&lt;h3 id=&#34;3-verification-after-edits&#34;&gt;3. Verification After Edits
&lt;/h3&gt;&lt;p&gt;After Claude Code modifies files, hooks can automatically trigger checks.&lt;/p&gt;
&lt;p&gt;Common actions include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Run formatting&lt;/li&gt;
&lt;li&gt;Run lint&lt;/li&gt;
&lt;li&gt;Run unit tests&lt;/li&gt;
&lt;li&gt;Check type errors&lt;/li&gt;
&lt;li&gt;Scan generated files&lt;/li&gt;
&lt;li&gt;Validate Markdown or JSON format&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This helps reduce low-level mistakes. When AI edits multiple files, a lightweight verification pass after modification can reveal problems earlier.&lt;/p&gt;
&lt;p&gt;However, hooks should not run heavy tasks by default. Running the full test suite after every file change can make the experience slow. A better approach is to choose checks based on file type, directory, and task risk.&lt;/p&gt;
&lt;h3 id=&#34;4-team-rule-validation&#34;&gt;4. Team Rule Validation
&lt;/h3&gt;&lt;p&gt;If a team already has clear conventions, some of them can be placed in hooks.&lt;/p&gt;
&lt;p&gt;For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Commit message format&lt;/li&gt;
&lt;li&gt;Code style rules&lt;/li&gt;
&lt;li&gt;Do not directly edit certain generated files&lt;/li&gt;
&lt;li&gt;Documentation must be updated together&lt;/li&gt;
&lt;li&gt;API changes must update tests&lt;/li&gt;
&lt;li&gt;Certain directories can only be generated by specific tools&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This makes Claude Code more like part of the team workflow rather than an unconstrained external assistant.&lt;/p&gt;
&lt;p&gt;Of course, hooks should not replace CI. They are better for local reminders and early blocking. Final validation should still belong to CI, review, and test systems.&lt;/p&gt;
&lt;h3 id=&#34;5-subagents-and-dedicated-tasks&#34;&gt;5. Subagents and Dedicated Tasks
&lt;/h3&gt;&lt;p&gt;The README also mentions subagent-related content.&lt;/p&gt;
&lt;p&gt;This type of usage is suitable for sending complex tasks into more specialized workflows. For example, the main conversation can understand the requirement, while a hook or configuration triggers dedicated checking, auditing, summarizing, or documentation tasks.&lt;/p&gt;
&lt;p&gt;For individual users, the first useful step is not complex agent orchestration. It is better to hand repetitive, clear, low-risk actions to hooks first. More complex automation can come after the rules become stable.&lt;/p&gt;
&lt;h2 id=&#34;statusline-and-output-styles&#34;&gt;Statusline and Output Styles
&lt;/h2&gt;&lt;p&gt;The project also covers statusline and output styles.&lt;/p&gt;
&lt;p&gt;This may look like a small experience detail, but it matters for long-term Claude Code usage. A statusline can show current context, task state, environment information, or hints. Output styles can make Claude Code answers fit your working habits better.&lt;/p&gt;
&lt;p&gt;If you collaborate with AI in the same terminal every day, these details affect efficiency. Good status hints reduce mistakes and help you quickly determine whether the current session is in the right project, branch, and environment.&lt;/p&gt;
&lt;h2 id=&#34;do-not-make-hooks-too-heavy&#34;&gt;Do Not Make Hooks Too Heavy
&lt;/h2&gt;&lt;p&gt;Hooks are powerful, but they are not the place to put everything.&lt;/p&gt;
&lt;p&gt;Good rules are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;High-frequency actions should be fast&lt;/li&gt;
&lt;li&gt;Security blocking should be clear&lt;/li&gt;
&lt;li&gt;Output should be short&lt;/li&gt;
&lt;li&gt;Failure reasons should be readable&lt;/li&gt;
&lt;li&gt;Scripts should have a single responsibility&lt;/li&gt;
&lt;li&gt;Heavy checks should be explicit commands or CI tasks&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If a hook takes more than ten seconds every time, users will soon want to disable it. If a hook has vague blocking rules, both Claude Code and the user will struggle to understand what to do next.&lt;/p&gt;
&lt;p&gt;Hooks are best for tasks with clear boundaries: allow or reject, add context, log events, run lightweight checks, and suggest the next step.&lt;/p&gt;
&lt;h2 id=&#34;who-should-use-it&#34;&gt;Who Should Use It
&lt;/h2&gt;&lt;p&gt;If you only occasionally ask Claude Code to edit a small piece of code, you may not need to study hooks deeply yet.&lt;/p&gt;
&lt;p&gt;But this project is useful if you:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Use Claude Code frequently&lt;/li&gt;
&lt;li&gt;Often let AI modify real project code&lt;/li&gt;
&lt;li&gt;Worry about AI running dangerous commands&lt;/li&gt;
&lt;li&gt;Want to automatically inject team rules into AI workflows&lt;/li&gt;
&lt;li&gt;Want checks to run automatically after edits&lt;/li&gt;
&lt;li&gt;Want to turn repeated reminders into configuration&lt;/li&gt;
&lt;li&gt;Are building a more stable AI coding workflow&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Hooks are especially meaningful in collaborative projects. They can turn part of team experience into scripts instead of relying on every person to remind AI manually.&lt;/p&gt;
&lt;h2 id=&#34;notes-for-use&#34;&gt;Notes for Use
&lt;/h2&gt;&lt;p&gt;First, start with security hooks.&lt;/p&gt;
&lt;p&gt;Compared with complex automation, command blocking, path protection, and sensitive file checks are easier to implement and immediately reduce risk.&lt;/p&gt;
&lt;p&gt;Second, commit project-level rules carefully.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;.claude/settings.json&lt;/code&gt; affects everyone who uses the repository. Before committing rules, make sure they do not over-restrict normal development or depend on paths that only exist on your machine.&lt;/p&gt;
&lt;p&gt;Third, keep hook output concise.&lt;/p&gt;
&lt;p&gt;Claude Code consumes this output. If it is too long, it pollutes the context. If it is too vague, it does not guide the next step. It is best to return only the necessary judgment and next recommendation.&lt;/p&gt;
&lt;p&gt;Fourth, keep hooks debuggable.&lt;/p&gt;
&lt;p&gt;When hooks increase in number, problems can come from configuration, scripts, permissions, paths, dependencies, or Claude Code itself. Clear logs make later debugging much easier.&lt;/p&gt;
&lt;h2 id=&#34;reference&#34;&gt;Reference
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/disler/claude-code-hooks-mastery&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;disler/claude-code-hooks-mastery&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;final-thought&#34;&gt;Final Thought
&lt;/h2&gt;&lt;p&gt;The value of &lt;code&gt;Claude Code Hooks&lt;/code&gt; is turning “rules I hope AI remembers every time” into workflows that actually execute.&lt;/p&gt;
&lt;p&gt;If you already use Claude Code in real projects, hooks are a key step from “a coding assistant that can chat” toward “a constrained engineering collaborator.”&lt;/p&gt;
</description>
        </item>
        <item>
        <title>Claude-Mem: Adding Cross-Session Long-Term Memory to Claude Code</title>
        <link>https://knightli.com/en/2026/05/01/claude-mem-persistent-memory-for-claude-code/</link>
        <pubDate>Fri, 01 May 2026 03:01:02 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/05/01/claude-mem-persistent-memory-for-claude-code/</guid>
        <description>&lt;p&gt;&lt;code&gt;Claude-Mem&lt;/code&gt; is a persistent memory system for &lt;code&gt;Claude Code&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;It tries to solve a very specific problem: every time an AI coding assistant starts a new session, it often forgets earlier architecture decisions, past pitfalls, project preferences, and implementation context.&lt;br&gt;
If a project lasts for a long time, repeatedly explaining the same background becomes a waste of time.&lt;/p&gt;
&lt;p&gt;The idea behind &lt;code&gt;Claude-Mem&lt;/code&gt; is to compress Claude Code conversations into memories, store them in a local database and vector store, and then retrieve them later through a search tool.&lt;/p&gt;
&lt;h2 id=&#34;what-problem-does-it-solve&#34;&gt;What Problem Does It Solve?
&lt;/h2&gt;&lt;p&gt;Claude Code is good at code tasks, but session context is still limited.&lt;/p&gt;
&lt;p&gt;Common pain points include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A new session does not know what previous sessions did&lt;/li&gt;
&lt;li&gt;Project design decisions need to be explained repeatedly&lt;/li&gt;
&lt;li&gt;Problems that were already debugged are easy to repeat&lt;/li&gt;
&lt;li&gt;Long-running tasks lack continuity&lt;/li&gt;
&lt;li&gt;Project knowledge is hard to accumulate across conversations&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;code&gt;Claude-Mem&lt;/code&gt; is designed around these problems.&lt;/p&gt;
&lt;p&gt;It is not simply saving chat logs. Instead, it compresses conversations into memory fragments that are easier to retrieve. When needed later, semantic search can bring the relevant context back.&lt;/p&gt;
&lt;h2 id=&#34;how-it-works&#34;&gt;How It Works
&lt;/h2&gt;&lt;p&gt;From the README design, &lt;code&gt;Claude-Mem&lt;/code&gt; mainly consists of several parts.&lt;/p&gt;
&lt;p&gt;The first part is hooks.&lt;/p&gt;
&lt;p&gt;It integrates with the Claude Code session flow and captures conversation data at the right time.&lt;/p&gt;
&lt;p&gt;The second part is a background worker.&lt;/p&gt;
&lt;p&gt;The worker processes raw conversation content into shorter, more searchable memories.&lt;/p&gt;
&lt;p&gt;The third part is local storage.&lt;/p&gt;
&lt;p&gt;The project uses &lt;code&gt;SQLite&lt;/code&gt; for structured metadata and &lt;code&gt;Chroma&lt;/code&gt; for vector indexing. This preserves basic session information while supporting semantic retrieval.&lt;/p&gt;
&lt;p&gt;The fourth part is &lt;code&gt;mem-search&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;This is the query entry point for Claude Code. When old context is needed, it can search relevant memories through this tool.&lt;/p&gt;
&lt;p&gt;The overall flow can be understood like this:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Claude Code sessions generate content&lt;/li&gt;
&lt;li&gt;Hooks capture session data&lt;/li&gt;
&lt;li&gt;The worker asynchronously compresses and organizes it&lt;/li&gt;
&lt;li&gt;Memories are written to SQLite and Chroma&lt;/li&gt;
&lt;li&gt;Later sessions retrieve them through &lt;code&gt;mem-search&lt;/code&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id=&#34;when-is-it-useful&#34;&gt;When Is It Useful?
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;Claude-Mem&lt;/code&gt; is suitable for long-running projects, not one-off small tasks.&lt;/p&gt;
&lt;p&gt;For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A repository is developed over many days&lt;/li&gt;
&lt;li&gt;The code structure is complex and has a lot of background&lt;/li&gt;
&lt;li&gt;Project conventions, naming habits, and architecture choices need to be remembered&lt;/li&gt;
&lt;li&gt;Claude Code is often used for bug fixes, features, and documentation&lt;/li&gt;
&lt;li&gt;You want the AI to remember why something was changed earlier&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you only ask Claude Code to make a one-line change, long-term memory is not very meaningful.&lt;br&gt;
But if you treat Claude Code as a long-term collaborator, it becomes useful.&lt;/p&gt;
&lt;h2 id=&#34;installation-and-startup&#34;&gt;Installation and Startup
&lt;/h2&gt;&lt;p&gt;The README gives a direct installation flow:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;npm install -g claude-mem
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;claude-mem install
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Start it with:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;claude-mem start
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Check status:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;claude-mem status
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Stop it when needed:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;claude-mem stop
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;The goal behind these commands is to connect the memory system as a long-running local service to the Claude Code workflow.&lt;/p&gt;
&lt;h2 id=&#34;how-to-use-mem-search&#34;&gt;How to Use &lt;code&gt;mem-search&lt;/code&gt;
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;mem-search&lt;/code&gt; is the key entry point for retrieving memory.&lt;/p&gt;
&lt;p&gt;It is not meant to replace ordinary search. It lets Claude Code query past conversations by meaning.&lt;/p&gt;
&lt;p&gt;For example, Claude Code can search for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Why a module was designed in a certain way&lt;/li&gt;
&lt;li&gt;How a bug was debugged earlier&lt;/li&gt;
&lt;li&gt;Naming rules agreed on in the project&lt;/li&gt;
&lt;li&gt;Technical trade-offs discussed before&lt;/li&gt;
&lt;li&gt;The background behind a refactor&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is different from simple keyword search.&lt;br&gt;
If memory compression and vector indexing work well, you can retrieve semantically related content even if you do not remember the exact wording.&lt;/p&gt;
&lt;h2 id=&#34;how-is-it-different-from-project-documentation&#34;&gt;How Is It Different from Project Documentation?
&lt;/h2&gt;&lt;p&gt;Project documentation is good for stable conclusions.&lt;/p&gt;
&lt;p&gt;For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Architecture notes&lt;/li&gt;
&lt;li&gt;Deployment procedures&lt;/li&gt;
&lt;li&gt;API conventions&lt;/li&gt;
&lt;li&gt;Database structure&lt;/li&gt;
&lt;li&gt;Development rules&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;code&gt;Claude-Mem&lt;/code&gt; is better for context created during conversations.&lt;/p&gt;
&lt;p&gt;For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Why a plan was rejected&lt;/li&gt;
&lt;li&gt;How a temporary issue was worked around&lt;/li&gt;
&lt;li&gt;The discussion behind an implementation&lt;/li&gt;
&lt;li&gt;Project preferences not yet written into docs&lt;/li&gt;
&lt;li&gt;Task background accumulated across multiple conversations&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The two are not replacements for each other.&lt;br&gt;
A good workflow is to write stable knowledge into project docs and use the memory system to help retrieve conversational context.&lt;/p&gt;
&lt;h2 id=&#34;things-to-watch-out-for&#34;&gt;Things to Watch Out For
&lt;/h2&gt;&lt;p&gt;First, more long-term memory is not always better.&lt;/p&gt;
&lt;p&gt;If every conversation is saved without distinction, later retrieval can become noisy. The most valuable memories are project decisions, implementation background, debugging history, and long-term preferences.&lt;/p&gt;
&lt;p&gt;Second, memory cannot replace code and documentation.&lt;/p&gt;
&lt;p&gt;Old context found by AI is only a reference. Final judgment still depends on the current code, test results, and latest requirements.&lt;/p&gt;
&lt;p&gt;Third, pay attention to privacy and local data.&lt;/p&gt;
&lt;p&gt;Since it stores conversation content, you should know which projects are suitable for it and which sensitive information should not enter the conversation.&lt;/p&gt;
&lt;p&gt;Fourth, memory systems need maintenance.&lt;/p&gt;
&lt;p&gt;As a project moves forward, old memories may become outdated. If outdated context is reused incorrectly, it can mislead later tasks.&lt;/p&gt;
&lt;h2 id=&#34;why-this-kind-of-tool-matters&#34;&gt;Why This Kind of Tool Matters
&lt;/h2&gt;&lt;p&gt;AI coding tools are moving from one-off Q&amp;amp;A toward long-term collaboration.&lt;/p&gt;
&lt;p&gt;In one-off Q&amp;amp;A, the model only needs to answer the current question.&lt;br&gt;
In long-term collaboration, it needs to know project history, earlier decisions, team preferences, and pitfalls that have already been found.&lt;/p&gt;
&lt;p&gt;This is where tools like &lt;code&gt;Claude-Mem&lt;/code&gt; matter: they turn &amp;ldquo;remembering context&amp;rdquo; from a temporary chat capability into a local system that can be installed, run, and searched.&lt;/p&gt;
&lt;p&gt;For real engineering projects, this is more practical than simply making the model context window longer.&lt;br&gt;
Much information does not need to be stuffed into context all at once; it needs to be retrieved at the right time.&lt;/p&gt;
&lt;h2 id=&#34;who-should-try-it&#34;&gt;Who Should Try It?
&lt;/h2&gt;&lt;p&gt;You may want to try it if:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;You use Claude Code frequently&lt;/li&gt;
&lt;li&gt;You often work on the same project across multiple days&lt;/li&gt;
&lt;li&gt;The project context is complex&lt;/li&gt;
&lt;li&gt;You repeatedly explain the same background to AI&lt;/li&gt;
&lt;li&gt;You want to preserve experience from conversations&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you only use Claude Code occasionally, or the project is small, you may not need this kind of system yet.&lt;/p&gt;
&lt;h2 id=&#34;reference&#34;&gt;Reference
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/thedotmack/claude-mem&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;thedotmack/claude-mem&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;final-thought&#34;&gt;Final Thought
&lt;/h2&gt;&lt;p&gt;The point of &lt;code&gt;Claude-Mem&lt;/code&gt; is not &amp;ldquo;saving chat logs.&amp;rdquo; It is helping Claude Code retrieve useful context in later tasks.&lt;/p&gt;
&lt;p&gt;As AI coding moves from one-off tasks to long-running project collaboration, memory systems will become increasingly important.&lt;br&gt;
They cannot replace documentation and tests, but they can reduce repeated explanations and make the AI feel more like an assistant that understands project history.&lt;/p&gt;
</description>
        </item>
        <item>
        <title>Claude.md Is Not Better When It Is Longer: How to Write Global Memory Files for AI Coding</title>
        <link>https://knightli.com/en/2026/04/29/how-to-write-claude-md-for-ai-coding/</link>
        <pubDate>Wed, 29 Apr 2026 21:07:37 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/04/29/how-to-write-claude-md-for-ai-coding/</guid>
        <description>&lt;p&gt;I recently saw a discussion about global memory files for AI coding: after projects add files such as &lt;code&gt;Claude.md&lt;/code&gt; or &lt;code&gt;AGENTS.md&lt;/code&gt;, the results do not necessarily improve. In some cases, success rates may even drop while reasoning cost rises.&lt;/p&gt;
&lt;p&gt;At first, this feels counterintuitive. We usually assume that if we give AI more project background, more rules, and more explanation, it should write code more accurately.&lt;br&gt;
The real issue is that &lt;code&gt;Claude.md&lt;/code&gt; is not an ordinary document. It is a global memory file that gets injected into the context on every conversation. The more it contains, the more the model has to read every time; the vaguer it is, the more judgment the model has to make; and if it contains workflows that should not always run, the model may trigger unnecessary actions in unrelated tasks.&lt;/p&gt;
&lt;p&gt;So the hard part of writing &lt;code&gt;Claude.md&lt;/code&gt; is not making it complete. It is deciding which pieces of information deserve to occupy context permanently.&lt;/p&gt;
&lt;h2 id=&#34;what-claudemd-is&#34;&gt;What Claude.md Is
&lt;/h2&gt;&lt;p&gt;In AI coding tools, files such as &lt;code&gt;Claude.md&lt;/code&gt; and &lt;code&gt;AGENTS.md&lt;/code&gt; are essentially global memory files.&lt;/p&gt;
&lt;p&gt;Normal conversation enters the context, but context length is limited. Once the conversation becomes long, historical content is compressed and some details are lost. A global memory file fixes important rules in place so the model can see them at the beginning of every task.&lt;/p&gt;
&lt;p&gt;This means two things:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Content written there is harder to forget&lt;/li&gt;
&lt;li&gt;Content written there also costs something on every task&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It is not like a README that is read only when needed. It is more like a long-lived set of working constraints. Once something is placed there, it affects the model&amp;rsquo;s judgment by default.&lt;/p&gt;
&lt;p&gt;Therefore, &lt;code&gt;Claude.md&lt;/code&gt; is not a project introduction, not a collection of tips, and not a place to dump every development process. It should only store rules that the model is likely to violate repeatedly if it does not know them.&lt;/p&gt;
&lt;h2 id=&#34;why-it-can-make-things-worse&#34;&gt;Why It Can Make Things Worse
&lt;/h2&gt;&lt;p&gt;A poorly written global memory file usually causes three kinds of problems.&lt;/p&gt;
&lt;p&gt;First, it consumes context.&lt;/p&gt;
&lt;p&gt;If &lt;code&gt;Claude.md&lt;/code&gt; has one thousand lines, those lines stay in the model context for a long time. Code, error messages, and requirements that are actually relevant to the current task may get squeezed. Context is not free space. The larger the global rule file, the easier it is to dilute the current task.&lt;/p&gt;
&lt;p&gt;Second, it can trigger unnecessary behavior.&lt;/p&gt;
&lt;p&gt;For example, a global file might say:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;Before every task, fully read the project directory.
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;After every change, run a complete end-to-end test.
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;These lines look responsible, but in a global memory file they become &amp;ldquo;do this for every task.&amp;rdquo; Even if the task is only changing one line of copy, the model may perform unnecessary exploration and tests because of these rules. The result is slower work, higher cost, and sometimes more interference.&lt;/p&gt;
&lt;p&gt;Third, it increases the burden of judgment.&lt;/p&gt;
&lt;p&gt;Statements like &amp;ldquo;keep code elegant, concise, maintainable, and extensible&amp;rdquo; sound correct, but they are weak constraints. Every time the model generates code, it has to decide what elegant or extensible means, without receiving a clear boundary.&lt;/p&gt;
&lt;p&gt;A better approach is to write concrete prohibitions or counterexamples instead of abstract virtues. For example:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;Do not add a generic abstraction for a single call site.
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;Do not change shared parsing logic without test coverage.
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;Do not put temporary scripts in the application source directory.
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;These rules are more specific and easier to follow.&lt;/p&gt;
&lt;h2 id=&#34;what-should-go-in&#34;&gt;What Should Go In
&lt;/h2&gt;&lt;p&gt;You can use a simple standard to decide whether something belongs in &lt;code&gt;Claude.md&lt;/code&gt;:&lt;/p&gt;
&lt;p&gt;If the AI will repeatedly make the same mistake without it, then it is worth writing down.&lt;/p&gt;
&lt;p&gt;Content suitable for a global memory file usually has these traits:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;It is durable&lt;/li&gt;
&lt;li&gt;It is strongly tied to the current repository&lt;/li&gt;
&lt;li&gt;It cannot be naturally inferred from the code structure&lt;/li&gt;
&lt;li&gt;It clearly changes model behavior&lt;/li&gt;
&lt;li&gt;It is preferably a constraint, prohibition, path rule, or fixed command&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For example:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;4
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;For all Hugo posts, only edit index.zh-cn.md and do not automatically generate other language versions.
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;Article front matter must include title/date/draft/tags/categories/slug/description.
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;Do not modify generated artifacts under public/.
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;On PowerShell, use scripts/deploy.ps1 for deployment.
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;These are not vague suggestions. They are tied to how the repository actually works. If the model does not know them, it may make mistakes; once it knows them, it can avoid real missteps.&lt;/p&gt;
&lt;h2 id=&#34;what-should-stay-out&#34;&gt;What Should Stay Out
&lt;/h2&gt;&lt;p&gt;Many people turn &lt;code&gt;Claude.md&lt;/code&gt; into a project manual. That is usually unnecessary.&lt;/p&gt;
&lt;p&gt;Content that generally does not belong there includes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Project vision and background&lt;/li&gt;
&lt;li&gt;Large directory structure descriptions&lt;/li&gt;
&lt;li&gt;Temporary task plans&lt;/li&gt;
&lt;li&gt;One-off debugging steps&lt;/li&gt;
&lt;li&gt;Abstract code quality slogans&lt;/li&gt;
&lt;li&gt;Long workflows that are only needed in a few situations&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For example, a description like &amp;ldquo;this is an e-commerce project with product, order, and user modules&amp;rdquo; helps very little with a concrete coding task. During real development, the model should rely on the current requirement, specification, code structure, and tests, not on a rough project introduction in global memory.&lt;/p&gt;
&lt;p&gt;The same applies to directory structure. Unless a directory has a special convention, such as &amp;ldquo;shared components must be imported from this directory,&amp;rdquo; there is no need to write the entire tree into the file. The model can read the project directory itself. A static directory description is easy to become stale.&lt;/p&gt;
&lt;h2 id=&#34;workflows-belong-in-skills-or-commands&#34;&gt;Workflows Belong in Skills or Commands
&lt;/h2&gt;&lt;p&gt;If a section says &amp;ldquo;first do this, then do that, then do the third thing,&amp;rdquo; it may not belong in &lt;code&gt;Claude.md&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Long-lived workflows can be turned into skills, scripts, or commands. The benefit is that the global memory only needs to keep the name and trigger condition, while the detailed steps are loaded only when needed.&lt;/p&gt;
&lt;p&gt;For example:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;When the user asks to translate a Hugo post, use the post-translate skill.
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;When the user asks to deploy the site, run the hugo-rsync-deploy workflow.
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;This is lighter than putting the full translation and deployment processes into &lt;code&gt;Claude.md&lt;/code&gt;. Global memory stays short, and detailed workflows live in triggerable tools.&lt;/p&gt;
&lt;p&gt;Claude&amp;rsquo;s newer initialization flow is also moving in this direction. It does not only generate a &lt;code&gt;Claude.md&lt;/code&gt;; it also tries to split reusable workflows into skills and fixed events into hooks. The underlying idea is clear: global memory should be an entry point, while details should be loaded on demand.&lt;/p&gt;
&lt;h2 id=&#34;claudemd-needs-iteration&#34;&gt;Claude.md Needs Iteration
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;Claude.md&lt;/code&gt; should not be written once and then ignored.&lt;/p&gt;
&lt;p&gt;A better approach is to keep it short at first and let real tasks expose problems. If an error happens once, handle it manually. If the same kind of error appears two or more times, it may deserve to become a global rule.&lt;/p&gt;
&lt;p&gt;This kind of iteration is more useful than writing a huge set of rules at the beginning. Early on, you do not know which rules are truly useful or which lines will become noise. As the project grows, collaboration increases, and the model&amp;rsquo;s behavior becomes clearer, you can gradually add the high-frequency problems.&lt;/p&gt;
&lt;p&gt;There is also an important trend: the stronger the model, the shorter the global memory file should become.&lt;/p&gt;
&lt;p&gt;Many requirements that once had to be written into prompts are now handled naturally by the model. Continuing to put those basic requirements into &lt;code&gt;Claude.md&lt;/code&gt; only increases context load. Global memory should shrink as model capability improves, keeping only what is unique to this repository and cannot be inferred automatically.&lt;/p&gt;
&lt;h2 id=&#34;a-more-practical-way-to-write-it&#34;&gt;A More Practical Way to Write It
&lt;/h2&gt;&lt;p&gt;When writing &lt;code&gt;Claude.md&lt;/code&gt;, think in this order:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;What special conventions does this repository have?&lt;/li&gt;
&lt;li&gt;Which mistakes has the model made more than once?&lt;/li&gt;
&lt;li&gt;Which directories, files, or commands must never be misused?&lt;/li&gt;
&lt;li&gt;Which workflows should become skills, scripts, or commands instead of permanent context?&lt;/li&gt;
&lt;li&gt;Which parts are merely introductions and can be deleted?&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The final file may be only a few dozen lines. It does not need to fully explain the project. It needs to constrain behavior precisely.&lt;/p&gt;
&lt;p&gt;A good &lt;code&gt;Claude.md&lt;/code&gt; might look like this:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;5
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;6
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;7
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;# Working Rules
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;- Only edit files related to the current task.
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;- Do not modify generated artifact directories such as public/ or resources/.
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;- Hugo post rewrites only process index.zh-cn.md and do not generate other language versions.
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;- If deployment is involved, run the Hugo build first, then execute the existing rsync script.
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;- When there are existing user changes, do not revert them. Continue from the current state.
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;It is short, but every line affects real behavior. That is the kind of content worth keeping in context permanently.&lt;/p&gt;
&lt;h2 id=&#34;final-thought&#34;&gt;Final Thought
&lt;/h2&gt;&lt;p&gt;The value of &lt;code&gt;Claude.md&lt;/code&gt; is not to make AI &amp;ldquo;know more.&amp;rdquo; It is to make AI &amp;ldquo;avoid fixed mistakes.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;It is not a knowledge base or project encyclopedia. It is a long-lived constraint file for AI coding.&lt;br&gt;
The more specific, shorter, and closer to real mistakes it is, the more useful it becomes. The more generic, longer, and more like a project introduction it is, the more likely it is to slow the model down or even make results worse.&lt;/p&gt;
&lt;p&gt;Treat global memory as a scarce resource, not an unlimited scratchpad. That may be the most important principle for writing a good &lt;code&gt;Claude.md&lt;/code&gt;.&lt;/p&gt;
</description>
        </item>
        <item>
        <title>Codex Is Starting to Control the Computer. What Does That Mean for the Future?</title>
        <link>https://knightli.com/en/2026/04/29/codex-computer-use-update/</link>
        <pubDate>Wed, 29 Apr 2026 11:28:25 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/04/29/codex-computer-use-update/</guid>
        <description>&lt;p&gt;The most important part of this Codex update is not that it added another ordinary button. It is that Codex is starting to move toward &amp;ldquo;controlling the computer.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;In the past, using AI usually meant asking questions in a chat box, copying, pasting, and then manually operating software.&lt;br&gt;
Now that boundary is expanding: AI does not just answer you. It can operate desktop applications according to your goal.&lt;/p&gt;
&lt;p&gt;In the short term, this is a new feature. In the long term, it may change how many people use computers.&lt;/p&gt;
&lt;h2 id=&#34;what-this-feature-is&#34;&gt;What This Feature Is
&lt;/h2&gt;&lt;p&gt;Simply put, Codex&amp;rsquo;s computer use capability lets it access and operate the desktop environment.&lt;/p&gt;
&lt;p&gt;It can do things such as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;select and control an application&lt;/li&gt;
&lt;li&gt;receive tasks in natural language&lt;/li&gt;
&lt;li&gt;open browsers, AI tools, local files, or other software&lt;/li&gt;
&lt;li&gt;enter text, click buttons, and wait for results&lt;/li&gt;
&lt;li&gt;connect multiple steps into one task&lt;/li&gt;
&lt;li&gt;keep running in the background without requiring the user to follow every step manually&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Its role is not just to write a piece of text for you, but to complete an operation flow for you.&lt;/p&gt;
&lt;p&gt;That is the key difference between an Agent and an ordinary chatbot:&lt;br&gt;
a chatbot mainly gives answers; an Agent is closer to &amp;ldquo;receiving a goal and then executing it.&amp;rdquo;&lt;/p&gt;
&lt;h2 id=&#34;why-this-matters&#34;&gt;Why This Matters
&lt;/h2&gt;&lt;p&gt;In the past, much automation required you to know how to write scripts.&lt;/p&gt;
&lt;p&gt;For example, suppose you want to complete a cross-software workflow:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;open a web page&lt;/li&gt;
&lt;li&gt;find information&lt;/li&gt;
&lt;li&gt;copy content&lt;/li&gt;
&lt;li&gt;pass it to another AI tool&lt;/li&gt;
&lt;li&gt;save a file&lt;/li&gt;
&lt;li&gt;open the local directory and check the result&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;To automate this traditionally, you might need browser scripts, APIs, local programs, and even window automation.&lt;/p&gt;
&lt;p&gt;But many ordinary users do not know how to write these things.&lt;br&gt;
Even if they do, it may not be worth writing a script for a temporary task.&lt;/p&gt;
&lt;p&gt;This is where computer use matters: it pushes &amp;ldquo;script-like capability&amp;rdquo; toward natural language.&lt;/p&gt;
&lt;p&gt;You do not necessarily need to tell it exactly where to click.&lt;br&gt;
You can tell it what result you want and let it try to complete the task.&lt;/p&gt;
&lt;h2 id=&#34;workflows-it-may-change&#34;&gt;Workflows It May Change
&lt;/h2&gt;&lt;p&gt;I think the first workflows to change will not be extremely serious or high-risk work, but the tasks that are annoying, fragmented, repetitive, and not worth writing a dedicated program for.&lt;/p&gt;
&lt;h3 id=&#34;1-moving-information-across-software&#34;&gt;1. Moving Information Across Software
&lt;/h3&gt;&lt;p&gt;The most typical case is moving information between applications.&lt;/p&gt;
&lt;p&gt;Previously, you might switch back and forth between a browser, a document, a chat window, and a local folder.&lt;br&gt;
In the future, you can hand this kind of task to an Agent:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;find a certain kind of information&lt;/li&gt;
&lt;li&gt;summarize it into a document&lt;/li&gt;
&lt;li&gt;save it to a specified directory&lt;/li&gt;
&lt;li&gt;open the result for you to review&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This work is not hard, but it consumes attention.&lt;br&gt;
The value of an Agent is that it absorbs these small operations.&lt;/p&gt;
&lt;h3 id=&#34;2-coordination-between-multiple-ai-tools&#34;&gt;2. Coordination Between Multiple AI Tools
&lt;/h3&gt;&lt;p&gt;Many people&amp;rsquo;s real workflow is no longer based on a single AI tool.&lt;/p&gt;
&lt;p&gt;It may look like this:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;one tool writes code&lt;/li&gt;
&lt;li&gt;one tool researches information&lt;/li&gt;
&lt;li&gt;one tool generates images&lt;/li&gt;
&lt;li&gt;one tool organizes documents&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Previously, these tools were connected by manual copy and paste.&lt;br&gt;
In the future, an Agent can become the middle layer: it opens tools, passes context, waits for output, and organizes results.&lt;/p&gt;
&lt;p&gt;This can turn &amp;ldquo;multiple AI tools working together&amp;rdquo; from a manual process into a semi-automated process.&lt;/p&gt;
&lt;h3 id=&#34;3-office-software-automation&#34;&gt;3. Office Software Automation
&lt;/h3&gt;&lt;p&gt;Spreadsheets, presentations, documents, and email share one trait: they are powerful, but many operations are fragmented.&lt;/p&gt;
&lt;p&gt;If Agents can reliably control this software, the barrier to office automation will drop noticeably.&lt;/p&gt;
&lt;p&gt;You do not need to remember where a menu is or learn complicated shortcuts.&lt;br&gt;
You only need to describe the goal, such as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;turn this spreadsheet into a monthly report&lt;/li&gt;
&lt;li&gt;make a one-page summary from this document&lt;/li&gt;
&lt;li&gt;combine these materials into a clearly structured explanation&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The tedious button operations will gradually be hidden behind natural language.&lt;/p&gt;
&lt;h2 id=&#34;what-it-means-for-ordinary-users&#34;&gt;What It Means for Ordinary Users
&lt;/h2&gt;&lt;p&gt;For ordinary users, this kind of feature may have a more direct impact than &amp;ldquo;the model got a bit smarter.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Because it lowers the operation barrier, not just the knowledge barrier.&lt;/p&gt;
&lt;p&gt;Many people can describe what they want, but they do not know where to click or how to combine features inside software.&lt;br&gt;
If Agents can take over this part, using a computer may become:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;I describe the goal
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;Agent operates the software
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;I check the result
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;That is closer to real productivity than simple chat.&lt;/p&gt;
&lt;h2 id=&#34;its-impact-on-software&#34;&gt;Its Impact on Software
&lt;/h2&gt;&lt;p&gt;If this kind of Agent capability continues to mature, software itself will also be affected.&lt;/p&gt;
&lt;p&gt;In the past, software design mainly served human clicking.&lt;br&gt;
In the future, software may also need to serve Agent operation.&lt;/p&gt;
&lt;p&gt;This means:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;interface elements need to be clearer&lt;/li&gt;
&lt;li&gt;operation feedback needs to be more stable&lt;/li&gt;
&lt;li&gt;local permissions need to be more granular&lt;/li&gt;
&lt;li&gt;software may provide interfaces better suited for Agent calls&lt;/li&gt;
&lt;li&gt;users may care more about whether software can be operated smoothly by AI&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In the long run, the boundaries between applications may become thinner.&lt;br&gt;
Users may care less about &amp;ldquo;which app should I open&amp;rdquo; and more about &amp;ldquo;what task do I want to complete.&amp;rdquo;&lt;/p&gt;
&lt;h2 id=&#34;do-not-overhype-it-yet&#34;&gt;Do Not Overhype It Yet
&lt;/h2&gt;&lt;p&gt;Of course, it is not time to fully let go yet.&lt;/p&gt;
&lt;p&gt;This kind of capability still has several clear limitations:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;stability still needs observation&lt;/li&gt;
&lt;li&gt;complex tasks may fail in the middle&lt;/li&gt;
&lt;li&gt;permission boundaries must be handled carefully&lt;/li&gt;
&lt;li&gt;account, payment, and file deletion operations should not be delegated casually&lt;/li&gt;
&lt;li&gt;quota consumption is not something you can completely ignore&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So at this stage, the best use case is not letting it take over the whole computer, but letting it handle low-risk, reviewable, step-heavy tasks.&lt;/p&gt;
&lt;p&gt;For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;organizing materials&lt;/li&gt;
&lt;li&gt;generating drafts&lt;/li&gt;
&lt;li&gt;moving content across tools&lt;/li&gt;
&lt;li&gt;opening and checking files&lt;/li&gt;
&lt;li&gt;running semi-automated workflows that can be reviewed by a human&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;one-last-line&#34;&gt;One Last Line
&lt;/h2&gt;&lt;p&gt;The real importance of this Codex update is that it pushes AI from &amp;ldquo;answering questions&amp;rdquo; toward &amp;ldquo;operating the environment.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;In the short term, it is a computer use feature.&lt;br&gt;
In the long term, it may mark a shift in how personal computers are used.&lt;/p&gt;
&lt;p&gt;In the future, we may spend less time remembering buttons, finding menus, and switching windows.&lt;br&gt;
More often, we will describe the goal, let an Agent execute it, and then let humans make the final judgment.&lt;/p&gt;
</description>
        </item>
        <item>
        <title>Why Does a Codex Skill Exist in the Directory but Still Not Show Up?</title>
        <link>https://knightli.com/en/2026/04/29/codex-skill-not-loaded-because-of-utf-8-bom/</link>
        <pubDate>Wed, 29 Apr 2026 11:18:00 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/04/29/codex-skill-not-loaded-because-of-utf-8-bom/</guid>
        <description>&lt;p&gt;This problem was easy to miss: several skills were already placed under &lt;code&gt;~/.codex/skills&lt;/code&gt;, but after opening a new Codex thread, the sidebar still showed only a small subset of them.&lt;/p&gt;
&lt;p&gt;At first, it looked like a cache or indexing issue. The real cause was more specific: several &lt;code&gt;SKILL.md&lt;/code&gt; files started with a UTF-8 BOM. Codex 0.111.0&amp;rsquo;s skill loader did not skip that byte sequence, so it misjudged the files as having no valid YAML front matter.&lt;/p&gt;
&lt;h2 id=&#34;symptom&#34;&gt;Symptom
&lt;/h2&gt;&lt;p&gt;The local directory contained these skills:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;4
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;~/.codex/skills/git-commit-push/SKILL.md
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;~/.codex/skills/hugo-rsync-deploy/SKILL.md
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;~/.codex/skills/bilibili-speech-transcriber/SKILL.md
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;~/.codex/skills/product-cutout-normalize/SKILL.md
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;But after opening a new thread, the actually exposed skills were only:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;bilibili-speech-transcriber
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;product-cutout-normalize
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;In other words, a file existing on disk does not mean the current session can load it successfully. Codex parses the front matter of each &lt;code&gt;SKILL.md&lt;/code&gt; first. If parsing fails, that skill is excluded directly.&lt;/p&gt;
&lt;h2 id=&#34;investigation&#34;&gt;Investigation
&lt;/h2&gt;&lt;p&gt;Starting a fresh session with &lt;code&gt;codex exec&lt;/code&gt; showed a more direct error. In VS Code or other IDEs, these logs may not be visible:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;failed to load skill C:\Users\knightli\.codex\skills\git-commit-push\SKILL.md: missing YAML frontmatter delimited by ---
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;failed to load skill C:\Users\knightli\.codex\skills\hugo-rsync-deploy\SKILL.md: missing YAML frontmatter delimited by ---
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Visually, these files seemed to have a normal header:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;4
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-md&#34; data-lang=&#34;md&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;---
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;name: post-rewrite
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;description: ...
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;---
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;The real problem was at the byte level.&lt;/p&gt;
&lt;p&gt;The beginning of a failing file was:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;EF-BB-BF-2D-2D-2D
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;The beginning of a file that loaded correctly was:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;2D-2D-2D
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;&lt;code&gt;2D-2D-2D&lt;/code&gt; is &lt;code&gt;---&lt;/code&gt;. The preceding &lt;code&gt;EF-BB-BF&lt;/code&gt; is the UTF-8 BOM.&lt;/p&gt;
&lt;h2 id=&#34;cause&#34;&gt;Cause
&lt;/h2&gt;&lt;p&gt;In Codex 0.111.0, the skill loader expects the first byte of &lt;code&gt;SKILL.md&lt;/code&gt; to be the first &lt;code&gt;-&lt;/code&gt; in &lt;code&gt;---&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;If the file starts with a UTF-8 BOM, the actual beginning becomes:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;BOM + ---
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;So the loader thinks the file does not start with the front matter delimiter and reports:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;missing YAML frontmatter delimited by ---
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;The skill content was not wrong, and the directory was not wrong either. A small encoding detail prevented the parser from recognizing the file.&lt;/p&gt;
&lt;h2 id=&#34;fix&#34;&gt;Fix
&lt;/h2&gt;&lt;p&gt;Convert the affected &lt;code&gt;SKILL.md&lt;/code&gt; files to UTF-8 without BOM.&lt;/p&gt;
&lt;p&gt;In PowerShell, this can be done like this:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt; 1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 5
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 6
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 7
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 8
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 9
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;10
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;11
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-powershell&#34; data-lang=&#34;powershell&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nv&#34;&gt;$paths&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;vm&#34;&gt;@&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;s1&#34;&gt;&amp;#39;C:\Users\knightli\.codex\skills\git-commit-push\SKILL.md&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;s1&#34;&gt;&amp;#39;C:\Users\knightli\.codex\skills\hugo-rsync-deploy\SKILL.md&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nv&#34;&gt;$utf8NoBom&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;nb&#34;&gt;New-Object&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;System&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;py&#34;&gt;Text&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;py&#34;&gt;UTF8Encoding&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;vm&#34;&gt;$false&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;k&#34;&gt;foreach&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;nv&#34;&gt;$p&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;in&lt;/span&gt; &lt;span class=&#34;nv&#34;&gt;$paths&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;nv&#34;&gt;$text&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;[&lt;/span&gt;&lt;span class=&#34;no&#34;&gt;IO.File&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;]::&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ReadAllText&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;nv&#34;&gt;$p&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;[&lt;/span&gt;&lt;span class=&#34;no&#34;&gt;Text.Encoding&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;]::&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;UTF8&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;p&#34;&gt;[&lt;/span&gt;&lt;span class=&#34;no&#34;&gt;IO.File&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;]::&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;WriteAllText&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;nv&#34;&gt;$p&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;nv&#34;&gt;$text&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;nv&#34;&gt;$utf8NoBom&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;After processing, the file header should change from:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;EF-BB-BF-2D-2D-2D
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;to:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;2D-2D-2D
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h2 id=&#34;verification&#34;&gt;Verification
&lt;/h2&gt;&lt;p&gt;After restarting a Codex session, the visible skills were restored to:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;4
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;git-commit-push-zh
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;hugo-rsync-deploy
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;bilibili-speech-transcriber
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;product-cutout-normalize
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;If the sidebar still shows the old list, close the current Codex sidebar or window and reopen the project. The skill list is usually loaded when the session starts, so changes made in the middle of a session may not refresh immediately.&lt;/p&gt;
&lt;h2 id=&#34;one-last-line&#34;&gt;One Last Line
&lt;/h2&gt;&lt;p&gt;This kind of issue is easy to mistake for &amp;ldquo;Codex did not re-index&amp;rdquo; or &amp;ldquo;the skill was not installed correctly.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;When troubleshooting, check these three things first:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;whether &lt;code&gt;SKILL.md&lt;/code&gt; is really in the correct directory&lt;/li&gt;
&lt;li&gt;whether the file has valid &lt;code&gt;---&lt;/code&gt; front matter at the top&lt;/li&gt;
&lt;li&gt;whether the file is UTF-8 without BOM&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The key in this case was the third point: the file looked fine, but its first byte was not &lt;code&gt;-&lt;/code&gt;, so Codex did not treat it as a valid skill.&lt;/p&gt;
</description>
        </item>
        <item>
        <title>What Is the Difference Between ~/.codex/skills and Project .codex/skills in Codex</title>
        <link>https://knightli.com/en/2026/04/29/difference-between-global-and-project-codex-skills/</link>
        <pubDate>Wed, 29 Apr 2026 11:08:00 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/04/29/difference-between-global-and-project-codex-skills/</guid>
        <description>&lt;p&gt;When organizing Codex skills, people most often get stuck on two questions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;What is the difference between &lt;code&gt;~/.codex/skills&lt;/code&gt; and &lt;code&gt;project/.codex/skills&lt;/code&gt;?&lt;/li&gt;
&lt;li&gt;Why does a skill exist in the directory but not appear in the current session?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Here is the short version.&lt;/p&gt;
&lt;h2 id=&#34;the-difference&#34;&gt;The Difference
&lt;/h2&gt;&lt;p&gt;The simplest way to remember it:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;~/.codex/skills&lt;/code&gt; is your global skill library&lt;/li&gt;
&lt;li&gt;&lt;code&gt;project/.codex/skills&lt;/code&gt; is the local skill library for that repository&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;codexskills&#34;&gt;&lt;code&gt;~/.codex/skills&lt;/code&gt;
&lt;/h3&gt;&lt;p&gt;Use it for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;skills you personally reuse across projects&lt;/li&gt;
&lt;li&gt;general workflows that are not tied to a specific repository&lt;/li&gt;
&lt;li&gt;workflows that clearly belong to your own habits&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;post-rewrite&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;post-translate&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;git-commit-push&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;hugo-rsync-deploy&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;bilibili-speech-transcriber&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The key trait of this kind of skill is: &lt;strong&gt;it still makes sense outside the current project.&lt;/strong&gt;&lt;/p&gt;
&lt;h3 id=&#34;projectcodexskills&#34;&gt;&lt;code&gt;project/.codex/skills&lt;/code&gt;
&lt;/h3&gt;&lt;p&gt;Use it for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;workflows that only apply to this repository&lt;/li&gt;
&lt;li&gt;rules tightly coupled to the current project structure, scripts, or templates&lt;/li&gt;
&lt;li&gt;skills that should be shared by the team&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;a publishing workflow specific to this repository&lt;/li&gt;
&lt;li&gt;a generation template that only works in this project&lt;/li&gt;
&lt;li&gt;automation steps tightly bound to private project scripts&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The key trait of this kind of skill is: &lt;strong&gt;it stops being meaningful once it leaves this repository.&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id=&#34;when-to-use-global-and-when-to-use-project-skills&#34;&gt;When to Use Global and When to Use Project Skills
&lt;/h2&gt;&lt;p&gt;This rule of thumb is enough:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;If it is about your personal habits, put it in &lt;code&gt;~/.codex/skills&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;If it is about repository rules, put it in &lt;code&gt;project/.codex/skills&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;If it can be reused across projects, prefer global&lt;/li&gt;
&lt;li&gt;If it should be shared by multiple people and evolve with the repository, prefer project-level&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;the-current-repository&#34;&gt;The Current Repository
&lt;/h2&gt;&lt;p&gt;Based on the current state:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;your machine has &lt;code&gt;~/.codex/skills&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;this repository does not have &lt;code&gt;.codex/skills&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So right now, you mainly rely on global skills.&lt;/p&gt;
&lt;p&gt;That means workflows such as &lt;code&gt;post-rewrite&lt;/code&gt;, &lt;code&gt;post-translate&lt;/code&gt;, and &lt;code&gt;git-commit-push&lt;/code&gt; are currently more like part of your personal workflow, not something explicitly bundled with this repository.&lt;/p&gt;
&lt;h2 id=&#34;why-a-skill-exists-on-disk-but-may-not-appear-in-the-current-session&#34;&gt;Why a Skill Exists on Disk but May Not Appear in the Current Session
&lt;/h2&gt;&lt;p&gt;There are two different things here:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Existing on disk&lt;/strong&gt;: the skill file exists in a local directory&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Exposed to the session&lt;/strong&gt;: the current session registered it into the available skill list&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These are not the same thing.&lt;/p&gt;
&lt;p&gt;So this can happen:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;a skill already exists under &lt;code&gt;~/.codex/skills&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;but it does not appear in the list after &lt;code&gt;/&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This usually does not mean the skill is broken. More often, it means: &lt;strong&gt;the current session has not re-indexed it.&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id=&#34;how-to-make-a-skill-available-in-the-current-session&#34;&gt;How to Make a Skill Available in the Current Session
&lt;/h2&gt;&lt;p&gt;The practical checklist is short.&lt;/p&gt;
&lt;h3 id=&#34;1-put-it-in-the-right-directory&#34;&gt;1. Put It in the Right Directory
&lt;/h3&gt;&lt;p&gt;Global:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;~/.codex/skills/&amp;lt;skill-name&amp;gt;/SKILL.md
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Project-level:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;project/.codex/skills/&amp;lt;skill-name&amp;gt;/SKILL.md
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h3 id=&#34;2-make-the-skillmd-header-recognizable&#34;&gt;2. Make the &lt;code&gt;SKILL.md&lt;/code&gt; Header Recognizable
&lt;/h3&gt;&lt;p&gt;At minimum, it needs:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;4
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-md&#34; data-lang=&#34;md&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;---
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;name: your-skill-name
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;description: What this skill does
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;---
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h3 id=&#34;3-open-a-new-session-after-creating-or-editing-it&#34;&gt;3. Open a New Session After Creating or Editing It
&lt;/h3&gt;&lt;p&gt;In many cases, a skill does not appear because the current session already fixed its available skill list when it started.&lt;/p&gt;
&lt;p&gt;So if you create a skill in the middle of a session, it may already exist on disk, but this session may not recognize it.&lt;/p&gt;
&lt;p&gt;The most reliable workflow is:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Put the skill in place&lt;/li&gt;
&lt;li&gt;End the current session&lt;/li&gt;
&lt;li&gt;Re-enter the project&lt;/li&gt;
&lt;li&gt;Open a new session&lt;/li&gt;
&lt;li&gt;Check whether it appears under &lt;code&gt;/&lt;/code&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;h3 id=&#34;4-put-project-skills-in-place-before-starting&#34;&gt;4. Put Project Skills in Place Before Starting
&lt;/h3&gt;&lt;p&gt;If you want &lt;code&gt;project/.codex/skills&lt;/code&gt; to be recognized more reliably, put those skills into the project before entering the repository and starting the session.&lt;/p&gt;
&lt;h2 id=&#34;one-last-line&#34;&gt;One Last Line
&lt;/h2&gt;&lt;p&gt;The shortest conclusion is:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;~/.codex/skills&lt;/code&gt; is your personal skill library&lt;/li&gt;
&lt;li&gt;&lt;code&gt;project/.codex/skills&lt;/code&gt; is the repository&amp;rsquo;s local rule library&lt;/li&gt;
&lt;li&gt;a skill existing in the directory does not mean the current session will always show it&lt;/li&gt;
&lt;li&gt;the most common fix is to put it in the right directory, write a valid &lt;code&gt;SKILL.md&lt;/code&gt;, and then start a new session&lt;/li&gt;
&lt;/ul&gt;
</description>
        </item>
        <item>
        <title>Ralph and Multi-Agent Collaboration: How to Keep AI Working Reliably Over Long Tasks</title>
        <link>https://knightli.com/en/2026/04/27/ralph-multi-agent-long-running-ai-workflows/</link>
        <pubDate>Mon, 27 Apr 2026 08:19:02 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/04/27/ralph-multi-agent-long-running-ai-workflows/</guid>
        <description>&lt;p&gt;If you have been using coding agents lately, you quickly run into a very practical question: &lt;strong&gt;AI can work, sure, but how do you keep it working for hours without drifting, forgetting requirements, or redoing the same work?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;That is the real question behind many discussions around &lt;code&gt;Ralph&lt;/code&gt; and multi-agent collaboration. The point is not simply to compare which model is stronger. The more useful question is this: &lt;strong&gt;how do you design a workflow that lets AI stay stable during long tasks?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;If you break the problem down, there are usually two main routes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;Ralph&lt;/code&gt; approach: keep starting fresh sessions and connect context through the filesystem&lt;/li&gt;
&lt;li&gt;The multi-agent approach: let a lead agent coordinate while worker agents split the execution&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Put more simply, the question is not &amp;ldquo;which model is more powerful,&amp;rdquo; but &amp;ldquo;how do you organize AI so it behaves more like a small team that can keep delivering?&amp;rdquo;&lt;/p&gt;
&lt;h2 id=&#34;01-why-long-tasks-go-off-the-rails&#34;&gt;01 Why Long Tasks Go Off the Rails
&lt;/h2&gt;&lt;p&gt;In short tasks, many problems stay hidden. You give an instruction, the model reads a few files, changes a few lines, and the job is done.&lt;/p&gt;
&lt;p&gt;Once the task gets longer, the common failure modes start to pile up:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Conversations grow longer and context starts to bloat&lt;/li&gt;
&lt;li&gt;Earlier requirements get squeezed out by newer information&lt;/li&gt;
&lt;li&gt;One agent has to plan, implement, and test at the same time&lt;/li&gt;
&lt;li&gt;Without a clear acceptance step, &amp;ldquo;it is done&amp;rdquo; often just means &amp;ldquo;it says it is done&amp;rdquo;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So when AI runs for a long time, the real challenge is often not single-shot model quality. It is &lt;strong&gt;task slicing, state handoff, role separation, and feedback loops&lt;/strong&gt;.&lt;/p&gt;
&lt;h2 id=&#34;02-the-ralph-approach-break-long-tasks-into-short-rounds&#34;&gt;02 The Ralph Approach: Break Long Tasks into Short Rounds
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;Ralph&lt;/code&gt; is a good fit when the main problem is dirty, overloaded context.&lt;/p&gt;
&lt;p&gt;Its core pattern is straightforward:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Keep launching new agent sessions in a loop&lt;/li&gt;
&lt;li&gt;Let each round handle only one small enough task&lt;/li&gt;
&lt;li&gt;Store cross-round state in files instead of forcing everything into one conversation&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The benefit is immediate: every round starts with fresh context, so the session stays more focused and is less likely to get dragged down by old history.&lt;/p&gt;
&lt;p&gt;If you have already looked at &lt;code&gt;Ralph&lt;/code&gt;-style projects, the structure will feel familiar:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Current tasks live in structured files&lt;/li&gt;
&lt;li&gt;Intermediate learnings go into progress files&lt;/li&gt;
&lt;li&gt;Code changes stay in git history&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In other words, &lt;code&gt;Ralph&lt;/code&gt; does not try to make one agent remember everything forever. It externalizes memory on purpose so the session itself can stay lighter.&lt;/p&gt;
&lt;p&gt;This kind of setup works especially well when:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The work can already be split into small stories&lt;/li&gt;
&lt;li&gt;Each story can fit inside one context window&lt;/li&gt;
&lt;li&gt;The project already has tests, typecheck, or other checks&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It is a solution to the problem of &lt;strong&gt;how to keep AI moving forward one round at a time&lt;/strong&gt;.&lt;/p&gt;
&lt;h2 id=&#34;03-the-multi-agent-approach-split-the-work-one-agent-cannot-handle-alone&#34;&gt;03 The Multi-Agent Approach: Split the Work One Agent Cannot Handle Alone
&lt;/h2&gt;&lt;p&gt;The other route is multi-agent collaboration.&lt;/p&gt;
&lt;p&gt;In this kind of workflow design, the more promising pattern is usually this: the lead agent should not do all the work directly. Instead, it coordinates while other agents handle development, testing, checking, and acceptance.&lt;/p&gt;
&lt;p&gt;That differs from &lt;code&gt;Ralph&lt;/code&gt; in an important way:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;Ralph&lt;/code&gt; feels more like serial iteration&lt;/li&gt;
&lt;li&gt;Multi-agent work feels more like parallel division of labor&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;When the task naturally contains different roles, multi-agent collaboration becomes easier to use. For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;One agent breaks down the task and writes the execution plan&lt;/li&gt;
&lt;li&gt;One agent implements the actual change&lt;/li&gt;
&lt;li&gt;One agent tests and validates the result&lt;/li&gt;
&lt;li&gt;One agent checks whether the result still matches the original goal&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The point is not to open more windows for the sake of it. The real value is role separation. Tasks that used to be piled onto one agent can now be split into clearer stages.&lt;/p&gt;
&lt;p&gt;Once the role boundaries are clear, several problems become lighter:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The person writing does not have to be the same one reviewing&lt;/li&gt;
&lt;li&gt;The testing side does not have to reconstruct the full requirement every time&lt;/li&gt;
&lt;li&gt;The lead agent is less likely to drown in implementation detail&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is a solution to the problem of &lt;strong&gt;how to make AI cooperate more like a small team&lt;/strong&gt;.&lt;/p&gt;
&lt;h2 id=&#34;04-the-real-key-is-not-parallelism-but-task-design&#34;&gt;04 The Real Key Is Not Parallelism, but Task Design
&lt;/h2&gt;&lt;p&gt;Whether you choose &lt;code&gt;Ralph&lt;/code&gt; or multi-agent collaboration, the easiest thing to underestimate is this: &lt;strong&gt;workflow design matters more than opening more agents.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;If the task split is wrong, adding more agents only parallelizes the confusion.&lt;/p&gt;
&lt;p&gt;A more stable breakdown usually has a few traits:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;One task maps to one clear objective&lt;/li&gt;
&lt;li&gt;One role owns one category of output&lt;/li&gt;
&lt;li&gt;Every round has a clear done condition&lt;/li&gt;
&lt;li&gt;The output of one round can be consumed directly by the next&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For example, instead of giving AI one giant instruction like &amp;ldquo;build the whole feature,&amp;rdquo; a steadier structure is often:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Break out requirements and boundaries first&lt;/li&gt;
&lt;li&gt;Then split implementation&lt;/li&gt;
&lt;li&gt;Then split testing&lt;/li&gt;
&lt;li&gt;Then make acceptance its own step&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The advantage is that when something goes wrong, it becomes easier to tell whether the problem sits in understanding, implementation, testing, or delivery criteria.&lt;/p&gt;
&lt;h2 id=&#34;05-why-acceptance-matters-so-much&#34;&gt;05 Why Acceptance Matters So Much
&lt;/h2&gt;&lt;p&gt;Many AI workflows fail not because nothing happened earlier, but because the last step lacked a genuinely independent confirmation pass.&lt;/p&gt;
&lt;p&gt;In long tasks, there is often a wide gap between &amp;ldquo;a result was produced&amp;rdquo; and &amp;ldquo;the result is actually usable.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;So one especially important direction is to separate development from acceptance. Even without a complex process, it is worth asking at least these questions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Did it really complete the original task?&lt;/li&gt;
&lt;li&gt;Did it only patch the surface without fixing the root cause?&lt;/li&gt;
&lt;li&gt;Did testing cover only the happiest path?&lt;/li&gt;
&lt;li&gt;Did the upstream requirement get silently changed along the way?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Without that layer, AI can easily keep declaring success inside a long workflow.&lt;/p&gt;
&lt;h2 id=&#34;06-how-to-choose-between-the-two&#34;&gt;06 How to Choose Between the Two
&lt;/h2&gt;&lt;p&gt;If you want a fast rule of thumb:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;If your main pain is context bloat and long-session drift, start with &lt;code&gt;Ralph&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;If your main pain is one agent wearing too many hats, start with multi-agent collaboration&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;More specifically:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;Ralph&lt;/code&gt; fits work that is clear, granular, and easy to move forward round by round&lt;/li&gt;
&lt;li&gt;Multi-agent collaboration fits work with strong role boundaries and a need for parallelism and cross-checking&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In practice, these two approaches are not always competitors. A mature setup often combines them:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Use a &lt;code&gt;Ralph&lt;/code&gt;-style outer loop to push the larger task forward&lt;/li&gt;
&lt;li&gt;Use multi-agent collaboration inside each round for research, implementation, testing, and acceptance&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That gives you both better control over long context and better collaboration inside a single round.&lt;/p&gt;
&lt;h2 id=&#34;07-one-sentence-summary&#34;&gt;07 One-Sentence Summary
&lt;/h2&gt;&lt;p&gt;What makes these approaches worth studying is not that they recommend &lt;code&gt;Ralph&lt;/code&gt; or multi-agent collaboration in isolation. It is that they make one practical truth very clear: &lt;strong&gt;keeping AI stable over long tasks depends less on the model itself and more on whether you designed context, tasks, roles, and acceptance well.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;If you are already asking &lt;code&gt;Claude Code&lt;/code&gt;, &lt;code&gt;Codex&lt;/code&gt;, or other coding agents to handle longer real-world tasks, this kind of workflow thinking is often more valuable than simply switching to a stronger model.&lt;/p&gt;
</description>
        </item>
        <item>
        <title>What Ralph Is: Turning Claude Code and Amp into a Repeatable Autonomous Development Loop</title>
        <link>https://knightli.com/en/2026/04/27/ralph-autonomous-agent-loop-claude-code-amp/</link>
        <pubDate>Mon, 27 Apr 2026 08:08:55 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/04/27/ralph-autonomous-agent-loop-claude-code-amp/</guid>
        <description>&lt;p&gt;If you have been paying attention to long-running coding agent workflows lately, &lt;code&gt;snarktank/ralph&lt;/code&gt; is a project worth a close look. It is not another model wrapper or another chat UI. Instead, it organizes &lt;code&gt;Claude Code&lt;/code&gt; or &lt;code&gt;Amp&lt;/code&gt; into an autonomous loop that keeps running through stories in a &lt;code&gt;PRD&lt;/code&gt; until everything is done.&lt;/p&gt;
&lt;p&gt;Its core idea is simple: &lt;strong&gt;do not force the same agent to keep working inside an increasingly long and messy context. Start a brand-new AI coding session for every iteration instead.&lt;/strong&gt; That keeps context from bloating and makes task boundaries much clearer.&lt;/p&gt;
&lt;h2 id=&#34;01-what-ralph-is&#34;&gt;01 What Ralph Is
&lt;/h2&gt;&lt;p&gt;Ralph describes itself very clearly: it is an autonomous AI agent loop that repeatedly runs an AI coding tool until the items in a &lt;code&gt;PRD&lt;/code&gt; are complete.&lt;/p&gt;
&lt;p&gt;The repository currently supports two tools:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;Amp CLI&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Claude Code&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Each iteration starts a fresh instance. In other words, it does not depend on one endlessly extended conversation. Instead, it keeps memory in external state:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;git history&lt;/li&gt;
&lt;li&gt;&lt;code&gt;progress.txt&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;prd.json&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That detail matters a lot. When people let an agent run on large tasks, the main problem is often not that the model cannot code. It is that the session becomes heavier over time, starts losing context, forgets requirements, and repeats work. Ralph is designed almost entirely around that problem.&lt;/p&gt;
&lt;h2 id=&#34;02-how-it-works&#34;&gt;02 How It Works
&lt;/h2&gt;&lt;p&gt;Ralph&amp;rsquo;s workflow has three steps.&lt;/p&gt;
&lt;h3 id=&#34;1-write-a-prd-first&#34;&gt;1. Write a PRD first
&lt;/h3&gt;&lt;p&gt;The README suggests starting with the bundled &lt;code&gt;prd&lt;/code&gt; skill to generate a requirements document and break the feature into smaller stories.&lt;/p&gt;
&lt;h3 id=&#34;2-convert-the-prd-into-prdjson&#34;&gt;2. Convert the PRD into &lt;code&gt;prd.json&lt;/code&gt;
&lt;/h3&gt;&lt;p&gt;Then the &lt;code&gt;ralph&lt;/code&gt; skill converts the Markdown PRD into a structured &lt;code&gt;prd.json&lt;/code&gt;. That file stores the user stories and whether each one has passed.&lt;/p&gt;
&lt;h3 id=&#34;3-run-the-loop-script&#34;&gt;3. Run the loop script
&lt;/h3&gt;&lt;p&gt;The actual execution is handled by &lt;code&gt;ralph.sh&lt;/code&gt;. The commands look like this:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;./scripts/ralph/ralph.sh &lt;span class=&#34;o&#34;&gt;[&lt;/span&gt;max_iterations&lt;span class=&#34;o&#34;&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;./scripts/ralph/ralph.sh --tool claude &lt;span class=&#34;o&#34;&gt;[&lt;/span&gt;max_iterations&lt;span class=&#34;o&#34;&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;The default is 10 iterations. In each round, Ralph roughly does the following:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Create a branch from &lt;code&gt;branchName&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Pick the highest-priority story where &lt;code&gt;passes: false&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Implement only that story&lt;/li&gt;
&lt;li&gt;Run quality checks such as typecheck and tests&lt;/li&gt;
&lt;li&gt;Commit if the checks pass&lt;/li&gt;
&lt;li&gt;Update &lt;code&gt;prd.json&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Append learnings to &lt;code&gt;progress.txt&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Continue to the next round&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;So Ralph is not trying to finish everything in one go. It compresses work into many small loops that can fit inside a single context window.&lt;/p&gt;
&lt;h2 id=&#34;03-what-makes-ralph-interesting&#34;&gt;03 What Makes Ralph Interesting
&lt;/h2&gt;&lt;h3 id=&#34;1-every-round-uses-fresh-context&#34;&gt;1. Every round uses fresh context
&lt;/h3&gt;&lt;p&gt;This is Ralph&amp;rsquo;s defining design choice. The README emphasizes that every iteration is a brand-new AI instance, and cross-iteration memory lives only in git, &lt;code&gt;progress.txt&lt;/code&gt;, and &lt;code&gt;prd.json&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;That is very different from the common pattern of keeping &lt;code&gt;Claude Code&lt;/code&gt; or another tool inside one long conversation. Once tasks get larger, that approach often slows down under its own history and gradually loses focus. Ralph accepts that no single round should remember everything, then moves memory into files instead.&lt;/p&gt;
&lt;h3 id=&#34;2-it-forces-tasks-to-stay-small&#34;&gt;2. It forces tasks to stay small
&lt;/h3&gt;&lt;p&gt;The docs explicitly say that each PRD item must be small enough to finish within one context window. Tasks like adding a filter, updating a server action, or adding a database column are about the right size. Tasks like rebuilding the whole API or creating an entire dashboard are too large.&lt;/p&gt;
&lt;p&gt;That constraint is practical. Many autonomous agent loops fail not because the loop is bad, but because the task slicing is too coarse and each round carries too much at once.&lt;/p&gt;
&lt;h3 id=&#34;3-it-preserves-learnings-not-just-code&#34;&gt;3. It preserves learnings, not just code
&lt;/h3&gt;&lt;p&gt;Beyond &lt;code&gt;progress.txt&lt;/code&gt;, the README also stresses updating &lt;code&gt;AGENTS.md&lt;/code&gt;. The reason is straightforward: future iterations and future developers will read those notes, so patterns, gotchas, and conventions discovered in each round should be written down in the project itself.&lt;/p&gt;
&lt;p&gt;Put differently, Ralph is not only trying to keep an agent coding continuously. It is also trying to help the agent build working memory about the codebase over time.&lt;/p&gt;
&lt;h2 id=&#34;04-when-it-fits-best&#34;&gt;04 When It Fits Best
&lt;/h2&gt;&lt;p&gt;Ralph is a good fit when your task looks like this:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;It can already be broken into a clear set of user stories&lt;/li&gt;
&lt;li&gt;The codebase has reliable feedback loops such as tests, typecheck, or CI&lt;/li&gt;
&lt;li&gt;You want the agent to keep moving forward without putting everything into one long conversation&lt;/li&gt;
&lt;li&gt;You are fine with iterative progress instead of demanding a one-shot completion&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;On the other hand, if the requirement is still vague, or the work depends on frequent discussion and constant changes of direction, Ralph may not be the first thing to reach for. It fits better once the requirements are already shaped and execution needs to be steady.&lt;/p&gt;
&lt;h2 id=&#34;05-how-it-differs-from-normal-claude-code-usage&#34;&gt;05 How It Differs from Normal Claude Code Usage
&lt;/h2&gt;&lt;p&gt;With plain &lt;code&gt;Claude Code&lt;/code&gt;, the usual pattern is simple: open a session and let it keep reading code, editing files, and running commands. That works very well for small and medium tasks, but larger tasks often hit two problems:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Context keeps growing&lt;/li&gt;
&lt;li&gt;Intermediate decisions are harder to preserve in a structured way&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Ralph turns &lt;code&gt;Claude Code&lt;/code&gt; or &lt;code&gt;Amp&lt;/code&gt; into something closer to a batch executor:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The task source is &lt;code&gt;prd.json&lt;/code&gt;, not ad hoc chat instructions&lt;/li&gt;
&lt;li&gt;Each iteration recognizes only one story&lt;/li&gt;
&lt;li&gt;Completion state is written back to files&lt;/li&gt;
&lt;li&gt;Learnings go into &lt;code&gt;progress.txt&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Code changes are preserved in git&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So in practice, it feels less like a new AI assistant and more like an iteration controller added on top of a coding agent.&lt;/p&gt;
&lt;h2 id=&#34;06-one-important-requirement&#34;&gt;06 One Important Requirement
&lt;/h2&gt;&lt;p&gt;Whether Ralph works well depends less on the loop itself and more on the quality of your feedback loops. The README says this very directly: without typecheck, tests, and CI, errors will compound across later iterations.&lt;/p&gt;
&lt;p&gt;For frontend tasks, the repository even recommends adding browser verification to the acceptance criteria. Without real verification, an agent can easily confuse &amp;ldquo;it looks done&amp;rdquo; with &amp;ldquo;it actually works.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;That point is important. Ralph is not magical automation. It is more like a force multiplier for the engineering discipline you already have. If your project already has clear task breakdowns and reliable checks, Ralph becomes much more useful. If those foundations are missing, the loop will only repeat the confusion.&lt;/p&gt;
&lt;h2 id=&#34;07-one-sentence-summary&#34;&gt;07 One-Sentence Summary
&lt;/h2&gt;&lt;p&gt;What makes &lt;code&gt;Ralph&lt;/code&gt; worth studying is not that it introduces a huge amount of new infrastructure. It takes a simple but useful idea and turns it into a practical workflow: &lt;strong&gt;let &lt;code&gt;Claude Code&lt;/code&gt; or &lt;code&gt;Amp&lt;/code&gt; handle one small story per round, keep focus with fresh context, and preserve continuity through &lt;code&gt;git&lt;/code&gt;, &lt;code&gt;prd.json&lt;/code&gt;, and &lt;code&gt;progress.txt&lt;/code&gt;.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;If you are already using coding agents in real projects and keep getting stuck on how to push long tasks forward reliably, Ralph&amp;rsquo;s approach is well worth borrowing.&lt;/p&gt;
&lt;h2 id=&#34;references&#34;&gt;References
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;GitHub repository: &lt;a class=&#34;link&#34; href=&#34;https://github.com/snarktank/ralph&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://github.com/snarktank/ralph&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Interactive flowchart: &lt;a class=&#34;link&#34; href=&#34;https://snarktank.github.io&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://snarktank.github.io&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
        </item>
        <item>
        <title>nuwa-skill: Turning &#34;distilling a person&#34; from an idea into an executable workflow</title>
        <link>https://knightli.com/en/2026/04/22/nuwa-skill-distill-how-someone-thinks/</link>
        <pubDate>Wed, 22 Apr 2026 16:20:00 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/04/22/nuwa-skill-distill-how-someone-thinks/</guid>
        <description>&lt;p&gt;&lt;code&gt;[alchaincyf/nuwa-skill](https://github.com/alchaincyf/nuwa-skill)&lt;/code&gt; can easily make people think of one thing first: using AI to answer in a famous person&amp;rsquo;s voice. But what makes it genuinely interesting is not whether it sounds convincing. The key is that it tries to turn &amp;ldquo;distilling how a person thinks&amp;rdquo; into a repeatable workflow.&lt;/p&gt;
&lt;p&gt;If that works, the value goes far beyond a few entertaining character prompts. It means taking someone&amp;rsquo;s judgment framework, priorities, common heuristics, and communication habits, and turning them into a skill that can be called again and again. What you want is not a sentence that sounds like something a person might say, but something closer to a working interface for &amp;ldquo;if this person analyzed the issue, what would they look at first, how would they trade things off, and what would they question?&amp;rdquo;&lt;/p&gt;
&lt;h2 id=&#34;it-solves-modeling-not-imitation&#34;&gt;It solves modeling, not imitation
&lt;/h2&gt;&lt;p&gt;Many so-called persona prompts are basically just style overlays.&lt;/p&gt;
&lt;p&gt;They usually ask the model to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;speak in someone&amp;rsquo;s tone&lt;/li&gt;
&lt;li&gt;quote their signature lines more often&lt;/li&gt;
&lt;li&gt;imitate the phrasing they use in public&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That looks great in demos, but it often falls apart in real work. The reason is simple: tone is surface-level, while judgment structure is the core. A person is memorable not because they like a few certain words, but because they reliably approach problems in certain ways.&lt;/p&gt;
&lt;p&gt;The direction of &lt;code&gt;nuwa-skill&lt;/code&gt; is closer to extracting those stable methods. In other words, it cares less about &amp;ldquo;how to sound like them&amp;rdquo; and more about &amp;ldquo;how to think like them.&amp;rdquo;&lt;/p&gt;
&lt;h2 id=&#34;a-more-complete-workflow&#34;&gt;A more complete workflow
&lt;/h2&gt;&lt;p&gt;From the repository description, &lt;code&gt;nuwa-skill&lt;/code&gt; aims to build an end-to-end flow: enter a person&amp;rsquo;s name, then automatically do the research, extraction, and validation, and finally organize the result into a skill that can be used inside Claude Code.&lt;/p&gt;
&lt;p&gt;There are several important shifts behind that idea.&lt;/p&gt;
&lt;p&gt;First, it assumes the person being distilled does not have to be your coworker. Many people first encounter this kind of idea in the form of &amp;ldquo;capture how a strong teammate works.&amp;rdquo; That is valuable, but it is also limited: the sample pool is small, and it usually only covers internal team experience. &lt;code&gt;nuwa-skill&lt;/code&gt; expands the target set to a much broader range of people, such as founders, investors, scientists, product managers, and writers.&lt;/p&gt;
&lt;p&gt;Second, it emphasizes automation rather than asking the user to handcraft prompts. What really makes this kind of capability practical is not beautiful prompt wording, but whether you can consistently do source gathering, viewpoint synthesis, pattern extraction, and result validation. As soon as any one of those steps depends entirely on manual work, the reuse cost rises quickly.&lt;/p&gt;
&lt;p&gt;Third, it tries to make the output a skill rather than a one-off conversation. The former can be reused, combined, and iterated on. The latter usually only works in the current context and falls apart after a few turns.&lt;/p&gt;
&lt;h2 id=&#34;why-this-direction-matters&#34;&gt;Why this direction matters
&lt;/h2&gt;&lt;p&gt;If you treat AI as a question-answering machine, the natural use case is &amp;ldquo;give me an answer.&amp;rdquo; But if you treat AI as a workbench, the question becomes &amp;ldquo;give me a way to look at this problem.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;That is where the value of &lt;code&gt;nuwa-skill&lt;/code&gt; leans.&lt;/p&gt;
&lt;p&gt;For example, when facing a product decision, what you want may not be one standard answer. You may want several sharply different analytical frames:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;one person starts with long-term compounding&lt;/li&gt;
&lt;li&gt;one starts with resource constraints&lt;/li&gt;
&lt;li&gt;one starts with consistency of user experience&lt;/li&gt;
&lt;li&gt;one starts with timing of market entry&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If those frames can be packaged reliably, AI stops being &amp;ldquo;something that writes a paragraph for you&amp;rdquo; and becomes &amp;ldquo;something that helps you switch perspectives quickly.&amp;rdquo; That is much more useful than simply imitating famous quotes, because it directly affects decision quality.&lt;/p&gt;
&lt;h2 id=&#34;its-most-compelling-part-turning-tacit-knowledge-into-callable-assets&#34;&gt;Its most compelling part: turning tacit knowledge into callable assets
&lt;/h2&gt;&lt;p&gt;Many high-value capabilities are hard to write down as SOPs in the first place.&lt;/p&gt;
&lt;p&gt;Why someone consistently judges better than others is often not because they know more explicit rules, but because they have built a tacit filtering system through years of practice:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;which signals deserve attention first&lt;/li&gt;
&lt;li&gt;which noise should be ignored immediately&lt;/li&gt;
&lt;li&gt;which questions should be broken apart&lt;/li&gt;
&lt;li&gt;which questions should be inverted&lt;/li&gt;
&lt;li&gt;which conclusions must wait for more evidence&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This kind of ability is hard to preserve because people cannot always explain it clearly themselves. That is exactly why structured extraction is so valuable. What makes &lt;code&gt;nuwa-skill&lt;/code&gt; appealing is that it is not trying to move around surface knowledge. It is trying to reorganize cognitive habits.&lt;/p&gt;
&lt;h2 id=&#34;where-it-fits-best&#34;&gt;Where it fits best
&lt;/h2&gt;&lt;p&gt;I think this kind of skill is especially useful in a few scenarios.&lt;/p&gt;
&lt;h3 id=&#34;1-multi-perspective-review-before-a-decision&#34;&gt;1. Multi-perspective review before a decision
&lt;/h3&gt;&lt;p&gt;If you already have a plan but worry that you are only thinking along the path you already know, switching into different &amp;ldquo;persona perspectives&amp;rdquo; to review the same issue is more valuable than asking the model to keep expanding your original wording.&lt;/p&gt;
&lt;h3 id=&#34;2-learning-the-judgment-framework-of-a-certain-kind-of-expert&#34;&gt;2. Learning the judgment framework of a certain kind of expert
&lt;/h3&gt;&lt;p&gt;Many people learn from experts by collecting quotes, watching interviews, and copying summaries. In the end, they often only remember a few nice lines. Once a thinking pattern becomes a skill, learning becomes much closer to &amp;ldquo;repeatedly invoking it with real questions&amp;rdquo; rather than &amp;ldquo;making a pile of static notes.&amp;rdquo;&lt;/p&gt;
&lt;h3 id=&#34;3-sharing-an-analytical-style-across-a-team&#34;&gt;3. Sharing an analytical style across a team
&lt;/h3&gt;&lt;p&gt;What teams truly lack is often not just documentation, but a shared answer to &amp;ldquo;how do we usually think when we hit a problem?&amp;rdquo; If this workflow matures further, it could also be used in reverse to preserve the methods of strong internal operators. It is just clear that the project does not want to limit the idea to internal use cases.&lt;/p&gt;
&lt;h2 id=&#34;the-hard-part-of-projects-like-this&#34;&gt;The hard part of projects like this
&lt;/h2&gt;&lt;p&gt;Of course, an attractive direction does not mean the hard problems are already solved.&lt;/p&gt;
&lt;p&gt;The real challenge is never simply installing a skill. It is things like:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;whether the sources are reliable enough&lt;/li&gt;
&lt;li&gt;whether the extracted patterns are stable rather than illusions from scattered text&lt;/li&gt;
&lt;li&gt;whether the model is actually using a person&amp;rsquo;s framework or merely repeating common impressions&lt;/li&gt;
&lt;li&gt;whether the boundaries between different personas will blur inside the model&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In other words, the key question is not &amp;ldquo;can it generate something that sounds plausible?&amp;rdquo; It is &amp;ldquo;can the cognitive framework produced by this skill survive reuse across many tasks?&amp;rdquo; If the project keeps going deeper on validation, its credibility will improve a lot.&lt;/p&gt;
&lt;h2 id=&#34;why-it-goes-beyond-a-prompt-template-library&#34;&gt;Why it goes beyond a prompt template library
&lt;/h2&gt;&lt;p&gt;In the past, many projects handled this kind of capability as a prompt template library: one persona, one prompt, and the user copies it into a chat. The problem is that a template library is still basically a static asset. It updates slowly, validation is weak, and it is hard to turn it into a complete production workflow.&lt;/p&gt;
&lt;p&gt;What &lt;code&gt;nuwa-skill&lt;/code&gt; pushes further is that it turns &amp;ldquo;persona distillation&amp;rdquo; from a template problem into a workflow problem.&lt;/p&gt;
&lt;p&gt;Once the center of gravity shifts from &amp;ldquo;write a prompt&amp;rdquo; to &amp;ldquo;systematically generate, validate, and iterate on a persona skill,&amp;rdquo; the whole thing starts to look more like engineering than inspiration. For anyone who wants to use it over the long term, that is the more important shift.&lt;/p&gt;
&lt;h2 id=&#34;closing&#34;&gt;Closing
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;nuwa-skill&lt;/code&gt; is interesting not because it turns AI into a celebrity impression show, but because it pushes &amp;ldquo;how to learn how someone thinks&amp;rdquo; one step closer to something executable, reusable, and iterable.&lt;/p&gt;
&lt;p&gt;If many persona prompts solve &amp;ldquo;how to talk like someone,&amp;rdquo; what this project wants to solve is &amp;ldquo;how to look at problems the way someone does.&amp;rdquo; The former is great for demos. The latter is much closer to a real productivity tool.&lt;/p&gt;
&lt;h2 id=&#34;references&#34;&gt;References
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;GitHub repository: &lt;a class=&#34;link&#34; href=&#34;https://github.com/alchaincyf/nuwa-skill&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://github.com/alchaincyf/nuwa-skill&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Project README: &lt;a class=&#34;link&#34; href=&#34;https://github.com/alchaincyf/nuwa-skill/blob/main/README.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://github.com/alchaincyf/nuwa-skill/blob/main/README.md&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Skill definition: &lt;a class=&#34;link&#34; href=&#34;https://github.com/alchaincyf/nuwa-skill/blob/main/SKILL.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://github.com/alchaincyf/nuwa-skill/blob/main/SKILL.md&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
        </item>
        <item>
        <title>RAGFlow Project Notes: Features and Usage of an Open-Source RAG Engine</title>
        <link>https://knightli.com/en/2026/04/15/ragflow-rag-engine-guide/</link>
        <pubDate>Wed, 15 Apr 2026 22:09:25 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/04/15/ragflow-rag-engine-guide/</guid>
        <description>&lt;p&gt;&lt;code&gt;RAGFlow&lt;/code&gt; is an open-source RAG engine from &lt;code&gt;infiniflow&lt;/code&gt;. Its goal is not merely to provide a thin “upload documents and ask questions” shell, but to bring document parsing, chunking, retrieval, reranking, citation tracing, model configuration, agent capabilities, and API integration into one complete workflow.&lt;/p&gt;
&lt;p&gt;If you are building an enterprise knowledge base, document Q&amp;amp;A, a support assistant, internal information retrieval, or you want to give an LLM a more reliable context layer, RAGFlow is one of the open-source options worth serious attention.&lt;/p&gt;
&lt;h2 id=&#34;01-what-problem-ragflow-solves&#34;&gt;01 What Problem RAGFlow Solves
&lt;/h2&gt;&lt;p&gt;Most RAG systems run into three common issues:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Document parsing is unstable, especially for PDFs, scanned files, tables, images, and complex layouts.&lt;/li&gt;
&lt;li&gt;Chunking strategy is opaque, so retrieval may look correct while the actual context is incomplete.&lt;/li&gt;
&lt;li&gt;Answers lack trustworthy citations, making it hard for users to verify where the response came from.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;RAGFlow focuses on exactly these problems. The project README emphasizes &lt;code&gt;Deep document understanding&lt;/code&gt;, template-based chunking, chunk visualization, citation grounding, and multi-path retrieval with reranking. In other words, it cares more about “high-quality input leads to high-quality answers” than simply wiring a vector database to a chat UI.&lt;/p&gt;
&lt;h2 id=&#34;02-core-features&#34;&gt;02 Core Features
&lt;/h2&gt;&lt;h3 id=&#34;1-deep-document-understanding&#34;&gt;1. Deep Document Understanding
&lt;/h3&gt;&lt;p&gt;RAGFlow can extract knowledge from complex unstructured data. The README lists formats such as Word, PPT, Excel, TXT, images, scanned documents, structured data, and web pages.&lt;/p&gt;
&lt;p&gt;This matters a lot for enterprise knowledge bases. Real-world material is rarely clean Markdown. It is usually a mix of contracts, reports, tables, scanned PDFs, product manuals, screenshots, and web content. If parsing quality is weak, retrieval and LLM answers will both suffer.&lt;/p&gt;
&lt;h3 id=&#34;2-template-based-chunking&#34;&gt;2. Template-Based Chunking
&lt;/h3&gt;&lt;p&gt;RAGFlow provides template-based chunking. The value here is that chunking is not a black box; different document types can use different strategies.&lt;/p&gt;
&lt;p&gt;For example, articles, papers, tables, Q&amp;amp;A documents, image explanations, and contract clauses all need different chunk boundaries and granularity. Template-based chunking helps reduce problems like broken sentences, lost table context, and separated headings and body text.&lt;/p&gt;
&lt;h3 id=&#34;3-traceable-citations&#34;&gt;3. Traceable Citations
&lt;/h3&gt;&lt;p&gt;RAGFlow emphasizes grounded citations, meaning answers can be traced back to source passages. It also offers chunk visualization, making it easier for people to inspect and adjust parsing and chunking results.&lt;/p&gt;
&lt;p&gt;This is especially important in production. Internal enterprise Q&amp;amp;A is not only about producing something that “looks right”; it also has to be verifiable. For policy, compliance, finance, technical documents, and customer support content, citations and traceability are close to mandatory.&lt;/p&gt;
&lt;h3 id=&#34;4-automated-rag-workflow&#34;&gt;4. Automated RAG Workflow
&lt;/h3&gt;&lt;p&gt;RAGFlow turns the RAG lifecycle into a more complete workflow:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Create a knowledge base&lt;/li&gt;
&lt;li&gt;Upload or sync data&lt;/li&gt;
&lt;li&gt;Parse documents&lt;/li&gt;
&lt;li&gt;Review and adjust chunks&lt;/li&gt;
&lt;li&gt;Configure LLM and embedding models&lt;/li&gt;
&lt;li&gt;Run multi-path retrieval and reranking&lt;/li&gt;
&lt;li&gt;Build chat assistants&lt;/li&gt;
&lt;li&gt;Integrate through APIs into business systems&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That makes it closer to a RAG platform than a single library. For teams, both the UI and the API matter: non-engineers can maintain the knowledge base, while engineers can integrate the capability into existing systems.&lt;/p&gt;
&lt;h3 id=&#34;5-agent-mcp-and-workflow-extensions&#34;&gt;5. Agent, MCP, and Workflow Extensions
&lt;/h3&gt;&lt;p&gt;Recent RAGFlow updates already include Agentic workflow, MCP, Agent Memory, and code execution components. That suggests it is no longer limited to traditional knowledge-base Q&amp;amp;A and is also moving toward agent-oriented scenarios.&lt;/p&gt;
&lt;p&gt;A typical pattern is that an agent can use RAGFlow as a reliable enterprise knowledge layer: retrieve from the knowledge base when it needs context, generate answers with citations, and combine that with tools or workflow steps when necessary.&lt;/p&gt;
&lt;h2 id=&#34;03-basic-usage-flow&#34;&gt;03 Basic Usage Flow
&lt;/h2&gt;&lt;p&gt;According to the official quickstart documentation, the common usage path for RAGFlow can be summarized in the following steps.&lt;/p&gt;
&lt;h3 id=&#34;1-prepare-the-environment&#34;&gt;1. Prepare the Environment
&lt;/h3&gt;&lt;p&gt;The basic requirements listed in the official README are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;CPU &amp;gt;= 4 cores&lt;/li&gt;
&lt;li&gt;RAM &amp;gt;= 16 GB&lt;/li&gt;
&lt;li&gt;Disk &amp;gt;= 50 GB&lt;/li&gt;
&lt;li&gt;Docker &amp;gt;= 24.0.0&lt;/li&gt;
&lt;li&gt;Docker Compose &amp;gt;= v2.26.1&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you want to use the sandbox for the code executor, you also need &lt;code&gt;gVisor&lt;/code&gt;. Another practical note is that the official Docker images mainly target x86 platforms. For ARM64, the project documentation recommends building the image yourself.&lt;/p&gt;
&lt;h3 id=&#34;2-clone-the-project&#34;&gt;2. Clone the Project
&lt;/h3&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;git clone https://github.com/infiniflow/ragflow.git
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nb&#34;&gt;cd&lt;/span&gt; ragflow/docker
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h3 id=&#34;3-check-vmmax_map_count&#34;&gt;3. Check &lt;code&gt;vm.max_map_count&lt;/code&gt;
&lt;/h3&gt;&lt;p&gt;RAGFlow deployment depends on components such as Elasticsearch or OpenSearch, so on Linux you usually need to verify:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;sysctl vm.max_map_count
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;If the value is below &lt;code&gt;262144&lt;/code&gt;, you can set it temporarily:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;sudo sysctl -w vm.max_map_count&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;m&#34;&gt;262144&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;If you want the change to persist after reboot, add it to &lt;code&gt;/etc/sysctl.conf&lt;/code&gt;.&lt;/p&gt;
&lt;h3 id=&#34;4-start-with-docker-compose&#34;&gt;4. Start with Docker Compose
&lt;/h3&gt;&lt;p&gt;You can start the CPU mode directly:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;docker compose -f docker-compose.yml up -d
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;If you want GPU acceleration for DeepDoc tasks, the README shows enabling &lt;code&gt;DEVICE=gpu&lt;/code&gt; in &lt;code&gt;.env&lt;/code&gt; before startup:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;sed -i &lt;span class=&#34;s1&#34;&gt;&amp;#39;1i DEVICE=gpu&amp;#39;&lt;/span&gt; .env
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;docker compose -f docker-compose.yml up -d
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Then inspect the logs:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;docker logs -f docker-ragflow-cpu-1
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Once the services are ready, open the machine address in your browser. Under the default configuration, that is typically:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;http://IP_OF_YOUR_MACHINE
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h3 id=&#34;5-configure-model-api-keys&#34;&gt;5. Configure Model API Keys
&lt;/h3&gt;&lt;p&gt;RAGFlow needs LLM and embedding model configuration. The README mentions choosing the default LLM factory in &lt;code&gt;service_conf.yaml.template&lt;/code&gt; and updating the corresponding &lt;code&gt;API_KEY&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;In practice, you need to configure models according to your provider:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Chat model&lt;/li&gt;
&lt;li&gt;Embedding model&lt;/li&gt;
&lt;li&gt;Rerank model&lt;/li&gt;
&lt;li&gt;Multimodal model, if you want to understand images inside PDFs or DOCX files&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;6-create-the-knowledge-base-and-upload-documents&#34;&gt;6. Create the Knowledge Base and Upload Documents
&lt;/h3&gt;&lt;p&gt;After the service starts, the typical workflow is:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Log in to the Web UI.&lt;/li&gt;
&lt;li&gt;Create a dataset or knowledge base.&lt;/li&gt;
&lt;li&gt;Upload documents or configure a data source sync.&lt;/li&gt;
&lt;li&gt;Wait for parsing to finish.&lt;/li&gt;
&lt;li&gt;Inspect chunk results and adjust them when necessary.&lt;/li&gt;
&lt;li&gt;Create a chat assistant and attach the knowledge base.&lt;/li&gt;
&lt;li&gt;Test answer quality and citation sources.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;If you need to integrate with a business system, you can continue with the RAGFlow API or SDK and connect retrieval and chat capabilities to your own application.&lt;/p&gt;
&lt;h2 id=&#34;04-suitable-scenarios&#34;&gt;04 Suitable Scenarios
&lt;/h2&gt;&lt;p&gt;RAGFlow fits these kinds of needs:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Enterprise internal knowledge-base Q&amp;amp;A&lt;/li&gt;
&lt;li&gt;Product manuals, technical documentation, and FAQ retrieval&lt;/li&gt;
&lt;li&gt;Customer support and pre-sales assistants&lt;/li&gt;
&lt;li&gt;Traceable Q&amp;amp;A over contracts, reports, and policy documents&lt;/li&gt;
&lt;li&gt;Unified handling of multi-format materials&lt;/li&gt;
&lt;li&gt;Teams that want both UI-based maintenance and API integration&lt;/li&gt;
&lt;li&gt;Systems that want to use RAG as the context layer for agents&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It is especially suitable when document formats are complex, citations matter, and people want to inspect or intervene in parsing results.&lt;/p&gt;
&lt;h2 id=&#34;05-what-to-watch-out-for&#34;&gt;05 What to Watch Out For
&lt;/h2&gt;&lt;p&gt;First, RAGFlow is not a lightweight script. It has real infrastructure requirements. The official recommendation is at least 4 CPU cores, 16 GB RAM, and 50 GB disk. If you only want Q&amp;amp;A over a small amount of Markdown, a full platform may be unnecessary.&lt;/p&gt;
&lt;p&gt;Second, document quality still matters. RAGFlow can improve parsing and chunking, but it cannot magically make low-quality, outdated, or contradictory source material reliable. Knowledge-base governance still matters before production.&lt;/p&gt;
&lt;p&gt;Third, model selection directly affects quality. Embedding, rerank, chat, and multimodal model choices all influence retrieval and answer quality. RAGFlow gives you the workflow, but the final result still depends on data, models, and tuning.&lt;/p&gt;
&lt;p&gt;Fourth, production deployments need careful attention to permissions and data security. Enterprise knowledge bases often contain internal documents, so deployment model, access control, logs, API keys, and model-provider data policy all need to be designed in advance.&lt;/p&gt;
&lt;h2 id=&#34;06-quick-take&#34;&gt;06 Quick Take
&lt;/h2&gt;&lt;p&gt;RAGFlow’s strength is that it turns the hardest parts of RAG into platform capabilities: complex document parsing, explainable chunking, citation grounding, multi-path retrieval, reranking, model configuration, Web UI, API access, and agent extensions.&lt;/p&gt;
&lt;p&gt;If what you need is a verifiable, maintainable enterprise knowledge base that can connect to business systems, RAGFlow is more complete than a “vector database plus a simple chat UI” setup. On the other hand, if you only need small-scale personal Q&amp;amp;A over simple data, a lighter RAG framework may be more resource-efficient.&lt;/p&gt;
&lt;h2 id=&#34;related-links&#34;&gt;Related Links
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;GitHub project: &lt;a class=&#34;link&#34; href=&#34;https://github.com/infiniflow/ragflow&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://github.com/infiniflow/ragflow&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Official docs: &lt;a class=&#34;link&#34; href=&#34;https://ragflow.io/docs/dev/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://ragflow.io/docs/dev/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Online demo: &lt;a class=&#34;link&#34; href=&#34;https://cloud.ragflow.io&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://cloud.ragflow.io&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
        </item>
        <item>
        <title>Firecrawl Project Notes: Web Search, Scraping, and Interaction APIs for AI Agents</title>
        <link>https://knightli.com/en/2026/04/15/firecrawl-ai-web-data-api/</link>
        <pubDate>Wed, 15 Apr 2026 13:45:03 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/04/15/firecrawl-ai-web-data-api/</guid>
        <description>&lt;p&gt;&lt;code&gt;Firecrawl&lt;/code&gt; has a clear purpose: turning web pages into data that AI agents can consume more easily. It is not just a crawler script. It wraps search, single-page scraping, site crawling, page interaction, structured extraction, and agent workflows into APIs, so models and automation systems can spend less effort dealing with web noise.&lt;/p&gt;
&lt;h2 id=&#34;01-what-it-solves&#34;&gt;01 What It Solves
&lt;/h2&gt;&lt;p&gt;Many AI applications need to read web pages, but real websites are messy: JavaScript-rendered content, pop-ups, pagination, login state, anti-bot defenses, PDFs or DOCX files, and plenty of navigation, ads, scripts, and styling that have nothing to do with the main content.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;Firecrawl&lt;/code&gt; tries to solve this middle-layer problem. The application asks for data from a page, a site, or a topic; Firecrawl handles opening, scraping, cleaning, and returning output in formats that are easier for LLMs to use, such as Markdown, HTML, screenshots, or JSON.&lt;/p&gt;
&lt;p&gt;The value of this kind of tool is not merely whether it can request a URL. The real question is whether it can reliably turn complex pages into usable data. For RAG, AI search, competitive research, automated information gathering, and web content monitoring, this layer often becomes the unpleasant plumbing in the system.&lt;/p&gt;
&lt;h2 id=&#34;02-core-features&#34;&gt;02 Core Features
&lt;/h2&gt;&lt;p&gt;The Firecrawl README groups its capabilities into several areas:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;Search&lt;/code&gt;: Search the web and return full page content from the results.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Scrape&lt;/code&gt;: Convert a single URL into Markdown, HTML, screenshots, or structured JSON.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Interact&lt;/code&gt;: Scrape a page, then use prompts or code to click, scroll, type, wait, and perform other actions.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Agent&lt;/code&gt;: Describe what you want, and let the agent search, navigate, and return the result.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Crawl&lt;/code&gt;: Scrape multiple pages under a website.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Map&lt;/code&gt;: Quickly discover URLs on a website.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Batch Scrape&lt;/code&gt;: Asynchronously scrape large batches of URLs.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;At first glance, it looks like a scraping service. But as a full set of features, it is closer to a data entry point for AI applications: search discovers sources, scraping cleans content, interaction handles dynamic pages, and Agent pushes the whole &amp;ldquo;find information&amp;rdquo; task further toward automation.&lt;/p&gt;
&lt;h2 id=&#34;03-why-it-fits-ai-agents&#34;&gt;03 Why It Fits AI Agents
&lt;/h2&gt;&lt;p&gt;Traditional crawlers usually assume that you already know the URL and understand the page structure. Agent workflows are often different. A user might simply ask, &amp;ldquo;Find the differences between the latest pricing plans on a company&amp;rsquo;s pricing page.&amp;rdquo; The system then has to search, open pages, compare content, and return sources.&lt;/p&gt;
&lt;p&gt;Firecrawl&amp;rsquo;s &lt;code&gt;Agent&lt;/code&gt; endpoint is designed for this kind of task. It can accept only a natural-language prompt, or it can be constrained to specific URLs. If structured results are needed, it can also work with a schema to return fixed fields.&lt;/p&gt;
&lt;p&gt;This gives the application layer two benefits:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;You do not have to write a separate parser for every website.&lt;/li&gt;
&lt;li&gt;The returned result is easier to send into an LLM, a database, or a downstream automation flow.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Of course, this does not mean it replaces every custom crawler. For highly constrained, high-frequency, large-scale tasks with very stable fields, writing dedicated parsing logic may still be cheaper and easier to control. Firecrawl is a better fit when sources are scattered, page structures change often, and you want to connect web data to an AI workflow quickly.&lt;/p&gt;
&lt;h2 id=&#34;04-mcp-cli-and-integrations&#34;&gt;04 MCP, CLI, and Integrations
&lt;/h2&gt;&lt;p&gt;Firecrawl is also clearly moving toward the agent tooling ecosystem. The README provides MCP Server setup, along with Skill/CLI initialization commands for AI coding agents.&lt;/p&gt;
&lt;p&gt;This means it is not only intended for backend API calls. It also wants to plug directly into Claude Code, OpenCode, Antigravity, MCP clients, and similar workflows. For people who frequently ask agents to research, scrape, and organize web content, this kind of integration is lighter than hand-writing API calls.&lt;/p&gt;
&lt;p&gt;It also lists integrations with platforms such as Zapier, n8n, and Lovable. That direction is practical: web data does not always go into code. It may flow into automation tables, low-code workflows, content systems, or internal knowledge bases.&lt;/p&gt;
&lt;h2 id=&#34;05-open-source-self-hosting-and-licensing&#34;&gt;05 Open Source, Self-Hosting, and Licensing
&lt;/h2&gt;&lt;p&gt;Firecrawl is open source. The main repository is primarily licensed under &lt;code&gt;AGPL-3.0&lt;/code&gt;; the README also notes that SDKs and some UI components use the &lt;code&gt;MIT&lt;/code&gt; license, with details depending on the LICENSE files in each directory.&lt;/p&gt;
&lt;p&gt;This matters. If you only use the cloud service, the main concerns are API cost, reliability, and compliance boundaries. If you plan to self-host it and provide a service to others, the obligations of &lt;code&gt;AGPL-3.0&lt;/code&gt; need careful review.&lt;/p&gt;
&lt;p&gt;The README also reminds users to respect website policies, privacy policies, and terms of use, and says that Firecrawl respects &lt;code&gt;robots.txt&lt;/code&gt; by default. The stronger this type of tool becomes, the more important it is to design compliance and scraping boundaries into the system instead of patching them in after launch.&lt;/p&gt;
&lt;h2 id=&#34;06-suitable-use-cases&#34;&gt;06 Suitable Use Cases
&lt;/h2&gt;&lt;p&gt;I would consider Firecrawl first in these scenarios:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Scraping web content for a RAG system and wanting clean Markdown directly.&lt;/li&gt;
&lt;li&gt;Building AI search or research assistants that need to read full pages after search.&lt;/li&gt;
&lt;li&gt;Scraping JavaScript-heavy sites without maintaining a browser cluster yourself.&lt;/li&gt;
&lt;li&gt;Monitoring public information such as competitors, pricing, documentation, news, and job pages.&lt;/li&gt;
&lt;li&gt;Giving MCP clients or AI coding agents real-time web reading ability.&lt;/li&gt;
&lt;li&gt;Quickly validating a web-data product before building crawler infrastructure.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The less suitable cases are also clear:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The target site has very few fields, a stable structure, and can be handled by a simple script.&lt;/li&gt;
&lt;li&gt;The scraping volume is huge, and cost sensitivity matters more than development and maintenance cost.&lt;/li&gt;
&lt;li&gt;The business needs very fine control over sources, retry strategy, anti-bot behavior, and audit trails.&lt;/li&gt;
&lt;li&gt;Licensing or compliance requirements do not allow AGPL components or external cloud services.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;07-quick-take&#34;&gt;07 Quick Take
&lt;/h2&gt;&lt;p&gt;Firecrawl&amp;rsquo;s core value is productizing the messy path from &amp;ldquo;web page&amp;rdquo; to &amp;ldquo;AI-usable data.&amp;rdquo; It puts search, scraping, cleaning, interaction, batch processing, and agent-style research into one interface, which is convenient for AI application developers.&lt;/p&gt;
&lt;p&gt;If your project often needs models to read real web pages, especially when sources are scattered, structures are unstable, and MCP or agent workflows are involved, Firecrawl is worth keeping in the toolbox. If the task is just low-cost bulk collection from fixed websites, a traditional crawler or dedicated parser may still be the better choice.&lt;/p&gt;
&lt;h2 id=&#34;related-links&#34;&gt;Related Links
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;GitHub project: &lt;a class=&#34;link&#34; href=&#34;https://github.com/firecrawl/firecrawl&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://github.com/firecrawl/firecrawl&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
        </item>
        <item>
        <title>What Is OpenHarness: What This Open Source Agent Harness Can Do</title>
        <link>https://knightli.com/en/2026/04/12/openharness-basic-functions/</link>
        <pubDate>Sun, 12 Apr 2026 23:45:00 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/04/12/openharness-basic-functions/</guid>
        <description>&lt;p&gt;If you have been following open source AI agent tools lately, &lt;code&gt;HKUDS/OpenHarness&lt;/code&gt; is a project worth watching. It is not just another chat wrapper. Instead, it pulls the infrastructure layer for a runnable, extensible, and governable agent into a standalone open source &lt;strong&gt;Agent Harness&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;According to the official README, OpenHarness provides a lightweight but fairly complete set of agent capabilities, including tool calling, skill loading, memory, permission governance, and multi-agent coordination. The bundled &lt;code&gt;ohmo&lt;/code&gt; is the personal AI assistant application built on top of that foundation.&lt;/p&gt;
&lt;h2 id=&#34;01-what-is-openharness&#34;&gt;01 What Is OpenHarness
&lt;/h2&gt;&lt;p&gt;You can think of OpenHarness as the runtime layer that gives a foundation model hands, memory, and boundaries.&lt;/p&gt;
&lt;p&gt;A model may already be good at reasoning and generation, but if you want it to function as a long-running agent, it usually still needs these surrounding capabilities:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Calling tools instead of only producing text&lt;/li&gt;
&lt;li&gt;Reading and writing files, executing commands, and using search and web access&lt;/li&gt;
&lt;li&gt;Preserving context and memory across long sessions&lt;/li&gt;
&lt;li&gt;Applying permission controls to risky actions&lt;/li&gt;
&lt;li&gt;Splitting larger tasks across multiple sub-agents in parallel&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The goal of OpenHarness is to turn that engineering layer around the model into a clear, open source, inspectable Python implementation. It is closer to an agent operating substrate than to a single model experience or a single chat interface.&lt;/p&gt;
&lt;h2 id=&#34;02-the-projects-basic-functions&#34;&gt;02 The Project&amp;rsquo;s Basic Functions
&lt;/h2&gt;&lt;p&gt;Based on the current GitHub homepage and README, OpenHarness centers on the following capability areas.&lt;/p&gt;
&lt;h3 id=&#34;1-agent-loop&#34;&gt;1. Agent Loop
&lt;/h3&gt;&lt;p&gt;This is the core execution loop that lets an agent keep working over multiple steps. The official highlights include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Streaming tool-calling loops&lt;/li&gt;
&lt;li&gt;API retries with exponential backoff&lt;/li&gt;
&lt;li&gt;Parallel tool execution&lt;/li&gt;
&lt;li&gt;Token accounting and cost tracking&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The practical point is that the agent is not limited to a one-shot response. It can observe, reason, call tools, read results, and continue iterating within the same task.&lt;/p&gt;
&lt;h3 id=&#34;2-tools-skills-and-plugins&#34;&gt;2. Tools, Skills, and Plugins
&lt;/h3&gt;&lt;p&gt;OpenHarness puts serious effort into the tool layer. The project page says it already includes built-in tools for files, Shell, search, web access, and MCP, and it supports on-demand loading of Markdown skill files.&lt;/p&gt;
&lt;p&gt;Its value is not only that it has many tools, but that the composition model is fairly open:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;You can use built-in tools directly&lt;/li&gt;
&lt;li&gt;You can load skills for a specific task&lt;/li&gt;
&lt;li&gt;You can extend hooks, skills, and agents through plugins&lt;/li&gt;
&lt;li&gt;It is compatible with the &lt;code&gt;anthropics/skills&lt;/code&gt; ecosystem and related plugins&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you want to turn repeated workflows into reusable capabilities rather than re-describing them in prompts every time, this layer is especially useful.&lt;/p&gt;
&lt;h3 id=&#34;3-context-and-memory&#34;&gt;3. Context and Memory
&lt;/h3&gt;&lt;p&gt;This is one of the more important differentiators in OpenHarness. The official keywords include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;CLAUDE.md&lt;/code&gt; discovery and injection&lt;/li&gt;
&lt;li&gt;Automatic context compression&lt;/li&gt;
&lt;li&gt;Persistent memory through &lt;code&gt;MEMORY.md&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Session recovery and history continuation&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That means it is not only reacting to the current input. It is designed to preserve project conventions, historical tasks, and long-term preferences, making the agent better suited for ongoing work instead of always starting from scratch.&lt;/p&gt;
&lt;h3 id=&#34;4-permission-governance-and-safety-boundaries&#34;&gt;4. Permission Governance and Safety Boundaries
&lt;/h3&gt;&lt;p&gt;Once an agent starts interacting with the filesystem, terminal, and network, governance becomes critical. OpenHarness provides:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Multiple permission modes&lt;/li&gt;
&lt;li&gt;Rule controls based on paths and commands&lt;/li&gt;
&lt;li&gt;&lt;code&gt;PreToolUse&lt;/code&gt; / &lt;code&gt;PostToolUse&lt;/code&gt; hooks&lt;/li&gt;
&lt;li&gt;Interactive approval prompts&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In other words, it is not only about enabling the agent to do things. It also defines which things can be done directly and which ones should require confirmation first.&lt;/p&gt;
&lt;h3 id=&#34;5-multi-agent-coordination&#34;&gt;5. Multi-Agent Coordination
&lt;/h3&gt;&lt;p&gt;OpenHarness also supports delegating work to sub-agents. The currently public materials mention capabilities such as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Sub-agent creation and delegation&lt;/li&gt;
&lt;li&gt;Team registration and task management&lt;/li&gt;
&lt;li&gt;Background task lifecycle management&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For more complex work, this means it can move beyond a single serial agent and attempt parallel collaboration.&lt;/p&gt;
&lt;h3 id=&#34;6-multi-provider-workflows&#34;&gt;6. Multi-Provider Workflows
&lt;/h3&gt;&lt;p&gt;OpenHarness does not treat providers as mere API labels. It abstracts them as workflow + profile combinations. According to the README, current directions include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Claude / Anthropic-compatible&lt;/li&gt;
&lt;li&gt;OpenAI-compatible&lt;/li&gt;
&lt;li&gt;Codex Subscription&lt;/li&gt;
&lt;li&gt;GitHub Copilot&lt;/li&gt;
&lt;li&gt;Compatible backends such as Moonshot(Kimi), GLM, and MiniMax&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That makes it feel more like a multi-model, multi-entry agent runtime framework rather than something tied to a single vendor.&lt;/p&gt;
&lt;h3 id=&#34;7-react-tui-and-non-interactive-mode&#34;&gt;7. React TUI and Non-Interactive Mode
&lt;/h3&gt;&lt;p&gt;OpenHarness ships with a terminal UI. Running &lt;code&gt;oh&lt;/code&gt; opens a React/Ink TUI, and the official README says it supports:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A command picker&lt;/li&gt;
&lt;li&gt;Permission confirmation&lt;/li&gt;
&lt;li&gt;Model switching&lt;/li&gt;
&lt;li&gt;Provider switching&lt;/li&gt;
&lt;li&gt;Session recovery&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you do not want to enter an interactive interface, you can also use non-interactive mode to run a task once and return the result as standard output, JSON, or streaming JSON, which is helpful for scripting and automation.&lt;/p&gt;
&lt;h2 id=&#34;03-what-is-ohmo&#34;&gt;03 What Is &lt;code&gt;ohmo&lt;/code&gt;
&lt;/h2&gt;&lt;p&gt;If OpenHarness is the infrastructure layer, &lt;code&gt;ohmo&lt;/code&gt; is the personal agent application built on top of it.&lt;/p&gt;
&lt;p&gt;The project homepage is very clear about its positioning: it is not just a generic chatbot, but a personal assistant that can keep working across long conversations. The official description says it can interact with you through channels such as Feishu, Slack, Telegram, and Discord, and carry out tasks like:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;forking a branch&lt;/li&gt;
&lt;li&gt;writing code&lt;/li&gt;
&lt;li&gt;running tests&lt;/li&gt;
&lt;li&gt;opening a PR&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The README also highlights that &lt;code&gt;ohmo&lt;/code&gt; can run on top of your existing Claude Code or Codex subscription, so it does not necessarily require you to provision a new API key. For people already using those subscriptions, that lowers the barrier considerably.&lt;/p&gt;
&lt;h2 id=&#34;04-what-scenarios-it-fits&#34;&gt;04 What Scenarios It Fits
&lt;/h2&gt;&lt;p&gt;From the currently public capabilities, OpenHarness is a strong fit for people who:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Want to study what a production-grade agent is actually made of&lt;/li&gt;
&lt;li&gt;Want to build an extensible open source agent runtime of their own&lt;/li&gt;
&lt;li&gt;Want tools, skills, memory, permissions, and multi-agent coordination in one framework&lt;/li&gt;
&lt;li&gt;Do not want to be locked into a single model vendor or client form factor&lt;/li&gt;
&lt;li&gt;Want to build vertical agents or personal assistants on top of an existing architecture&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If your goal is simply to find a finished assistant that can chat right away, OpenHarness itself may not be the lightest option. But if you care more about agent infrastructure, engineering control, and long-term extensibility, it is a very worthwhile project to study.&lt;/p&gt;
&lt;h2 id=&#34;05-a-quick-way-to-understand-its-positioning&#34;&gt;05 A Quick Way to Understand Its Positioning
&lt;/h2&gt;&lt;p&gt;In one sentence:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;OpenHarness turns foundation models into agents that can actually execute work, while &lt;code&gt;ohmo&lt;/code&gt; packages that capability into a personal assistant that can keep working with you over time.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;You can also think of it as two layers:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;OpenHarness: an open source Agent Harness, essentially the infrastructure layer&lt;/li&gt;
&lt;li&gt;ohmo: a personal-agent app built on top of that infrastructure&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;As of &lt;strong&gt;April 12, 2026&lt;/strong&gt;, the GitHub homepage shows the project had already advanced to &lt;strong&gt;v0.1.6 (April 10, 2026)&lt;/strong&gt;, with continued emphasis on automatic context compression, MCP transport support, the React TUI, and runtime stability for multi-agent workflows. That suggests it is still evolving quickly, but its direction is already quite clear.&lt;/p&gt;
&lt;h2 id=&#34;references&#34;&gt;References
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;GitHub repository: &lt;a class=&#34;link&#34; href=&#34;https://github.com/HKUDS/OpenHarness&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://github.com/HKUDS/OpenHarness&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;English README: &lt;a class=&#34;link&#34; href=&#34;https://github.com/HKUDS/OpenHarness/blob/main/README.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://github.com/HKUDS/OpenHarness/blob/main/README.md&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Chinese README: &lt;a class=&#34;link&#34; href=&#34;https://github.com/HKUDS/OpenHarness/blob/main/README.zh-CN.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://github.com/HKUDS/OpenHarness/blob/main/README.zh-CN.md&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
        </item>
        <item>
        <title>Getting Started with Playwright CLI: Installation, Skills, Sessions, and Essential Commands</title>
        <link>https://knightli.com/en/2026/04/12/playwright-cli-getting-started/</link>
        <pubDate>Sun, 12 Apr 2026 14:36:58 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/04/12/playwright-cli-getting-started/</guid>
        <description>&lt;p&gt;If you have been using Claude Code, GitHub Copilot, or other coding agents for browser automation, &lt;code&gt;microsoft/playwright-cli&lt;/code&gt; is a tool worth watching. It is not the traditional kind of browser helper meant mainly for humans typing commands by hand. Instead, it is a Playwright CLI designed for coding agents, with an emphasis on lower token overhead, a lighter command interface, and integration with Skills-based workflows.&lt;/p&gt;
&lt;p&gt;From the official README, the core idea behind Playwright CLI is very clear: compared with MCP, which can push large tool schemas and page structure into the model context, the CLI approach is more compact and better suited for agent workflows that constantly switch between large codebases, tests, and browser automation.&lt;/p&gt;
&lt;h2 id=&#34;01-what-playwright-cli-is&#34;&gt;01 What Playwright CLI is
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;playwright-cli&lt;/code&gt; is an open-source Playwright command-line tool from Microsoft. The official description is “CLI for common Playwright actions.” It is mainly used for tasks like these:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Opening pages and driving the browser&lt;/li&gt;
&lt;li&gt;Recording and generating Playwright code&lt;/li&gt;
&lt;li&gt;Capturing page snapshots to get element references&lt;/li&gt;
&lt;li&gt;Taking screenshots and exporting PDFs&lt;/li&gt;
&lt;li&gt;Working with coding agents for test automation and web interaction&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The current GitHub README is very explicit about its positioning: if you are using coding agents, the CLI is often a better fit than Playwright MCP; if you need persistent state, richer introspection, and longer-running agentic loops, MCP still has its place.&lt;/p&gt;
&lt;p&gt;In other words, Playwright CLI feels more like a browser automation interface built for AI coding assistants, not just a tool for engineers to click around manually.&lt;/p&gt;
&lt;h2 id=&#34;02-where-it-stands-out&#34;&gt;02 Where it stands out
&lt;/h2&gt;&lt;h3 id=&#34;1-it-fits-agent-workflows-better&#34;&gt;1. It fits agent workflows better
&lt;/h3&gt;&lt;p&gt;The official README lists &lt;code&gt;Token-efficient&lt;/code&gt; as a key feature. It does not force full-page data into the LLM context. Instead, it lets the agent operate the browser through shorter and more focused commands.&lt;/p&gt;
&lt;p&gt;That matters a lot for coding agents. In real projects, an agent is not only driving the browser. It also has to read code, edit files, run tests, and inspect logs. If the browser interface itself consumes too much context, the overall workflow becomes less efficient.&lt;/p&gt;
&lt;h3 id=&#34;2-it-works-well-with-skills&#34;&gt;2. It works well with Skills
&lt;/h3&gt;&lt;p&gt;The README specifically highlights &lt;code&gt;playwright-cli install --skills&lt;/code&gt;. That shows Microsoft is not treating it as just another shell utility, but as something that can be consumed directly by Claude Code, GitHub Copilot, and similar agents through a Skills-based workflow.&lt;/p&gt;
&lt;p&gt;If your setup already relies on Skills, Playwright CLI should slot in naturally.&lt;/p&gt;
&lt;h3 id=&#34;3-session-management-is-fairly-complete&#34;&gt;3. Session management is fairly complete
&lt;/h3&gt;&lt;p&gt;Playwright CLI supports sessions. By default, the browser profile stays in memory, so cookies and storage state are preserved across multiple CLI calls within the same session. If you add &lt;code&gt;--persistent&lt;/code&gt;, the profile can be saved to disk and reused across browser restarts.&lt;/p&gt;
&lt;p&gt;That makes it much more practical than tools that simply open a browser for one command and then throw everything away. It is also a better fit for long debugging cycles and longer-running agent flows.&lt;/p&gt;
&lt;h3 id=&#34;4-it-includes-a-visual-monitoring-dashboard&#34;&gt;4. It includes a visual monitoring dashboard
&lt;/h3&gt;&lt;p&gt;The README provides &lt;code&gt;playwright-cli show&lt;/code&gt;, which opens a dashboard for observing and controlling all running browser sessions. This is especially useful when an agent is running automation in the background, because you can step in, inspect progress, and help with debugging instead of flying blind.&lt;/p&gt;
&lt;h2 id=&#34;03-installation-and-requirements&#34;&gt;03 Installation and requirements
&lt;/h2&gt;&lt;p&gt;According to the current GitHub README, the basic requirements for Playwright CLI are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Node.js 18 or newer&lt;/li&gt;
&lt;li&gt;Claude Code, GitHub Copilot, or another coding agent&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The installation commands are:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;npm install -g @playwright/cli@latest
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;playwright-cli --help
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;There is one easy mistake worth calling out:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The officially recommended package right now is &lt;code&gt;@playwright/cli&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Do not confuse it with the old deprecated npm package &lt;code&gt;playwright-cli&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So the package you actually want is the scoped package, not the older historical one.&lt;/p&gt;
&lt;h2 id=&#34;04-how-to-start-using-it&#34;&gt;04 How to start using it
&lt;/h2&gt;&lt;h3 id=&#34;1-install-skills&#34;&gt;1. Install skills
&lt;/h3&gt;&lt;p&gt;If you want a coding agent to use Playwright CLI directly, the official recommendation is to install the skills first:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;playwright-cli install --skills
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;The README explicitly says that Claude Code, GitHub Copilot, and similar tools will use the locally installed skills.&lt;/p&gt;
&lt;h3 id=&#34;2-let-the-agent-call-the-cli-directly&#34;&gt;2. Let the agent call the CLI directly
&lt;/h3&gt;&lt;p&gt;If you do not want to handle Skills first, you can also let the agent read the CLI help output directly:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;Test the &amp;#34;add todo&amp;#34; flow on https://demo.playwright.dev/todomvc using playwright-cli.
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;Check playwright-cli --help for available commands.
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;The README calls this “Skills-less operation.” The idea is that even without preinstalled skills, the CLI can still describe itself well enough for an agent to use it.&lt;/p&gt;
&lt;h3 id=&#34;3-try-a-minimal-flow-manually&#34;&gt;3. Try a minimal flow manually
&lt;/h3&gt;&lt;p&gt;The README includes a TodoMVC example that works very well as a first hands-on demo:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;5
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;6
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;7
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;8
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;playwright-cli open https://demo.playwright.dev/todomvc/ --headed
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;playwright-cli &lt;span class=&#34;nb&#34;&gt;type&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;Buy groceries&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;playwright-cli press Enter
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;playwright-cli &lt;span class=&#34;nb&#34;&gt;type&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;Water flowers&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;playwright-cli press Enter
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;playwright-cli check e21
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;playwright-cli check e35
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;playwright-cli screenshot
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;This sequence is useful because it quickly shows how Playwright CLI works in practice:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;open&lt;/code&gt; opens the page&lt;/li&gt;
&lt;li&gt;&lt;code&gt;type&lt;/code&gt; and &lt;code&gt;press&lt;/code&gt; handle text input&lt;/li&gt;
&lt;li&gt;&lt;code&gt;check&lt;/code&gt; uses an element reference to toggle checkboxes&lt;/li&gt;
&lt;li&gt;&lt;code&gt;screenshot&lt;/code&gt; saves the result&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;05---headed-sessions-and-the-monitoring-dashboard&#34;&gt;05 &lt;code&gt;--headed&lt;/code&gt;, sessions, and the monitoring dashboard
&lt;/h2&gt;&lt;h3 id=&#34;--headed&#34;&gt;&lt;code&gt;--headed&lt;/code&gt;
&lt;/h3&gt;&lt;p&gt;Playwright CLI is headless by default. If you want to see the browser window directly, you need to pass &lt;code&gt;--headed&lt;/code&gt; when using &lt;code&gt;open&lt;/code&gt;:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;playwright-cli open https://playwright.dev --headed
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;This is especially helpful when debugging selectors, login flows, or any interaction that is easier to inspect visually.&lt;/p&gt;
&lt;h3 id=&#34;sessions&#34;&gt;sessions
&lt;/h3&gt;&lt;p&gt;The official README places a lot of emphasis on sessions. You can use different sessions to isolate different projects or sites:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;playwright-cli open https://playwright.dev
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;playwright-cli -s&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;example open https://example.com --persistent
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;playwright-cli list
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;If you are letting an agent run over a longer period, you can also pass the session through an environment variable:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nv&#34;&gt;PLAYWRIGHT_CLI_SESSION&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;todo-app claude .
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Useful session management commands include:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;playwright-cli list
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;playwright-cli close-all
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;playwright-cli kill-all
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;In practice:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;list&lt;/code&gt; shows all sessions&lt;/li&gt;
&lt;li&gt;&lt;code&gt;close-all&lt;/code&gt; closes all browsers gracefully&lt;/li&gt;
&lt;li&gt;&lt;code&gt;kill-all&lt;/code&gt; forcefully terminates all browser processes&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;monitoring-dashboard&#34;&gt;Monitoring dashboard
&lt;/h3&gt;&lt;p&gt;If you want to see what the agent is actually doing in the browser, you can run:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;playwright-cli show
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;According to the README, this dashboard has two main views:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Session grid: shows active sessions by workspace, with live preview, URL, and page title&lt;/li&gt;
&lt;li&gt;Session detail: shows a live view of a selected session and lets you take over mouse and keyboard input&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That means Playwright CLI is not only usable from the command line. It also has a fairly mature observability layer.&lt;/p&gt;
&lt;h2 id=&#34;06-which-commands-are-worth-memorizing-first&#34;&gt;06 Which commands are worth memorizing first
&lt;/h2&gt;&lt;p&gt;If this is your first time using Playwright CLI, you do not need to memorize every command up front. These are the core ones worth learning first:&lt;/p&gt;
&lt;h3 id=&#34;pages-and-interaction&#34;&gt;Pages and interaction
&lt;/h3&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;5
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;6
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;7
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;playwright-cli open &lt;span class=&#34;o&#34;&gt;[&lt;/span&gt;url&lt;span class=&#34;o&#34;&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;playwright-cli goto &amp;lt;url&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;playwright-cli click &amp;lt;ref&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;playwright-cli fill &amp;lt;ref&amp;gt; &amp;lt;text&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;playwright-cli &lt;span class=&#34;nb&#34;&gt;type&lt;/span&gt; &amp;lt;text&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;playwright-cli hover &amp;lt;ref&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;playwright-cli press &amp;lt;key&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h3 id=&#34;getting-page-structure&#34;&gt;Getting page structure
&lt;/h3&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;4
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;playwright-cli snapshot
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;playwright-cli snapshot &amp;lt;ref&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;playwright-cli snapshot --depth&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;N
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;playwright-cli &lt;span class=&#34;nb&#34;&gt;eval&lt;/span&gt; &amp;lt;func&amp;gt; &lt;span class=&#34;o&#34;&gt;[&lt;/span&gt;ref&lt;span class=&#34;o&#34;&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;&lt;code&gt;snapshot&lt;/code&gt; is especially important because many later operations depend on element references stored as &lt;code&gt;ref&lt;/code&gt;. In practice, you usually capture a snapshot first, then use the returned element identifiers for clicking, filling, checking, or taking screenshots.&lt;/p&gt;
&lt;h3 id=&#34;saving-output&#34;&gt;Saving output
&lt;/h3&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;playwright-cli screenshot
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;playwright-cli pdf
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h3 id=&#34;tabs&#34;&gt;Tabs
&lt;/h3&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;4
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;playwright-cli tab-list
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;playwright-cli tab-new &lt;span class=&#34;o&#34;&gt;[&lt;/span&gt;url&lt;span class=&#34;o&#34;&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;playwright-cli tab-close &lt;span class=&#34;o&#34;&gt;[&lt;/span&gt;index&lt;span class=&#34;o&#34;&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;playwright-cli tab-select &amp;lt;index&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h2 id=&#34;07-who-should-try-it&#34;&gt;07 Who should try it
&lt;/h2&gt;&lt;p&gt;Playwright CLI is especially worth trying in these kinds of scenarios:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;You are using Claude Code, Copilot, or another coding agent for E2E testing&lt;/li&gt;
&lt;li&gt;You want a lighter browser automation interface without pushing large page structures into model context&lt;/li&gt;
&lt;li&gt;You want one browser session to persist across multiple commands&lt;/li&gt;
&lt;li&gt;You want to monitor agent-driven web tasks through a dashboard while they run&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If your main question is how to make browser automation work efficiently with coding agents, Playwright CLI will likely feel more natural than traditional manual debugging workflows.&lt;/p&gt;
&lt;h2 id=&#34;references&#34;&gt;References
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;GitHub: &lt;a class=&#34;link&#34; href=&#34;https://github.com/microsoft/playwright-cli&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://github.com/microsoft/playwright-cli&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;README: &lt;a class=&#34;link&#34; href=&#34;https://github.com/microsoft/playwright-cli/blob/main/README.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://github.com/microsoft/playwright-cli/blob/main/README.md&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
        </item>
        <item>
        <title>What Is Hermes Agent: Overview, Strengths, Getting Started, and How It Compares to OpenClaw</title>
        <link>https://knightli.com/en/2026/04/12/hermes-agent-intro-guide-vs-openclaw/</link>
        <pubDate>Sun, 12 Apr 2026 14:07:58 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/04/12/hermes-agent-intro-guide-vs-openclaw/</guid>
        <description>&lt;p&gt;If you have been following open-source AI agents lately, &lt;code&gt;Hermes Agent&lt;/code&gt; is a project worth paying attention to. Built by Nous Research, its main appeal is not simply that it is “another chat wrapper,” but that it tries to bring long-term memory, reusable skills, context files, MCP extensions, a messaging gateway, and parallel sub-agents into one unified agent runtime.&lt;/p&gt;
&lt;p&gt;Based on the official README, Hermes Agent has a very clear goal: it can work like a local CLI assistant in your terminal, or like a cloud-hosted personal assistant that stays available through Telegram, Discord, Slack, WhatsApp, Signal, and other channels. For users who want to combine a coding assistant, an automation assistant, and a personal AI workspace into one system, that positioning is compelling.&lt;/p&gt;
&lt;h2 id=&#34;01-an-overview-of-hermes-agent&#34;&gt;01 An overview of Hermes Agent
&lt;/h2&gt;&lt;p&gt;Hermes Agent is an open-source self-improving AI agent from Nous Research. It supports multiple model providers, including Nous Portal, OpenRouter, OpenAI, and custom OpenAI-compatible endpoints. It can also run across different execution backends such as a local terminal, Docker, SSH, Daytona, and Modal.&lt;/p&gt;
&lt;p&gt;What separates Hermes from many “tool-using chatbots” is that it does not focus only on tool calls within a single session. It puts much more emphasis on building persistent capability across sessions. The official docs break this idea down into several parts:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Persistent memory: stores key information about the environment, project, and user preferences through &lt;code&gt;MEMORY.md&lt;/code&gt; and &lt;code&gt;USER.md&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Skills system: turns successful workflows into reusable skills that can be loaded on demand.&lt;/li&gt;
&lt;li&gt;Context files: automatically reads files such as &lt;code&gt;AGENTS.md&lt;/code&gt;, &lt;code&gt;SOUL.md&lt;/code&gt;, and &lt;code&gt;.cursorrules&lt;/code&gt; to inject project conventions directly into the session.&lt;/li&gt;
&lt;li&gt;MCP integration: can connect to any MCP-compatible tool server to extend database, GitHub, filesystem, and scraping capabilities.&lt;/li&gt;
&lt;li&gt;Messaging gateway: beyond the CLI, it can also be used through Telegram, Discord, Slack, WhatsApp, Signal, Email, and other entry points.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In one sentence, Hermes Agent feels more like a general-purpose agent operating layer with memory, skills, extensibility, and multi-channel access.&lt;/p&gt;
&lt;h2 id=&#34;02-where-it-stands-out&#34;&gt;02 Where it stands out
&lt;/h2&gt;&lt;h3 id=&#34;1-it-covers-both-cli-workflows-and-messaging-workflows&#34;&gt;1. It covers both CLI workflows and messaging workflows
&lt;/h3&gt;&lt;p&gt;Many agent projects lean either toward terminal-based developer assistance or toward chat-platform bots. Hermes tries to combine both. You can run &lt;code&gt;hermes&lt;/code&gt; directly in the terminal, or continue with the same assistant through Telegram or Discord after starting the gateway.&lt;/p&gt;
&lt;p&gt;The practical benefit is that Hermes is not limited to being useful only when you are sitting in front of your computer. If you deploy it to the cloud or a VPS, it can become a continuously available personal AI assistant.&lt;/p&gt;
&lt;h3 id=&#34;2-it-is-designed-for-long-term-use&#34;&gt;2. It is designed for long-term use
&lt;/h3&gt;&lt;p&gt;Hermes does more than chat and call tools. It is also built around long-term accumulation:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Persistent memory with boundaries, instead of endlessly stuffing more context into each conversation.&lt;/li&gt;
&lt;li&gt;A skills system that lets you save and reuse successful workflows.&lt;/li&gt;
&lt;li&gt;Search across past sessions for retrieval and recall.&lt;/li&gt;
&lt;li&gt;Project context files that reduce the need to repeatedly explain the same background.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This matters a lot for people who work repeatedly inside the same repositories, workflows, and team conventions. It means the agent is not just helping once; it can gradually become more familiar with your environment.&lt;/p&gt;
&lt;h3 id=&#34;3-mcp-support-gives-it-strong-extensibility&#34;&gt;3. MCP support gives it strong extensibility
&lt;/h3&gt;&lt;p&gt;The Hermes documentation explicitly supports MCP and describes both stdio and HTTP integration modes. In practice, that means if an external system already has an MCP server, Hermes can usually connect to it with much lower integration cost.&lt;/p&gt;
&lt;p&gt;That is more flexible than writing a custom plugin for every single system. For users who already have tools built around the MCP ecosystem, Hermes should be much easier to extend.&lt;/p&gt;
&lt;h3 id=&#34;4-it-is-friendly-to-openclaw-users&#34;&gt;4. It is friendly to OpenClaw users
&lt;/h3&gt;&lt;p&gt;This part is especially interesting. The Hermes README directly provides &lt;code&gt;hermes claw migrate&lt;/code&gt;, and explicitly says it can import configuration, memory, skills, API keys, and messaging platform settings from OpenClaw.&lt;/p&gt;
&lt;p&gt;That suggests Hermes is not trying to ignore the existing ecosystem and start from zero. It is clearly positioning some OpenClaw users as a migration audience.&lt;/p&gt;
&lt;h2 id=&#34;03-how-to-get-started-quickly&#34;&gt;03 How to get started quickly
&lt;/h2&gt;&lt;p&gt;The officially recommended Hermes Agent installation method is very straightforward:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh &lt;span class=&#34;p&#34;&gt;|&lt;/span&gt; bash
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;According to the official README, it supports Linux, macOS, WSL2, and Android Termux. One important note is that native Windows is explicitly not supported right now, so Windows users are advised to use WSL2.&lt;/p&gt;
&lt;p&gt;After installation, you would usually refresh your shell first:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nb&#34;&gt;source&lt;/span&gt; ~/.bashrc
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Then you can launch it directly:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;hermes
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;If you want to go through a more complete step-by-step initialization flow, the easiest command is:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;hermes setup
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Based on the official documentation and README, a simple first-time setup path looks like this:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Run &lt;code&gt;hermes setup&lt;/code&gt; to finish the base configuration.&lt;/li&gt;
&lt;li&gt;Use &lt;code&gt;hermes model&lt;/code&gt; to choose a model provider and model.&lt;/li&gt;
&lt;li&gt;Use &lt;code&gt;hermes tools&lt;/code&gt; to enable the toolsets you want.&lt;/li&gt;
&lt;li&gt;Run &lt;code&gt;hermes&lt;/code&gt; to enter the interactive CLI.&lt;/li&gt;
&lt;li&gt;If you want channels such as Telegram or Discord, continue with &lt;code&gt;hermes gateway&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;If you are already an OpenClaw user, it is also worth previewing the migration command:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;hermes claw migrate --dry-run
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;That lets you inspect what can be migrated before doing a real import.&lt;/p&gt;
&lt;h2 id=&#34;04-how-to-think-about-it-versus-openclaw&#34;&gt;04 How to think about it versus OpenClaw
&lt;/h2&gt;&lt;p&gt;From the official docs and README, Hermes Agent and OpenClaw are not simply a case of one replacing the other. Their positioning overlaps, but their priorities are clearly different.&lt;/p&gt;
&lt;h3 id=&#34;what-hermes-agent-feels-like&#34;&gt;What Hermes Agent feels like
&lt;/h3&gt;&lt;p&gt;Hermes feels more like a product centered on an agent core and workflow system. It emphasizes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;CLI experience&lt;/li&gt;
&lt;li&gt;Memory and skill accumulation&lt;/li&gt;
&lt;li&gt;Project context files&lt;/li&gt;
&lt;li&gt;MCP extensibility&lt;/li&gt;
&lt;li&gt;Parallel sub-agents&lt;/li&gt;
&lt;li&gt;Switching execution backends across local, container, remote, and serverless environments&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If your main goal is to make the agent understand your project better, reuse capabilities over time, and connect more naturally into MCP and developer workflows, Hermes is likely the better fit.&lt;/p&gt;
&lt;h3 id=&#34;what-openclaw-feels-like&#34;&gt;What OpenClaw feels like
&lt;/h3&gt;&lt;p&gt;OpenClaw feels more like a platform centered on a personal AI assistant plus a messaging gateway. It emphasizes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Rich messaging channel integration&lt;/li&gt;
&lt;li&gt;A continuously running Gateway&lt;/li&gt;
&lt;li&gt;A browser-based Control UI&lt;/li&gt;
&lt;li&gt;Device pairing, remote access, and status management&lt;/li&gt;
&lt;li&gt;Stronger assistant-oriented surfaces such as voice, mobile access, and Canvas&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If your main goal is to keep a personal AI assistant reliably available across multiple chat channels and devices, with a control panel to manage it, OpenClaw has a stronger product feel in that direction.&lt;/p&gt;
&lt;h3 id=&#34;a-more-practical-rule-of-thumb&#34;&gt;A more practical rule of thumb
&lt;/h3&gt;&lt;p&gt;You can roughly think of the two like this:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Hermes Agent: more of a “growing general-purpose agent workspace”&lt;/li&gt;
&lt;li&gt;OpenClaw: more of a “multi-channel always-on personal AI assistant platform”&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That distinction is not absolute, because both projects are still expanding and Hermes also offers a migration path from OpenClaw. But based on the currently public material, Hermes is more prominent on the memory, skills, context, MCP, and developer-workflow side, while OpenClaw looks more mature on the gateway, multi-channel, Control UI, and device-access side.&lt;/p&gt;
&lt;h2 id=&#34;05-who-should-try-it&#34;&gt;05 Who should try it
&lt;/h2&gt;&lt;p&gt;Hermes Agent is especially worth trying first if you fit one of these profiles:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;You already rely heavily on AI tools in the terminal and want an agent that better understands your codebase and project rules.&lt;/li&gt;
&lt;li&gt;You want to combine &lt;code&gt;AGENTS.md&lt;/code&gt;, skills, memory, and MCP into one workflow.&lt;/li&gt;
&lt;li&gt;You do not want to be locked into a single model vendor and prefer flexible provider switching.&lt;/li&gt;
&lt;li&gt;You already use OpenClaw and want to explore a direction that is more centered on agent workflows.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you care more about mobile reach, broad IM platform integration, a browser control console, and the feeling of an always-online personal assistant, OpenClaw still has a lot of appeal.&lt;/p&gt;
&lt;h2 id=&#34;references&#34;&gt;References
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;Hermes Agent GitHub: &lt;a class=&#34;link&#34; href=&#34;https://github.com/NousResearch/hermes-agent&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://github.com/NousResearch/hermes-agent&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Hermes Agent Docs: &lt;a class=&#34;link&#34; href=&#34;https://hermes-agent.nousresearch.com/docs/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://hermes-agent.nousresearch.com/docs/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Hermes Features Overview: &lt;a class=&#34;link&#34; href=&#34;https://hermes-agent.nousresearch.com/docs/user-guide/features/overview&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://hermes-agent.nousresearch.com/docs/user-guide/features/overview&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Hermes MCP: &lt;a class=&#34;link&#34; href=&#34;https://hermes-agent.nousresearch.com/docs/user-guide/features/mcp/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://hermes-agent.nousresearch.com/docs/user-guide/features/mcp/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;OpenClaw GitHub: &lt;a class=&#34;link&#34; href=&#34;https://github.com/openclaw/openclaw&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://github.com/openclaw/openclaw&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;OpenClaw Getting Started: &lt;a class=&#34;link&#34; href=&#34;https://docs.openclaw.ai/start/quickstart&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://docs.openclaw.ai/start/quickstart&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;OpenClaw Control UI: &lt;a class=&#34;link&#34; href=&#34;https://docs.openclaw.ai/web/control-ui&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://docs.openclaw.ai/web/control-ui&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
        </item>
        <item>
        <title>OpenClaw Dreaming: Machines Start Dreaming While Humans Lose Sleep</title>
        <link>https://knightli.com/en/2026/04/12/openclaw-dreaming-machine-dreams-humans-lose-sleep/</link>
        <pubDate>Sun, 12 Apr 2026 12:41:34 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/04/12/openclaw-dreaming-machine-dreams-humans-lose-sleep/</guid>
        <description>&lt;p&gt;Long-term memory has always been a weak point for large models. As context grows, memory becomes harder to manage. An agent may appear to remember everything, yet become worse at judging what matters and what should be forgotten.&lt;/p&gt;
&lt;p&gt;On April 5, OpenClaw introduced an experimental feature called Dreaming. It is not just a catchy label. It is a background memory-management system modeled on human sleep, designed to help agents wake up with cleaner and more useful memory.&lt;/p&gt;
&lt;h2 id=&#34;01-a-sleep-based-pipeline-for-memory-consolidation&#34;&gt;01 A sleep-based pipeline for memory consolidation
&lt;/h2&gt;&lt;p&gt;Dreaming does more than index data. It breaks memory processing into three stages that mirror different functions of human sleep.&lt;/p&gt;
&lt;p&gt;Light Sleep: the system scans recent conversations and retrieval traces, removes duplication, and builds a candidate list. At this stage, it only buffers information and does not modify the core memory file &lt;code&gt;MEMORY.md&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Deep Sleep: the system applies stricter filters to identify durable information. Only entries that pass thresholds for score, recall count, and distinct query count move forward. Before writing anything, it checks the latest logs again to remove stale content. The final result is appended to &lt;code&gt;MEMORY.md&lt;/code&gt;, while a deep-sleep summary is written to &lt;code&gt;DREAMS.md&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;REM: after memory consolidation, the system looks for hidden links across recent behavior traces. It extracts patterns and reflective summaries, then stores them in a dedicated REM section to help the agent respond with better structure and broader context.&lt;/p&gt;
&lt;p&gt;Dreaming also produces a human-readable dream journal. Once enough material accumulates, a background sub-agent calls the default model and appends a short natural-language entry to &lt;code&gt;DREAMS.md&lt;/code&gt;.&lt;/p&gt;
&lt;h2 id=&#34;02-a-scoring-system-for-deciding-what-deserves-to-stay&#34;&gt;02 A scoring system for deciding what deserves to stay
&lt;/h2&gt;&lt;p&gt;The real point of Dreaming is not just organizing memory, but filtering it. Instead of keeping everything, OpenClaw uses a weighted scoring model to decide what belongs in long-term storage.&lt;/p&gt;
&lt;p&gt;The six dimensions are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Relevance (30%): how useful the information is when retrieved.&lt;/li&gt;
&lt;li&gt;Frequency (24%): how often the item appears in short-term signals.&lt;/li&gt;
&lt;li&gt;Query diversity (15%): whether it shows up across different prompts and contexts.&lt;/li&gt;
&lt;li&gt;Recency (15%): whether the information is still fresh and actionable.&lt;/li&gt;
&lt;li&gt;Integration (10%): whether it remains stable across multiple days.&lt;/li&gt;
&lt;li&gt;Concept richness (6%): how dense and connected its concept graph is.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In practice, this means the system tries to keep information that is repeated, useful, current, and broadly applicable, while letting lower-value noise fade away.&lt;/p&gt;
&lt;h2 id=&#34;03-why-it-reminds-people-of-claudes-dreaming-approach&#34;&gt;03 Why it reminds people of Claude&amp;rsquo;s &amp;ldquo;dreaming&amp;rdquo; approach
&lt;/h2&gt;&lt;p&gt;Some developers have noted that Dreaming resembles the automated dreaming logic described in leaked Claude Code material around the KAIROS system. Older approaches that repeatedly rewrote the entire &lt;code&gt;MEMORY.md&lt;/code&gt; could become messy over time. By splitting the flow into light sleep, deep sleep, and REM, Dreaming makes the pipeline more explicit: consolidate first, preserve next, and derive higher-level patterns last.&lt;/p&gt;
&lt;p&gt;Others have highlighted the neuroscience angle. Terms like Dreaming, Light Sleep, Deep Sleep, and REM are not random branding. They directly borrow from human models of sleep-based memory consolidation.&lt;/p&gt;
&lt;p&gt;OpenClaw already uses files like &lt;code&gt;IDENTITY.md&lt;/code&gt;, &lt;code&gt;USER.md&lt;/code&gt;, and &lt;code&gt;HEARTBEAT.md&lt;/code&gt; to preserve identity, user context, and continuity. &lt;code&gt;DREAMS.md&lt;/code&gt; fills in the missing piece: deciding which memories are actually worth keeping.&lt;/p&gt;
&lt;h2 id=&#34;04-the-most-ironic-part-machines-dream-humans-stay-awake&#34;&gt;04 The most ironic part: machines dream, humans stay awake
&lt;/h2&gt;&lt;p&gt;The value of Dreaming is not that AI remembers everything. It is that AI learns to review short-term traces, extract patterns, and discard noise. A strong agent should not behave like a dumb storage device. It should become better over time at understanding a user&amp;rsquo;s preferences, recurring goals, and long-term context.&lt;/p&gt;
&lt;p&gt;From an engineering perspective, the most interesting part is that the system is not presented as a mystical black box. It is a structured backend process with stages, thresholds, reflection, and forgetting rules. That makes AI memory feel less like uncontrolled context bloat and more like a designed system.&lt;/p&gt;
&lt;p&gt;That is also what makes the whole thing feel ironic. We are spending enormous effort teaching machines how to dream, while many people are losing sleep over being replaced by those same increasingly capable systems.&lt;/p&gt;
</description>
        </item>
        <item>
        <title>Drop MCP? Why CLI Is Becoming the Default Tool Layer for Agents</title>
        <link>https://knightli.com/en/2026/04/10/mcp-vs-cli-for-agents/</link>
        <pubDate>Fri, 10 Apr 2026 21:55:12 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/04/10/mcp-vs-cli-for-agents/</guid>
        <description>&lt;p&gt;Over the last year, debates about agent toolchains have increasingly centered on one question:&lt;/p&gt;
&lt;p&gt;Does MCP (Model Context Protocol) make tool calling simpler, or does it make simple tasks more complex?&lt;/p&gt;
&lt;p&gt;For most day-to-day engineering tasks, CLI is becoming the more practical default.&lt;/p&gt;
&lt;h2 id=&#34;cost-gap-is-not-a-ux-issue-but-an-order-of-magnitude-issue&#34;&gt;Cost gap is not a UX issue, but an order-of-magnitude issue
&lt;/h2&gt;&lt;p&gt;The biggest practical pressure in MCP is token overhead.&lt;/p&gt;
&lt;p&gt;In common scenarios, MCP often has to load large tool schemas before actual execution. Using a GitHub MCP Server as an example, initialization alone can consume tens of thousands of tokens. For long tasks, this directly squeezes context budget.&lt;/p&gt;
&lt;p&gt;Community benchmarks keep pointing to the same conclusion:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Single MCP calls commonly cost several to dozens of times more than CLI&lt;/li&gt;
&lt;li&gt;Retry recovery is also more expensive (reconnect plus context reload)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is not just &amp;ldquo;a little slower.&amp;rdquo; It scales into API cost, latency, and reliability issues.&lt;/p&gt;
&lt;h2 id=&#34;why-models-are-naturally-better-at-cli&#34;&gt;Why models are naturally better at CLI
&lt;/h2&gt;&lt;p&gt;A frequently overlooked fact is training distribution.&lt;/p&gt;
&lt;p&gt;LLMs have seen massive amounts of terminal text during training: commands, outputs, errors, scripts, and man pages. In other words, CLI interaction is already close to the model&amp;rsquo;s native input pattern.&lt;/p&gt;
&lt;p&gt;By contrast, MCP&amp;rsquo;s JSON-RPC and tool schema style became widespread only in recent years. Models can learn it, but familiarity and compression efficiency are often still weaker than long-established CLI patterns.&lt;/p&gt;
&lt;p&gt;That also explains why, in many cases:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;for the same goal, CLI commands are shorter&lt;/li&gt;
&lt;li&gt;outputs are easier to continue reasoning over&lt;/li&gt;
&lt;li&gt;error recovery paths are more stable&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;security-and-isolation-mcp-still-has-catching-up-to-do&#34;&gt;Security and isolation: MCP still has catching up to do
&lt;/h2&gt;&lt;p&gt;MCP is not incapable of security, but its ecosystem is still early.&lt;/p&gt;
&lt;p&gt;Common concerns today include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Tool Poisoning in descriptions&lt;/li&gt;
&lt;li&gt;behavior drift (Rug Pull)&lt;/li&gt;
&lt;li&gt;same-name tool override (Shadowing)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;CLI also has security risks (injection, privilege misuse, path risks), but its process model, permission boundaries, and audit chain have been validated through decades of engineering practice. In production, that predictability matters.&lt;/p&gt;
&lt;h2 id=&#34;this-does-not-mean-mcp-has-no-value&#34;&gt;This does not mean MCP has no value
&lt;/h2&gt;&lt;p&gt;I do not think MCP should be abandoned.&lt;/p&gt;
&lt;p&gt;A more reasonable positioning is:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;CLI handles the execution layer (local, low-latency, high-frequency calls)&lt;/li&gt;
&lt;li&gt;MCP handles the connection layer (remote service discovery, unified auth, audit, and multitenancy)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That is the commonly discussed hybrid architecture: &lt;code&gt;CLI + MCP Gateway&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;When integrating many remote systems and enforcing unified governance and compliance, MCP still has clear value. But for helping agents complete engineering work quickly, CLI-first usually better matches current model capability boundaries.&lt;/p&gt;
&lt;p&gt;In today&amp;rsquo;s engineering reality, CLI is closer to an agent&amp;rsquo;s working native language; MCP is better positioned as a connection protocol rather than the only execution protocol.&lt;/p&gt;
</description>
        </item>
        <item>
        <title>OpenClaw and Agent Harness: Why It Looks Like AGI</title>
        <link>https://knightli.com/en/2026/04/10/openclaw-agent-architecture-enterprise-ai/</link>
        <pubDate>Fri, 10 Apr 2026 09:16:17 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/04/10/openclaw-agent-architecture-enterprise-ai/</guid>
        <description>&lt;p&gt;When many people first try OpenClaw, it feels more like a teammate who can get work done than a chatbot.&lt;/p&gt;
&lt;p&gt;That feeling is not mysterious. The key is this: OpenClaw is not a jump in one model capability; it is a complete &lt;strong&gt;Agent Harness&lt;/strong&gt;.&lt;/p&gt;
&lt;h2 id=&#34;core-conclusion&#34;&gt;Core Conclusion
&lt;/h2&gt;&lt;p&gt;The essence of OpenClaw can be summarized as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;the model handles understanding and decisions&lt;/li&gt;
&lt;li&gt;the harness handles memory, tools, triggers, execution, and outputs&lt;/li&gt;
&lt;li&gt;the two collaborate through a loop to create continuous action&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So the core reason it &amp;ldquo;feels like AGI&amp;rdquo; is not that the model suddenly became all-powerful, but that systems engineering amplifies what the model can execute.&lt;/p&gt;
&lt;h2 id=&#34;what-is-a-harness&#34;&gt;What Is a Harness
&lt;/h2&gt;&lt;p&gt;You can think of a harness as an exoskeleton for the model.&lt;/p&gt;
&lt;p&gt;A standalone LLM usually provides an answer in a single request. A harness adds these capabilities:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;session and state management: link multi-turn tasks&lt;/li&gt;
&lt;li&gt;memory mechanisms: store and retrieve context when needed&lt;/li&gt;
&lt;li&gt;tool system: call browsers, terminals, files, and external APIs&lt;/li&gt;
&lt;li&gt;trigger mechanisms: wake on timers or events instead of waiting for a human prompt every time&lt;/li&gt;
&lt;li&gt;output channels: write results back to systems, not just return a paragraph&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;When these capabilities are connected in one loop, the model shifts from a responder to an executor.&lt;/p&gt;
&lt;h2 id=&#34;why-openclaw-feels-different&#34;&gt;Why OpenClaw Feels Different
&lt;/h2&gt;&lt;p&gt;A traditional chatbot is &amp;ldquo;ask once, answer once&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;OpenClaw is more like a closed loop of &amp;ldquo;observe -&amp;gt; use tools -&amp;gt; inspect results -&amp;gt; decide next&amp;rdquo;. Once this loop is established, the system can keep moving a task forward.&lt;/p&gt;
&lt;p&gt;This is also the most valuable lesson from OpenClaw:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;it proves the agent experience mainly comes from architecture design&lt;/li&gt;
&lt;li&gt;it decomposes &amp;ldquo;autonomy&amp;rdquo; into modules that can be engineered&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;value-and-boundaries&#34;&gt;Value and Boundaries
&lt;/h2&gt;&lt;p&gt;OpenClaw is general and flexible, but the trade-offs are also clear:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;the more context and tool definitions you include, the higher the cost&lt;/li&gt;
&lt;li&gt;the more general the system is, the more complex debugging and governance become&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In production scenarios, many teams choose smaller, more specialized agents instead of one universal agent.&lt;/p&gt;
</description>
        </item>
        <item>
        <title>Anthropic and OpenClaw Timeline: The Full Sequence of Events</title>
        <link>https://knightli.com/en/2026/04/08/anthropic-openclaw-timeline-2026-04/</link>
        <pubDate>Wed, 08 Apr 2026 19:48:42 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/04/08/anthropic-openclaw-timeline-2026-04/</guid>
        <description>&lt;h2 id=&#34;background&#34;&gt;Background
&lt;/h2&gt;&lt;p&gt;On April 4, 2026, Anthropic announced that Claude subscriptions would no longer cover third-party tools such as OpenClaw.&lt;/p&gt;
&lt;p&gt;The direct user-level impact was that third-party workflows previously relying on the subscription path for Claude access had to move to alternative access methods or switch to other models.&lt;/p&gt;
&lt;h2 id=&#34;timeline-january-to-april-2026&#34;&gt;Timeline (January to April 2026)
&lt;/h2&gt;&lt;h3 id=&#34;january-2026&#34;&gt;January 2026
&lt;/h3&gt;&lt;p&gt;According to public reports, Anthropic asked the project formerly known as Clawdbot to change its name, citing pronunciation similarity to Claude.&lt;/p&gt;
&lt;p&gt;During the same period, community feedback began to appear regarding restrictions on third-party access via subscription credentials.&lt;/p&gt;
&lt;h3 id=&#34;february-2026&#34;&gt;February 2026
&lt;/h3&gt;&lt;p&gt;The relevant restrictions were written into the terms of service, further clarifying the boundary between subscriptions and third-party automated invocation.&lt;/p&gt;
&lt;p&gt;In the same month, OpenClaw released v4.0 and refactored its underlying architecture into a pluggable model backend. In other words, the model was no longer a single hardcoded entry point and could be switched across multiple providers.&lt;/p&gt;
&lt;h3 id=&#34;march-2026&#34;&gt;March 2026
&lt;/h3&gt;&lt;p&gt;Anthropic released Claude Dispatch and Computer Use, covering capabilities such as remote task execution and desktop operation.&lt;/p&gt;
&lt;p&gt;In subsequent updates, OpenClaw continued building its compatibility layer, unifying differences across model providers in authentication, tool-call formats, and response schemas, thereby reducing migration costs when switching models.&lt;/p&gt;
&lt;p&gt;Public reports also noted that OpenClaw and Anthropic communicated in late March, but the overall strategic direction remained unchanged.&lt;/p&gt;
&lt;h3 id=&#34;april-4-2026&#34;&gt;April 4, 2026
&lt;/h3&gt;&lt;p&gt;Anthropic formally executed the subscription coverage cutoff for third-party tools.&lt;/p&gt;
&lt;p&gt;This marked the execution phase of policy adjustments that had been underway for several months.&lt;/p&gt;
&lt;h3 id=&#34;april-5-2026&#34;&gt;April 5, 2026
&lt;/h3&gt;&lt;p&gt;OpenClaw released v4.5 with several main actions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Reprioritizing model entry points in the onboarding flow&lt;/li&gt;
&lt;li&gt;Integrating alternative model paths such as GPT-5.4&lt;/li&gt;
&lt;li&gt;Continuing adaptation work for task flow and interaction experience&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Based on the release timing, OpenClaw&amp;rsquo;s switchover capability was not built entirely ad hoc, but rested on the multi-model architecture work launched since February.&lt;/p&gt;
&lt;h2 id=&#34;two-parallel-directions-in-the-process&#34;&gt;Two Parallel Directions in the Process
&lt;/h2&gt;&lt;p&gt;Viewed along the timeline, both parties advanced different priorities during the same period:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Anthropic: tightening subscription boundaries and integrating official product capabilities&lt;/li&gt;
&lt;li&gt;OpenClaw: strengthening model replaceability and cross-model compatibility&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These two routes are not inherently contradictory, but they do create competition over entry-point ownership and where user workflows accumulate.&lt;/p&gt;
&lt;h2 id=&#34;current-status-as-of-april-2026&#34;&gt;Current Status (as of April 2026)
&lt;/h2&gt;&lt;p&gt;Based on publicly available information, the following can be confirmed:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The subscription coverage cutoff has been executed&lt;/li&gt;
&lt;li&gt;OpenClaw has completed its primary model-path transition and continues iterating&lt;/li&gt;
&lt;li&gt;Whether users perceive major changes depends on how strongly their workflows rely on any single model&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;what-to-watch-next&#34;&gt;What to Watch Next
&lt;/h2&gt;&lt;p&gt;Going forward, the more meaningful signals are not from this single event itself, but from three areas:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Whether boundaries between subscription plans and API usage become more explicit&lt;/li&gt;
&lt;li&gt;The long-term performance of multi-model agents in stability, cost, and user experience&lt;/li&gt;
&lt;li&gt;Whether user workflows settle primarily at the model layer, tool layer, or a hybrid layer between the two&lt;/li&gt;
&lt;/ol&gt;
</description>
        </item>
        
    </channel>
</rss>
