<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
        <title>Harness on KnightLi Blog</title>
        <link>https://knightli.com/en/tags/harness/</link>
        <description>Recent content in Harness on KnightLi Blog</description>
        <generator>Hugo -- gohugo.io</generator>
        <language>en</language>
        <lastBuildDate>Mon, 27 Apr 2026 08:19:02 +0800</lastBuildDate><atom:link href="https://knightli.com/en/tags/harness/index.xml" rel="self" type="application/rss+xml" /><item>
        <title>Ralph and Multi-Agent Collaboration: How to Keep AI Working Reliably Over Long Tasks</title>
        <link>https://knightli.com/en/2026/04/27/ralph-multi-agent-long-running-ai-workflows/</link>
        <pubDate>Mon, 27 Apr 2026 08:19:02 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/04/27/ralph-multi-agent-long-running-ai-workflows/</guid>
        <description>&lt;p&gt;If you have been using coding agents lately, you quickly run into a very practical question: &lt;strong&gt;AI can work, sure, but how do you keep it working for hours without drifting, forgetting requirements, or redoing the same work?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;That is the real question behind many discussions around &lt;code&gt;Ralph&lt;/code&gt; and multi-agent collaboration. The point is not simply to compare which model is stronger. The more useful question is this: &lt;strong&gt;how do you design a workflow that lets AI stay stable during long tasks?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;If you break the problem down, there are usually two main routes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;Ralph&lt;/code&gt; approach: keep starting fresh sessions and connect context through the filesystem&lt;/li&gt;
&lt;li&gt;The multi-agent approach: let a lead agent coordinate while worker agents split the execution&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Put more simply, the question is not &amp;ldquo;which model is more powerful,&amp;rdquo; but &amp;ldquo;how do you organize AI so it behaves more like a small team that can keep delivering?&amp;rdquo;&lt;/p&gt;
&lt;h2 id=&#34;01-why-long-tasks-go-off-the-rails&#34;&gt;01 Why Long Tasks Go Off the Rails
&lt;/h2&gt;&lt;p&gt;In short tasks, many problems stay hidden. You give an instruction, the model reads a few files, changes a few lines, and the job is done.&lt;/p&gt;
&lt;p&gt;Once the task gets longer, the common failure modes start to pile up:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Conversations grow longer and context starts to bloat&lt;/li&gt;
&lt;li&gt;Earlier requirements get squeezed out by newer information&lt;/li&gt;
&lt;li&gt;One agent has to plan, implement, and test at the same time&lt;/li&gt;
&lt;li&gt;Without a clear acceptance step, &amp;ldquo;it is done&amp;rdquo; often just means &amp;ldquo;it says it is done&amp;rdquo;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So when AI runs for a long time, the real challenge is often not single-shot model quality. It is &lt;strong&gt;task slicing, state handoff, role separation, and feedback loops&lt;/strong&gt;.&lt;/p&gt;
&lt;h2 id=&#34;02-the-ralph-approach-break-long-tasks-into-short-rounds&#34;&gt;02 The Ralph Approach: Break Long Tasks into Short Rounds
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;Ralph&lt;/code&gt; is a good fit when the main problem is dirty, overloaded context.&lt;/p&gt;
&lt;p&gt;Its core pattern is straightforward:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Keep launching new agent sessions in a loop&lt;/li&gt;
&lt;li&gt;Let each round handle only one small enough task&lt;/li&gt;
&lt;li&gt;Store cross-round state in files instead of forcing everything into one conversation&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The benefit is immediate: every round starts with fresh context, so the session stays more focused and is less likely to get dragged down by old history.&lt;/p&gt;
&lt;p&gt;If you have already looked at &lt;code&gt;Ralph&lt;/code&gt;-style projects, the structure will feel familiar:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Current tasks live in structured files&lt;/li&gt;
&lt;li&gt;Intermediate learnings go into progress files&lt;/li&gt;
&lt;li&gt;Code changes stay in git history&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In other words, &lt;code&gt;Ralph&lt;/code&gt; does not try to make one agent remember everything forever. It externalizes memory on purpose so the session itself can stay lighter.&lt;/p&gt;
&lt;p&gt;This kind of setup works especially well when:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The work can already be split into small stories&lt;/li&gt;
&lt;li&gt;Each story can fit inside one context window&lt;/li&gt;
&lt;li&gt;The project already has tests, typecheck, or other checks&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It is a solution to the problem of &lt;strong&gt;how to keep AI moving forward one round at a time&lt;/strong&gt;.&lt;/p&gt;
&lt;h2 id=&#34;03-the-multi-agent-approach-split-the-work-one-agent-cannot-handle-alone&#34;&gt;03 The Multi-Agent Approach: Split the Work One Agent Cannot Handle Alone
&lt;/h2&gt;&lt;p&gt;The other route is multi-agent collaboration.&lt;/p&gt;
&lt;p&gt;In this kind of workflow design, the more promising pattern is usually this: the lead agent should not do all the work directly. Instead, it coordinates while other agents handle development, testing, checking, and acceptance.&lt;/p&gt;
&lt;p&gt;That differs from &lt;code&gt;Ralph&lt;/code&gt; in an important way:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;Ralph&lt;/code&gt; feels more like serial iteration&lt;/li&gt;
&lt;li&gt;Multi-agent work feels more like parallel division of labor&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;When the task naturally contains different roles, multi-agent collaboration becomes easier to use. For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;One agent breaks down the task and writes the execution plan&lt;/li&gt;
&lt;li&gt;One agent implements the actual change&lt;/li&gt;
&lt;li&gt;One agent tests and validates the result&lt;/li&gt;
&lt;li&gt;One agent checks whether the result still matches the original goal&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The point is not to open more windows for the sake of it. The real value is role separation. Tasks that used to be piled onto one agent can now be split into clearer stages.&lt;/p&gt;
&lt;p&gt;Once the role boundaries are clear, several problems become lighter:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The person writing does not have to be the same one reviewing&lt;/li&gt;
&lt;li&gt;The testing side does not have to reconstruct the full requirement every time&lt;/li&gt;
&lt;li&gt;The lead agent is less likely to drown in implementation detail&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is a solution to the problem of &lt;strong&gt;how to make AI cooperate more like a small team&lt;/strong&gt;.&lt;/p&gt;
&lt;h2 id=&#34;04-the-real-key-is-not-parallelism-but-task-design&#34;&gt;04 The Real Key Is Not Parallelism, but Task Design
&lt;/h2&gt;&lt;p&gt;Whether you choose &lt;code&gt;Ralph&lt;/code&gt; or multi-agent collaboration, the easiest thing to underestimate is this: &lt;strong&gt;workflow design matters more than opening more agents.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;If the task split is wrong, adding more agents only parallelizes the confusion.&lt;/p&gt;
&lt;p&gt;A more stable breakdown usually has a few traits:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;One task maps to one clear objective&lt;/li&gt;
&lt;li&gt;One role owns one category of output&lt;/li&gt;
&lt;li&gt;Every round has a clear done condition&lt;/li&gt;
&lt;li&gt;The output of one round can be consumed directly by the next&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For example, instead of giving AI one giant instruction like &amp;ldquo;build the whole feature,&amp;rdquo; a steadier structure is often:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Break out requirements and boundaries first&lt;/li&gt;
&lt;li&gt;Then split implementation&lt;/li&gt;
&lt;li&gt;Then split testing&lt;/li&gt;
&lt;li&gt;Then make acceptance its own step&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The advantage is that when something goes wrong, it becomes easier to tell whether the problem sits in understanding, implementation, testing, or delivery criteria.&lt;/p&gt;
&lt;h2 id=&#34;05-why-acceptance-matters-so-much&#34;&gt;05 Why Acceptance Matters So Much
&lt;/h2&gt;&lt;p&gt;Many AI workflows fail not because nothing happened earlier, but because the last step lacked a genuinely independent confirmation pass.&lt;/p&gt;
&lt;p&gt;In long tasks, there is often a wide gap between &amp;ldquo;a result was produced&amp;rdquo; and &amp;ldquo;the result is actually usable.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;So one especially important direction is to separate development from acceptance. Even without a complex process, it is worth asking at least these questions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Did it really complete the original task?&lt;/li&gt;
&lt;li&gt;Did it only patch the surface without fixing the root cause?&lt;/li&gt;
&lt;li&gt;Did testing cover only the happiest path?&lt;/li&gt;
&lt;li&gt;Did the upstream requirement get silently changed along the way?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Without that layer, AI can easily keep declaring success inside a long workflow.&lt;/p&gt;
&lt;h2 id=&#34;06-how-to-choose-between-the-two&#34;&gt;06 How to Choose Between the Two
&lt;/h2&gt;&lt;p&gt;If you want a fast rule of thumb:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;If your main pain is context bloat and long-session drift, start with &lt;code&gt;Ralph&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;If your main pain is one agent wearing too many hats, start with multi-agent collaboration&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;More specifically:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;Ralph&lt;/code&gt; fits work that is clear, granular, and easy to move forward round by round&lt;/li&gt;
&lt;li&gt;Multi-agent collaboration fits work with strong role boundaries and a need for parallelism and cross-checking&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In practice, these two approaches are not always competitors. A mature setup often combines them:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Use a &lt;code&gt;Ralph&lt;/code&gt;-style outer loop to push the larger task forward&lt;/li&gt;
&lt;li&gt;Use multi-agent collaboration inside each round for research, implementation, testing, and acceptance&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That gives you both better control over long context and better collaboration inside a single round.&lt;/p&gt;
&lt;h2 id=&#34;07-one-sentence-summary&#34;&gt;07 One-Sentence Summary
&lt;/h2&gt;&lt;p&gt;What makes these approaches worth studying is not that they recommend &lt;code&gt;Ralph&lt;/code&gt; or multi-agent collaboration in isolation. It is that they make one practical truth very clear: &lt;strong&gt;keeping AI stable over long tasks depends less on the model itself and more on whether you designed context, tasks, roles, and acceptance well.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;If you are already asking &lt;code&gt;Claude Code&lt;/code&gt;, &lt;code&gt;Codex&lt;/code&gt;, or other coding agents to handle longer real-world tasks, this kind of workflow thinking is often more valuable than simply switching to a stronger model.&lt;/p&gt;
</description>
        </item>
        <item>
        <title>Anthropic&#39;s Harness Direction: Agent Infrastructure Is Becoming an Agent OS</title>
        <link>https://knightli.com/en/2026/04/10/anthropic-harness-agent-os/</link>
        <pubDate>Fri, 10 Apr 2026 09:22:56 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/04/10/anthropic-harness-agent-os/</guid>
        <description>&lt;p&gt;Anthropic recently published an engineering write-up on Harness. On the surface, it explains product implementation. At a deeper level, it answers a longer-term question:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;As model capabilities keep evolving, which layers in an Agent system should stay stable, and which should remain fast to replace?&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id=&#34;core-judgment&#34;&gt;Core Judgment
&lt;/h2&gt;&lt;p&gt;My key takeaway is: Agent infrastructure is becoming more like a lightweight &lt;strong&gt;Agent OS&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;The focus is not to hard-code today&amp;rsquo;s best workflow, but to define long-lived system abstractions.&lt;/p&gt;
&lt;h2 id=&#34;why-this-matters&#34;&gt;Why This Matters
&lt;/h2&gt;&lt;p&gt;Common problems in many Agent frameworks include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;turning temporary model limitations into permanent architecture&lt;/li&gt;
&lt;li&gt;treating prompt engineering as a system boundary&lt;/li&gt;
&lt;li&gt;turning one useful patch into a long-term dependency&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Models will keep improving. A patch that is reasonable today may become technical debt tomorrow.&lt;/p&gt;
&lt;h2 id=&#34;anthropics-approach-from-concrete-harness-to-meta-harness&#34;&gt;Anthropic&amp;rsquo;s Approach: From Concrete Harness to Meta-Harness
&lt;/h2&gt;&lt;p&gt;Instead of committing to one fixed orchestration style, this approach abstracts three stable interfaces:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;code&gt;session&lt;/code&gt;: recoverable event and state history&lt;/li&gt;
&lt;li&gt;&lt;code&gt;harness&lt;/code&gt;: reasoning and orchestration loop (brain)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;sandbox&lt;/code&gt;: execution environment and tool capabilities (hands)&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;After separation, the system becomes easier to replace, recover, and scale.&lt;/p&gt;
&lt;h2 id=&#34;1-session-is-not-the-context-window&#34;&gt;1) Session Is Not the Context Window
&lt;/h2&gt;&lt;p&gt;A critical point is: &lt;strong&gt;Session is not model context.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Session should be a queryable, replayable, and recoverable event log, not a direct history dump into the model.&lt;/p&gt;
&lt;p&gt;Benefits of this design:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;trimming does not mean history disappears&lt;/li&gt;
&lt;li&gt;compaction does not mean facts are lost&lt;/li&gt;
&lt;li&gt;crash recovery can return to the event layer instead of relying on summary memory&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;2-harness-as-a-replaceable-orchestration-layer&#34;&gt;2) Harness as a Replaceable Orchestration Layer
&lt;/h2&gt;&lt;p&gt;Harness should focus on orchestration rather than holding business state.&lt;/p&gt;
&lt;p&gt;An ideal interface is closer to:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;execute(name, input) -&amp;gt; string&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;This means the model only needs to know what capabilities it can call, without being tightly bound to specific devices, containers, or operating systems.&lt;/p&gt;
&lt;h2 id=&#34;3-sandbox-is-the-hands-not-the-brain&#34;&gt;3) Sandbox Is the &amp;ldquo;Hands,&amp;rdquo; Not the &amp;ldquo;Brain&amp;rdquo;
&lt;/h2&gt;&lt;p&gt;When brain and hands are decoupled:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;tool environments can evolve independently&lt;/li&gt;
&lt;li&gt;different infrastructure can be integrated in parallel&lt;/li&gt;
&lt;li&gt;not every session needs a fully prewarmed execution environment&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This directly improves startup and scalability behavior.&lt;/p&gt;
&lt;h2 id=&#34;performance-and-security-insights&#34;&gt;Performance and Security Insights
&lt;/h2&gt;&lt;p&gt;This split often improves both performance and security.&lt;/p&gt;
&lt;p&gt;On performance:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;start the brain first, then provision hands on demand&lt;/li&gt;
&lt;li&gt;reduce Time To First Token (TTFT)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;On security:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;do not expose high-value credentials directly to the model&lt;/li&gt;
&lt;li&gt;use controlled proxy/vault paths for indirect credential access&lt;/li&gt;
&lt;li&gt;build security boundaries on system constraints, not on assumptions that &amp;ldquo;the model probably can&amp;rsquo;t do this&amp;rdquo;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;related-links&#34;&gt;Related Links
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://claude.com/blog/claude-managed-agents&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Usage patterns and customer examples&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://www.anthropic.com/engineering/managed-agents&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;The design of Claude Managed Agents&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://platform.claude.com/docs/en/managed-agents/quickstart&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Onboarding, quickstart, overview of the CLI and SKDs &lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
        </item>
        
    </channel>
</rss>
