Chrome on KnightLi Blog

How browser-harness domain skills keep AI agents from repeating browser automation mistakes

Sun, 24 May 2026 23:43:35 +0800

The most interesting part of browser-use/browser-harness is not only that it lets AI agents control real Chrome. It also turns web-operation experience into reusable domain skills.

That matters because the hard part of browser automation is rarely the clicking itself. Each website has its own details:

Which pages require login.
Which data can be fetched directly through an API.
Which buttons do not respond to normal DOM clicks.
Which iframes, shadow DOM components, or popups block the flow.
Which selectors are stable and which are temporary classes.
Which actions involve accounts, payments, or irreversible changes and require human confirmation.

If this experience stays only in one task log, the agent will hit the same problems again next time. Domain skills are meant to preserve that experience so the agent does not start from zero every time it opens a site.

What domain skills are

You can think of domain skills as site-operation manuals for agents.

They are not ordinary user documentation, and they are not one-off scripts. They are closer to field-tested site knowledge:

Whether the site is suitable for browser automation.
Which API should be used first if an API exists.
Which URL should be used when the browser is necessary.
Which DOM structures, aria-labels, and button behaviors have been verified.
Which common approaches fail.
Which scenarios should stop and ask for human intervention.

This content can be reviewed by humans and read by agents during tasks. It turns on-the-spot exploration into maintainable experience.

They are not about blind clicking

A good browser agent should not turn every problem into opening a webpage, looking at screenshots, and clicking buttons.

One important kind of experience in domain skills tells the agent when not to use the browser.

For sites such as ArXiv, paper search, metadata, and abstracts can be fetched directly through the Atom API or HTML meta tags. HTTP requests are usually faster, more stable, and easier to parse than opening a browser.
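As a rough illustration of the HTTP-first path, the sketch below builds a query URL for the public arXiv Atom API and parses an Atom feed with the standard library. The function names and the sample feed are made up for this example, and it is not taken from the browser-harness skill file.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Atom namespace used by the arXiv API feed.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}


def arxiv_query_url(terms: str, max_results: int = 5) -> str:
    """Build a search URL for the arXiv Atom API (helper name is hypothetical)."""
    params = {"search_query": f"all:{terms}", "start": 0, "max_results": max_results}
    return "http://export.arxiv.org/api/query?" + urllib.parse.urlencode(params)


def parse_atom_entries(xml_text: str) -> list[dict]:
    """Extract id, title, and abstract from an Atom feed string."""
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.findall("atom:entry", ATOM_NS):
        def text(tag: str) -> str:
            raw = entry.findtext(f"atom:{tag}", default="", namespaces=ATOM_NS)
            return " ".join(raw.split())  # collapse line breaks inside the field
        entries.append({"id": text("id"), "title": text("title"), "summary": text("summary")})
    return entries


# A tiny hand-written feed so the parser can be tried without network access.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>http://arxiv.org/abs/0000.00000v1</id>
    <title>An Example
      Title</title>
    <summary>  A short abstract.  </summary>
  </entry>
</feed>"""

if __name__ == "__main__":
    print(arxiv_query_url("electron"))
    print(parse_atom_entries(SAMPLE_FEED))
```

In a real task the agent would fetch the URL with any HTTP client and pass the response body to the parser, with no browser involved.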

GitHub follows a similar pattern. Repository, user, and release data should use the REST API first. File contents should use raw.githubusercontent.com first. Only pages such as GitHub Trending, which do not have an equivalent API, need browser interaction.
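A minimal sketch of what "API first, raw file second" can look like in practice. The helper names are hypothetical. The URL shapes are the public GitHub REST and raw.githubusercontent.com patterns.

```python
from urllib.parse import quote


def github_repo_api_url(owner: str, repo: str) -> str:
    """REST endpoint for repository metadata."""
    return f"https://api.github.com/repos/{quote(owner)}/{quote(repo)}"


def github_releases_api_url(owner: str, repo: str) -> str:
    """REST endpoint listing a repository's releases."""
    return github_repo_api_url(owner, repo) + "/releases"


def github_raw_url(owner: str, repo: str, ref: str, path: str) -> str:
    """Direct file download URL; quote() leaves '/' in the path untouched."""
    return f"https://raw.githubusercontent.com/{quote(owner)}/{quote(repo)}/{quote(ref)}/{quote(path)}"


if __name__ == "__main__":
    print(github_repo_api_url("browser-use", "browser-harness"))
    print(github_raw_url("browser-use", "browser-harness", "main", "README.md"))
```

Anything these URLs can answer does not need a browser session at all.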

This shows that browser-harness is not based on “the browser solves everything.” It puts the browser in the right place: when APIs, HTTP, and static pages cannot solve the problem, let the agent operate a real page.

They store site-level knowledge

Traditional automation scripts are usually written around one task, for example:

`Open page -> enter keyword -> click button -> download file`

That script may complete the task, but the experience is scattered inside code. When the site changes, the script may fail. When the task changes, much of the experience may not be reusable.

Domain skills are closer to a site-level knowledge base. They care about:

Which container selector is stable in Amazon search results.
Which GitHub data should go through the REST API.
How LinkedIn invitation buttons differ in aria-label.
Which Shopify Admin pages are embedded apps.
Why Shopify Polaris inputs cannot always be filled with normal JS value assignment.
How Browser Use Cloud browser instances are created, listed, and cleaned up.

These are not steps for one task. They are decision-making knowledge that many future tasks can reuse.

Example: Amazon product search

For Amazon product search, the important part is not only how to search, but which path is more stable.

A more reliable approach is to use a direct search URL instead of opening the homepage and simulating typing every time. Search results can be extracted from a container such as [data-component-type="s-search-result"]. Field extraction also has details: title, price, rating, review count, and sponsored status each have more stable DOM sources.

This kind of experience is valuable for an agent. Without it, the agent may guess buttons from screenshots and repeatedly try selectors. With it, the agent can go directly to a more stable extraction path.

More importantly, a skill can record traps. For example, some selectors that look usable may misread sponsored results or cross-sell areas. You only learn that from field testing.

Example: LinkedIn invitation management

LinkedIn is closer to a real account workflow, and the risk is higher.

On the invitation manager page, the Accept and Ignore buttons use different aria-label formats. You cannot simply derive one from the other. Some invitation cards even render Accept as an <a> element rather than a <button>, and ordinary CDP clicks may not trigger the accept action.

This shows that real web automation does not end when an element is located. Button labels, event binding, soft navigation, and component implementation all affect whether an action really works.

For an agent, this experience also has a safety meaning. Operations involving social accounts, invitations, messages, and posting should not be fully delegated. A skill can record the path and traps, but accepting invitations in bulk, sending content externally, or changing account details should keep human confirmation.

Example: Shopify Admin

Shopify Admin shows another issue: backend systems are often not one page, but a combination of embedded apps and complex components.

Many Shopify apps run inside iframes. Polaris React inputs, Web Components, and embedded apps all behave differently. Some inputs cannot be filled with element.value = ...; they need CDP keystrokes that are closer to real keyboard input.

The value of this kind of skill is that it lets the agent first identify what kind of UI it is looking at, then choose the right operation method.
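The difference between assigning element.value and sending keystrokes can be sketched as CDP messages. The example below only builds the JSON payloads for Input.dispatchKeyEvent and Input.insertText. Sending them over an open CDP WebSocket with the target input focused is assumed and not shown, and the helper names are hypothetical rather than browser-harness functions.

```python
import json
from itertools import count

_message_ids = count(1)  # CDP requests carry a caller-chosen numeric id


def cdp_message(method: str, params: dict) -> dict:
    """Wrap a CDP method call in the standard request envelope."""
    return {"id": next(_message_ids), "method": method, "params": params}


def typing_messages(text: str) -> list[dict]:
    """One 'char' key event per character, closer to real keyboard input."""
    return [
        cdp_message("Input.dispatchKeyEvent", {"type": "char", "text": ch})
        for ch in text
    ]


def insert_text_message(text: str) -> dict:
    """Single-call alternative that inserts the whole string at the caret."""
    return cdp_message("Input.insertText", {"text": text})


if __name__ == "__main__":
    for message in typing_messages("Hi"):
        print(json.dumps(message))
    print(json.dumps(insert_text_message("Hi")))
```

Which of the two variants a given Polaris input accepts is exactly the kind of detail a skill file would record after testing.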

Shopify experience also emphasizes “do not use the browser if you do not have to”:

For read-only product and inventory data, use the Storefront API first.
If an Admin API token exists, use the Admin API first.
For theme code editing, use Shopify CLI first.
Use the browser only when there is no API, the change is rare, or you are exploring the admin.

That is a mature tool-selection logic for agents.
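The selection order above can be written down as a small decision function. This is only one reading of the list. The task labels and return values are invented for the example.

```python
def choose_shopify_tool(task_kind: str, read_only: bool, has_admin_token: bool) -> str:
    """Pick the least browser-dependent tool for a Shopify task.

    task_kind uses hypothetical labels such as "products", "inventory",
    "theme_code", or "admin_exploration".
    """
    if task_kind == "theme_code":
        return "shopify_cli"
    if read_only and task_kind in {"products", "inventory"}:
        return "storefront_api"
    if has_admin_token:
        return "admin_api"
    # No API route applies: fall back to operating the admin in a browser.
    return "browser"


if __name__ == "__main__":
    print(choose_shopify_tool("products", read_only=True, has_admin_token=False))
    print(choose_shopify_tool("admin_exploration", read_only=False, has_admin_token=False))
```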

Example: Browser Use Cloud

Domain skills are not only for clicking through webpages. They can also record API experience around browser runtimes.

Browser Use Cloud experience can record how to create cloud browsers through REST APIs, list running browsers, clean up zombie browsers, and obtain liveUrl and cdpUrl.

This means a skill is not limited to “how to click a button.” Any recurring task with a stable method can become a skill:

API call patterns.
Authentication header format.
Request and response structure.
Verified status codes.
Common failure modes.
Resource cleanup and recycling methods.

For agents, all of these are reusable capabilities.

Why this is more reliable than ad-hoc reasoning

Many people expect a large model to understand the webpage by itself every time. In real tasks, relying only on ad-hoc reasoning is unstable.

The reasons are simple:

Web UI changes often.
The same button may have multiple implementations.
Visible does not mean clickable.
Clickable does not mean the action really worked.
Some tasks should use APIs instead of browsers.
Some operations require human confirmation and should not be decided by the model alone.

Writing these experiences into files brings several benefits:

Humans can review them.
Wrong experience can be corrected.
Site knowledge can accumulate over time.
New agents can inherit old experience.
Temporary task discoveries can become long-term knowledge.

This is more stable than putting everything into a prompt or chat context.

How teams can use it

In a team, domain skills can become a lightweight automation knowledge base.

Useful content to record includes:

Post-login paths in internal systems.
Report export flows.
Common popup handling.
Which buttons require human confirmation.
Which pages have API alternatives.
Which selectors were tested and found reliable.
Which tasks agents are not allowed to run automatically.

This knowledge does not need to be complete at the beginning. A practical path is to start with low-risk, frequent, reversible workflows: read-only tasks, downloads, organization, and checks. Once the flow is stable, turn the experience into a skill.

For team managers, skill files also make automation boundaries visible. You can inspect what the agent knows, what it can do, and where it should stop.

Boundaries to keep

Domain skills can improve an agent’s success rate, but they should not fully automate high-risk operations.

Several boundaries matter:

Do not record passwords, cookies, tokens, customer data, or sensitive internal URLs.
Keep human confirmation for payments, deletion, bulk submission, account changes, and external publishing.
Record verification date and scope.
Allow skills to expire after site changes and require revalidation.
Do not make bypassing risk controls or platform limits a goal.

In other words, domain skills make agents steadier. They do not give agents unlimited permission.
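Two of these boundaries are easy to check mechanically: whether a skill is overdue for revalidation, and whether it contains something that looks like a secret. The sketch below is a hypothetical review helper, not part of browser-harness. The record fields, the 90-day threshold, and the regular expression are assumptions, and a pattern this simple will miss many real secrets.

```python
import re
from dataclasses import dataclass
from datetime import date, timedelta

# Deliberately simple: flags lines such as "token: abc123" or "password=...".
SECRET_PATTERN = re.compile(r"(?i)\b(password|passwd|token|cookie|secret)\s*[:=]\s*\S+")


@dataclass
class SkillNote:
    site: str
    verified_on: date  # when the behavior was last confirmed on the live site
    scope: str         # which pages or flows the verification covered
    body: str          # the skill text itself


def needs_revalidation(note: SkillNote, today: date, max_age_days: int = 90) -> bool:
    """True when the last verification is older than the allowed age."""
    return today - note.verified_on > timedelta(days=max_age_days)


def find_secret_like_lines(note: SkillNote) -> list[str]:
    """Return the lines of the skill body that look like stored credentials."""
    return [line for line in note.body.splitlines() if SECRET_PATTERN.search(line)]


if __name__ == "__main__":
    note = SkillNote(
        site="example.com",
        verified_on=date(2026, 1, 1),
        scope="search page",
        body="Use the direct search URL.\ntoken: abc123",
    )
    print(needs_revalidation(note, today=date(2026, 5, 24)))
    print(find_secret_like_lines(note))
```

A check like this belongs in human review, since it can only flag candidates and cannot decide what is actually sensitive.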

Conclusion

The domain skills mechanism in browser-harness shows one thing: AI browser automation cannot rely only on the model improvising at runtime.

A usable browser agent needs at least three layers:

Low-level control: screenshots, clicks, input, downloads, CDP, HTTP.
Site-level knowledge: API priority, stable selectors, component traps, login boundaries.
Human safety rules: do not give credentials to the model, confirm high-risk actions, and do not write sensitive information into skills.

Domain skills fill the second layer. They let an agent enter a web task with verified experience instead of rediscovering everything every time.

References:

browser-harness domain skills: https://github.com/browser-use/browser-harness/tree/main/agent-workspace/domain-skills
Amazon product-search skill: https://github.com/browser-use/browser-harness/blob/main/agent-workspace/domain-skills/amazon/product-search.md
ArXiv scraping skill: https://github.com/browser-use/browser-harness/blob/main/agent-workspace/domain-skills/arxiv/scraping.md
GitHub scraping skill: https://github.com/browser-use/browser-harness/blob/main/agent-workspace/domain-skills/github/scraping.md
LinkedIn invitation-manager skill: https://github.com/browser-use/browser-harness/blob/main/agent-workspace/domain-skills/linkedin/invitation-manager.md
Shopify admin skill: https://github.com/browser-use/browser-harness/blob/main/agent-workspace/domain-skills/shopify-admin/README.md
Browser Use Cloud skill: https://github.com/browser-use/browser-harness/blob/main/agent-workspace/domain-skills/browser-use-cloud/cloud.md

What is browser-harness? A browser automation tool that lets AI agents control real Chrome

Sun, 24 May 2026 17:19:54 +0800

browser-use/browser-harness is a browser control tool for AI agents. Its goal is not to build another heavy automation framework, but to connect large language models directly to real Chrome through CDP, so they can browse pages, click, take screenshots, download files, upload files, and fill forms.

The README describes the project as a thin, editable CDP harness for letting LLMs connect to a real browser. When a task lacks a helper, the agent can add code during execution and turn reusable experience into domain skills.

This is worth watching because the browser is still the entry point for many real workflows: admin panels, SaaS dashboards, ecommerce sites, recruiting platforms, CRMs, reimbursement systems, cloud consoles, and document platforms. Many of them do not expose stable APIs, or their API permissions are harder to obtain than webpage access. Giving an agent reliable browser control is a way to fill that last mile of automation.

What browser-harness is

Structurally, browser-harness is closer to a browser runtime for agents than a browser extension for manual users.

Its core ideas are:

Connect directly to Chrome or Chromium.
Control pages through a CDP WebSocket.
Let agents combine screenshots, coordinate clicks, DOM inspection, network requests, and raw CDP.
Put task-specific helpers in agent-workspace/agent_helpers.py.
Store site-specific experience in agent-workspace/domain-skills/.
Keep the core thin instead of turning it into a large automation platform.

The README says the core architecture is roughly four core files and about 1,000 lines of code, covering install.md, SKILL.md, src/browser_harness/, agent-workspace/agent_helpers.py, and agent-workspace/domain-skills/.

The point is not to ship built-in support for every website. The point is to give the agent an operation layer close enough to a real browser, so it can fill in missing capabilities for the task at hand.

How it differs from traditional browser automation

Traditional browser automation usually revolves around testing frameworks such as Playwright, Selenium, or Puppeteer. They are good for deterministic scripts: open a page, locate an element, click it, and assert the result.

browser-harness targets a different kind of work. A user gives a goal, and the agent explores the page, judges the state, handles popups, adds helpers, and reuses site knowledge. It emphasizes adaptation during interaction.

The difference can be summarized like this:

Playwright is better when humans write scripts and agents run them.
browser-harness is better when agents look at the page and act step by step.
Traditional automation favors fixed flows.
browser-harness favors open-ended tasks.
Traditional scripts often depend on selectors.
browser-harness encourages screenshots first, visible UI actions next, and DOM or CDP when needed.

This does not mean it replaces Playwright. For stable tests, Playwright is still more mature. browser-harness is valuable because it turns real webpages into an environment an agent can operate, especially when page structure is complex, steps are not fixed, and situational judgment matters.

Why real Chrome matters

Many browser-agent tools use isolated headless browsers. That is simple to deploy and good for batch jobs, but it does not always reuse the user’s real working environment: login state, extensions, history, bookmarks, and daily browser setup.

browser-harness supports local Chrome and the Browser Use cloud browser. For local browsers, it offers two approaches:

Use chrome://inspect/#remote-debugging to allow the current Chrome instance to be connected.
Start an isolated profile with --remote-debugging-port=9222 --user-data-dir=....
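For the isolated-profile approach, a client usually needs the browser's CDP WebSocket address. The sketch below assembles the two flags and reads webSocketDebuggerUrl from Chrome's /json/version debugging endpoint. The helper names are hypothetical, and browser-harness itself may connect differently.

```python
import json
import urllib.request


def isolated_chrome_args(user_data_dir: str, port: int = 9222) -> list[str]:
    """Flags for starting Chrome with a separate profile and a debugging port."""
    return [f"--remote-debugging-port={port}", f"--user-data-dir={user_data_dir}"]


def websocket_url_from_version(payload: str) -> str:
    """Pull the browser-level CDP WebSocket URL out of a /json/version response."""
    return json.loads(payload)["webSocketDebuggerUrl"]


def fetch_websocket_url(port: int = 9222, timeout: float = 2.0) -> str:
    """Ask a locally running Chrome for its CDP endpoint (needs Chrome running)."""
    url = f"http://127.0.0.1:{port}/json/version"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return websocket_url_from_version(response.read().decode("utf-8"))


if __name__ == "__main__":
    print(isolated_chrome_args("/tmp/agent-profile"))
    sample = '{"Browser": "Chrome/0.0", "webSocketDebuggerUrl": "ws://127.0.0.1:9222/devtools/browser/abc"}'
    print(websocket_url_from_version(sample))
```

The profile path and the sample response are placeholders. Only fetch_websocket_url touches the network, and it fails if no Chrome is listening on the port.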

If you want an agent to help with tasks inside real accounts, the docs lean toward the first approach because it reuses everyday Chrome login state, extensions, and bookmarks. For unattended automation, or when you do not want popups to interrupt work, an isolated profile or cloud browser is usually safer.

The trade-off is clear: real Chrome is closer to the user’s workflow, but the security boundary is more sensitive. An isolated browser is easier to control, but login and environment setup must be handled again.

Editable helpers and domain skills

The most interesting part of browser-harness is that it designs “what the agent learns” into the project structure.

agent-workspace/agent_helpers.py stores helpers that are created during tasks. For example, if an agent needs to upload a file and the existing tools are not enough, it can add a stable upload helper. The next time it sees a similar page, it does not have to start from scratch.

agent-workspace/domain-skills/ stores site-level experience. The README mentions areas such as LinkedIn outreach, Amazon ordering, and reimbursement systems. The project recommends letting agents generate these skills from real tasks instead of hand-writing them, because they should reflect actual page behavior.

This fits browser automation well. The hard part is often not “how to click a button,” but:

How a website redirects after login.
Which popups block the main flow.
Which selectors are stable and which are temporary class names.
How uploads, downloads, iframes, shadow DOM, and cross-origin components behave.
What hidden waits and asynchronous states exist in a specific backend.

If this knowledge only stays in one run log, it is quickly lost. Turning it into domain skills gives the agent a chance to improve over time.

Suitable scenarios

browser-harness is better suited for:

Operating real web admin panels for users.
Completing repeated flows in systems without APIs.
Personal or enterprise web tasks that depend heavily on login state.
Complex interactions where screenshots are needed to judge page state.
Agents that need to add tools and site knowledge while running.
Multiple sub-agents each using an isolated browser.
Researching browser-agent runtime design.

Concrete examples include organizing web tables, submitting internal forms, downloading invoices, uploading files, handling reimbursement workflows, checking order status, configuring SaaS dashboards, and extracting information from logged-in pages.

If the task is only to fetch static pages, a browser may not be needed. The project’s own SKILL.md also notes that static pages can often be fetched through HTTP in bulk. Browsers should be reserved for tasks that truly need page state, login state, and interaction.

Risks to watch

Letting an AI agent control real Chrome is powerful, but risky.

First, the permission boundary must be clear. Real Chrome may contain email, payment dashboards, cloud consoles, company systems, and personal accounts. Once an agent can operate the browser, it effectively inherits part of the access those logged-in pages grant.

Second, do not hand credentials to the model. For login pages, payment verification, and two-step confirmations, the user should handle the sensitive step. The agent can wait for login to finish, but it should not read or enter passwords, verification codes, or payment details from screenshots.

Third, automation is not the same as delegation. Many web tasks look simple but may involve risk controls, mistaken clicks, data deletion, bulk submissions, or irreversible operations. Start with read-only, low-risk, reversible workflows.

Fourth, domain skills should not leak private data. Site knowledge can be shared, but account names, internal URLs, customer data, coordinate logs, and one-off task details should not be written into skills.

Fifth, choose the browser connection mode carefully. Reusing daily Chrome is convenient when login state matters. For long-running automation, an isolated profile or cloud browser is more controllable.

Why it matters for AI agent tools

browser-harness represents a pragmatic direction for agent tooling: build less platform, and give the model a direct interface to the real environment.

Many agents fail at two ends. On one end, the model can reason but cannot touch the real page. On the other, automation frameworks are powerful but require humans to hard-code the flow. browser-harness tries to connect the two: the browser holds real-world state, while the agent observes, decides, and adds tools.

That is also the meaning of a self-improving harness. It does not mean the agent magically becomes smarter. It means reusable operation experience is placed into the project structure, so the next task can avoid some of the same detours.

For developers, its value is mainly in three areas:

A browser control layer for personal agents.
A reference for studying browser automation and agent workflows.
An experimental framework for turning web workflows into reusable skills.

It is not the answer to every browser automation problem, but it points in a clear direction: when agents truly help people do work, the tool layer should not only call APIs. It should also understand and operate the web interfaces people use every day.

Conclusion

browser-use/browser-harness is interesting not because it wraps many advanced features, but because it brings several key browser-agent questions into focus: real Chrome, CDP, screenshot-driven control, editable helpers, site skill accumulation, and user permission boundaries.

If you are writing stable end-to-end tests, Playwright or Selenium is still a better fit. If you want agents such as Codex or Claude Code to handle real webpage tasks, browser-harness offers an entry point that matches how agents work.

In practice, start with low-risk tasks: let it read pages, take screenshots, and extract information first. Then gradually try clicking and submitting. Once it can reliably understand page state, you can consider giving it longer workflows.

References:

GitHub project: https://github.com/browser-use/browser-harness
README: https://github.com/browser-use/browser-harness/blob/main/README.md
Installation guide: https://github.com/browser-use/browser-harness/blob/main/install.md
Usage guide: https://github.com/browser-use/browser-harness/blob/main/SKILL.md

Chrome Silently Downloads 4GB Gemini Nano: How to Check, Disable, and Delete It

Sat, 09 May 2026 21:37:18 +0800

Google Chrome has been reported to download a roughly 4GB local AI model file in the background without explicit user permission, sparking debate about privacy, storage usage, and environmental impact.

The files are related to Gemini Nano and are mainly used for Chrome’s local AI features. The dispute is not simply that the browser supports local AI, but whether the download process is transparent enough, whether users should be informed in advance, and whether system resources are being used reasonably.

What happened

The model file being discussed is named weights.bin and is located in Chrome’s OptGuideOnDeviceModel directory. It is believed to be a localized version of Gemini Nano, used to perform some AI inference directly on the device.

Chrome decides in the background whether to download it based on hardware capability, especially RAM and VRAM. Users generally do not need to start the download themselves, and they may not see a clear prompt before it happens.

The more frustrating part is that manually deleting the model file usually does not stop it from coming back. As long as the related feature remains enabled, Chrome may download the model again after a restart or a later update.

The platforms mentioned in the discussion include Windows 11, macOS, and Ubuntu desktop systems. Based on Chrome’s desktop install base, the number of potentially affected devices could reach hundreds of millions.

Google’s explanation

Google says these files support local AI features such as “Help me write” and scam detection. Running the model locally can reduce some data uploads and improve privacy protection.

Google also says that if device storage is low, Chrome will automatically remove the related model to free up space. In other words, the model does not necessarily occupy disk space permanently.

At the same time, Google says users have been able to disable the related feature in Chrome settings since February 2024. Once disabled, the model will no longer continue downloading or updating.

How to check and disable it

If you do not want Chrome to keep the Gemini Nano model locally, start by checking a few places.

First, open Chrome settings and look for options related to “on-device AI”, local AI, writing assistance, or optimization suggestions, then disable the features you do not need.

Second, enter this in the address bar:

`chrome://flags`

Then search for and disable:

`Enables optimization guide on device`

Finally, check Chrome’s user data directory for the OptGuideOnDeviceModel folder and delete the model files inside it. Keep in mind that deleting the file alone is usually not enough. It is better to disable the related flag or setting first, otherwise Chrome may download it again later.

Possible paths on different systems

OptGuideOnDeviceModel is usually under Chrome’s user data directory. The exact location can vary depending on the operating system and installation method, but these are good places to check first:

Windows: %LOCALAPPDATA%\Google\Chrome\User Data\
macOS: ~/Library/Application Support/Google/Chrome/
Linux: ~/.config/google-chrome/
Chromium: ~/.config/chromium/

After opening the relevant directory, search for OptGuideOnDeviceModel or weights.bin. If you use Chrome Beta, Dev, or Canary, the directory name may include the corresponding release channel.

How to tell whether weights.bin has been downloaded

The simplest method is to search Chrome’s user data directory for:

`weights.bin`

If it has been downloaded, it will usually appear inside OptGuideOnDeviceModel, and the file may be several GB in size. You can also check the modified time to see whether Chrome recently created or updated it in the background.

If you cannot find weights.bin, that does not necessarily mean the device will never download it. Chrome may decide whether to fetch the model based on hardware conditions, region, version, feature flags, and experiment configuration.
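The manual search can also be scripted. The sketch below walks the default user data directories listed earlier and reports any weights.bin that sits under an OptGuideOnDeviceModel folder. It only reads file metadata and deletes nothing. The function names are made up for this example, and non-default install locations or release channels are not covered.

```python
import os
from pathlib import Path


def candidate_user_data_dirs() -> list[Path]:
    """Default Chrome/Chromium user data directories for the current user."""
    home = Path.home()
    dirs = [
        home / "Library" / "Application Support" / "Google" / "Chrome",  # macOS
        home / ".config" / "google-chrome",                              # Linux
        home / ".config" / "chromium",                                   # Chromium
    ]
    local_app_data = os.environ.get("LOCALAPPDATA")  # set on Windows only
    if local_app_data:
        dirs.append(Path(local_app_data) / "Google" / "Chrome" / "User Data")
    return dirs


def find_model_files(roots: list[Path]) -> list[tuple[Path, int]]:
    """Return (path, size in bytes) for each weights.bin under OptGuideOnDeviceModel."""
    hits = []
    for root in roots:
        if not root.is_dir():
            continue
        for path in root.rglob("weights.bin"):
            if "OptGuideOnDeviceModel" in path.parts:
                hits.append((path, path.stat().st_size))
    return hits


if __name__ == "__main__":
    for path, size in find_model_files(candidate_user_data_dirs()):
        print(f"{path}  {size / 1e9:.2f} GB")
```

An empty result only means no file was found in these locations at the moment the script ran.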

Which Chrome AI features may be affected

After disabling the related local AI or optimization features, some on-device capabilities that depend on Gemini Nano may be affected, such as “Help me write”, local scam detection, and future browser AI features that do not go through the cloud.

For users who do not use these features, everyday browsing is usually not affected much. For users who frequently use Chrome’s built-in writing assistance, page understanding, or experimental safety detection features, the experience may fall back to cloud processing, become unavailable, or use another browser-provided alternative.

Where the controversy lies

The central question is whether a browser should download several GB of model files for AI features before the user has clearly agreed.

Supporters argue that local AI can reduce cloud processing, improve privacy, and make responses faster. Critics argue that users should at least see a clear prompt before the download, especially when the file is close to 4GB and may affect storage space and network traffic.

Privacy experts also point out that this kind of insufficiently disclosed background download may raise compliance questions under the EU ePrivacy Directive and GDPR. Whether it constitutes a violation depends on Google’s notice mechanism, default settings, data processing path, and user controls.

Summary

Chrome’s adoption of Gemini Nano shows that browsers are moving more AI capabilities onto the local device. But it also creates a new product boundary problem: local models still consume disk space and bandwidth, and they can affect the user’s sense of control over their own device.

For ordinary users, the most direct step is to check Chrome’s local AI and optimization settings. If you do not need these features, disable the related options and then delete the model files in the OptGuideOnDeviceModel directory.
