How browser-harness domain skills keep AI agents from repeating browser automation mistakes

The most interesting part of browser-use/browser-harness is not only that it lets AI agents control real Chrome. It also turns web-operation experience into reusable domain skills.

That matters because clicking buttons is rarely the hard part of browser automation. Each website has its own details:

Which pages require login.
Which data can be fetched directly through an API.
Which buttons do not respond to normal DOM clicks.
Which iframes, shadow DOM components, or popups block the flow.
Which selectors are stable and which are temporary classes.
Which actions involve accounts, payments, or irreversible changes and require human confirmation.

If this experience stays only in a single task log, the agent will hit the same problems again next time. Domain skills are meant to preserve that experience so the agent does not start from zero every time it opens a site.

What domain skills are

You can think of domain skills as site-operation manuals for agents.

They are not ordinary user documentation, and they are not one-off scripts. They are closer to field-tested site knowledge:

Whether the site is suitable for browser automation.
Which API should be used first if an API exists.
Which URL should be used when the browser is necessary.
Which DOM structures, aria-labels, and button behaviors have been verified.
Which common approaches fail.
Which scenarios should stop and ask for human intervention.

This content can be reviewed by humans and read by agents during tasks. It turns on-the-spot exploration into maintainable experience.
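
To make the shape of this knowledge concrete, here is a minimal sketch that models the items listed above as a small record. The skills in the repository are Markdown files, so the DomainSkill class and every field name below are hypothetical and only illustrate what such a file captures.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DomainSkill:
    """Hypothetical model of one site's skill notes (not the repo's format)."""

    site: str
    prefer_api: Optional[str] = None  # API to try before the browser, if any
    entry_urls: List[str] = field(default_factory=list)
    verified_selectors: List[str] = field(default_factory=list)
    known_failures: List[str] = field(default_factory=list)
    stop_and_ask: List[str] = field(default_factory=list)

    def needs_human(self, action: str) -> bool:
        """True when the action text mentions a recorded stop-and-ask keyword."""
        lowered = action.lower()
        return any(keyword.lower() in lowered for keyword in self.stop_and_ask)


# Illustrative values only; example.com stands in for a real site.
skill = DomainSkill(
    site="example.com",
    prefer_api="REST",
    stop_and_ask=["delete", "payment"],
)
```

With this record, skill.needs_human("Delete account") is true while skill.needs_human("read listing") is false, which is the kind of check an agent could run before acting.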

A good browser agent should not turn every problem into opening a webpage, looking at screenshots, and clicking buttons.

One important kind of experience in domain skills tells the agent when not to use the browser.

On arXiv, for example, paper search results, metadata, and abstracts can be fetched directly through the Atom API or from HTML meta tags. Plain HTTP requests are usually faster, more stable, and easier to parse than a browser session.

GitHub follows a similar pattern. Repository, user, and release data should come from the REST API first, and file contents from raw.githubusercontent.com. Only pages such as GitHub Trending, which have no equivalent API, need browser interaction.
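
As a rough sketch of what "API first" looks like in practice, the helpers below only build the request URLs for the three HTTP paths mentioned above, so they run without network access. The function names are made up for this example, and the query shape is an assumption based on the public arXiv and GitHub endpoints rather than a copy of what the skill files prescribe.

```python
from urllib.parse import urlencode


def arxiv_query_url(terms: str, max_results: int = 10) -> str:
    """Build an arXiv Atom API URL for a free-text search."""
    params = {
        "search_query": f"all:{terms}",
        "start": 0,
        "max_results": max_results,
    }
    return "http://export.arxiv.org/api/query?" + urlencode(params)


def github_repo_api_url(owner: str, repo: str) -> str:
    """Build the GitHub REST API URL for repository metadata."""
    return f"https://api.github.com/repos/{owner}/{repo}"


def github_raw_url(owner: str, repo: str, ref: str, path: str) -> str:
    """Build the raw.githubusercontent.com URL for one file at a given ref."""
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{ref}/{path}"
```

An agent can pass any of these URLs to an ordinary HTTP client and parse the Atom, JSON, or raw text response without opening a browser at all.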

This shows that browser-harness is not based on “the browser solves everything.” It puts the browser in the right place: when APIs, HTTP, and static pages cannot solve the problem, let the agent operate a real page.

They store site-level knowledge

Traditional automation scripts are usually written around one task, for example:

Open page -> enter keyword -> click button -> download file

That script may complete the task, but the experience is scattered inside code. When the site changes, the script may fail. When the task changes, much of the experience may not be reusable.

Domain skills are closer to a site-level knowledge base. They care about:

Which container selector is stable in Amazon search results.
Which GitHub data should go through the REST API.
How LinkedIn invitation buttons differ in aria-label.
Which Shopify Admin pages are embedded apps.
Why Shopify Polaris inputs cannot always be filled with normal JS value assignment.
How Browser Use Cloud browser instances are created, listed, and cleaned up.

These are not steps for one task. They are decision-making knowledge that many future tasks can reuse.

Example: Amazon product search

For Amazon product search, the important part is not only how to search, but which path is more stable.

A more reliable approach is to use a direct search URL instead of opening the homepage and simulating typing every time. Search results can be extracted from a container such as [data-component-type="s-search-result"]. Field extraction also has details: title, price, rating, review count, and sponsored status each have more stable DOM sources.
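
The two ideas in this paragraph can be sketched with the standard library alone. The `k` query parameter, the class name, and the sample markup are assumptions made for this illustration; only the container selector comes from the text above. The parser is deliberately naive: it counts open and close tags, so void elements such as img inside a result would throw it off.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


def amazon_search_url(keyword: str) -> str:
    """Build a direct search URL instead of typing into the homepage."""
    return "https://www.amazon.com/s?" + urlencode({"k": keyword})


class SearchResultTitles(HTMLParser):
    """Collect <h2> text found inside s-search-result containers."""

    def __init__(self):
        super().__init__()
        self.depth = 0  # greater than 0 while inside a result container
        self.in_h2 = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.depth += 1
            if tag == "h2":
                self.in_h2 = True
        elif dict(attrs).get("data-component-type") == "s-search-result":
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1
            if tag == "h2":
                self.in_h2 = False

    def handle_data(self, data):
        if self.in_h2 and data.strip():
            self.titles.append(data.strip())


# Invented sample markup: two results and one unrelated widget.
SAMPLE = (
    '<div data-component-type="s-search-result"><h2>Item A</h2></div>'
    '<div data-component-type="other-widget"><h2>Not a result</h2></div>'
    '<div data-component-type="s-search-result"><h2>Item B</h2></div>'
)

parser = SearchResultTitles()
parser.feed(SAMPLE)
```

After feeding the sample, parser.titles holds only the two result titles; the heading in the unrelated widget is ignored because it sits outside the container.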

This kind of experience is valuable for an agent. Without it, the agent may guess buttons from screenshots and repeatedly try selectors. With it, the agent can go directly to a more stable extraction path.

More importantly, a skill can record traps. For example, some selectors that look usable may misread sponsored results or cross-sell areas. You only learn that from field testing.

Example: LinkedIn invitation management

LinkedIn is closer to a real account workflow, and the risk is higher.

On the invitation manager page, the Accept and Ignore buttons use different aria-label formats. You cannot simply derive one from the other. Some invitation cards even render Accept as an <a> element rather than a <button>, and ordinary CDP clicks may not trigger the accept action.

This shows that real web automation does not end when an element is located. Button labels, event binding, soft navigation, and component implementation all affect whether an action really works.

For an agent, this experience also has a safety meaning. Operations involving social accounts, invitations, messages, and posting should not be fully delegated. A skill can record the path and traps, but accepting invitations in bulk, sending content externally, or changing account details should keep human confirmation.
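
One way to picture the "keep human confirmation" rule is a small gate placed in front of every action. This is a sketch under stated assumptions: the action names, the HIGH_RISK set, and the run_action helper are all hypothetical and are not part of browser-harness.

```python
from typing import Callable

# Hypothetical action names that should never run without a human decision.
HIGH_RISK = {
    "accept_invitations_bulk",
    "send_message",
    "publish_post",
    "change_account_details",
}


def run_action(
    name: str,
    perform: Callable[[], None],
    confirm: Callable[[str], bool],
) -> str:
    """Run perform() unless the action is high-risk and the human declines."""
    if name in HIGH_RISK and not confirm(name):
        return "skipped"
    perform()
    return "done"


log = []
# A human who declines: the high-risk action is never executed.
declined = run_action("send_message", lambda: log.append("sent"), lambda n: False)
# A low-risk action runs without asking.
listed = run_action("list_invitations", lambda: log.append("listed"), lambda n: False)
```

Here declined is "skipped", listed is "done", and log contains only "listed". In a real setup the confirm callback would prompt a person rather than return a constant.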

Example: Shopify Admin

Shopify Admin shows another issue: backend systems are often not one page, but a combination of embedded apps and complex components.

Many Shopify apps run inside iframes. Polaris React inputs, Web Components, and embedded apps all behave differently. Some inputs cannot be filled with element.value = ...; they need CDP keystrokes that are closer to real keyboard input.
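
To illustrate the difference, the sketch below builds the Chrome DevTools Protocol messages for typing text as key events, using the Input.dispatchKeyEvent method. It only constructs the payloads. Sending them requires a WebSocket connection to the page's CDP endpoint and a focused input, neither of which is shown, and the helper name is an assumption rather than the harness's own API.

```python
import json


def keystroke_messages(text: str, first_id: int = 1) -> list:
    """Build one keyDown/keyUp pair of CDP messages per character."""
    messages = []
    next_id = first_id
    for ch in text:
        for params in (
            {"type": "keyDown", "key": ch, "text": ch},
            {"type": "keyUp", "key": ch},
        ):
            messages.append(
                {"id": next_id, "method": "Input.dispatchKeyEvent", "params": params}
            )
            next_id += 1
    return messages


# Serialized payloads, ready to be written to a CDP WebSocket.
payloads = [json.dumps(message) for message in keystroke_messages("Hi")]
```

Unlike assigning element.value from script, key events travel through the browser's input pipeline, so framework-controlled inputs see the keyboard and input events they listen for.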

The value of this kind of skill is that it lets the agent first identify what kind of UI it is looking at, then choose the right operation method.

Shopify experience also emphasizes “do not use the browser if you do not have to”:

For read-only product and inventory data, use the Storefront API first.
If an Admin API token exists, use the Admin API first.
For theme code editing, use Shopify CLI first.
Use the browser only when there is no API, the change is rare, or you are exploring the admin.
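
The list above reads naturally as a decision function. The sketch below is one possible encoding: the task labels and return values are invented for illustration, and the order in which the rules are checked is an assumption, since the list does not state a precedence.

```python
def choose_shopify_tool(task: str, has_admin_token: bool = False) -> str:
    """Pick the least browser-dependent tool for a (hypothetical) task label."""
    if task == "read_catalog":  # read-only product and inventory data
        return "storefront_api"
    if task == "edit_theme":  # theme code editing
        return "shopify_cli"
    if has_admin_token:  # any other task, when an Admin API token exists
        return "admin_api"
    return "browser"  # no API available, rare change, or exploration
```

For example, choose_shopify_tool("update_order", has_admin_token=True) selects the Admin API, while choose_shopify_tool("explore_admin") falls through to the browser.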

That is a mature tool-selection logic for agents.

Example: Browser Use Cloud

Domain skills do not only serve webpage clicking. They can also record API experience around browser runtimes.

A Browser Use Cloud skill can record how to create cloud browsers through REST APIs, list running browsers, clean up zombie browsers, and obtain the liveUrl and cdpUrl of each one.

This means a skill is not limited to “how to click a button.” Any recurring task with a stable method can become a skill:

API call patterns.
Authentication header format.
Request and response structure.
Verified status codes.
Common failure modes.
Resource cleanup and recycling methods.
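
As one example of a cleanup method worth recording, the sketch below picks out long-running browser records from a listing. It deliberately avoids any real endpoint: the record fields ("id", "started_at"), the one-hour threshold, and the function name are hypothetical and are not taken from the Browser Use Cloud API.

```python
from datetime import datetime, timedelta, timezone


def find_zombies(browsers, now, max_age=timedelta(hours=1)):
    """Return ids of browser records that started more than max_age ago."""
    return [b["id"] for b in browsers if now - b["started_at"] > max_age]


# Invented listing data standing in for a "list running browsers" response.
now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
running = [
    {"id": "a", "started_at": now - timedelta(minutes=5)},
    {"id": "b", "started_at": now - timedelta(hours=3)},
]
stale_ids = find_zombies(running, now)
```

With this data only browser "b" is flagged. The actual stop call for each id would follow whatever request format the skill has verified.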

For agents, all of these are reusable capabilities.

Why this is more reliable than ad-hoc reasoning

Many people expect a large model to understand the webpage by itself every time. In real tasks, relying only on ad-hoc reasoning is unstable.

The reasons are simple:

Web UI changes often.
The same button may have multiple implementations.
Visible does not mean clickable.
Clickable does not mean the action really worked.
Some tasks should use APIs instead of browsers.
Some operations require human confirmation and should not be decided by the model alone.
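
The gap between "clickable" and "the action really worked" is usually closed by checking a postcondition after acting. Here is a minimal sketch of that pattern; the act_and_verify helper and its parameters are assumptions for illustration. It performs the action once and then polls, rather than re-clicking, because repeating a non-idempotent action can do harm.

```python
import time
from typing import Callable


def act_and_verify(
    action: Callable[[], None],
    postcondition: Callable[[], bool],
    polls: int = 5,
    delay: float = 0.0,
) -> bool:
    """Perform the action once, then poll until the expected state appears."""
    action()
    for _ in range(polls):
        if postcondition():
            return True
        time.sleep(delay)
    return False


# Simulated page state: a working click changes it, a dead click does not.
page = {"invitation_accepted": False}
worked = act_and_verify(
    lambda: page.update(invitation_accepted=True),
    lambda: page["invitation_accepted"],
)
dead_click = act_and_verify(lambda: None, lambda: False, polls=2)
```

Here worked is True and dead_click is False. A skill can record which postcondition is reliable for a given button, for example a card disappearing or a counter changing.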

Writing these experiences into files brings several benefits:

Humans can review them.
Wrong experience can be corrected.
Site knowledge can accumulate over time.
New agents can inherit old experience.
Temporary task discoveries can become long-term knowledge.

This is more stable than putting everything into a prompt or chat context.

How teams can use it

In a team, domain skills can become a lightweight automation knowledge base.

Useful content to record includes:

Post-login paths in internal systems.
Report export flows.
Common popup handling.
Which buttons require human confirmation.
Which pages have API alternatives.
Which selectors were tested and found reliable.
Which tasks agents are not allowed to run automatically.

This knowledge does not need to be complete at the beginning. A practical path is to start with low-risk, frequent, reversible workflows: read-only tasks, downloads, organization, and checks. Once the flow is stable, turn the experience into a skill.

For team managers, skill files also make automation boundaries visible. You can inspect what the agent knows, what it can do, and where it should stop.

Boundaries to keep

Domain skills can improve an agent’s success rate, but they should not fully automate high-risk operations.

Several boundaries matter:

Do not record passwords, cookies, tokens, customer data, or sensitive internal URLs.
Keep human confirmation for payments, deletion, bulk submission, account changes, and external publishing.
Record verification date and scope.
Allow skills to expire after site changes and require revalidation.
Do not make bypassing risk controls or platform limits a goal.
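
Two of these boundaries, no secrets and an expiry date, are simple enough to check mechanically before a skill file is accepted. The sketch below is only an illustration: the regular expression is a crude heuristic, and the 90-day window and the function name are assumptions, not rules from browser-harness.

```python
import re
from datetime import date, timedelta

# Crude heuristic: a sensitive keyword followed by ":" or "=".
SECRET_HINTS = re.compile(
    r"(password|cookie|token|authorization)\s*[:=]", re.IGNORECASE
)


def skill_problems(text, verified_on, today, max_age_days=90):
    """Return a list of reasons a skill file should be reviewed again."""
    problems = []
    if SECRET_HINTS.search(text):
        problems.append("possible secret")
    if today - verified_on > timedelta(days=max_age_days):
        problems.append("needs revalidation")
    return problems


clean = skill_problems("Use the direct search URL.", date(2025, 1, 1), date(2025, 1, 10))
flagged = skill_problems("token = abc123", date(2024, 1, 1), date(2025, 1, 10))
```

The first call returns an empty list, while the second reports both a possible secret and an overdue verification date. A check like this supports human review and does not replace it.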

In other words, domain skills make agents steadier. They do not give agents unlimited permission.

Conclusion

The domain skills mechanism in browser-harness shows one thing: AI browser automation cannot rely only on the model improvising at runtime.

A usable browser agent needs at least three layers:

Low-level control: screenshots, clicks, input, downloads, CDP, HTTP.
Site-level knowledge: API priority, stable selectors, component traps, login boundaries.
Human safety rules: do not give credentials to the model, confirm high-risk actions, and do not write sensitive information into skills.

Domain skills fill the second layer. They let an agent enter a web task with verified experience instead of rediscovering everything every time.

References:

browser-harness domain skills: https://github.com/browser-use/browser-harness/tree/main/agent-workspace/domain-skills
Amazon product-search skill: https://github.com/browser-use/browser-harness/blob/main/agent-workspace/domain-skills/amazon/product-search.md
ArXiv scraping skill: https://github.com/browser-use/browser-harness/blob/main/agent-workspace/domain-skills/arxiv/scraping.md
GitHub scraping skill: https://github.com/browser-use/browser-harness/blob/main/agent-workspace/domain-skills/github/scraping.md
LinkedIn invitation-manager skill: https://github.com/browser-use/browser-harness/blob/main/agent-workspace/domain-skills/linkedin/invitation-manager.md
Shopify admin skill: https://github.com/browser-use/browser-harness/blob/main/agent-workspace/domain-skills/shopify-admin/README.md
Browser Use Cloud skill: https://github.com/browser-use/browser-harness/blob/main/agent-workspace/domain-skills/browser-use-cloud/cloud.md