Holo 3.1 is a local computer-use Agent model family released by H Company. It is positioned as a vision-language model for operating computers. According to the official model card, Holo3.1 supports web, desktop, and mobile environments, offers sizes such as 0.8B, 4B, 9B, and 35B-A3B, and provides quantized versions suitable for local deployment.
It is suitable for users who want to run an AI Agent on their own machine: no cloud API, no token-based billing, and more control over browser automation, desktop actions, and local file workflows.
The following is a direct local setup flow: use llama.cpp to start an OpenAI-compatible service for Holo 3.1, then point OpenClaw to the local address.
Requirements
Prepare the following:
- A Windows, macOS, or Linux computer.
- A discrete GPU with enough VRAM, or an Apple Silicon Mac.
llama-serverfromllama.cpp.- The main Holo 3.1 GGUF model file and the vision
mmprojfile. - OpenClaw.
Choose the model size based on your hardware:
| Hardware | Recommended model |
|---|---|
| RTX 4090 / RTX 3090 24GB | 35B-A3B Q4_K_M |
| RTX 5070 Ti / RTX 4060 Ti 16GB | 9B |
| Apple Silicon | 9B GGUF |
| 12GB VRAM | 4B |
| 8GB VRAM | 0.8B |
If you only want to try browser automation and simple desktop tasks, 9B is easier to run. 35B-A3B is better suited to machines with 24GB VRAM or more, but it also consumes more context, VRAM, and loading time.
1. Download llama.cpp
You can download a prebuilt version from llama.cpp releases, or build it yourself. Windows users can download and extract it, then confirm that the directory contains:
|
|
Then create this folder under the llama.cpp directory:
|
|
Put the Holo 3.1 main model and mmproj file into this folder.
2. Download the Holo 3.1 Model
The official Hugging Face organization for Holo 3.1 is Hcompany. If you use llama.cpp, choose the GGUF format.
For 35B-A3B, download:
- The main model, such as a
Q4_K_Mquantized GGUF. - The corresponding vision projection model, such as
mmproj.f16.gguf.
After placing the files, the structure can look like this:
|
|
You can customize the file names, but the paths in the startup script must match.
3. Start the Local Holo 3.1 Service
The following is a Windows batch script example. Save it as start-holo31.bat and place it in the same directory as llama-server.exe.
|
|
Run the script and select the tier that matches your VRAM. If startup succeeds, llama-server will expose a local OpenAI-compatible API:
|
|
If startup fails, check these three things first:
- Whether the model file names match the script.
- Whether the
mmprojfile exists. - Whether your VRAM is enough for the selected model and context length.
4. Install OpenClaw
On Windows, open PowerShell as administrator and run:
|
|
On macOS / Linux, run:
|
|
After installation, open OpenClaw settings and configure the model provider as a local OpenAI-compatible service:
|
|
You can choose browser startup mode. After entering the OpenClaw visual interface, you should see the local model loaded at the bottom.
If there is a thinking mode switch in the interface, turn it off first. In computer-use Agent scenarios like Holo 3.1, action planning and UI execution matter more; enabling extra thinking may noticeably slow responses.
5. Install Browser Automation Skills
To help OpenClaw operate the browser better, install two common skills:
|
|
After installation, restart OpenClaw gateway:
|
|
You can also enter this in the OpenClaw chat box:
|
|
This starts a new session and reloads capabilities.
6. Test a Simple Task
Start with a low-risk task:
|
|
The key thing to observe is not whether the answer looks polished, but whether:
- It can open the browser correctly.
- It can recognize page content.
- It can continuously search, click, read, and summarize.
- It gets stuck or repeats actions frequently.
- The local model response speed is acceptable.
If browser actions work normally, try more complex tasks such as organizing materials, comparing model pages, generating Markdown summaries, or analyzing web tables.
Usage Notes
The advantages of a local Agent are low cost, clear privacy boundaries, and no cloud token bill. But it also has practical limits:
- Small models are suitable for lightweight browser tasks, not hard reasoning.
- The vision model is critical for UI recognition; do not download only the main model.
- Very large context settings can consume a lot of VRAM, so start with conservative parameters.
- Automation can misclick. Do not start by letting it handle payments, deletion, production systems, or other high-risk tasks.
- A local model is not automatically safe. Browser permissions, file permissions, and command execution permissions still need control.
For everyday web material organization, lightweight automation, and local experiments, Holo 3.1 + llama.cpp + OpenClaw is worth trying. Its key value is not the slogan “free unlimited tokens,” but keeping the Agent runtime, model, and data flow as local as possible.
References
- Holo 3.1 official page: https://hcompany.ai/holo3.1
- H Company Hugging Face: https://huggingface.co/Hcompany
- Holo 3.1 35B-A3B GGUF: https://huggingface.co/Hcompany/Holo-3.1-35B-A3B-GGUF
- llama.cpp: https://github.com/ggml-org/llama.cpp
- OpenClaw + llama.cpp setup reference: https://openclawlaunch.com/guides/openclaw-llamacpp