How to choose among the five MinerU 3.4 modes: pipeline, hybrid-engine, vlm-engine, hybrid-http-client, and vlm-http-client

MinerU 3.4 officially supports five backend names in its CLI:

pipeline
hybrid-engine
vlm-engine
hybrid-http-client
vlm-http-client

The default backend is hybrid-engine, and Hybrid uses --effort medium by default. The confusing part is not the command syntax, but where the model actually runs, whether your local GPU is required, and which type of PDF each mode is best for.

Here is the short version: use pipeline for normal digital PDFs and batch jobs; use hybrid-engine --effort medium for the best overall local quality; try vlm-engine separately for difficult scanned files; only consider the two HTTP Client modes if the model is deployed on another GPU server.

Quick comparison of the five modes

Backend	Compute location	Core method	Local GPU	Characteristics
`pipeline`	Local machine	Multiple specialized models such as OCR, layout analysis, and formula recognition	Optional	Best compatibility, stable, almost no hallucination
`hybrid-engine`	Local machine	Native text extraction + VLM + Pipeline	Required, about 8GB minimum	Best overall accuracy, suitable for most high-quality parsing
`vlm-engine`	Local machine	Mainly lets a vision-language model understand the whole page	Required, about 8GB minimum	Good for complex scans, tables, formulas, and unusual layouts
`hybrid-http-client`	Small local models + remote VLM	Hybrid, but the large model runs on a server	Local GPU can be avoided	Suitable when you already have a remote GPU server
`vlm-http-client`	Remote server	VLM runs entirely on the server	No local GPU required	Local machine only uploads files and receives results

HTTP Client is not a “local mode that saves VRAM.” It is a remote deployment mode. Your local machine may avoid running the large model, but the remote server still has to perform VLM inference.
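
To make the table concrete, here is a minimal sketch of a helper that assembles the command line for each backend. It uses only the flags that appear in this article (-p, -o, -b, --effort, -u). The function name build_mineru_command and its validation rules are assumptions made for illustration, not part of MinerU; in particular, restricting --effort to the Hybrid backends is this sketch's own policy.

```python
from typing import List, Optional

# Backend names as listed in this article.
# True marks the backends that send VLM work to a remote server.
BACKENDS = {
    "pipeline": False,
    "hybrid-engine": False,
    "vlm-engine": False,
    "hybrid-http-client": True,
    "vlm-http-client": True,
}


def build_mineru_command(
    input_pdf: str,
    output_dir: str,
    backend: str = "hybrid-engine",
    effort: Optional[str] = None,
    server_url: Optional[str] = None,
) -> List[str]:
    """Return an argument list suitable for subprocess.run (hypothetical helper)."""
    if backend not in BACKENDS:
        raise ValueError(f"unknown backend: {backend!r}")
    is_remote = BACKENDS[backend]
    if is_remote and not server_url:
        raise ValueError(f"{backend} needs a server URL (-u)")
    if not is_remote and server_url:
        raise ValueError(f"{backend} runs locally and takes no server URL")
    # Assumption of this sketch: only Hybrid backends accept --effort.
    if effort is not None and not backend.startswith("hybrid"):
        raise ValueError("--effort is only attached to Hybrid backends here")

    command = ["mineru", "-p", input_pdf, "-o", output_dir, "-b", backend]
    if effort is not None:
        command += ["--effort", effort]
    if server_url:
        command += ["-u", server_url]
    return command
```

Passing the result to subprocess.run(command, check=True) would launch the job. Building the list separately makes it easy to print and inspect the command before anything runs.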

pipeline: stable, light on VRAM, good for batch jobs

Command:

mineru -p "input.pdf" -o "output" -b pipeline

pipeline does not send the whole page to one large model. It combines multiple specialized modules:

Native PDF text extraction.
OCR.
Layout detection.
Table recognition.
Formula recognition.
Reading-order reconstruction.

Its strengths are stability and low resource requirements. It can run on CPU only, and it can also use an NVIDIA GPU for acceleration. The official description emphasizes that it is fast, stable, and hallucination-free. The benchmark table lists an overall accuracy of about 86.47, and GPU mode needs about 4GB of VRAM at minimum.

pipeline is suitable for:

Normal digital PDFs.
Large batch processing jobs.
Text-heavy documents.
Scenarios where you do not want the model to guess content.
8GB GPUs where stability matters more than maximum accuracy.

If you use an RTX 4060 8GB, this is usually the safest local GPU mode. It is also a good first step for checking whether your CUDA environment works.
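
For batch jobs, one simple approach is to generate one pipeline command per PDF in a folder. The sketch below does only that. The helper name pipeline_batch_commands and the choice to give every PDF its own output subfolder are assumptions for illustration, not MinerU behavior.

```python
from pathlib import Path
from typing import List


def pipeline_batch_commands(input_dir: str, output_dir: str) -> List[List[str]]:
    """Build one `mineru ... -b pipeline` command per PDF found in input_dir."""
    pdfs = sorted(
        p for p in Path(input_dir).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )
    commands = []
    for pdf in pdfs:
        # Design choice of this sketch: one output subfolder per input file.
        target = Path(output_dir) / pdf.stem
        commands.append(["mineru", "-p", str(pdf), "-o", str(target), "-b", "pipeline"])
    return commands
```

Running the jobs one after another, for example with subprocess.run(cmd, check=True) inside a loop, keeps VRAM use predictable, and a failure on one file stops the loop at a known point.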

vlm-engine: let the vision-language model read the whole page

Command:

mineru -p "input.pdf" -o "output" -b vlm-engine

vlm-engine mainly uses MinerU’s vision-language model to understand each page as an image. It identifies titles, body text, table structures, formulas, reading order, and relationships between complex layout blocks.

The official table lists its accuracy at about 95.30, well above pipeline's 86.47. However, running it locally requires at least about 8GB of VRAM, and CPU-only mode is not supported.

vlm-engine is suitable for:

Scanned papers.
Complex multi-column layouts.
Tables with irregular borders.
Formula-heavy pages.
Handwritten or unusual layouts.
Files where pipeline performs poorly.

The downside is higher VRAM pressure. Compared with hybrid-engine, it also lacks the combined benefit of first extracting native PDF text and then using VLM for difficult areas, so it is not always the best default mode.

hybrid-engine: Pipeline and VLM combined

Command:

mineru -p "input.pdf" -o "output" -b hybrid-engine --effort medium

hybrid-engine combines two approaches:

For digital PDFs, it tries to extract native text directly.
For scanned content, complex tables, formulas, and unusual layouts, it calls the VLM.
It then uses parts of Pipeline for auxiliary processing.

This gives it VLM-level accuracy, the reliability of native text extraction, lower hallucination risk, and better support for multilingual digital PDFs. Officially, it is positioned as a high-accuracy, native-text-extraction, low-hallucination mode, and it is the current recommended local default.

Hybrid has two common effort levels.

Medium:

mineru -p "input.pdf" -o "output" -b hybrid-engine --effort medium

The official table lists its accuracy at about 95.26. It is faster and suits most documents. Medium is the current default, but it automatically disables image and chart analysis.

High:

mineru -p "input.pdf" -o "output" -b hybrid-engine --effort high

The official table lists its accuracy at about 95.39. It supports image and chart analysis, but processing is slower. In the official data, Medium is only about 0.13 points lower than High (95.26 versus 95.39), while it can be noticeably faster in some Windows setups.

If your GPU is an RTX 4060 8GB, hybrid-engine --effort medium is the preferred high-quality local mode. Before running it, close games, turn off browser hardware acceleration, and quit other programs that occupy VRAM, because 8GB is at the lower end of the requirement.
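
To see how much VRAM other programs are already holding, you can query nvidia-smi before starting a job. The following is a rough sketch that assumes nvidia-smi is on your PATH; the helper names are made up for this example.

```python
import subprocess
from typing import List, Tuple


def parse_memory_lines(output: str) -> List[Tuple[int, int]]:
    """Parse 'total, used' pairs in MiB, one GPU per line."""
    pairs = []
    for line in output.strip().splitlines():
        total, used = (int(field.strip()) for field in line.split(","))
        pairs.append((total, used))
    return pairs


def query_gpu_memory() -> List[Tuple[int, int]]:
    """Ask nvidia-smi for total and used memory of every visible GPU."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=memory.total,memory.used",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_memory_lines(result.stdout)


def free_mib(total: int, used: int) -> int:
    """Memory not yet claimed by any process, in MiB."""
    return total - used
```

If the used figure is already high before MinerU starts, that is a sign something else is holding VRAM and should be closed first.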

vlm-http-client: the local machine does not run the model

Example:

mineru -p "input.pdf" -o "output" `
  -b vlm-http-client `
  -u "http://192.168.1.100:30000"

In this mode, your computer is only a client:

local machine uploads pages -> remote GPU server parses them -> local machine receives results

The actual VLM runs on another GPU machine, a Linux GPU server, a LAN server, or an OpenAI API-compatible inference service. Therefore, the local machine does not need an NVIDIA GPU and can even use a lightweight MinerU installation. The official docs also describe vlm-http-client as suitable for edge devices with only CPU and network access.

The important detail: “no local GPU required” does not mean the whole system needs no GPU. The remote server still performs VLM inference.
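
Because everything depends on the remote server, it helps to confirm that the address passed to -u is reachable before submitting a long job. The sketch below only tests whether a TCP connection can be opened. It says nothing about whether the inference service behind the port is healthy, and the function names are assumptions for illustration.

```python
import socket
from typing import Tuple
from urllib.parse import urlsplit


def server_endpoint(url: str) -> Tuple[str, int]:
    """Extract (host, port) from a URL such as http://192.168.1.100:30000."""
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError(f"no host found in URL: {url!r}")
    # Fall back to the scheme's usual port when none is given.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return parts.hostname, port


def is_reachable(url: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the server can be opened."""
    host, port = server_endpoint(url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A call such as is_reachable("http://192.168.1.100:30000") returning False usually points to a firewall, a wrong port, or a server that is not running.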

hybrid-http-client: split work between local machine and server

Command:

mineru -p "input.pdf" -o "output" `
  -b hybrid-http-client `
  -u "http://192.168.1.100:30000"

hybrid-http-client is not the same as vlm-http-client. It usually works like this:

The local machine handles PDF text extraction and some small-model tasks.
The remote server handles VLM inference.
MinerU combines the results.

So the local machine can run on CPU only. If it has a GPU, the local auxiliary steps can be faster. The official recommendation is to install mineru[pipeline] on the client. The roughly 2GB minimum VRAM listed in the table mainly refers to optional local GPU acceleration for the small Hybrid client-side models. It does not mean the remote VLM server only needs 2GB.

Why HTTP Client and Engine have the same accuracy

The official table shows results like this (High effort / Medium effort):

hybrid-engine        95.39 / 95.26
hybrid-http-client   95.39 / 95.26

The reason is that both modes use basically the same parsing logic and models. The main difference is where the model runs:

hybrid-engine: the model runs on your local GPU.
hybrid-http-client: the model runs on a remote server.

So HTTP Client is not a lower-accuracy edition. It is the remote deployment edition. It is useful for teams that already have a GPU server, not for single-machine users trying to casually save VRAM.

How to choose with an RTX 4060 8GB

If your GPU is an RTX 4060 8GB, choose in this order.

For daily stable use:

mineru -p "input.pdf" -o "output" -b pipeline

It has low VRAM pressure, is good for checking CUDA, and works well for batch processing normal PDFs.

For the best overall local quality:

mineru -p "input.pdf" -o "output" `
  -b hybrid-engine `
  --effort medium

This is the preferred high-quality mode on an 8GB GPU. Try to free VRAM before running it.

For image analysis or maximum accuracy:

mineru -p "input.pdf" -o "output" `
  -b hybrid-engine `
  --effort high

It is slower, but it enables image and chart analysis.

For difficult scanned layouts where results are still poor:

mineru -p "input.pdf" -o "output" -b vlm-engine

You can compare it with Hybrid results, but it usually does not need to be your permanent default.
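
One way to compare the two runs is to diff the Markdown text each backend produced. This sketch uses Python's standard difflib and takes the two texts as strings. Where MinerU writes its Markdown inside the output folder is not covered in this article, so loading the files is left to you.

```python
import difflib
from typing import List


def line_similarity(hybrid_md: str, vlm_md: str) -> float:
    """Ratio in [0, 1]; 1.0 means both outputs have identical lines."""
    matcher = difflib.SequenceMatcher(None, hybrid_md.splitlines(), vlm_md.splitlines())
    return matcher.ratio()


def markdown_diff(hybrid_md: str, vlm_md: str) -> List[str]:
    """Unified diff of the two outputs, one entry per diff line."""
    return list(
        difflib.unified_diff(
            hybrid_md.splitlines(),
            vlm_md.splitlines(),
            fromfile="hybrid-engine",
            tofile="vlm-engine",
            lineterm="",
        )
    )
```

A high similarity with differences concentrated in tables or formulas suggests that vlm-engine is worth keeping only for those specific files.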

If you do not have a remote server, you do not need to consider:

hybrid-http-client
vlm-http-client

They require an additional OpenAI-compatible inference server, or at least an available remote GPU machine.

One-line choice guide

Normal PDFs, batch jobs, stability first:

pipeline

Best overall local quality:

hybrid-engine --effort medium

Image analysis or maximum accuracy:

hybrid-engine --effort high

Very complex scanned layouts where you want to test VLM separately:

vlm-engine

Models deployed on another GPU server:

hybrid-http-client / vlm-http-client
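
The guide above can also be written as a small decision function. This is only a sketch of the article's rules of thumb. The parameter names, the 8GB threshold taken from the comparison table, and the order in which the conditions are checked are assumptions of this example.

```python
def choose_backend(
    has_remote_server: bool = False,
    local_vram_gb: float = 0.0,
    needs_image_analysis: bool = False,
    stability_first: bool = False,
    hard_scans: bool = False,
) -> str:
    """Map a few yes/no answers to one line of the choice guide."""
    if stability_first:
        return "pipeline"
    if local_vram_gb >= 8:
        if hard_scans:
            # Worth testing separately when Hybrid results are still poor.
            return "vlm-engine"
        if needs_image_analysis:
            return "hybrid-engine --effort high"
        return "hybrid-engine --effort medium"
    if has_remote_server:
        return "hybrid-http-client / vlm-http-client"
    # No remote server and less than 8GB of VRAM: stay on pipeline.
    return "pipeline"
```

For an RTX 4060 8GB with no special requirements, choose_backend(local_vram_gb=8) returns "hybrid-engine --effort medium", which matches the recommendation above.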

Finally, check your PyTorch environment. If you are still on torch 2.8.0+cpu, pipeline can only run on CPU, and neither hybrid-engine nor vlm-engine can actually use your RTX 4060 until you install the CUDA build of PyTorch.
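
A quick way to check is to ask PyTorch directly. The sketch below reports the installed version, the CUDA version it was built against, and whether a GPU is usable. The "+cpu" suffix test is only a heuristic, since not every build carries a suffix; torch.cuda.is_available() is the check that matters. The helper names are made up for this example.

```python
import importlib.util


def has_cpu_tag(version: str) -> bool:
    """True when the version string ends in the '+cpu' local tag."""
    return version.partition("+")[2] == "cpu"


def torch_report() -> str:
    """Summarize the local PyTorch build without failing if torch is absent."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed in this environment"
    import torch

    lines = [
        f"torch version: {torch.__version__}",
        f"built against CUDA: {torch.version.cuda}",  # None on CPU-only builds
        f"CUDA available: {torch.cuda.is_available()}",
    ]
    if torch.cuda.is_available():
        lines.append(f"device 0: {torch.cuda.get_device_name(0)}")
    return "\n".join(lines)
```

Run print(torch_report()) inside the same Python environment that MinerU uses. If "CUDA available" is False, the GPU backends will not be able to use the card.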