<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
        <title>RAG on KnightLi Blog</title>
        <link>https://knightli.com/en/tags/rag/</link>
        <description>Recent content in RAG on KnightLi Blog</description>
        <generator>Hugo -- gohugo.io</generator>
        <language>en</language>
        <lastBuildDate>Wed, 20 May 2026 23:51:37 +0800</lastBuildDate>
        <atom:link href="https://knightli.com/en/tags/rag/index.xml" rel="self" type="application/rss+xml" />
        <item>
        <title>What Is PageIndex? A Reasoning-Based RAG Document Index Without Vector Databases</title>
        <link>https://knightli.com/en/2026/05/20/vectifyai-pageindex-vectorless-rag/</link>
        <pubDate>Wed, 20 May 2026 23:51:37 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/05/20/vectifyai-pageindex-vectorless-rag/</guid>
        <description>&lt;p&gt;&lt;code&gt;VectifyAI/PageIndex&lt;/code&gt; is an interesting RAG project. Instead of starting with &amp;ldquo;build another vector database,&amp;rdquo; it first organizes long documents into a tree structure similar to a table of contents, then lets an LLM perform reasoning-based retrieval along that tree.&lt;/p&gt;
&lt;p&gt;Project link: &lt;a class=&#34;link&#34; href=&#34;https://github.com/VectifyAI/PageIndex&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;VectifyAI/PageIndex&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;At the time of writing, the GitHub page shows about 31.8k stars and 2.7k forks, with an MIT license. The README positions it as &lt;code&gt;Vectorless, Reasoning-based RAG&lt;/code&gt;: RAG without a vector database, based on reasoning.&lt;/p&gt;
&lt;h2 id=&#34;what-problem-it-tries-to-solve&#34;&gt;What Problem It Tries to Solve
&lt;/h2&gt;&lt;p&gt;The common path for traditional RAG is: chunk the document, vectorize the chunks, store them in a vector database, then retrieve passages by similarity search. This approach is simple, general, and mature, but it often runs into several problems with long professional documents:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Similarity is not the same as true relevance.&lt;/li&gt;
&lt;li&gt;Document structure is broken apart by chunking, and section relationships are lost.&lt;/li&gt;
&lt;li&gt;Retrieval results are hard to explain, making it difficult to say why a passage was selected.&lt;/li&gt;
&lt;li&gt;For financial reports, regulatory filings, legal documents, and technical manuals, questions often require reasoning across sections.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;PageIndex takes the opposite route: first organize the document into a semantic tree, then let the model search it the way a human reader would, by scanning the table of contents, jumping into a section, and narrowing down to the details.&lt;/p&gt;
&lt;h2 id=&#34;the-basic-pageindex-workflow&#34;&gt;The Basic PageIndex Workflow
&lt;/h2&gt;&lt;p&gt;The README describes PageIndex retrieval in two steps:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Generate a &lt;code&gt;Table-of-Contents&lt;/code&gt;-like tree index for the document.&lt;/li&gt;
&lt;li&gt;Perform reasoning-based retrieval through tree search.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This tree is not just a file directory. It is a document structure designed for LLM use. Nodes can contain titles, page ranges, summaries, child nodes, and other metadata. When answering a question, the model does not need to face a pile of fragmented chunks immediately. It can first decide which section to enter, then continue searching downward.&lt;/p&gt;
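&lt;p&gt;A minimal sketch of such a tree and a top-down search over it might look like the following. The &lt;code&gt;Node&lt;/code&gt; fields, &lt;code&gt;tree_search&lt;/code&gt;, and &lt;code&gt;keyword_pick&lt;/code&gt; are hypothetical names chosen for illustration, not PageIndex&amp;rsquo;s actual schema or API, and the keyword scorer only stands in for the decision an LLM would make at each level.&lt;/p&gt;

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    """One section of the document tree (field names are illustrative)."""
    title: str
    start_page: int
    end_page: int
    summary: str = ""
    children: List["Node"] = field(default_factory=list)


def tree_search(root: Node, question: str,
                pick: Callable[[str, List[Node]], Optional[Node]]) -> List[Node]:
    """Walk from the root toward a leaf, asking `pick` which child to enter."""
    path = [root]
    node = root
    while node.children:
        chosen = pick(question, node.children)
        if chosen is None:  # nothing looks relevant: stop at this level
            break
        path.append(chosen)
        node = chosen
    return path


def keyword_pick(question: str, candidates: List[Node]) -> Optional[Node]:
    """Stand-in for the LLM call: score children by words shared with the question."""
    words = set(question.lower().split())

    def score(n: Node) -> int:
        return len(words.intersection((n.title + " " + n.summary).lower().split()))

    best = max(candidates, key=score)
    return best if score(best) > 0 else None


report = Node("Annual Report", 1, 120, children=[
    Node("Business Overview", 1, 30, "products markets strategy"),
    Node("Financial Statements", 31, 120, "revenue income balance sheet", children=[
        Node("Income Statement", 31, 60, "revenue expenses net income"),
        Node("Balance Sheet", 61, 120, "assets liabilities equity"),
    ]),
])

path = tree_search(report, "what was net income and revenue", keyword_pick)
print(" > ".join(n.title for n in path))
print("pages", path[-1].start_page, "to", path[-1].end_page)
```

&lt;p&gt;The returned path doubles as an explanation of the retrieval: each hop names the section that was entered and the page range it covers.&lt;/p&gt;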
&lt;p&gt;This method is better suited to documents that are well structured but very long, such as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Financial reports and SEC filings.&lt;/li&gt;
&lt;li&gt;Regulatory and compliance documents.&lt;/li&gt;
&lt;li&gt;Academic textbooks and papers.&lt;/li&gt;
&lt;li&gt;Legal documents.&lt;/li&gt;
&lt;li&gt;Technical manuals and product documentation.&lt;/li&gt;
&lt;li&gt;Large PDFs that exceed the model context window.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;how-it-differs-from-traditional-vector-rag&#34;&gt;How It Differs From Traditional Vector RAG
&lt;/h2&gt;&lt;p&gt;PageIndex&amp;rsquo;s main selling points can be summarized in five areas.&lt;/p&gt;
&lt;p&gt;First, it does not require a vector database. It relies on document structure and LLM reasoning to locate content, rather than on vector similarity search.&lt;/p&gt;
&lt;p&gt;Second, it does not use traditional chunking. Documents are organized by natural sections instead of fixed-length text fragments.&lt;/p&gt;
&lt;p&gt;Third, explainability is stronger. The retrieval path can map back to pages, sections, and tree nodes, making it easier to trace than &amp;ldquo;this text was hit by vector similarity.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Fourth, retrieval is context-aware. The question, conversation history, and domain background can all affect the tree search path.&lt;/p&gt;
&lt;p&gt;Fifth, it is closer to how human experts read documents. People usually do not cut an entire document into small chunks and calculate similarity; they first inspect the table of contents, locate sections, and then read details.&lt;/p&gt;
&lt;p&gt;This does not mean vector databases have no value. A more accurate view is that PageIndex fits long-document retrieval where semantic similarity alone is not enough and structure plus reasoning have to contribute.&lt;/p&gt;
&lt;h2 id=&#34;how-to-run-it-locally&#34;&gt;How to Run It Locally
&lt;/h2&gt;&lt;p&gt;The README provides a local self-hosting path. First install dependencies:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;pip3 install --upgrade -r requirements.txt
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Then create a &lt;code&gt;.env&lt;/code&gt; file in the project root and add your LLM API key to it. The project supports multiple models through &lt;code&gt;LiteLLM&lt;/code&gt;:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nv&#34;&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;your_openai_key_here
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Generate a PageIndex structure for a PDF:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;python3 run_pageindex.py --pdf_path /path/to/your/document.pdf
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Markdown is also supported:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;python3 run_pageindex.py --md_path /path/to/your/document.md
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Common optional parameters include:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;5
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;6
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;7
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;--model
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;--toc-check-pages
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;--max-pages-per-node
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;--max-tokens-per-node
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;--if-add-node-id
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;--if-add-node-summary
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;--if-add-doc-description
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;The README also notes that the local open-source version uses standard PDF parsing. For complex PDFs, the project&amp;rsquo;s cloud service provides enhanced OCR, tree building, and retrieval pipelines.&lt;/p&gt;
&lt;h2 id=&#34;agentic-vectorless-rag-example&#34;&gt;Agentic Vectorless RAG Example
&lt;/h2&gt;&lt;p&gt;The project also provides an agentic vectorless RAG example using self-hosted PageIndex and OpenAI Agents SDK. Install the optional dependency and run it:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;pip3 install openai-agents
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;python3 examples/agentic_vectorless_rag_demo.py
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;The value of this example is that it pushes PageIndex from &amp;ldquo;generate a document tree&amp;rdquo; to &amp;ldquo;let an Agent use the document tree for retrieval.&amp;rdquo; If you are building an enterprise knowledge base, financial report Q&amp;amp;A, regulatory Q&amp;amp;A, or a technical documentation Agent, running this example will teach you more than reading the README alone.&lt;/p&gt;
&lt;h2 id=&#34;cloud-service-mcp-and-api&#34;&gt;Cloud Service, MCP, and API
&lt;/h2&gt;&lt;p&gt;PageIndex is not just a GitHub repo. The project page also lists several entry points:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Self-hosting: run the open-source code locally, suitable for experiments and controlled deployments.&lt;/li&gt;
&lt;li&gt;Chat Platform: a ChatGPT-style document analysis platform.&lt;/li&gt;
&lt;li&gt;MCP / API: useful for integrating with existing Agents or automation workflows.&lt;/li&gt;
&lt;li&gt;Enterprise: for private or on-premises deployment.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This shows that the project is positioned as more than a simple demo. It aims to turn &amp;ldquo;reasoning-based document retrieval&amp;rdquo; into document intelligence infrastructure that other systems can integrate.&lt;/p&gt;
&lt;h2 id=&#34;suitable-scenarios&#34;&gt;Suitable Scenarios
&lt;/h2&gt;&lt;p&gt;PageIndex is suitable for tasks such as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Long PDF Q&amp;amp;A.&lt;/li&gt;
&lt;li&gt;Financial reports, annual reports, prospectuses, and regulatory filing analysis.&lt;/li&gt;
&lt;li&gt;Legal and compliance document retrieval.&lt;/li&gt;
&lt;li&gt;Technical manual Q&amp;amp;A.&lt;/li&gt;
&lt;li&gt;Multi-section textbook or paper retrieval.&lt;/li&gt;
&lt;li&gt;Enterprise knowledge bases that need explainable retrieval paths.&lt;/li&gt;
&lt;li&gt;Providing structured document context to Agents.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If your material is short, has little structure, or is just a normal FAQ, traditional embedding + vector DB may already be enough. PageIndex&amp;rsquo;s advantages are more likely to show up with long, strongly structured documents in professional domains, and with questions that require reasoning.&lt;/p&gt;
&lt;h2 id=&#34;things-to-watch&#34;&gt;Things to Watch
&lt;/h2&gt;&lt;p&gt;First, PageIndex still depends on LLMs. Tree building, summaries, and retrieval quality are affected by model capability, prompts, and document parsing quality.&lt;/p&gt;
&lt;p&gt;Second, the local version uses standard PDF parsing. Complex scanned documents, chart-heavy PDFs, or messy layouts may require OCR and stronger preprocessing.&lt;/p&gt;
&lt;p&gt;Third, vectorless does not mean zero cost. Tree building itself also consumes model calls and time, especially for large-scale document collections.&lt;/p&gt;
&lt;p&gt;Fourth, PageIndex is more like a document structure indexing and reasoning retrieval framework. It does not directly replace every RAG stack. In production, it may also be combined with vector retrieval, keyword retrieval, permission control, caching, and audit systems.&lt;/p&gt;
&lt;h2 id=&#34;summary&#34;&gt;Summary
&lt;/h2&gt;&lt;p&gt;What makes PageIndex interesting is that it shifts RAG from &amp;ldquo;text similarity retrieval&amp;rdquo; toward &amp;ldquo;document structure + LLM reasoning.&amp;rdquo; For long and professional documents, this direction is worth watching.&lt;/p&gt;
&lt;p&gt;If you are building enterprise document Q&amp;amp;A, financial report analysis, regulatory retrieval, or technical manual Agents, PageIndex is a useful reference for a different RAG architecture: give documents structure first, then let the model reason along that structure, instead of breaking everything into chunks and putting it all into a vector database from the beginning.&lt;/p&gt;
&lt;p&gt;References:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/VectifyAI/PageIndex&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;GitHub: VectifyAI/PageIndex&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
        </item>
        <item>
        <title>OpenKB: Compiling Documents into a Continuously Updated LLM Knowledge Base</title>
        <link>https://knightli.com/en/2026/05/17/openkb-llm-knowledge-base/</link>
        <pubDate>Sun, 17 May 2026 17:15:08 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/05/17/openkb-llm-knowledge-base/</guid>
        <description>&lt;p&gt;OpenKB is an open-source LLM knowledge base tool from VectifyAI.&lt;/p&gt;
&lt;p&gt;It is not a traditional RAG system that chunks documents, vectorizes them, and then stitches context back together at query time. Instead, it first compiles raw documents into a structured wiki: document summaries, concept pages, cross-references, follow-up queries, and lint checks. In other words, it feels more like a knowledge-base CLI that keeps organizing your material over time.&lt;/p&gt;
&lt;p&gt;Project link: &lt;a class=&#34;link&#34; href=&#34;https://github.com/VectifyAI/OpenKB&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://github.com/VectifyAI/OpenKB&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&#34;the-short-version&#34;&gt;The Short Version
&lt;/h2&gt;&lt;p&gt;OpenKB is worth watching for three reasons:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;It outputs the knowledge base as ordinary Markdown files instead of locking it inside a dedicated database.&lt;/li&gt;
&lt;li&gt;It uses PageIndex for long PDFs, focusing on vector-database-free retrieval for long documents.&lt;/li&gt;
&lt;li&gt;It emphasizes &amp;ldquo;knowledge compilation&amp;rdquo;: the LLM generates summaries, concept pages, and cross-links instead of retrieving from scratch on every question.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;That makes OpenKB better suited to long-term knowledge accumulation: paper reading, project documentation, internal company materials, technical standards, product research, and personal knowledge bases.&lt;/p&gt;
&lt;p&gt;It is not a universal replacement. If you need high-concurrency online Q&amp;amp;A, complex permissions, a web admin console, enterprise audit trails, or large-scale multi-tenancy, OpenKB currently looks more like a developer tool and knowledge-base prototype than a complete enterprise knowledge platform.&lt;/p&gt;
&lt;h2 id=&#34;what-openkb-is&#34;&gt;What OpenKB Is
&lt;/h2&gt;&lt;p&gt;OpenKB stands for Open Knowledge Base.&lt;/p&gt;
&lt;p&gt;It works as a CLI: it converts, organizes, summarizes, and writes documents into a set of wiki files. The official README describes it directly: OpenKB uses LLMs to compile raw documents into a structured, interlinked wiki-style knowledge base, with PageIndex providing vectorless long-document retrieval.&lt;/p&gt;
&lt;p&gt;Supported input formats include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;PDF&lt;/li&gt;
&lt;li&gt;Word&lt;/li&gt;
&lt;li&gt;Markdown&lt;/li&gt;
&lt;li&gt;PowerPoint&lt;/li&gt;
&lt;li&gt;HTML&lt;/li&gt;
&lt;li&gt;Excel&lt;/li&gt;
&lt;li&gt;Plain text&lt;/li&gt;
&lt;li&gt;Other formats that markitdown can convert&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The generated knowledge base lives under &lt;code&gt;wiki/&lt;/code&gt; and mainly includes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;index.md&lt;/code&gt;: knowledge base overview&lt;/li&gt;
&lt;li&gt;&lt;code&gt;log.md&lt;/code&gt;: operation timeline&lt;/li&gt;
&lt;li&gt;&lt;code&gt;AGENTS.md&lt;/code&gt;: knowledge base structure and maintenance instructions&lt;/li&gt;
&lt;li&gt;&lt;code&gt;sources/&lt;/code&gt;: converted source text&lt;/li&gt;
&lt;li&gt;&lt;code&gt;summaries/&lt;/code&gt;: summaries for each document&lt;/li&gt;
&lt;li&gt;&lt;code&gt;concepts/&lt;/code&gt;: cross-document concept pages&lt;/li&gt;
&lt;li&gt;&lt;code&gt;explorations/&lt;/code&gt;: saved query results&lt;/li&gt;
&lt;li&gt;&lt;code&gt;reports/&lt;/code&gt;: lint reports&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The biggest benefit of this design is transparency. You can open the Markdown files directly instead of only receiving answers through a black-box retrieval interface.&lt;/p&gt;
&lt;h2 id=&#34;how-it-differs-from-traditional-rag&#34;&gt;How It Differs from Traditional RAG
&lt;/h2&gt;&lt;p&gt;A typical traditional RAG pipeline looks like this:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Chunk the documents.&lt;/li&gt;
&lt;li&gt;Generate embeddings.&lt;/li&gt;
&lt;li&gt;Store them in a vector database.&lt;/li&gt;
&lt;li&gt;Retrieve relevant chunks at query time.&lt;/li&gt;
&lt;li&gt;Feed those chunks to the LLM to generate an answer.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;That workflow is mature and works well for Q&amp;amp;A systems. But it has one problem: the knowledge itself does not really accumulate. Every question repeats the work of finding chunks, assembling context, and generating an answer.&lt;/p&gt;
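&lt;p&gt;The five steps above can be sketched end to end in a few lines. This is a toy illustration under stated assumptions: the &amp;ldquo;embedding&amp;rdquo; is a bag-of-words count vector rather than a real embedding model, the &amp;ldquo;vector database&amp;rdquo; is an in-memory list, and every function name is hypothetical.&lt;/p&gt;

```python
import math
from collections import Counter


def chunk(text: str, size: int = 8) -> list:
    """Step 1: split text into fixed-size word windows."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text: str) -> Counter:
    """Step 2: toy 'embedding', a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(index: list, question: str, k: int = 1) -> list:
    """Step 4: rank stored chunks by similarity to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


doc = ("Revenue grew strongly in the third quarter. "
       "Operating costs were flat compared with last year. "
       "The board approved a new share buyback program.")

# Step 3: the "vector store" is just a list of (chunk, vector) pairs.
index = [(c, embed(c)) for c in chunk(doc)]

top = retrieve(index, "share buyback program")
print(top[0])  # Step 5 would pass this chunk to the LLM as context.
```

&lt;p&gt;Note that the fixed-size windows cut across sentence boundaries, and that the whole ranking is recomputed for every question. Nothing learned from one query is written back anywhere.&lt;/p&gt;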
&lt;p&gt;OpenKB is closer to &amp;ldquo;organize first, ask later&amp;rdquo;:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Documents enter &lt;code&gt;raw/&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Short documents are converted to Markdown with markitdown.&lt;/li&gt;
&lt;li&gt;Long PDFs go through PageIndex to produce tree indexes and summaries.&lt;/li&gt;
&lt;li&gt;The LLM generates document summaries.&lt;/li&gt;
&lt;li&gt;The LLM reads existing concept pages and creates or updates cross-document concepts.&lt;/li&gt;
&lt;li&gt;The knowledge base index, log, and cross-links are updated.&lt;/li&gt;
&lt;/ol&gt;
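&lt;p&gt;The split between steps 2 and 3 can be pictured as a small routing decision. The sketch below is an assumption-laden illustration, not OpenKB code: &lt;code&gt;choose_pipeline&lt;/code&gt; is a hypothetical helper, and reading the threshold as a PDF page count with a value of 20 is an assumption.&lt;/p&gt;

```python
from pathlib import Path

# Assumed meaning: PDFs with at least this many pages count as "long".
PAGEINDEX_THRESHOLD = 20


def choose_pipeline(path: str, page_count: int = 0) -> str:
    """Return which (hypothetical) ingestion route a file would take."""
    is_pdf = Path(path).suffix.lower() == ".pdf"
    if is_pdf and page_count >= PAGEINDEX_THRESHOLD:
        return "pageindex"   # build a tree index and summaries
    return "markitdown"      # convert straight to Markdown


for name, pages in [("notes.md", 0), ("memo.pdf", 4), ("10-K.pdf", 180)]:
    print(name, "->", choose_pipeline(name, pages))
```

&lt;p&gt;Keeping the decision in one place makes it easy to tune: short files stay cheap to ingest, and only long PDFs pay for tree building.&lt;/p&gt;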
&lt;p&gt;As a result, adding one document does more than create another searchable file. It may update a dozen wiki pages. Knowledge is written into concept pages and connected to existing material.&lt;/p&gt;
&lt;p&gt;This is closer to how humans maintain knowledge bases: when new material arrives, you do not just archive it; you update topic pages, summarize differences, and add references.&lt;/p&gt;
&lt;h2 id=&#34;what-pageindex-solves&#34;&gt;What PageIndex Solves
&lt;/h2&gt;&lt;p&gt;Long documents have always been difficult for RAG and LLM knowledge bases.&lt;/p&gt;
&lt;p&gt;If you simply split a long PDF into many chunks, several problems appear:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Chapter relationships are lost.&lt;/li&gt;
&lt;li&gt;Tables, images, and footnotes are hard to handle.&lt;/li&gt;
&lt;li&gt;Retrieved snippets are too fragmented, so answers lack global structure.&lt;/li&gt;
&lt;li&gt;Even with a large context window, stuffing an entire document into the prompt is not ideal.&lt;/li&gt;
&lt;li&gt;Long summary chains can compress away important details.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;OpenKB uses PageIndex for long PDFs. According to the project description, PageIndex builds tree indexes and summaries for long documents, letting the LLM reason over the document tree instead of reading the whole document directly.&lt;/p&gt;
&lt;p&gt;The focus is not &amp;ldquo;the few text snippets with the highest vector similarity.&amp;rdquo; It is about helping the model use document hierarchy to find relevant content. For research reports, papers, manuals, prospectuses, and compliance documents, this direction makes a lot of sense.&lt;/p&gt;
&lt;p&gt;OpenKB can use the open-source PageIndex locally by default. If you need OCR, complex PDF handling, or faster structure generation, you can configure &lt;code&gt;PAGEINDEX_API_KEY&lt;/code&gt; to use PageIndex Cloud.&lt;/p&gt;
&lt;h2 id=&#34;install-and-quick-start&#34;&gt;Install and Quick Start
&lt;/h2&gt;&lt;p&gt;Install OpenKB with pip:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;pip install openkb
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Or install the latest GitHub version:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;pip install git+https://github.com/VectifyAI/OpenKB.git
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;For editable source installation:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;git clone https://github.com/VectifyAI/OpenKB.git
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nb&#34;&gt;cd&lt;/span&gt; OpenKB
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;pip install -e .
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Create a knowledge base directory:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;mkdir my-kb &lt;span class=&#34;o&#34;&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class=&#34;nb&#34;&gt;cd&lt;/span&gt; my-kb
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;openkb init
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Add documents:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;openkb add paper.pdf
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;openkb add ~/papers/
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Ask a question:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;openkb query &lt;span class=&#34;s2&#34;&gt;&amp;#34;What are the main findings?&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Start an interactive chat:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;openkb chat
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;If you want OpenKB to process new files automatically, use watch mode:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;openkb watch
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;After that, drop files into &lt;code&gt;raw/&lt;/code&gt;, and OpenKB will update the wiki automatically.&lt;/p&gt;
&lt;h2 id=&#34;llm-configuration&#34;&gt;LLM Configuration
&lt;/h2&gt;&lt;p&gt;OpenKB uses LiteLLM to support multiple model providers, including OpenAI, Anthropic (Claude), and Google (Gemini).&lt;/p&gt;
&lt;p&gt;You can set the model during initialization, or configure it in &lt;code&gt;.openkb/config.yaml&lt;/code&gt;:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nt&#34;&gt;model&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;l&#34;&gt;gpt-5.4&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nt&#34;&gt;language&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;l&#34;&gt;en&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nt&#34;&gt;pageindex_threshold&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;m&#34;&gt;20&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Model names follow LiteLLM&amp;rsquo;s &lt;code&gt;provider/model&lt;/code&gt; format. OpenAI models can omit the provider prefix:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nt&#34;&gt;model&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;l&#34;&gt;gpt-5.4&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Models from other providers, such as Anthropic and Gemini, are usually written with the provider prefix:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nt&#34;&gt;model&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;l&#34;&gt;anthropic/claude-sonnet-4-6&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nt&#34;&gt;model&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;l&#34;&gt;gemini/gemini-3.1-pro-preview&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Put the API key in &lt;code&gt;.env&lt;/code&gt;:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nv&#34;&gt;LLM_API_KEY&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;your_llm_api_key
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;If you enable PageIndex Cloud, add:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nv&#34;&gt;PAGEINDEX_API_KEY&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;your_pageindex_api_key
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h2 id=&#34;common-commands&#34;&gt;Common Commands
&lt;/h2&gt;&lt;p&gt;OpenKB&amp;rsquo;s commands are developer-friendly:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;openkb init&lt;/code&gt;: initialize a knowledge base.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;openkb add &amp;lt;file_or_dir&amp;gt;&lt;/code&gt;: add a file or directory.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;openkb remove &amp;lt;doc&amp;gt;&lt;/code&gt;: remove a document and clean up related wiki pages, images, registry entries, and PageIndex state.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;openkb query &amp;quot;question&amp;quot;&lt;/code&gt;: ask a one-off question against the knowledge base.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;openkb chat&lt;/code&gt;: enter a multi-turn conversation.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;openkb watch&lt;/code&gt;: monitor &lt;code&gt;raw/&lt;/code&gt; and update automatically.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;openkb lint&lt;/code&gt;: check knowledge base structure and content health.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;openkb list&lt;/code&gt;: list indexed documents and concepts.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;openkb status&lt;/code&gt;: show knowledge base statistics.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;code&gt;openkb chat&lt;/code&gt; is better than &lt;code&gt;openkb query&lt;/code&gt; for continuous exploration. It supports resuming, listing, and deleting sessions, plus slash commands such as &lt;code&gt;/status&lt;/code&gt;, &lt;code&gt;/list&lt;/code&gt;, &lt;code&gt;/add &amp;lt;path&amp;gt;&lt;/code&gt;, &lt;code&gt;/save&lt;/code&gt;, and &lt;code&gt;/lint&lt;/code&gt;.&lt;/p&gt;
&lt;h2 id=&#34;why-a-markdown-wiki-matters&#34;&gt;Why a Markdown Wiki Matters
&lt;/h2&gt;&lt;p&gt;Many knowledge-base tools are painful because of migration cost.&lt;/p&gt;
&lt;p&gt;Once material enters a proprietary database, index, or format, it becomes hard to inspect, edit, back up, or migrate directly. OpenKB writes the result as ordinary Markdown, which makes it naturally compatible with existing tools.&lt;/p&gt;
&lt;p&gt;The most direct use is opening &lt;code&gt;wiki/&lt;/code&gt; in Obsidian:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Summary pages can be read directly.&lt;/li&gt;
&lt;li&gt;Concept pages can connect through &lt;code&gt;[[wikilinks]]&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Graph view can show relationships between knowledge items.&lt;/li&gt;
&lt;li&gt;Query results can be saved to &lt;code&gt;explorations/&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;AGENTS.md&lt;/code&gt; can define how the knowledge base should be maintained.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That makes OpenKB more than a Q&amp;amp;A tool. It can become a knowledge-organizing pipeline for individuals or teams.&lt;/p&gt;
&lt;h2 id=&#34;best-fit-scenarios&#34;&gt;Best-Fit Scenarios
&lt;/h2&gt;&lt;p&gt;OpenKB is especially useful for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Reading papers and technical reports.&lt;/li&gt;
&lt;li&gt;Organizing project documentation.&lt;/li&gt;
&lt;li&gt;Building product research archives.&lt;/li&gt;
&lt;li&gt;Creating documentation knowledge bases around open-source projects.&lt;/li&gt;
&lt;li&gt;Organizing internal policies, meeting notes, and explanatory documents.&lt;/li&gt;
&lt;li&gt;Maintaining a personal Obsidian knowledge base automatically.&lt;/li&gt;
&lt;li&gt;Structuring long PDFs, PPTs, Word files, and web materials.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you often face piles of documents and want more than &amp;ldquo;ask one question, get one answer,&amp;rdquo; OpenKB&amp;rsquo;s direction is a good fit: it gradually turns material into a browsable, reusable, and traceable knowledge base.&lt;/p&gt;
&lt;h2 id=&#34;what-to-watch-out-for&#34;&gt;What to Watch Out For
&lt;/h2&gt;&lt;p&gt;First, OpenKB depends on LLM quality.&lt;/p&gt;
&lt;p&gt;Summaries, concept pages, and cross-links are generated by models. Stronger models usually produce more stable knowledge compilation; weaker models may struggle with concept extraction, contradiction detection, and cross-document synthesis.&lt;/p&gt;
&lt;p&gt;Second, estimate cost early.&lt;/p&gt;
&lt;p&gt;If you import many long documents at once, LLM calls may become expensive. Start with a small dataset, check the output structure and quality, and then expand.&lt;/p&gt;
&lt;p&gt;Third, the generated wiki still needs human review.&lt;/p&gt;
&lt;p&gt;OpenKB can organize material, but it does not automatically guarantee factual correctness. Important knowledge bases still need humans to review summaries, concept pages, and references.&lt;/p&gt;
&lt;p&gt;Fourth, be careful with sensitive material.&lt;/p&gt;
&lt;p&gt;If you use cloud LLMs or PageIndex Cloud, pay attention to privacy, trade secrets, and compliance requirements. For internal materials, confirm the model provider, data retention policy, and access boundaries first.&lt;/p&gt;
&lt;p&gt;Fifth, it is currently a CLI-first tool.&lt;/p&gt;
&lt;p&gt;The roadmap mentions a future Web UI, database-backed storage, support for large collections, and hierarchical concept indexing. At this stage, if teammates are not comfortable with the command line, there is still some adoption friction.&lt;/p&gt;
&lt;h2 id=&#34;relationship-with-obsidian-notebooklm-and-enterprise-rag&#34;&gt;Relationship with Obsidian, NotebookLM, and Enterprise RAG
&lt;/h2&gt;&lt;p&gt;OpenKB and Obsidian are best understood as an &amp;ldquo;automatic organization layer&amp;rdquo; plus a &amp;ldquo;reading and editing layer.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Obsidian is good for humans to write, edit, browse, and link notes. OpenKB is good for turning raw documents into a wiki that can enter Obsidian.&lt;/p&gt;
&lt;p&gt;OpenKB and NotebookLM differ mainly in local control and open file formats.&lt;/p&gt;
&lt;p&gt;NotebookLM is more direct for quickly asking questions and generating summaries after dropping in materials. OpenKB is better for developers who want the organized result to remain in a local directory and continue evolving as Markdown.&lt;/p&gt;
&lt;p&gt;OpenKB does not replace enterprise RAG; it complements it.&lt;/p&gt;
&lt;p&gt;Enterprise RAG cares more about permissions, auditability, service deployment, access isolation, monitoring, and stable throughput. OpenKB is better for building a readable, editable, long-lived knowledge layer. If you later build online Q&amp;amp;A, the wiki generated by OpenKB can also become a higher-quality corpus.&lt;/p&gt;
&lt;h2 id=&#34;a-recommended-workflow&#34;&gt;A Recommended Workflow
&lt;/h2&gt;&lt;p&gt;If you want to try OpenKB, start like this:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Create a test knowledge base directory.&lt;/li&gt;
&lt;li&gt;Add 3 to 5 documents on the same topic.&lt;/li&gt;
&lt;li&gt;Run &lt;code&gt;openkb add&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Open &lt;code&gt;wiki/&lt;/code&gt; and inspect the summaries and concept pages.&lt;/li&gt;
&lt;li&gt;Ask a few specific questions with &lt;code&gt;openkb query&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Run &lt;code&gt;openkb lint&lt;/code&gt; to check knowledge-base health.&lt;/li&gt;
&lt;li&gt;Open &lt;code&gt;wiki/&lt;/code&gt; in Obsidian and see whether the link graph is meaningful.&lt;/li&gt;
&lt;li&gt;Once quality looks good, import a larger document collection.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Do not throw in hundreds of files at the beginning. First see whether it understands your material type well, especially tables, images, long PDFs, and multi-document concept merging.&lt;/p&gt;
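&lt;p&gt;The workflow above can be sketched as a small Python driver. This is a minimal sketch and not part of OpenKB: the &lt;code&gt;plan_trial_run&lt;/code&gt; and &lt;code&gt;run_trial&lt;/code&gt; helpers, the &lt;code&gt;docs/&lt;/code&gt; path, and the sample question are hypothetical. It uses only the subcommands listed earlier (&lt;code&gt;init&lt;/code&gt;, &lt;code&gt;add&lt;/code&gt;, &lt;code&gt;query&lt;/code&gt;, &lt;code&gt;lint&lt;/code&gt;) with no extra flags, and it assumes that &lt;code&gt;openkb&lt;/code&gt; operates on the current working directory.&lt;/p&gt;

```python
import shutil
import subprocess
from pathlib import Path


def plan_trial_run(source_dir, questions):
    """Return the openkb commands for a small trial, in order."""
    commands = [
        ["openkb", "init"],
        ["openkb", "add", str(source_dir)],
    ]
    # One one-off query per question.
    commands += [["openkb", "query", q] for q in questions]
    commands.append(["openkb", "lint"])
    return commands


def run_trial(kb_dir, commands):
    """Run each command inside kb_dir (assumption: openkb uses the current directory)."""
    kb_dir = Path(kb_dir)
    kb_dir.mkdir(parents=True, exist_ok=True)
    if shutil.which("openkb") is None:
        # Dry run: show what would be executed.
        print("openkb is not installed; commands that would run:")
        for cmd in commands:
            print("  " + " ".join(cmd))
        return False
    for cmd in commands:
        subprocess.run(cmd, cwd=kb_dir, check=True)
    return True


commands = plan_trial_run("docs/", ["What problem does this project solve?"])
```

&lt;p&gt;Calling &lt;code&gt;run_trial(&#34;kb-test&#34;, commands)&lt;/code&gt; would then execute the plan, or print it when the CLI is missing. Keeping the plan separate from execution makes it easy to review the commands before spending LLM calls.&lt;/p&gt;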
&lt;h2 id=&#34;summary&#34;&gt;Summary
&lt;/h2&gt;&lt;p&gt;OpenKB&amp;rsquo;s value is that it does the knowledge-base work before query time instead of only assembling context when a question arrives: organize the material into a wiki first, then ask questions, chat, lint, and keep maintaining that wiki.&lt;/p&gt;
&lt;p&gt;This direction is not right for every Q&amp;amp;A system, but it is well suited to knowledge work that needs long-term accumulation. Markdown files, Obsidian compatibility, PageIndex long-document handling, multi-model support, and a CLI workflow combine into a useful tool for developers and research-oriented users.&lt;/p&gt;
&lt;p&gt;If you have many PDFs, reports, web pages, papers, and project documents, OpenKB is worth trying. It may not immediately replace a mature enterprise knowledge base, but it can become a practical entry point for organizing material: first turn documents into readable, linked, traceable knowledge, then let the LLM work on top of that knowledge.&lt;/p&gt;
&lt;p&gt;References:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/VectifyAI/OpenKB&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;VectifyAI/OpenKB&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://openkb.ai/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;OpenKB project page&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://pageindex.ai/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;PageIndex&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/microsoft/markitdown&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;markitdown&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://docs.litellm.ai/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;LiteLLM&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
        </item>
        <item>
        <title>Computer Terms in Plain Language: What TTS, STT, API, RAG, and Agent Really Mean</title>
        <link>https://knightli.com/en/2026/05/12/computer-terms-in-plain-language/</link>
        <pubDate>Tue, 12 May 2026 22:15:34 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/05/12/computer-terms-in-plain-language/</guid>
        <description>&lt;p&gt;Computer science has many terms that sound advanced the first time you hear them. But once translated into plain language, many of them describe everyday actions.&lt;/p&gt;
&lt;p&gt;For example, when AI can speak, it is called &lt;code&gt;TTS&lt;/code&gt;; when AI can listen to you, it is called &lt;code&gt;STT&lt;/code&gt;. It sounds like a complex system, but the simple version is &amp;ldquo;read text aloud&amp;rdquo; and &amp;ldquo;write down speech.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Reference link: &lt;a class=&#34;link&#34; href=&#34;https://www.zhihu.com/question/267978646/answer/2035405228460201515&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://www.zhihu.com/question/267978646/answer/2035405228460201515&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This article strings together several common terms from that angle: keep the terms themselves, but explain them in plain language.&lt;/p&gt;
&lt;h2 id=&#34;tts-and-stt-converting-between-text-and-speech&#34;&gt;TTS and STT: Converting Between Text and Speech
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;TTS&lt;/code&gt; means &lt;code&gt;Text-to-Speech&lt;/code&gt;. It converts a piece of text into playable audio. Navigation announcements, audiobook reading, AI customer service voices, and voice assistants all use this kind of capability.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;STT&lt;/code&gt; means &lt;code&gt;Speech-to-Text&lt;/code&gt;. It does the reverse: it turns spoken audio into text, then passes that text to the next program. Voice input, meeting transcription, automatic subtitles, and smart speakers all rely on STT.&lt;/p&gt;
&lt;p&gt;Many voice AI products are basically this pipeline:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;code&gt;STT&lt;/code&gt;: convert what you said into text.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;LLM&lt;/code&gt;: generate a reply from that text.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;TTS&lt;/code&gt;: read the reply aloud.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;So it may feel like a natural conversation, but underneath, several modules are handing work to one another.&lt;/p&gt;
&lt;h2 id=&#34;ocr-copying-text-out-of-images&#34;&gt;OCR: Copying Text Out of Images
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;OCR&lt;/code&gt; means &lt;code&gt;Optical Character Recognition&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;In plain language, it copies text out of images. Taking a photo of an invoice, scanning a page from a book, or reading a name and ID number from an identity document are all OCR tasks.&lt;/p&gt;
&lt;p&gt;Early OCR was closer to &amp;ldquo;guess the character shape.&amp;rdquo; Modern OCR uses deep learning and is more tolerant of messy backgrounds, tilted text, handwriting, and blurry images. But the core question remains simple: what words are in the image?&lt;/p&gt;
&lt;h2 id=&#34;nlp-and-llm-letting-machines-handle-human-language&#34;&gt;NLP and LLM: Letting Machines Handle Human Language
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;NLP&lt;/code&gt; means &lt;code&gt;Natural Language Processing&lt;/code&gt;. It deals with human language: tokenization, translation, summarization, sentiment analysis, question answering, classification, and more.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;LLM&lt;/code&gt; means &lt;code&gt;Large Language Model&lt;/code&gt;. It can understand and generate text, so many NLP tasks today are handled by LLMs.&lt;/p&gt;
&lt;p&gt;Plain-language version:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;NLP&lt;/code&gt;: make machines process what people say and write.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;LLM&lt;/code&gt;: a larger text model that can handle many language tasks.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;When you ask AI to summarize an article, write an email, polish a title, or explain code, it all belongs to this broad direction.&lt;/p&gt;
&lt;h2 id=&#34;api-and-sdk-one-is-an-interface-the-other-is-a-toolkit&#34;&gt;API and SDK: One Is an Interface, the Other Is a Toolkit
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;API&lt;/code&gt; means &lt;code&gt;Application Programming Interface&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;In plain language, someone exposes an entry point for you to call a capability. A weather API takes a city and returns weather; a payment API takes an order and returns a payment result.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;SDK&lt;/code&gt; means &lt;code&gt;Software Development Kit&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;In plain language, the API provider packages common code, types, examples, and tools so you can call the API more easily. An API is like the restaurant counter; an SDK is like the ordering app. You can talk to the counter directly, or use the app to make ordering easier.&lt;/p&gt;
&lt;h2 id=&#34;crud-create-read-update-delete&#34;&gt;CRUD: Create, Read, Update, Delete
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;CRUD&lt;/code&gt; means &lt;code&gt;Create&lt;/code&gt;, &lt;code&gt;Read&lt;/code&gt;, &lt;code&gt;Update&lt;/code&gt;, and &lt;code&gt;Delete&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;In plain language: add, view, edit, and delete.&lt;/p&gt;
&lt;p&gt;Many admin systems, management systems, and database operations revolve around CRUD. User management, article management, order management, and inventory management may look like different businesses, but underneath they are often forms plus create/read/update/delete operations.&lt;/p&gt;
&lt;p&gt;That is why programmers say they wrote &amp;ldquo;another CRUD.&amp;rdquo; It is not necessarily dismissive; it is simply very common.&lt;/p&gt;
&lt;h2 id=&#34;cache-keep-a-copy-so-you-do-not-recompute-every-time&#34;&gt;Cache: Keep a Copy So You Do Not Recompute Every Time
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;Cache&lt;/code&gt; means caching.&lt;/p&gt;
&lt;p&gt;In plain language, keep frequently used things close by so you can grab them directly next time instead of searching, computing, or requesting them again.&lt;/p&gt;
&lt;p&gt;Web pages can cache images and scripts; slow database queries can put hot results in Redis; expensive model inference can cache answers to repeated questions.&lt;/p&gt;
&lt;p&gt;The hard part of caching is not &amp;ldquo;keeping a copy,&amp;rdquo; but &amp;ldquo;knowing when to update it.&amp;rdquo; If the data changes but the cache does not, users see stale data. That is the root of many cache problems.&lt;/p&gt;
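&lt;p&gt;A minimal sketch of that trade-off, using a hypothetical &lt;code&gt;TTLCache&lt;/code&gt; class written for this example: entries expire after a fixed time, and &lt;code&gt;invalidate&lt;/code&gt; drops a copy as soon as the source data is known to have changed. Real caches such as Redis offer the same two ideas with more features.&lt;/p&gt;

```python
import time


class TTLCache:
    """Keep computed values for ttl seconds, then recompute them."""

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self.entries = {}  # key: (stored_at, value)

    def get(self, key, compute):
        now = self.clock()
        entry = self.entries.get(key)
        if entry is not None and now - entry[0] >= self.ttl:
            entry = None  # too old: treat as a miss
        if entry is None:
            entry = (now, compute(key))
            self.entries[key] = entry
        return entry[1]

    def invalidate(self, key):
        """Drop the copy when the source data changes."""
        self.entries.pop(key, None)


calls = []


def slow_lookup(key):
    calls.append(key)  # record each real computation
    return key.upper()


cache = TTLCache(ttl=60)
first = cache.get("user:1", slow_lookup)   # computed
second = cache.get("user:1", slow_lookup)  # served from the cache
cache.invalidate("user:1")                 # source data changed
third = cache.get("user:1", slow_lookup)   # computed again
```

&lt;p&gt;Without the &lt;code&gt;invalidate&lt;/code&gt; call, the third read would return the old copy until the 60 seconds ran out, which is exactly the stale-data problem described above.&lt;/p&gt;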
&lt;h2 id=&#34;queue-line-up-tasks-and-process-them-slowly&#34;&gt;Queue: Line Up Tasks and Process Them Later
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;Queue&lt;/code&gt; means a queue.&lt;/p&gt;
&lt;p&gt;In plain language: too many things are happening, so put them in line and process them one by one.&lt;/p&gt;
&lt;p&gt;For example, after a user uploads a video, transcoding may not finish immediately. The system can put the job into a queue and let a background service process it later. Sending SMS messages, emails, reports, and order callbacks also commonly use queues.&lt;/p&gt;
&lt;p&gt;Queues keep slow tasks from blocking the current request. The user gets a response first, and the time-consuming work happens later.&lt;/p&gt;
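&lt;p&gt;The pattern can be sketched with Python&amp;rsquo;s standard &lt;code&gt;queue&lt;/code&gt; and &lt;code&gt;threading&lt;/code&gt; modules. The &lt;code&gt;handle_upload&lt;/code&gt; function and the file names are made up for illustration, and the string append stands in for real transcoding; production systems usually use a separate message broker rather than an in-process queue.&lt;/p&gt;

```python
import queue
import threading

jobs = queue.Queue()
finished = []


def worker():
    """Background worker: take jobs off the queue one by one."""
    while True:
        job = jobs.get()
        if job is None:  # sentinel: no more work
            jobs.task_done()
            break
        finished.append("transcoded " + job)  # placeholder for the slow task
        jobs.task_done()


def handle_upload(filename):
    """Request handler: enqueue the slow work and return immediately."""
    jobs.put(filename)
    return "accepted " + filename


thread = threading.Thread(target=worker)
thread.start()

responses = [handle_upload(name) for name in ["a.mp4", "b.mp4"]]
jobs.put(None)
jobs.join()    # wait for the demo to drain; a real server would keep running
thread.join()
```

&lt;p&gt;Each upload gets its response before any transcoding happens, and the single worker processes the jobs in the order they arrived.&lt;/p&gt;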
&lt;h2 id=&#34;index-a-table-of-contents-for-the-database&#34;&gt;Index: A Table of Contents for the Database
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;Index&lt;/code&gt; means an index.&lt;/p&gt;
&lt;p&gt;A database index is like a table of contents in a book. Without it, you may need to scan from the first page to the last page; with it, you can locate content much faster.&lt;/p&gt;
&lt;p&gt;But more indexes are not always better. Queries may become faster, while writes and updates may become slower, because the index also needs to be maintained when data changes.&lt;/p&gt;
&lt;p&gt;That is why database optimization often starts with indexes. But when creating one, you still need to consider query conditions, sorting fields, data volume, and write frequency.&lt;/p&gt;
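&lt;p&gt;The effect is easy to observe with SQLite, which ships with Python. This sketch uses a hypothetical &lt;code&gt;users&lt;/code&gt; table and asks SQLite for its query plan before and after creating an index. The exact wording of the plan differs between SQLite versions, so no output is shown here.&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [("user%d@example.com" % i, "User %d" % i) for i in range(1000)],
)


def query_plan(sql, params):
    """Return SQLite's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the detail text


sql = "SELECT name FROM users WHERE email = ?"
params = ("user500@example.com",)

plan_before = query_plan(sql, params)  # no index on email: the table is scanned
conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan_after = query_plan(sql, params)   # the plan now goes through the index

print(plan_before)
print(plan_after)
```

&lt;p&gt;The first plan reports a scan of the whole table; the second names &lt;code&gt;idx_users_email&lt;/code&gt;. The cost is that every insert or update on &lt;code&gt;email&lt;/code&gt; must now also update the index.&lt;/p&gt;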
&lt;h2 id=&#34;rpc-rest-and-webhook-how-systems-talk-to-each-other&#34;&gt;RPC, REST, and Webhook: How Systems Talk to Each Other
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;RPC&lt;/code&gt; means &lt;code&gt;Remote Procedure Call&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;In plain language, it lets you call a function on another machine as if it were a local function.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;REST&lt;/code&gt; means &lt;code&gt;Representational State Transfer&lt;/code&gt; and is common in Web APIs. It uses URLs and HTTP methods to describe operations on resources, such as &lt;code&gt;GET /users&lt;/code&gt; to query users and &lt;code&gt;POST /orders&lt;/code&gt; to create orders.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;Webhook&lt;/code&gt; is a callback in the opposite direction. Instead of you repeatedly asking &amp;ldquo;is it done?&amp;rdquo;, the other side calls your URL when something happens.&lt;/p&gt;
&lt;p&gt;Simple memory aid:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;RPC&lt;/code&gt;: call a remote function.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;REST&lt;/code&gt;: manage resources with HTTP.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Webhook&lt;/code&gt;: notify you when something happens.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;cdn-and-load-balancing-move-closer-and-share-the-load&#34;&gt;CDN and Load Balancing: Move Closer and Share the Load
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;CDN&lt;/code&gt; means &lt;code&gt;Content Delivery Network&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;In plain language, put static resources on nodes closer to users. When users access images, videos, CSS, or JS, they do not always have to reach the origin server.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;Load Balancing&lt;/code&gt; means load balancing.&lt;/p&gt;
&lt;p&gt;In plain language, when traffic is too high, do not make one server carry everything; distribute requests across multiple machines.&lt;/p&gt;
&lt;p&gt;One is about being closer to users; the other is about not exhausting one machine. Large websites usually use both.&lt;/p&gt;
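&lt;p&gt;As an illustration of the load-balancing half, here is a minimal round-robin sketch. The server names are hypothetical, and round-robin is only one of several common strategies; real load balancers also track server health and current load.&lt;/p&gt;

```python
import itertools
from collections import Counter

servers = ["app-1", "app-2", "app-3"]  # hypothetical backend names
rotation = itertools.cycle(servers)


def pick_server():
    """Round-robin: hand each new request to the next server in turn."""
    return next(rotation)


assignments = [pick_server() for _ in range(9)]
load = Counter(assignments)  # how many requests each server received
```

&lt;p&gt;Nine requests end up spread evenly, three per server, instead of all landing on one machine.&lt;/p&gt;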
&lt;h2 id=&#34;docker-container-and-kubernetes-package-run-and-schedule&#34;&gt;Docker, Container, and Kubernetes: Package, Run, and Schedule
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;Docker&lt;/code&gt; is a common container tool, and &lt;code&gt;Container&lt;/code&gt; means container.&lt;/p&gt;
&lt;p&gt;In plain language, package the program together with its runtime environment so it can run similarly on another machine. This reduces &amp;ldquo;it works on my computer but not on the server&amp;rdquo; problems.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;Kubernetes&lt;/code&gt;, often written as &lt;code&gt;K8s&lt;/code&gt;, is a container orchestration system.&lt;/p&gt;
&lt;p&gt;In plain language, when there are many containers, it decides where they run, how to restart them if they fail, how to route traffic, and how to roll out versions.&lt;/p&gt;
&lt;p&gt;If you only have one small service, Docker may be enough. If you have many services, machines, and replicas, K8s becomes more useful.&lt;/p&gt;
&lt;h2 id=&#34;cicd-automated-build-and-deployment&#34;&gt;CI/CD: Automated Build and Deployment
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;CI&lt;/code&gt; means &lt;code&gt;Continuous Integration&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;In plain language, whenever code is submitted, the system automatically pulls the code, runs tests, and builds it to catch problems early.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;CD&lt;/code&gt; can mean &lt;code&gt;Continuous Delivery&lt;/code&gt; or &lt;code&gt;Continuous Deployment&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;In plain language, after the build passes, the code is delivered to testing or production environments in a more stable and automated way.&lt;/p&gt;
&lt;p&gt;It does not solve &amp;ldquo;how to write code&amp;rdquo;; it solves &amp;ldquo;how to ship code with fewer mistakes.&amp;rdquo;&lt;/p&gt;
&lt;h2 id=&#34;serialization-pack-objects-into-a-transmittable-format&#34;&gt;Serialization: Pack Objects Into a Transmittable Format
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;Serialization&lt;/code&gt; means turning objects inside a program into a format that can be saved or transmitted, such as JSON, XML, or Protobuf.&lt;/p&gt;
&lt;p&gt;The reverse, &lt;code&gt;Deserialization&lt;/code&gt;, turns those formats back into objects the program can use.&lt;/p&gt;
&lt;p&gt;When frontend and backend exchange JSON, or services exchange Protobuf, serialization is involved.&lt;/p&gt;
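&lt;p&gt;A short round trip with JSON shows both directions. The &lt;code&gt;Order&lt;/code&gt; class is a made-up example; the calls are from Python&amp;rsquo;s standard &lt;code&gt;json&lt;/code&gt; and &lt;code&gt;dataclasses&lt;/code&gt; modules.&lt;/p&gt;

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Order:
    order_id: int
    items: list
    paid: bool


order = Order(order_id=42, items=["book", "pen"], paid=False)

# Serialization: in-memory object to a JSON string that can be sent or stored.
payload = json.dumps(asdict(order))

# Deserialization: JSON string back to an object the program can use.
restored = Order(**json.loads(payload))
```

&lt;p&gt;The &lt;code&gt;payload&lt;/code&gt; string is what actually travels over the network; &lt;code&gt;restored&lt;/code&gt; compares equal to the original &lt;code&gt;order&lt;/code&gt;.&lt;/p&gt;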
&lt;h2 id=&#34;token-embedding-and-vector-db-turning-text-into-forms-models-can-process&#34;&gt;Token, Embedding, and Vector DB: Turning Text Into Forms Models Can Process
&lt;/h2&gt;&lt;p&gt;In large models, &lt;code&gt;Token&lt;/code&gt; usually refers to the basic units that text is split into. It is not necessarily one Chinese character or one English word; it is more like the model&amp;rsquo;s internal granularity for processing text.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;Embedding&lt;/code&gt; means an embedding vector.&lt;/p&gt;
&lt;p&gt;In plain language, it turns text, images, or other content into a sequence of numbers so models can compare similarity.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;Vector DB&lt;/code&gt; means vector database.&lt;/p&gt;
&lt;p&gt;In plain language, it stores those vectors and can quickly find content with similar meaning.&lt;/p&gt;
&lt;p&gt;For example, if you ask &amp;ldquo;how do I reset my router?&amp;rdquo;, the system may search the vector database for content about &amp;ldquo;factory reset,&amp;rdquo; &amp;ldquo;forgot Wi-Fi password,&amp;rdquo; or &amp;ldquo;admin login failure,&amp;rdquo; then pass related materials back to the model.&lt;/p&gt;
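&lt;p&gt;The similarity search itself can be sketched in a few lines. The three-dimensional vectors below are hand-made for illustration and do not come from any real embedding model, and the dictionary stands in for a vector database. Cosine similarity is one common way to compare vectors.&lt;/p&gt;

```python
import math

# Hand-made 3-dimensional vectors; a real embedding model returns
# hundreds or thousands of dimensions.
documents = {
    "factory reset the router": [0.9, 0.1, 0.0],
    "forgot Wi-Fi password": [0.7, 0.3, 0.1],
    "bake sourdough bread": [0.0, 0.1, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vector, top_k=2):
    """Rank stored documents by similarity to the query vector."""
    ranked = sorted(
        documents,
        key=lambda text: cosine_similarity(query_vector, documents[text]),
        reverse=True,
    )
    return ranked[:top_k]


query_vector = [0.8, 0.2, 0.0]  # stands in for "how do I reset my router?"
results = search(query_vector)
```

&lt;p&gt;The two router-related entries rank first and the unrelated one is left out. A vector database does the same comparison, but with index structures that avoid checking every stored vector.&lt;/p&gt;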
&lt;h2 id=&#34;rag-search-first-then-answer&#34;&gt;RAG: Search First, Then Answer
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;RAG&lt;/code&gt; means &lt;code&gt;Retrieval-Augmented Generation&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;In plain language, before the model answers, the system first searches a knowledge base, and the model then answers using the retrieved materials.&lt;/p&gt;
&lt;p&gt;It addresses the problem that large models may make things up from memory. By connecting enterprise documents, knowledge bases, product manuals, or code snippets, the model can refer to your latest materials instead of relying only on training memory.&lt;/p&gt;
&lt;p&gt;A typical flow is:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The user asks a question.&lt;/li&gt;
&lt;li&gt;The system turns the question into an &lt;code&gt;Embedding&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;It searches a &lt;code&gt;Vector DB&lt;/code&gt; for related documents.&lt;/li&gt;
&lt;li&gt;It sends the document snippets and the question to an &lt;code&gt;LLM&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;The model generates an answer.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;So RAG sounds advanced, but the essence is: look up the materials first, then organize the answer.&lt;/p&gt;
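&lt;p&gt;The five steps can be sketched end to end in plain Python. Everything here is a stand-in: &lt;code&gt;embed&lt;/code&gt; counts words instead of calling an embedding model, a list replaces the vector database, and &lt;code&gt;fake_llm&lt;/code&gt; is a placeholder for a real model call. The sample knowledge base is invented for the example.&lt;/p&gt;

```python
import math
import re
from collections import Counter

KNOWLEDGE_BASE = [
    "To reset the router, hold the reset button for ten seconds.",
    "The warranty covers hardware defects for two years.",
    "Firmware updates are installed from the admin page.",
]


def embed(text):
    """Toy embedding: word counts. A real system would call an embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def similarity(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, top_k=1):
    """Steps 2 and 3: embed the question and find the closest documents."""
    q = embed(question)
    ranked = sorted(KNOWLEDGE_BASE, key=lambda doc: similarity(q, embed(doc)), reverse=True)
    return ranked[:top_k]


def build_prompt(question, snippets):
    """Step 4: combine the retrieved snippets with the question."""
    context = "\n".join("- " + s for s in snippets)
    return "Answer using only this context:\n" + context + "\n\nQuestion: " + question


def fake_llm(prompt):
    """Step 5 placeholder: a real system would send the prompt to an LLM."""
    return "(model answer based on %d characters of prompt)" % len(prompt)


question = "How do I reset the router?"   # step 1
snippets = retrieve(question)
prompt = build_prompt(question, snippets)
answer = fake_llm(prompt)
```

&lt;p&gt;The retrieved snippet is the router-reset sentence, and it is placed in the prompt ahead of the question. Swapping the toy pieces for a real embedding model, vector store, and LLM changes the quality of each step but not the shape of the flow.&lt;/p&gt;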
&lt;h2 id=&#34;agent-an-automated-flow-that-can-break-down-tasks&#34;&gt;Agent: An Automated Flow That Can Break Down Tasks
&lt;/h2&gt;&lt;p&gt;In AI contexts, &lt;code&gt;Agent&lt;/code&gt; often means an intelligent agent.&lt;/p&gt;
&lt;p&gt;In plain language, it does not just answer one message. It can break a goal into steps, call tools, observe results, and decide the next action.&lt;/p&gt;
&lt;p&gt;For example, if you ask it to analyze why tests fail in a repository, a regular chat model may only give suggestions. An Agent may read files, run tests, inspect errors, edit code, and run tests again.&lt;/p&gt;
&lt;p&gt;Of course, Agent does not mean guaranteed reliability. It is essentially &amp;ldquo;model + tool calling + state loop.&amp;rdquo; Whether it works well depends on tool permissions, task boundaries, error handling, and human confirmation.&lt;/p&gt;
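&lt;p&gt;The &amp;ldquo;model + tool calling + state loop&amp;rdquo; shape can be shown with a toy example. Nothing here calls a real model: &lt;code&gt;choose_action&lt;/code&gt; is a hand-written stand-in for the model&amp;rsquo;s decision, and the two tools are hypothetical functions that only flip a flag. The point is the loop structure and the step limit.&lt;/p&gt;

```python
def run_tests(state):
    """Hypothetical tool: report whether the bug is still present."""
    return "pass" if state["fixed"] else "fail: add() returns wrong sum"


def edit_code(state):
    """Hypothetical tool: apply a fix."""
    state["fixed"] = True
    return "patched add()"


TOOLS = {"run_tests": run_tests, "edit_code": edit_code}


def choose_action(history):
    """Stand-in for the model: pick the next tool from what has been observed."""
    if not history:
        return "run_tests"
    last_tool, last_result = history[-1]
    if last_tool == "run_tests" and last_result == "pass":
        return "stop"
    if last_tool == "run_tests":
        return "edit_code"
    return "run_tests"


def run_agent(max_steps=6):
    state = {"fixed": False}
    history = []
    for _ in range(max_steps):  # hard step limit acts as a task boundary
        action = choose_action(history)
        if action == "stop":
            break
        result = TOOLS[action](state)     # call the tool
        history.append((action, result))  # observe the result
    return history


history = run_agent()
```

&lt;p&gt;The loop runs the tests, sees a failure, edits the code, runs the tests again, and stops once they pass. The &lt;code&gt;max_steps&lt;/code&gt; limit and the fixed tool list are simple examples of the boundaries mentioned above.&lt;/p&gt;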
&lt;h2 id=&#34;summary&#34;&gt;Summary
&lt;/h2&gt;&lt;p&gt;Many computer terms sound impressive because they are wrapped in acronyms, architecture diagrams, and product copy. Once unpacked, many describe very simple actions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;TTS&lt;/code&gt;: read text aloud.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;STT&lt;/code&gt;: write down speech.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;OCR&lt;/code&gt;: copy text out of images.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;API&lt;/code&gt;: expose a calling entry point.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;SDK&lt;/code&gt;: package calling tools.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;CRUD&lt;/code&gt;: create, read, update, delete.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Cache&lt;/code&gt;: keep a copy of common results.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Queue&lt;/code&gt;: line tasks up for later processing.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Index&lt;/code&gt;: add a table of contents to data.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;CDN&lt;/code&gt;: put content closer to users.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Load Balancing&lt;/code&gt;: distribute requests.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Docker&lt;/code&gt;: package the runtime environment.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;CI/CD&lt;/code&gt;: automate testing and deployment.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Embedding&lt;/code&gt;: turn content into numeric vectors.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;RAG&lt;/code&gt;: search first, then answer.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Agent&lt;/code&gt;: let a model use tools step by step.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The terms should be preserved because they make searching, communication, and documentation easier. But you do not need to be intimidated by them. Translate them into plain language first, then return to the technical details; many concepts become much clearer.&lt;/p&gt;
&lt;h2 id=&#34;reference&#34;&gt;Reference
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;Zhihu answer: &lt;a class=&#34;link&#34; href=&#34;https://www.zhihu.com/question/267978646/answer/2035405228460201515&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://www.zhihu.com/question/267978646/answer/2035405228460201515&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
        </item>
        <item>
        <title>Gemini Embedding 2: Putting Text, Images, Video, and Audio in One Vector Space</title>
        <link>https://knightli.com/en/2026/05/04/gemini-embedding-2-multimodal-rag/</link>
        <pubDate>Mon, 04 May 2026 06:01:10 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/05/04/gemini-embedding-2-multimodal-rag/</guid>
        <description>&lt;p&gt;The Google Developers Blog published a guide to building with Gemini Embedding 2. The model is now generally available through the Gemini API and Gemini Enterprise Agent Platform. The important point is not simply that it is a new embedding model, but that it maps text, images, video, audio, and documents into the same semantic space.&lt;/p&gt;
&lt;p&gt;This broadens what retrieval systems can handle. Many RAG pipelines previously had to convert images, video, or audio into text or metadata before indexing them separately. Gemini Embedding 2 can process multimodal inputs directly, making it easier for agents, search systems, and classifiers to work with real business materials.&lt;/p&gt;
&lt;p&gt;Original article: &lt;a class=&#34;link&#34; href=&#34;https://developers.googleblog.com/building-with-gemini-embedding-2/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Building with Gemini Embedding 2: Agentic multimodal RAG and beyond&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&#34;model-capabilities&#34;&gt;Model Capabilities
&lt;/h2&gt;&lt;p&gt;Gemini Embedding 2 supports more than 100 languages. A single request can process:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Up to 8,192 text tokens&lt;/li&gt;
&lt;li&gt;Up to 6 images&lt;/li&gt;
&lt;li&gt;Up to 120 seconds of video&lt;/li&gt;
&lt;li&gt;Up to 180 seconds of audio&lt;/li&gt;
&lt;li&gt;Up to 6 PDF pages&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Its key idea is a unified semantic space. Developers can place content from different modalities into one vector representation system, then use the same retrieval, clustering, or reranking logic to process it.&lt;/p&gt;
&lt;p&gt;For example, a text description and an image can be included in the same embedding request:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt; 1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 5
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 6
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 7
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 8
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 9
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;10
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;11
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;12
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;13
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;14
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;15
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;16
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;17
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;18
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;19
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;kn&#34;&gt;from&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;google&lt;/span&gt; &lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;genai&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;kn&#34;&gt;from&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;google.genai&lt;/span&gt; &lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;types&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;client&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;genai&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;Client&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;()&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;k&#34;&gt;with&lt;/span&gt; &lt;span class=&#34;nb&#34;&gt;open&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;dog.png&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;s1&#34;&gt;&amp;#39;rb&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;as&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;f&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;image_bytes&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;f&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;read&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;()&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;result&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;client&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;models&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;embed_content&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;model&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;gemini-embedding-2&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;contents&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;[&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;        &lt;span class=&#34;s2&#34;&gt;&amp;#34;An image of a dog&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;        &lt;span class=&#34;n&#34;&gt;types&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;Part&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;from_bytes&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;            &lt;span class=&#34;n&#34;&gt;data&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;image_bytes&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;            &lt;span class=&#34;n&#34;&gt;mime_type&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;image/png&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;        &lt;span class=&#34;p&#34;&gt;),&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;p&#34;&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nb&#34;&gt;print&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;result&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;embeddings&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;If you want a separate embedding for each input rather than one aggregated vector, you can use the Batch API. The original article also notes that Agent Platform support for this kind of batch workflow is still in progress.&lt;/p&gt;
&lt;h2 id=&#34;what-it-means-for-rag&#34;&gt;What It Means for RAG
&lt;/h2&gt;&lt;p&gt;Multimodal embeddings are useful for agentic RAG. An AI agent may need to inspect a code repository, PDFs, screenshots, charts, audio meeting notes, and product images at the same time. If all of these materials can enter the same semantic space, the retrieval pipeline no longer needs a separate entry point for every format.&lt;/p&gt;
&lt;p&gt;Google recommends using task prefixes according to the goal of the task, so the embeddings better match the retrieval objective. For example, question answering, fact checking, code retrieval, and search results can use different prefixes:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt; 1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 5
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 6
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 7
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 8
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 9
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;10
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;11
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# Generate embedding for your task&amp;#39;s query:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;k&#34;&gt;def&lt;/span&gt; &lt;span class=&#34;nf&#34;&gt;prepare_query&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;content&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;):&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;k&#34;&gt;return&lt;/span&gt; &lt;span class=&#34;sa&#34;&gt;f&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;task: question answering | query: &lt;/span&gt;&lt;span class=&#34;si&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;content&lt;/span&gt;&lt;span class=&#34;si&#34;&gt;}&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;c1&#34;&gt;# return f&amp;#34;task: fact checking | query: {content}&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;c1&#34;&gt;# return f&amp;#34;task: code retrieval | query: {content}&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;c1&#34;&gt;# return f&amp;#34;task: search result | query: {content}&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# Generate embedding for document of an asymmetric retrieval task:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;k&#34;&gt;def&lt;/span&gt; &lt;span class=&#34;nf&#34;&gt;prepare_document&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;content&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;title&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;kc&#34;&gt;None&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;):&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;k&#34;&gt;if&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;title&lt;/span&gt; &lt;span class=&#34;ow&#34;&gt;is&lt;/span&gt; &lt;span class=&#34;kc&#34;&gt;None&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;        &lt;span class=&#34;n&#34;&gt;title&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;none&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;k&#34;&gt;return&lt;/span&gt; &lt;span class=&#34;sa&#34;&gt;f&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;title: &lt;/span&gt;&lt;span class=&#34;si&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;title&lt;/span&gt;&lt;span class=&#34;si&#34;&gt;}&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt; | text: &lt;/span&gt;&lt;span class=&#34;si&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;content&lt;/span&gt;&lt;span class=&#34;si&#34;&gt;}&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;This kind of prefix is suitable for asymmetric retrieval: user queries are often short, while documents are often long. Formatting &lt;code&gt;query&lt;/code&gt; and &lt;code&gt;document&lt;/code&gt; differently for the task can improve matching between short queries and long documents.&lt;/p&gt;
&lt;p&gt;The original article gives two real-world examples:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Harvey saw a 3% increase in Recall@20 on legal retrieval benchmarks compared with its previous embeddings.&lt;/li&gt;
&lt;li&gt;Supermemory saw a 40% increase in Recall@1 and uses the model across memory, indexing, search, and Q&amp;amp;A pipelines.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These numbers do not mean every scenario will improve by the same amount, but they show that multimodal embeddings are already producing results in real retrieval products, not only demos.&lt;/p&gt;
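&lt;p&gt;To check whether a new embedding model helps on your own data, it can be useful to compute the same kind of metric yourself. The following is a minimal sketch of Recall@k, assuming you already have a ranked list of document IDs per query and a set of known relevant IDs. The function name &lt;code&gt;recall_at_k&lt;/code&gt; and the sample IDs are illustrative and do not come from the original article.&lt;/p&gt;

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant items that appear in the top-k results."""
    if not relevant_ids:
        raise ValueError("relevant_ids must not be empty")
    top_k = set(ranked_ids[:k])
    hits = sum(1 for doc_id in relevant_ids if doc_id in top_k)
    return hits / len(relevant_ids)


# Hypothetical ranking returned by a retriever for one query.
ranked = ["d7", "d2", "d9", "d4", "d1"]
# Hypothetical ground truth: the documents a human marked as relevant.
relevant = {"d2", "d1", "d8"}

recall_top1 = recall_at_k(ranked, relevant, k=1)
recall_top5 = recall_at_k(ranked, relevant, k=5)
print(recall_top1, recall_top5)
```

&lt;p&gt;Averaging this value over a fixed set of test queries, once with the old embeddings and once with the new ones, gives a like-for-like comparison on your own corpus.&lt;/p&gt;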
&lt;h2 id=&#34;visual-search&#34;&gt;Visual Search
&lt;/h2&gt;&lt;p&gt;Gemini Embedding 2 is also suitable for image-to-image search, image-text hybrid search, and product identification. The original article mentions Nuuly, URBN&amp;rsquo;s clothing rental company, using it to match photos of untagged garments in warehouses against its catalog. Match@20 improved from 60% to nearly 87%, and the overall successful identification rate rose from 74% to over 90%.&lt;/p&gt;
&lt;p&gt;The point in this type of scenario is not content generation, but understanding which inventory item, document, or product record is closest to a given image. If your business has many images, video clips, or scanned documents, multimodal embeddings can be more natural than text-only indexing.&lt;/p&gt;
&lt;h2 id=&#34;search-reranking&#34;&gt;Search Reranking
&lt;/h2&gt;&lt;p&gt;Embeddings can also be used for reranking. A common approach is to first retrieve a set of candidate results, then calculate the similarity between each candidate and the user&amp;rsquo;s query, pushing more relevant content to the top:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt; 1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 5
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 6
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 7
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 8
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 9
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;10
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;11
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;12
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;13
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# 1. Define a dot product helper (equals cosine similarity for L2-normalized vectors; needs: import numpy as np)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;k&#34;&gt;def&lt;/span&gt; &lt;span class=&#34;nf&#34;&gt;dot_product&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;a&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;np&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ndarray&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;b&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;np&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ndarray&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;):&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;k&#34;&gt;return&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;np&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;array&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;a&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;@&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;np&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;array&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;b&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;T&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# 2. Retrieve your embeddings&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# (Assuming &amp;#39;summaries&amp;#39; is your list of search results)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;search_res&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;get_embeddings&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;summaries&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;embedded_query&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;get_embeddings&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;([&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;query&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;])&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# 3. Calculate similarity scores&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;sim_value&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;dot_product&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;search_res&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;embedded_query&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# 4. Select the most relevant result&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;best_match_index&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;np&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;argmax&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;sim_value&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;The original article also mentions another idea: first ask the model to generate a baseline hypothetical answer from its internal knowledge, embed that answer, and compare it with candidate content to find the most semantically relevant result. This is especially useful for Q&amp;amp;A-style RAG.&lt;/p&gt;
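&lt;p&gt;The hypothetical-answer idea can be sketched as follows. This is only an illustration under stated assumptions: the three-dimensional vectors are hand-written stand-ins for real embeddings, and &lt;code&gt;l2_normalize&lt;/code&gt; and &lt;code&gt;rerank&lt;/code&gt; are hypothetical helper names. In a real pipeline the reference vector would be the embedding of the answer the model generated, and the candidate vectors would come from your retriever.&lt;/p&gt;

```python
import numpy as np


def l2_normalize(vectors):
    """Scale each vector to unit length so dot products equal cosine similarity."""
    vectors = np.asarray(vectors, dtype=float)
    return vectors / np.linalg.norm(vectors, axis=-1, keepdims=True)


def rerank(candidate_vectors, reference_vector):
    """Return candidate indices ordered from most to least similar, plus scores."""
    scores = l2_normalize(candidate_vectors) @ l2_normalize(reference_vector)
    return np.argsort(-scores), scores


# Toy vectors standing in for candidate embeddings (assumption).
candidates = [
    [0.9, 0.1, 0.0],  # candidate 0
    [0.1, 0.8, 0.3],  # candidate 1
    [0.2, 0.7, 0.6],  # candidate 2
]
# Stand-in for the embedding of the model-generated hypothetical answer.
hypothetical_answer = [0.1, 0.9, 0.2]

order, scores = rerank(candidates, hypothetical_answer)
print(order.tolist())
```

&lt;p&gt;Using &lt;code&gt;argsort&lt;/code&gt; instead of &lt;code&gt;argmax&lt;/code&gt; returns the full ordering, which is what a reranking step usually needs.&lt;/p&gt;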
&lt;h2 id=&#34;clustering-classification-and-anomaly-detection&#34;&gt;Clustering, Classification, and Anomaly Detection
&lt;/h2&gt;&lt;p&gt;Beyond retrieval, embeddings are also useful for clustering, classification, and anomaly detection. Unlike the asymmetric question-answering retrieval above, these are symmetric tasks, where the same task prefix can be used for both query and document:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;5
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# Generate embedding for query &amp;amp; document of your task.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;k&#34;&gt;def&lt;/span&gt; &lt;span class=&#34;nf&#34;&gt;prepare_query_and_document&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;content&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;):&lt;/span&gt;
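&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;k&#34;&gt;return&lt;/span&gt; &lt;span class=&#34;sa&#34;&gt;f&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;task: clustering | query: &lt;/span&gt;&lt;span class=&#34;si&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;content&lt;/span&gt;&lt;span class=&#34;si&#34;&gt;}&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;&lt;/span&gt;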
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;c1&#34;&gt;# return f&amp;#39;task: sentence similarity | query: {content}&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;c1&#34;&gt;# return f&amp;#39;task: classification | query: {content}&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;These tasks can be used for sentiment classification, content moderation, similar asset grouping, and anomaly discovery. They can also help agents organize large amounts of context before moving into later reasoning steps.&lt;/p&gt;
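&lt;p&gt;One simple way to classify with embeddings is a nearest-centroid approach: average the vectors of a few labeled examples per class, then assign a new item to the closest centroid. The sketch below is not from the original article. The two-dimensional vectors are toy stand-ins for real embeddings, and &lt;code&gt;build_centroids&lt;/code&gt; and &lt;code&gt;classify&lt;/code&gt; are hypothetical names.&lt;/p&gt;

```python
import numpy as np


def build_centroids(labeled_vectors):
    """Average the vectors of each label into one unit-length centroid."""
    centroids = {}
    for label, vectors in labeled_vectors.items():
        mean = np.mean(np.asarray(vectors, dtype=float), axis=0)
        centroids[label] = mean / np.linalg.norm(mean)
    return centroids


def classify(vector, centroids):
    """Return the closest label and its cosine similarity to that centroid."""
    vector = np.asarray(vector, dtype=float)
    vector = vector / np.linalg.norm(vector)
    scored = {label: float(vector @ c) for label, c in centroids.items()}
    best = max(scored, key=scored.get)
    return best, scored[best]


# Toy 2-D vectors standing in for embeddings of labeled reviews (assumption).
examples = {
    "positive": [[0.9, 0.1], [0.8, 0.2]],
    "negative": [[0.1, 0.9], [0.2, 0.8]],
}
centroids = build_centroids(examples)
label, score = classify([0.7, 0.3], centroids)
print(label, round(score, 3))
```

&lt;p&gt;The same similarity score can double as a rough anomaly signal: an item whose best score is low compared with typical items does not resemble any known group and may deserve manual review.&lt;/p&gt;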
&lt;h2 id=&#34;storage-and-cost&#34;&gt;Storage and Cost
&lt;/h2&gt;&lt;p&gt;Gemini Embedding 2 outputs 3,072-dimensional vectors by default. It uses Matryoshka Representation Learning, so vectors can be truncated to smaller dimensions with &lt;code&gt;output_dimensionality&lt;/code&gt;. Google recommends 1,536 or 768 dimensions when efficiency is the priority:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;5
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;result&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;client&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;models&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;embed_content&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;model&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;gemini-embedding-2&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;contents&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;What is the meaning of life?&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;config&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;output_dimensionality&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;mi&#34;&gt;768&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Vectors can be stored in Agent Platform Vector Search, Pinecone, Weaviate, Qdrant, ChromaDB, and similar systems. On cost, the original article notes that the Batch API offers higher throughput at 50% of the default embedding price.&lt;/p&gt;
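&lt;p&gt;To get a feel for the storage trade-off, the sketch below estimates raw index size at different dimensions and shows a local prefix truncation of the kind Matryoshka-style embeddings allow. It rests on several assumptions: vectors are stored as float32, index overhead is ignored, the random vector is a stand-in for a real embedding, and the helper names are hypothetical. The article does not say whether truncated vectors come back normalized, so the sketch re-normalizes as a precaution.&lt;/p&gt;

```python
import numpy as np

FULL_DIM = 3072         # default output size stated in the article
BYTES_PER_FLOAT32 = 4   # assumption: vectors are stored as float32


def truncate_and_normalize(vector, dim):
    """Keep the first `dim` values and rescale the result to unit length."""
    head = np.asarray(vector, dtype=np.float32)[:dim]
    return head / np.linalg.norm(head)


def raw_size_mib(num_vectors, dim):
    """Raw float32 storage in MiB, ignoring any index overhead."""
    return num_vectors * dim * BYTES_PER_FLOAT32 / (1024 * 1024)


rng = np.random.default_rng(0)
full_vector = rng.normal(size=FULL_DIM)  # random stand-in for a real embedding
small_vector = truncate_and_normalize(full_vector, 768)

full_size = raw_size_mib(1_000_000, FULL_DIM)
small_size = raw_size_mib(1_000_000, 768)
print(small_vector.shape, full_size, small_size)
```

&lt;p&gt;For one million vectors, dropping from 3,072 to 768 dimensions cuts raw storage to a quarter. Whether retrieval quality holds up at the smaller size is something to measure on your own data.&lt;/p&gt;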
&lt;h2 id=&#34;how-developers-can-use-it&#34;&gt;How Developers Can Use It
&lt;/h2&gt;&lt;p&gt;If you already have text-based RAG, you can start with two incremental upgrades:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Put PDFs, screenshots, image descriptions, and text documents into the same index, then test whether retrieval recall becomes more stable.&lt;/li&gt;
&lt;li&gt;Add task prefixes for different tasks, such as question answering, fact checking, code retrieval, and product search. Do not process all content with the same embedding format.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;If you are building a new product, consider these directions first:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Enterprise knowledge bases: retrieve documents, charts, presentation screenshots, and meeting materials together.&lt;/li&gt;
&lt;li&gt;Visual search: use images, text, or mixed inputs to find products, assets, design drafts, and archives.&lt;/li&gt;
&lt;li&gt;Agent toolchains: let coding agents, research agents, or customer support agents retrieve business materials in multiple formats.&lt;/li&gt;
&lt;li&gt;Content governance: classify, cluster, and detect anomalies across text, images, and video clips.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The value of Gemini Embedding 2 is that it turns multimodal materials into one searchable asset system. For developers, this reduces the need for an intermediate &amp;ldquo;convert to text, then retrieve&amp;rdquo; layer and makes RAG systems closer to the shape of real-world data.&lt;/p&gt;
</description>
        </item>
        <item>
        <title>How to Choose Common Embedding Models: OpenAI vs BGE vs E5 vs GTE vs Jina</title>
        <link>https://knightli.com/en/2026/04/23/compare-openai-bge-e5-gte-jina-embedding-models/</link>
        <pubDate>Thu, 23 Apr 2026 15:23:47 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/04/23/compare-openai-bge-e5-gte-jina-embedding-models/</guid>
        <description>&lt;p&gt;When people start building RAG systems, semantic search, or knowledge base retrieval, they often get stuck on the same question: there are so many embedding models, so which one should you choose?&lt;/p&gt;
&lt;p&gt;Common options can roughly be split into two groups. One group is general-purpose text embeddings that cover Chinese, English, and multilingual tasks. The other group is better suited to Chinese scenarios, especially Chinese retrieval, Chinese QA, and Chinese knowledge bases.&lt;/p&gt;
&lt;p&gt;If you want the short version first, this is a practical way to think about it:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;If you want the easiest path and prefer using an API directly: &lt;code&gt;text-embedding-3-small&lt;/code&gt; or &lt;code&gt;text-embedding-3-large&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;If you want Chinese retrieval and prefer open-source models you can self-host: &lt;code&gt;bge-base-zh-v1.5&lt;/code&gt;, &lt;code&gt;bge-m3&lt;/code&gt;, &lt;code&gt;gte-large-zh&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;If you need multilingual support: &lt;code&gt;multilingual-e5-base&lt;/code&gt;, &lt;code&gt;multilingual-e5-large&lt;/code&gt;, &lt;code&gt;jina-embeddings-v3&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;If you want to keep costs down in Chinese scenarios: &lt;code&gt;bge-small-zh-v1.5&lt;/code&gt;, &lt;code&gt;gte-base-zh&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;1-first-look-at-them-by-category&#34;&gt;1. First, Look at Them by Category
&lt;/h2&gt;&lt;h3 id=&#34;1-openai-series&#34;&gt;1. OpenAI Series
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;&lt;code&gt;text-embedding-3-small&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;text-embedding-3-large&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The main strengths of these models are simplicity and stability. They are a good fit if you want to call an API directly for retrieval, RAG, classification, and similarity matching. Their advantage is not that they dominate one specific Chinese leaderboard, but that the overall experience is complete: low integration cost, stable quality, and low engineering overhead.&lt;/p&gt;
&lt;p&gt;If your team does not want to host models or maintain inference services, OpenAI is usually the most time-saving option.&lt;/p&gt;
&lt;h3 id=&#34;2-bge-series&#34;&gt;2. BGE Series
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;&lt;code&gt;BAAI/bge-small-zh-v1.5&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;BAAI/bge-base-zh-v1.5&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;bge-m3&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;BGE is one of the most common families used in Chinese retrieval. &lt;code&gt;bge-small-zh-v1.5&lt;/code&gt; and &lt;code&gt;bge-base-zh-v1.5&lt;/code&gt; lean more toward Chinese monolingual tasks, making them suitable for Chinese semantic search, knowledge base retrieval, and FAQ matching. &lt;code&gt;bge-m3&lt;/code&gt; is more general-purpose and can cover multilingual, multi-granularity, and more complex retrieval scenarios.&lt;/p&gt;
&lt;p&gt;If most of your data is Chinese text, BGE is often one of the easiest families to put on the shortlist.&lt;/p&gt;
&lt;h3 id=&#34;3-e5-series&#34;&gt;3. E5 Series
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;&lt;code&gt;intfloat/multilingual-e5-base&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;intfloat/multilingual-e5-large&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The strength of the E5 family is more balanced multilingual capability. It works well for mixed Chinese-English data, cross-lingual retrieval, and internationalized content libraries. It is not focused only on Chinese. Instead, it is built around the idea that different languages can live inside one unified retrieval system.&lt;/p&gt;
&lt;p&gt;If your corpus is not purely Chinese, but a mix of Chinese, English, Japanese, or even more languages, E5 is usually more reliable than a Chinese-only model.&lt;/p&gt;
&lt;h3 id=&#34;4-gte-series&#34;&gt;4. GTE Series
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;&lt;code&gt;thenlper/gte-base-zh&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;thenlper/gte-large-zh&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;GTE is also common in Chinese tasks. Its positioning is similar to BGE: both are practical choices for Chinese retrieval. GTE is usually seen as balanced and easy to use, without much complexity in deployment. It works well for Chinese knowledge bases, site search, and enterprise internal document retrieval.&lt;/p&gt;
&lt;p&gt;If you want one more open-source Chinese model family for side-by-side evaluation, GTE is well worth testing.&lt;/p&gt;
&lt;h3 id=&#34;5-jina-embeddings&#34;&gt;5. Jina Embeddings
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;&lt;code&gt;jina-embeddings-v3&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;code&gt;jina-embeddings-v3&lt;/code&gt; is a general-purpose model that often appears in multilingual retrieval, long-text processing, and web content tasks. It is frequently mentioned in discussions about covering more task types with a single model, so it suits teams that want one unified embedding layer.&lt;/p&gt;
&lt;p&gt;If your content sources are mixed, such as webpages, documents, and multilingual text, Jina is often a strong candidate to test.&lt;/p&gt;
&lt;h2 id=&#34;2-which-models-are-most-common-in-chinese-scenarios&#34;&gt;2. Which Models Are Most Common in Chinese Scenarios
&lt;/h2&gt;&lt;p&gt;If we narrow the scope to Chinese use cases, the usual candidates are basically these:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;bge-small-zh-v1.5&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;bge-base-zh-v1.5&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;bge-m3&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;gte-base-zh&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;gte-large-zh&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;multilingual-e5-base&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;multilingual-e5-large&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The most useful way to split them is not &amp;ldquo;which one is absolutely better,&amp;rdquo; but to answer three questions:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Is your data primarily Chinese?&lt;/li&gt;
&lt;li&gt;Do you need multilingual support?&lt;/li&gt;
&lt;li&gt;Do you care more about quality, cost, or deployment convenience?&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id=&#34;3-put-these-models-side-by-side&#34;&gt;3. Put These Models Side by Side
&lt;/h2&gt;&lt;h3 id=&#34;1-if-you-only-care-about-chinese-performance&#34;&gt;1. If You Only Care About Chinese Performance
&lt;/h3&gt;&lt;p&gt;For pure Chinese knowledge bases, Chinese QA, and Chinese document retrieval, BGE and GTE are usually the first families to check.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;bge-small-zh-v1.5&lt;/code&gt;: lighter and better for cost-sensitive scenarios&lt;/li&gt;
&lt;li&gt;&lt;code&gt;bge-base-zh-v1.5&lt;/code&gt;: usually one of the most balanced options for Chinese use cases&lt;/li&gt;
&lt;li&gt;&lt;code&gt;gte-base-zh&lt;/code&gt;: plays a similar role to the lighter BGE models and is a good first baseline&lt;/li&gt;
&lt;li&gt;&lt;code&gt;gte-large-zh&lt;/code&gt;: better when retrieval quality matters more&lt;/li&gt;
&lt;li&gt;&lt;code&gt;bge-m3&lt;/code&gt;: suitable if you want to evaluate Chinese retrieval together with broader capabilities&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If your corpus is almost entirely Chinese, E5 can still work, but it often will not be the first priority.&lt;/p&gt;
&lt;h3 id=&#34;2-if-you-need-multilingual-support&#34;&gt;2. If You Need Multilingual Support
&lt;/h3&gt;&lt;p&gt;The priorities change quite a bit here.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;multilingual-e5-base&lt;/code&gt; and &lt;code&gt;multilingual-e5-large&lt;/code&gt; are better suited to unified multilingual retrieval&lt;/li&gt;
&lt;li&gt;&lt;code&gt;jina-embeddings-v3&lt;/code&gt; also fits multilingual and general text tasks&lt;/li&gt;
&lt;li&gt;&lt;code&gt;bge-m3&lt;/code&gt; is better than traditional Chinese-only models when you want to expand into multilingual usage&lt;/li&gt;
&lt;li&gt;&lt;code&gt;text-embedding-3-small&lt;/code&gt; and &lt;code&gt;text-embedding-3-large&lt;/code&gt; are good if you want the simplest API-based route&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If your corpus contains Chinese, English, product documentation, website copy, and user questions at the same time, multilingual models can save you a lot of future migration work.&lt;/p&gt;
&lt;h3 id=&#34;3-if-you-need-to-control-inference-and-storage-cost&#34;&gt;3. If You Need to Control Inference and Storage Cost
&lt;/h3&gt;&lt;p&gt;Lightweight models have the advantage here.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;bge-small-zh-v1.5&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;gte-base-zh&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;multilingual-e5-base&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;text-embedding-3-small&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These models are usually a better fit when:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;You have a large document volume&lt;/li&gt;
&lt;li&gt;Data is updated frequently&lt;/li&gt;
&lt;li&gt;You need batch vectorization&lt;/li&gt;
&lt;li&gt;You are sensitive to latency and cost&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If your dataset is large, embedding dimensions, inference speed, and index size will all directly affect total cost. That is why starting with a smaller model as a baseline is often the safer choice.&lt;/p&gt;
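&lt;p&gt;As a rough sketch of why dimensions matter, the arithmetic below estimates raw vector storage for a hypothetical corpus of 1,000,000 chunks stored as 4-byte floats. The corpus size and float32 storage are assumptions, index overhead and compression are ignored, and the dimensions are the commonly cited default output sizes for these models, so confirm them against each model card before sizing anything.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;raw bytes = vectors × dimensions × 4

bge-small-zh-v1.5       (512):  1,000,000 × 512  × 4 =  2,048,000,000 bytes ≈  2.0 GB
bge-base-zh-v1.5        (768):  1,000,000 × 768  × 4 =  3,072,000,000 bytes ≈  3.1 GB
bge-m3                  (1024): 1,000,000 × 1024 × 4 =  4,096,000,000 bytes ≈  4.1 GB
text-embedding-3-large  (3072): 1,000,000 × 3072 × 4 = 12,288,000,000 bytes ≈ 12.3 GB
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Under these assumptions, moving from a 512-dimension model to a 3072-dimension one multiplies raw vector storage by six before any index overhead is counted.&lt;/p&gt;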
&lt;h3 id=&#34;4-if-you-want-the-highest-ceiling-first&#34;&gt;4. If You Want the Highest Ceiling First
&lt;/h3&gt;&lt;p&gt;Larger models are usually better suited to complex retrieval or higher-quality recall, for example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;text-embedding-3-large&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;multilingual-e5-large&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;gte-large-zh&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;bge-base-zh-v1.5&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;bge-m3&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Keep in mind, though, that a larger model does not automatically lead to a better production experience. In many projects, the real bottleneck is not the model itself, but chunking strategy, the number of candidates retrieved (top-k), reranking, data cleaning, and evaluation design.&lt;/p&gt;
&lt;h2 id=&#34;4-what-each-model-is-better-at&#34;&gt;4. What Each Model Is Better At
&lt;/h2&gt;&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;Model&lt;/th&gt;
          &lt;th&gt;Better suited for&lt;/th&gt;
          &lt;th&gt;Quick judgment&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;text-embedding-3-small&lt;/code&gt;&lt;/td&gt;
          &lt;td&gt;General retrieval, RAG, fast integration&lt;/td&gt;
          &lt;td&gt;Simple API usage and cost-friendly&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;text-embedding-3-large&lt;/code&gt;&lt;/td&gt;
          &lt;td&gt;General retrieval where quality matters more&lt;/td&gt;
          &lt;td&gt;Quality-first, with low engineering burden&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;bge-small-zh-v1.5&lt;/code&gt;&lt;/td&gt;
          &lt;td&gt;Lightweight Chinese retrieval&lt;/td&gt;
          &lt;td&gt;A common entry-level Chinese option&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;bge-base-zh-v1.5&lt;/code&gt;&lt;/td&gt;
          &lt;td&gt;Chinese knowledge bases, FAQ, semantic search&lt;/td&gt;
          &lt;td&gt;Very balanced in Chinese scenarios&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;bge-m3&lt;/code&gt;&lt;/td&gt;
          &lt;td&gt;Chinese-focused setups that also need more complex retrieval&lt;/td&gt;
          &lt;td&gt;Easier to extend to multilingual use later&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;multilingual-e5-base&lt;/code&gt;&lt;/td&gt;
          &lt;td&gt;Foundational multilingual retrieval&lt;/td&gt;
          &lt;td&gt;Common in international products&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;multilingual-e5-large&lt;/code&gt;&lt;/td&gt;
          &lt;td&gt;High-quality multilingual recall&lt;/td&gt;
          &lt;td&gt;Quality-oriented counterpart to the base model&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;gte-base-zh&lt;/code&gt;&lt;/td&gt;
          &lt;td&gt;Lightweight Chinese retrieval&lt;/td&gt;
          &lt;td&gt;Good for building a baseline&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;gte-large-zh&lt;/code&gt;&lt;/td&gt;
          &lt;td&gt;Chinese scenarios that prioritize quality&lt;/td&gt;
          &lt;td&gt;A good comparison point against BGE&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;jina-embeddings-v3&lt;/code&gt;&lt;/td&gt;
          &lt;td&gt;Multilingual, web, and general text tasks&lt;/td&gt;
          &lt;td&gt;Worth testing if you want one unified embedding layer&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id=&#34;5-a-practical-way-to-make-the-choice&#34;&gt;5. A Practical Way to Make the Choice
&lt;/h2&gt;&lt;p&gt;If you are trying to ship a system rather than write a paper, you can keep the decision process simple.&lt;/p&gt;
&lt;h3 id=&#34;scenario-1-chinese-knowledge-base&#34;&gt;Scenario 1: Chinese Knowledge Base
&lt;/h3&gt;&lt;p&gt;Start with these:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;bge-base-zh-v1.5&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;gte-large-zh&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;bge-small-zh-v1.5&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If budget is tight, start with &lt;code&gt;bge-small-zh-v1.5&lt;/code&gt;. If retrieval quality matters more, move up to &lt;code&gt;bge-base-zh-v1.5&lt;/code&gt; or &lt;code&gt;gte-large-zh&lt;/code&gt;.&lt;/p&gt;
&lt;h3 id=&#34;scenario-2-mixed-chinese-english-knowledge-base&#34;&gt;Scenario 2: Mixed Chinese-English Knowledge Base
&lt;/h3&gt;&lt;p&gt;Start with:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;multilingual-e5-base&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;multilingual-e5-large&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;text-embedding-3-small&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;text-embedding-3-large&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you do not want to self-host, OpenAI is the more direct option. If you want to host the model yourself, E5 is the more common path.&lt;/p&gt;
&lt;h3 id=&#34;scenario-3-mostly-chinese-now-but-possibly-multilingual-later&#34;&gt;Scenario 3: Mostly Chinese Now, but Possibly Multilingual Later
&lt;/h3&gt;&lt;p&gt;Start with:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;bge-m3&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;multilingual-e5-base&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;jina-embeddings-v3&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The biggest risk in this kind of setup is optimizing only for Chinese at the beginning and then having to rebuild the whole vector database later.&lt;/p&gt;
&lt;h2 id=&#34;6-in-the-end-the-key-is-not-top-of-the-leaderboard&#34;&gt;6. In the End, the Key Is Not &amp;ldquo;Top of the Leaderboard&amp;rdquo;
&lt;/h2&gt;&lt;p&gt;The easiest mistake in embedding model selection is to look only at public benchmark scores and then ship directly to production.&lt;/p&gt;
&lt;p&gt;A more reliable process is usually:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Pick 2 to 4 candidate models first&lt;/li&gt;
&lt;li&gt;Run embeddings on your own real data&lt;/li&gt;
&lt;li&gt;Evaluate one round of retrieval performance&lt;/li&gt;
&lt;li&gt;Then make the final decision based on cost, latency, and deployment style&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;In practice, what determines the result is often not the model name, but whether the model matches your corpus, chunking strategy, and query patterns.&lt;/p&gt;
&lt;h2 id=&#34;summary&#34;&gt;Summary
&lt;/h2&gt;&lt;p&gt;If you only want one practical summary to remember, use this:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Chinese-first: start with &lt;code&gt;bge-base-zh-v1.5&lt;/code&gt; and &lt;code&gt;gte-large-zh&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Cost-first: start with &lt;code&gt;bge-small-zh-v1.5&lt;/code&gt;, &lt;code&gt;gte-base-zh&lt;/code&gt;, and &lt;code&gt;text-embedding-3-small&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Multilingual-first: start with &lt;code&gt;multilingual-e5-base&lt;/code&gt;, &lt;code&gt;multilingual-e5-large&lt;/code&gt;, and &lt;code&gt;jina-embeddings-v3&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;API-first: start with &lt;code&gt;text-embedding-3-small&lt;/code&gt; and &lt;code&gt;text-embedding-3-large&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;If you want Chinese now and flexibility later: start with &lt;code&gt;bge-m3&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;There is no single model that fits every project, but for most projects, you can quickly narrow down the first batch of candidates from these few groups.&lt;/p&gt;
</description>
        </item>
        <item>
        <title>AI Terms Explained: Agent, MCP, RAG, and Token in Plain Language</title>
        <link>https://knightli.com/en/2026/04/23/ai-terms-agent-mcp-rag-token-explained/</link>
        <pubDate>Thu, 23 Apr 2026 13:13:40 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/04/23/ai-terms-agent-mcp-rag-token-explained/</guid>
        <description>&lt;p&gt;When people first get into AI, what pushes them away is often not the models themselves, but the long list of terms that keeps showing up in every discussion. &lt;code&gt;Agent&lt;/code&gt;, &lt;code&gt;MCP&lt;/code&gt;, &lt;code&gt;RAG&lt;/code&gt;, &lt;code&gt;AIGC&lt;/code&gt;, and &lt;code&gt;Token&lt;/code&gt; all look familiar, but without a simple explanation, many people only recognize the words without really understanding them.&lt;/p&gt;
&lt;p&gt;This article follows a common beginner-friendly line of explanation and condenses 10 high-frequency AI terms into a set of meanings that is easier to remember. The goal is not to sound academic. It is to help you build a basic mental model that lets you follow everyday AI conversations.&lt;/p&gt;
&lt;h2 id=&#34;10-common-ai-terms-and-what-they-mean&#34;&gt;10 common AI terms and what they mean
&lt;/h2&gt;&lt;h3 id=&#34;1-agent-an-ai-that-does-more-than-chat&#34;&gt;1. Agent: an AI that does more than chat
&lt;/h3&gt;&lt;p&gt;&lt;code&gt;Agent&lt;/code&gt; can be understood as an AI assistant that actually gets work done.&lt;/p&gt;
&lt;p&gt;A normal chatbot usually works in a simple question-and-answer pattern. An &lt;code&gt;Agent&lt;/code&gt; goes a step further. It can break a task into steps, arrange a process, call tools, and return a finished result. If you ask it to organize materials, look something up, or generate a document, it may do more than give advice. It may actually chain those actions together and complete them.&lt;/p&gt;
&lt;p&gt;That is why the key point of an &lt;code&gt;Agent&lt;/code&gt; is not whether it can talk, but whether it can act.&lt;/p&gt;
&lt;h3 id=&#34;2-openclaw-an-ai-assistant-that-stays-on-your-computer&#34;&gt;2. OpenClaw: an AI assistant that stays on your computer
&lt;/h3&gt;&lt;p&gt;Here, &lt;code&gt;OpenClaw&lt;/code&gt; is described as a kind of AI assistant that lives on your computer.&lt;/p&gt;
&lt;p&gt;You can think of this type of tool as a more desktop-oriented AI helper. It does not only receive text. It may also observe the interface, call local tools, and execute tasks step by step. Compared with a normal web chat interface, this kind of tool emphasizes operational ability much more.&lt;/p&gt;
&lt;p&gt;If &lt;code&gt;Agent&lt;/code&gt; is the abstract idea of an execution-oriented AI, this kind of desktop assistant is a more concrete personal-computer version of that idea.&lt;/p&gt;
&lt;h3 id=&#34;3-skills-capability-packs-added-to-an-agent&#34;&gt;3. Skills: capability packs added to an Agent
&lt;/h3&gt;&lt;p&gt;&lt;code&gt;Skills&lt;/code&gt; can be understood as functional modules or operating instructions for an &lt;code&gt;Agent&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The same &lt;code&gt;Agent&lt;/code&gt; can behave very differently depending on which &lt;code&gt;Skills&lt;/code&gt; it has. Some may focus on copywriting, some on data organization, and some on code-related work. They are a bit like apps on a phone, and a bit like reusable workflows.&lt;/p&gt;
&lt;p&gt;So in many cases, it is not that the model suddenly became smarter. It is that a clearer set of rules, tools, and steps was added behind it.&lt;/p&gt;
&lt;h3 id=&#34;4-mcp-a-unified-way-for-ai-to-connect-to-tools&#34;&gt;4. MCP: a unified way for AI to connect to tools
&lt;/h3&gt;&lt;p&gt;&lt;code&gt;MCP&lt;/code&gt; stands for &lt;code&gt;Model Context Protocol&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;In everyday terms, it is a bit like a USB &lt;code&gt;Type-C&lt;/code&gt; connector for the AI world. In the past, connecting a model to different tools often meant building separate integrations one by one. With a unified protocol, the way those tools connect becomes more standardized and easier to reuse.&lt;/p&gt;
&lt;p&gt;For most users, the most important thing to remember is this: &lt;code&gt;MCP&lt;/code&gt; is not about whether a model can answer a question. It is about how a model can connect to external tools and resources in a safe and stable way.&lt;/p&gt;
&lt;h3 id=&#34;5-gacha-ai-output-is-inherently-random&#34;&gt;5. Gacha: AI output is inherently random
&lt;/h3&gt;&lt;p&gt;The term &amp;ldquo;gacha&amp;rdquo; often appears in &lt;code&gt;AI&lt;/code&gt; image generation, video generation, and creative work.&lt;/p&gt;
&lt;p&gt;The idea is simple. Even with the same prompt and the same general direction, the result can still be different each time. Sometimes the output is great. Sometimes it falls apart. That is why people compare repeated generation attempts to pulling gacha in a game.&lt;/p&gt;
&lt;p&gt;What this really reminds us is that AI generation is not a fixed formula. It is a probabilistic process with variation.&lt;/p&gt;
&lt;h3 id=&#34;6-api-the-connection-between-an-app-and-a-model&#34;&gt;6. API: the connection between an app and a model
&lt;/h3&gt;&lt;p&gt;&lt;code&gt;API&lt;/code&gt; stands for &lt;code&gt;Application Programming Interface&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;You can think of it as the standard entry point through which programs communicate. When you call a model service from your own app, script, or editor, you are essentially using an &lt;code&gt;API&lt;/code&gt; to send a request and receive a result.&lt;/p&gt;
&lt;p&gt;If you compare a model service to a restaurant, then:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;the menu is like the &lt;code&gt;API&lt;/code&gt; documentation&lt;/li&gt;
&lt;li&gt;placing an order is like making an &lt;code&gt;API&lt;/code&gt; request&lt;/li&gt;
&lt;li&gt;the kitchen sending back the dish is like the model returning a result&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That is why many tools may look different on the surface while still calling some form of &lt;code&gt;API&lt;/code&gt; underneath.&lt;/p&gt;
&lt;h3 id=&#34;7-multimodality-ai-handles-more-than-text&#34;&gt;7. Multimodality: AI handles more than text
&lt;/h3&gt;&lt;p&gt;&lt;code&gt;Multimodality&lt;/code&gt; means AI no longer only reads and writes text. It can process multiple kinds of input and output.&lt;/p&gt;
&lt;p&gt;For example, it may be able to read images, understand voice, interpret video, generate pictures, or even support real-time voice and video interaction. Compared with early text-only models, multimodal models are much closer to having the combined abilities to see, hear, speak, and write.&lt;/p&gt;
&lt;p&gt;That is also why many AI products are no longer centered around a single text box.&lt;/p&gt;
&lt;h3 id=&#34;8-rag-retrieve-information-first-then-generate-an-answer&#34;&gt;8. RAG: retrieve information first, then generate an answer
&lt;/h3&gt;&lt;p&gt;&lt;code&gt;RAG&lt;/code&gt; stands for &lt;code&gt;Retrieval-Augmented Generation&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;It is useful for solving a practical problem: a model&amp;rsquo;s training data has a time boundary, and it does not automatically know your company&amp;rsquo;s newest documents, customer-service records, or business rules. The idea behind &lt;code&gt;RAG&lt;/code&gt; is to retrieve relevant material from specified sources first, and then generate an answer based on that material.&lt;/p&gt;
&lt;p&gt;Its value usually shows up in three ways:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;answers are more likely to stay close to real source material&lt;/li&gt;
&lt;li&gt;you can trace where the answer came from&lt;/li&gt;
&lt;li&gt;new documents can be added and reflected quickly&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That is why many enterprise knowledge bases, AI customer-service systems, and internal Q&amp;amp;A tools rely on &lt;code&gt;RAG&lt;/code&gt;.&lt;/p&gt;
&lt;h3 id=&#34;9-aigc-the-general-term-for-ai-generated-content&#34;&gt;9. AIGC: the general term for AI-generated content
&lt;/h3&gt;&lt;p&gt;&lt;code&gt;AIGC&lt;/code&gt; stands for &lt;code&gt;AI-Generated Content&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;It is not a single tool. It is a broad label for content produced by AI, including text, images, audio, video, and more. AI writing, AI illustration, AI short-form video generation, and AI voice synthesis all fit under the umbrella of &lt;code&gt;AIGC&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;What matters most about this term is that it describes a way of producing content, not one specific model.&lt;/p&gt;
&lt;h3 id=&#34;10-token-the-unit-used-to-measure-model-processing&#34;&gt;10. Token: the unit used to measure model processing
&lt;/h3&gt;&lt;p&gt;&lt;code&gt;Token&lt;/code&gt; can be understood as the basic unit a model uses to process text.&lt;/p&gt;
&lt;p&gt;It is not exactly the same as one character or one word, but in practice, you can treat it as the common unit used for model computation and billing. Your input consumes tokens, the model&amp;rsquo;s output consumes tokens, and the conversation context carried along with each request also takes up tokens.&lt;/p&gt;
&lt;p&gt;That is why model services keep talking about context length, cost control, and prompt compression. At the core, all of those topics are tied to &lt;code&gt;Token&lt;/code&gt;.&lt;/p&gt;
</description>
        </item>
        <item>
        <title>RAGFlow Project Notes: Features and Usage of an Open-Source RAG Engine</title>
        <link>https://knightli.com/en/2026/04/15/ragflow-rag-engine-guide/</link>
        <pubDate>Wed, 15 Apr 2026 22:09:25 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/04/15/ragflow-rag-engine-guide/</guid>
        <description>&lt;p&gt;&lt;code&gt;RAGFlow&lt;/code&gt; is an open-source RAG engine from &lt;code&gt;infiniflow&lt;/code&gt;. Its goal is not merely to provide a thin “upload documents and ask questions” shell, but to bring document parsing, chunking, retrieval, reranking, citation tracing, model configuration, agent capabilities, and API integration into one complete workflow.&lt;/p&gt;
&lt;p&gt;If you are building an enterprise knowledge base, document Q&amp;amp;A, a support assistant, internal information retrieval, or you want to give an LLM a more reliable context layer, RAGFlow is one of the open-source options worth serious attention.&lt;/p&gt;
&lt;h2 id=&#34;01-what-problem-ragflow-solves&#34;&gt;01 What Problem RAGFlow Solves
&lt;/h2&gt;&lt;p&gt;Most RAG systems run into three common issues:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Document parsing is unstable, especially for PDFs, scanned files, tables, images, and complex layouts.&lt;/li&gt;
&lt;li&gt;Chunking strategy is opaque, so retrieval may look correct while the actual context is incomplete.&lt;/li&gt;
&lt;li&gt;Answers lack trustworthy citations, making it hard for users to verify where the response came from.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;RAGFlow focuses on exactly these problems. The project README emphasizes &lt;code&gt;Deep document understanding&lt;/code&gt;, template-based chunking, chunk visualization, citation grounding, and multi-path retrieval with reranking. In other words, it cares more about “high-quality input leads to high-quality answers” than simply wiring a vector database to a chat UI.&lt;/p&gt;
&lt;h2 id=&#34;02-core-features&#34;&gt;02 Core Features
&lt;/h2&gt;&lt;h3 id=&#34;1-deep-document-understanding&#34;&gt;1. Deep Document Understanding
&lt;/h3&gt;&lt;p&gt;RAGFlow can extract knowledge from complex unstructured data. The README lists formats such as Word, PPT, Excel, TXT, images, scanned documents, structured data, and web pages.&lt;/p&gt;
&lt;p&gt;This matters a lot for enterprise knowledge bases. Real-world material is rarely clean Markdown. It is usually a mix of contracts, reports, tables, scanned PDFs, product manuals, screenshots, and web content. If parsing quality is weak, retrieval and LLM answers will both suffer.&lt;/p&gt;
&lt;h3 id=&#34;2-template-based-chunking&#34;&gt;2. Template-Based Chunking
&lt;/h3&gt;&lt;p&gt;RAGFlow provides template-based chunking. The value here is that chunking is not a black box; different document types can use different strategies.&lt;/p&gt;
&lt;p&gt;For example, articles, papers, tables, Q&amp;amp;A documents, image explanations, and contract clauses all need different chunk boundaries and granularity. Template-based chunking helps reduce problems like broken sentences, lost table context, and separated headings and body text.&lt;/p&gt;
&lt;h3 id=&#34;3-traceable-citations&#34;&gt;3. Traceable Citations
&lt;/h3&gt;&lt;p&gt;RAGFlow emphasizes grounded citations, meaning answers can be traced back to source passages. It also offers chunk visualization, making it easier for people to inspect and adjust parsing and chunking results.&lt;/p&gt;
&lt;p&gt;This is especially important in production. Internal enterprise Q&amp;amp;A is not only about producing something that “looks right”; it also has to be verifiable. For policy, compliance, finance, technical documents, and customer support content, citations and traceability are close to mandatory.&lt;/p&gt;
&lt;h3 id=&#34;4-automated-rag-workflow&#34;&gt;4. Automated RAG Workflow
&lt;/h3&gt;&lt;p&gt;RAGFlow turns the RAG lifecycle into a more complete workflow:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Create a knowledge base&lt;/li&gt;
&lt;li&gt;Upload or sync data&lt;/li&gt;
&lt;li&gt;Parse documents&lt;/li&gt;
&lt;li&gt;Review and adjust chunks&lt;/li&gt;
&lt;li&gt;Configure LLM and embedding models&lt;/li&gt;
&lt;li&gt;Run multi-path retrieval and reranking&lt;/li&gt;
&lt;li&gt;Build chat assistants&lt;/li&gt;
&lt;li&gt;Integrate through APIs into business systems&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That makes it closer to a RAG platform than a single library. For teams, both the UI and the API matter: non-engineers can maintain the knowledge base, while engineers can integrate the capability into existing systems.&lt;/p&gt;
&lt;h3 id=&#34;5-agent-mcp-and-workflow-extensions&#34;&gt;5. Agent, MCP, and Workflow Extensions
&lt;/h3&gt;&lt;p&gt;Recent RAGFlow updates have added agentic workflows, MCP, Agent Memory, and code execution components. That suggests it is no longer limited to traditional knowledge-base Q&amp;amp;A and is also moving toward agent-oriented scenarios.&lt;/p&gt;
&lt;p&gt;A typical pattern is that an agent can use RAGFlow as a reliable enterprise knowledge layer: retrieve from the knowledge base when it needs context, generate answers with citations, and combine that with tools or workflow steps when necessary.&lt;/p&gt;
&lt;h2 id=&#34;03-basic-usage-flow&#34;&gt;03 Basic Usage Flow
&lt;/h2&gt;&lt;p&gt;According to the official quickstart documentation, the common usage path for RAGFlow can be summarized in the following steps.&lt;/p&gt;
&lt;h3 id=&#34;1-prepare-the-environment&#34;&gt;1. Prepare the Environment
&lt;/h3&gt;&lt;p&gt;The basic requirements listed in the official README are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;CPU &amp;gt;= 4 cores&lt;/li&gt;
&lt;li&gt;RAM &amp;gt;= 16 GB&lt;/li&gt;
&lt;li&gt;Disk &amp;gt;= 50 GB&lt;/li&gt;
&lt;li&gt;Docker &amp;gt;= 24.0.0&lt;/li&gt;
&lt;li&gt;Docker Compose &amp;gt;= v2.26.1&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you want to use the sandbox for the code executor, you also need &lt;code&gt;gVisor&lt;/code&gt;. Another practical note is that the official Docker images mainly target x86 platforms. For ARM64, the project documentation recommends building the image yourself.&lt;/p&gt;
&lt;h3 id=&#34;2-clone-the-project&#34;&gt;2. Clone the Project
&lt;/h3&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;git clone https://github.com/infiniflow/ragflow.git
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nb&#34;&gt;cd&lt;/span&gt; ragflow/docker
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h3 id=&#34;3-check-vmmax_map_count&#34;&gt;3. Check &lt;code&gt;vm.max_map_count&lt;/code&gt;
&lt;/h3&gt;&lt;p&gt;RAGFlow deployment depends on components such as Elasticsearch or OpenSearch, so on Linux you usually need to check the current value of this kernel setting:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;sysctl vm.max_map_count
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;If the value is below &lt;code&gt;262144&lt;/code&gt;, you can set it temporarily:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;sudo sysctl -w vm.max_map_count&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;m&#34;&gt;262144&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;The &lt;code&gt;sysctl -w&lt;/code&gt; setting is lost on reboot. To make it persistent, add the line &lt;code&gt;vm.max_map_count=262144&lt;/code&gt; to &lt;code&gt;/etc/sysctl.conf&lt;/code&gt;.&lt;/p&gt;
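&lt;p&gt;As a minimal sketch of persisting the setting, assuming a distribution that reads &lt;code&gt;/etc/sysctl.conf&lt;/code&gt; and that the key is not already set in that file, the following appends the line and reloads the file:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;echo vm.max_map_count=262144 | sudo tee -a /etc/sysctl.conf
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;sudo sysctl -p
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;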
&lt;h3 id=&#34;4-start-with-docker-compose&#34;&gt;4. Start with Docker Compose
&lt;/h3&gt;&lt;p&gt;To start in CPU mode, run Docker Compose from the &lt;code&gt;docker&lt;/code&gt; directory:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;docker compose -f docker-compose.yml up -d
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;If you want GPU acceleration for DeepDoc tasks, the README shows adding &lt;code&gt;DEVICE=gpu&lt;/code&gt; to &lt;code&gt;.env&lt;/code&gt; before startup (the &lt;code&gt;sed&lt;/code&gt; command inserts it as the first line of the file):&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;sed -i &lt;span class=&#34;s1&#34;&gt;&amp;#39;1i DEVICE=gpu&amp;#39;&lt;/span&gt; .env
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;docker compose -f docker-compose.yml up -d
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
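&lt;/div&gt;&lt;p&gt;Then follow the logs of the RAGFlow container until startup completes (if the name below does not match, &lt;code&gt;docker ps&lt;/code&gt; lists the actual container names):&lt;/p&gt;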
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;docker logs -f docker-ragflow-cpu-1
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Once the services are ready, open the server address in your browser. Under the default configuration the Web UI listens on HTTP port 80, so no port number is needed:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;http://IP_OF_YOUR_MACHINE
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h3 id=&#34;5-configure-model-api-keys&#34;&gt;5. Configure Model API Keys
&lt;/h3&gt;&lt;p&gt;RAGFlow needs LLM and embedding model configuration. The README mentions choosing the default LLM factory in &lt;code&gt;service_conf.yaml.template&lt;/code&gt; and updating the corresponding &lt;code&gt;API_KEY&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;In practice, you need to configure models according to your provider:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Chat model&lt;/li&gt;
&lt;li&gt;Embedding model&lt;/li&gt;
&lt;li&gt;Rerank model&lt;/li&gt;
&lt;li&gt;Multimodal model, if you want RAGFlow to interpret images inside PDFs or DOCX files&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;6-create-the-knowledge-base-and-upload-documents&#34;&gt;6. Create the Knowledge Base and Upload Documents
&lt;/h3&gt;&lt;p&gt;After the service starts, the typical workflow is:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Log in to the Web UI.&lt;/li&gt;
&lt;li&gt;Create a dataset or knowledge base.&lt;/li&gt;
&lt;li&gt;Upload documents or configure a data source sync.&lt;/li&gt;
&lt;li&gt;Wait for parsing to finish.&lt;/li&gt;
&lt;li&gt;Inspect chunk results and adjust them when necessary.&lt;/li&gt;
&lt;li&gt;Create a chat assistant and attach the knowledge base.&lt;/li&gt;
&lt;li&gt;Test answer quality and citation sources.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;If you need to integrate with a business system, you can go on to use the RAGFlow API or SDK to connect retrieval and chat capabilities to your own application.&lt;/p&gt;
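&lt;p&gt;As a rough sketch of what an API call can look like, the command below lists datasets over HTTP. It assumes the &lt;code&gt;/api/v1/datasets&lt;/code&gt; endpoint and Bearer-token header described in the official API reference, which may change between versions. &lt;code&gt;RAGFLOW_HOST&lt;/code&gt; and &lt;code&gt;RAGFLOW_API_KEY&lt;/code&gt; are placeholder variable names for your server address and an API key created in the Web UI.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;# Assumed endpoint and placeholder variables; verify against the API reference for your RAGFlow version.
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;curl -s -H &amp;#34;Authorization: Bearer ${RAGFLOW_API_KEY}&amp;#34; &amp;#34;http://${RAGFLOW_HOST}/api/v1/datasets&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;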
&lt;h2 id=&#34;04-suitable-scenarios&#34;&gt;04 Suitable Scenarios
&lt;/h2&gt;&lt;p&gt;RAGFlow fits these kinds of needs:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Enterprise internal knowledge-base Q&amp;amp;A&lt;/li&gt;
&lt;li&gt;Product manuals, technical documentation, and FAQ retrieval&lt;/li&gt;
&lt;li&gt;Customer support and pre-sales assistants&lt;/li&gt;
&lt;li&gt;Traceable Q&amp;amp;A over contracts, reports, and policy documents&lt;/li&gt;
&lt;li&gt;Unified handling of multi-format materials&lt;/li&gt;
&lt;li&gt;Teams that want both UI-based maintenance and API integration&lt;/li&gt;
&lt;li&gt;Systems that want to use RAG as the context layer for agents&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It is especially suitable when document formats are complex, citations matter, and people want to inspect or intervene in parsing results.&lt;/p&gt;
&lt;h2 id=&#34;05-what-to-watch-out-for&#34;&gt;05 What to Watch Out For
&lt;/h2&gt;&lt;p&gt;First, RAGFlow is not a lightweight script. It has real infrastructure requirements: the README lists at least 4 CPU cores, 16 GB RAM, and 50 GB disk as prerequisites. If you only want Q&amp;amp;A over a small amount of Markdown, a full platform may be unnecessary.&lt;/p&gt;
&lt;p&gt;Second, document quality still matters. RAGFlow can improve parsing and chunking, but it cannot make low-quality, outdated, or contradictory source material reliable. Curate and govern the knowledge base before going to production.&lt;/p&gt;
&lt;p&gt;Third, model selection directly affects quality. Embedding, rerank, chat, and multimodal model choices all influence retrieval and answer quality. RAGFlow gives you the workflow, but the final result still depends on data, models, and tuning.&lt;/p&gt;
&lt;p&gt;Fourth, production deployments need careful attention to permissions and data security. Enterprise knowledge bases often contain internal documents, so deployment model, access control, logs, API keys, and model-provider data policy all need to be designed in advance.&lt;/p&gt;
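&lt;p&gt;For the first point, a quick preflight check on a Linux host might look like the sketch below. It assumes the standard &lt;code&gt;nproc&lt;/code&gt;, &lt;code&gt;free&lt;/code&gt;, and &lt;code&gt;df&lt;/code&gt; utilities are available, and it only prints values to compare by hand against the 4-core, 16 GB, 50 GB baseline.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;nproc      # number of available CPU cores
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;free -g    # total and available memory in GiB
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;df -h .    # free space on the filesystem holding the current directory
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;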
&lt;h2 id=&#34;06-quick-take&#34;&gt;06 Quick Take
&lt;/h2&gt;&lt;p&gt;RAGFlow’s strength is that it turns the hardest parts of RAG into platform capabilities: complex document parsing, explainable chunking, citation grounding, multi-path retrieval, reranking, model configuration, Web UI, API access, and agent extensions.&lt;/p&gt;
&lt;p&gt;If what you need is a verifiable, maintainable enterprise knowledge base that can connect to business systems, RAGFlow is more complete than a “vector database plus a simple chat UI” setup. On the other hand, if you only need small-scale personal Q&amp;amp;A over simple data, a lighter RAG framework may be more resource-efficient.&lt;/p&gt;
&lt;h2 id=&#34;related-links&#34;&gt;Related Links
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;GitHub project: &lt;a class=&#34;link&#34; href=&#34;https://github.com/infiniflow/ragflow&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://github.com/infiniflow/ragflow&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Official docs: &lt;a class=&#34;link&#34; href=&#34;https://ragflow.io/docs/dev/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://ragflow.io/docs/dev/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Online demo: &lt;a class=&#34;link&#34; href=&#34;https://cloud.ragflow.io&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://cloud.ragflow.io&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
        </item>
        
    </channel>
</rss>
