<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
        <title>Embedding on KnightLi Blog</title>
        <link>https://knightli.com/en/tags/embedding/</link>
        <description>Recent content in Embedding on KnightLi Blog</description>
        <generator>Hugo -- gohugo.io</generator>
        <language>en</language>
        <lastBuildDate>Mon, 04 May 2026 06:01:10 +0800</lastBuildDate><atom:link href="https://knightli.com/en/tags/embedding/index.xml" rel="self" type="application/rss+xml" /><item>
        <title>Gemini Embedding 2: Putting Text, Images, Video, and Audio in One Vector Space</title>
        <link>https://knightli.com/en/2026/05/04/gemini-embedding-2-multimodal-rag/</link>
        <pubDate>Mon, 04 May 2026 06:01:10 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/05/04/gemini-embedding-2-multimodal-rag/</guid>
        <description>&lt;p&gt;The Google Developers Blog has published a guide to building with Gemini Embedding 2. The model is now generally available through the Gemini API and Gemini Enterprise Agent Platform. What matters is not simply that it is a new embedding model, but that it maps text, images, video, audio, and documents into the same semantic space.&lt;/p&gt;
&lt;p&gt;This broadens what retrieval systems can handle. Many RAG pipelines previously had to convert images, video, or audio into text or metadata before indexing them separately. Gemini Embedding 2 can process multimodal inputs directly, making it easier for agents, search systems, and classifiers to work with real business materials.&lt;/p&gt;
&lt;p&gt;Original article: &lt;a class=&#34;link&#34; href=&#34;https://developers.googleblog.com/building-with-gemini-embedding-2/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Building with Gemini Embedding 2: Agentic multimodal RAG and beyond&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&#34;model-capabilities&#34;&gt;Model Capabilities
&lt;/h2&gt;&lt;p&gt;Gemini Embedding 2 supports more than 100 languages. A single request can process:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Up to 8,192 text tokens&lt;/li&gt;
&lt;li&gt;Up to 6 images&lt;/li&gt;
&lt;li&gt;Up to 120 seconds of video&lt;/li&gt;
&lt;li&gt;Up to 180 seconds of audio&lt;/li&gt;
&lt;li&gt;Up to 6 pages of PDF&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Its key idea is a unified semantic space. Developers can place content from different modalities into one vector representation system, then use the same retrieval, clustering, or reranking logic to process it.&lt;/p&gt;
&lt;p&gt;For example, a text description and an image can be included in the same embedding request:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;kn&#34;&gt;from&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;google&lt;/span&gt; &lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;genai&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;kn&#34;&gt;from&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;google.genai&lt;/span&gt; &lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;types&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;client&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;genai&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;Client&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;()&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;k&#34;&gt;with&lt;/span&gt; &lt;span class=&#34;nb&#34;&gt;open&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;dog.png&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;s1&#34;&gt;&amp;#39;rb&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;as&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;f&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;image_bytes&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;f&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;read&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;()&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;result&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;client&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;models&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;embed_content&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;model&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;gemini-embedding-2&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;contents&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;[&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;        &lt;span class=&#34;s2&#34;&gt;&amp;#34;An image of a dog&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;        &lt;span class=&#34;n&#34;&gt;types&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;Part&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;from_bytes&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;            &lt;span class=&#34;n&#34;&gt;data&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;image_bytes&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;            &lt;span class=&#34;n&#34;&gt;mime_type&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;image/png&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;        &lt;span class=&#34;p&#34;&gt;),&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;p&#34;&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nb&#34;&gt;print&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;result&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;embeddings&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;If you want a separate embedding for each input rather than one aggregated vector, you can use the Batch API. The original article also notes that Agent Platform support for this kind of batch workflow is still in progress.&lt;/p&gt;
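&lt;p&gt;Because each request is capped, it can help to validate inputs before calling the API. The following is a minimal pre-flight sketch built only from the limits listed above. &lt;code&gt;RequestPlan&lt;/code&gt; and &lt;code&gt;limit_violations&lt;/code&gt; are hypothetical names, not part of the Gemini SDK, and counting tokens, pages, or seconds is assumed to happen elsewhere.&lt;/p&gt;

```python
from dataclasses import dataclass

# Per-request limits as listed in this article; verify against the current API docs.
MAX_TEXT_TOKENS = 8192
MAX_IMAGES = 6
MAX_VIDEO_SECONDS = 120
MAX_AUDIO_SECONDS = 180
MAX_PDF_PAGES = 6


@dataclass
class RequestPlan:
    """Hypothetical summary of what one embedding request would contain."""
    text_tokens: int = 0
    images: int = 0
    video_seconds: float = 0.0
    audio_seconds: float = 0.0
    pdf_pages: int = 0


def limit_violations(plan: RequestPlan) -> list[str]:
    """Return a human-readable message for every limit the plan exceeds."""
    checks = [
        ("text tokens", plan.text_tokens, MAX_TEXT_TOKENS),
        ("images", plan.images, MAX_IMAGES),
        ("video seconds", plan.video_seconds, MAX_VIDEO_SECONDS),
        ("audio seconds", plan.audio_seconds, MAX_AUDIO_SECONDS),
        ("PDF pages", plan.pdf_pages, MAX_PDF_PAGES),
    ]
    return [
        f"{name}: {value} exceeds limit {limit}"
        for name, value, limit in checks
        if value > limit
    ]


# Example: seven images and a ten-page PDF both exceed the listed limits.
for message in limit_violations(RequestPlan(images=7, pdf_pages=10)):
    print(message)
```

&lt;p&gt;An empty result means the plan fits in one request. Otherwise the content probably needs to be split into several requests or chunked before embedding.&lt;/p&gt;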
&lt;h2 id=&#34;what-it-means-for-rag&#34;&gt;What It Means for RAG
&lt;/h2&gt;&lt;p&gt;Multimodal embeddings are useful for agentic RAG. An AI agent may need to inspect a code repository, PDFs, screenshots, charts, audio meeting notes, and product images at the same time. If all of these materials can enter the same semantic space, the retrieval pipeline no longer needs a separate entry point for every format.&lt;/p&gt;
&lt;p&gt;Google recommends using task prefixes according to the goal of the task, so the embeddings better match the retrieval objective. For example, question answering, fact checking, code retrieval, and search results can use different prefixes:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# Generate embedding for your task&amp;#39;s query:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;k&#34;&gt;def&lt;/span&gt; &lt;span class=&#34;nf&#34;&gt;prepare_query&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;content&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;):&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;k&#34;&gt;return&lt;/span&gt; &lt;span class=&#34;sa&#34;&gt;f&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;task: question answering | query: &lt;/span&gt;&lt;span class=&#34;si&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;content&lt;/span&gt;&lt;span class=&#34;si&#34;&gt;}&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;c1&#34;&gt;# return f&amp;#34;task: fact checking | query: {content}&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;c1&#34;&gt;# return f&amp;#34;task: code retrieval | query: {content}&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;c1&#34;&gt;# return f&amp;#34;task: search result | query: {content}&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# Generate embedding for document of an asymmetric retrieval task:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;k&#34;&gt;def&lt;/span&gt; &lt;span class=&#34;nf&#34;&gt;prepare_document&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;content&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;title&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;kc&#34;&gt;None&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;):&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;k&#34;&gt;if&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;title&lt;/span&gt; &lt;span class=&#34;ow&#34;&gt;is&lt;/span&gt; &lt;span class=&#34;kc&#34;&gt;None&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;        &lt;span class=&#34;n&#34;&gt;title&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;none&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;k&#34;&gt;return&lt;/span&gt; &lt;span class=&#34;sa&#34;&gt;f&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;title: &lt;/span&gt;&lt;span class=&#34;si&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;title&lt;/span&gt;&lt;span class=&#34;si&#34;&gt;}&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt; | text: &lt;/span&gt;&lt;span class=&#34;si&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;content&lt;/span&gt;&lt;span class=&#34;si&#34;&gt;}&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;This kind of prefix is suitable for asymmetric retrieval: user queries are often short, while documents are often long. Formatting &lt;code&gt;query&lt;/code&gt; and &lt;code&gt;document&lt;/code&gt; differently for the task can improve matching between short queries and long documents.&lt;/p&gt;
&lt;p&gt;The original article gives two real-world examples:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Harvey saw a 3% improvement in Recall@20 on legal retrieval benchmarks compared with its previous embeddings.&lt;/li&gt;
&lt;li&gt;Supermemory saw a 40% increase in Recall@1 search accuracy and uses it across memory, indexing, search, and Q&amp;amp;A pipelines.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These numbers do not mean every scenario will improve by the same amount, but they show that multimodal embeddings are already producing results in real retrieval products, not only demos.&lt;/p&gt;
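&lt;p&gt;To see whether a new embedding model helps on your own data, you can compute the same kind of metric over a small labeled query set. The sketch below uses one common definition of Recall@k (the share of relevant items found in the top k results). The benchmarks cited above may define it differently, and the document IDs are made up for illustration.&lt;/p&gt;

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant items that appear in the top-k ranked results."""
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("relevant_ids must not be empty")
    hits = relevant.intersection(ranked_ids[:k])
    return len(hits) / len(relevant)


def mean_recall_at_k(runs, k):
    """Average recall@k over (ranked_ids, relevant_ids) pairs, one pair per query."""
    return sum(recall_at_k(ranked, relevant, k) for ranked, relevant in runs) / len(runs)


# Hypothetical evaluation data: retrieval results for two test queries.
runs = [
    (["d3", "d1", "d7", "d2"], {"d1", "d2"}),
    (["d5", "d9", "d4", "d8"], {"d4"}),
]
print("Recall@1:", mean_recall_at_k(runs, k=1))
print("Recall@3:", mean_recall_at_k(runs, k=3))
```

&lt;p&gt;Running the same query set against the old and new index gives a like-for-like comparison before committing to a migration.&lt;/p&gt;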
&lt;h2 id=&#34;visual-search&#34;&gt;Visual Search
&lt;/h2&gt;&lt;p&gt;Gemini Embedding 2 is also suitable for image-to-image search, image-text hybrid search, and product identification. The original article mentions Nuuly, URBN&amp;rsquo;s clothing rental company, using it to match photos of untagged garments in warehouses against its catalog. Match@20 improved from 60% to nearly 87%, and the overall successful identification rate rose from 74% to over 90%.&lt;/p&gt;
&lt;p&gt;The point in this type of scenario is not content generation, but understanding which inventory item, document, or product record is closest to a given image. If your business has many images, video clips, or scanned documents, multimodal embeddings can be more natural than text-only indexing.&lt;/p&gt;
&lt;h2 id=&#34;search-reranking&#34;&gt;Search Reranking
&lt;/h2&gt;&lt;p&gt;Embeddings can also be used for reranking. A common approach is to first retrieve a set of candidate results, then calculate the similarity between each candidate and the user&amp;rsquo;s query, pushing more relevant content to the top:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;numpy&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;as&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;np&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# 1. Define a dot product; it equals cosine similarity only for unit-length vectors&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;k&#34;&gt;def&lt;/span&gt; &lt;span class=&#34;nf&#34;&gt;dot_product&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;a&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;np&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ndarray&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;b&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;np&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ndarray&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;):&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;k&#34;&gt;return&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;np&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;array&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;a&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;@&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;np&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;array&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;b&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;T&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# 2. Retrieve your embeddings&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# (Assuming &amp;#39;summaries&amp;#39; is your list of search results)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;search_res&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;get_embeddings&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;summaries&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;embedded_query&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;get_embeddings&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;([&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;query&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;])&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# 3. Calculate similarity scores&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;sim_value&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;dot_product&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;search_res&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;embedded_query&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# 4. Select the most relevant result&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;best_match_index&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;np&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;argmax&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;sim_value&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;The original article also mentions another idea: first ask the model to generate a baseline hypothetical answer from its internal knowledge, embed that answer, and compare it with candidate content to find the most semantically relevant result. This is especially useful for Q&amp;amp;A-style RAG.&lt;/p&gt;
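&lt;p&gt;The hypothetical-answer idea can be sketched as follows. This is only an illustration under stated assumptions: the vectors are toy 3-dimensional stand-ins for real embeddings, &lt;code&gt;rerank_by_hypothetical_answer&lt;/code&gt; is a hypothetical helper, and generating and embedding the hypothetical answer is assumed to happen before this step.&lt;/p&gt;

```python
import numpy as np


def rerank_by_hypothetical_answer(hypothetical_vec, candidate_vecs):
    """Rank candidates from most to least similar to the hypothetical answer."""
    h = np.asarray(hypothetical_vec, dtype=float)
    c = np.asarray(candidate_vecs, dtype=float)
    # Normalize so that the dot product equals cosine similarity.
    h = h / np.linalg.norm(h)
    c = c / np.linalg.norm(c, axis=1, keepdims=True)
    scores = c @ h
    # argsort on the negated scores yields descending order of similarity.
    return np.argsort(-scores), scores


# Toy vectors standing in for real embeddings (not model output).
hypothetical = [0.9, 0.1, 0.0]
candidates = [
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.5, 0.5, 0.0],
]
order, scores = rerank_by_hypothetical_answer(hypothetical, candidates)
print(order.tolist())
```

&lt;p&gt;The design choice here is to compare answer-shaped text with answer-shaped text, which may match candidates better than comparing them with a short question.&lt;/p&gt;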
&lt;h2 id=&#34;clustering-classification-and-anomaly-detection&#34;&gt;Clustering, Classification, and Anomaly Detection
&lt;/h2&gt;&lt;p&gt;Beyond retrieval, embeddings are also useful for clustering, classification, and anomaly detection. Unlike the asymmetric question-answering retrieval above, these are symmetric tasks, where the same task prefix can be used for both query and document:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# Generate embedding for query &amp;amp; document of your task.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;k&#34;&gt;def&lt;/span&gt; &lt;span class=&#34;nf&#34;&gt;prepare_query_and_document&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;content&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;):&lt;/span&gt;
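&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;k&#34;&gt;return&lt;/span&gt; &lt;span class=&#34;sa&#34;&gt;f&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;task: clustering | query: &lt;/span&gt;&lt;span class=&#34;si&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;content&lt;/span&gt;&lt;span class=&#34;si&#34;&gt;}&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;&lt;/span&gt;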
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;c1&#34;&gt;# return f&amp;#39;task: sentence similarity | query: {content}&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;c1&#34;&gt;# return f&amp;#39;task: classification | query: {content}&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;These tasks can be used for sentiment classification, content moderation, similar asset grouping, and anomaly discovery. They can also help agents organize large amounts of context before moving into later reasoning steps.&lt;/p&gt;
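&lt;p&gt;As one illustration of anomaly discovery, the sketch below flags embeddings that sit unusually far from the centroid of a group. It is a simple heuristic rather than a method from the original article: &lt;code&gt;find_outliers&lt;/code&gt; and the z-score threshold are assumptions, and the 2-dimensional vectors are toy stand-ins for real embeddings.&lt;/p&gt;

```python
import numpy as np


def find_outliers(vectors, z_threshold=2.0):
    """Flag vectors whose cosine distance to the group centroid is unusually large."""
    x = np.asarray(vectors, dtype=float)
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    centroid = x.mean(axis=0)
    centroid = centroid / np.linalg.norm(centroid)
    distances = 1.0 - x @ centroid
    # z-score of each distance; the epsilon guards against a zero standard deviation.
    z = (distances - distances.mean()) / (distances.std() + 1e-12)
    return np.flatnonzero(z > z_threshold)


# Nine near-identical items and one item pointing in a different direction.
vectors = [[1.0, 0.0]] * 9 + [[0.0, 1.0]]
print(find_outliers(vectors).tolist())
```

&lt;p&gt;On real data the threshold usually needs tuning, and a single centroid only makes sense when the group is expected to be homogeneous.&lt;/p&gt;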
&lt;h2 id=&#34;storage-and-cost&#34;&gt;Storage and Cost
&lt;/h2&gt;&lt;p&gt;Gemini Embedding 2 outputs 3,072-dimensional vectors by default. It uses Matryoshka Representation Learning, so vectors can be truncated to smaller dimensions with &lt;code&gt;output_dimensionality&lt;/code&gt;. Google recommends 1,536 or 768 dimensions when efficiency is the priority:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;result&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;client&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;models&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;embed_content&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;model&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;gemini-embedding-2&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;contents&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;What is the meaning of life?&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;config&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;output_dimensionality&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;mi&#34;&gt;768&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Vectors can be stored in Agent Platform Vector Search, Pinecone, Weaviate, Qdrant, ChromaDB, and similar systems. On cost, the original article notes that the Batch API provides higher throughput at 50% of the default embedding price.&lt;/p&gt;
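&lt;p&gt;To make the storage trade-off concrete, the sketch below truncates a vector client-side and estimates raw storage at different dimensions. It assumes float32 storage and that keeping the leading components of a stored full-size vector is acceptable, which is the premise of Matryoshka-style training but worth verifying for your workload. The input is random data, not model output, and both function names are hypothetical.&lt;/p&gt;

```python
import numpy as np


def truncate_and_normalize(vector, dims):
    """Keep the first `dims` components and rescale the result to unit length."""
    v = np.asarray(vector, dtype=float)[:dims]
    return v / np.linalg.norm(v)


def storage_bytes(num_vectors, dims, bytes_per_value=4):
    """Raw size of float32 vectors, ignoring index and metadata overhead."""
    return num_vectors * dims * bytes_per_value


# Random stand-in for a 3,072-dimensional embedding.
rng = np.random.default_rng(0)
full = rng.normal(size=3072)
small = truncate_and_normalize(full, 768)
print(small.shape, round(float(np.linalg.norm(small)), 6))

# One million vectors at the default size versus the truncated size.
print(storage_bytes(1_000_000, 3072) / 1e9, "GB at 3,072 dims")
print(storage_bytes(1_000_000, 768) / 1e9, "GB at 768 dims")
```

&lt;p&gt;A truncated vector is generally no longer unit length, so re-normalizing keeps dot-product scores comparable to cosine similarity.&lt;/p&gt;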
&lt;h2 id=&#34;how-developers-can-use-it&#34;&gt;How Developers Can Use It
&lt;/h2&gt;&lt;p&gt;If you already have text-based RAG, you can start with two incremental upgrades:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Put PDFs, screenshots, image descriptions, and text documents into the same index, then test whether retrieval recall becomes more stable.&lt;/li&gt;
&lt;li&gt;Add task prefixes for different tasks, such as question answering, fact checking, code retrieval, and product search. Do not process all content with the same embedding format.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;If you are building a new product, consider these directions first:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Enterprise knowledge bases: retrieve documents, charts, presentation screenshots, and meeting materials together.&lt;/li&gt;
&lt;li&gt;Visual search: use images, text, or mixed inputs to find products, assets, design drafts, and archives.&lt;/li&gt;
&lt;li&gt;Agent toolchains: let coding agents, research agents, or customer support agents retrieve business materials in multiple formats.&lt;/li&gt;
&lt;li&gt;Content governance: classify, cluster, and detect anomalies across text, images, and video clips.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The value of Gemini Embedding 2 is that it turns multimodal materials into one searchable asset system. For developers, this reduces the need for an intermediate &amp;ldquo;convert to text, then retrieve&amp;rdquo; layer and makes RAG systems closer to the shape of real-world data.&lt;/p&gt;
</description>
        </item>
        <item>
        <title>How to Choose Common Embedding Models: OpenAI vs BGE vs E5 vs GTE vs Jina</title>
        <link>https://knightli.com/en/2026/04/23/compare-openai-bge-e5-gte-jina-embedding-models/</link>
        <pubDate>Thu, 23 Apr 2026 15:23:47 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/04/23/compare-openai-bge-e5-gte-jina-embedding-models/</guid>
        <description>&lt;p&gt;When people start building RAG systems, semantic search, or knowledge base retrieval, they often get stuck on the same question: there are so many embedding models, so which one should you choose?&lt;/p&gt;
&lt;p&gt;Common options can roughly be split into two groups. One group is general-purpose text embeddings that cover Chinese, English, and multilingual tasks. The other group is better suited to Chinese scenarios, especially Chinese retrieval, Chinese QA, and Chinese knowledge bases.&lt;/p&gt;
&lt;p&gt;If you want the short version first, this is a practical way to think about it:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;If you want the easiest path and prefer using an API directly: &lt;code&gt;text-embedding-3-small&lt;/code&gt; or &lt;code&gt;text-embedding-3-large&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;If you want Chinese retrieval and prefer open-source models you can self-host: &lt;code&gt;bge-base-zh-v1.5&lt;/code&gt;, &lt;code&gt;bge-m3&lt;/code&gt;, &lt;code&gt;gte-large-zh&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;If you need multilingual support: &lt;code&gt;multilingual-e5-base&lt;/code&gt;, &lt;code&gt;multilingual-e5-large&lt;/code&gt;, &lt;code&gt;jina-embeddings-v3&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;If you want to keep costs down in Chinese scenarios: &lt;code&gt;bge-small-zh-v1.5&lt;/code&gt;, &lt;code&gt;gte-base-zh&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
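&lt;p&gt;The shortlist above can be kept as a small lookup table, for example in an evaluation script that loops over candidate models. This is only a sketch: the dictionary keys are hypothetical labels, and the model names are copied from the bullets above.&lt;/p&gt;

```python
# Shortlists copied from the bullets above; the keys are hypothetical labels.
SHORTLISTS = {
    "hosted_api": ["text-embedding-3-small", "text-embedding-3-large"],
    "chinese_self_hosted": ["bge-base-zh-v1.5", "bge-m3", "gte-large-zh"],
    "multilingual": ["multilingual-e5-base", "multilingual-e5-large", "jina-embeddings-v3"],
    "chinese_low_cost": ["bge-small-zh-v1.5", "gte-base-zh"],
}


def shortlist(priority):
    """Return candidate models for a priority, or raise if the label is unknown."""
    try:
        return SHORTLISTS[priority]
    except KeyError:
        raise ValueError(f"unknown priority: {priority!r}") from None


# Example: list the candidates to benchmark for a multilingual corpus.
for model_name in shortlist("multilingual"):
    print(model_name)
```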
&lt;h2 id=&#34;1-first-look-at-them-by-category&#34;&gt;1. First, Look at Them by Category
&lt;/h2&gt;&lt;h3 id=&#34;1-openai-series&#34;&gt;1. OpenAI Series
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;&lt;code&gt;text-embedding-3-small&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;text-embedding-3-large&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The main strengths of these models are simplicity and stability. They are a good fit if you want to call an API directly for retrieval, RAG, classification, and similarity matching. Their advantage is not that they dominate one specific Chinese leaderboard, but that the overall experience is complete: low integration cost, stable quality, and low engineering overhead.&lt;/p&gt;
&lt;p&gt;If your team does not want to host models or maintain inference services, OpenAI usually saves the most time.&lt;/p&gt;
&lt;h3 id=&#34;2-bge-series&#34;&gt;2. BGE Series
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;&lt;code&gt;BAAI/bge-small-zh-v1.5&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;BAAI/bge-base-zh-v1.5&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;BAAI/bge-m3&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;BGE is one of the most common families used in Chinese retrieval. &lt;code&gt;bge-small-zh-v1.5&lt;/code&gt; and &lt;code&gt;bge-base-zh-v1.5&lt;/code&gt; lean more toward Chinese monolingual tasks, making them suitable for Chinese semantic search, knowledge base retrieval, and FAQ matching. &lt;code&gt;bge-m3&lt;/code&gt; is more general-purpose and can cover multilingual, multi-granularity, and more complex retrieval scenarios.&lt;/p&gt;
&lt;p&gt;If most of your data is Chinese text, BGE is often one of the easiest families to put on the shortlist.&lt;/p&gt;
&lt;h3 id=&#34;3-e5-series&#34;&gt;3. E5 Series
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;&lt;code&gt;intfloat/multilingual-e5-base&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;intfloat/multilingual-e5-large&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The strength of the E5 family is more balanced multilingual capability. It works well for mixed Chinese-English data, cross-lingual retrieval, and internationalized content libraries. It is not focused only on Chinese. Instead, it is built around the idea that different languages can live inside one unified retrieval system.&lt;/p&gt;
&lt;p&gt;If your corpus is not purely Chinese, but a mix of Chinese, English, Japanese, or even more languages, E5 is usually more reliable than a Chinese-only model.&lt;/p&gt;
&lt;h3 id=&#34;4-gte-series&#34;&gt;4. GTE Series
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;&lt;code&gt;thenlper/gte-base-zh&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;thenlper/gte-large-zh&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;GTE is also common in Chinese tasks. Its positioning is similar to BGE: both are practical choices for Chinese retrieval. GTE is usually seen as balanced and easy to use, without much complexity in deployment. It works well for Chinese knowledge bases, site search, and enterprise internal document retrieval.&lt;/p&gt;
&lt;p&gt;If you want one more open-source Chinese model family for side-by-side evaluation, GTE is well worth testing.&lt;/p&gt;
&lt;h3 id=&#34;5-jina-embeddings&#34;&gt;5. Jina Embeddings
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;&lt;code&gt;jina-embeddings-v3&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Jina targets general-purpose use and often appears in multilingual retrieval, long-text processing, and web content tasks. It comes up frequently in discussions about covering more task types with a single model, so it is a good fit for teams that want one unified embedding layer.&lt;/p&gt;
&lt;p&gt;If your content sources are mixed, such as webpages, documents, and multilingual text, Jina is often a strong candidate to test.&lt;/p&gt;
&lt;h2 id=&#34;2-which-models-are-most-common-in-chinese-scenarios&#34;&gt;2. Which Models Are Most Common in Chinese Scenarios
&lt;/h2&gt;&lt;p&gt;If we narrow the scope to Chinese use cases, the usual candidates are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;bge-small-zh-v1.5&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;bge-base-zh-v1.5&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;bge-m3&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;gte-base-zh&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;gte-large-zh&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;multilingual-e5-base&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;multilingual-e5-large&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The useful question is not &amp;ldquo;which one is absolutely better,&amp;rdquo; but these three:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Is your data primarily Chinese?&lt;/li&gt;
&lt;li&gt;Do you need multilingual support?&lt;/li&gt;
&lt;li&gt;Do you care more about quality, cost, or deployment convenience?&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id=&#34;3-put-these-models-side-by-side&#34;&gt;3. Put These Models Side by Side
&lt;/h2&gt;&lt;h3 id=&#34;1-if-you-only-care-about-chinese-performance&#34;&gt;1. If You Only Care About Chinese Performance
&lt;/h3&gt;&lt;p&gt;For pure Chinese knowledge bases, Chinese QA, and Chinese document retrieval, BGE and GTE are usually the first families to check.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;bge-small-zh-v1.5&lt;/code&gt;: lighter and better for cost-sensitive scenarios&lt;/li&gt;
&lt;li&gt;&lt;code&gt;bge-base-zh-v1.5&lt;/code&gt;: usually one of the most balanced options for Chinese use cases&lt;/li&gt;
&lt;li&gt;&lt;code&gt;gte-base-zh&lt;/code&gt;: similar to lightweight BGE and good for building a baseline first&lt;/li&gt;
&lt;li&gt;&lt;code&gt;gte-large-zh&lt;/code&gt;: better when retrieval quality matters more&lt;/li&gt;
&lt;li&gt;&lt;code&gt;bge-m3&lt;/code&gt;: suitable if you want to evaluate Chinese retrieval together with broader capabilities&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If your corpus is almost entirely Chinese, E5 can still work, but it is usually not the first choice.&lt;/p&gt;
&lt;h3 id=&#34;2-if-you-need-multilingual-support&#34;&gt;2. If You Need Multilingual Support
&lt;/h3&gt;&lt;p&gt;The priorities change quite a bit here.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;multilingual-e5-base&lt;/code&gt; and &lt;code&gt;multilingual-e5-large&lt;/code&gt; are better suited to unified multilingual retrieval&lt;/li&gt;
&lt;li&gt;&lt;code&gt;jina-embeddings-v3&lt;/code&gt; also fits multilingual and general text tasks&lt;/li&gt;
&lt;li&gt;&lt;code&gt;bge-m3&lt;/code&gt; is better than traditional Chinese-only models when you want to expand into multilingual usage&lt;/li&gt;
&lt;li&gt;&lt;code&gt;text-embedding-3-small&lt;/code&gt; and &lt;code&gt;text-embedding-3-large&lt;/code&gt; are good if you want the simplest API-based route&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If your corpus contains Chinese, English, product documentation, website copy, and user questions at the same time, multilingual models can save you a lot of future migration work.&lt;/p&gt;
&lt;h3 id=&#34;3-if-you-need-to-control-inference-and-storage-cost&#34;&gt;3. If You Need to Control Inference and Storage Cost
&lt;/h3&gt;&lt;p&gt;Lightweight models have the advantage here.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;bge-small-zh-v1.5&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;gte-base-zh&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;multilingual-e5-base&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;text-embedding-3-small&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These models are usually a better fit when:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;You have a large document volume&lt;/li&gt;
&lt;li&gt;Data is updated frequently&lt;/li&gt;
&lt;li&gt;You need batch vectorization&lt;/li&gt;
&lt;li&gt;You are sensitive to latency and cost&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If your dataset is large, embedding dimensions, inference speed, and index size will all directly affect total cost. That is why starting with a smaller model as a baseline is often the safer choice.&lt;/p&gt;
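&lt;p&gt;To make the storage side of that cost concrete, here is a minimal sketch that estimates raw vector size as vectors × dimensions × bytes per value. The corpus size and the 512 and 1024 dimension values are illustrative assumptions, not figures from any model card, and the estimate ignores index overhead and metadata.&lt;/p&gt;

```python
# Rough raw-vector storage estimate: vectors * dimensions * bytes per value.
# The dimension values used below are illustrative assumptions; check each
# model card for the real output dimension before budgeting.

BYTES_PER_FLOAT32 = 4


def raw_index_bytes(num_vectors, dimensions, bytes_per_value=BYTES_PER_FLOAT32):
    """Size of the raw vectors only (no index overhead, no metadata)."""
    return num_vectors * dimensions * bytes_per_value


def to_gib(num_bytes):
    """Convert a byte count to GiB."""
    return num_bytes / (1024 ** 3)


# Hypothetical corpus of 10 million chunks.
num_chunks = 10_000_000
for label, dims in [("smaller model, 512 dims", 512), ("larger model, 1024 dims", 1024)]:
    size = to_gib(raw_index_bytes(num_chunks, dims))
    print(f"{label}: about {size:.1f} GiB of raw float32 vectors")
```

&lt;p&gt;Doubling the dimension doubles the raw storage, which is one reason a smaller baseline is cheaper to iterate on.&lt;/p&gt;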
&lt;h3 id=&#34;4-if-you-want-the-highest-ceiling-first&#34;&gt;4. If You Want the Highest Ceiling First
&lt;/h3&gt;&lt;p&gt;Larger models are usually better suited to complex retrieval or higher-quality recall, for example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;text-embedding-3-large&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;multilingual-e5-large&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;gte-large-zh&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;bge-base-zh-v1.5&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;bge-m3&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Keep in mind that a larger model does not automatically lead to a better production experience. In many projects, the real bottleneck is not the model itself, but chunking strategy, recall count, reranking, data cleaning, and evaluation design.&lt;/p&gt;
&lt;h2 id=&#34;4-what-each-model-is-better-at&#34;&gt;4. What Each Model Is Better At
&lt;/h2&gt;&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;Model&lt;/th&gt;
          &lt;th&gt;Better suited for&lt;/th&gt;
          &lt;th&gt;Quick judgment&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;text-embedding-3-small&lt;/code&gt;&lt;/td&gt;
          &lt;td&gt;General retrieval, RAG, fast integration&lt;/td&gt;
          &lt;td&gt;Simple API usage and cost-friendly&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;text-embedding-3-large&lt;/code&gt;&lt;/td&gt;
          &lt;td&gt;General retrieval where quality matters more&lt;/td&gt;
          &lt;td&gt;Quality-first and lowest engineering burden&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;bge-small-zh-v1.5&lt;/code&gt;&lt;/td&gt;
          &lt;td&gt;Lightweight Chinese retrieval&lt;/td&gt;
          &lt;td&gt;A common entry-level Chinese option&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;bge-base-zh-v1.5&lt;/code&gt;&lt;/td&gt;
          &lt;td&gt;Chinese knowledge bases, FAQ, semantic search&lt;/td&gt;
          &lt;td&gt;Very balanced in Chinese scenarios&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;bge-m3&lt;/code&gt;&lt;/td&gt;
          &lt;td&gt;Chinese-focused setups that also need more complex retrieval&lt;/td&gt;
          &lt;td&gt;More extensible&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;multilingual-e5-base&lt;/code&gt;&lt;/td&gt;
          &lt;td&gt;Foundational multilingual retrieval&lt;/td&gt;
          &lt;td&gt;Common in international products&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;multilingual-e5-large&lt;/code&gt;&lt;/td&gt;
          &lt;td&gt;High-quality multilingual recall&lt;/td&gt;
          &lt;td&gt;More quality-oriented&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;gte-base-zh&lt;/code&gt;&lt;/td&gt;
          &lt;td&gt;Lightweight Chinese retrieval&lt;/td&gt;
          &lt;td&gt;Good for building a baseline&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;gte-large-zh&lt;/code&gt;&lt;/td&gt;
          &lt;td&gt;Chinese scenarios that prioritize quality&lt;/td&gt;
          &lt;td&gt;A good comparison point against BGE&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;jina-embeddings-v3&lt;/code&gt;&lt;/td&gt;
          &lt;td&gt;Multilingual, web, and general text tasks&lt;/td&gt;
          &lt;td&gt;Worth testing if you want one unified embedding layer&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id=&#34;5-a-practical-way-to-make-the-choice&#34;&gt;5. A Practical Way to Make the Choice
&lt;/h2&gt;&lt;p&gt;If you are trying to ship a system rather than write a paper, you can keep the decision process simple.&lt;/p&gt;
&lt;h3 id=&#34;scenario-1-chinese-knowledge-base&#34;&gt;Scenario 1: Chinese Knowledge Base
&lt;/h3&gt;&lt;p&gt;Start with these:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;bge-base-zh-v1.5&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;gte-large-zh&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;bge-small-zh-v1.5&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If budget is tight, start with the smaller model. If retrieval quality matters more, move up to the larger models.&lt;/p&gt;
&lt;h3 id=&#34;scenario-2-mixed-chinese-english-knowledge-base&#34;&gt;Scenario 2: Mixed Chinese-English Knowledge Base
&lt;/h3&gt;&lt;p&gt;Start with:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;multilingual-e5-base&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;multilingual-e5-large&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;text-embedding-3-small&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;text-embedding-3-large&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you do not want to self-host, OpenAI is the more direct option. If you want to host the model yourself, E5 is the more common path.&lt;/p&gt;
&lt;h3 id=&#34;scenario-3-mostly-chinese-now-but-possibly-multilingual-later&#34;&gt;Scenario 3: Mostly Chinese Now, but Possibly Multilingual Later
&lt;/h3&gt;&lt;p&gt;Start with:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;bge-m3&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;multilingual-e5-base&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;jina-embeddings-v3&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The biggest risk in this kind of setup is optimizing only for Chinese at the beginning and then having to rebuild the whole vector database later.&lt;/p&gt;
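&lt;p&gt;The three scenarios above can be written down as a small lookup, which is handy if the shortlist feeds an evaluation script. This is only a sketch: the scenario keys, the &lt;code&gt;shortlist&lt;/code&gt; function, and the &lt;code&gt;self_host&lt;/code&gt; flag are hypothetical names, and the model lists are copied from the scenarios as written.&lt;/p&gt;

```python
# Shortlists copied from the three scenarios; the dictionary keys are
# hypothetical names chosen for this sketch.
SHORTLISTS = {
    "chinese_kb": ["bge-base-zh-v1.5", "gte-large-zh", "bge-small-zh-v1.5"],
    "mixed_zh_en": [
        "multilingual-e5-base",
        "multilingual-e5-large",
        "text-embedding-3-small",
        "text-embedding-3-large",
    ],
    "chinese_now_multilingual_later": [
        "bge-m3",
        "multilingual-e5-base",
        "jina-embeddings-v3",
    ],
}

# Models treated as API-only in this sketch.
API_MODELS = {"text-embedding-3-small", "text-embedding-3-large"}


def shortlist(scenario, self_host=None):
    """Return candidates for a scenario.

    self_host=True drops the API-only models, self_host=False keeps only
    them, and None returns the full list.
    """
    candidates = SHORTLISTS[scenario]
    if self_host is True:
        return [m for m in candidates if m not in API_MODELS]
    if self_host is False:
        return [m for m in candidates if m in API_MODELS]
    return list(candidates)


print(shortlist("mixed_zh_en", self_host=True))
```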
&lt;h2 id=&#34;6-in-the-end-the-key-is-not-top-of-the-leaderboard&#34;&gt;6. In the End, the Key Is Not &amp;ldquo;Top of the Leaderboard&amp;rdquo;
&lt;/h2&gt;&lt;p&gt;The easiest mistake to make in embedding model selection is to look only at public benchmark scores and then ship directly to production.&lt;/p&gt;
&lt;p&gt;A more reliable process is usually:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Pick 2 to 4 candidate models first&lt;/li&gt;
&lt;li&gt;Run embeddings on your own real data&lt;/li&gt;
&lt;li&gt;Evaluate one round of retrieval performance&lt;/li&gt;
&lt;li&gt;Then make the final decision based on cost, latency, and deployment style&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;In practice, what determines the result is often not the model name, but whether the model matches your corpus, chunking strategy, and query patterns.&lt;/p&gt;
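&lt;p&gt;The evaluation step in that process can be sketched as a recall@k check over a small labeled query set. The example below is a minimal illustration, not a full evaluation harness: the toy 2-dimensional vectors stand in for one candidate model's output, and the function names and the one-relevant-document-per-query labeling are assumptions made for the sketch.&lt;/p&gt;

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, doc_vecs, k):
    """Ids of the k documents most similar to the query."""
    ranked = sorted(
        doc_vecs,
        key=lambda doc_id: cosine(query_vec, doc_vecs[doc_id]),
        reverse=True,
    )
    return ranked[:k]


def recall_at_k(query_vecs, doc_vecs, relevant, k):
    """Fraction of queries whose labeled document appears in the top k."""
    hits = sum(
        1
        for qid, qvec in query_vecs.items()
        if relevant[qid] in top_k(qvec, doc_vecs, k)
    )
    return hits / len(query_vecs)


# Toy vectors standing in for embeddings from one candidate model.
docs = {"d1": [1.0, 0.0], "d2": [0.0, 1.0], "d3": [0.7, 0.7]}
queries = {"q1": [0.9, 0.1], "q2": [0.1, 0.9]}
labels = {"q1": "d1", "q2": "d3"}  # one relevant document per query

print(recall_at_k(queries, docs, labels, k=1))  # 0.5
print(recall_at_k(queries, docs, labels, k=2))  # 1.0
```

&lt;p&gt;Running the same labeled queries through each candidate model and comparing the scores gives a like-for-like number to weigh against cost and latency.&lt;/p&gt;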
&lt;h2 id=&#34;summary&#34;&gt;Summary
&lt;/h2&gt;&lt;p&gt;If you only want one practical summary to remember, use this:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Chinese-first: start with &lt;code&gt;bge-base-zh-v1.5&lt;/code&gt; and &lt;code&gt;gte-large-zh&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Cost-first: start with &lt;code&gt;bge-small-zh-v1.5&lt;/code&gt;, &lt;code&gt;gte-base-zh&lt;/code&gt;, and &lt;code&gt;text-embedding-3-small&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Multilingual-first: start with &lt;code&gt;multilingual-e5-base&lt;/code&gt;, &lt;code&gt;multilingual-e5-large&lt;/code&gt;, and &lt;code&gt;jina-embeddings-v3&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;API-first: start with &lt;code&gt;text-embedding-3-small&lt;/code&gt; and &lt;code&gt;text-embedding-3-large&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;If you want Chinese now and flexibility later: start with &lt;code&gt;bge-m3&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;There is no single model that fits every project, but for most projects, you can quickly narrow down the first batch of candidates from these few groups.&lt;/p&gt;
</description>
        </item>
        
    </channel>
</rss>
