<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
        <title>Gemma 4 on KnightLi Blog</title>
        <link>https://knightli.com/en/tags/gemma-4/</link>
        <description>Recent content in Gemma 4 on KnightLi Blog</description>
        <generator>Hugo -- gohugo.io</generator>
        <language>en</language>
        <lastBuildDate>Fri, 01 May 2026 11:42:34 +0800</lastBuildDate><atom:link href="https://knightli.com/en/tags/gemma-4/index.xml" rel="self" type="application/rss+xml" /><item>
        <title>Running Gemma 4 Locally: VRAM Requirements for E2B, E4B, 26B, and 31B Quantized Models</title>
        <link>https://knightli.com/en/2026/05/01/gemma-4-local-vram-quantization-table/</link>
        <pubDate>Fri, 01 May 2026 11:42:34 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/05/01/gemma-4-local-vram-quantization-table/</guid>
        <description>&lt;p&gt;Gemma 4 currently has four main sizes for local deployment: &lt;code&gt;E2B&lt;/code&gt;, &lt;code&gt;E4B&lt;/code&gt;, &lt;code&gt;26B A4B&lt;/code&gt;, and &lt;code&gt;31B&lt;/code&gt;.
&lt;code&gt;E2B&lt;/code&gt; and &lt;code&gt;E4B&lt;/code&gt; target lightweight and edge devices, &lt;code&gt;26B A4B&lt;/code&gt; uses an MoE architecture, and &lt;code&gt;31B&lt;/code&gt; is the larger dense model.&lt;/p&gt;
&lt;p&gt;The easiest mistake in local inference is mixing up two numbers:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;GGUF file size&lt;/strong&gt;: how large the model weight file is.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Actual VRAM usage&lt;/strong&gt;: affected by model weights, KV cache, runtime overhead, context length, and whether multimodal projection files are loaded.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The tables below estimate VRAM requirements based on GGUF file size.
The default assumption is local text inference with &lt;code&gt;llama.cpp&lt;/code&gt;, LM Studio, Ollama, or similar runtimes, using short to medium context.
If you need long context, image/audio input, or concurrent requests, leave more VRAM headroom.&lt;/p&gt;
&lt;h2 id=&#34;quick-summary&#34;&gt;Quick Summary
&lt;/h2&gt;&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;VRAM&lt;/th&gt;
          &lt;th&gt;Good Fit&lt;/th&gt;
          &lt;th&gt;Avoid&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;4GB&lt;/td&gt;
          &lt;td&gt;Low-bit E2B quantizations&lt;/td&gt;
          &lt;td&gt;E4B and above&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;6GB&lt;/td&gt;
          &lt;td&gt;E2B Q4/Q5, low-bit E4B&lt;/td&gt;
          &lt;td&gt;26B, 31B&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;8GB&lt;/td&gt;
          &lt;td&gt;E2B Q8, E4B Q4/Q5&lt;/td&gt;
          &lt;td&gt;26B Q4, 31B Q4&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;12GB&lt;/td&gt;
          &lt;td&gt;E4B Q8, low-quality 2-bit/3-bit 26B or 31B tests&lt;/td&gt;
          &lt;td&gt;26B Q4 with long context, 31B Q4&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;16GB&lt;/td&gt;
          &lt;td&gt;Low-bit 26B, low-bit 31B&lt;/td&gt;
          &lt;td&gt;31B Q4 with long context, 26B Q5 and above&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;24GB&lt;/td&gt;
          &lt;td&gt;26B Q4/Q5, 31B Q4&lt;/td&gt;
          &lt;td&gt;31B Q8, BF16&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;32GB&lt;/td&gt;
          &lt;td&gt;26B Q6/Q8, 31B Q5/Q6&lt;/td&gt;
          &lt;td&gt;BF16&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;48GB&lt;/td&gt;
          &lt;td&gt;31B Q8 more comfortably, 26B Q8 with longer context&lt;/td&gt;
          &lt;td&gt;31B BF16&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;80GB+&lt;/td&gt;
          &lt;td&gt;26B/31B BF16&lt;/td&gt;
          &lt;td&gt;Single consumer GPU deployment&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;If you just want something usable locally, start with &lt;code&gt;E4B Q4_K_M&lt;/code&gt; or &lt;code&gt;E2B Q4_K_M&lt;/code&gt;.
With 24GB VRAM, &lt;code&gt;26B A4B Q4_K_M&lt;/code&gt; and &lt;code&gt;31B Q4_K_M&lt;/code&gt; start to become realistic choices.&lt;/p&gt;
&lt;h2 id=&#34;gemma-4-e2b-vram-table&#34;&gt;Gemma 4 E2B VRAM Table
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;E2B&lt;/code&gt; is the lightest version, suitable for laptops, mini PCs, mobile devices, and low-VRAM testing.
It is easy to run, but complex reasoning, coding, and long tasks are limited.&lt;/p&gt;
&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;Quantization&lt;/th&gt;
          &lt;th style=&#34;text-align: right&#34;&gt;GGUF File Size&lt;/th&gt;
          &lt;th style=&#34;text-align: right&#34;&gt;Minimum VRAM&lt;/th&gt;
          &lt;th style=&#34;text-align: right&#34;&gt;Safer VRAM&lt;/th&gt;
          &lt;th&gt;Best For&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;UD-IQ2_M&lt;/code&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;2.29GB&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;4GB&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;6GB&lt;/td&gt;
          &lt;td&gt;Extreme low-VRAM tests&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;UD-Q2_K_XL&lt;/code&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;2.40GB&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;4GB&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;6GB&lt;/td&gt;
          &lt;td&gt;Low-VRAM usability&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;Q3_K_M&lt;/code&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;2.54GB&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;4GB&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;6GB&lt;/td&gt;
          &lt;td&gt;Lightweight chat and summaries&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;IQ4_XS&lt;/code&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;2.98GB&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;6GB&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;8GB&lt;/td&gt;
          &lt;td&gt;Balance of quality and size&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;Q4_K_M&lt;/code&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;3.11GB&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;6GB&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;8GB&lt;/td&gt;
          &lt;td&gt;Recommended E2B default&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;Q5_K_M&lt;/code&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;3.36GB&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;6GB&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;8GB&lt;/td&gt;
          &lt;td&gt;Slightly steadier than Q4&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;Q6_K&lt;/code&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;4.50GB&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;8GB&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;10GB&lt;/td&gt;
          &lt;td&gt;Higher-quality small model&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;Q8_0&lt;/code&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;5.05GB&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;8GB&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;10GB&lt;/td&gt;
          &lt;td&gt;Near-original precision for lightweight deployment&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;BF16&lt;/code&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;9.31GB&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;12GB&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;16GB&lt;/td&gt;
          &lt;td&gt;Debugging, comparison, research&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;For daily use, &lt;code&gt;E2B Q4_K_M&lt;/code&gt; is already enough.
With only 4GB VRAM, 2-bit or 3-bit variants can work, but output quality will be less stable.&lt;/p&gt;
&lt;h2 id=&#34;gemma-4-e4b-vram-table&#34;&gt;Gemma 4 E4B VRAM Table
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;E4B&lt;/code&gt; is the more practical lightweight model.
Compared with E2B, it is better for everyday writing, document summaries, light coding assistance, and local assistant use.&lt;/p&gt;
&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;Quantization&lt;/th&gt;
          &lt;th style=&#34;text-align: right&#34;&gt;GGUF File Size&lt;/th&gt;
          &lt;th style=&#34;text-align: right&#34;&gt;Minimum VRAM&lt;/th&gt;
          &lt;th style=&#34;text-align: right&#34;&gt;Safer VRAM&lt;/th&gt;
          &lt;th&gt;Best For&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;UD-IQ2_M&lt;/code&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;3.53GB&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;6GB&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;8GB&lt;/td&gt;
          &lt;td&gt;Low-VRAM tests&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;UD-Q2_K_XL&lt;/code&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;3.74GB&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;6GB&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;8GB&lt;/td&gt;
          &lt;td&gt;Low-VRAM usability&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;Q3_K_M&lt;/code&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;4.06GB&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;6GB&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;10GB&lt;/td&gt;
          &lt;td&gt;Lightweight local assistant&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;IQ4_XS&lt;/code&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;4.72GB&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;8GB&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;12GB&lt;/td&gt;
          &lt;td&gt;Balance of quality and speed&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;Q4_K_M&lt;/code&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;4.98GB&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;8GB&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;12GB&lt;/td&gt;
          &lt;td&gt;Recommended E4B default&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;Q5_K_M&lt;/code&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;5.48GB&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;8GB&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;12GB&lt;/td&gt;
          &lt;td&gt;Steadier everyday use&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;Q6_K&lt;/code&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;7.07GB&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;10GB&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;16GB&lt;/td&gt;
          &lt;td&gt;Quality first&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;Q8_0&lt;/code&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;8.19GB&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;12GB&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;16GB&lt;/td&gt;
          &lt;td&gt;Near-original precision&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;BF16&lt;/code&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;15.05GB&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;20GB&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;24GB&lt;/td&gt;
          &lt;td&gt;Research, evaluation, precision comparison&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;If your GPU has 8GB VRAM, &lt;code&gt;E4B Q4_K_M&lt;/code&gt; is a realistic starting point.
With 12GB or 16GB VRAM, &lt;code&gt;E4B Q8_0&lt;/code&gt; is also worth considering.&lt;/p&gt;
&lt;h2 id=&#34;gemma-4-26b-a4b-vram-table&#34;&gt;Gemma 4 26B A4B VRAM Table
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;26B A4B&lt;/code&gt; is the MoE version. It has a larger total parameter count, but activates only part of the experts during inference.
It is better suited to more complex Q&amp;amp;A, coding, tool use, and agent workflows.&lt;/p&gt;
&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;Quantization&lt;/th&gt;
          &lt;th style=&#34;text-align: right&#34;&gt;GGUF File Size&lt;/th&gt;
          &lt;th style=&#34;text-align: right&#34;&gt;Minimum VRAM&lt;/th&gt;
          &lt;th style=&#34;text-align: right&#34;&gt;Safer VRAM&lt;/th&gt;
          &lt;th&gt;Best For&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;UD-IQ2_M&lt;/code&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;9.97GB&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;14GB&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;16GB&lt;/td&gt;
          &lt;td&gt;Extreme 16GB GPU tests&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;UD-Q2_K_XL&lt;/code&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;10.55GB&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;14GB&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;16GB&lt;/td&gt;
          &lt;td&gt;Running 26B with low VRAM&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;UD-Q3_K_M&lt;/code&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;12.53GB&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;16GB&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;20GB&lt;/td&gt;
          &lt;td&gt;Better quality while still VRAM-conscious&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;UD-IQ4_XS&lt;/code&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;13.42GB&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;16GB&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;24GB&lt;/td&gt;
          &lt;td&gt;Balance of quality and size&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;UD-Q4_K_M&lt;/code&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;16.87GB&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;20GB&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;24GB&lt;/td&gt;
          &lt;td&gt;Recommended 26B default&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;UD-Q5_K_M&lt;/code&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;21.15GB&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;24GB&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;32GB&lt;/td&gt;
          &lt;td&gt;Higher-quality quantization&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;UD-Q6_K&lt;/code&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;23.17GB&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;28GB&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;32GB&lt;/td&gt;
          &lt;td&gt;Quality first&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;Q8_0&lt;/code&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;26.86GB&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;32GB&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;40GB&lt;/td&gt;
          &lt;td&gt;Near-original precision&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;BF16&lt;/code&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;50.51GB&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;64GB&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;80GB&lt;/td&gt;
          &lt;td&gt;Not realistic for most single consumer GPUs&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;24GB VRAM is the comfortable dividing line for 26B A4B.
A 16GB GPU can try low-bit versions, but context length, concurrency, and multimodal input should be kept modest.&lt;/p&gt;
&lt;h2 id=&#34;gemma-4-31b-vram-table&#34;&gt;Gemma 4 31B VRAM Table
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;31B&lt;/code&gt; is the larger dense model.
Its strength is stronger overall capability, but its VRAM pressure is more direct than 26B A4B.&lt;/p&gt;
&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;Quantization&lt;/th&gt;
          &lt;th style=&#34;text-align: right&#34;&gt;GGUF File Size&lt;/th&gt;
          &lt;th style=&#34;text-align: right&#34;&gt;Minimum VRAM&lt;/th&gt;
          &lt;th style=&#34;text-align: right&#34;&gt;Safer VRAM&lt;/th&gt;
          &lt;th&gt;Best For&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;UD-IQ2_XXS&lt;/code&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;8.53GB&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;12GB&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;16GB&lt;/td&gt;
          &lt;td&gt;Extreme low-VRAM tests with clear quality loss&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;UD-IQ2_M&lt;/code&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;10.75GB&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;14GB&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;18GB&lt;/td&gt;
          &lt;td&gt;Low-VRAM tests&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;UD-Q2_K_XL&lt;/code&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;11.77GB&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;16GB&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;20GB&lt;/td&gt;
          &lt;td&gt;16GB GPU experiments&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;Q3_K_S&lt;/code&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;13.21GB&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;16GB&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;24GB&lt;/td&gt;
          &lt;td&gt;More VRAM-efficient 3-bit&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;Q3_K_M&lt;/code&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;14.74GB&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;20GB&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;24GB&lt;/td&gt;
          &lt;td&gt;Common 3-bit compromise&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;IQ4_XS&lt;/code&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;16.37GB&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;20GB&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;24GB&lt;/td&gt;
          &lt;td&gt;Near-Q4 compromise&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;Q4_K_M&lt;/code&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;18.32GB&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;24GB&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;32GB&lt;/td&gt;
          &lt;td&gt;Recommended 31B default&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;Q5_K_M&lt;/code&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;21.66GB&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;28GB&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;32GB&lt;/td&gt;
          &lt;td&gt;Higher-quality quantization&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;Q6_K&lt;/code&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;25.20GB&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;32GB&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;40GB&lt;/td&gt;
          &lt;td&gt;Quality first&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;Q8_0&lt;/code&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;32.64GB&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;40GB&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;48GB&lt;/td&gt;
          &lt;td&gt;Near-original precision&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;BF16&lt;/code&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;61.41GB&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;80GB&lt;/td&gt;
          &lt;td style=&#34;text-align: right&#34;&gt;96GB&lt;/td&gt;
          &lt;td&gt;Server or large-VRAM workstation&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Low-bit 31B can be tested on a 16GB GPU, but for daily use, 24GB VRAM is a better starting point.
&lt;code&gt;Q4_K_M&lt;/code&gt; is the balanced choice, while &lt;code&gt;Q5_K_M&lt;/code&gt; and above make more sense with 32GB+ VRAM.&lt;/p&gt;
&lt;h2 id=&#34;why-actual-usage-is-higher-than-file-size&#34;&gt;Why Actual Usage Is Higher Than File Size
&lt;/h2&gt;&lt;p&gt;The GGUF file size is only the weight size.
Runtime usage also includes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;KV cache&lt;/code&gt;: longer context means higher memory use.&lt;/li&gt;
&lt;li&gt;Batch size and concurrency: processing more tokens or more users increases VRAM.&lt;/li&gt;
&lt;li&gt;Multimodal components: image, audio, or video input often requires &lt;code&gt;mmproj&lt;/code&gt; or extra modules.&lt;/li&gt;
&lt;li&gt;Runtime backend: CUDA, Metal, ROCm, and CPU/GPU split loading behave differently.&lt;/li&gt;
&lt;li&gt;KV cache quantization: &lt;code&gt;q8_0&lt;/code&gt;, &lt;code&gt;q4_0&lt;/code&gt;, and similar modes can save VRAM, but may affect detail.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So the “minimum VRAM” column should be read as the threshold for startup and short-context inference.
For 32K, 64K, 128K, or even 256K context, VRAM requirements rise significantly.&lt;/p&gt;
&lt;h2 id=&#34;how-to-choose&#34;&gt;How to Choose
&lt;/h2&gt;&lt;p&gt;If you just want to try Gemma 4 locally:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;4GB to 6GB VRAM: choose &lt;code&gt;E2B Q3_K_M&lt;/code&gt; or &lt;code&gt;E2B Q4_K_M&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;8GB VRAM: prefer &lt;code&gt;E4B Q4_K_M&lt;/code&gt;; &lt;code&gt;E2B Q8_0&lt;/code&gt; is also fine.&lt;/li&gt;
&lt;li&gt;12GB VRAM: choose &lt;code&gt;E4B Q8_0&lt;/code&gt;, or try low-bit 26B/31B variants.&lt;/li&gt;
&lt;li&gt;16GB VRAM: try &lt;code&gt;26B A4B UD-Q3_K_M&lt;/code&gt; or &lt;code&gt;31B Q3_K_S&lt;/code&gt;, but do not expect long context to feel comfortable.&lt;/li&gt;
&lt;li&gt;24GB VRAM: focus on &lt;code&gt;26B A4B UD-Q4_K_M&lt;/code&gt; and &lt;code&gt;31B Q4_K_M&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;32GB and above: consider &lt;code&gt;Q5_K_M&lt;/code&gt;, &lt;code&gt;Q6_K&lt;/code&gt;, or longer context.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Most users do not need BF16.
Local deployment is not about picking the largest file, but about balancing VRAM, speed, context length, and output quality.&lt;/p&gt;
&lt;h2 id=&#34;references&#34;&gt;References
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://huggingface.co/google/gemma-4-E2B-it&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;google/gemma-4-E2B-it - Hugging Face&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://huggingface.co/google/gemma-4-E4B-it&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;google/gemma-4-E4B-it - Hugging Face&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://huggingface.co/ggml-org/gemma-4-26B-A4B-it-GGUF&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;ggml-org/gemma-4-26B-A4B-it-GGUF - Hugging Face&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://huggingface.co/unsloth/gemma-4-E2B-it-GGUF&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;unsloth/gemma-4-E2B-it-GGUF - Hugging Face&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://huggingface.co/unsloth/gemma-4-E4B-it-GGUF&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;unsloth/gemma-4-E4B-it-GGUF - Hugging Face&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://huggingface.co/unsloth/gemma-4-26B-A4B-it-GGUF&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;unsloth/gemma-4-26B-A4B-it-GGUF - Hugging Face&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://huggingface.co/unsloth/gemma-4-31B-it-GGUF&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;unsloth/gemma-4-31B-it-GGUF - Hugging Face&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
        </item>
        <item>
        <title>Gemma 4 E4B Uncensored vs Official: What Actually Changes</title>
        <link>https://knightli.com/en/2026/04/18/gemma-4-e4b-uncensored-vs-official/</link>
        <pubDate>Sat, 18 Apr 2026 10:20:00 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/04/18/gemma-4-e4b-uncensored-vs-official/</guid>
        <description>&lt;p&gt;If you see a model like &lt;code&gt;HauhauCS/Gemma-4-E4B-Uncensored-HauhauCS-Aggressive&lt;/code&gt;, the most important point is this: it is &lt;strong&gt;not a new Google base model&lt;/strong&gt;. It is a derivative release built on top of the official &lt;code&gt;google/gemma-4-E4B-it&lt;/code&gt;, but with alignment behavior intentionally pushed toward fewer refusals.&lt;/p&gt;
&lt;p&gt;That means the real difference is usually &lt;strong&gt;behavioral policy and response style&lt;/strong&gt;, not a brand-new architecture.&lt;/p&gt;
&lt;h2 id=&#34;what-the-derivative-model-explicitly-claims&#34;&gt;What the derivative model explicitly claims
&lt;/h2&gt;&lt;p&gt;According to its Hugging Face model card, the HauhauCS release says:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;it is based on &lt;code&gt;google/gemma-4-E4B-it&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;it makes &amp;ldquo;no changes to datasets or capabilities&amp;rdquo;&lt;/li&gt;
&lt;li&gt;it is &amp;ldquo;just without the refusals&amp;rdquo;&lt;/li&gt;
&lt;li&gt;the &lt;code&gt;Aggressive&lt;/code&gt; variant is &amp;ldquo;fully unlocked and won&amp;rsquo;t refuse prompts&amp;rdquo;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Those are the creator&amp;rsquo;s claims, not an independent benchmark. Still, they tell you the intended positioning very clearly: this is an unofficial derivative optimized to reduce safety refusals.&lt;/p&gt;
&lt;h2 id=&#34;official-model-vs-uncensored-derivative&#34;&gt;Official model vs &amp;ldquo;uncensored&amp;rdquo; derivative
&lt;/h2&gt;&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;Dimension&lt;/th&gt;
          &lt;th&gt;Official &lt;code&gt;google/gemma-4-E4B-it&lt;/code&gt;&lt;/th&gt;
          &lt;th&gt;&lt;code&gt;Gemma-4-E4B-Uncensored-HauhauCS-Aggressive&lt;/code&gt;&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;Source&lt;/td&gt;
          &lt;td&gt;Official Google release&lt;/td&gt;
          &lt;td&gt;Third-party derivative on Hugging Face&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Base architecture&lt;/td&gt;
          &lt;td&gt;Gemma 4 E4B instruction-tuned model&lt;/td&gt;
          &lt;td&gt;Same base family, explicitly described as based on &lt;code&gt;google/gemma-4-E4B-it&lt;/code&gt;&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Main goal&lt;/td&gt;
          &lt;td&gt;General-purpose helpful assistant with responsible-use framing&lt;/td&gt;
          &lt;td&gt;Reduce refusals and keep answering even when the official model might decline&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Safety posture&lt;/td&gt;
          &lt;td&gt;Aligned with Gemma family safety docs and prohibited-use policy&lt;/td&gt;
          &lt;td&gt;Intentionally weakened refusal behavior&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Response style&lt;/td&gt;
          &lt;td&gt;More likely to refuse, redirect, or soften certain requests&lt;/td&gt;
          &lt;td&gt;More likely to answer directly, including prompts the official model may block&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Risk profile&lt;/td&gt;
          &lt;td&gt;Lower misuse risk by default, but still not risk-free&lt;/td&gt;
          &lt;td&gt;Higher misuse risk, higher chance of unsafe or non-compliant output&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Predictability in products&lt;/td&gt;
          &lt;td&gt;Easier to justify in normal apps and enterprise environments&lt;/td&gt;
          &lt;td&gt;Harder to justify in public-facing, business, or policy-sensitive deployments&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Compliance burden&lt;/td&gt;
          &lt;td&gt;Still requires application-level safeguards&lt;/td&gt;
          &lt;td&gt;Requires even stronger downstream safeguards because the model itself is less restrictive&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id=&#34;the-core-difference-is-alignment-not-raw-capability&#34;&gt;The core difference is alignment, not raw capability
&lt;/h2&gt;&lt;p&gt;Many users mistakenly treat &amp;ldquo;uncensored&amp;rdquo; as if it means &amp;ldquo;smarter.&amp;rdquo; That is usually the wrong frame.&lt;/p&gt;
&lt;p&gt;For a derivative like this, what changes first is:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;how often the model refuses&lt;/li&gt;
&lt;li&gt;how strongly it follows harmful or policy-sensitive instructions&lt;/li&gt;
&lt;li&gt;how much filtering remains in its final answers&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;What does &lt;strong&gt;not&lt;/strong&gt; automatically change:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;the underlying Gemma 4 family architecture&lt;/li&gt;
&lt;li&gt;context window class&lt;/li&gt;
&lt;li&gt;multimodal support class&lt;/li&gt;
&lt;li&gt;general reasoning ceiling&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In other words, an uncensored derivative is often better described as a &lt;strong&gt;different behavioral tuning&lt;/strong&gt; of the same model family, not a higher-tier model.&lt;/p&gt;
&lt;h2 id=&#34;why-the-official-version-behaves-differently&#34;&gt;Why the official version behaves differently
&lt;/h2&gt;&lt;p&gt;Google&amp;rsquo;s official Gemma materials frame the family as being built for responsible AI development. The Gemma model card highlights misuse, harmful content, privacy, and bias risks, and Google&amp;rsquo;s Gemma Prohibited Use Policy explicitly forbids using Gemma or model derivatives to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;facilitate dangerous, illegal, or malicious activities&lt;/li&gt;
&lt;li&gt;generate harmful or deceptive content&lt;/li&gt;
&lt;li&gt;override or circumvent safety filters&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So the official model is not just &amp;ldquo;more conservative&amp;rdquo; by accident. Its surrounding policy and intended deployment posture are deliberately different.&lt;/p&gt;
&lt;h2 id=&#34;when-the-official-model-is-the-better-choice&#34;&gt;When the official model is the better choice
&lt;/h2&gt;&lt;p&gt;Use the official &lt;code&gt;google/gemma-4-E4B-it&lt;/code&gt; path if you care about:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;product deployment&lt;/li&gt;
&lt;li&gt;enterprise or team use&lt;/li&gt;
&lt;li&gt;lower legal and policy exposure&lt;/li&gt;
&lt;li&gt;fewer obviously unsafe outputs&lt;/li&gt;
&lt;li&gt;easier documentation and review&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For most normal applications, this is the safer default.&lt;/p&gt;
&lt;h2 id=&#34;when-people-choose-the-uncensored-derivative&#34;&gt;When people choose the uncensored derivative
&lt;/h2&gt;&lt;p&gt;Users usually choose an uncensored derivative for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;local private experimentation&lt;/li&gt;
&lt;li&gt;testing where the official model refuses too early&lt;/li&gt;
&lt;li&gt;roleplay or open-ended creative prompting&lt;/li&gt;
&lt;li&gt;comparing alignment behavior across variants&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;But this comes with a real trade-off: you are moving more safety responsibility from the model provider to yourself.&lt;/p&gt;
&lt;h2 id=&#34;practical-conclusion&#34;&gt;Practical conclusion
&lt;/h2&gt;&lt;p&gt;The difference between a so-called &amp;ldquo;jailbroken&amp;rdquo; Gemma 4 E4B and the ordinary official version is mostly this:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;the official version is optimized for usable capability &lt;strong&gt;with guardrails&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;the uncensored derivative is optimized for fewer refusals &lt;strong&gt;with weaker guardrails&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That does &lt;strong&gt;not&lt;/strong&gt; automatically make the uncensored model stronger. It mainly makes it more permissive.&lt;/p&gt;
&lt;p&gt;If your goal is stable, explainable, and lower-risk deployment, use the official model first. If your goal is local experimentation and you understand the compliance and safety trade-offs, then an uncensored derivative is a behavior variant worth testing separately, not a drop-in &amp;ldquo;better&amp;rdquo; replacement.&lt;/p&gt;
&lt;h2 id=&#34;sources&#34;&gt;Sources
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;Hugging Face: &lt;a class=&#34;link&#34; href=&#34;https://huggingface.co/HauhauCS/Gemma-4-E4B-Uncensored-HauhauCS-Aggressive&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;HauhauCS/Gemma-4-E4B-Uncensored-HauhauCS-Aggressive&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Hugging Face: &lt;a class=&#34;link&#34; href=&#34;https://huggingface.co/google/gemma-4-E4B-it&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;google/gemma-4-E4B-it&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Google AI for Developers: &lt;a class=&#34;link&#34; href=&#34;https://ai.google.dev/gemma/prohibited_use_policy&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Gemma Prohibited Use Policy&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Google AI for Developers: &lt;a class=&#34;link&#34; href=&#34;https://ai.google.dev/gemma/docs/core/model_card&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Gemma model card&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
        </item>
        <item>
        <title>Deploy Hermes Agent Locally on Windows with WSL &#43; Ollama and Connect Telegram</title>
        <link>https://knightli.com/en/2026/04/18/windows-wsl-ollama-hermes-agent-telegram/</link>
        <pubDate>Sat, 18 Apr 2026 00:48:22 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/04/18/windows-wsl-ollama-hermes-agent-telegram/</guid>
        <description>&lt;p&gt;If you want to run &lt;code&gt;Hermes Agent&lt;/code&gt; on &lt;code&gt;Windows&lt;/code&gt; with as little friction as possible, a practical path is:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;keep Windows as the host system&lt;/li&gt;
&lt;li&gt;run &lt;code&gt;Ubuntu&lt;/code&gt; inside &lt;code&gt;WSL&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;use &lt;code&gt;Ollama&lt;/code&gt; to serve the local model&lt;/li&gt;
&lt;li&gt;let &lt;code&gt;Hermes Agent&lt;/code&gt; connect directly to the local Ollama endpoint&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This approach keeps the environment relatively clean, lets you run most commands in a Linux-style workflow, and avoids preparing a separate Linux machine.&lt;/p&gt;
&lt;h2 id=&#34;overall-flow&#34;&gt;Overall flow
&lt;/h2&gt;&lt;p&gt;You can split the setup into 4 steps:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Enable &lt;code&gt;WSL&lt;/code&gt; and install &lt;code&gt;Ubuntu&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Install Python, Node.js, Git, and other basics inside Ubuntu&lt;/li&gt;
&lt;li&gt;Install &lt;code&gt;Ollama&lt;/code&gt; and pull a local model&lt;/li&gt;
&lt;li&gt;Install &lt;code&gt;Hermes Agent&lt;/code&gt;, then connect &lt;code&gt;Telegram&lt;/code&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;If your goal is simply to get Hermes Agent running first, by the end of step 3 you are already close.&lt;/p&gt;
&lt;h2 id=&#34;1-install-wsl-and-ubuntu&#34;&gt;1. Install WSL and Ubuntu
&lt;/h2&gt;&lt;p&gt;Run this in PowerShell with administrator privileges:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-powershell&#34; data-lang=&#34;powershell&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;wsl&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;-&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;-install&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;After the installation finishes, restart the PC, then continue with Ubuntu:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-powershell&#34; data-lang=&#34;powershell&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;wsl&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;-&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;-install&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;-d&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Ubuntu&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;After that, open Ubuntu in WSL. Most of the remaining commands are run there.&lt;/p&gt;
&lt;h2 id=&#34;2-update-ubuntu-and-install-the-base-environment&#34;&gt;2. Update Ubuntu and install the base environment
&lt;/h2&gt;&lt;p&gt;Update the system first:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;sudo apt update
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;sudo apt upgrade -y
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Then install Python, extraction tools, Node.js, and Git.&lt;/p&gt;
&lt;h3 id=&#34;install-python&#34;&gt;Install Python
&lt;/h3&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;sudo apt install python3-pip python3-venv -y
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h3 id=&#34;install-zstd&#34;&gt;Install zstd
&lt;/h3&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;sudo apt install -y zstd
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h3 id=&#34;install-nodejs&#34;&gt;Install Node.js
&lt;/h3&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;curl -fsSL https://deb.nodesource.com/setup_22.x &lt;span class=&#34;p&#34;&gt;|&lt;/span&gt; sudo -E bash -
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;sudo apt install -y nodejs
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h3 id=&#34;install-git&#34;&gt;Install Git
&lt;/h3&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;sudo apt update
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;sudo apt install -y git
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;You can quickly verify the installation with:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;node -v
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;npm -v
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;git --version
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h2 id=&#34;3-install-ollama-and-pull-gemma-4&#34;&gt;3. Install Ollama and pull Gemma 4
&lt;/h2&gt;&lt;p&gt;Install Ollama:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;curl -fsSL https://ollama.com/install.sh &lt;span class=&#34;p&#34;&gt;|&lt;/span&gt; sh
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;If you want a local model for Hermes Agent, starting with &lt;code&gt;Gemma 4&lt;/code&gt; is reasonable.&lt;/p&gt;
&lt;p&gt;For example:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;ollama run gemma4:e4b
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;If your machine is weaker, you can also try:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;ollama run gemma4:e2b
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Larger variants include:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;ollama run gemma4:26b
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;ollama run gemma4:31b
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;For most normal &lt;code&gt;Windows + WSL&lt;/code&gt; setups, &lt;code&gt;gemma4:e4b&lt;/code&gt; is usually the more practical starting point.&lt;/p&gt;
&lt;h2 id=&#34;4-install-and-configure-hermes-agent&#34;&gt;4. Install and configure Hermes Agent
&lt;/h2&gt;&lt;p&gt;Install it with:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh &lt;span class=&#34;p&#34;&gt;|&lt;/span&gt; bash
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;After installation, point it to the local Ollama endpoint:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;http://127.0.0.1:11434
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Use the local model name you actually installed, for example:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;gemma4:e4b
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;If the installer asks you to refresh the shell, run:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nb&#34;&gt;source&lt;/span&gt; ~/.bashrc
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h2 id=&#34;common-hermes-agent-commands&#34;&gt;Common Hermes Agent commands
&lt;/h2&gt;&lt;p&gt;These are the commands you will use most often:&lt;/p&gt;
&lt;h3 id=&#34;start&#34;&gt;Start
&lt;/h3&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;hermes
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h3 id=&#34;re-enter-setup&#34;&gt;Re-enter setup
&lt;/h3&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;hermes setup
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h3 id=&#34;configure-the-chat-gateway&#34;&gt;Configure the chat gateway
&lt;/h3&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;hermes setup gateway
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h3 id=&#34;update&#34;&gt;Update
&lt;/h3&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;hermes update
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h2 id=&#34;basic-telegram-connection-steps&#34;&gt;Basic Telegram connection steps
&lt;/h2&gt;&lt;p&gt;If you want Hermes Agent to send and receive messages through Telegram, the core step is still:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;hermes setup gateway
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Then prepare the two Telegram-side items you need:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;create a bot with &lt;code&gt;BotFather&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;get your &lt;code&gt;User ID&lt;/code&gt; with &lt;code&gt;@userinfobot&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Once you have those basics, continue filling them into the Hermes Agent gateway setup.&lt;/p&gt;
&lt;h2 id=&#34;who-this-setup-fits&#34;&gt;Who this setup fits
&lt;/h2&gt;&lt;p&gt;This workflow is a good fit if:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Windows is your main desktop system&lt;/li&gt;
&lt;li&gt;you do not want to maintain a separate Linux host&lt;/li&gt;
&lt;li&gt;you want to get a local Agent running first, then expand to chat platforms&lt;/li&gt;
&lt;li&gt;you prefer local models instead of depending on cloud APIs&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you mainly want to experience a local Agent rather than build a full production deployment immediately, this path is already practical enough.&lt;/p&gt;
&lt;h2 id=&#34;a-few-things-to-keep-in-mind&#34;&gt;A few things to keep in mind
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;code&gt;WSL&lt;/code&gt; is still a compatibility layer, so in extreme cases it may not behave exactly like native Linux&lt;/li&gt;
&lt;li&gt;whether a large model runs smoothly still depends on your RAM, VRAM, and CPU / GPU&lt;/li&gt;
&lt;li&gt;&lt;code&gt;gemma4:e4b&lt;/code&gt; is a realistic starting point, but actual experience still depends on the machine&lt;/li&gt;
&lt;li&gt;Hermes Agent platform integration is an extension step; getting the local model path working first, then adding Telegram, is usually more stable&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;conclusion&#34;&gt;Conclusion
&lt;/h2&gt;&lt;p&gt;If you want to deploy Hermes Agent locally on Windows with as little friction as possible, the smoother order is:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;WSL -&amp;gt; Ubuntu -&amp;gt; Ollama -&amp;gt; Gemma 4 -&amp;gt; Hermes Agent -&amp;gt; Telegram&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Get the local model running first, then add the gateway integration. That usually gives you a much higher success rate. For most users, this is easier to troubleshoot than piling on every component at the beginning, and it also leaves room for later expansion.&lt;/p&gt;
&lt;h2 id=&#34;original-reference&#34;&gt;Original reference
&lt;/h2&gt;&lt;p&gt;This post is rewritten and organized based on:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Xchaoge Blog: &lt;a class=&#34;link&#34; href=&#34;https://www.xchaoge.com/21.html&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;太简单了！Hermes Agent 本地部署（无需API）接入 Telegram + 微信&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
        </item>
        <item>
        <title>How to Fix SSL Certificate Verification Failed When llama-cli Downloads from Hugging Face on Windows</title>
        <link>https://knightli.com/en/2026/04/17/llama-cli-hugging-face-ssl-certificate-failed-on-windows/</link>
        <pubDate>Fri, 17 Apr 2026 14:20:29 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/04/17/llama-cli-hugging-face-ssl-certificate-failed-on-windows/</guid>
        <description>&lt;p&gt;If you run this command on Windows:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-fallback&#34; data-lang=&#34;fallback&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;llama-cli -hf unsloth/gemma-4-E4B-it-GGUF
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;and see an error like this:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;get_repo_commit: error: HTTPLIB failed: SSL server verification failed
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;error: failed to download model from Hugging Face
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;the problem is usually not CUDA or &lt;code&gt;llama.cpp&lt;/code&gt; itself. More often, the program cannot correctly access the system certificate chain in the current environment, so HTTPS verification fails.&lt;/p&gt;
&lt;p&gt;From the log, &lt;code&gt;ggml-rpc.dll&lt;/code&gt; and &lt;code&gt;ggml-cpu-alderlake.dll&lt;/code&gt; were loaded successfully, which means the runtime environment is mostly fine. The issue is mainly in the model download step.&lt;/p&gt;
&lt;h2 id=&#34;the-easiest-workaround-download-the-model-manually&#34;&gt;The easiest workaround: download the model manually
&lt;/h2&gt;&lt;p&gt;If you just want to get it running quickly, downloading the model manually is usually the most stable option.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Open the matching Hugging Face repository page.&lt;/li&gt;
&lt;li&gt;Download the required &lt;code&gt;.gguf&lt;/code&gt; file from &lt;code&gt;Files and versions&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;After the download finishes, run it with the local file path:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-gdscript3&#34; data-lang=&#34;gdscript3&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;llama&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;-&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;cli&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;-&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;m&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;C&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt;\&lt;span class=&#34;n&#34;&gt;Users&lt;/span&gt;\&lt;span class=&#34;n&#34;&gt;knightli&lt;/span&gt;\&lt;span class=&#34;n&#34;&gt;Downloads&lt;/span&gt;\&lt;span class=&#34;n&#34;&gt;gemma&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;-&lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;4&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;-&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;e4b&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;-&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;it&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;gguf&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;This bypasses SSL verification during the &lt;code&gt;-hf&lt;/code&gt; download step and is useful when you only want to verify that the model can run locally.&lt;/p&gt;
&lt;h2 id=&#34;if-you-still-want-to-use--hf-automatic-download&#34;&gt;If you still want to use &lt;code&gt;-hf&lt;/code&gt; automatic download
&lt;/h2&gt;&lt;p&gt;You can manually specify a certificate file path so the program can find a usable CA bundle in the current session.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;cacert.pem&lt;/code&gt; can be obtained from the CA Extract page maintained by the curl project:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Page: &lt;a class=&#34;link&#34; href=&#34;https://curl.se/docs/caextract.html&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://curl.se/docs/caextract.html&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Direct download: &lt;a class=&#34;link&#34; href=&#34;https://curl.se/ca/cacert.pem&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://curl.se/ca/cacert.pem&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you download it in a browser, open the direct download link and save it as &lt;code&gt;cacert.pem&lt;/code&gt;. You can also download it to a fixed directory with PowerShell:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-powershell&#34; data-lang=&#34;powershell&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nb&#34;&gt;New-Item&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;-ItemType&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Directory&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;-Force&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;C:&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;\&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;certs&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nb&#34;&gt;Invoke-WebRequest&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;-Uri&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;https&lt;/span&gt;&lt;span class=&#34;err&#34;&gt;:&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;//&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;curl&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;se&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;/&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ca&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;/&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;cacert&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;py&#34;&gt;pem&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;-OutFile&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;C:&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;\&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;certs&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;\&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;cacert&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;py&#34;&gt;pem&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;After the download finishes, set these variables in the command line:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-fallback&#34; data-lang=&#34;fallback&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;set SSL_CERT_FILE=C:\certs\cacert.pem
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;set CURL_CA_BUNDLE=C:\certs\cacert.pem
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Then run the original command again:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-fallback&#34; data-lang=&#34;fallback&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;llama-cli -hf unsloth/gemma-4-E4B-it-GGUF
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;If the issue really comes from the certificate chain, this usually fixes it directly.&lt;/p&gt;
</description>
        </item>
        <item>
        <title>What Does `it` Mean in Gemma-4-31B-it</title>
        <link>https://knightli.com/en/2026/04/11/gemma-4-31b-it-meaning/</link>
        <pubDate>Sat, 11 Apr 2026 20:45:34 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/04/11/gemma-4-31b-it-meaning/</guid>
        <description>&lt;p&gt;In &lt;code&gt;gemma-4-31B-it&lt;/code&gt;, &lt;code&gt;it&lt;/code&gt; stands for &lt;code&gt;Instruction Tuned&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;For most users, that means this version is designed for chat, Q&amp;amp;A, coding help, and other instruction-following tasks.&lt;/p&gt;
&lt;h2 id=&#34;what-it-means&#34;&gt;What &lt;code&gt;it&lt;/code&gt; means
&lt;/h2&gt;&lt;p&gt;Models often come in two common forms:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Base / Pre-trained: closer to a raw next-token predictor&lt;/li&gt;
&lt;li&gt;&lt;code&gt;it&lt;/code&gt;: tuned to follow user instructions more reliably&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you ask something like &amp;ldquo;translate this text&amp;rdquo; or &amp;ldquo;write a Python script&amp;rdquo;, the &lt;code&gt;it&lt;/code&gt; version usually behaves more like an assistant.&lt;/p&gt;
&lt;h2 id=&#34;what-31b-means&#34;&gt;What &lt;code&gt;31B&lt;/code&gt; means
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;31B&lt;/code&gt; means the model has about 31 billion parameters.&lt;/p&gt;
&lt;p&gt;In general:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;more parameters often mean stronger capability&lt;/li&gt;
&lt;li&gt;but also higher VRAM or RAM requirements&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So &lt;code&gt;31B&lt;/code&gt; is a relatively large model and needs stronger hardware.&lt;/p&gt;
&lt;h2 id=&#34;what-gemma-4-means&#34;&gt;What &lt;code&gt;Gemma-4&lt;/code&gt; means
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;Gemma-4&lt;/code&gt; identifies the model family and generation:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;Gemma&lt;/code&gt;: Google&amp;rsquo;s open model family&lt;/li&gt;
&lt;li&gt;&lt;code&gt;4&lt;/code&gt;: the fourth generation in that family&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;which-one-to-choose&#34;&gt;Which one to choose
&lt;/h2&gt;&lt;p&gt;If your goal is chat, Q&amp;amp;A, translation, or coding, the &lt;code&gt;-it&lt;/code&gt; version is usually the better choice.&lt;/p&gt;
&lt;p&gt;The base version is more relevant for lower-level research, fine-tuning, or custom training workflows.&lt;/p&gt;
&lt;h2 id=&#34;one-line-summary&#34;&gt;One-line summary
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;gemma-4-31B-it&lt;/code&gt; means: Gemma 4 family, 31 billion parameters, instruction-tuned for conversation and task execution.&lt;/p&gt;
</description>
        </item>
        <item>
        <title>Gemma 4 Local Runtime Guide: From One-Command Start to Dev Integration</title>
        <link>https://knightli.com/en/2026/04/10/gemma4-local-runtime-options/</link>
        <pubDate>Fri, 10 Apr 2026 22:54:17 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/04/10/gemma4-local-runtime-options/</guid>
        <description>&lt;p&gt;If you want to run Gemma 4 locally, you can choose from four practical paths depending on your goal and hardware.&lt;/p&gt;
&lt;h2 id=&#34;1-fastest-start-ollama-recommended&#34;&gt;1) Fastest start: Ollama (recommended)
&lt;/h2&gt;&lt;p&gt;This is the lowest-friction option for quick testing, daily chat, and local API usage.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;ollama run gemma4
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Highlights:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Works on Windows, macOS, and Linux&lt;/li&gt;
&lt;li&gt;Handles hardware acceleration automatically&lt;/li&gt;
&lt;li&gt;Offers OpenAI-style local API compatibility&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;2-gui-workflow-lm-studio--unsloth-studio&#34;&gt;2) GUI workflow: LM Studio / Unsloth Studio
&lt;/h2&gt;&lt;p&gt;If you prefer a desktop UI instead of terminal commands:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;LM Studio: browse and run Gemma 4 quantized variants from Hugging Face (for example 4-bit, 8-bit), with resource visibility.&lt;/li&gt;
&lt;li&gt;Unsloth Studio: supports both inference and low-VRAM fine-tuning, often friendlier on 6GB-8GB GPUs.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;3-low-spec-and-maximum-control-llamacpp&#34;&gt;3) Low-spec and maximum control: llama.cpp
&lt;/h2&gt;&lt;p&gt;Good for older hardware, CPU-focused setups, or users who want deeper runtime control.&lt;/p&gt;
&lt;p&gt;With &lt;code&gt;.gguf&lt;/code&gt; model files and quantization, Gemma 4 can be made practical on much smaller hardware budgets.&lt;/p&gt;
&lt;h2 id=&#34;4-developer-integration-transformers--vllm&#34;&gt;4) Developer integration: Transformers / vLLM
&lt;/h2&gt;&lt;p&gt;If you need Gemma 4 inside your own application:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Transformers: straightforward Python integration&lt;/li&gt;
&lt;li&gt;vLLM: high-throughput inference for stronger GPU environments&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;quick-selection&#34;&gt;Quick selection
&lt;/h2&gt;&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;Need&lt;/th&gt;
          &lt;th&gt;Recommended tools&lt;/th&gt;
          &lt;th&gt;Hardware bar&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;I just want it running now&lt;/td&gt;
          &lt;td&gt;Ollama&lt;/td&gt;
          &lt;td&gt;Low&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;I want a ChatGPT-like UI&lt;/td&gt;
          &lt;td&gt;LM Studio&lt;/td&gt;
          &lt;td&gt;Medium&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;My VRAM is limited (6GB-8GB)&lt;/td&gt;
          &lt;td&gt;Unsloth / llama.cpp&lt;/td&gt;
          &lt;td&gt;Low&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;I am building local AI apps&lt;/td&gt;
          &lt;td&gt;Ollama / Transformers / vLLM&lt;/td&gt;
          &lt;td&gt;Medium to high&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;I need fine-tuning&lt;/td&gt;
          &lt;td&gt;Unsloth Studio&lt;/td&gt;
          &lt;td&gt;Medium to high&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id=&#34;model-size-suggestion&#34;&gt;Model size suggestion
&lt;/h2&gt;&lt;p&gt;Gemma 4 comes in multiple sizes (for example E2B, E4B, 31B).&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Start with quantized E2B/E4B on mainstream laptops&lt;/li&gt;
&lt;li&gt;Move to larger variants only after your baseline pipeline is stable&lt;/li&gt;
&lt;/ul&gt;
</description>
        </item>
        <item>
        <title>How to Troubleshoot Slow `ollama pull` Model Downloads</title>
        <link>https://knightli.com/en/2026/04/09/ollama-download-slow-troubleshooting/</link>
        <pubDate>Thu, 09 Apr 2026 10:42:39 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/04/09/ollama-download-slow-troubleshooting/</guid>
        <description>&lt;p&gt;&lt;code&gt;ollama pull model_name:tag&lt;/code&gt; can be very slow in some regions, and the download process is not always stable.&lt;/p&gt;
&lt;p&gt;If your issue looks like repeated interruptions halfway through a large model download, with errors such as &lt;code&gt;TLS handshake timeout&lt;/code&gt; or &lt;code&gt;unexpected EOF&lt;/code&gt;, the bottleneck may not be &lt;code&gt;registry.ollama.ai&lt;/code&gt; itself, but the actual download path after the redirect.&lt;/p&gt;
&lt;p&gt;This article walks through a simple troubleshooting approach: first get the real model file URLs, then confirm where the traffic actually ends up, and finally optimize only the domains that matter.&lt;/p&gt;
&lt;h2 id=&#34;get-the-model-file-download-urls&#34;&gt;Get the model file download URLs
&lt;/h2&gt;&lt;p&gt;You can use the following project to extract the manifest and blob download URLs for an Ollama model directly:&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/Gholamrezadar/ollama-direct-downloader&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://github.com/Gholamrezadar/ollama-direct-downloader&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Using &lt;code&gt;gemma4:latest&lt;/code&gt; as an example, you can extract links like the following.&lt;/p&gt;
&lt;h3 id=&#34;manifest-url&#34;&gt;Manifest URL
&lt;/h3&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;https://registry.ollama.ai/v2/library/gemma4/manifests/latest
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h3 id=&#34;blob-urls&#34;&gt;Blob URLs
&lt;/h3&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;4
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;https://registry.ollama.ai/v2/library/gemma4/blobs/sha256:f0988ff50a2458c598ff6b1b87b94d0f5c44d73061c2795391878b00b2285e11
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;https://registry.ollama.ai/v2/library/gemma4/blobs/sha256:4c27e0f5b5adf02ac956c7322bd2ee7636fe3f45a8512c9aba5385242cb6e09a
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;https://registry.ollama.ai/v2/library/gemma4/blobs/sha256:7339fa418c9ad3e8e12e74ad0fd26a9cc4be8703f9c110728a992b193be85cb2
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;https://registry.ollama.ai/v2/library/gemma4/blobs/sha256:56380ca2ab89f1f68c283f4d50863c0bcab52ae3f1b9a88e4ab5617b176f71a3
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;If you only want a quick verification, you can also download the manifest and blobs directly with &lt;code&gt;curl&lt;/code&gt;:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;4
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;curl -L &lt;span class=&#34;s2&#34;&gt;&amp;#34;https://registry.ollama.ai/v2/library/gemma4/manifests/latest&amp;#34;&lt;/span&gt; -o &lt;span class=&#34;s2&#34;&gt;&amp;#34;latest&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;curl -L &lt;span class=&#34;s2&#34;&gt;&amp;#34;https://registry.ollama.ai/v2/library/gemma4/blobs/sha256:f0988ff50a2458c598ff6b1b87b94d0f5c44d73061c2795391878b00b2285e11&amp;#34;&lt;/span&gt; -o &lt;span class=&#34;s2&#34;&gt;&amp;#34;sha256-f0988ff50a2458c598ff6b1b87b94d0f5c44d73061c2795391878b00b2285e11&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;curl -L &lt;span class=&#34;s2&#34;&gt;&amp;#34;https://registry.ollama.ai/v2/library/gemma4/blobs/sha256:4c27e0f5b5adf02ac956c7322bd2ee7636fe3f45a8512c9aba5385242cb6e09a&amp;#34;&lt;/span&gt; -o &lt;span class=&#34;s2&#34;&gt;&amp;#34;sha256-4c27e0f5b5adf02ac956c7322bd2ee7636fe3f45a8512c9aba5385242cb6e09a&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;curl -L &lt;span class=&#34;s2&#34;&gt;&amp;#34;https://registry.ollama.ai/v2/library/gemma4/blobs/sha256:7339fa418c9ad3e8e12e74ad0fd26a9cc4be8703f9c110728a992b193be85cb2&amp;#34;&lt;/span&gt; -o &lt;span class=&#34;s2&#34;&gt;&amp;#34;sha256-7339fa418c9ad3e8e12e74ad0fd26a9cc4be8703f9c110728a992b193be85cb2&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h2 id=&#34;the-real-download-url-after-the-redirect&#34;&gt;The real download URL after the redirect
&lt;/h2&gt;&lt;p&gt;If you try downloading one of the blobs with &lt;code&gt;wget&lt;/code&gt;, you will notice that the request does not stay on &lt;code&gt;registry.ollama.ai&lt;/code&gt;. It gets redirected to a &lt;code&gt;Cloudflare R2&lt;/code&gt; object storage URL:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;wget https://registry.ollama.ai/v2/library/gemma4/blobs/sha256:4c27e0f5b5adf02ac956c7322bd2ee7636fe3f45a8512c9aba5385242cb6e09a
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;There are a few key details in the log:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;registry.ollama.ai&lt;/code&gt; returns &lt;code&gt;307 Temporary Redirect&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;The final download URL lands on &lt;code&gt;*.r2.cloudflarestorage.com&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;The large file transfer is actually being served by the object storage domain behind the redirect&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This matters because if your proxy or routing rules only cover &lt;code&gt;registry.ollama.ai&lt;/code&gt; but not &lt;code&gt;*.r2.cloudflarestorage.com&lt;/code&gt;, downloads can still be slow or repeatedly interrupted.&lt;/p&gt;
&lt;p&gt;Here is one example of an actual redirect log:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt; 1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 5
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 6
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 7
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 8
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 9
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;10
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;11
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;wget https://registry.ollama.ai/v2/library/gemma4/blobs/sha256:4c27e0f5b5adf02ac956c7322bd2ee7636fe3f45a8512c9aba5385242cb6e09a
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;--2026-04-09 09:22:04--  https://registry.ollama.ai/v2/library/gemma4/blobs/sha256:4c27e0f5b5adf02ac956c7322bd2ee7636fe3f45a8512c9aba5385242cb6e09a
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;Resolving registry.ollama.ai (registry.ollama.ai)... 104.21.75.227, 172.67.182.229, 2606:4700:3034::ac43:b6e5, ...
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;Connecting to registry.ollama.ai (registry.ollama.ai)|104.21.75.227|:443... connected.
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;HTTP request sent, awaiting response... 307 Temporary Redirect
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;Location: https://dd20bb891979d25aebc8bec07b2b3bbc.r2.cloudflarestorage.com/ollama/docker/registry/v2/blobs/sha256/4c/4c27e0f5b5adf02ac956c7322bd2ee7636fe3f45a8512c9aba5385242cb6e09a/data?... [following]
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;--2026-04-09 09:22:05--  https://dd20bb891979d25aebc8bec07b2b3bbc.r2.cloudflarestorage.com/ollama/docker/registry/v2/blobs/sha256/4c/4c27e0f5b5adf02ac956c7322bd2ee7636fe3f45a8512c9aba5385242cb6e09a/data?...
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;Resolving dd20bb891979d25aebc8bec07b2b3bbc.r2.cloudflarestorage.com (dd20bb891979d25aebc8bec07b2b3bbc.r2.cloudflarestorage.com)... 172.64.66.1, 2606:4700:2ff9::1
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;Connecting to dd20bb891979d25aebc8bec07b2b3bbc.r2.cloudflarestorage.com|172.64.66.1|:443... connected.
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;HTTP request sent, awaiting response... 200 OK
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;Length: 9608338848 (8.9G) [application/octet-stream]
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h2 id=&#34;adjust-your-network-settings&#34;&gt;Adjust your network settings
&lt;/h2&gt;&lt;p&gt;Once you confirm the real download path, the troubleshooting direction becomes much clearer.&lt;/p&gt;
&lt;p&gt;If you are using a proxy, split routing, or custom DNS, check these first:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Whether &lt;code&gt;registry.ollama.ai&lt;/code&gt; and &lt;code&gt;*.r2.cloudflarestorage.com&lt;/code&gt; are using the same stable route&lt;/li&gt;
&lt;li&gt;Whether your proxy rules cover only the former but miss the latter&lt;/li&gt;
&lt;li&gt;Whether your current outbound path is suitable for sustained multi-GB downloads&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The key issue here is not simply whether the official site opens, but whether the redirected object storage path is stable enough for long-running large-file transfers. In many cases, the real bottleneck is the &lt;code&gt;Cloudflare R2&lt;/code&gt; layer rather than the registry domain in front of it.&lt;/p&gt;
&lt;h2 id=&#34;before-and-after-comparison&#34;&gt;Before-and-after comparison
&lt;/h2&gt;&lt;p&gt;Here is one real-world example while downloading &lt;code&gt;gemma4:31b-it-q8_0&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Before adjusting the network path, the download was slow and failed midway:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;4
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;PS C:\Users\knightli&amp;gt; ollama run gemma4:31b-it-q8_0
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;pulling manifest
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;pulling a0feadb736f5:  38% ▕██████████████████████                                    ▏  12 GB/ 33 GB  1.2 MB/s   4h40m
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;Error: max retries exceeded: unexpected EOF
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;After the adjustment, the same model download became noticeably faster and more stable:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;PS C:\Users\knightli&amp;gt; ollama run gemma4:31b-it-q8_0
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;pulling manifest
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;pulling a0feadb736f5:  46% ▕████████████████████████████████████████████████████████████████▏ 15 GB/ 33 GB  8.5 MB/s  35m23s
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;This does not mean every network environment will see the same improvement, but it does support one useful conclusion: the bottleneck may be the actual large-file download path rather than the Ollama client itself.&lt;/p&gt;
&lt;h2 id=&#34;a-more-practical-troubleshooting-order&#34;&gt;A more practical troubleshooting order
&lt;/h2&gt;&lt;p&gt;If you run into the same issue, this order usually works well:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Run &lt;code&gt;ollama pull&lt;/code&gt; or &lt;code&gt;ollama run&lt;/code&gt; once and confirm the issue is reproducible.&lt;/li&gt;
&lt;li&gt;Test a blob URL with &lt;code&gt;wget&lt;/code&gt; or &lt;code&gt;curl -L&lt;/code&gt; and confirm whether it redirects to &lt;code&gt;*.r2.cloudflarestorage.com&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Adjust your proxy or routing only for the real download domain, then test speed and stability again.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The benefit of this order is that each step validates one clear hypothesis, so you do not have to troubleshoot blindly.&lt;/p&gt;
&lt;h2 id=&#34;conclusion&#34;&gt;Conclusion
&lt;/h2&gt;&lt;p&gt;When &lt;code&gt;ollama pull&lt;/code&gt; is slow, the problem is often not that &lt;code&gt;registry.ollama.ai&lt;/code&gt; is unreachable, but that the &lt;code&gt;Cloudflare R2&lt;/code&gt; path actually serving the large files is unstable.&lt;/p&gt;
&lt;p&gt;So instead of retrying over and over, a better approach is to identify the real download path first and optimize the network route where the traffic actually lands.&lt;/p&gt;
</description>
        </item>
        <item>
        <title>Gemma 4 on Raspberry Pi 5: It Works, But Responses Are Slow</title>
        <link>https://knightli.com/en/2026/04/08/gemma4-on-raspberry-pi5-benchmark/</link>
        <pubDate>Wed, 08 Apr 2026 18:42:00 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/04/08/gemma4-on-raspberry-pi5-benchmark/</guid>
        <description>&lt;p&gt;I ran a near-limit experiment: running Gemma 4 on a &lt;code&gt;Raspberry Pi 5 (8GB RAM)&lt;/code&gt;. I was not targeting larger variants, only the smallest &lt;code&gt;E2B&lt;/code&gt; model.&lt;/p&gt;
&lt;p&gt;Conclusion first: it runs and it is usable, but it fits low-interaction workflows better than real-time chat.&lt;/p&gt;
&lt;h2 id=&#34;test-environment&#34;&gt;Test Environment
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;Device: Raspberry Pi 5 (4-core CPU, 8GB RAM)&lt;/li&gt;
&lt;li&gt;OS: Ubuntu Server (no GUI)&lt;/li&gt;
&lt;li&gt;Access method: SSH&lt;/li&gt;
&lt;li&gt;Runtime: LM Studio CLI (command-line-only mode)&lt;/li&gt;
&lt;li&gt;Model: Gemma 4 E2B (about 4.5GB)&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;step-1-install-and-start-lm-studio-cli&#34;&gt;Step 1: Install and Start LM Studio CLI
&lt;/h2&gt;&lt;p&gt;I installed the LM Studio CLI build on the Pi, then started the service and checked available commands.&lt;/p&gt;
&lt;p&gt;For a terminal-only setup, this deployment mode is a good fit for Raspberry Pi.&lt;/p&gt;
&lt;h2 id=&#34;step-2-move-model-storage-to-ssd&#34;&gt;Step 2: Move Model Storage to SSD
&lt;/h2&gt;&lt;p&gt;To avoid heavy SD card writes, I switched model download storage to an external SSD.&lt;/p&gt;
&lt;p&gt;On Raspberry Pi 5, SSD usage is much more practical than on older models. For long-term local model runs, SSD is strongly recommended.&lt;/p&gt;
&lt;h2 id=&#34;step-3-download-and-load-gemma-4-e2b&#34;&gt;Step 3: Download and Load Gemma 4 E2B
&lt;/h2&gt;&lt;p&gt;After download, the model loaded into memory successfully.&lt;/p&gt;
&lt;p&gt;According to official information, Gemma 4 includes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Tool-calling support for agent-style workflows (function calling)&lt;/li&gt;
&lt;li&gt;Multimodal capabilities (image/video; smaller models also include audio-related capability)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;128K&lt;/code&gt; context window&lt;/li&gt;
&lt;li&gt;Apache 2.0 license (commercial use allowed)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Given Raspberry Pi hardware limits, E2B is the most practical tier to start with.&lt;/p&gt;
&lt;h2 id=&#34;step-4-start-api-and-enable-lan-access&#34;&gt;Step 4: Start API and Enable LAN Access
&lt;/h2&gt;&lt;p&gt;After loading, I started the API on local port &lt;code&gt;4000&lt;/code&gt; and confirmed model listing works via HTTP.&lt;/p&gt;
&lt;p&gt;The issue: by default, it only listens on localhost, so other LAN devices cannot access it directly.&lt;/p&gt;
&lt;p&gt;Since host binding was not exposed by the startup options, I used &lt;code&gt;socat&lt;/code&gt; for port forwarding, bridging an external Pi port to LM Studio&amp;rsquo;s internal port.&lt;/p&gt;
&lt;p&gt;Result: successful. I could query the model list from a MacBook on the same LAN.&lt;/p&gt;
&lt;h2 id=&#34;step-5-connect-to-editor-zed&#34;&gt;Step 5: Connect to Editor (Zed)
&lt;/h2&gt;&lt;p&gt;LM Studio&amp;rsquo;s local server is OpenAI-API-compatible, so most tools that support custom &lt;code&gt;base_url&lt;/code&gt; can connect.&lt;/p&gt;
&lt;p&gt;I added a new LLM provider in Zed pointing to the Pi-hosted Gemma 4 instance, and in-editor chat worked.&lt;/p&gt;
&lt;h2 id=&#34;practical-usability&#34;&gt;Practical Usability
&lt;/h2&gt;&lt;p&gt;This setup is suitable for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Local automation scripts&lt;/li&gt;
&lt;li&gt;Low-concurrency, low-real-time assistant tasks&lt;/li&gt;
&lt;li&gt;Personal learning and edge-device experimentation&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Less suitable for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;High-frequency interactive chat&lt;/li&gt;
&lt;li&gt;Development collaboration scenarios sensitive to response latency&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;conclusion&#34;&gt;Conclusion
&lt;/h2&gt;&lt;p&gt;Running Gemma 4 (E2B) on &lt;code&gt;Raspberry Pi 5&lt;/code&gt; is feasible, and the practical output quality is better than expected.&lt;/p&gt;
&lt;p&gt;If your goal is offline operation, tool integration, and lightweight-to-mid tasks, this setup is worth trying. If your goal is smooth real-time interaction, stronger hardware is still the better choice.&lt;/p&gt;
&lt;h2 id=&#34;related-posts&#34;&gt;Related Posts
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/04/05/google-gemma-4-model-comparison/&#34; &gt;Google Gemma 4 Model Comparison: How to Choose Between 2B/4B/26B/31B&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/04/08/android-gemma4-install-run-guide/&#34; &gt;How to Install and Run Gemma 4 on Android: Complete Getting-Started Guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/04/08/run-gemma4-on-laptop/&#34; &gt;How to Run Gemma 4 on a Laptop: 5-Minute Local Setup Guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/04/08/openclaw-connect-gemma4-local/&#34; &gt;Connect OpenClaw to Local Gemma 4: Complete Setup Guide&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
        </item>
        <item>
        <title>Connect OpenClaw to Local Gemma 4: Complete Setup Guide</title>
        <link>https://knightli.com/en/2026/04/08/openclaw-connect-gemma4-local/</link>
        <pubDate>Wed, 08 Apr 2026 18:18:00 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/04/08/openclaw-connect-gemma4-local/</guid>
        <description>&lt;p&gt;This guide shows how to connect &lt;code&gt;OpenClaw&lt;/code&gt; to a local &lt;code&gt;Gemma 4&lt;/code&gt; model through &lt;code&gt;Ollama&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;If you have not deployed Gemma 4 locally yet, start here:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/04/08/run-gemma4-on-laptop/&#34; &gt;How to Run Gemma 4 on a Laptop: 5-Minute Local Setup Guide&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;step-1-start-the-ollama-api-service&#34;&gt;Step 1: Start the Ollama API Service
&lt;/h2&gt;&lt;p&gt;Start Ollama first:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;ollama serve
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Then verify the API quickly with:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;4
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;curl http://localhost:11434/api/generate -d &lt;span class=&#34;s1&#34;&gt;&amp;#39;{
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;s1&#34;&gt;  &amp;#34;model&amp;#34;: &amp;#34;gemma4:12b&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;s1&#34;&gt;  &amp;#34;prompt&amp;#34;: &amp;#34;Hello&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;s1&#34;&gt;}&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;If you get a model response, your local API is ready.&lt;/p&gt;
&lt;h2 id=&#34;step-2-configure-openclaw-to-use-ollama&#34;&gt;Step 2: Configure OpenClaw to Use Ollama
&lt;/h2&gt;&lt;p&gt;The OpenClaw config file is usually located at:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;~/.openclaw/config.yaml
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Edit &lt;code&gt;config.yaml&lt;/code&gt; and add a local model entry under &lt;code&gt;models&lt;/code&gt;:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;5
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;6
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;7
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;8
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nt&#34;&gt;models&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;  &lt;/span&gt;&lt;span class=&#34;c&#34;&gt;# Your existing model config...&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;  &lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;gemma4-local&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;provider&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;l&#34;&gt;ollama&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;base_url&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;l&#34;&gt;http://localhost:11434&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;model&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;l&#34;&gt;gemma4:12b&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;timeout&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;l&#34;&gt;120s&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h2 id=&#34;step-3-set-default-model-optional&#34;&gt;Step 3: Set Default Model (Optional)
&lt;/h2&gt;&lt;p&gt;If you want Gemma 4 as the default model:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nt&#34;&gt;default_model&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;l&#34;&gt;gemma4-local&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h2 id=&#34;step-4-restart-and-verify-openclaw&#34;&gt;Step 4: Restart and Verify OpenClaw
&lt;/h2&gt;&lt;p&gt;Restart OpenClaw:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;openclaw restart
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;List available models:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;openclaw models list
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Run a quick chat test:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;openclaw chat --model gemma4-local &lt;span class=&#34;s2&#34;&gt;&amp;#34;Hello&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;If the chat returns normally, OpenClaw is successfully connected to local Gemma 4.&lt;/p&gt;
&lt;h2 id=&#34;common-troubleshooting&#34;&gt;Common Troubleshooting
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;code&gt;connection refused&lt;/code&gt;: make sure &lt;code&gt;ollama serve&lt;/code&gt; is running.&lt;/li&gt;
&lt;li&gt;Model not found: check model name with &lt;code&gt;ollama list&lt;/code&gt; (for example &lt;code&gt;gemma4:12b&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;Timeout: increase &lt;code&gt;timeout&lt;/code&gt; and test a smaller model first.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;related-posts&#34;&gt;Related Posts
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/04/05/google-gemma-4-model-comparison/&#34; &gt;Google Gemma 4 Model Comparison: How to Choose Between 2B/4B/26B/31B&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/04/08/android-gemma4-install-run-guide/&#34; &gt;How to Install and Run Gemma 4 on Android: Complete Getting-Started Guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/04/08/run-gemma4-on-laptop/&#34; &gt;How to Run Gemma 4 on a Laptop: 5-Minute Local Setup Guide&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
        </item>
        <item>
        <title>How to Run Gemma 4 on a Laptop: 5-Minute Local Setup Guide</title>
        <link>https://knightli.com/en/2026/04/08/run-gemma4-on-laptop/</link>
        <pubDate>Wed, 08 Apr 2026 18:06:00 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/04/08/run-gemma4-on-laptop/</guid>
        <description>&lt;p&gt;If you want to run Gemma 4 locally on a laptop, &lt;code&gt;Ollama&lt;/code&gt; is one of the fastest and simplest options. Even without complex setup, you can usually get it running in about five minutes.&lt;/p&gt;
&lt;h2 id=&#34;step-1-install-ollama&#34;&gt;Step 1: Install Ollama
&lt;/h2&gt;&lt;ol&gt;
&lt;li&gt;Open &lt;code&gt;https://ollama.com&lt;/code&gt; and download the installer for your OS.&lt;/li&gt;
&lt;li&gt;Complete installation based on your system:&lt;/li&gt;
&lt;/ol&gt;
&lt;ul&gt;
&lt;li&gt;macOS: drag it to &lt;code&gt;Applications&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Windows: run the &lt;code&gt;.exe&lt;/code&gt; installer.&lt;/li&gt;
&lt;li&gt;Linux: use the install script from the official site.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;After installation, Ollama runs as a background service. Beyond initial setup, daily usage is mostly simple commands.&lt;/p&gt;
&lt;h2 id=&#34;step-2-download-a-gemma-4-model&#34;&gt;Step 2: Download a Gemma 4 Model
&lt;/h2&gt;&lt;p&gt;Open a terminal and run:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;ollama pull gemma4:4b
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;If your machine is stronger, you can switch to &lt;code&gt;12b&lt;/code&gt; or &lt;code&gt;27b&lt;/code&gt;. Once downloaded, the model is stored locally.&lt;/p&gt;
&lt;p&gt;Check downloaded models with:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;ollama list
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h2 id=&#34;step-3-run-the-model&#34;&gt;Step 3: Run the Model
&lt;/h2&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;ollama run gemma4:4b
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;This opens an interactive chat session in your terminal. Type your prompt and press Enter. To exit, type:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;/bye
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;If you prefer a browser chat UI, you can pair it with &lt;code&gt;Open WebUI&lt;/code&gt;. It wraps Ollama with a local web interface and is usually quick to set up with Docker.&lt;/p&gt;
&lt;h2 id=&#34;laptop-performance-tips&#34;&gt;Laptop Performance Tips
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;Apple Silicon (M2/M3/M4): Metal acceleration is enabled by default, and &lt;code&gt;12B&lt;/code&gt; can run well.&lt;/li&gt;
&lt;li&gt;NVIDIA GPU: CUDA is used automatically when a compatible GPU is detected. Keep drivers updated.&lt;/li&gt;
&lt;li&gt;CPU-only inference: works, but larger models will be slower. For most CPU-only setups, &lt;code&gt;4B&lt;/code&gt; is the practical default.&lt;/li&gt;
&lt;li&gt;Free memory before loading large models: as a rough rule, each billion parameters needs about &lt;code&gt;0.5GB to 1GB&lt;/code&gt; RAM.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;how-to-choose-a-model&#34;&gt;How to Choose a Model
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;code&gt;Gemma 4 1B&lt;/code&gt;: good for lightweight Q&amp;amp;A, simple summarization, and quick lookups; limited on complex reasoning.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Gemma 4 4B&lt;/code&gt;: best for most daily tasks (writing help, coding help, document summarization) with strong speed/quality balance.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Gemma 4 12B&lt;/code&gt;: better for longer context and more complex tasks, especially coding and reasoning.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Gemma 4 27B&lt;/code&gt;: better for high-demand workloads and closer to frontier-cloud quality, but needs significantly stronger hardware.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;related-posts&#34;&gt;Related Posts
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/04/05/google-gemma-4-model-comparison/&#34; &gt;Google Gemma 4 Model Comparison: How to Choose Between 2B/4B/26B/31B&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/04/08/android-gemma4-install-run-guide/&#34; &gt;How to Install and Run Gemma 4 on Android: Complete Getting-Started Guide&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
        </item>
        <item>
        <title>How to Install and Run Gemma 4 on Android: Complete Getting-Started Guide</title>
        <link>https://knightli.com/en/2026/04/08/android-gemma4-install-run-guide/</link>
        <pubDate>Wed, 08 Apr 2026 17:55:53 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/04/08/android-gemma4-install-run-guide/</guid>
        <description>&lt;p&gt;If you want to run Gemma 4 offline on your phone, this guide walks you through the full process from setup to practical usage.&lt;/p&gt;
&lt;h2 id=&#34;step-1-get-the-app&#34;&gt;Step 1: Get the App
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;Google AI Edge Gallery&lt;/code&gt; is currently not available on Google Play, so you need to install it via APK sideloading.&lt;/p&gt;
&lt;p&gt;On your Android device, go to:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;Settings -&amp;gt; Apps -&amp;gt; Special app access -&amp;gt; Install unknown apps&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Then:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Find your browser (for example, Chrome or Firefox) and enable &amp;ldquo;Allow from this source.&amp;rdquo;&lt;/li&gt;
&lt;li&gt;Open the &lt;code&gt;Google AI Edge Gallery&lt;/code&gt; GitHub Releases page in your mobile browser.&lt;/li&gt;
&lt;/ol&gt;
&lt;ul&gt;
&lt;li&gt;URL: &lt;a class=&#34;link&#34; href=&#34;https://github.com/google-ai-edge/gallery/releases&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://github.com/google-ai-edge/gallery/releases&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;ol start=&#34;3&#34;&gt;
&lt;li&gt;Download the latest &lt;code&gt;.apk&lt;/code&gt; package.&lt;/li&gt;
&lt;li&gt;After the download completes, open the file from notifications or your file manager and follow the prompts.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;With a stable connection, this step usually takes around 2 minutes.&lt;/p&gt;
&lt;h2 id=&#34;step-2-open-the-app-and-grant-permissions&#34;&gt;Step 2: Open the App and Grant Permissions
&lt;/h2&gt;&lt;p&gt;When you first open &lt;code&gt;AI Edge Gallery&lt;/code&gt;, it will request storage permission to save model files. It&amp;rsquo;s best to allow this; otherwise, the app cannot download or load models.&lt;/p&gt;
&lt;p&gt;You will typically see these sections on the home screen:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;Ask Image&lt;/code&gt;: Vision tasks (describe images, answer questions about photos)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;AI Chat&lt;/code&gt;: Standard text chat&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Summarize&lt;/code&gt;: Paste text and generate summaries&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Smart Reply&lt;/code&gt;: Generate reply suggestions&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For most users, &lt;code&gt;AI Chat&lt;/code&gt; is the primary entry point.&lt;/p&gt;
&lt;h2 id=&#34;step-3-download-a-gemma-4-model&#34;&gt;Step 3: Download a Gemma 4 Model
&lt;/h2&gt;&lt;ol&gt;
&lt;li&gt;Enter &lt;code&gt;AI Chat&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Tap &lt;code&gt;Get Models&lt;/code&gt; when prompted.&lt;/li&gt;
&lt;li&gt;Choose a Gemma 4 model from the list (model size is shown).&lt;/li&gt;
&lt;li&gt;Pick based on your device capability; if your phone has &lt;code&gt;8GB RAM&lt;/code&gt;, start with &lt;code&gt;Gemma 4 4B&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Tap &lt;code&gt;Download&lt;/code&gt; and let it run in the background.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Note: Larger models take longer to download. You can download multiple models and switch between them later. Downloaded models stay on your device, so you do not need to re-download them.&lt;/p&gt;
&lt;h2 id=&#34;step-4-start-chatting&#34;&gt;Step 4: Start Chatting
&lt;/h2&gt;&lt;p&gt;After the model download is finished:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Tap the model name to load it (the first load usually takes 10 to 30 seconds depending on model size and device performance).&lt;/li&gt;
&lt;li&gt;Enter your prompt in the chat box and send it.&lt;/li&gt;
&lt;li&gt;The model generates responses locally, and your data does not leave the phone.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The first reply is often slower due to model warm-up. Later messages in the same session are usually faster.&lt;/p&gt;
&lt;h2 id=&#34;step-5-try-vision-features-gemma-4-multimodal&#34;&gt;Step 5: Try Vision Features (Gemma 4 Multimodal)
&lt;/h2&gt;&lt;p&gt;If you downloaded a Gemma 4 multimodal variant:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Go back to the main menu and open &lt;code&gt;Ask Image&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Select an image or take a photo.&lt;/li&gt;
&lt;li&gt;Ask a question (for example, &amp;ldquo;What&amp;rsquo;s in this image?&amp;rdquo; or &amp;ldquo;Is there any text I should pay attention to?&amp;rdquo;).&lt;/li&gt;
&lt;li&gt;Wait for the model to analyze the image locally and return a result.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This feature works offline, and your image is not sent to external servers.&lt;/p&gt;
&lt;h2 id=&#34;related-posts&#34;&gt;Related Posts
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/04/05/google-gemma-4-model-comparison/&#34; &gt;Google Gemma 4 Model Comparison: How to Choose Between 2B/4B/26B/31B&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/04/08/run-gemma4-on-laptop/&#34; &gt;How to Run Gemma 4 on a Laptop: 5-Minute Local Setup Guide&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
        </item>
        <item>
        <title>Google Gemma 4 Model Comparison: How to Choose Between 2B/4B/26B/31B</title>
        <link>https://knightli.com/en/2026/04/05/google-gemma-4-model-comparison/</link>
        <pubDate>Sun, 05 Apr 2026 08:30:00 +0800</pubDate>
        
        <guid>https://knightli.com/en/2026/04/05/google-gemma-4-model-comparison/</guid>
        <description>&lt;p&gt;Gemma 4 focuses on &lt;code&gt;multimodality&lt;/code&gt; and &lt;code&gt;local offline inference&lt;/code&gt;, with a full range from lightweight to high-performance models. For most local deployment users, the key is not choosing the largest model, but choosing the one that best matches hardware and task needs.&lt;/p&gt;
&lt;h2 id=&#34;gemma-4-model-comparison&#34;&gt;Gemma 4 Model Comparison
&lt;/h2&gt;&lt;blockquote&gt;
&lt;p&gt;The table below is for quick model selection. Actual performance and resource usage should be validated in your own environment.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;Model&lt;/th&gt;
          &lt;th&gt;Parameter Size&lt;/th&gt;
          &lt;th&gt;Positioning&lt;/th&gt;
          &lt;th&gt;Key Strengths&lt;/th&gt;
          &lt;th&gt;Main Limitations&lt;/th&gt;
          &lt;th&gt;Recommended Scenarios&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;Gemma 4 2B&lt;/td&gt;
          &lt;td&gt;2B&lt;/td&gt;
          &lt;td&gt;Ultra-lightweight&lt;/td&gt;
          &lt;td&gt;Low latency, low resource usage, lowest deployment barrier&lt;/td&gt;
          &lt;td&gt;Limited performance on complex reasoning and long task chains&lt;/td&gt;
          &lt;td&gt;Mobile, IoT, lightweight Q&amp;amp;A, simple automation&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Gemma 4 4B&lt;/td&gt;
          &lt;td&gt;4B&lt;/td&gt;
          &lt;td&gt;Lightweight enhanced&lt;/td&gt;
          &lt;td&gt;Stronger understanding and generation than 2B, still easy to deploy locally&lt;/td&gt;
          &lt;td&gt;Limited ceiling for heavy coding and complex agent tasks&lt;/td&gt;
          &lt;td&gt;Local assistant, basic document work, multilingual daily tasks&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Gemma 4 26B&lt;/td&gt;
          &lt;td&gt;26B&lt;/td&gt;
          &lt;td&gt;High-performance (MoE)&lt;/td&gt;
          &lt;td&gt;Better reasoning and tool use, suitable for production workflows&lt;/td&gt;
          &lt;td&gt;Significantly higher VRAM requirement and hardware threshold&lt;/td&gt;
          &lt;td&gt;Coding assistant, complex workflows, enterprise internal agents&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Gemma 4 31B&lt;/td&gt;
          &lt;td&gt;31B&lt;/td&gt;
          &lt;td&gt;High-performance (dense)&lt;/td&gt;
          &lt;td&gt;Best overall capability and stronger stability on complex tasks&lt;/td&gt;
          &lt;td&gt;Highest resource cost and tuning complexity&lt;/td&gt;
          &lt;td&gt;Advanced reasoning, complex coding tasks, heavy automation&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id=&#34;how-to-choose-start-from-hardware-and-tasks&#34;&gt;How to Choose: Start from Hardware and Tasks
&lt;/h2&gt;&lt;p&gt;If your top concern is whether it runs smoothly, use this guideline:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;8GB&lt;/code&gt; VRAM: prioritize &lt;code&gt;2B/4B&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;12GB&lt;/code&gt; VRAM: prioritize &lt;code&gt;4B&lt;/code&gt; or quantized variants of larger models.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;24GB&lt;/code&gt; VRAM: focus on &lt;code&gt;26B&lt;/code&gt;, and evaluate quantized &lt;code&gt;31B&lt;/code&gt; based on workload.&lt;/li&gt;
&lt;li&gt;Higher VRAM or multi-GPU: consider high-precision &lt;code&gt;31B&lt;/code&gt; setups.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Prioritize stability and inference speed first, then scale up model size gradually.&lt;/p&gt;
&lt;h2 id=&#34;four-typical-use-cases&#34;&gt;Four Typical Use Cases
&lt;/h2&gt;&lt;h3 id=&#34;1-local-general-assistant&#34;&gt;1) Local General Assistant
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;Preferred model: &lt;code&gt;4B&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Why: strong balance between cost and quality, suitable for long-running local use.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;2-coding-and-automation&#34;&gt;2) Coding and Automation
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;Preferred model: &lt;code&gt;26B&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Why: more stable in multi-step tasks, tool calls, and script generation.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;3-advanced-reasoning-and-complex-agents&#34;&gt;3) Advanced Reasoning and Complex Agents
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;Preferred model: &lt;code&gt;31B&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Why: stronger robustness under complex context.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;4-edge-devices-and-lightweight-offline-use&#34;&gt;4) Edge Devices and Lightweight Offline Use
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;Preferred model: &lt;code&gt;2B&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Why: easiest to deploy on resource-constrained devices.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;deployment-suggestions-ollama&#34;&gt;Deployment Suggestions (Ollama)
&lt;/h2&gt;&lt;p&gt;A practical approach is to iterate in small steps:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Start with &lt;code&gt;4B&lt;/code&gt; to establish a baseline (latency, memory, quality).&lt;/li&gt;
&lt;li&gt;Build a fixed test set from real tasks (for example, 20 common questions + 10 automation tasks).&lt;/li&gt;
&lt;li&gt;Compare &lt;code&gt;26B/31B&lt;/code&gt; against that set for accuracy, latency, and VRAM cost.&lt;/li&gt;
&lt;li&gt;Upgrade only when the gain is clear.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This avoids jumping to a large model too early and running into lag, low throughput, and maintenance overhead.&lt;/p&gt;
&lt;h2 id=&#34;conclusion&#34;&gt;Conclusion
&lt;/h2&gt;&lt;p&gt;The real value of Gemma 4 is not just larger parameter counts, but a practical model ladder from lightweight to high-performance:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;For low-cost fast rollout: start with &lt;code&gt;2B/4B&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;For production-grade local AI workflows: prioritize &lt;code&gt;26B&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;For advanced reasoning and heavy automation: move to &lt;code&gt;31B&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In most cases, the best Gemma 4 choice is not the biggest model, but the one with the best fit for your hardware and task goals.&lt;/p&gt;
&lt;!-- ollama-related-links:start --&gt;
&lt;h2 id=&#34;related-posts&#34;&gt;Related Posts
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/04/05/llm-quantization-guide-fp16-q4-q2/&#34; &gt;LLM Quantization Guide (FP16/Q8/Q5/Q4/Q2)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/04/06/uninstall-ollama-on-linux/&#34; &gt;Completely Uninstall Ollama on Linux&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/04/06/ollama-model-storage-path-and-migration/&#34; &gt;Ollama Model Storage Path and Migration&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/04/06/check-ollama-model-loaded-on-gpu/&#34; &gt;How to Check Whether Ollama Uses GPU&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/04/08/android-gemma4-install-run-guide/&#34; &gt;How to Install and Run Gemma 4 on Android (Chinese)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://knightli.com/en/2026/04/08/run-gemma4-on-laptop/&#34; &gt;How to Run Gemma 4 on a Laptop: 5-Minute Local Setup Guide&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;!-- ollama-related-links:end --&gt;
</description>
        </item>
        
    </channel>
</rss>
