🍥

Recording and sharing everyday life

Tags

7 pages

Gemma

Gemma 4 MTP Tuning: Pushing Toward 120 tokens/s With an assistant Draft Model

What Is Gemma 4 assistant-MTP: How Multi-Token Prediction Draft Models Speed Up Inference

Running Gemma 4 12B on 8GB VRAM: Tuning llama-cli Hybrid Offload Parameters

Deploying DiffusionGemma Locally: Running Google’s Text Diffusion Model with vLLM

DiffusionGemma: Google Brings Diffusion Models into Text Generation

How to Use Gemma 4 12B: Hugging Face Model Card and Local Loading Guide

Can Gemma 4 12B Run Locally? 16GB PC Trial and Getting Started Notes