Tag: Local LLM (28 pages)
Running DeepSeek 4 Locally: Antirez's ds4 Experiment on Apple Silicon Mac
A Practical llama.cpp Multi-GPU Benchmarking Approach: Is 2x V100 16GB Faster Than One 32GB Card?
RTX 5090 / 5080 AI Inference Benchmarks: Choosing for Local LLMs, 4K Video, and Real-Time 3D
DeepSeek V4 Local Private Deployment: Choosing Between Domestic Chips and Consumer GPU Clusters
Local LLMs Recommended for an RTX 3060 GPU
Hermes + Qwen3.6: A Low-Cost Local Agent Deployment
NVIDIA Releases Nemotron 3 Nano Omni: An Open Omnimodal Reasoning Model for Agents
Running Qwen3.6 Locally: VRAM Requirements for 27B and 35B-A3B Quantized Models
Running DeepSeek V4 Locally: VRAM Estimates for Pro, Flash, and Base Versions
Running Gemma 4 Locally: VRAM Requirements for E2B, E4B, 26B, and 31B Quantized Models
How to Tune llama.cpp on 8GB VRAM: Why 32K Is Safer and 64K Needs KV Cache Quantization
A 16GB GPU Can Still Run 35B Models: VRAM Compression Strategies for MoE Models in LM Studio
Ollama Multi-GPU Notes: VRAM Pooling, GPU Selection, and Common Misunderstandings
Gemma 4 E4B Uncensored vs Official: What Actually Changes
How to Use llama-quantize for GGUF Models
How to Get GGUF Models from Hugging Face with llama.cpp
What Does `it` Mean in Gemma-4-31B-it?
Choosing Llama GGUF Quantization on Hugging Face: Practical Advice from Q8 to Q2
How to Access a Local Ollama API Over LAN on Windows
Gemma 4 Local Runtime Guide: From One-Command Start to Dev Integration
What Are Ollama Cloud Models and How Do You Use Them?
How to Download a GGUF Model from Hugging Face and Import It into Ollama
How to Troubleshoot Slow `ollama pull` Model Downloads
Gemma 4 on Raspberry Pi 5: It Works, But Responses Are Slow
Connect OpenClaw to Local Gemma 4: Complete Setup Guide
How to Run Gemma 4 on a Laptop: 5-Minute Local Setup Guide
How to Install and Run Gemma 4 on Android: Complete Getting-Started Guide
Google Gemma 4 Model Comparison: How to Choose Between 2B/4B/26B/31B