Tag: GGUF (12 pages)
Running Gemma 4 12B on 8GB VRAM: Tuning llama-cli Hybrid Offload Parameters
Can an RTX 3060 Run 35B? llama.cpp --n-cpu-moe Keeps Old PCs Useful for Local LLMs
Qwen3.6-35B-A3B jailbreak local deployment: uncensored GGUF, llama.cpp, and safety boundaries
Running Qwen3.6-35B Locally on an RTX 3070 8GB: llama.cpp Deployment Notes and Tuning Parameters
llama.cpp b9196 Update: Windows Prebuilt Binaries Support CUDA 13.1, Vulkan, HIP, and SYCL
Best Local LLMs for RTX 3060 12GB: GGUF Models, Quantization, and VRAM Tips
Qwen3.6 VRAM Table: Run 27B and 35B-A3B GGUF Models on 8GB, 16GB, and 24GB GPUs
Gemma 4 VRAM Table: Run E2B, E4B, 26B, and 31B GGUF Models on Local GPUs
How to Use llama-quantize for GGUF Models
How to Get GGUF Models from Hugging Face with llama.cpp
Choosing Llama GGUF Quantization on Hugging Face: Practical Advice from Q8 to Q2
How to Download a GGUF Model from Hugging Face and Import It into Ollama