🍥

Recording and sharing everyday life

Tags

11 pages

GGUF

Can an RTX 3060 Run 35B? llama.cpp --n-cpu-moe Keeps Old PCs Useful for Local LLMs

Qwen3.6-35B-A3B Jailbreak Local Deployment: Uncensored GGUF, llama.cpp, and Safety Boundaries

Running Qwen3.6-35B Locally on an RTX 3070 8GB: llama.cpp Deployment Notes and Tuning Parameters

llama.cpp b9196 Update: Windows Prebuilt Binaries Support CUDA 13.1, Vulkan, HIP, and SYCL

Local LLM Models Recommended for an RTX 3060 GPU

Running Qwen3.6 Locally: VRAM Requirements for 27B and 35B-A3B Quantized Models

Running Gemma 4 Locally: VRAM Requirements for E2B, E4B, 26B, and 31B Quantized Models

How to Use llama-quantize for GGUF Models

How to Get GGUF Models from Hugging Face with llama.cpp

Choosing Llama GGUF Quantization on Hugging Face: Practical Advice from Q8 to Q2

How to Download a GGUF Model from Hugging Face and Import It into Ollama