Tag: Local LLM (28 pages)
Running DeepSeek 4 Locally: Antirez's ds4 Experiment on Apple Silicon Mac
A Practical llama.cpp Multi-GPU Benchmarking Approach: Is 2x V100 16GB Faster Than One 32GB Card?
RTX 5090 / 5080 AI Inference Benchmarks: Choosing for Local LLMs, 4K Video, and Real-Time 3D
DeepSeek V4 Local Private Deployment: Choosing Between Domestic Chips and Consumer GPU Clusters
Local LLMs Recommended for an RTX 3060 GPU
Hermes + Qwen3.6: A Low-Cost Local Agent Deployment
NVIDIA Releases Nemotron 3 Nano Omni: An Open Omnimodal Reasoning Model for Agents
Running Qwen3.6 Locally: VRAM Requirements for 27B and 35B-A3B Quantized Models
Running DeepSeek V4 Locally: VRAM Estimates for Pro, Flash, and Base Versions
Running Gemma 4 Locally: VRAM Requirements for E2B, E4B, 26B, and 31B Quantized Models
How to Tune llama.cpp on 8GB VRAM: Why 32K Is Safer and 64K Needs KV Cache Quantization
A 16GB GPU Can Still Run 35B Models: VRAM Compression Strategies for MoE Models in LM Studio
Ollama Multi-GPU Notes: VRAM Pooling, GPU Selection, and Common Misunderstandings
Gemma 4 E4B Uncensored vs Official: What Actually Changes
How to Use llama-quantize for GGUF Models
How to Get GGUF Models from Hugging Face with llama.cpp
What Does `it` Mean in Gemma-4-31B-it?
Choosing Llama GGUF Quantization on Hugging Face: Practical Advice from Q8 to Q2
How to Access a Local Ollama API Over LAN on Windows
Gemma 4 Local Runtime Guide: From One-Command Start to Dev Integration
What Are Ollama Cloud Models and How Do You Use Them?
How to Download a GGUF Model from Hugging Face and Import It into Ollama
How to Troubleshoot Slow `ollama pull` Model Downloads
Gemma 4 on Raspberry Pi 5: It Works, But Responses Are Slow
Connect OpenClaw to Local Gemma 4: Complete Setup Guide
How to Run Gemma 4 on a Laptop: 5-Minute Local Setup Guide
How to Install and Run Gemma 4 on Android: Complete Getting-Started Guide
Google Gemma 4 Model Comparison: How to Choose Between 2B/4B/26B/31B