Model Download on KnightLi Blog

Where Does llama-cli -hf Save Hugging Face Models by Default

Fri, 17 Apr 2026 14:48:04 +0800

If you use llama-cli to download and run a model directly from Hugging Face, for example:

`llama-cli -hf unsloth/gemma-4-E4B-it-GGUF`

this uses the Hugging Face download support built into llama.cpp. Recent llama.cpp builds store models downloaded with -hf in the standard Hugging Face Hub cache directory.

Default cache locations

llama-cli -hf checks the LLAMA_CACHE environment variable first. If LLAMA_CACHE is not set, llama.cpp falls back to the Hugging Face cache variables: HF_HUB_CACHE, HUGGINGFACE_HUB_CACHE, and HF_HOME.

If none of those variables are set, common default paths are:

System	Default cache directory
Linux	`~/.cache/huggingface/hub`
macOS	`~/.cache/huggingface/hub`
Windows	`%USERPROFILE%\.cache\huggingface\hub`

On Windows, %USERPROFILE% usually expands to:

`C:\Users\<username>`

So the default cache directory is roughly:

`C:\Users\<username>\.cache\huggingface\hub`
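To check which directory applies on a Linux or macOS machine, the lookup order described above can be sketched as a small shell function. This is a minimal sketch, not llama.cpp's actual code: `resolve_llama_cache` is a hypothetical helper, the precedence among the Hugging Face variables is assumed to follow the order in which they are listed, and other variables that may influence the default (such as XDG_CACHE_HOME) are ignored.

```shell
# Hypothetical helper: prints the cache directory implied by the
# lookup order described in this post (assumed, not read from llama.cpp).
resolve_llama_cache() {
    if [ -n "${LLAMA_CACHE:-}" ]; then
        printf '%s\n' "$LLAMA_CACHE"
    elif [ -n "${HF_HUB_CACHE:-}" ]; then
        printf '%s\n' "$HF_HUB_CACHE"
    elif [ -n "${HUGGINGFACE_HUB_CACHE:-}" ]; then
        printf '%s\n' "$HUGGINGFACE_HUB_CACHE"
    elif [ -n "${HF_HOME:-}" ]; then
        # HF_HOME points at the Hugging Face root; the Hub cache is its hub/ subfolder.
        printf '%s\n' "$HF_HOME/hub"
    else
        # Default on Linux / macOS when none of the variables are set.
        printf '%s\n' "$HOME/.cache/huggingface/hub"
    fi
}

resolve_llama_cache
```

Running it in the same terminal you start llama-cli from shows the directory that your current environment variables select.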

How to change the llama-cli cache directory

Set LLAMA_CACHE if you want to store the downloaded models on a specific disk or in a specific folder. You can also follow the Hugging Face convention and set HF_HOME; in that case, the Hub cache directory will be $HF_HOME/hub.

Windows CMD example (applies only to the current session):

```bat
set LLAMA_CACHE=D:\models\llama-cache
llama-cli -hf unsloth/gemma-4-E4B-it-GGUF
```

PowerShell example (applies only to the current session):

```powershell
$env:LLAMA_CACHE="D:\models\llama-cache"
llama-cli -hf unsloth/gemma-4-E4B-it-GGUF
```

Linux / macOS example (applies only to the current shell session):

```bash
export LLAMA_CACHE=/data/models/llama-cache
llama-cli -hf unsloth/gemma-4-E4B-it-GGUF
```
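After changing the cache directory, it can help to confirm where the files actually landed. The sketch below lists every `.gguf` file under a given directory. `list_cached_gguf` is a hypothetical helper, and the only assumption about the cache layout is that model files keep their `.gguf` extension somewhere below the cache root.

```shell
# Hypothetical helper: list all .gguf files below a cache directory.
list_cached_gguf() {
    cache_dir=$1
    # A missing directory simply means nothing has been downloaded there yet.
    [ -d "$cache_dir" ] || return 0
    # -L follows symlinks, in case the cache stores links to the real files.
    find -L "$cache_dir" -type f -name '*.gguf' | sort
}

# Check the custom location if set, otherwise the Linux / macOS default.
list_cached_gguf "${LLAMA_CACHE:-$HOME/.cache/huggingface/hub}"
```

An empty result after a successful download suggests that the variable was set in a different terminal session than the one that ran llama-cli.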

Summary

llama-cli -hf ... uses llama.cpp's built-in download logic, and recent builds store the files in the Hugging Face Hub cache by default.
Linux / macOS default: ~/.cache/huggingface/hub
Windows default: %USERPROFILE%\.cache\huggingface\hub
To change the location, set LLAMA_CACHE, or set HF_HOME / HF_HUB_CACHE

How to Fix SSL Certificate Verification Failed When llama-cli Downloads from Hugging Face on Windows

Fri, 17 Apr 2026 14:20:29 +0800

If you run this command on Windows:

`llama-cli -hf unsloth/gemma-4-E4B-it-GGUF`

and see an error like this:

```text
get_repo_commit: error: HTTPLIB failed: SSL server verification failed
error: failed to download model from Hugging Face
```

the problem is usually not CUDA or llama.cpp itself. More often, the program cannot find a usable set of trusted root (CA) certificates in the current environment, so HTTPS certificate verification fails.

The log for this run also shows that ggml-rpc.dll and ggml-cpu-alderlake.dll loaded successfully, which means the runtime environment is mostly fine. The failure is confined to the model download step.

The easiest workaround: download the model manually

If you just want to get it running quickly, downloading the model manually is usually the most stable option.

Open the matching Hugging Face repository page.
Download the required .gguf file from Files and versions.
After the download finishes, run it with the local file path:

`llama-cli -m C:\Users\knightli\Downloads\gemma-4-e4b-it.gguf`

This skips the -hf download step entirely, so llama-cli never makes the HTTPS request that fails verification. It is useful when you only want to verify that the model can run locally.
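A manual download can silently save an HTML error page instead of the model, for example when a proxy or login page intercepts the request. GGUF files begin with the four ASCII bytes `GGUF`, so a quick check is possible before you pass the file to llama-cli. The sketch below assumes a POSIX shell (on Windows, for example Git Bash or WSL); `is_gguf` and the file path are hypothetical.

```shell
# Hypothetical helper: succeeds only if the file starts with the GGUF magic bytes.
is_gguf() {
    [ -f "$1" ] || return 1
    # Read the first 4 bytes; strip NUL bytes so the shell can hold the value.
    magic=$(head -c 4 "$1" | tr -d '\0')
    [ "$magic" = "GGUF" ]
}

model="$HOME/Downloads/gemma-4-e4b-it.gguf"   # hypothetical path, adjust to your download
if is_gguf "$model"; then
    echo "looks like a GGUF file: $model"
else
    echo "missing or not a GGUF file: $model"
fi
```

The check only inspects the header, so it does not detect a truncated download. Comparing the file size with the one shown on the repository page covers that case.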

If you still want to use `-hf` automatic download

You can manually specify a certificate file path so the program can find a usable CA bundle in the current session.

cacert.pem can be obtained from the CA Extract page maintained by the curl project:

Page: https://curl.se/docs/caextract.html
Direct download: https://curl.se/ca/cacert.pem

If you download it in a browser, open the direct download link and save it as cacert.pem. You can also download it to a fixed directory with PowerShell:

```powershell
New-Item -ItemType Directory -Force C:\certs
Invoke-WebRequest -Uri https://curl.se/ca/cacert.pem -OutFile C:\certs\cacert.pem
```

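After the download finishes, set these variables in the same terminal session you will run llama-cli from. The commands below use CMD syntax; in PowerShell, use the `$env:SSL_CERT_FILE="C:\certs\cacert.pem"` form instead:

```bat
set SSL_CERT_FILE=C:\certs\cacert.pem
set CURL_CA_BUNDLE=C:\certs\cacert.pem
```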

Then run the original command again:

`llama-cli -hf unsloth/gemma-4-E4B-it-GGUF`

If the issue really comes from the certificate chain, this usually fixes it directly.
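If the error persists, one thing worth ruling out is a bad bundle file, for instance an HTML page saved under the name cacert.pem. A PEM bundle contains one `-----BEGIN CERTIFICATE-----` line per certificate, so counting those lines gives a rough check. This is a minimal sketch that assumes a POSIX shell (such as Git Bash or WSL on Windows); `count_pem_certs` is a hypothetical helper, and the fallback path assumes the Git Bash style mapping of `C:\certs`.

```shell
# Hypothetical helper: print how many certificates a PEM bundle contains.
count_pem_certs() {
    # A missing file counts as zero certificates.
    [ -f "$1" ] || { echo 0; return 0; }
    # grep -c prints the number of matching lines; it exits non-zero on
    # zero matches, which "|| true" turns back into success.
    grep -c -e '-----BEGIN CERTIFICATE-----' "$1" || true
}

count_pem_certs "${SSL_CERT_FILE:-/c/certs/cacert.pem}"
```

A result of 0 means the file is missing or is not a certificate bundle, and downloading it again is the first thing to try.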

How to Get GGUF Models from Hugging Face with llama.cpp

Sun, 12 Apr 2026 09:31:38 +0800

llama.cpp can work directly with GGUF models hosted on Hugging Face, so you do not always need to download model files manually first.

If a model repository already provides GGUF files, you can use the -hf argument in the CLI, for example:

`llama-cli -hf ggml-org/gemma-3-1b-it-GGUF`

By default, this downloads from Hugging Face.
If you use another service that exposes a Hugging Face compatible API, you can switch the download endpoint with the MODEL_ENDPOINT environment variable.
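To make the endpoint switch concrete, the sketch below builds a download URL from MODEL_ENDPOINT using Hugging Face's public `resolve/main` URL pattern. It is only an illustration: `build_model_url` and the file name `model.gguf` are hypothetical, and the way llama.cpp assembles its requests internally may differ.

```shell
# Hypothetical helper: compose a file URL in the Hugging Face "resolve" style.
build_model_url() {
    repo=$1   # e.g. ggml-org/gemma-3-1b-it-GGUF
    file=$2   # e.g. model.gguf (hypothetical file name)
    # Fall back to Hugging Face when MODEL_ENDPOINT is not set.
    endpoint=${MODEL_ENDPOINT:-https://huggingface.co}
    # Drop one trailing slash so the URL does not contain "//".
    endpoint=${endpoint%/}
    printf '%s/%s/resolve/main/%s\n' "$endpoint" "$repo" "$file"
}

build_model_url ggml-org/gemma-3-1b-it-GGUF model.gguf
```

With MODEL_ENDPOINT exported to a compatible service, the same repository and file names resolve against that host instead of huggingface.co.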

One important detail is that llama.cpp only works directly with the GGUF format.
If your model is in another format, you need to convert it first with the convert_*.py scripts provided in the repository.

Hugging Face also offers several online tools related to llama.cpp, including:

converting models to GGUF
quantizing weights to reduce size
converting LoRA adapters
editing GGUF metadata in the browser
hosting llama.cpp inference endpoints

If you only want the practical takeaway, start with repositories that already provide GGUF, then use llama-cli -hf <user>/<model>. In most cases, that is the simplest path.