What are Ollama cloud models and how do you use them

A brief explanation of what Ollama cloud models are, how they differ from local models, and how to use them from the command line or via API.

If you already use Ollama to run local models, cloud models are easy to understand.

There is only one core difference:
local models run on your own machine, while cloud models run on Ollama’s cloud infrastructure and return the result to you.

What are Ollama cloud models

Ollama cloud models keep the Ollama workflow, but move the actual computation from your local machine to the cloud.

The main benefits are:

  • Less pressure on local hardware
  • Easier access to larger models that your machine cannot run well
  • You can keep using the familiar Ollama workflow

How they differ from local models

| Item | Local models | Cloud models |
| --- | --- | --- |
| Runtime location | Your machine | Ollama's cloud |
| Hardware requirements | High | Low |
| Latency | Usually lower | Affected by network |
| Privacy | Stronger | Requests are sent to the cloud |

If you care more about privacy, low latency, and offline use, local models are a better fit.
If your hardware is limited but you still want to use larger models, cloud models are more convenient.

How to identify a cloud model

At the moment, Ollama cloud models are typically labeled with a -cloud suffix, for example:

```shell
gpt-oss:120b-cloud
```

The available model list may change over time, so the official Ollama pages should be treated as the source of truth.
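Since the distinction is carried by the tag itself, you can detect it with a simple string check. The helper below is a hypothetical convenience for illustration, not part of the ollama package, and it assumes the current `-cloud` naming convention holds:

```python
def is_cloud_model(name: str) -> bool:
    """Return True if an Ollama model tag uses the -cloud suffix.

    Assumes the "model:tag" format, where cloud variants currently
    end their tag in "-cloud" (e.g. "gpt-oss:120b-cloud").
    """
    # Take the tag after the colon, or the whole name if there is none
    tag = name.split(":", 1)[-1]
    return tag.endswith("-cloud")
```

If Ollama changes how cloud models are labeled, this check would need updating along with it.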

How to use them

First, sign in:

```shell
ollama signin
```

After that, run a cloud model directly:

```shell
ollama run gpt-oss:120b-cloud
```

If you are calling it from code, you can also configure an API key:

```shell
export OLLAMA_API_KEY=your_api_key
```
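Because a missing key only surfaces as an error when the first request is made, it can help to fail fast at startup. The function below is a hypothetical helper, not part of the ollama package; it just reads the variable set above:

```python
import os


def get_ollama_api_key() -> str:
    """Read OLLAMA_API_KEY from the environment, failing fast if unset."""
    key = os.environ.get("OLLAMA_API_KEY")
    if not key:
        raise RuntimeError(
            "OLLAMA_API_KEY is not set; export it before creating the client"
        )
    return key
```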

Python example:

```python
import os
from ollama import Client

client = Client(
    host="https://ollama.com",
    headers={"Authorization": "Bearer " + os.environ["OLLAMA_API_KEY"]},
)

messages = [
    {"role": "user", "content": "Why is the sky blue?"}
]

for part in client.chat("gpt-oss:120b-cloud", messages=messages, stream=True):
    print(part["message"]["content"], end="", flush=True)
```
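The streaming loop prints each chunk as it arrives; if you need the full reply as one string, you can accumulate the parts instead. A minimal sketch, using stand-in chunks shaped like the streaming responses rather than a live API call:

```python
def collect_stream(parts) -> str:
    """Concatenate the content of streamed chat chunks into one string.

    `parts` is any iterable of chunks shaped like Ollama's streaming
    responses: {"message": {"content": "..."}}.
    """
    return "".join(part["message"]["content"] for part in parts)


# Stand-in chunks for illustration; in real use, pass the iterator
# returned by client.chat(..., stream=True) instead.
fake_parts = [
    {"message": {"content": "Because "}},
    {"message": {"content": "of Rayleigh scattering."}},
]
print(collect_stream(fake_parts))
```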

Summary

Ollama cloud models can be summarized in one sentence:

the commands stay almost the same, but the model no longer runs on your local machine.

If your computer cannot handle large models well, but you still want to keep the Ollama workflow, cloud models are a straightforward option.
