Codex CLI OSS Mode: Set Up Ollama, LM Studio, OpenRouter, and Local Models

Codex CLI can now connect to local or custom model services through OSS mode. For developers, the value is straightforward: not every task has to use OpenAI’s default model. You can point Codex at Ollama, LM Studio, OpenRouter, an internal company gateway, or a self-hosted OpenAI-compatible inference service.

This article is based on a post from the CSDN/AtomGit community and has been checked against the current OpenAI Codex manual. One point is especially important: some profile examples in the original article use an older format. Since Codex 0.134.0, profiles are independent ~/.codex/<profile-name>.config.toml files. Codex no longer reads [profiles.xxx] tables from config.toml.

Original article: https://tianqi.csdn.net/6a33be7210ee7a33f27f7913.html

What OSS mode solves

Codex CLI is essentially a coding agent that can read a project, run commands, edit files, participate in code review, and help with debugging. The model determines its reasoning quality, speed, cost, and data boundary.

After OSS mode is enabled, Codex can connect to local “open source” providers such as Ollama or LM Studio. More broadly, as long as the backend supports OpenAI’s Responses API or Chat Completions API, it can also be connected through a custom provider.

OSS mode is worth considering when you:

want to run models on local hardware and reduce code leaving the machine;
want to send lightweight tasks to a local model to reduce cost;
want to switch between models such as Qwen, DeepSeek, Mistral, and Llama;
already have a unified model gateway or private inference service at work;
need to run Codex inside an intranet or LAN.

The boundary needs to be clear: being able to connect a model does not mean the experience will match OpenAI’s recommended models. Tool calling, long context, code-editing stability, and instruction following can vary a lot between models. For complex work, keep a stronger model profile available.

Install Codex CLI

If Codex CLI is not installed yet, use the official installer:

curl -fsSL https://chatgpt.com/codex/install.sh | sh

Windows users who use Codex inside WSL can run the same command in a WSL terminal, then start Codex with:

codex

After installation, check the version first:

codex --version

If you are on an older version, upgrade first. OSS mode, provider configuration, and profile behavior all depend on the Codex version, and configuration from older tutorials may no longer apply.
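
A minimal sketch of that version check follows. It assumes `codex --version` prints a dotted MAJOR.MINOR.PATCH number somewhere in its output; the exact output format is an assumption. The helper name `version_at_least` is illustrative, and 0.134.0 is the profile-format cutoff mentioned earlier in this article.

```shell
# version_at_least CURRENT MINIMUM
# Succeeds when CURRENT >= MINIMUM. Both arguments are assumed to be
# three-part dotted versions such as 0.134.0.
version_at_least() {
  for field in 1 2 3; do
    cur=$(printf '%s\n' "$1" | cut -d. -f"$field")
    min=$(printf '%s\n' "$2" | cut -d. -f"$field")
    # Compare numerically; the first differing field decides.
    if [ "${cur:-0}" -gt "${min:-0}" ]; then return 0; fi
    if [ "${cur:-0}" -lt "${min:-0}" ]; then return 1; fi
  done
  return 0
}

# Example (requires Codex to be installed, so it is left commented out):
#   installed=$(codex --version | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1)
#   version_at_least "$installed" 0.134.0 || echo "Upgrade Codex before using profile files"
```

Comparing field by field avoids the trap of plain string comparison, where 0.98.0 would sort after 0.134.0.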

Start a local provider with `--oss`

The simplest entry point is to add --oss when starting Codex:

codex --oss

The official documentation says that when --oss is passed, Codex can run on a local open source provider, such as Ollama or LM Studio. If only --oss is passed, Codex uses the oss_provider in the configuration as the default OSS provider.

You can set the default local provider in ~/.codex/config.toml:

oss_provider = "ollama" # or "lmstudio"

Then run:

codex --oss

Codex will start with the local provider pointed to by oss_provider.

Configure the default model and provider

The user-level Codex configuration file is:

~/.codex/config.toml

The CLI and IDE extension share this configuration. The two most common fields are:

model = "your-model-name"
model_provider = "your-provider-id"

model is the model name passed to the backend. model_provider points to a provider defined below. The provider tells Codex where to send requests, which wire API to use, and how authentication works.

For example, you can define a provider for local Ollama:

model = "qwen2.5-coder:32b"
model_provider = "local_ollama"

[model_providers.local_ollama]
name = "Ollama"
base_url = "http://localhost:11434/v1"
wire_api = "responses"

These fields mean:

model: the model name, usually determined by the backend;
model_provider: which provider is currently used;
[model_providers.local_ollama]: the configuration table for the custom provider;
name: the display name;
base_url: the OpenAI-compatible API endpoint for the model service;
wire_api: the protocol type, commonly responses or chat.

Codex can still point to backends that support Chat Completions, but the official manual clearly warns that Chat Completions support is deprecated and will be removed from Codex in the future. Prefer wire_api = "responses" for new configurations.

Connect Ollama

Ollama is suitable for running models locally. First make sure Ollama is installed, the model has been pulled, and the service is running.

Example configuration:

model = "qwen2.5-coder:32b"
model_provider = "local_ollama"
oss_provider = "ollama"

[model_providers.local_ollama]
name = "Ollama"
base_url = "http://localhost:11434/v1"
wire_api = "responses"

Then start Codex:

codex --oss

To temporarily specify a model at startup, use -m:

codex --oss -m qwen2.5-coder:32b

When troubleshooting, check three things first:

Whether the Ollama service is running;
Whether base_url includes /v1;
Whether model is an actual model name available in Ollama.
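
The three checks above can be sketched as small shell helpers. The function names are illustrative. The `/models` path assumes the backend exposes an OpenAI-style model listing under its base URL, which may not hold for every server or version.

```shell
# has_v1_suffix URL
# Succeeds when URL ends in /v1 (one trailing slash is tolerated).
has_v1_suffix() {
  case "${1%/}" in
    */v1) return 0 ;;
    *) return 1 ;;
  esac
}

# list_served_models BASE_URL
# Prints the raw JSON from GET BASE_URL/models, or fails if the service
# is unreachable or does not offer that route (an assumption, see above).
list_served_models() {
  curl -fsS --max-time 5 "${1%/}/models"
}

# Example (requires a running local service, so it is left commented out):
#   has_v1_suffix "http://localhost:11434/v1" || echo "base_url is missing /v1"
#   list_served_models "http://localhost:11434/v1"
```

If the listing succeeds, compare the returned model IDs with the `model` value in your configuration. On the Ollama side, `ollama list` shows the models that have been pulled locally.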

Connect LM Studio

LM Studio can also provide a local OpenAI-compatible endpoint. The common address is:

http://localhost:1234/v1

You can configure it like this:

model = "local-model"
model_provider = "lmstudio_local"
oss_provider = "lmstudio"

[model_providers.lmstudio_local]
name = "LM Studio"
base_url = "http://localhost:1234/v1"
wire_api = "responses"

The model value must match the model name currently exposed by LM Studio. Change wire_api to the following only if LM Studio exposes nothing but a Chat Completions-compatible endpoint:

wire_api = "chat"

This is a compatibility fallback, not a recommended default for new configurations.

Connect OpenRouter or another cloud gateway

If you want to access multiple models through a cloud gateway, define a remote provider. For example, an OpenRouter-style OpenAI-compatible gateway can be configured as:

model = "anthropic/claude-sonnet-4-20250514"
model_provider = "openrouter"

[model_providers.openrouter]
name = "OpenRouter"
base_url = "https://openrouter.ai/api/v1"
wire_api = "responses"
experimental_bearer_token = "sk-or-v1-your-key"

If you do not want to write the secret into the config file, use environment variables or inject the token according to your team’s security policy. Secrets should not be committed to the repository, and they should not be written into project-level .codex/config.toml.
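
One way to keep the key out of the file is to have the provider table name an environment variable instead of holding the token. The sketch below assumes your Codex version supports the `env_key` provider field; check the manual for your release. The function name `write_openrouter_provider` and the variable name `OPENROUTER_API_KEY` are illustrative.

```shell
# write_openrouter_provider FILE
# Appends a provider table that reads the API key from the
# OPENROUTER_API_KEY environment variable instead of storing it inline.
# Assumption: the installed Codex version honors the env_key field.
write_openrouter_provider() {
  cat >> "$1" <<'EOF'

[model_providers.openrouter]
name = "OpenRouter"
base_url = "https://openrouter.ai/api/v1"
wire_api = "responses"
env_key = "OPENROUTER_API_KEY"
EOF
}

# Example (edits your real config, so it is left commented out):
#   export OPENROUTER_API_KEY="..."   # set in your shell profile or secret manager
#   write_openrouter_provider "$HOME/.codex/config.toml"
```

Run it at most once per file, because TOML rejects a table that is defined twice.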

Connect a self-hosted OpenAI-compatible service

If you have deployed vLLM, SGLang, TGI, or an internal model gateway on a LAN or server, you can point Codex at that service:

model = "my-custom-model"
model_provider = "self_hosted"

[model_providers.self_hosted]
name = "Self Hosted"
base_url = "http://192.168.1.100:8080/v1"
wire_api = "responses"
experimental_bearer_token = "local-dev-key"

The most fragile part of this setup is protocol compatibility. A backend that calls itself OpenAI-compatible does not necessarily implement every Responses API field, streaming output, tool call, and error format. Validate with small tasks before giving it large code changes.
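
Before involving Codex at all, a single hand-written request can show whether the backend answers on the Responses route. This is a minimal sketch: the function names are illustrative, the `/responses` path and the `model` plus `input` body follow OpenAI's Responses API, and whether your backend accepts them is exactly what is being tested.

```shell
# responses_payload MODEL
# Prints a minimal Responses-style JSON body. MODEL is assumed to contain
# no double quotes or backslashes, since no JSON escaping is done here.
responses_payload() {
  printf '{"model":"%s","input":"Reply with the single word: ok"}' "$1"
}

# smoke_test_responses BASE_URL MODEL [TOKEN]
# Sends one request to BASE_URL/responses and prints the raw reply.
# The Authorization header is added only when TOKEN is given.
smoke_test_responses() {
  base_url=${1%/}
  model=$2
  token=${3:-}
  set -- -fsS --max-time 60 -H "Content-Type: application/json"
  if [ -n "$token" ]; then
    set -- "$@" -H "Authorization: Bearer $token"
  fi
  curl "$@" -d "$(responses_payload "$model")" "$base_url/responses"
}

# Example (requires a reachable backend, so it is left commented out):
#   smoke_test_responses "http://192.168.1.100:8080/v1" "my-custom-model" "local-dev-key"
```

A successful reply here only proves that the basic route works. Streaming and tool calls still need to be checked with a small Codex task.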

Use profiles correctly: do not write `[profiles.xxx]`

Older tutorials often use this format:

[profiles.fast-coder]
model = "qwen2.5-coder:7b"
model_provider = "ollama"
model_reasoning_effort = "low"

Do not use this format now. The official manual states that in Codex 0.134.0 and later, --profile no longer reads [profiles.profile-name] from config.toml, and the top-level profile = "profile-name" selector is no longer supported.
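
To find out whether an existing config still relies on the old syntax, a rough scan like the following can help. It is a line-based heuristic rather than a TOML parser, and the function name `find_legacy_profiles` is illustrative.

```shell
# find_legacy_profiles FILE
# Prints, with line numbers, lines that look like the older profile syntax:
# a [profiles.NAME] table header or a profile = "NAME" selector.
# Heuristic: a key named "profile" inside an unrelated table also matches.
find_legacy_profiles() {
  grep -nE '^[[:space:]]*(\[profiles\.|profile[[:space:]]*=)' "$1"
}

# Example (reads your real config, so it is left commented out):
#   find_legacy_profiles "$HOME/.codex/config.toml"
```

Any line it prints is a candidate for moving into its own profile file.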

The new approach is to create an independent file for each profile:

~/.codex/fast-coder.config.toml

The file uses top-level configuration keys:

model = "qwen2.5-coder:7b"
model_provider = "local_ollama"
model_reasoning_effort = "low"

[model_providers.local_ollama]
name = "Ollama"
base_url = "http://localhost:11434/v1"
wire_api = "responses"

Start Codex with that profile:

codex --profile fast-coder

You can also create a profile for complex tasks:

~/.codex/deep-review.config.toml

Example:

model = "gpt-5.5"
model_provider = "openai"
model_reasoning_effort = "high"

Start it with:

codex --profile deep-review

This lets you keep separate setups for small daily edits, complex architecture work, local offline tasks, and tasks that need a stronger cloud model.

Do not put provider credentials in project-level config

Codex supports project-level .codex/config.toml, but provider settings, authentication, and profile selection are better kept in user-level configuration. The official manual also explains that local project configuration cannot override sensitive settings that redirect credentials, change provider auth, or select profiles. When Codex sees those keys, it ignores them and shows a startup warning.

In short:

user-level ~/.codex/config.toml: model providers, authentication method, and personal default model;
profile file ~/.codex/<name>.config.toml: differences between task modes;
project-level .codex/config.toml: project-specific, non-sensitive settings suitable for sharing with the repository;
secrets: prefer environment variables, system credentials, or a team-managed injection method.
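
Following that split, a project-level .codex/config.toml should never carry credentials. A rough pre-commit check might look like this. The key-name pattern is a heuristic, not an exhaustive list, and the function name `find_inline_secrets` is illustrative.

```shell
# find_inline_secrets FILE
# Prints, with line numbers, assignments whose key name suggests an inline
# credential (bearer_token, api_key, secret, password). Keys that merely
# name an environment variable, such as env_key, are not matched.
find_inline_secrets() {
  grep -nEi '^[[:space:]]*[A-Za-z0-9_]*(bearer_token|api_key|secret|password)[A-Za-z0-9_]*[[:space:]]*=' "$1"
}

# Example (run from a repository root, so it is left commented out):
#   if find_inline_secrets .codex/config.toml; then
#     echo "Move these values out of the project-level config" >&2
#   fi
```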

Checklist before using OSS mode

Before starting, check the following in order:

Use codex --version to confirm the version is new enough;
Make sure the local model service is running, such as Ollama or LM Studio;
Make sure the model has been downloaded or is available on the server;
Check that base_url is correct and usually includes /v1;
Prefer wire_api = "responses";
Make sure the provider ID does not use a reserved name;
Make sure the API key is not written into the repository;
Validate tool calling, file editing, and command execution with a small task;
Keep a strong-model profile for complex tasks.

Conclusion

Codex CLI’s OSS mode makes model choice more flexible: lightweight tasks can use local models, sensitive code can stay on the local machine or intranet as much as possible, and complex tasks can switch back to a stronger model. The main things to watch are the configuration format and the capability limits of each model.

If you only want to try it quickly, run:

curl -fsSL https://chatgpt.com/codex/install.sh | sh
codex --oss

For long-term use, organize ~/.codex/config.toml and several independent profile files early. Separate local models, cloud gateways, and strong-model review modes. Then Codex is not just “able to connect to any model”; it can choose the right model for each development scenario.