Many model providers offer “OpenAI-compatible” APIs. When integrating with one, the first two things to confirm are:
BASE_URL: the API endpoint provided by the model service.OPENAI_API_KEY: your access key.
If the provider is compatible with the Chat Completions API, the request shape is usually close to OpenAI’s /v1/chat/completions. That said, OpenAI is also moving newer development toward the Responses API. Chat Completions is still common, especially among third-party compatible providers, but if you are building directly on official OpenAI capabilities, you should also pay attention to the Responses API.
Official references:
Minimal Request Example
A minimal Chat Completions request looks like this:
|
|
If you are using a third-party OpenAI-compatible provider, you usually only need to replace the first line with that provider’s BASE_URL:
|
|
The three most common mistakes are:
- adding or missing
/v1inBASE_URL; - forgetting that
Authorizationmust useBearer; - using a
modelvalue that the provider does not actually support.
The Most Important Request Body Fields
The Chat Completions request body is a JSON object. The most basic and commonly used fields are below.
model
model is required. It specifies the model ID to call.
|
|
In third-party OpenAI-compatible services, the model ID may not be gpt-4o-mini. Some providers use their own model names, such as deepseek-chat, qwen-plus, or llama-3.1-70b. Always follow the provider’s documentation or model list.
messages
messages is required. It represents the conversation history from beginning to end. It is an array, and each item is one message.
Common roles include:
system: system instructions that define assistant behavior;user: user messages;assistant: the model’s previous replies;tool: tool execution results.
A simple multi-turn conversation can be written like this:
|
|
system message
A system message is usually placed at the beginning and tells the model what role to play and what rules to follow.
|
|
Common fields:
role: fixed assystem;content: the system prompt;name: optional, used to label the participant.
Not every compatible provider implements system behavior in exactly the same way. If the system prompt does not seem to work, first check whether the provider rewrites or restricts role behavior.
user message
A user message represents user input. The most common form is plain text:
|
|
For models that support multimodal input, content can also be an array containing both text and images:
|
|
Whether a compatible provider supports image input depends on the model. Do not assume multimodal support just because the endpoint path is OpenAI-compatible.
assistant message
An assistant message usually appears in multi-turn history and represents what the model said earlier.
|
|
When the model decides to call a tool, the assistant message may contain no normal text and instead include tool_calls.
tool message
A tool message sends tool execution results back to the model. It must correspond to a tool_calls.id from the previous assistant message.
|
|
This is common in Agent workflows: the model first decides which function to call, the application actually executes that function, and then the execution result is sent back as a tool message so the model can produce the final answer.
tools and tool_choice
tools tells the model which tools it can call. The most common tool type is a function tool.
Here is a weather query example:
|
|
If the model decides a tool is needed, the response may contain:
|
|
Two details matter here:
- The model is only suggesting a function call. It does not actually execute the function for your program.
argumentsis a JSON string generated by the model. You must validate it before using it.
tool_choice controls whether the model calls tools:
"none": do not call tools; generate text only;"auto": let the model choose between generating text and calling tools;"required": require the model to call one or more tools;- specify one function: force the model to call a particular tool.
The older functions and function_call fields have been replaced by tools and tool_choice. You may still see the older fields in legacy projects, but new code should prefer the newer form.
stream Streaming Output
stream is an optional boolean. When set to true, the API returns content incrementally through SSE, which is useful for realtime chat interfaces.
|
|
A non-streaming response waits until the model finishes and returns everything at once. A streaming response returns chunks continuously. Typical fragments look like this:
|
|
Your frontend or backend needs to keep reading data: lines, concatenate each delta.content, and stop when it receives [DONE].
Reading a Non-Streaming Response
A non-streaming Chat Completions response usually contains fields like these:
|
|
The most useful fields are:
choices[0].message.content: the final model reply;choices[0].message.tool_calls: the tools the model wants to call;choices[0].finish_reason: why generation stopped;usage: token usage.
Common finish_reason values include:
stop: natural stop;length: token limit reached;tool_calls: the model requested a tool call;content_filter: content was filtered.
Checklist for OpenAI-Compatible APIs
When integrating with a third-party OpenAI-compatible service, do not rely only on the endpoint path. Check:
- Whether
BASE_URLincludes/v1. - Whether
OPENAI_API_KEYbelongs to the current provider. - Whether
modelis a valid model ID. - Whether
systemmessages are supported. - Whether multimodal
image_urlinput is supported. - Whether
toolsandtool_choiceare supported. - Whether
stream: trueis supported. - Whether the token limit field is
max_completion_tokensor the oldermax_tokens. - Whether the error response format matches OpenAI exactly.
- Whether there are extra rate limits, concurrency limits, or region restrictions.
“OpenAI-compatible” usually means the calling pattern is similar. It does not mean every model capability, field behavior, or error format is exactly the same.
Conclusion
For the most basic chat use case, the core Chat Completions fields are really just:
|
|
If you are building an Agent or business automation workflow, you need to understand:
tools: tell the model which functions are available;tool_choice: control whether the model calls tools;tool_calls: read what the model wants to call;toolmessage: send real tool results back to the model;stream: turn the reply into realtime output.
For OpenAI-compatible APIs, the safest integration path is: first get the minimal text request working, then add streaming, and finally add tool calling. Every time you add a new capability, verify it with the real model and the real provider. Do not infer compatibility from field names alone.