OpenAI API Basics: Writing Chat Completions, Function Calls, and Streaming Output

Many model providers offer “OpenAI-compatible” APIs. When integrating with one, the first two things to confirm are:

BASE_URL: the API endpoint provided by the model service.
OPENAI_API_KEY: your access key.

If the provider is compatible with the Chat Completions API, the request shape is usually close to OpenAI’s /v1/chat/completions. OpenAI now steers newer development toward the Responses API, but Chat Completions remains common, especially among third-party compatible providers. If you are building directly on official OpenAI capabilities, evaluate the Responses API as well.

Minimal Request Example

A minimal Chat Completions request looks like this:

curl https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Hello!"
      }
    ]
  }'

If you are using a third-party OpenAI-compatible provider, you usually only need to point the first line at that provider’s BASE_URL and set model to an ID the provider supports:

curl $BASE_URL/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "your-model-id",
    "messages": [
      {
        "role": "user",
        "content": "Hello!"
      }
    ]
  }'

The three most common mistakes are:

duplicating or omitting /v1 when combining BASE_URL with the endpoint path;
forgetting that Authorization must use Bearer;
using a model value that the provider does not actually support.
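
The first two mistakes can be guarded against in code. Below is a minimal Python sketch using only the standard library. The helper names chat_completions_url and build_request are hypothetical, and the sketch assumes the provider serves the endpoint at /v1/chat/completions, which should be confirmed in the provider's documentation.

```python
import json
import urllib.request


def chat_completions_url(base_url: str) -> str:
    """Join BASE_URL and the endpoint path without doubling or dropping /v1.

    Assumption: the provider exposes the endpoint at /v1/chat/completions.
    """
    base = base_url.rstrip("/")
    if base.endswith("/v1"):
        base = base[: -len("/v1")]
    return base + "/v1/chat/completions"


def build_request(base_url: str, api_key: str, model: str, messages: list) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the Bearer header."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        chat_completions_url(base_url),
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```

The request can then be sent with urllib.request.urlopen(req) and the body decoded with json.load. The third mistake cannot be caught locally: an unsupported model value only shows up as an error response from the provider.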

The Most Important Request Body Fields

The Chat Completions request body is a JSON object. The most basic and commonly used fields are below.

model

model is required. It specifies the model ID to call.

{
  "model": "gpt-4o-mini"
}

In third-party OpenAI-compatible services, the model ID may not be gpt-4o-mini. Some providers use their own model names, such as deepseek-chat, qwen-plus, or llama-3.1-70b. Always follow the provider’s documentation or model list.

messages

messages is required. It represents the conversation history from beginning to end. It is an array, and each item is one message.

Common roles include:

system: system instructions that define assistant behavior;
user: user messages;
assistant: the model’s previous replies;
tool: tool execution results.

A simple multi-turn conversation can be written like this:

{
  "model": "gpt-4o-mini",
  "messages": [
    {
      "role": "system",
      "content": "You are a concise technical assistant."
    },
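    {
      "role": "user",
      "content": "What is an API?"
    },
    {
      "role": "assistant",
      "content": "An API is an agreed-upon interface that applications use to call one another."
    },
    {
      "role": "user",
      "content": "Explain it in one sentence for a non-technical person."
    }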
  ]
}
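
Because the API is stateless, the application has to carry this history itself and resend it on every request. The following Python sketch shows one way to accumulate turns. The helper names append_message and build_payload are hypothetical, and the sketch covers only plain-text system, user, and assistant messages (tool messages need an extra tool_call_id field).

```python
TEXT_ROLES = {"system", "user", "assistant"}


def append_message(history: list, role: str, content: str) -> list:
    """Return a new history list with one more plain-text message appended."""
    if role not in TEXT_ROLES:
        raise ValueError(f"unsupported role for a plain-text message: {role!r}")
    return history + [{"role": role, "content": content}]


def build_payload(model: str, history: list) -> dict:
    """Wrap the full history in a Chat Completions request body."""
    return {"model": model, "messages": list(history)}


history = append_message([], "system", "You are a concise technical assistant.")
history = append_message(history, "user", "What is an API?")
# After the API responds, store the reply so the next request includes it.
history = append_message(history, "assistant", "An API is an agreed-upon interface between applications.")
history = append_message(history, "user", "Explain it in one sentence for a non-technical person.")
payload = build_payload("gpt-4o-mini", history)
```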

system message

A system message is usually placed at the beginning and tells the model what role to play and what rules to follow.

{
  "role": "system",
  "content": "You are a helpful assistant."
}

Common fields:

role: fixed as system;
content: the system prompt;
name: optional, used to label the participant.

Not every compatible provider implements system behavior in exactly the same way. If the system prompt does not seem to work, first check whether the provider rewrites or restricts role behavior.

user message

A user message represents user input. The most common form is plain text:

{
  "role": "user",
  "content": "Hello!"
}

For models that support multimodal input, content can also be an array containing both text and images:

curl https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "What is in this image?"
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
            }
          }
        ]
      }
    ],
    "max_completion_tokens": 300
  }'

Whether a compatible provider supports image input depends on the model. Do not assume multimodal support just because the endpoint path is OpenAI-compatible.
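
One way to make that check explicit in application code is to refuse to build an image message unless the caller has confirmed support. This is a sketch only: the function name user_message and the supports_images flag are hypothetical, and the flag has to come from your own knowledge of the provider and model, since the API shown here does not report it.

```python
from typing import Optional


def user_message(text: str, image_url: Optional[str] = None, supports_images: bool = False) -> dict:
    """Build a user message, using the array form of content only for images."""
    if image_url is None:
        return {"role": "user", "content": text}
    if not supports_images:
        # Fail loudly instead of silently dropping the image.
        raise ValueError("this model is not known to accept image_url input")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }
```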

assistant message

An assistant message usually appears in multi-turn history and represents what the model said earlier.

{
  "role": "assistant",
  "content": "Hello, how can I help you today?"
}

When the model decides to call a tool, the assistant message may contain no normal text and instead include tool_calls.

tool message

A tool message sends tool execution results back to the model. Its tool_call_id must match the id of one of the entries in tool_calls from the preceding assistant message.

{
  "role": "tool",
  "tool_call_id": "call_abc123",
  "content": "{\"temperature\": 22, \"unit\": \"celsius\"}"
}

This is common in Agent workflows: the model first decides which function to call, the application actually executes that function, and then the execution result is sent back as a tool message so the model can produce the final answer.
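
The application side of that loop can be sketched in Python as follows. The names TOOL_REGISTRY and run_tool_calls are hypothetical, and get_current_weather is a stand-in that returns fixed data instead of calling a real weather service.

```python
import json


def get_current_weather(location: str, unit: str = "celsius") -> dict:
    """Stand-in for a real weather lookup; always returns the same reading."""
    return {"temperature": 22, "unit": unit}


# Maps tool names the model may request to local Python callables.
TOOL_REGISTRY = {"get_current_weather": get_current_weather}


def run_tool_calls(assistant_message: dict) -> list:
    """Execute each requested tool call and return the matching tool messages."""
    tool_messages = []
    for call in assistant_message.get("tool_calls") or []:
        name = call["function"]["name"]
        func = TOOL_REGISTRY.get(name)
        if func is None:
            result = {"error": f"unknown tool: {name}"}
        else:
            try:
                args = json.loads(call["function"]["arguments"])
                result = func(**args)
            except (json.JSONDecodeError, TypeError) as exc:
                # Report bad arguments back to the model instead of crashing.
                result = {"error": str(exc)}
        tool_messages.append({
            "role": "tool",
            "tool_call_id": call["id"],  # must echo the id the model generated
            "content": json.dumps(result),
        })
    return tool_messages
```

To continue the conversation, append the assistant message and then the returned tool messages to messages, and send the request again.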

tools and tool_choice

tools tells the model which tools it can call. The most common tool type is a function tool.

Here is a weather query example:

curl https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {
        "role": "user",
        "content": "What is the weather like in Boston today?"
      }
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_current_weather",
          "description": "Get the current weather in a given location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "The city and state, e.g. San Francisco, CA"
              },
              "unit": {
                "type": "string",
                "enum": ["celsius", "fahrenheit"]
              }
            },
            "required": ["location"]
          }
        }
      }
    ],
    "tool_choice": "auto"
  }'

If the model decides a tool is needed, the response may contain:

{
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "id": "call_abc123",
            "type": "function",
            "function": {
              "name": "get_current_weather",
              "arguments": "{\"location\":\"Boston, MA\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ]
}

Two details matter here:

The model is only suggesting a function call. It does not actually execute the function for your program.
arguments is a JSON string generated by the model. You must validate it before using it.
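
A minimal validation sketch in Python might look like the following. The function name parse_arguments is hypothetical, and the checks are deliberately limited to required keys, unexpected keys, and enum values; a full JSON Schema validator would cover more cases.

```python
import json


def parse_arguments(raw: str, schema: dict) -> dict:
    """Parse model-generated arguments and check them against the tool's parameters schema."""
    try:
        args = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"arguments is not valid JSON: {exc}") from exc
    if not isinstance(args, dict):
        raise ValueError("arguments must decode to a JSON object")

    properties = schema.get("properties", {})
    for key in schema.get("required", []):
        if key not in args:
            raise ValueError(f"missing required argument: {key}")
    for key, value in args.items():
        if key not in properties:
            raise ValueError(f"unexpected argument: {key}")
        allowed = properties[key].get("enum")
        if allowed is not None and value not in allowed:
            raise ValueError(f"invalid value for {key}: {value!r}")
    return args
```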

tool_choice controls whether the model calls tools:

"none": do not call tools; generate text only;
"auto": let the model choose between generating text and calling tools;
"required": require the model to call one or more tools;
specify one function: force the model to call a particular tool.
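
In the request body, the first three options are plain strings, while forcing a specific function uses an object. The Python sketch below builds either form; the helper name make_tool_choice is hypothetical, and some compatible providers may accept only a subset of these values.

```python
from typing import Optional, Union


def make_tool_choice(mode: str = "auto", function_name: Optional[str] = None) -> Union[str, dict]:
    """Return a value suitable for the tool_choice request field."""
    if function_name is not None:
        # Object form: force the model to call this one function.
        return {"type": "function", "function": {"name": function_name}}
    if mode not in {"none", "auto", "required"}:
        raise ValueError(f"unsupported tool_choice mode: {mode!r}")
    return mode
```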

The older functions and function_call fields have been replaced by tools and tool_choice. You may still see the older fields in legacy projects, but new code should prefer the newer form.

stream (Streaming Output)

stream is an optional boolean. When set to true, the API returns content incrementally as Server-Sent Events (SSE), which is useful for real-time chat interfaces.

{
  "model": "gpt-4o-mini",
  "messages": [
    {
      "role": "user",
      "content": "Write a one-sentence welcome message."
    }
  ],
  "stream": true
}

A non-streaming response waits until the model finishes and returns everything at once. A streaming response returns chunks continuously. Typical fragments look like this:

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: [DONE]

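Your frontend or backend needs to keep reading lines that start with data:, parse each payload as JSON, append choices[0].delta.content whenever it is present (the first chunk often carries only the role), and stop when it receives [DONE].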
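
That reading loop can be sketched in Python as a generator over already-decoded text lines. The function name iter_content is hypothetical, and the sketch assumes a single choice per chunk and one JSON payload per data: line, which matches the fragments shown above but may not hold for every provider.

```python
import json
from typing import Iterable, Iterator


def iter_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from the data: lines of a streamed response."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        choices = chunk.get("choices") or []
        if not choices:
            continue  # some chunks may carry no choices at all
        content = (choices[0].get("delta") or {}).get("content")
        if content:
            yield content
```

With a real HTTP response, each raw line would first be decoded from bytes (typically UTF-8) before being passed in; joining the yielded fragments gives the full reply.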

Reading a Non-Streaming Response

A non-streaming Chat Completions response usually contains fields like these:

{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "gpt-4o-mini",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello there, how may I assist you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 12,
    "total_tokens": 21
  }
}

The most useful fields are:

choices[0].message.content: the final model reply;
choices[0].message.tool_calls: the tools the model wants to call;
choices[0].finish_reason: why generation stopped;
usage: token usage.

Common finish_reason values include:

stop: natural stop;
length: token limit reached;
tool_calls: the model requested a tool call;
content_filter: content was filtered.
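
These fields can be pulled out of a decoded response with a small helper. The following Python sketch uses a hypothetical function name, summarize_choice, looks only at the first choice, and treats usage as optional because a compatible provider may omit it.

```python
def summarize_choice(response: dict) -> dict:
    """Extract the commonly needed fields from a non-streaming response."""
    choice = response["choices"][0]
    message = choice["message"]
    reason = choice.get("finish_reason")
    return {
        "text": message.get("content") or "",           # content is null on tool calls
        "tool_calls": message.get("tool_calls") or [],
        "finish_reason": reason,
        "truncated": reason == "length",                 # reply was cut off by a token limit
        "filtered": reason == "content_filter",
        "total_tokens": (response.get("usage") or {}).get("total_tokens"),
    }
```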

Checklist for OpenAI-Compatible APIs

When integrating with a third-party OpenAI-compatible service, do not rely only on the endpoint path. Check:

Whether BASE_URL includes /v1.
Whether OPENAI_API_KEY belongs to the current provider.
Whether model is a valid model ID.
Whether system messages are supported.
Whether multimodal image_url input is supported.
Whether tools and tool_choice are supported.
Whether stream: true is supported.
Whether the token limit field is max_completion_tokens or the older max_tokens.
Whether the error response format matches OpenAI exactly.
Whether there are extra rate limits, concurrency limits, or region restrictions.

“OpenAI-compatible” usually means the calling pattern is similar. It does not mean every model capability, field behavior, or error format is exactly the same.
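
Error handling is one place where this difference tends to surface. The Python sketch below reads an error body defensively. The function name extract_error_message is hypothetical. The nested error.message shape is the one OpenAI returns; the other shapes handled here (a string error, or top-level message or detail keys) are illustrative assumptions about how a provider might deviate, not a list of known formats.

```python
import json


def extract_error_message(status: int, body: str) -> str:
    """Best-effort error text from an OpenAI-style or non-standard error body."""
    fallback = f"HTTP {status}: {body[:200]}"
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return fallback  # e.g. an HTML page from a gateway
    if not isinstance(payload, dict):
        return fallback

    error = payload.get("error")
    if isinstance(error, dict) and isinstance(error.get("message"), str):
        return f"HTTP {status}: {error['message']}"  # OpenAI-style shape
    if isinstance(error, str):
        return f"HTTP {status}: {error}"  # assumed variant
    for key in ("message", "detail"):  # assumed variants
        if isinstance(payload.get(key), str):
            return f"HTTP {status}: {payload[key]}"
    return fallback
```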

Conclusion

For the most basic chat use case, the core Chat Completions fields are really just:

{
  "model": "your-model-id",
  "messages": [],
  "stream": false
}

If you are building an Agent or business automation workflow, you need to understand:

tools: tell the model which functions are available;
tool_choice: control whether the model calls tools;
tool_calls: read what the model wants to call;
tool message: send real tool results back to the model;
stream: turn the reply into real-time output.

For OpenAI-compatible APIs, the safest integration path is: first get the minimal text request working, then add streaming, and finally add tool calling. Every time you add a new capability, verify it with the real model and the real provider. Do not infer compatibility from field names alone.