Chat Completions API

Before building agents, you need to understand the foundation: how Large Language Models (LLMs) process conversations through the Chat Completions API.

The Chat Completions API

The Chat Completions API is a stateless, request-response interface. You send a list of messages, and the model returns a completion — the next message in the conversation.

sequenceDiagram
    participant App as Your Code
    participant API as Chat Completions API
    participant LLM as Language Model
    App->>API: messages + parameters
    API->>LLM: Process messages
    LLM-->>API: Generated completion
    API-->>App: Response with choices

Every request includes the full conversation history. The API has no memory between calls — you manage the conversation state.
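To make the request-response shape concrete, here is a minimal sketch using plain dictionaries. The field names (`choices`, `message`, `finish_reason`) follow the general shape of a Chat Completions response. The response itself is hand-written for illustration, not real model output, and `extract_reply` is a hypothetical helper, not part of any SDK.

```python
# Illustrative only: a hand-written request and response in the general
# shape of the Chat Completions API. No network call is made.

request_body = {
    "model": "gpt-4o-mini",
    "messages": [
        {"role": "system", "content": "You are a travel assistant."},
        {"role": "user", "content": "What's the best time to visit Japan?"},
    ],
}

# A made-up response; real responses carry more fields (id, usage, ...).
example_response = {
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "Spring (March-May) is ideal...",
            },
            "finish_reason": "stop",
        }
    ],
}


def extract_reply(response: dict) -> dict:
    """Return the assistant message from the first choice (hypothetical helper)."""
    return response["choices"][0]["message"]


reply = extract_reply(example_response)
print(reply["role"], "->", reply["content"])
```

Nothing in `request_body` refers to an earlier call. Everything the model needs has to be inside `messages`.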

Messages and Roles

A conversation is a list of messages, each with a role:

| Role | Purpose | Example |
| --- | --- | --- |
| system | Sets the model's behavior, persona, and constraints | "You are a helpful travel assistant." |
| user | Human input: questions, instructions, data | "What's the best time to visit Japan?" |
| assistant | Model's previous responses (for multi-turn context) | "Spring (March-May) is ideal for cherry blossoms..." |
| tool | Results from tool/function calls (covered later) | {"temperature": 22, "condition": "sunny"} |
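Because a request is just a list of dictionaries, a typo in a role name is easy to make. The sketch below checks roles against the four listed above before anything is sent. `ALLOWED_ROLES` and `validate_roles` are hypothetical names for illustration. The check is deliberately simplified: it looks only at the `role` key and does not try to validate message content.

```python
# Hypothetical pre-flight check; simplified to role names only.
ALLOWED_ROLES = {"system", "user", "assistant", "tool"}


def validate_roles(messages: list[dict]) -> None:
    """Raise ValueError if any message has a missing or unknown role."""
    for index, message in enumerate(messages):
        role = message.get("role")
        if role not in ALLOWED_ROLES:
            raise ValueError(f"message {index}: unknown role {role!r}")


good = [
    {"role": "system", "content": "You are a helpful travel assistant."},
    {"role": "user", "content": "What's the best time to visit Japan?"},
]
validate_roles(good)  # passes silently

bad = [{"role": "usr", "content": "typo in the role name"}]
try:
    validate_roles(bad)
except ValueError as error:
    print(error)
```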

The developer role

Recent OpenAI models also accept a developer role as a replacement for system. The two are functionally equivalent for our purposes. This workshop uses system because it is the more widely supported of the two across providers.

Single-Turn vs. Multi-Turn

Single-turn — one question, one answer:

messages = [
    {"role": "system", "content": "You are a travel assistant."},
    {"role": "user", "content": "What's the best time to visit Japan?"},
]

Multi-turn — a conversation with history:

messages = [
    {"role": "system", "content": "You are a travel assistant."},
    {"role": "user", "content": "What's the best time to visit Japan?"},
    {"role": "assistant", "content": "Spring (March-May) is ideal..."},
    {"role": "user", "content": "What about budget tips?"},
]

The model sees the entire message list on every call. This is how it maintains context — and why context management matters as conversations grow (see Context Management).
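The bookkeeping this implies can be sketched as a small helper that appends each user message, sends the whole list, and stores the reply. `send_turn` and `fake_complete` are hypothetical names. `fake_complete` stands in for a real API call so the example runs offline. In practice you would pass a function that calls your provider.

```python
from typing import Callable

Message = dict[str, str]


def send_turn(
    history: list[Message],
    user_text: str,
    complete: Callable[[list[Message]], str],
) -> str:
    """Append the user message, send the full history, record the reply."""
    history.append({"role": "user", "content": user_text})
    reply = complete(history)  # the full history goes out on every call
    history.append({"role": "assistant", "content": reply})
    return reply


def fake_complete(messages: list[Message]) -> str:
    """Stand-in for a real API call: reports how many messages it received."""
    return f"(reply based on {len(messages)} messages)"


history: list[Message] = [
    {"role": "system", "content": "You are a travel assistant."}
]
first = send_turn(history, "What's the best time to visit Japan?", fake_complete)
second = send_turn(history, "What about budget tips?", fake_complete)

print(first)
print(second)
print(len(history))
```

The second call receives more messages than the first, which is the growth that context management has to deal with.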

Key Parameters

| Parameter | What It Does | Typical Values |
| --- | --- | --- |
| model | Which model to use | gpt-4o-mini, gpt-4o |
| temperature | Randomness (0 = most deterministic, 2 = very random) | 0.0 to 1.0 |
| max_tokens | Maximum response length, in tokens | 100 to 4096 |
| top_p | Nucleus sampling (alternative to temperature) | 0.1 to 1.0 |
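For intuition about temperature and top_p, the sketch below applies the textbook definitions to a made-up set of token scores: temperature divides the scores before a softmax, and nucleus sampling keeps the smallest set of top tokens whose probability mass reaches top_p. This is a conceptual illustration under those assumptions, not a description of how any particular provider implements sampling. The function names and the `logits` values are invented for the example.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn raw scores into probabilities; lower temperature sharpens the peak."""
    if temperature <= 0:
        # APIs typically treat 0 as "pick the top token"; this sketch needs > 0.
        raise ValueError("temperature must be > 0 in this sketch")
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(score - peak) for score in scaled]
    total = sum(exps)
    return [value / total for value in exps]


def top_p_filter(probs: list[float], top_p: float) -> list[int]:
    """Return indices of the most likely tokens whose combined mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept: list[int] = []
    mass = 0.0
    for index in order:
        kept.append(index)
        mass += probs[index]
        if mass >= top_p:
            break
    return kept


logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
cold = softmax_with_temperature(logits, 0.2)
hot = softmax_with_temperature(logits, 1.5)

print([round(p, 3) for p in cold])  # mass concentrated on the top token
print([round(p, 3) for p in hot])   # mass spread more evenly
print(top_p_filter(hot, 0.8))       # indices that survive nucleus filtering
```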

Temperature for agents

For agentic tasks, use low temperature (0.0 to 0.3) to get consistent, reliable behavior. Save higher temperatures for creative tasks like brainstorming.

Token Usage

Every API call consumes tokens (roughly 4 characters per token in English). The response includes usage information:

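from openai import OpenAI

# Assumes the official openai package; reads OPENAI_API_KEY from the environment
client = OpenAI()

# `messages` is a list of role/content dicts, as in the examples above
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
)

print(response.usage.prompt_tokens)      # Tokens in your input
print(response.usage.completion_tokens)  # Tokens in the response
print(response.usage.total_tokens)       # Prompt + completion tokens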

Understanding token usage matters for cost control and for staying within context window limits (see Context Management).
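As a rough planning aid, the 4-characters-per-token rule of thumb and the usage counts can be turned into a back-of-the-envelope estimate. The sketch below is an approximation only: real tokenizers produce different counts, `estimate_tokens` and `estimate_cost` are hypothetical helpers, and the prices are placeholders, not any provider's actual rates.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token count from character length; real tokenizers will differ."""
    return math.ceil(len(text) / chars_per_token)


def estimate_cost(
    prompt_tokens: int,
    completion_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Cost in the same currency unit as the per-million-token prices."""
    return (
        prompt_tokens * input_price_per_million
        + completion_tokens * output_price_per_million
    ) / 1_000_000


question = "What's the best time to visit Japan?"
approx_prompt_tokens = estimate_tokens(question)

# Placeholder prices for illustration only; check your provider's pricing.
cost = estimate_cost(
    prompt_tokens=1000,
    completion_tokens=500,
    input_price_per_million=1.0,
    output_price_per_million=2.0,
)

print(approx_prompt_tokens)
print(cost)
```

For billing and limits, prefer the counts reported in `response.usage` over any character-based estimate.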

Key Takeaways

  1. The Chat Completions API is stateless — you send the full conversation every time
  2. Roles define who said what: system, user, assistant, tool
  3. Temperature controls randomness — use low values for agents
  4. You manage conversation state — the API doesn't remember previous calls

Hands-On Exercise

Now try it yourself — head to the Chat Completion exercise to build a travel assistant with single-turn and multi-turn conversations.

You can run exercises from the terminal or use the Workshop TUI.