The 'Request timed out' error from the OpenAI API means the server did not return a response within the allowed time. This happens with complex prompts, large token counts, or during high-demand periods. Fix by increasing your client timeout, reducing max_tokens, using streaming for long responses, and implementing retry logic with exponential backoff.
What does "Request timed out" mean in the OpenAI API?
When the OpenAI API returns a timeout error, it means the server took too long to generate a response. The request was received and processing began, but the response was not completed within the time limit. For non-streaming requests, the OpenAI SDK throws a timeout exception if no response arrives within the configured window (default varies by SDK and HTTP client).
This error is especially common with reasoning models (o1, o3 series) that spend significant time "thinking" before generating output, and with GPT-4 class models processing large context windows. A request that works fine with GPT-3.5-turbo may time out with GPT-4 because the larger model takes considerably longer to process.
Timeout errors are different from 500 server errors. A timeout means the request was being processed but took too long; a 500 error means something broke during processing. The OpenAI SDK auto-retries 408 (timeout) errors with exponential backoff, but if the root cause is a genuinely slow request, retrying the same request will likely time out again.
Common causes
- The prompt requires extensive reasoning or a very long response that exceeds the client-side timeout window
- The max_tokens parameter is set very high, and the model is generating a maximum-length response that takes too long
- OpenAI's servers are under heavy load during peak hours, increasing response latency beyond normal levels
- The request uses a reasoning model (o1, o3) with a complex task that triggers extended internal reasoning chains
- The HTTP client or proxy between your application and OpenAI has a shorter timeout than the expected response time
- A large context window (128K tokens) with dense content significantly increases the processing time per output token
How to fix "Request timed out" in the OpenAI API
The most effective fix is to switch to streaming responses. With streaming, the API begins sending tokens immediately as they are generated, preventing client-side timeouts. Your application receives partial responses while the full response is still being generated. This is strongly recommended by OpenAI for any request that might take longer than 10 minutes.
If you cannot use streaming, increase your client timeout. The OpenAI SDK allows configuring the timeout explicitly. Set it to at least 120 seconds for GPT-4 and 300 seconds for reasoning models. Also check for intermediate proxies, load balancers, or serverless function timeouts that may be shorter than your client timeout.
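As a minimal sketch of those recommendations, a small helper can pick the timeout by model family before constructing the client. The `timeout_for_model` helper below is illustrative, not part of the SDK, and the thresholds are the rules of thumb above; adjust them for your workload:

```python
def timeout_for_model(model: str) -> float:
    """Pick a client timeout (seconds) based on the model family.

    Hypothetical helper: the thresholds follow the guidance above.
    """
    if model.startswith(("o1", "o3")):  # reasoning models "think" longer
        return 300.0
    if model.startswith("gpt-4"):       # large models, large contexts
        return 120.0
    return 60.0                         # e.g. gpt-3.5-turbo

# Pass the result to the client, e.g.:
# client = OpenAI(timeout=timeout_for_model("gpt-4o"))
```

Remember that this only controls the client side; a proxy or serverless platform with a shorter limit will still cut the request off first.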
Reduce max_tokens to the minimum your use case needs. A request with max_tokens=4096 will return much faster than one with max_tokens=100000 because the model generates fewer tokens. If you do not need a very long response, lower this value.
For reasoning models specifically, use the reasoning_effort parameter in Chat Completions (reasoning.effort in the Responses API), set to 'low' or 'medium', to reduce thinking time. Note that the temperature parameter is not supported by o1/o3 models and will cause a separate error.
Implement retry logic with exponential backoff for intermittent timeouts. The SDK retries automatically, but configure max_retries high enough for your use case.
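If you need retry behavior outside the SDK (for example, around a wrapper function), the classic pattern is exponential backoff with jitter. The sketch below is illustrative; `call_with_retries` and `make_request` are hypothetical names, not SDK APIs, and the SDK applies a similar scheme internally when max_retries > 0:

```python
import random
import time

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Exponential backoff with full jitter:
    a random delay in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0, min(cap, base * 2 ** attempt))

def call_with_retries(make_request, max_retries: int = 3, base: float = 1.0):
    """Retry make_request on TimeoutError, sleeping between attempts."""
    for attempt in range(max_retries + 1):
        try:
            return make_request()
        except TimeoutError:
            if attempt == max_retries:
                raise  # out of retries; surface the timeout
            time.sleep(backoff_delay(attempt, base=base))
```

Keep in mind the caveat above: if a request times out because it is genuinely slow, retries alone will not fix it; combine them with streaming or a lower max_tokens.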
Before (no timeout configuration, no streaming, very high max_tokens):

```python
from openai import OpenAI

client = OpenAI()

# No timeout config, no streaming, high max_tokens
response = client.chat.completions.create(
    model="gpt-4o",
    max_tokens=100000,
    messages=[{"role": "user", "content": very_long_prompt}]
)
```

After (explicit timeout, retries, streaming, and a bounded max_tokens):

```python
from openai import OpenAI

client = OpenAI(
    timeout=120.0,
    max_retries=3
)

# Use streaming for long responses
stream = client.chat.completions.create(
    model="gpt-4o",
    max_tokens=4096,  # Only as much as needed
    messages=[{"role": "user", "content": very_long_prompt}],
    stream=True
)

full_response = ""
for chunk in stream:
    if chunk.choices[0].delta.content:
        full_response += chunk.choices[0].delta.content
        print(chunk.choices[0].delta.content, end="")
```

Prevention tips
- Use streaming for any request that might take more than 30 seconds — it prevents client-side timeouts and improves the user experience by showing partial results
- Set max_tokens to the minimum needed for your use case rather than a large default — lower values mean faster responses
- Configure explicit timeouts on your OpenAI client (at least 120s for GPT-4, 300s for reasoning models) and check for shorter timeouts in proxies or serverless environments
- For reasoning models (o1, o3), set reasoning_effort to 'medium' or 'low' to reduce processing time when maximum reasoning depth is not required
Still stuck?
Copy one of these prompts to get a personalized, step-by-step explanation.
My OpenAI API calls keep timing out when using GPT-4o with long prompts. The error says 'Request timed out'. How do I implement streaming and proper timeout configuration to handle long-running requests?
My OpenAI API request times out with model gpt-4o and max_tokens=100000. Here is my code: [paste code]. Convert it to use streaming, add proper timeout configuration, and implement retry logic.
Frequently asked questions
Why does my OpenAI API request keep timing out?
The most common causes are: high max_tokens requiring long generation time, complex prompts with reasoning models, server load during peak hours, and client-side timeout settings that are too low. Switch to streaming, reduce max_tokens, and increase your timeout configuration.
Does OpenAI charge for requests that return "Request timed out"?
For non-streaming requests that timeout, you are generally not charged because no response was delivered. However, for streaming requests that timeout mid-stream, you may be charged for tokens generated before the failure. Monitor your usage dashboard to verify.
How do I prevent timeouts when using GPT-4 or o1 models?
Use streaming responses, which begin returning tokens immediately and prevent client-side timeouts. Set your client timeout to at least 120 seconds for GPT-4 and 300 seconds for reasoning models. Also reduce max_tokens to only what you actually need.
What timeout should I set for the OpenAI API client?
For GPT-3.5-turbo, 60 seconds is usually sufficient. For GPT-4o, use at least 120 seconds. For reasoning models (o1, o3), set 300 seconds or more. If you use streaming, timeouts are less critical since tokens arrive continuously.
Can serverless functions cause OpenAI API timeouts?
Yes. Many serverless platforms (AWS Lambda, Vercel Functions, Cloudflare Workers) have execution time limits of 10-60 seconds. If your OpenAI request takes longer, the serverless function times out before the response arrives. Use streaming or move long-running requests to a persistent server.
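On a persistent server, one common workaround is a background-job pattern: accept the request, start the slow call, return a job id immediately, and let the client poll for the result. The sketch below is illustrative, not a library API; the `submit` helper and in-memory `jobs` store are stand-ins (use a real queue and database in production), and this thread-based version does not work inside most serverless runtimes, which freeze execution after the handler returns:

```python
import threading
import uuid

jobs = {}  # in-memory job store; use a real queue/DB in production

def submit(make_request):
    """Run a slow call (e.g. a non-streaming OpenAI request) in a
    background thread and return a job id immediately, so the HTTP
    handler can respond before any platform time limit.
    make_request is a stand-in for your API call."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "pending", "result": None}

    def run():
        jobs[job_id]["result"] = make_request()
        jobs[job_id]["status"] = "done"

    threading.Thread(target=run, daemon=True).start()
    return job_id
```

A client then polls an endpoint that reads `jobs[job_id]` until the status is "done".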