The 'Request timed out' error from the OpenAI API means the server did not return a response within the allowed time. This happens with complex prompts, large token counts, or during high-demand periods. Fix by increasing your client timeout, reducing max_tokens, using streaming for long responses, and implementing retry logic with exponential backoff.
What does "Request timed out" mean in the OpenAI API?
When the OpenAI API returns a timeout error, it means the server took too long to generate a response. The request was received and processing began, but the response was not completed within the time limit. For non-streaming requests, the OpenAI SDK throws a timeout exception if no response arrives within the configured window (default varies by SDK and HTTP client).
This error is especially common with reasoning models (o1, o3 series) that spend significant time "thinking" before generating output, and with GPT-4 class models processing large context windows. A request that works fine with GPT-3.5-turbo may time out with GPT-4 because the larger model takes considerably longer to process.
Timeout errors are different from 500 server errors. A timeout means the request was being processed but took too long; a 500 error means something broke during processing. The OpenAI SDK auto-retries 408 (timeout) errors with exponential backoff, but if the root cause is a genuinely slow request, retrying the same request will likely time out again.
Common causes
- The prompt requires extensive reasoning or a very long response that exceeds the client-side timeout window
- The max_tokens parameter is set very high, and the model is generating a maximum-length response that takes too long
- OpenAI's servers are under heavy load during peak hours, increasing response latency beyond normal levels
- The request uses a reasoning model (o1, o3) with a complex task that triggers extended internal reasoning chains
- The HTTP client or proxy between your application and OpenAI has a shorter timeout than the expected response time
- A large context window (128K tokens) with dense content significantly increases the processing time per output token
How to fix "Request timed out" in the OpenAI API
The most effective fix is to switch to streaming responses. With streaming, the API begins sending tokens immediately as they are generated, preventing client-side timeouts. Your application receives partial responses while the full response is still being generated. This is strongly recommended by OpenAI for any request that might take longer than 10 minutes.
If you cannot use streaming, increase your client timeout. The OpenAI SDK allows configuring the timeout explicitly. Set it to at least 120 seconds for GPT-4 and 300 seconds for reasoning models. Also check for intermediate proxies, load balancers, or serverless function timeouts that may be shorter than your client timeout.
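As a minimal sketch of those recommendations, a small helper can pick the timeout by model family before constructing the client. The `timeout_for_model` helper below is illustrative, not part of the SDK, and the thresholds are the rules of thumb above; adjust them for your workload:

```python
def timeout_for_model(model: str) -> float:
    """Pick a client timeout (seconds) based on the model family.

    Hypothetical helper: the thresholds follow the guidance above.
    """
    if model.startswith(("o1", "o3")):  # reasoning models "think" longer
        return 300.0
    if model.startswith("gpt-4"):       # large models, large contexts
        return 120.0
    return 60.0                         # e.g. gpt-3.5-turbo

# Pass the result to the client, e.g.:
# client = OpenAI(timeout=timeout_for_model("gpt-4o"))
```

Remember that this only controls the client side; a proxy or serverless platform with a shorter limit will still cut the request off first.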
Reduce max_tokens to the minimum your use case needs. A request with max_tokens=4096 will return much faster than one with max_tokens=100000 because the model generates fewer tokens. If you do not need a very long response, lower this value.
For reasoning models specifically, use the reasoning_effort parameter in Chat Completions (reasoning.effort in the Responses API), set to 'low' or 'medium', to reduce thinking time. Note that the temperature parameter is not supported by o1/o3 models and will cause a separate error.
Implement retry logic with exponential backoff for intermittent timeouts. The SDK retries automatically, but configure max_retries high enough for your use case.
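If you need retry behavior outside the SDK (for example, around a wrapper function), the classic pattern is exponential backoff with jitter. The sketch below is illustrative; `call_with_retries` and `make_request` are hypothetical names, not SDK APIs, and the SDK applies a similar scheme internally when max_retries > 0:

```python
import random
import time

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Exponential backoff with full jitter:
    a random delay in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0, min(cap, base * 2 ** attempt))

def call_with_retries(make_request, max_retries: int = 3, base: float = 1.0):
    """Retry make_request on TimeoutError, sleeping between attempts."""
    for attempt in range(max_retries + 1):
        try:
            return make_request()
        except TimeoutError:
            if attempt == max_retries:
                raise  # out of retries; surface the timeout
            time.sleep(backoff_delay(attempt, base=base))
```

Keep in mind the caveat above: if a request times out because it is genuinely slow, retries alone will not fix it; combine them with streaming or a lower max_tokens.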
Before (no timeout configuration, no streaming, very high max_tokens):

```python
from openai import OpenAI

client = OpenAI()

# No timeout config, no streaming, high max_tokens
response = client.chat.completions.create(
    model="gpt-4o",
    max_tokens=100000,
    messages=[{"role": "user", "content": very_long_prompt}]
)
```

After (explicit timeout, retries, streaming, and a bounded max_tokens):

```python
from openai import OpenAI

client = OpenAI(
    timeout=120.0,
    max_retries=3
)

# Use streaming for long responses
stream = client.chat.completions.create(
    model="gpt-4o",
    max_tokens=4096,  # Only as much as needed
    messages=[{"role": "user", "content": very_long_prompt}],
    stream=True
)

full_response = ""
for chunk in stream:
    if chunk.choices[0].delta.content:
        full_response += chunk.choices[0].delta.content
        print(chunk.choices[0].delta.content, end="")
```

Prevention tips
- Use streaming for any request that might take more than 30 seconds — it prevents client-side timeouts and improves the user experience by showing partial results
- Set max_tokens to the minimum needed for your use case rather than a large default — lower values mean faster responses
- Configure explicit timeouts on your OpenAI client (at least 120s for GPT-4, 300s for reasoning models) and check for shorter timeouts in proxies or serverless environments
- For reasoning models (o1, o3), set reasoning_effort to 'medium' or 'low' to reduce processing time when maximum reasoning depth is not required
Still stuck?
Copy one of these prompts to get a personalized, step-by-step explanation.
My OpenAI API calls keep timing out when using GPT-4o with long prompts. The error says 'Request timed out'. How do I implement streaming and proper timeout configuration to handle long-running requests?
My OpenAI API request times out with model gpt-4o and max_tokens=100000. Here is my code: [paste code]. Convert it to use streaming, add proper timeout configuration, and implement retry logic.
Frequently asked questions
Why does my OpenAI API request keep timing out?
The most common causes are: high max_tokens requiring long generation time, complex prompts with reasoning models, server load during peak hours, and client-side timeout settings that are too low. Switch to streaming, reduce max_tokens, and increase your timeout configuration.
Does OpenAI charge for requests that return "Request timed out"?
For non-streaming requests that timeout, you are generally not charged because no response was delivered. However, for streaming requests that timeout mid-stream, you may be charged for tokens generated before the failure. Monitor your usage dashboard to verify.
How do I prevent timeouts when using GPT-4 or o1 models?
Use streaming responses, which begin returning tokens immediately and prevent client-side timeouts. Set your client timeout to at least 120 seconds for GPT-4 and 300 seconds for reasoning models. Also reduce max_tokens to only what you actually need.
What timeout should I set for the OpenAI API client?
For GPT-3.5-turbo, 60 seconds is usually sufficient. For GPT-4o, use at least 120 seconds. For reasoning models (o1, o3), set 300 seconds or more. If you use streaming, timeouts are less critical since tokens arrive continuously.
Can serverless functions cause OpenAI API timeouts?
Yes. Many serverless platforms (AWS Lambda, Vercel Functions, Cloudflare Workers) have execution time limits of 10-60 seconds. If your OpenAI request takes longer, the serverless function times out before the response arrives. Use streaming or move long-running requests to a persistent server.
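On a persistent server, one common workaround is a background-job pattern: accept the request, start the slow call, return a job id immediately, and let the client poll for the result. The sketch below is illustrative, not a library API; the `submit` helper and in-memory `jobs` store are stand-ins (use a real queue and database in production), and this thread-based version does not work inside most serverless runtimes, which freeze execution after the handler returns:

```python
import threading
import uuid

jobs = {}  # in-memory job store; use a real queue/DB in production

def submit(make_request):
    """Run a slow call (e.g. a non-streaming OpenAI request) in a
    background thread and return a job id immediately, so the HTTP
    handler can respond before any platform time limit.
    make_request is a stand-in for your API call."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "pending", "result": None}

    def run():
        jobs[job_id]["result"] = make_request()
        jobs[job_id]["status"] = "done"

    threading.Thread(target=run, daemon=True).start()
    return job_id
```

A client then polls an endpoint that reads `jobs[job_id]` until the status is "done".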