The '429 Too Many Requests' error from Claude's API means you have exceeded one of Anthropic's rate limits: requests per minute, input tokens per minute, or output tokens per minute. Implement exponential backoff with jitter, respect the retry-after header, and ramp traffic gradually. All limits are per-organization, not per API key.
What does "Error: 429 Too Many Requests" mean in Claude?
When the Claude API returns HTTP 429, you have exceeded one of three rate limits: requests per minute (RPM), input tokens per minute (ITPM), or output tokens per minute (OTPM). The response body reads: {"type":"error","error":{"type":"rate_limit_error","message":"Rate limited. Please try again later."}} Anthropic uses a token bucket algorithm, meaning a rate of 60 RPM may be enforced as 1 request per second — short bursts can trigger 429s even when you are under the nominal limit.
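A simplified model makes the burst behavior concrete (illustrative only — Anthropic does not publish its bucket parameters, so the capacity below is made up):

```python
class TokenBucket:
    """Toy token bucket: tokens refill continuously at the nominal rate,
    and each request spends one token. A burst can drain the bucket even
    when the average rate is under the limit."""

    def __init__(self, rate_per_min, capacity):
        self.rate = rate_per_min / 60.0  # tokens refilled per second
        self.capacity = capacity
        self.tokens = capacity
        self.last = 0.0

    def allow(self, now):
        # Refill based on elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # this request would get a 429

# A 60 RPM limit with an assumed burst capacity of 5:
bucket = TokenBucket(rate_per_min=60, capacity=5)
burst = [bucket.allow(now=0.0) for _ in range(10)]  # 10 requests at once
print(burst.count(True))  # 5 succeed; the rest would be 429s
```

Ten simultaneous requests are well under 60 RPM on average, yet half of them fail — which is why spacing requests out matters as much as the total count.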
A critical detail: all rate limits are per-organization, not per API key. Creating multiple API keys does not increase your limits. Only uncached input tokens count toward the ITPM limit. The 429 error is different from the 529 overloaded error — 429 means you are sending too much traffic, while 529 means Anthropic's servers are at capacity.
A billing-tier variant returns: "Extra usage is required for long context requests." This means your account's tier does not support the context window size you are requesting. Some users on higher plans report seeing "Rate limit reached" while the dashboard shows only 0-16% usage, suggesting enforcement can be inconsistent.
Common causes
- Your application is sending more requests per minute than your organization's RPM limit allows
- The total input tokens across concurrent requests exceeds your input tokens per minute (ITPM) quota
- A burst of requests triggered the token bucket algorithm even though your average rate is within limits
- Your account's billing tier does not support the context window size or model you are requesting
- Multiple applications or team members are sharing the same organization, collectively exceeding the rate limit
- Retry logic without proper backoff is creating a cascade of repeated requests that compounds the rate limiting
How to fix "Error: 429 Too Many Requests" in Claude
The primary fix is implementing exponential backoff with jitter. When you receive a 429, wait before retrying and increase the wait time with each subsequent failure. Add random jitter to prevent thundering herd problems where multiple clients retry at the same time.
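A generic sketch of the pattern (the helper names are ours; in real code you would catch `anthropic.RateLimitError` rather than a bare `Exception`):

```python
import random
import time

def backoff_delay(attempt, base=1.0, cap=60.0):
    """'Full jitter' exponential backoff: sleep a random amount between
    0 and min(cap, base * 2**attempt), so simultaneous clients spread out."""
    return random.uniform(0, min(cap, base * 2 ** attempt))

def call_with_backoff(fn, max_attempts=5, base=1.0, is_rate_limit=lambda e: True):
    """Call fn(), retrying with jittered exponential backoff on rate-limit
    errors. Re-raises on other errors or when attempts are exhausted."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as e:
            if not is_rate_limit(e) or attempt == max_attempts - 1:
                raise
            time.sleep(backoff_delay(attempt, base=base))
```

The jitter is the important part: without it, every client that was rate-limited at the same moment retries at the same moment, producing another synchronized spike.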
Check the retry-after response header, which tells you exactly how many seconds to wait before your next request will be accepted. Respect this value — sending requests before the retry-after window expires will just generate more 429 errors.
The Anthropic SDK handles this automatically with its built-in retry logic (2 retries by default). Increase this with max_retries=5 for better resilience. For high-throughput applications, implement a client-side rate limiter that tracks your RPM and ITPM usage and throttles requests before hitting the API limit.
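A minimal sliding-window limiter along these lines (the limit values are placeholders — substitute your organization's actual tier limits):

```python
import time
from collections import deque

class ClientRateLimiter:
    """Throttle before the API does: track request timestamps and input-token
    counts over a rolling 60-second window, and report how long to wait
    before the next request fits under both RPM and ITPM."""

    def __init__(self, rpm=50, input_tpm=40_000):
        self.rpm = rpm
        self.input_tpm = input_tpm
        self.events = deque()  # (timestamp, input_tokens)

    def wait_time(self, input_tokens, now=None):
        now = time.monotonic() if now is None else now
        # Evict events older than the 60-second window
        while self.events and now - self.events[0][0] >= 60:
            self.events.popleft()
        used = sum(t for _, t in self.events)
        if len(self.events) < self.rpm and used + input_tokens <= self.input_tpm:
            return 0.0
        if not self.events:
            return 0.0  # a single oversized request; nothing in the window to wait out
        # Wait until the oldest event in the window expires
        return 60.0 - (now - self.events[0][0])

    def record(self, input_tokens, now=None):
        self.events.append((time.monotonic() if now is None else now, input_tokens))
```

Before each API call, sleep for `wait_time(estimated_input_tokens)` and then `record(...)` the request; this keeps you under the limit instead of reacting to 429s after the fact.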
Ramp traffic gradually when starting a new batch process. Do not send 100 concurrent requests immediately — start with a few and increase over 60 seconds. This avoids triggering the acceleration component of Anthropic's rate limiting.
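One way to sketch such a ramp (a simple linear schedule; the function and its defaults are illustrative, not an Anthropic recommendation):

```python
def ramp_schedule(target_concurrency, ramp_seconds=60, start=1):
    """Linear ramp: how many concurrent workers to run at each second,
    growing from `start` to `target_concurrency` over `ramp_seconds`."""
    steps = []
    for s in range(ramp_seconds + 1):
        frac = s / ramp_seconds
        steps.append(round(start + frac * (target_concurrency - start)))
    return steps

schedule = ramp_schedule(20)
print(schedule[0], schedule[-1])  # starts at 1, ends at 20
```

A batch driver would consult this schedule each second and only admit new concurrent requests up to the current step, instead of launching all 20 workers at once.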
For workloads that consistently hit rate limits, consider the Message Batches API for non-real-time processing, or contact Anthropic to discuss higher tier limits for your organization.
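For the batch route, the request payload can be assembled ahead of time. The shape below follows the Anthropic Python SDK's `messages.batches.create(requests=...)` parameter as we understand it — verify against the current API docs before relying on it:

```python
def build_batch_requests(items, model="claude-sonnet-4-20250514", max_tokens=1024):
    """Build the request list for the Message Batches API: each entry pairs
    a custom_id (for matching results later) with the message params."""
    return [
        {
            "custom_id": f"item-{i}",
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [{"role": "user", "content": item}],
            },
        }
        for i, item in enumerate(items)
    ]

# Submission (requires network access and an API key):
# batch = client.messages.batches.create(requests=build_batch_requests(large_batch))
```

Because batch requests are processed asynchronously under separate limits, this sidesteps the RPM/ITPM pressure entirely for workloads that can tolerate delayed results.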
```python
import anthropic

client = anthropic.Anthropic()

# No rate limiting, no backoff
for item in large_batch:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=[{"role": "user", "content": item}]
    )
```

```python
import anthropic
import time
import random

client = anthropic.Anthropic(max_retries=5)

# Client-side rate limiting
MIN_DELAY = 1.0  # seconds between requests

for i, item in enumerate(large_batch):
    try:
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            messages=[{"role": "user", "content": item}]
        )
    except anthropic.RateLimitError as e:
        wait = float(e.response.headers.get("retry-after", 30))
        jitter = random.uniform(0, wait * 0.1)
        print(f"Rate limited. Waiting {wait + jitter:.1f}s")
        time.sleep(wait + jitter)
        # Retry this item
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            messages=[{"role": "user", "content": item}]
        )
    time.sleep(MIN_DELAY)  # Throttle between requests
```

Prevention tips
- Always check and respect the retry-after response header — it tells you the exact wait time before your next request will be accepted
- Add random jitter to your backoff delays to prevent multiple clients from retrying at the same instant and triggering another rate limit
- Use the Message Batches API for non-real-time workloads — batch requests have separate, more generous rate limits
- Remember that rate limits are per-organization, not per API key — creating more keys does not increase your quota
Still stuck?
Copy one of these prompts to get a personalized, step-by-step explanation.
I'm getting 'Error: 429 Too Many Requests' from the Claude/Anthropic API when processing a batch of 500 items. How do I implement proper rate limiting with exponential backoff and jitter to stay within Anthropic's limits?
My Claude API integration hits 429 rate limits when processing batches. Here is my current code: [paste code]. Add exponential backoff with jitter, retry-after header support, and client-side rate limiting to stay within Anthropic's RPM and ITPM limits.
Frequently asked questions
What does "Error: 429 Too Many Requests" mean for Claude API?
It means you have exceeded one of three rate limits: requests per minute (RPM), input tokens per minute (ITPM), or output tokens per minute (OTPM). All limits are per-organization. The error response includes a retry-after header indicating when you can send the next request.
Is the 429 error the same as the 529 overloaded error in Claude?
No. A 429 means your organization is sending too many requests (your fault). A 529 means Anthropic's servers are at capacity (their fault). The 429 can be fixed with rate limiting and backoff. The 529 requires waiting for Anthropic's capacity to recover.
Am I charged for requests that receive a 429 error?
No. Rate-limited requests (429) are not billed because no tokens are processed. However, aggressive retry logic without backoff wastes time and can extend the rate limit window, so implement proper backoff to recover faster.
Will creating multiple API keys increase my rate limits?
No. Rate limits in Claude's API are enforced per-organization, not per API key. All keys under the same organization share the same quota. To get higher limits, you need to upgrade your billing tier or contact Anthropic.
How do I handle rate limits in a production application?
Implement a three-layer strategy: (1) client-side rate limiting to stay below your known limits, (2) exponential backoff with jitter for when 429s occur, and (3) a queue system that buffers requests during rate limit windows. Use the Message Batches API for non-real-time workloads.
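Layer (3) can be as simple as a queue drained at a controlled rate. A toy sketch (the `send` callable stands in for the real API call):

```python
import queue
import time

def drain_queue(q, min_delay=1.0, send=lambda item: item):
    """Process buffered requests one at a time with a fixed spacing between
    them. In production, `send` would wrap the API call with the backoff
    logic from layer (2)."""
    results = []
    while True:
        try:
            item = q.get_nowait()
        except queue.Empty:
            return results
        results.append(send(item))
        time.sleep(min_delay)  # layer (1): client-side throttling
```

Producers enqueue work as fast as it arrives; the drain loop releases it to the API at a rate you control, so bursts on the input side never become bursts on the API side.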
Can RapidDev help optimize my Claude API integration for high throughput?
Yes. RapidDev can architect a production-grade integration with client-side rate limiting, queue-based request management, and automatic failover strategies. This is especially important for applications processing large volumes of requests against Claude's rate limits.
Talk to an Expert
Our team has built 600+ apps. Get personalized help with your issue.
Book a free consultation