The '429 Too Many Requests' error from Claude's API means you have exceeded one of Anthropic's rate limits: requests per minute, input tokens per minute, or output tokens per minute. Implement exponential backoff with jitter, respect the retry-after header, and ramp traffic gradually. All limits are per-organization, not per API key.
What does "Error: 429 Too Many Requests" mean in Claude?
When the Claude API returns HTTP 429, you have exceeded one of three rate limits: requests per minute (RPM), input tokens per minute (ITPM), or output tokens per minute (OTPM). The response body reads: {"type":"error","error":{"type":"rate_limit_error","message":"Rate limited. Please try again later."}} Anthropic uses a token bucket algorithm, meaning a rate of 60 RPM may be enforced as 1 request per second — short bursts can trigger 429s even when you are under the nominal limit.
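A simplified model makes the burst behavior concrete (illustrative only — Anthropic does not publish its bucket parameters, so the capacity below is made up):

```python
class TokenBucket:
    """Toy token bucket: tokens refill continuously at the nominal rate,
    and each request spends one token. A burst can drain the bucket even
    when the average rate is under the limit."""

    def __init__(self, rate_per_min, capacity):
        self.rate = rate_per_min / 60.0  # tokens refilled per second
        self.capacity = capacity
        self.tokens = capacity
        self.last = 0.0

    def allow(self, now):
        # Refill based on elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # this request would get a 429

# A 60 RPM limit with an assumed burst capacity of 5:
bucket = TokenBucket(rate_per_min=60, capacity=5)
burst = [bucket.allow(now=0.0) for _ in range(10)]  # 10 requests at once
print(burst.count(True))  # 5 succeed; the rest would be 429s
```

Ten simultaneous requests are well under 60 RPM on average, yet half of them fail — which is why spacing requests out matters as much as the total count.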
A critical detail: all rate limits are per-organization, not per API key. Creating multiple API keys does not increase your limits. Only uncached input tokens count toward the ITPM limit. The 429 error is different from the 529 overloaded error — 429 means you are sending too much traffic, while 529 means Anthropic's servers are at capacity.
A billing-tier variant returns: "Extra usage is required for long context requests." This means your account's tier does not support the context window size you are requesting. Some users on higher plans report seeing "Rate limit reached" while the dashboard shows only 0-16% usage, suggesting enforcement can be inconsistent.
Common causes
- Your application is sending more requests per minute than your organization's RPM limit allows
- The total input tokens across concurrent requests exceeds your input tokens per minute (ITPM) quota
- A burst of requests triggered the token bucket algorithm even though your average rate is within limits
- Your account's billing tier does not support the context window size or model you are requesting
- Multiple applications or team members are sharing the same organization, collectively exceeding the rate limit
- Retry logic without proper backoff is creating a cascade of repeated requests that compounds the rate limiting
How to fix "Error: 429 Too Many Requests" in Claude
The primary fix is implementing exponential backoff with jitter. When you receive a 429, wait before retrying and increase the wait time with each subsequent failure. Add random jitter to prevent thundering herd problems where multiple clients retry at the same time.
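A generic sketch of the pattern (the helper names are ours; in real code you would catch `anthropic.RateLimitError` rather than a bare `Exception`):

```python
import random
import time

def backoff_delay(attempt, base=1.0, cap=60.0):
    """'Full jitter' exponential backoff: sleep a random amount between
    0 and min(cap, base * 2**attempt), so simultaneous clients spread out."""
    return random.uniform(0, min(cap, base * 2 ** attempt))

def call_with_backoff(fn, max_attempts=5, base=1.0, is_rate_limit=lambda e: True):
    """Call fn(), retrying with jittered exponential backoff on rate-limit
    errors. Re-raises on other errors or when attempts are exhausted."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as e:
            if not is_rate_limit(e) or attempt == max_attempts - 1:
                raise
            time.sleep(backoff_delay(attempt, base=base))
```

The jitter is the important part: without it, every client that was rate-limited at the same moment retries at the same moment, producing another synchronized spike.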
Check the retry-after response header, which tells you exactly how many seconds to wait before your next request will be accepted. Respect this value — sending requests before the retry-after window expires will just generate more 429 errors.
The Anthropic SDK handles this automatically with its built-in retry logic (2 retries by default). Increase this with max_retries=5 for better resilience. For high-throughput applications, implement a client-side rate limiter that tracks your RPM and ITPM usage and throttles requests before hitting the API limit.
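A minimal sliding-window limiter along these lines (the limit values are placeholders — substitute your organization's actual tier limits):

```python
import time
from collections import deque

class ClientRateLimiter:
    """Throttle before the API does: track request timestamps and input-token
    counts over a rolling 60-second window, and report how long to wait
    before the next request fits under both RPM and ITPM."""

    def __init__(self, rpm=50, input_tpm=40_000):
        self.rpm = rpm
        self.input_tpm = input_tpm
        self.events = deque()  # (timestamp, input_tokens)

    def wait_time(self, input_tokens, now=None):
        now = time.monotonic() if now is None else now
        # Evict events older than the 60-second window
        while self.events and now - self.events[0][0] >= 60:
            self.events.popleft()
        used = sum(t for _, t in self.events)
        if len(self.events) < self.rpm and used + input_tokens <= self.input_tpm:
            return 0.0
        if not self.events:
            return 0.0  # a single oversized request; nothing in the window to wait out
        # Wait until the oldest event in the window expires
        return 60.0 - (now - self.events[0][0])

    def record(self, input_tokens, now=None):
        self.events.append((time.monotonic() if now is None else now, input_tokens))
```

Before each API call, sleep for `wait_time(estimated_input_tokens)` and then `record(...)` the request; this keeps you under the limit instead of reacting to 429s after the fact.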
Ramp traffic gradually when starting a new batch process. Do not send 100 concurrent requests immediately — start with a few and increase over 60 seconds. This avoids triggering the acceleration component of Anthropic's rate limiting.
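One way to sketch such a ramp (a simple linear schedule; the function and its defaults are illustrative, not an Anthropic recommendation):

```python
def ramp_schedule(target_concurrency, ramp_seconds=60, start=1):
    """Linear ramp: how many concurrent workers to run at each second,
    growing from `start` to `target_concurrency` over `ramp_seconds`."""
    steps = []
    for s in range(ramp_seconds + 1):
        frac = s / ramp_seconds
        steps.append(round(start + frac * (target_concurrency - start)))
    return steps

schedule = ramp_schedule(20)
print(schedule[0], schedule[-1])  # starts at 1, ends at 20
```

A batch driver would consult this schedule each second and only admit new concurrent requests up to the current step, instead of launching all 20 workers at once.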
For workloads that consistently hit rate limits, consider the Message Batches API for non-real-time processing, or contact Anthropic to discuss higher tier limits for your organization.
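For the batch route, the request payload can be assembled ahead of time. The shape below follows the Anthropic Python SDK's `messages.batches.create(requests=...)` parameter as we understand it — verify against the current API docs before relying on it:

```python
def build_batch_requests(items, model="claude-sonnet-4-20250514", max_tokens=1024):
    """Build the request list for the Message Batches API: each entry pairs
    a custom_id (for matching results later) with the message params."""
    return [
        {
            "custom_id": f"item-{i}",
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [{"role": "user", "content": item}],
            },
        }
        for i, item in enumerate(items)
    ]

# Submission (requires network access and an API key):
# batch = client.messages.batches.create(requests=build_batch_requests(large_batch))
```

Because batch requests are processed asynchronously under separate limits, this sidesteps the RPM/ITPM pressure entirely for workloads that can tolerate delayed results.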
```python
import anthropic

client = anthropic.Anthropic()

# No rate limiting, no backoff
for item in large_batch:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=[{"role": "user", "content": item}]
    )
```

```python
import anthropic
import time
import random

client = anthropic.Anthropic(max_retries=5)

# Client-side rate limiting
MIN_DELAY = 1.0  # seconds between requests

for i, item in enumerate(large_batch):
    try:
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            messages=[{"role": "user", "content": item}]
        )
    except anthropic.RateLimitError as e:
        wait = float(e.response.headers.get("retry-after", 30))
        jitter = random.uniform(0, wait * 0.1)
        print(f"Rate limited. Waiting {wait + jitter:.1f}s")
        time.sleep(wait + jitter)
        # Retry this item
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            messages=[{"role": "user", "content": item}]
        )
    time.sleep(MIN_DELAY)  # Throttle between requests
```

Prevention tips
- Always check and respect the retry-after response header — it tells you the exact wait time before your next request will be accepted
- Add random jitter to your backoff delays to prevent multiple clients from retrying at the same instant and triggering another rate limit
- Use the Message Batches API for non-real-time workloads — batch requests have separate, more generous rate limits
- Remember that rate limits are per-organization, not per API key — creating more keys does not increase your quota
Still stuck?
Copy one of these prompts to get a personalized, step-by-step explanation.
I'm getting 'Error: 429 Too Many Requests' from the Claude/Anthropic API when processing a batch of 500 items. How do I implement proper rate limiting with exponential backoff and jitter to stay within Anthropic's limits?
My Claude API integration hits 429 rate limits when processing batches. Here is my current code: [paste code]. Add exponential backoff with jitter, retry-after header support, and client-side rate limiting to stay within Anthropic's RPM and ITPM limits.
Frequently asked questions
What does "Error: 429 Too Many Requests" mean for Claude API?
It means you have exceeded one of three rate limits: requests per minute (RPM), input tokens per minute (ITPM), or output tokens per minute (OTPM). All limits are per-organization. The error response includes a retry-after header indicating when you can send the next request.
Is the 429 error the same as the 529 overloaded error in Claude?
No. A 429 means your organization is sending too many requests (your fault). A 529 means Anthropic's servers are at capacity (their fault). The 429 can be fixed with rate limiting and backoff. The 529 requires waiting for Anthropic's capacity to recover.
Am I charged for requests that receive a 429 error?
No. Rate-limited requests (429) are not billed because no tokens are processed. However, aggressive retry logic without backoff wastes time and can extend the rate limit window, so implement proper backoff to recover faster.
Will creating multiple API keys increase my rate limits?
No. Rate limits in Claude's API are enforced per-organization, not per API key. All keys under the same organization share the same quota. To get higher limits, you need to upgrade your billing tier or contact Anthropic.
How do I handle rate limits in a production application?
Implement a three-layer strategy: (1) client-side rate limiting to stay below your known limits, (2) exponential backoff with jitter for when 429s occur, and (3) a queue system that buffers requests during rate limit windows. Use the Message Batches API for non-real-time workloads.
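Layer (3) can be as simple as a queue drained at a controlled rate. A toy sketch (the `send` callable stands in for the real API call):

```python
import queue
import time

def drain_queue(q, min_delay=1.0, send=lambda item: item):
    """Process buffered requests one at a time with a fixed spacing between
    them. In production, `send` would wrap the API call with the backoff
    logic from layer (2)."""
    results = []
    while True:
        try:
            item = q.get_nowait()
        except queue.Empty:
            return results
        results.append(send(item))
        time.sleep(min_delay)  # layer (1): client-side throttling
```

Producers enqueue work as fast as it arrives; the drain loop releases it to the API at a rate you control, so bursts on the input side never become bursts on the API side.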
Can RapidDev help optimize my Claude API integration for high throughput?
Yes. RapidDev can architect a production-grade integration with client-side rate limiting, queue-based request management, and automatic failover strategies. This is especially important for applications processing large volumes of requests against Claude's rate limits.
Talk to an Expert
Our team has built 600+ apps. Get personalized help with your issue.
Book a free consultation