Rate limit MCP tool calls using a token bucket algorithm to prevent abuse, control API costs, and ensure fair usage. Implement per-tool and per-user limits that return 429-style error results when exceeded. This tutorial builds a reusable rate limiter that tracks tokens per bucket, refills over time, and integrates cleanly with MCP tool handlers.
Rate Limiting MCP Tool Calls with Token Buckets
MCP tools that call external APIs, query databases, or perform expensive computations need rate limiting to prevent runaway costs and ensure availability. This tutorial implements a token bucket rate limiter — the same algorithm used by AWS, Stripe, and most production APIs. Each tool (or user) gets a bucket of tokens that refills at a steady rate. Each tool call consumes one token. When the bucket is empty, the call returns a rate limit error instead of executing.
Prerequisites
- A working MCP server with one or more tools
- Node.js 18+ and npm installed
- Basic understanding of rate limiting concepts
- Familiarity with MCP tool registration patterns
Step-by-step guide
Build a token bucket rate limiter class
The token bucket algorithm works by maintaining a bucket of tokens for each key (tool name, user ID, or both). Each call consumes one token. Tokens refill at a steady rate up to a maximum capacity. If the bucket is empty, the call is rejected. This provides smooth rate limiting that allows short bursts while enforcing long-term averages. The implementation uses lazy refill — tokens are calculated on each check rather than using a timer.
```typescript
// src/rate-limiter.ts
export interface RateLimitConfig {
  maxTokens: number;   // Maximum bucket capacity
  refillRate: number;  // Tokens added per second
}

interface Bucket {
  tokens: number;
  lastRefill: number;
}

export class TokenBucketLimiter {
  private buckets = new Map<string, Bucket>();
  private configs = new Map<string, RateLimitConfig>();
  private defaultConfig: RateLimitConfig;

  constructor(defaultConfig: RateLimitConfig = { maxTokens: 60, refillRate: 1 }) {
    this.defaultConfig = defaultConfig;
  }

  setConfig(key: string, config: RateLimitConfig): void {
    this.configs.set(key, config);
  }

  tryConsume(key: string): { allowed: boolean; retryAfterMs: number; remaining: number } {
    const config = this.configs.get(key) || this.defaultConfig;
    const now = Date.now();
    let bucket = this.buckets.get(key);

    if (!bucket) {
      bucket = { tokens: config.maxTokens, lastRefill: now };
      this.buckets.set(key, bucket);
    }

    // Refill tokens based on elapsed time
    const elapsed = (now - bucket.lastRefill) / 1000;
    bucket.tokens = Math.min(config.maxTokens, bucket.tokens + elapsed * config.refillRate);
    bucket.lastRefill = now;

    if (bucket.tokens >= 1) {
      bucket.tokens -= 1;
      return { allowed: true, retryAfterMs: 0, remaining: Math.floor(bucket.tokens) };
    }

    // Calculate when the next token will be available
    const deficit = 1 - bucket.tokens;
    const retryAfterMs = Math.ceil((deficit / config.refillRate) * 1000);
    return { allowed: false, retryAfterMs, remaining: 0 };
  }

  reset(key: string): void {
    this.buckets.delete(key);
  }
}
```

Expected result: A TokenBucketLimiter class that tracks rate limits per key with configurable capacity and refill rate.
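Before wiring the class into a server, it helps to sanity-check the lazy-refill arithmetic in isolation. The sketch below inlines just that step; the `refill` helper and `Config` type are illustrative stand-ins, not part of the class, so the snippet runs on its own.

```typescript
// Standalone sketch of the lazy-refill step used by TokenBucketLimiter.
// The `refill` helper is illustrative; the class performs the same
// calculation inside tryConsume.
interface Config {
  maxTokens: number;  // bucket capacity
  refillRate: number; // tokens per second
}

function refill(tokens: number, lastRefillMs: number, nowMs: number, cfg: Config): number {
  const elapsedSec = (nowMs - lastRefillMs) / 1000;
  return Math.min(cfg.maxTokens, tokens + elapsedSec * cfg.refillRate);
}

const cfg: Config = { maxTokens: 10, refillRate: 2 };

// An empty bucket regains 2 tokens after one second...
console.log(refill(0, 0, 1000, cfg)); // 2
// ...and an idle bucket never exceeds its capacity.
console.log(refill(0, 0, 60_000, cfg)); // 10
```

Because refill is computed from elapsed wall-clock time on each call, the limiter needs no background timers and stays accurate even if a key is untouched for hours.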
Configure per-tool rate limits based on cost and risk
Different tools have different costs and risks. A tool that reads a local file is cheap and safe, so it gets a high rate limit. A tool that calls an external API has both monetary cost and rate limit risk from the upstream provider, so it gets a lower limit. Configure each tool's limits based on its actual constraints. Expensive or dangerous tools should have stricter limits.
```typescript
// src/rate-config.ts
import { RateLimitConfig } from "./rate-limiter.js";

export const TOOL_RATE_LIMITS: Record<string, RateLimitConfig> = {
  // Local file operations — fast and cheap
  list_files: { maxTokens: 120, refillRate: 2 },    // burst of 120, 120/min sustained
  read_file: { maxTokens: 60, refillRate: 1 },      // burst of 60, 60/min sustained

  // Database queries — moderate cost
  query_db: { maxTokens: 30, refillRate: 0.5 },     // burst of 30, 30/min sustained

  // External API calls — expensive and upstream-limited
  call_api: { maxTokens: 10, refillRate: 0.17 },    // burst of 10, ~10/min sustained

  // Write operations — highest risk
  write_file: { maxTokens: 20, refillRate: 0.33 },  // burst of 20, ~20/min sustained
  delete_file: { maxTokens: 5, refillRate: 0.08 },  // burst of 5, ~5/min sustained
};
```

Expected result: A configuration object mapping each tool name to its rate limit parameters.
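A refillRate expressed in tokens per second implies a sustained rate of refillRate × 60 requests per minute, which is how the comments above were derived. A quick sanity check (the `perMinute` helper exists only for this check and is not part of the tutorial's modules):

```typescript
// Convert a tokens-per-second refill rate to the sustained requests/minute
// it allows. Hypothetical helper, used only to verify the config comments.
const perMinute = (refillRate: number): number => Math.round(refillRate * 60);

console.log(perMinute(2));    // 120 (list_files)
console.log(perMinute(0.5));  // 30  (query_db)
console.log(perMinute(0.17)); // 10  (call_api, approximate)
console.log(perMinute(0.08)); // 5   (delete_file, approximate)
```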
Create a rate-limiting middleware wrapper for tool handlers
Build a higher-order function that wraps any tool handler with rate limit checking. Before executing the handler, check the rate limiter. If the call is allowed, run the handler; if denied, return an MCP error result with a retry-after message. The wrapper checks a per-tool bucket first, then a separate per-user bucket when a user ID is provided; if you need true per-user, per-tool limits, use a composite key that joins the tool name and user ID instead. This wrapper is reusable across all tools.
```typescript
// src/rate-limited-tool.ts
import { TokenBucketLimiter } from "./rate-limiter.js";
import { TOOL_RATE_LIMITS } from "./rate-config.js";

const limiter = new TokenBucketLimiter();

// Initialize tool-specific configs
for (const [tool, config] of Object.entries(TOOL_RATE_LIMITS)) {
  limiter.setConfig(tool, config);
}

export function withRateLimit<T>(
  toolName: string,
  handler: (params: T) => Promise<any>,
  userId?: string
): (params: T) => Promise<any> {
  return async (params: T) => {
    // Check per-tool limit
    const toolResult = limiter.tryConsume(toolName);
    if (!toolResult.allowed) {
      console.error(`[rate-limit] Tool ${toolName} rate limited. Retry after ${toolResult.retryAfterMs}ms`);
      return {
        content: [{
          type: "text",
          text: `Rate limit exceeded for ${toolName}. Please retry after ${Math.ceil(toolResult.retryAfterMs / 1000)} seconds. Remaining capacity: ${toolResult.remaining}.`,
        }],
        isError: true,
      };
    }

    // Check per-user limit if a userId is provided
    if (userId) {
      const userKey = `user:${userId}`;
      const userResult = limiter.tryConsume(userKey);
      if (!userResult.allowed) {
        console.error(`[rate-limit] User ${userId} rate limited`);
        return {
          content: [{
            type: "text",
            text: `User rate limit exceeded. Please retry after ${Math.ceil(userResult.retryAfterMs / 1000)} seconds.`,
          }],
          isError: true,
        };
      }
    }

    return handler(params);
  };
}
```

Expected result: A withRateLimit wrapper that checks rate limits before executing tool handlers.
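The higher-order pattern itself can be sketched independently of the limiter. In the standalone example below, `withGate` is a hypothetical stand-in for withRateLimit and the `check` callback stands in for limiter.tryConsume; only the control flow is shown.

```typescript
// Standalone sketch of the wrapper pattern: a denied check short-circuits
// into an MCP-style isError result and the handler never runs.
type ToolResult = { content: { type: "text"; text: string }[]; isError?: boolean };

function withGate<T>(
  check: () => boolean,
  handler: (params: T) => Promise<ToolResult>
): (params: T) => Promise<ToolResult> {
  return async (params: T) => {
    if (!check()) {
      // Denied: return an error result instead of executing the handler
      return { content: [{ type: "text", text: "Rate limit exceeded." }], isError: true };
    }
    return handler(params);
  };
}

// A gate that always denies never invokes the wrapped handler
const wrapped = withGate<{ q: string }>(() => false, async ({ q }) => ({
  content: [{ type: "text", text: `ok: ${q}` }],
}));

wrapped({ q: "test" }).then(r => console.log(r.isError)); // true
```

Because the wrapper returns a result object rather than throwing, the MCP client still receives a well-formed tool response it can act on.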
Integrate rate limiting with MCP server tool registration
Apply the rate limiter when registering tools with the MCP server. Wrap each tool's handler with withRateLimit before passing it to server.tool(). This keeps the rate limiting logic separate from the tool's business logic. You can also add a rate_limit_status tool that reports current bucket states for debugging.
```typescript
// src/index.ts
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";
import { withRateLimit } from "./rate-limited-tool.js";

const server = new McpServer({ name: "rate-limited-server", version: "1.0.0" });

// Register tools with rate limiting
server.tool(
  "read_file",
  "Read a file with rate limiting applied",
  { filePath: z.string() },
  withRateLimit("read_file", async ({ filePath }) => {
    const fs = await import("fs/promises");
    const content = await fs.readFile(filePath, "utf-8");
    return { content: [{ type: "text", text: content }] };
  })
);

server.tool(
  "call_api",
  "Call an external API with strict rate limiting",
  { url: z.string(), method: z.enum(["GET", "POST"]).default("GET") },
  withRateLimit("call_api", async ({ url, method }) => {
    const response = await fetch(url, { method });
    const text = await response.text();
    return { content: [{ type: "text", text }] };
  })
);

async function main() {
  const transport = new StdioServerTransport();
  await server.connect(transport);
  console.error("Rate-limited MCP server running");
}

main().catch(e => { console.error(e); process.exit(1); });
```

Expected result: MCP server with rate-limited tools that return error results when limits are exceeded.
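The rate_limit_status diagnostic tool mentioned above mostly amounts to formatting the limiter's bucket state. Here is a minimal sketch of that formatting, assuming the getStatus(key) shape from the complete example below; `formatStatus` and `BucketStatus` are hypothetical names you would call from the tool handler.

```typescript
// Sketch of the message formatting behind a rate_limit_status tool.
// A real handler would call limiter.getStatus(toolName) and return this
// string as an MCP text result.
interface BucketStatus {
  tokens: number;
  maxTokens: number;
}

function formatStatus(toolName: string, status: BucketStatus | null): string {
  // getStatus returns null for a key that has never been consumed,
  // which means the bucket is effectively full.
  return status
    ? `${toolName}: ${status.tokens}/${status.maxTokens} tokens available`
    : `${toolName}: no calls recorded yet (bucket is full)`;
}

console.log(formatStatus("call_api", { tokens: 3, maxTokens: 10 }));
// call_api: 3/10 tokens available
console.log(formatStatus("read_file", null));
// read_file: no calls recorded yet (bucket is full)
```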
Complete working example
```typescript
export interface RateLimitConfig {
  maxTokens: number;
  refillRate: number;
}

interface Bucket {
  tokens: number;
  lastRefill: number;
}

export class TokenBucketLimiter {
  private buckets = new Map<string, Bucket>();
  private configs = new Map<string, RateLimitConfig>();
  private defaultConfig: RateLimitConfig;

  constructor(defaultConfig: RateLimitConfig = { maxTokens: 60, refillRate: 1 }) {
    this.defaultConfig = defaultConfig;
  }

  setConfig(key: string, config: RateLimitConfig): void {
    this.configs.set(key, config);
  }

  tryConsume(key: string): { allowed: boolean; retryAfterMs: number; remaining: number } {
    const config = this.configs.get(key) || this.defaultConfig;
    const now = Date.now();
    let bucket = this.buckets.get(key);
    if (!bucket) {
      bucket = { tokens: config.maxTokens, lastRefill: now };
      this.buckets.set(key, bucket);
    }
    const elapsed = (now - bucket.lastRefill) / 1000;
    bucket.tokens = Math.min(config.maxTokens, bucket.tokens + elapsed * config.refillRate);
    bucket.lastRefill = now;

    if (bucket.tokens >= 1) {
      bucket.tokens -= 1;
      return { allowed: true, retryAfterMs: 0, remaining: Math.floor(bucket.tokens) };
    }
    const retryAfterMs = Math.ceil(((1 - bucket.tokens) / config.refillRate) * 1000);
    return { allowed: false, retryAfterMs, remaining: 0 };
  }

  reset(key: string): void { this.buckets.delete(key); }

  getStatus(key: string): { tokens: number; maxTokens: number } | null {
    const bucket = this.buckets.get(key);
    const config = this.configs.get(key) || this.defaultConfig;
    if (!bucket) return null;
    const elapsed = (Date.now() - bucket.lastRefill) / 1000;
    const tokens = Math.min(config.maxTokens, bucket.tokens + elapsed * config.refillRate);
    return { tokens: Math.floor(tokens), maxTokens: config.maxTokens };
  }
}

export function withRateLimit<T>(
  limiter: TokenBucketLimiter,
  toolName: string,
  handler: (params: T) => Promise<any>
): (params: T) => Promise<any> {
  return async (params: T) => {
    const result = limiter.tryConsume(toolName);
    if (!result.allowed) {
      return {
        content: [{
          type: "text" as const,
          text: `Rate limit exceeded for ${toolName}. Retry after ${Math.ceil(result.retryAfterMs / 1000)}s.`,
        }],
        isError: true as const,
      };
    }
    return handler(params);
  };
}
```

Common mistakes when rate-limiting MCP tool calls
Mistake: Using fixed windows that reset every minute instead of token buckets, causing a thundering herd at window boundaries.
How to avoid: Use token bucket or sliding window algorithms that distribute load evenly without sharp reset boundaries.
Mistake: Applying the same rate limit to all tools regardless of their cost and risk.
How to avoid: Configure per-tool limits based on each tool's computational cost, external API limits, and risk level.
Mistake: Not returning retry-after information in rate limit errors.
How to avoid: Include the retry-after time in the error message so clients can wait the appropriate amount before retrying.
Mistake: Rate limiting only by tool name, not by user, allowing one user to exhaust limits for everyone.
How to avoid: Use composite keys combining tool name and user ID for per-user, per-tool rate limiting.
Best practices
- Use token bucket algorithm for smooth rate limiting that allows short bursts
- Configure per-tool limits based on cost — cheap tools get higher limits than expensive ones
- Return clear error messages with retry-after timing when rate limits are hit
- Combine per-tool and per-user limits for comprehensive traffic control
- Log all rate limit events to stderr for monitoring and tuning
- Set MCP tool limits below upstream API limits to leave safety headroom
- Use lazy refill calculations instead of timers for accuracy and efficiency
- Add a diagnostic tool that reports current bucket states for debugging
Still stuck?
Copy one of these prompts to get a personalized, step-by-step explanation.
Build a token bucket rate limiter for MCP tool calls in TypeScript. Support per-tool configuration with different limits, return retry-after information on rejection, and provide a withRateLimit wrapper function for tool handlers.
Add rate limiting to my MCP server. Create a TokenBucketLimiter class with per-tool configs, a withRateLimit wrapper that returns MCP error results with isError: true and retry-after times, and configure different limits for read vs write vs API tools.
Frequently asked questions
What rate limit values should I start with for MCP tools?
Start with 60 tokens max and 1 token/second refill for general tools. Reduce to 10 tokens max and 0.17/second for tools that call external APIs. Monitor actual usage for a week and then tune based on data.
How does the token bucket algorithm differ from fixed window rate limiting?
Fixed window resets all tokens at the start of each minute, allowing bursts at window boundaries. Token bucket refills continuously, providing smoother traffic distribution and more predictable behavior.
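The boundary-burst difference comes down to a few lines of arithmetic: a full bucket admits at most maxTokens calls in a single instant, no matter where a minute boundary falls, whereas a 60/minute fixed window can admit 60 calls just before the boundary and 60 more just after it. A tiny illustration (numbers are illustrative):

```typescript
// 120 requests arrive in the same instant against a full bucket with
// maxTokens = 60. No time elapses, so no refill occurs, and only the
// first 60 are admitted.
let tokens = 60;   // bucket starts full
let admitted = 0;

for (let i = 0; i < 120; i++) {
  if (tokens >= 1) {
    tokens -= 1;
    admitted += 1;
  }
}

console.log(admitted); // 60: the instantaneous burst is capped at capacity
```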
Can I share rate limit state across multiple MCP server instances?
Yes. Replace the in-memory Map with Redis. Use Redis INCR with EXPIRE for atomic token consumption, or use a Redis-backed rate limiting library like rate-limiter-flexible.
Should rate limit errors use isError: true?
Yes. Always return isError: true for rate limit responses so the AI client knows the tool call did not succeed and should either retry or inform the user.
Can RapidDev help configure rate limits for production MCP servers?
Yes, RapidDev can analyze your tool usage patterns and configure appropriate rate limits based on upstream API constraints, cost budgets, and performance requirements.