RapidDev - Software Development Agency

How to rate-limit MCP tool calls

Rate limit MCP tool calls using a token bucket algorithm to prevent abuse, control API costs, and ensure fair usage. Implement per-tool and per-user limits that return 429-style error results when exceeded. This tutorial builds a reusable rate limiter that tracks tokens per bucket, refills over time, and integrates cleanly with MCP tool handlers.

What you'll learn

  • How to implement a token bucket rate limiter for MCP tools
  • How to configure per-tool and per-user rate limits
  • How to return proper rate limit error responses in MCP format
  • How to integrate rate limiting with the MCP tool registration pattern
Advanced · 9 min read · 20-30 min hands-on · MCP TypeScript SDK v1.x, Node.js 18+ · March 2026 · RapidDev Engineering Team

Rate Limiting MCP Tool Calls with Token Buckets

MCP tools that call external APIs, query databases, or perform expensive computations need rate limiting to prevent runaway costs and ensure availability. This tutorial implements a token bucket rate limiter — the same algorithm used by AWS, Stripe, and most production APIs. Each tool (or user) gets a bucket of tokens that refills at a steady rate. Each tool call consumes one token. When the bucket is empty, the call returns a rate limit error instead of executing.

Prerequisites

  • A working MCP server with one or more tools
  • Node.js 18+ and npm installed
  • Basic understanding of rate limiting concepts
  • Familiarity with MCP tool registration patterns

Step-by-step guide

Step 1: Build a token bucket rate limiter class

The token bucket algorithm works by maintaining a bucket of tokens for each key (tool name, user ID, or both). Each call consumes one token. Tokens refill at a steady rate up to a maximum capacity. If the bucket is empty, the call is rejected. This provides smooth rate limiting that allows short bursts while enforcing long-term averages. The implementation uses lazy refill — tokens are calculated on each check rather than using a timer.

typescript
// src/rate-limiter.ts
export interface RateLimitConfig {
  maxTokens: number;  // Maximum bucket capacity
  refillRate: number; // Tokens added per second
}

interface Bucket {
  tokens: number;
  lastRefill: number;
}

export class TokenBucketLimiter {
  private buckets = new Map<string, Bucket>();
  private configs = new Map<string, RateLimitConfig>();
  private defaultConfig: RateLimitConfig;

  constructor(defaultConfig: RateLimitConfig = { maxTokens: 60, refillRate: 1 }) {
    this.defaultConfig = defaultConfig;
  }

  setConfig(key: string, config: RateLimitConfig): void {
    this.configs.set(key, config);
  }

  tryConsume(key: string): { allowed: boolean; retryAfterMs: number; remaining: number } {
    const config = this.configs.get(key) || this.defaultConfig;
    const now = Date.now();
    let bucket = this.buckets.get(key);

    if (!bucket) {
      bucket = { tokens: config.maxTokens, lastRefill: now };
      this.buckets.set(key, bucket);
    }

    // Refill tokens based on elapsed time
    const elapsed = (now - bucket.lastRefill) / 1000;
    bucket.tokens = Math.min(config.maxTokens, bucket.tokens + elapsed * config.refillRate);
    bucket.lastRefill = now;

    if (bucket.tokens >= 1) {
      bucket.tokens -= 1;
      return { allowed: true, retryAfterMs: 0, remaining: Math.floor(bucket.tokens) };
    }

    // Calculate when the next token will be available
    const deficit = 1 - bucket.tokens;
    const retryAfterMs = Math.ceil((deficit / config.refillRate) * 1000);
    return { allowed: false, retryAfterMs, remaining: 0 };
  }

  reset(key: string): void {
    this.buckets.delete(key);
  }
}

Expected result: A TokenBucketLimiter class that tracks rate limits per key with configurable capacity and refill rate.
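As a quick sanity check, here is a standalone sketch of how the limiter behaves under a burst. The class is a condensed inline copy of the step-1 TokenBucketLimiter (single shared config, no setConfig) so the snippet runs on its own:

```typescript
// Condensed inline copy of the step-1 TokenBucketLimiter (single shared
// config, no per-key setConfig) so this demo runs standalone.
interface Bucket {
  tokens: number;
  lastRefill: number;
}

class TokenBucketLimiter {
  private buckets = new Map<string, Bucket>();

  constructor(private config = { maxTokens: 60, refillRate: 1 }) {}

  tryConsume(key: string): { allowed: boolean; retryAfterMs: number; remaining: number } {
    const now = Date.now();
    let bucket = this.buckets.get(key);
    if (!bucket) {
      bucket = { tokens: this.config.maxTokens, lastRefill: now };
      this.buckets.set(key, bucket);
    }
    // Lazy refill: credit tokens for the time elapsed since the last check
    const elapsed = (now - bucket.lastRefill) / 1000;
    bucket.tokens = Math.min(this.config.maxTokens, bucket.tokens + elapsed * this.config.refillRate);
    bucket.lastRefill = now;
    if (bucket.tokens >= 1) {
      bucket.tokens -= 1;
      return { allowed: true, retryAfterMs: 0, remaining: Math.floor(bucket.tokens) };
    }
    return {
      allowed: false,
      retryAfterMs: Math.ceil(((1 - bucket.tokens) / this.config.refillRate) * 1000),
      remaining: 0,
    };
  }
}

// A bucket of 3 allows a burst of 3 calls; the 4th is rejected with a retry hint.
const limiter = new TokenBucketLimiter({ maxTokens: 3, refillRate: 1 });
const results = [1, 2, 3, 4].map(() => limiter.tryConsume("demo_tool"));
console.log(results.map((r) => r.allowed)); // [ true, true, true, false ]
```

Waiting roughly one second (one token at refillRate 1) would let the next call through again.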

Step 2: Configure per-tool rate limits based on cost and risk

Different tools have different costs and risks. A tool that reads a local file is cheap and safe, so it gets a high rate limit. A tool that calls an external API has both monetary cost and rate limit risk from the upstream provider, so it gets a lower limit. Configure each tool's limits based on its actual constraints. Expensive or dangerous tools should have stricter limits.

typescript
// src/rate-config.ts
import { RateLimitConfig } from "./rate-limiter.js";

export const TOOL_RATE_LIMITS: Record<string, RateLimitConfig> = {
  // Local file operations — fast and cheap
  list_files: { maxTokens: 120, refillRate: 2 },   // 120 burst, 2/sec sustained
  read_file: { maxTokens: 60, refillRate: 1 },     // 60 burst, 1/sec sustained

  // Database queries — moderate cost
  query_db: { maxTokens: 30, refillRate: 0.5 },    // 30 burst, 30/min sustained

  // External API calls — expensive and upstream-limited
  call_api: { maxTokens: 10, refillRate: 0.17 },   // 10 burst, ~10/min sustained

  // Write operations — highest risk
  write_file: { maxTokens: 20, refillRate: 0.33 }, // 20 burst, ~20/min sustained
  delete_file: { maxTokens: 5, refillRate: 0.08 }, // 5 burst, ~5/min sustained
};

Expected result: A configuration object mapping each tool name to its rate limit parameters.
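Refill rates in tokens-per-second are easy to get wrong when your budgets are quoted per minute. One option (a hypothetical helper, not part of the tutorial's code) is to derive the config from a requests-per-minute budget:

```typescript
// Hypothetical helper: derive a RateLimitConfig from a requests-per-minute
// budget, letting the burst capacity equal one minute's quota.
interface RateLimitConfig {
  maxTokens: number;  // burst capacity
  refillRate: number; // tokens per second
}

function perMinute(requestsPerMinute: number): RateLimitConfig {
  return {
    maxTokens: requestsPerMinute,
    refillRate: requestsPerMinute / 60,
  };
}

const LIMITS: Record<string, RateLimitConfig> = {
  read_file: perMinute(60), // { maxTokens: 60, refillRate: 1 }
  call_api: perMinute(10),  // { maxTokens: 10, refillRate: 0.1666... }
};
console.log(LIMITS.call_api.refillRate.toFixed(2)); // "0.17"
```

If a full minute's quota is too much burst for a tool, shrink maxTokens independently of the refill rate.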

Step 3: Create a rate-limiting middleware wrapper for tool handlers

Build a higher-order function that wraps any tool handler with a rate limit check. Before executing the handler, consult the rate limiter: if allowed, run the handler; if denied, return an MCP error result with a retry-after message. The wrapper checks a per-tool bucket first and, when a user ID is supplied, a per-user bucket as well, so one user cannot exhaust a tool's quota for everyone. The wrapper is reusable across all tools.

typescript
// src/rate-limited-tool.ts
import { TokenBucketLimiter } from "./rate-limiter.js";
import { TOOL_RATE_LIMITS } from "./rate-config.js";

const limiter = new TokenBucketLimiter();

// Initialize tool-specific configs
for (const [tool, config] of Object.entries(TOOL_RATE_LIMITS)) {
  limiter.setConfig(tool, config);
}

export function withRateLimit<T>(
  toolName: string,
  handler: (params: T) => Promise<any>,
  userId?: string
): (params: T) => Promise<any> {
  return async (params: T) => {
    // Check the per-tool limit
    const toolResult = limiter.tryConsume(toolName);
    if (!toolResult.allowed) {
      console.error(`[rate-limit] Tool ${toolName} rate limited. Retry after ${toolResult.retryAfterMs}ms`);
      return {
        content: [{
          type: "text",
          text: `Rate limit exceeded for ${toolName}. Please retry after ${Math.ceil(toolResult.retryAfterMs / 1000)} seconds.`,
        }],
        isError: true,
      };
    }

    // Check the per-user limit if a userId was provided
    if (userId) {
      const userKey = `user:${userId}`;
      const userResult = limiter.tryConsume(userKey);
      if (!userResult.allowed) {
        console.error(`[rate-limit] User ${userId} rate limited`);
        return {
          content: [{
            type: "text",
            text: `User rate limit exceeded. Please retry after ${Math.ceil(userResult.retryAfterMs / 1000)} seconds.`,
          }],
          isError: true,
        };
      }
    }

    return handler(params);
  };
}

Expected result: A withRateLimit wrapper that checks rate limits before executing tool handlers.
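The wrapper above uses two separate buckets (one per tool, one per user). If you instead want one bucket per (tool, user) pair, so that one user's bursts on call_api never touch another user's quota for that tool, a composite key works; buildKey below is a hypothetical helper, not part of the tutorial code:

```typescript
// Hypothetical composite-key helper: one bucket per (tool, user) pair,
// falling back to a tool-wide bucket when no user ID is available.
function buildKey(toolName: string, userId?: string): string {
  return userId ? `${toolName}:user:${userId}` : toolName;
}

console.log(buildKey("call_api", "alice")); // "call_api:user:alice"
console.log(buildKey("call_api"));          // "call_api"
```

Pass buildKey(toolName, userId) to limiter.tryConsume and register a config per composite key with setConfig. The trade-off is more buckets to track in exchange for true per-user fairness on each tool.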

Step 4: Integrate rate limiting with MCP server tool registration

Apply the rate limiter when registering tools with the MCP server. Wrap each tool's handler with withRateLimit before passing it to server.tool(). This keeps the rate limiting logic separate from the tool's business logic. You can also add a rate_limit_status tool that reports current bucket states for debugging.

typescript
// src/index.ts
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";
import { withRateLimit } from "./rate-limited-tool.js";

const server = new McpServer({ name: "rate-limited-server", version: "1.0.0" });

// Register tools with rate limiting
server.tool(
  "read_file",
  "Read a file with rate limiting applied",
  { filePath: z.string() },
  withRateLimit("read_file", async ({ filePath }) => {
    const fs = await import("fs/promises");
    const content = await fs.readFile(filePath, "utf-8");
    return { content: [{ type: "text", text: content }] };
  })
);

server.tool(
  "call_api",
  "Call an external API with strict rate limiting",
  { url: z.string(), method: z.enum(["GET", "POST"]).default("GET") },
  withRateLimit("call_api", async ({ url, method }) => {
    const response = await fetch(url, { method });
    const text = await response.text();
    return { content: [{ type: "text", text }] };
  })
);

async function main() {
  const transport = new StdioServerTransport();
  await server.connect(transport);
  console.error("Rate-limited MCP server running");
}

main().catch((e) => {
  console.error(e);
  process.exit(1);
});

Expected result: MCP server with rate-limited tools that return error results when limits are exceeded.
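The rate_limit_status diagnostic tool mentioned above could look like the sketch below. Only the formatting logic is shown so it runs standalone; the snapshot map stands in for per-key results from a status method on the limiter (the complete example further down defines a getStatus for exactly this), and wiring it up is one more server.tool registration:

```typescript
// Sketch of a rate_limit_status diagnostic tool. The snapshot map stands in
// for limiter.getStatus(key) results gathered for each known tool key.
interface BucketStatus {
  tokens: number;
  maxTokens: number;
}

function formatStatus(statuses: Map<string, BucketStatus>): string {
  const lines = [...statuses.entries()].map(
    ([key, s]) => `${key}: ${s.tokens}/${s.maxTokens} tokens`
  );
  return lines.length > 0 ? lines.join("\n") : "No active buckets";
}

const snapshot = new Map<string, BucketStatus>([
  ["read_file", { tokens: 58, maxTokens: 60 }],
  ["call_api", { tokens: 0, maxTokens: 10 }],
]);
console.log(formatStatus(snapshot));
// read_file: 58/60 tokens
// call_api: 0/10 tokens
```

In the server, the tool handler would build the snapshot from the limiter and return { content: [{ type: "text", text: formatStatus(snapshot) }] }.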

Complete working example

The condensed single-file version below also adds a getStatus helper and takes the limiter as an explicit first argument to withRateLimit, instead of the module-level instance used in step 3.

src/rate-limiter.ts
export interface RateLimitConfig {
  maxTokens: number;
  refillRate: number;
}

interface Bucket {
  tokens: number;
  lastRefill: number;
}

export class TokenBucketLimiter {
  private buckets = new Map<string, Bucket>();
  private configs = new Map<string, RateLimitConfig>();
  private defaultConfig: RateLimitConfig;

  constructor(defaultConfig: RateLimitConfig = { maxTokens: 60, refillRate: 1 }) {
    this.defaultConfig = defaultConfig;
  }

  setConfig(key: string, config: RateLimitConfig): void {
    this.configs.set(key, config);
  }

  tryConsume(key: string): { allowed: boolean; retryAfterMs: number; remaining: number } {
    const config = this.configs.get(key) || this.defaultConfig;
    const now = Date.now();
    let bucket = this.buckets.get(key);
    if (!bucket) {
      bucket = { tokens: config.maxTokens, lastRefill: now };
      this.buckets.set(key, bucket);
    }
    const elapsed = (now - bucket.lastRefill) / 1000;
    bucket.tokens = Math.min(config.maxTokens, bucket.tokens + elapsed * config.refillRate);
    bucket.lastRefill = now;

    if (bucket.tokens >= 1) {
      bucket.tokens -= 1;
      return { allowed: true, retryAfterMs: 0, remaining: Math.floor(bucket.tokens) };
    }
    const retryAfterMs = Math.ceil(((1 - bucket.tokens) / config.refillRate) * 1000);
    return { allowed: false, retryAfterMs, remaining: 0 };
  }

  reset(key: string): void {
    this.buckets.delete(key);
  }

  getStatus(key: string): { tokens: number; maxTokens: number } | null {
    const bucket = this.buckets.get(key);
    const config = this.configs.get(key) || this.defaultConfig;
    if (!bucket) return null;
    const elapsed = (Date.now() - bucket.lastRefill) / 1000;
    const tokens = Math.min(config.maxTokens, bucket.tokens + elapsed * config.refillRate);
    return { tokens: Math.floor(tokens), maxTokens: config.maxTokens };
  }
}

export function withRateLimit<T>(
  limiter: TokenBucketLimiter,
  toolName: string,
  handler: (params: T) => Promise<any>
): (params: T) => Promise<any> {
  return async (params: T) => {
    const result = limiter.tryConsume(toolName);
    if (!result.allowed) {
      return {
        content: [{
          type: "text" as const,
          text: `Rate limit exceeded for ${toolName}. Retry after ${Math.ceil(result.retryAfterMs / 1000)}s.`,
        }],
        isError: true as const,
      };
    }
    return handler(params);
  };
}

Common mistakes when rate-limiting MCP tool calls

Mistake: Using fixed windows that reset every minute instead of token buckets, causing a thundering herd at window boundaries.

How to avoid: Use token bucket or sliding window algorithms that distribute load evenly without sharp reset boundaries.

Mistake: Applying the same rate limit to all tools regardless of their cost and risk.

How to avoid: Configure per-tool limits based on each tool's computational cost, external API limits, and risk level.

Mistake: Not returning retry-after information in rate limit errors.

How to avoid: Include the retry-after time in the error message so clients can wait the appropriate amount before retrying.

Mistake: Rate limiting only by tool name, not by user, allowing one user to exhaust limits for everyone.

How to avoid: Use composite keys combining tool name and user ID for per-user, per-tool rate limiting.

Best practices

  • Use token bucket algorithm for smooth rate limiting that allows short bursts
  • Configure per-tool limits based on cost — cheap tools get higher limits than expensive ones
  • Return clear error messages with retry-after timing when rate limits are hit
  • Combine per-tool and per-user limits for comprehensive traffic control
  • Log all rate limit events to stderr for monitoring and tuning
  • Set MCP tool limits below upstream API limits to leave safety headroom
  • Use lazy refill calculations instead of timers for accuracy and efficiency
  • Add a diagnostic tool that reports current bucket states for debugging

Still stuck?

Copy one of these prompts to get a personalized, step-by-step explanation.

ChatGPT Prompt

Build a token bucket rate limiter for MCP tool calls in TypeScript. Support per-tool configuration with different limits, return retry-after information on rejection, and provide a withRateLimit wrapper function for tool handlers.

MCP Prompt

Add rate limiting to my MCP server. Create a TokenBucketLimiter class with per-tool configs, a withRateLimit wrapper that returns MCP error results with isError: true and retry-after times, and configure different limits for read vs write vs API tools.

Frequently asked questions

What rate limit values should I start with for MCP tools?

Start with 60 tokens max and 1 token/second refill for general tools. Reduce to 10 tokens max and 0.17/second for tools that call external APIs. Monitor actual usage for a week and then tune based on data.

How does the token bucket algorithm differ from fixed window rate limiting?

Fixed window resets all tokens at the start of each minute, allowing bursts at window boundaries. Token bucket refills continuously, providing smoother traffic distribution and more predictable behavior.
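The boundary-burst difference is easy to demonstrate with a toy simulation (not part of the tutorial; it uses explicit timestamps instead of Date.now so the result is deterministic):

```typescript
// Toy simulation of the boundary-burst problem. A 60-per-minute fixed window
// admits a 120-call burst straddling the minute boundary; a 60-token bucket
// refilling at 1/sec admits only about half of them.
function fixedWindowAdmitted(callTimesMs: number[], limit: number, windowMs: number): number {
  const counts = new Map<number, number>();
  let admitted = 0;
  for (const t of callTimesMs) {
    const win = Math.floor(t / windowMs);
    const used = counts.get(win) ?? 0;
    if (used < limit) {
      counts.set(win, used + 1);
      admitted++;
    }
  }
  return admitted;
}

function tokenBucketAdmitted(callTimesMs: number[], maxTokens: number, refillRate: number): number {
  let tokens = maxTokens;
  let last = callTimesMs[0] ?? 0;
  let admitted = 0;
  for (const t of callTimesMs) {
    tokens = Math.min(maxTokens, tokens + ((t - last) / 1000) * refillRate);
    last = t;
    if (tokens >= 1) {
      tokens -= 1;
      admitted++;
    }
  }
  return admitted;
}

// 120 calls: 60 just before the minute boundary, 60 just after.
const times = [
  ...Array.from({ length: 60 }, (_, i) => 59_000 + i * 10),
  ...Array.from({ length: 60 }, (_, i) => 60_000 + i * 10),
];
console.log(fixedWindowAdmitted(times, 60, 60_000)); // 120: both windows fill completely
console.log(tokenBucketAdmitted(times, 60, 1));      // about 60: only refilled tokens admit extras
```

The fixed window sees two fresh quotas within two seconds, while the bucket only regains what the refill rate allows.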

Can I share rate limit state across multiple MCP server instances?

Yes. Replace the in-memory Map with Redis. Keep the refill-and-consume step atomic by running it in a Lua script, or use a Redis-backed library such as rate-limiter-flexible. Note that plain INCR with EXPIRE implements a fixed window rather than a token bucket.

Should rate limit errors use isError: true?

Yes. Always return isError: true for rate limit responses so the AI client knows the tool call did not succeed and should either retry or inform the user.

Can RapidDev help configure rate limits for production MCP servers?

Yes, RapidDev can analyze your tool usage patterns and configure appropriate rate limits based on upstream API constraints, cost budgets, and performance requirements.
