Parallel LLM calls in n8n cause rate limit errors (429), memory spikes, and inconsistent results when multiple webhook triggers or SplitInBatches nodes fire simultaneously. Fix this by setting workflow concurrency limits, using SplitInBatches with batch size 1 for sequential processing, adding rate limit handling in Code nodes, and configuring n8n queue mode for production deployments.
Why Parallel LLM Calls Create Problems in n8n
n8n processes items in parallel by default. When a workflow receives multiple webhook requests simultaneously, or when a node produces multiple items that flow into an LLM node, n8n sends all LLM API calls at once. This causes three problems: (1) rate limit errors (429) when the API rejects too many simultaneous requests, (2) memory spikes when multiple large LLM responses are held in memory, and (3) inconsistent execution order when responses arrive out of sequence. In production, these issues compound — a spike in traffic can cascade into hundreds of failed executions.
Prerequisites
- A running n8n instance (self-hosted recommended for queue mode)
- LLM API credentials with known rate limits
- A workflow that makes multiple LLM calls (via webhook traffic or batch processing)
- Understanding of n8n workflow settings and environment variables
Step-by-step guide
Set Workflow-Level Concurrency Limits
The simplest fix for parallel execution issues is to limit how many instances of a workflow can run simultaneously. In the n8n editor, click the gear icon (Workflow Settings) in the top bar. Under 'Execution', find 'Concurrency Limit' and set it to a value that matches your API rate limits. For example, if your OpenAI plan allows 60 requests per minute, set the concurrency to 3-5 to leave headroom. This prevents webhook-triggered workflows from spawning dozens of simultaneous LLM calls during traffic spikes.
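To pick a concrete limit, you can work backwards from your RPM budget and how long a typical LLM call takes. Here is a minimal sizing sketch with assumed, illustrative numbers (60 RPM and a 4-second average call; substitute your own plan limits):

// Rough sizing: how many concurrent executions fit under an RPM budget?
// The numbers below are assumptions for illustration only.
const rpmLimit = 60;          // requests per minute allowed by your API plan
const avgCallSeconds = 4;     // average duration of one LLM call
const safetyMargin = 0.8;     // keep ~20% headroom

// One execution making back-to-back calls issues roughly 60 / avgCallSeconds requests per minute,
// so dividing the discounted budget by that rate caps the concurrency.
const requestsPerMinutePerExecution = 60 / avgCallSeconds; // 15 here
const maxConcurrency = Math.max(
  1,
  Math.floor((rpmLimit * safetyMargin) / requestsPerMinutePerExecution)
); // 3 with these numbers, in line with the 3-5 suggested above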
Expected result: Only a limited number of workflow executions run simultaneously, preventing rate limit cascades.
Use SplitInBatches for Sequential LLM Processing
When a single workflow execution needs to process multiple items through an LLM (e.g., summarizing 50 documents), use the SplitInBatches node (labelled 'Loop Over Items' in recent n8n versions) to process them one at a time or in small groups. Set the batch size to 1 for the safest approach, or to a higher number if your rate limits allow it. Place SplitInBatches before the LLM node, send its loop output into the LLM processing chain, and connect the end of that chain back to the SplitInBatches input to close the loop.
// Workflow structure for sequential LLM processing:
//
// [Trigger] → [Get Documents] → [SplitInBatches (size=1)]
//                                     ↓              ↑ (loop back)
//                               [LLM Node] → [Save Result] → [SplitInBatches]
//                                     ↓ (done)
//                               [Merge Results] → [Output]

// In the SplitInBatches node:
// Batch Size: 1 (one item at a time)
// Options → Reset: false (to accumulate results)

Expected result: LLM calls are processed sequentially, one item at a time, preventing rate limit errors.
Add Rate Limit Handling with Exponential Backoff
Even with concurrency limits, you may still hit rate limits during sustained usage. Enable 'Continue On Fail' on your LLM node and add a Code node after it to detect 429 errors. When a rate limit is hit, the Code node calculates a backoff delay and flags the item for retry. Combine this with a Wait node and a loop to implement automatic retrying.
// Code node: Rate limit detection and backoff calculation
const item = $input.item;
const json = item.json;

// Detect rate limit errors
const isRateLimit = json.error && (
  json.error.statusCode === 429 ||
  json.error.message?.includes('rate limit') ||
  json.error.message?.includes('Rate limit') ||
  json.error.message?.includes('Too Many Requests')
);

if (isRateLimit) {
  const retryCount = json._retry_count || 0;
  const maxRetries = 5;

  if (retryCount >= maxRetries) {
    return [{
      json: {
        ...json,
        status: 'rate_limit_exhausted',
        error: 'Exceeded max retries due to rate limiting'
      }
    }];
  }

  // Exponential backoff: 2s, 4s, 8s, 16s, 32s
  const backoffMs = Math.pow(2, retryCount + 1) * 1000;

  // Prefer the Retry-After header if the API provides one
  const retryAfter = json.error.headers?.['retry-after'];
  const waitMs = retryAfter ? parseInt(retryAfter, 10) * 1000 : backoffMs;

  return [{
    json: {
      ...json,
      _retry_count: retryCount + 1,
      _wait_ms: waitMs,
      _should_retry: true,
      status: 'rate_limited'
    }
  }];
}

// Not rate limited: pass through
return [{
  json: {
    ...json,
    _should_retry: false,
    status: json.error ? 'error' : 'success'
  }
}];

Expected result: Rate-limited requests are automatically retried with increasing delays instead of failing permanently.
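The Code node above only flags items for retry; it does not re-run them by itself. Here is a minimal wiring sketch for the retry loop, assuming the _should_retry and _wait_ms fields produced above (Wait node option names may differ slightly between n8n versions):

// Retry loop wiring (sketch, not an exported workflow):
//
// [LLM Node (Continue On Fail)] → [Code: backoff calc] → [IF: _should_retry?]
//        ↑                                                  true ↓     false → continue downstream
//        └────────────────────── [Wait node] ←──────────────────┘
//
// IF node: condition on the boolean field _should_retry.
// Wait node: resume "After Time Interval" and take the amount from the item,
// for example the expression {{ $json._wait_ms / 1000 }} with the unit set to seconds.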
Configure n8n Queue Mode for Production
For high-throughput production deployments, enable n8n's queue mode. Queue mode uses a Redis-backed message queue (BullMQ) to distribute workflow executions across multiple worker processes, providing natural concurrency control and preventing memory overload. Set the EXECUTIONS_MODE environment variable to 'queue' and configure the Redis connection. Each worker handles a limited, configurable number of concurrent executions, and you can scale the number of workers to match your load.
# Docker Compose for n8n queue mode
services:
  n8n-main:
    image: n8nio/n8n:latest
    environment:
      - EXECUTIONS_MODE=queue
      - QUEUE_BULL_REDIS_HOST=redis
      - QUEUE_BULL_REDIS_PORT=6379
      - N8N_CONCURRENCY_PRODUCTION_LIMIT=5
    command: start

  n8n-worker:
    image: n8nio/n8n:latest
    environment:
      - EXECUTIONS_MODE=queue
      - QUEUE_BULL_REDIS_HOST=redis
      - QUEUE_BULL_REDIS_PORT=6379
      - N8N_CONCURRENCY_PRODUCTION_LIMIT=3
    command: worker
    deploy:
      replicas: 2

  redis:
    image: redis:7-alpine
    volumes:
      - redis_data:/data

volumes:
  redis_data:

Expected result: Workflow executions are queued and processed by workers with controlled concurrency, preventing overload.
Add a Request Queue Using Static Data
For simpler deployments without Redis, you can implement a basic request queue using n8n's $getWorkflowStaticData() function. This approach throttles requests within a single workflow by tracking timestamps and enforcing a minimum delay between LLM calls. Add this Code node before your LLM node. Note that workflow static data is only persisted for production (active) executions, so test this behavior with an activated workflow rather than manual runs.
// Code node: Simple request throttle using static data
const staticData = $getWorkflowStaticData('global');
const now = Date.now();
const MIN_DELAY_MS = 1000; // Minimum 1 second between LLM calls

// Check last request timestamp
const lastRequest = staticData.lastLLMRequest || 0;
const elapsed = now - lastRequest;

if (elapsed < MIN_DELAY_MS) {
  // Need to wait: pass the delay to a subsequent Wait node
  const waitTime = MIN_DELAY_MS - elapsed;
  staticData.lastLLMRequest = now + waitTime;
  return [{
    json: {
      ...$json,
      _throttle_wait_ms: waitTime,
      _throttled: true
    }
  }];
}

// No throttling needed
staticData.lastLLMRequest = now;
return [{
  json: {
    ...$json,
    _throttle_wait_ms: 0,
    _throttled: false
  }
}];

Expected result: LLM calls are spaced at least 1 second apart, preventing burst traffic from hitting rate limits.
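The 1-second spacing is a placeholder. A small sketch of deriving MIN_DELAY_MS from your per-minute budget instead, assuming an illustrative 60 RPM limit:

// Derive the minimum spacing between calls from an RPM budget (illustrative numbers).
const rpmLimit = 60;        // your plan's requests-per-minute limit (assumed)
const safetyMargin = 0.8;   // keep ~20% headroom
const MIN_DELAY_MS = Math.ceil(60000 / (rpmLimit * safetyMargin)); // 1250 ms with these values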
Complete working example
// Code node: Run Once for Each Item
// Complete concurrency control for LLM API calls
// Place AFTER the LLM node (with Continue On Fail enabled)

const item = $input.item;
const json = item.json;
const staticData = $getWorkflowStaticData('global');

// Initialize counters
if (!staticData.requestLog) {
  staticData.requestLog = {
    total: 0,
    success: 0,
    rate_limited: 0,
    errors: 0,
    window_start: Date.now()
  };
}

const log = staticData.requestLog;
log.total++;

// Reset counters every hour
if (Date.now() - log.window_start > 3600000) {
  log.total = 1;
  log.success = 0;
  log.rate_limited = 0;
  log.errors = 0;
  log.window_start = Date.now();
}

// Detect rate limit errors
const isRateLimit = json.error && (
  json.error.statusCode === 429 ||
  json.error.message?.includes('rate limit') ||
  json.error.message?.includes('Too Many Requests')
);

if (isRateLimit) {
  log.rate_limited++;
  const retryCount = json._retry_count || 0;
  const backoffMs = Math.min(Math.pow(2, retryCount + 1) * 1000, 60000);

  return [{
    json: {
      original_input: json._original_input || json,
      status: 'rate_limited',
      _retry_count: retryCount + 1,
      _wait_ms: backoffMs,
      _should_retry: retryCount < 5,
      _stats: { ...log }
    }
  }];
}

// Detect other errors
if (json.error) {
  log.errors++;
  return [{
    json: {
      original_input: json._original_input || json,
      status: 'error',
      error_message: json.error.message,
      _should_retry: false,
      _stats: { ...log }
    }
  }];
}

// Success
log.success++;
const text = json.message?.content || json.text || json.output || '';

return [{
  json: {
    text: text,
    status: 'success',
    _should_retry: false,
    _stats: { ...log }
  }
}];

Common mistakes when handling Concurrency Issues with Parallel LLM Calls in n8n
Mistake: Leaving workflow concurrency unlimited on webhook-triggered workflows.
How to avoid: Set a concurrency limit in Workflow Settings or via the N8N_CONCURRENCY_PRODUCTION_LIMIT environment variable. Even a limit of 10 prevents most rate limit cascades.
Mistake: Using SplitInBatches but not connecting the loop output back correctly.
How to avoid: SplitInBatches has two outputs: 'loop' feeds the processing nodes for each batch, and 'done' fires once all batches are processed. Send the loop output into your LLM processing chain, connect the end of that chain back to the SplitInBatches input, and route the done output to whatever runs after the loop.
Mistake: Retrying rate-limited requests immediately, without a backoff delay.
How to avoid: Always add an exponential backoff delay between retries. Start with 2 seconds and double each time. Use a Wait node with the delay calculated in a Code node.
Mistake: Not accounting for LLM API rate limits that are per-minute, not per-second.
How to avoid: OpenAI, Anthropic, and Gemini rate limits are typically RPM (requests per minute). A burst of 60 requests in 5 seconds will still exhaust a per-minute limit. Space requests evenly using a throttle mechanism like the static data approach shown above.
Best practices
- Set workflow concurrency limits to match your LLM API rate limits, with a 20% safety margin
- Use SplitInBatches with batch size 1 for any workflow that processes multiple items through an LLM
- Enable queue mode with Redis for production deployments handling more than 10 concurrent users
- Monitor rate limit hits using workflow static data counters and alert when they exceed thresholds (see the monitoring sketch after this list)
- Use the Wait node with dynamic delay times for backoff instead of sleeping in Code nodes
- Separate high-priority and low-priority LLM calls into different workflows with different concurrency settings
- Set N8N_CONCURRENCY_PRODUCTION_LIMIT as a global safety net even when individual workflows have their own limits
- Log all rate-limited and failed requests to identify peak usage patterns and adjust limits accordingly
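For the monitoring point above, here is a minimal threshold-check sketch that could sit after the stats-tracking Code node from the complete example. The _stats field comes from that example; the 10-hits threshold and the idea of feeding the output into your existing notification node (Slack, email, etc.) are assumptions to adapt:

// Code node: emit an alert item when rate-limit hits exceed a threshold.
// Assumes the upstream node attached a _stats object as in the complete example above.
const ALERT_THRESHOLD = 10; // rate-limit hits per window (assumed value, tune as needed)

const stats = $json._stats || {};
const hits = stats.rate_limited || 0;

if (hits >= ALERT_THRESHOLD) {
  return [{
    json: {
      alert: true,
      message: `LLM rate-limit hits reached ${hits} in the current window`,
      window_start: stats.window_start,
      total_requests: stats.total
    }
  }];
}

// Below the threshold: return no items so the notification branch stays idle.
return [];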
Still stuck?
Copy one of these prompts to get a personalized, step-by-step explanation.
My n8n workflows hit 429 rate limit errors when multiple users trigger LLM calls simultaneously. How do I set concurrency limits, use SplitInBatches for sequential processing, add exponential backoff retry logic, and configure n8n queue mode with Redis for production?
Fix concurrency issues in my n8n LLM workflow. Set the workflow concurrency limit to 5, add SplitInBatches before the OpenAI node with batch size 1, and create a Code node that detects 429 errors and calculates exponential backoff delay for a Wait node.
Frequently asked questions
What is the default concurrency limit in n8n?
By default, n8n has no concurrency limit in regular mode — it processes as many executions as the server can handle. In queue mode, the default is determined by the N8N_CONCURRENCY_PRODUCTION_LIMIT environment variable. Always set an explicit limit for production workflows.
How do I know my LLM API's rate limits?
Check your provider's documentation or dashboard. OpenAI: platform.openai.com → Settings → Rate Limits. Anthropic: console.anthropic.com → Rate Limits. The limits depend on your plan tier and include RPM (requests per minute) and TPM (tokens per minute).
Does SplitInBatches slow down my workflow?
Yes, intentionally. With batch size 1, each LLM call completes before the next starts. For 50 items at 3 seconds per call, the total time is about 150 seconds versus 3 seconds for parallel processing. This tradeoff prevents rate limits and memory issues.
Can I have different concurrency limits for different workflows?
Yes. Set workflow-level concurrency limits in Workflow Settings for each workflow individually. The global N8N_CONCURRENCY_PRODUCTION_LIMIT serves as a safety net across all workflows.
Does queue mode require Redis?
Yes. n8n queue mode uses BullMQ, which requires a Redis instance. You can use a managed Redis service (AWS ElastiCache, Redis Cloud) or self-host Redis alongside n8n.
Can RapidDev help set up a high-throughput n8n deployment with queue mode?
Yes. RapidDev can architect and deploy n8n with queue mode, Redis, worker scaling, and concurrency tuning optimized for your specific LLM API usage patterns and traffic volumes.