RapidDev - Software Development Agency

How to Handle Concurrency Issues with Parallel LLM Calls in n8n

Parallel LLM calls in n8n cause rate limit errors (429), memory spikes, and inconsistent results when multiple webhook triggers or SplitInBatches nodes fire simultaneously. Fix this by setting workflow concurrency limits, using SplitInBatches with batch size 1 for sequential processing, adding rate limit handling in Code nodes, and configuring n8n queue mode for production deployments.

What you'll learn

  • How to set workflow and global concurrency limits in n8n
  • How to use SplitInBatches to serialize LLM calls
  • How to handle 429 rate limit errors with exponential backoff
  • How to configure n8n queue mode for high-throughput production deployments
Advanced · 9 min read · 25-35 minutes hands-on · n8n 1.20+ with any LLM node (OpenAI, Anthropic, Gemini, Cohere, Mistral) · March 2026 · RapidDev Engineering Team

Why Parallel LLM Calls Create Problems in n8n

n8n processes items in parallel by default. When a workflow receives multiple webhook requests simultaneously, or when a node produces multiple items that flow into an LLM node, n8n sends all LLM API calls at once. This causes three problems: (1) rate limit errors (429) when the API rejects too many simultaneous requests, (2) memory spikes when multiple large LLM responses are held in memory, and (3) inconsistent execution order when responses arrive out of sequence. In production, these issues compound — a spike in traffic can cascade into hundreds of failed executions.
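The difference between the default fan-out and sequential processing can be illustrated in plain Node.js (this is not n8n API code; `fakeLlmCall` is a stand-in for a real LLM request):

```javascript
// Plain Node.js illustration: fan-out vs. one-at-a-time processing.
// fakeLlmCall stands in for a real LLM API request.
async function fakeLlmCall(id) {
  return `result-${id}`;
}

// Parallel: every request is in flight at once — the failure mode
// described above when many items reach an LLM node together.
async function runParallel(items) {
  return Promise.all(items.map(fakeLlmCall));
}

// Sequential: each call finishes before the next starts — the behavior
// SplitInBatches with batch size 1 enforces.
async function runSequential(items) {
  const results = [];
  for (const item of items) {
    results.push(await fakeLlmCall(item));
  }
  return results;
}
```

With 50 items and a 60 RPM limit, the parallel version sends 50 requests in the same instant; the sequential version spaces them out naturally.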

Prerequisites

  • A running n8n instance (self-hosted recommended for queue mode)
  • LLM API credentials with known rate limits
  • A workflow that makes multiple LLM calls (via webhook traffic or batch processing)
  • Understanding of n8n workflow settings and environment variables

Step-by-step guide

1

Set Workflow-Level Concurrency Limits

The simplest fix for parallel execution issues is to limit how many instances of a workflow can run simultaneously. In the n8n editor, click the gear icon (Workflow Settings) in the top bar. Under 'Execution', find 'Concurrency Limit' and set it to a value that matches your API rate limits. For example, if your OpenAI plan allows 60 requests per minute, set the concurrency to 3-5 to leave headroom. This prevents webhook-triggered workflows from spawning dozens of simultaneous LLM calls during traffic spikes.

Expected result: Only a limited number of workflow executions run simultaneously, preventing rate limit cascades.
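On self-hosted instances, the same cap can also be applied instance-wide through an environment variable (a sketch; the value 5 is an example — tune it to your API limits):

```shell
# Instance-wide cap on concurrent production executions (self-hosted n8n).
# The default of -1 means unlimited.
export N8N_CONCURRENCY_PRODUCTION_LIMIT=5
```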

2

Use SplitInBatches for Sequential LLM Processing

When a single workflow execution needs to process multiple items through an LLM (e.g., summarizing 50 documents), use the SplitInBatches node (called Loop Over Items in newer n8n versions) to process them one at a time or in small groups. Set the batch size to 1 for the safest approach, or to a higher number if your rate limits allow it. Place SplitInBatches before the LLM node: wire its 'loop' output into the LLM node, and connect the end of the processing chain back to the SplitInBatches input to create the loop. The 'done' output fires once all batches are processed.

text
// Workflow structure for sequential LLM processing:
//
// [Trigger] → [Get Documents] → [SplitInBatches (size=1)]
//                                  │ done          │ loop
//                                  ▼               ▼
//                       [Merge Results]     [LLM Node] → [Save Result]
//                                  ▼                          │
//                              [Output]    (loops back to SplitInBatches input)
//
// In the SplitInBatches node:
//   Batch Size: 1 (one item at a time)
//   Options → Reset: false (to accumulate results)

Expected result: LLM calls are processed sequentially, one item at a time, preventing rate limit errors.

3

Add Rate Limit Handling with Exponential Backoff

Even with concurrency limits, you may still hit rate limits during sustained usage. Enable 'Continue On Fail' on your LLM node and add a Code node after it to detect 429 errors. When a rate limit is hit, the Code node calculates a backoff delay and flags the item for retry. Combine this with a Wait node and a loop to implement automatic retrying.

javascript
// Code node (Run Once for Each Item): rate limit detection and backoff calculation
const json = $input.item.json;

// Detect rate limit errors
const isRateLimit = json.error && (
  json.error.statusCode === 429 ||
  json.error.message?.toLowerCase().includes('rate limit') ||
  json.error.message?.includes('Too Many Requests')
);

if (isRateLimit) {
  const retryCount = json._retry_count || 0;
  const maxRetries = 5;

  if (retryCount >= maxRetries) {
    return {
      json: {
        ...json,
        status: 'rate_limit_exhausted',
        error: 'Exceeded max retries due to rate limiting'
      }
    };
  }

  // Exponential backoff: 2s, 4s, 8s, 16s, 32s
  const backoffMs = Math.pow(2, retryCount + 1) * 1000;

  // Prefer the Retry-After header when the API provides one
  const retryAfter = json.error.headers?.['retry-after'];
  const waitMs = retryAfter ? parseInt(retryAfter, 10) * 1000 : backoffMs;

  return {
    json: {
      ...json,
      _retry_count: retryCount + 1,
      _wait_ms: waitMs,
      _should_retry: true,
      status: 'rate_limited'
    }
  };
}

// Not rate limited — pass through
return {
  json: {
    ...json,
    _should_retry: false,
    status: json.error ? 'error' : 'success'
  }
};

Expected result: Rate-limited requests are automatically retried with increasing delays instead of failing permanently.
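One way to wire the retry loop around this Code node (a sketch; the `_should_retry` and `_wait_ms` field names follow the Code node above, and the Wait node settings use n8n's standard Wait node):

```text
IF node:    condition {{ $json._should_retry }} is true → retry branch
Wait node:  Resume = After Time Interval
            Wait Amount = {{ $json._wait_ms / 1000 }}   (Wait Unit: Seconds)
            → loop back to the LLM node
False branch of the IF node continues downstream with the final result.
```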

4

Configure n8n Queue Mode for Production

For high-throughput production deployments, enable n8n's queue mode. Queue mode uses BullMQ backed by Redis to distribute workflow executions across multiple worker processes, providing natural concurrency control and preventing memory overload. Set the EXECUTIONS_MODE environment variable to 'queue' and configure the Redis connection. Each worker handles a limited number of concurrent executions (configurable with the worker's --concurrency flag), and you can scale the number of workers based on your needs.

yaml
# Docker Compose for n8n queue mode
services:
  n8n-main:
    image: n8nio/n8n:latest
    environment:
      - EXECUTIONS_MODE=queue
      - QUEUE_BULL_REDIS_HOST=redis
      - QUEUE_BULL_REDIS_PORT=6379
      - N8N_CONCURRENCY_PRODUCTION_LIMIT=5
    command: start

  n8n-worker:
    image: n8nio/n8n:latest
    environment:
      - EXECUTIONS_MODE=queue
      - QUEUE_BULL_REDIS_HOST=redis
      - QUEUE_BULL_REDIS_PORT=6379
    command: worker --concurrency=3
    deploy:
      replicas: 2

  redis:
    image: redis:7-alpine
    volumes:
      - redis_data:/data

volumes:
  redis_data:

Expected result: Workflow executions are queued and processed by workers with controlled concurrency, preventing overload.

5

Add a Request Queue Using Static Data

For simpler deployments without Redis, you can implement a basic request throttle using n8n's $getWorkflowStaticData() function. This approach throttles requests within a single workflow by tracking timestamps and enforcing a minimum delay between LLM calls. Note that workflow static data persists only across production executions — it is not saved during manual test runs. Add this Code node before your LLM node.

javascript
// Code node (Run Once for Each Item): simple request throttle using static data
const staticData = $getWorkflowStaticData('global');
const now = Date.now();
const MIN_DELAY_MS = 1000; // minimum 1 second between LLM calls

// Check last request timestamp
const lastRequest = staticData.lastLLMRequest || 0;
const elapsed = now - lastRequest;

if (elapsed < MIN_DELAY_MS) {
  // Need to wait — pass the delay to a subsequent Wait node
  const waitTime = MIN_DELAY_MS - elapsed;
  staticData.lastLLMRequest = now + waitTime; // reserve the next slot
  return {
    json: {
      ...$json,
      _throttle_wait_ms: waitTime,
      _throttled: true
    }
  };
}

// No throttling needed
staticData.lastLLMRequest = now;
return {
  json: {
    ...$json,
    _throttle_wait_ms: 0,
    _throttled: false
  }
};

Expected result: LLM calls are spaced at least 1 second apart, preventing burst traffic from hitting rate limits.

Complete working example

concurrency-controlled-llm-pipeline.js
// Code node: Run Once for Each Item
// Complete concurrency control for LLM API calls
// Place AFTER the LLM node (with Continue On Fail enabled)

const json = $input.item.json;
const staticData = $getWorkflowStaticData('global');

// Initialize counters
if (!staticData.requestLog) {
  staticData.requestLog = {
    total: 0,
    success: 0,
    rate_limited: 0,
    errors: 0,
    window_start: Date.now()
  };
}

const log = staticData.requestLog;
log.total++;

// Reset counters every hour
if (Date.now() - log.window_start > 3600000) {
  log.total = 1;
  log.success = 0;
  log.rate_limited = 0;
  log.errors = 0;
  log.window_start = Date.now();
}

// Detect rate limit errors
const isRateLimit = json.error && (
  json.error.statusCode === 429 ||
  json.error.message?.toLowerCase().includes('rate limit') ||
  json.error.message?.includes('Too Many Requests')
);

if (isRateLimit) {
  log.rate_limited++;
  const retryCount = json._retry_count || 0;
  const backoffMs = Math.min(Math.pow(2, retryCount + 1) * 1000, 60000);

  return {
    json: {
      original_input: json._original_input || json,
      status: 'rate_limited',
      _retry_count: retryCount + 1,
      _wait_ms: backoffMs,
      _should_retry: retryCount < 5,
      _stats: { ...log }
    }
  };
}

// Detect other errors
if (json.error) {
  log.errors++;
  return {
    json: {
      original_input: json._original_input || json,
      status: 'error',
      error_message: json.error.message,
      _should_retry: false,
      _stats: { ...log }
    }
  };
}

// Success
log.success++;
const text = json.message?.content || json.text || json.output || '';

return {
  json: {
    text,
    status: 'success',
    _should_retry: false,
    _stats: { ...log }
  }
};

Common mistakes when handling concurrency issues with parallel LLM calls in n8n

Mistake: Leaving workflow concurrency unlimited on webhook-triggered workflows

How to avoid: Set a concurrency limit in Workflow Settings or via the N8N_CONCURRENCY_PRODUCTION_LIMIT environment variable. Even a limit of 10 prevents most rate limit cascades.

Mistake: Using SplitInBatches but not connecting the loop output back correctly

How to avoid: SplitInBatches has two outputs: 'done' fires once after all batches are processed, and 'loop' emits the current batch. Wire the 'loop' output into your processing nodes, connect the end of the processing chain back to the SplitInBatches input, and take the final results from the 'done' output.

Mistake: Retrying rate-limited requests immediately without a backoff delay

How to avoid: Always add an exponential backoff delay between retries. Start with 2 seconds and double each time. Use a Wait node with the delay calculated in a Code node.

Mistake: Not accounting for LLM API rate limits that are per-minute, not per-second

How to avoid: OpenAI, Anthropic, and Gemini rate limits are typically RPM (requests per minute). A burst of 60 requests in 5 seconds will hit the per-minute limit. Space requests evenly using a throttle mechanism.
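As a sketch, an RPM limit can be converted into a minimum spacing between requests (the 20% safety margin and the example numbers are illustrative, not provider-specific):

```javascript
// Sketch: turn a per-minute request limit into a minimum delay between
// requests, keeping a safety margin so short bursts stay under the cap.
function minDelayMs(rpmLimit, safetyMargin = 0.2) {
  const effectiveRpm = rpmLimit * (1 - safetyMargin);
  return Math.ceil(60000 / effectiveRpm);
}

// 60 RPM with a 20% margin → one request every 1250 ms
```

The resulting delay can be fed into the throttle Code node from Step 5 in place of the hardcoded MIN_DELAY_MS.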

Best practices

  • Set workflow concurrency limits to match your LLM API rate limits, with a 20% safety margin
  • Use SplitInBatches with batch size 1 for any workflow that processes multiple items through an LLM
  • Enable queue mode with Redis for production deployments handling more than 10 concurrent users
  • Monitor rate limit hits using workflow static data counters and alert when they exceed thresholds
  • Use the Wait node with dynamic delay times for backoff instead of sleeping in Code nodes
  • Separate high-priority and low-priority LLM calls into different workflows with different concurrency settings
  • Set N8N_CONCURRENCY_PRODUCTION_LIMIT as a global safety net even when individual workflows have their own limits
  • Log all rate-limited and failed requests to identify peak usage patterns and adjust limits accordingly
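The monitoring and logging practices above can be reduced to a small threshold check. This sketch assumes the `_stats` counter shape kept by the "Complete working example" Code node; the 10% threshold is an arbitrary example:

```javascript
// Sketch: decide whether the rate-limited share of requests warrants an alert.
// `stats` mirrors the requestLog shape used earlier:
// { total, success, rate_limited, errors, window_start }
function shouldAlert(stats, maxRateLimitedRatio = 0.1) {
  if (!stats || !stats.total) return false;
  return stats.rate_limited / stats.total > maxRateLimitedRatio;
}
```

Run this in a Code node after the stats are updated and route a true result to a notification node (Slack, email, etc.).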

Still stuck?

Copy one of these prompts to get a personalized, step-by-step explanation.

ChatGPT Prompt

My n8n workflows hit 429 rate limit errors when multiple users trigger LLM calls simultaneously. How do I set concurrency limits, use SplitInBatches for sequential processing, add exponential backoff retry logic, and configure n8n queue mode with Redis for production?

n8n Prompt

Fix concurrency issues in my n8n LLM workflow. Set the workflow concurrency limit to 5, add SplitInBatches before the OpenAI node with batch size 1, and create a Code node that detects 429 errors and calculates exponential backoff delay for a Wait node.

Frequently asked questions

What is the default concurrency limit in n8n?

By default, n8n has no concurrency limit in regular mode — it processes as many executions as the server can handle unless N8N_CONCURRENCY_PRODUCTION_LIMIT is set. In queue mode, each worker's concurrency is controlled by the worker's --concurrency flag. Always set explicit limits for production workflows.

How do I know my LLM API's rate limits?

Check your provider's documentation or dashboard. OpenAI: platform.openai.com → Settings → Rate Limits. Anthropic: console.anthropic.com → Rate Limits. The limits depend on your plan tier and include RPM (requests per minute) and TPM (tokens per minute).

Does SplitInBatches slow down my workflow?

Yes, intentionally. With batch size 1, each LLM call completes before the next starts. For 50 items at 3 seconds per call, the total time is about 150 seconds versus 3 seconds for parallel processing. This tradeoff prevents rate limits and memory issues.

Can I have different concurrency limits for different workflows?

Yes. Set workflow-level concurrency limits in Workflow Settings for each workflow individually. The global N8N_CONCURRENCY_PRODUCTION_LIMIT serves as a safety net across all workflows.

Does queue mode require Redis?

Yes. n8n queue mode uses BullMQ, which requires a Redis instance. You can use a managed Redis service (AWS ElastiCache, Redis Cloud) or self-host Redis alongside n8n.

Can RapidDev help set up a high-throughput n8n deployment with queue mode?

Yes. RapidDev can architect and deploy n8n with queue mode, Redis, worker scaling, and concurrency tuning optimized for your specific LLM API usage patterns and traffic volumes.
