How to Queue OpenAI Requests When Too Many Users Hit the Workflow at Once

What you'll learn

  • How to collect concurrent webhook requests into a processable queue
  • How to use SplitInBatches to throttle OpenAI API calls
  • How to add Wait node delays between batches for rate limit compliance
  • How to configure n8n concurrency settings to control parallel executions
Advanced · 7 min read · 35-50 minutes to complete · n8n 1.30+, OpenAI node, SplitInBatches node · March 2026 · RapidDev Engineering Team
TL;DR

When multiple users trigger your n8n workflow simultaneously, OpenAI rate limits can cause failures. Use the SplitInBatches node combined with a Wait node to queue requests, processing them in controlled bursts with delays between batches. This prevents 429 errors and ensures every user gets a response without overloading the API.

Why You Need Request Queuing for OpenAI in n8n

OpenAI enforces rate limits measured in requests per minute (RPM) and tokens per minute (TPM). When a webhook-triggered workflow receives many concurrent requests, each one fires an independent OpenAI call. Without queuing, you will hit 429 Too Many Requests errors and lose user messages. This tutorial builds a queuing system using n8n's built-in SplitInBatches and Wait nodes so requests are processed sequentially with configurable delays, keeping you safely under rate limits.

Prerequisites

  • A running n8n instance (v1.30 or later)
  • An OpenAI API credential configured in n8n
  • Understanding of n8n Webhook node and execution model
  • Knowledge of your OpenAI tier rate limits (check platform.openai.com/account/limits)
  • Basic familiarity with SplitInBatches node

Step-by-step guide

1

Set n8n concurrency controls to limit parallel executions

Before building the workflow, configure n8n itself to limit how many workflow executions run at once. Set the environment variable N8N_CONCURRENCY_PRODUCTION_LIMIT to a value that matches your OpenAI tier. For example, if your OpenAI tier allows 60 RPM, set this to 5 so at most 5 executions run simultaneously. This is your first line of defense against rate limit errors.

bash
# In your n8n environment configuration:
N8N_CONCURRENCY_PRODUCTION_LIMIT=5

Expected result: n8n limits concurrent production executions to 5, preventing a sudden burst from overwhelming OpenAI

2

Create the Webhook trigger with queuing-friendly response mode

Add a Webhook node with HTTP Method POST and Path /ai-chat. Set the Response Mode to 'Using Respond to Webhook Node' so the caller waits for the actual AI response. This is critical because it keeps the HTTP connection open while the request waits in the queue, ensuring the user eventually gets their response even if it takes longer due to queuing.

Expected result: Webhook node accepts POST requests and holds the connection open until a Respond to Webhook node fires
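
For orientation, here is roughly how that Webhook node might look in an exported workflow JSON. This is a sketch: the node name is made up, and parameter names can shift slightly between n8n versions.

typescript
// Sketch of the Webhook node as it might appear in an exported workflow JSON.
// "Incoming Chat Request" is an illustrative name; parameter names may vary by n8n version.
const webhookNode = {
  name: "Incoming Chat Request",
  type: "n8n-nodes-base.webhook",
  parameters: {
    httpMethod: "POST",
    path: "ai-chat",
    responseMode: "responseNode", // "Using Respond to Webhook Node" — holds the connection open
  },
};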

3

Add a SplitInBatches node to process requests in controlled groups

When a single execution handles multiple items (for example, from a batch endpoint or a database poll), the SplitInBatches node processes them in groups. Set the Batch Size to 1 for strict sequential processing, or increase it to 3-5 if your rate limits allow small bursts. Connect the Webhook output to SplitInBatches. Each batch will go through the OpenAI call, then loop back for the next batch.

typescript
// SplitInBatches configuration:
// Batch Size: 1 (safest for rate limits)
// Options > Reset: false (keep processing state)

Expected result: Items flow through one at a time instead of all at once
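
If it helps to see the same settings in exported-workflow form, the node configuration is roughly the sketch below; the node type and parameter names reflect current n8n exports and may differ slightly by version.

typescript
// Sketch of the SplitInBatches (Loop Over Items) node in an exported workflow JSON.
// "Queue Requests" is an illustrative name; parameter names may vary by n8n version.
const splitInBatchesNode = {
  name: "Queue Requests",
  type: "n8n-nodes-base.splitInBatches",
  parameters: {
    batchSize: 1,              // one item per loop iteration (safest for rate limits)
    options: { reset: false }, // keep processing state across iterations
  },
};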

4

Add the OpenAI node for the LLM call

Connect the 'loop' output of SplitInBatches (the per-batch output) to an OpenAI node. Configure it with your model (e.g., gpt-4o), the user prompt from {{ $json.message }}, and a system message. Keep max_tokens reasonable to stay within TPM limits. The SplitInBatches loop ensures only one call happens at a time.

Expected result: Each item gets processed by OpenAI individually rather than all at once
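
Conceptually, each loop iteration ends up sending something like the request below. The model name, system message, and token cap are this tutorial's assumptions, and the user content stands in for the {{ $json.message }} expression the node evaluates per item.

typescript
// Conceptual shape of the chat completion request the OpenAI node sends per queued item.
// In the node itself, the user content comes from the expression {{ $json.message }}.
const userMessage = "How do I reset my password?"; // stands in for {{ $json.message }}
const request = {
  model: "gpt-4o",   // illustrative; use whatever model your tier supports
  max_tokens: 512,   // keep modest to stay under TPM limits
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: userMessage },
  ],
};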

5

Add a Wait node between batches for rate limit spacing

Connect the OpenAI node output to a Wait node, then connect the Wait node back into the SplitInBatches node to close the loop. Set the Wait node to wait for 1-2 seconds. This creates a deliberate pause between each API call, preventing burst patterns that trigger rate limits even when you are technically under the RPM cap.

typescript
// Wait node configuration:
// Resume: After Time Interval
// Wait Amount: 1
// Wait Unit: Seconds

Expected result: Each OpenAI call is separated by at least 1 second, creating smooth request distribution

6

Add error handling for 429 responses that still occur

Even with queuing, edge cases can trigger 429 errors. Add an Error Trigger workflow that catches failed executions, waits 30-60 seconds using a Wait node, then retries the original request by calling the same webhook. In the main workflow, enable Settings > Error Workflow and point it to this retry workflow. Set a maximum retry count (stored in the payload) to prevent infinite loops.

typescript
// In the Error Trigger workflow's Code node:
const errorData = $input.first().json;
const retryCount = (errorData.retryCount || 0) + 1;

if (retryCount > 3) {
  // Stop retrying after 3 attempts
  return [{ json: { error: 'Max retries exceeded', originalMessage: errorData.message } }];
}

// Pass to Wait node (30s) then HTTP Request node to re-call webhook
return [{ json: { ...errorData, retryCount } }];

Expected result: Failed requests automatically retry up to 3 times with increasing delays

Complete working example

queue-openai-requests.js
// ====== Rate Limiter Code Node ======
// Place this between the Webhook and OpenAI nodes
// to add metadata for queue management

const input = $input.first().json;
const now = Date.now();

// Add queue metadata
const enriched = {
  message: input.message || '',
  userId: input.userId || 'anonymous',
  queuedAt: now,
  retryCount: input.retryCount || 0,
  priority: input.priority || 'normal'
};

return [{ json: enriched }];

// ====== Retry Logic Code Node (Error Workflow) ======
// This runs in a separate Error Trigger workflow

// const execution = $input.first().json;
// const errorMessage = execution.error?.message || '';
// const isRateLimit = errorMessage.includes('429') || errorMessage.includes('rate limit');
//
// if (!isRateLimit) {
//   // Not a rate limit error, don't retry
//   return [{ json: { shouldRetry: false, error: errorMessage } }];
// }
//
// const retryCount = (execution.retryCount || 0) + 1;
// if (retryCount > 3) {
//   return [{ json: { shouldRetry: false, error: 'Max retries exceeded' } }];
// }
//
// // Calculate exponential backoff: 30s, 60s, 120s
// const waitSeconds = 30 * Math.pow(2, retryCount - 1);
//
// return [{ json: {
//   shouldRetry: true,
//   retryCount,
//   waitSeconds,
//   originalPayload: execution.originalPayload
// }}];

Common mistakes when queuing OpenAI requests

Mistake: Setting the Webhook Response Mode to 'Immediately', which returns an empty response before the LLM has processed the message

How to avoid: Use 'Using Respond to Webhook Node' and add a Respond to Webhook node after the OpenAI node

Mistake: Wiring the loop incorrectly, so items are skipped or the workflow runs forever

How to avoid: Take the OpenAI branch from the SplitInBatches 'loop' output (not the 'done' output) and connect the Wait node back into the SplitInBatches node so each completed call triggers the next batch

Mistake: Not setting a retry limit in the error workflow, causing infinite retry loops that burn API credits

How to avoid: Always track retryCount in the payload and stop after 3-5 attempts

Mistake: Using the same Wait node delay for every retry attempt instead of exponential backoff

How to avoid: Calculate the delay as baseDelay * Math.pow(2, retryCount - 1) for progressively longer waits
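
As a quick sanity check on that formula, here is the schedule it produces with the 30-second base delay used in the error workflow above:

typescript
// Exponential backoff schedule for retry delays: 30s, 60s, 120s.
const baseDelay = 30; // seconds, matching the error workflow above
for (let retryCount = 1; retryCount <= 3; retryCount++) {
  console.log(`retry ${retryCount}: wait ${baseDelay * Math.pow(2, retryCount - 1)}s`);
}
// retry 1: wait 30s, retry 2: wait 60s, retry 3: wait 120s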

Best practices

  • Set N8N_CONCURRENCY_PRODUCTION_LIMIT to match your OpenAI tier's RPM divided by 10 as a starting point
  • Use SplitInBatches with Batch Size 1 for the strictest rate control, increase only after testing
  • Add a Wait node between batches with a delay derived from your RPM limit (60 / RPM gives the seconds needed between requests; see the sketch after this list)
  • Implement an Error Trigger workflow for automatic retry with exponential backoff
  • Monitor your OpenAI usage dashboard alongside n8n execution history to spot trends
  • Consider using gpt-4o-mini for high-volume workloads since it has higher rate limits and lower cost
  • Store failed messages in a database so they can be reprocessed if all retries fail
  • Use the n8n Respond to Webhook node so users always get a response, even if delayed
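
As a rough sizing helper for the two values above, here is a small sketch that turns an RPM limit into a concurrency cap and a per-batch wait. The 60 RPM example is a placeholder; use the limit shown on your own OpenAI account.

typescript
// Rough helper for sizing the queue against an OpenAI RPM limit.
function queueSettings(rpm: number) {
  const concurrencyLimit = Math.max(1, Math.floor(rpm / 10)); // starting point for N8N_CONCURRENCY_PRODUCTION_LIMIT
  const waitSeconds = Math.ceil(60 / rpm);                    // Wait node delay between batches
  return { concurrencyLimit, waitSeconds };
}

// Example: a tier allowing 60 requests per minute.
// Step 1 of this tutorial rounds the concurrency limit down to 5 for extra headroom.
console.log(queueSettings(60)); // { concurrencyLimit: 6, waitSeconds: 1 }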

Still stuck?

Copy one of these prompts to get a personalized, step-by-step explanation.

ChatGPT Prompt

I have an n8n workflow triggered by a webhook that calls OpenAI. When many users hit it at once, I get 429 rate limit errors. How do I queue these requests using SplitInBatches and Wait nodes to stay under OpenAI's rate limits?

n8n Prompt

Add a SplitInBatches node after the Webhook with Batch Size 1. Connect its output to the OpenAI node, then add a Wait node (1 second) after OpenAI, and loop the Wait output back to SplitInBatches. Set N8N_CONCURRENCY_PRODUCTION_LIMIT=5 in your environment.

Frequently asked questions

What is the best batch size for OpenAI requests in n8n?

Start with a batch size of 1 in the SplitInBatches node. This is the safest option because it processes one request at a time. If your OpenAI tier allows 60+ RPM and you need higher throughput, you can increase to 3-5, but always pair it with a Wait node.

Can I use n8n's built-in retry settings instead of a custom error workflow?

n8n has basic retry settings under Settings > Retry On Fail, but they use fixed delays. For rate limit errors, exponential backoff is more effective. A custom Error Trigger workflow gives you full control over retry timing and maximum attempts.

How long should users wait for a response when requests are queued?

With a batch size of 1 and 1-second delays, a user who is 10th in the queue would wait roughly 10 seconds plus LLM processing time. Set a reasonable client-side timeout (60 seconds) and show a loading state in your frontend.
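
As a rough worked example of that estimate, assuming an average of about 3 seconds per LLM call (a placeholder; measure your own workflow to refine it):

typescript
// Rough estimate of how long a queued user waits for their response.
// avgLlmSeconds is an assumed average per-call latency, not a measured value.
function estimatedWaitSeconds(position: number, waitNodeSeconds = 1, avgLlmSeconds = 3) {
  const requestsAhead = position - 1;
  const queueDelay = requestsAhead * waitNodeSeconds;    // Wait node pauses ahead of this request
  const processingAhead = requestsAhead * avgLlmSeconds; // LLM calls ahead of this request
  return queueDelay + processingAhead + avgLlmSeconds;   // plus this request's own call
}

console.log(estimatedWaitSeconds(10)); // ≈ 39s with a 1s Wait node and ~3s per LLM call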

Does this queuing approach work with streaming responses?

The SplitInBatches + Wait pattern works with standard (non-streaming) responses. For streaming, you would need to use Server-Sent Events or WebSockets, which requires a different architecture outside of n8n's webhook model.

What happens if n8n restarts while requests are in the queue?

Active executions are lost on restart. To prevent data loss, store incoming requests in a database (PostgreSQL or Supabase) before processing, and add a scheduled workflow that checks for unprocessed records.
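
As a minimal sketch of that persistence step, a Code node placed right after the Webhook could shape the row to insert before processing continues. The table and column names (ai_request_queue, status, payload) are illustrative assumptions, not part of this tutorial's workflow.

typescript
// Runs in an n8n Code node right after the Webhook, ahead of a Postgres/Supabase insert
// into an assumed ai_request_queue table. Column names are illustrative.
const row = {
  status: 'pending',                // a scheduled workflow can re-pick 'pending' rows after a restart
  payload: JSON.stringify($json),   // the original webhook body, replayable later
  received_at: new Date().toISOString(),
};
return [{ json: row }];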

Can RapidDev help me build a production-grade queuing system in n8n?

Yes. RapidDev builds custom n8n workflows with enterprise-grade queuing, including dead-letter queues, priority handling, and monitoring dashboards for teams with high-volume AI workloads.
