How to Fix Webhook Replies Arriving Too Late for LLM Processing in n8n

What you'll learn

  • How to return an immediate 202 acknowledgment before LLM processing starts
  • How to deliver LLM results via callback URL after processing completes
  • How to implement a polling endpoint for callers that cannot receive callbacks
  • How to handle timeout and retry scenarios without duplicate processing
Advanced · 11 min read · 30-40 minutes to complete · n8n 1.25+ (self-hosted and Cloud) · March 2026 · RapidDev Engineering Team
TL;DR

Webhook replies arrive too late when the Respond to Webhook node waits for the LLM to finish, but the caller's HTTP client times out first. Fix this by using the async response pattern: immediately return a 202 Accepted status with a processing ID via Respond to Webhook, then continue the LLM processing, and deliver results through a callback URL, polling endpoint, or server-sent events. This decouples the webhook response time from the LLM processing time.

Async Response Patterns for Webhook-Triggered LLM Workflows in n8n

LLM API calls can take 5-30 seconds depending on the model, prompt complexity, and provider load. When a webhook caller (frontend app, chatbot, integration) sends a request and waits for the response, it often hits its own timeout limit before the LLM finishes. The result is a timeout error on the caller side while the n8n workflow continues processing. The caller receives nothing and may retry, creating duplicate work. This tutorial implements the async response pattern that eliminates this timing mismatch.

Prerequisites

  • An n8n workflow with a Webhook trigger connected to an LLM node (AI Agent, Basic LLM Chain, etc.)
  • Understanding of HTTP response codes (200, 202, 408, 504)
  • A way to receive callbacks or poll for results (callback URL in your application, or a separate n8n webhook)
  • Basic familiarity with n8n's Respond to Webhook node and workflow settings

Step-by-step guide

1

Understand the timing problem between webhooks and LLM processing

When the Webhook node's Response Mode is set to 'Last Node' or 'Using Respond to Webhook Node' (placed after the LLM node), n8n holds the HTTP connection open until the LLM finishes. If the LLM takes 15 seconds but the caller's HTTP client timeout is 10 seconds, the caller receives a timeout error. Meanwhile, the n8n workflow continues running, processes the LLM call successfully, and tries to send the response, but the connection is already closed. The caller gets no result and may retry, causing the LLM to process the same request again. To fix this, you need to split the workflow into two phases: immediate acknowledgment and deferred delivery.

Expected result: You understand that the timing mismatch between webhook timeout and LLM processing time causes the problem.
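
To see the problem from the caller's side, here is a minimal sketch (not part of the n8n workflow) of a client with a 10-second timeout; the URL, payload fields, and timeout value are illustrative assumptions.

typescript
// Hypothetical caller with a 10-second HTTP timeout. If the workflow only
// responds after the LLM finishes (15s+), this throws a TimeoutError even
// though the n8n execution later completes successfully.
const WEBHOOK_URL = 'https://your-n8n-instance/webhook/llm-chat'; // placeholder

async function askLlm(message: string): Promise<unknown> {
  const res = await fetch(WEBHOOK_URL, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ message }),
    signal: AbortSignal.timeout(10_000) // caller gives up after 10 seconds
  });
  return res.json();
}

askLlm('Summarize this ticket').catch((err) => {
  // This is what the caller sees while the workflow keeps running server-side.
  console.error('Request failed before the workflow replied:', err.name);
});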

2

Add an immediate 202 response before the LLM node

Set the Webhook node's Response Mode to 'Using Respond to Webhook Node'. Then add a Code node immediately after the Webhook that generates a processing ID and extracts the callback URL from the request. Connect this Code node to a Respond to Webhook node. Configure the Respond to Webhook node to return HTTP status 202 (Accepted) with a JSON body containing the processing ID and an estimated completion time. Place this Respond to Webhook node BEFORE the LLM node in the workflow. After the Respond to Webhook sends the immediate reply, the workflow continues to the LLM node without holding the HTTP connection.

typescript
// Code node: Prepare Async Response
const webhook = $input.first().json;
const body = webhook.body || webhook;

const processingId = $execution.id;

return [{
  json: {
    // Fields for immediate webhook response
    processingId: processingId,
    status: 'accepted',
    message: 'Your request is being processed.',
    estimatedSeconds: 15,

    // Fields for LLM processing (passed downstream)
    chatInput: body.message || body.text || '',
    sessionId: body.sessionId || processingId,
    callbackUrl: body.callbackUrl || body.callback_url || null
  }
}];

// Respond to Webhook node settings:
// Response Code: 202
// Response Data: First Entry JSON
// (The caller reads the processingId, status, message, and estimatedSeconds fields from this body)

Expected result: The caller receives a 202 response with a processing ID in under 1 second, and the workflow continues to the LLM node.
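
For reference, the acknowledgment contract the caller should rely on looks roughly like this; the field names come from the Code node above, and the concrete values are illustrative.

typescript
// Shape of the immediate 202 body as seen by the caller (values illustrative).
interface AcceptedResponse {
  processingId: string;   // n8n execution ID, used later for callbacks or polling
  status: 'accepted';
  message: string;
  estimatedSeconds: number;
}

const example: AcceptedResponse = {
  processingId: '90125',
  status: 'accepted',
  message: 'Your request is being processed.',
  estimatedSeconds: 15
};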

3

Process the LLM call after the webhook response

After the Respond to Webhook node, connect your LLM node (AI Agent, Basic LLM Chain, etc.) as usual. The webhook connection is already closed, so the LLM can take as long as needed without any timeout pressure. Configure the LLM node with the chatInput and sessionId from the Code node's output. Add error handling (On Error: Continue) so failures are captured instead of crashing the workflow. After the LLM node, add another Code node that packages the result with the processing ID for delivery.

typescript
// Code node after LLM: Package Result for Delivery
const llmOutput = $input.first().json;
const processingId = $('Prepare Async Response').first().json.processingId;
const callbackUrl = $('Prepare Async Response').first().json.callbackUrl;

return [{
  json: {
    processingId: processingId,
    callbackUrl: callbackUrl,
    result: {
      status: 'completed',
      processingId: processingId,
      response: llmOutput.output || llmOutput.text || '',
      completedAt: new Date().toISOString()
    }
  }
}];

Expected result: The LLM processes the request without time pressure, and the result is packaged with the processing ID for delivery.

4

Deliver results via callback URL

If the caller provided a callback URL in the original request, use an HTTP Request node to POST the result to that URL. Add an IF node after the LLM result packaging that checks whether a callback URL exists. If it does, route to the HTTP Request node. Configure the HTTP Request node to POST to {{ $json.callbackUrl }} with the result object as the JSON body. Add error handling on the HTTP Request node in case the callback URL is unreachable. If the callback fails, store the result in a database or queue for later retrieval. This is the most efficient delivery method because results are pushed immediately after processing.

typescript
// HTTP Request node settings:
// Method: POST
// URL: {{ $json.callbackUrl }}
// Body Content Type: JSON
// Body: {{ $json.result }}
// Options > Timeout: 10000 (10 seconds)
// On Error: Continue

Expected result: The LLM result is delivered to the caller's callback URL immediately after processing completes.
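
On the caller's side, the callback URL needs a small HTTP handler that accepts the POSTed result. Here is a minimal sketch using Node's built-in http module; the port and route are assumptions, and the result fields match the packaging Code node from step 3.

typescript
import { createServer } from 'node:http';

// Hypothetical callback receiver. n8n POSTs the packaged result object here
// once the LLM finishes. Path and port are placeholders.
const server = createServer((req, res) => {
  if (req.method === 'POST' && req.url === '/llm-callback') {
    let raw = '';
    req.on('data', (chunk) => (raw += chunk));
    req.on('end', () => {
      const result = JSON.parse(raw); // { status, processingId, response, completedAt }
      console.log(`Result for ${result.processingId}:`, result.response);
      res.writeHead(200, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify({ received: true }));
    });
  } else {
    res.writeHead(404);
    res.end();
  }
});

server.listen(3000);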

5

Implement a polling endpoint as a fallback

Not all callers can receive callbacks. Add a status-checking endpoint: a second Webhook node that accepts GET requests with the processing ID and returns the result if it is ready, or a 'processing' status if the LLM is still running. Store results after the LLM completes in a database (Postgres, Redis) or in n8n's workflow static data. Note that static data is scoped to a single workflow, so if you rely on it the polling Webhook must be a second trigger in the same workflow; with a database, the polling endpoint can live in a separate workflow. The polling logic looks up the result by processing ID and returns it, and the caller polls this endpoint every few seconds until the result is available. Include a TTL so stored results are cleaned up after retrieval.

typescript
// Main workflow: Store result after LLM completes
// Code node: Store Result
const staticData = $getWorkflowStaticData('global');
const result = $input.first().json;

if (!staticData.results) staticData.results = {};

staticData.results[result.processingId] = {
  ...result.result,
  storedAt: Date.now()
};

// Clean up results older than 5 minutes
const FIVE_MINUTES = 5 * 60 * 1000;
for (const id in staticData.results) {
  if (Date.now() - staticData.results[id].storedAt > FIVE_MINUTES) {
    delete staticData.results[id];
  }
}

return [$input.first()];

// ---
// Polling endpoint: Webhook (GET /result/:id) → Code node.
// When results live in static data, this Webhook must be a second trigger in
// the SAME workflow, because static data is not shared across workflows.
const staticData = $getWorkflowStaticData('global');
const processingId = $json.params?.id || $json.query?.id || '';

const result = staticData.results?.[processingId];

if (result) {
  delete staticData.results[processingId];
  return [{ json: { status: 'completed', ...result } }];
} else {
  return [{ json: { status: 'processing', processingId } }];
}

Expected result: Callers can poll a separate endpoint to check whether their LLM result is ready and retrieve it.
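
From the caller's perspective, polling the status endpoint looks roughly like this; the endpoint URL, 2-second interval, and 60-second deadline are assumptions.

typescript
// Hypothetical caller-side polling loop for the status endpoint above.
const POLL_URL = 'https://your-n8n-instance/webhook/result'; // placeholder

async function waitForResult(processingId: string): Promise<unknown> {
  const deadline = Date.now() + 60_000;
  while (Date.now() < deadline) {
    const res = await fetch(`${POLL_URL}?id=${encodeURIComponent(processingId)}`);
    const body = await res.json();
    if (body.status === 'completed') return body;
    await new Promise((resolve) => setTimeout(resolve, 2_000)); // wait before the next poll
  }
  throw new Error(`Result for ${processingId} was not ready within 60 seconds`);
}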

6

Handle duplicate requests from caller retries

When callers time out, they often retry the same request. Without deduplication, the LLM processes the same prompt multiple times, wasting tokens and money. Add an idempotency check at the start of the workflow using a request ID provided by the caller or a hash of the request body. Before starting LLM processing, check if a result already exists for this request ID. If it does, return the cached result immediately. If it does not, proceed with processing. Use n8n's static data or a database to store the mapping from request ID to processing status.

typescript
// Code node: Deduplication Check (place before Respond to Webhook)
const staticData = $getWorkflowStaticData('global');
const body = $input.first().json.body || $input.first().json;

// Use client-provided request ID or hash the message
const requestId = body.requestId || body.idempotencyKey || '';
let isDuplicate = false;
let existingResult = null;

if (requestId && staticData.requests?.[requestId]) {
  isDuplicate = true;
  existingResult = staticData.requests[requestId];
} else if (requestId) {
  if (!staticData.requests) staticData.requests = {};
  staticData.requests[requestId] = {
    processingId: $execution.id,
    status: 'processing',
    startedAt: Date.now()
  };
}

return [{
  json: {
    ...body,
    processingId: isDuplicate ? existingResult.processingId : $execution.id,
    isDuplicate: isDuplicate,
    existingResult: existingResult
  }
}];

// Follow with IF node: {{ $json.isDuplicate }}
// True → Return cached result
// False → Continue to LLM processing

Expected result: Duplicate requests return the existing processing ID or cached result without triggering another LLM call.
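
If callers cannot send a requestId or idempotencyKey, the request body can be hashed to produce one, as mentioned above. A sketch for the same Code node; it assumes the node is allowed to require Node's built-in crypto module (on self-hosted instances this may need NODE_FUNCTION_ALLOW_BUILTIN to include crypto).

typescript
// Fallback idempotency key: hash the normalized request body when the caller
// does not provide one. Assumes require('crypto') is available in the Code node.
const crypto = require('crypto');

const body = $input.first().json.body || $input.first().json;

const normalized = JSON.stringify({
  message: body.message || body.text || '',
  sessionId: body.sessionId || ''
});

const requestId = body.requestId
  || body.idempotencyKey
  || crypto.createHash('sha256').update(normalized).digest('hex');

// Use this requestId in the staticData.requests check shown above.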

Complete working example

async-webhook-handler.js
// Code node: Async Webhook Handler
// Mode: Run Once for All Items
// Place immediately after the Webhook node
// Handles: immediate response, dedup, and context extraction

const staticData = $getWorkflowStaticData('global');
const webhookData = $input.first().json;
const body = webhookData.body || webhookData;
const headers = webhookData.headers || {};

// --- Deduplication ---
const requestId = body.requestId
  || body.idempotencyKey
  || headers['x-idempotency-key']
  || '';

let isDuplicate = false;
let cachedProcessingId = null;

if (requestId) {
  if (!staticData.activeRequests) staticData.activeRequests = {};

  // Clean entries older than 10 minutes
  const TEN_MIN = 10 * 60 * 1000;
  for (const id in staticData.activeRequests) {
    if (Date.now() - staticData.activeRequests[id].ts > TEN_MIN) {
      delete staticData.activeRequests[id];
    }
  }

  if (staticData.activeRequests[requestId]) {
    isDuplicate = true;
    cachedProcessingId = staticData.activeRequests[requestId].pid;
  } else {
    staticData.activeRequests[requestId] = {
      pid: $execution.id,
      ts: Date.now()
    };
  }
}

// --- Build output ---
const processingId = isDuplicate ? cachedProcessingId : $execution.id;

const output = {
  // For Respond to Webhook (immediate reply)
  webhookResponse: {
    status: isDuplicate ? 'already_processing' : 'accepted',
    processingId: processingId,
    message: isDuplicate
      ? 'This request is already being processed.'
      : 'Your request has been accepted for processing.',
    estimatedSeconds: 15
  },

  // For LLM processing
  chatInput: body.message || body.text || body.content || '',
  sessionId: body.sessionId || body.userId || processingId,
  callbackUrl: body.callbackUrl || body.callback_url || null,
  processingId: processingId,
  isDuplicate: isDuplicate,

  // Metadata
  _meta: {
    requestId: requestId || 'none',
    timestamp: new Date().toISOString(),
    executionId: $execution.id
  }
};

return [{ json: output }];

Common mistakes when fixing Webhook Replies Arriving Too Late for LLM Processing in n8n

Mistake: Placing the Respond to Webhook node after the LLM node, causing it to wait for LLM completion

How to avoid: Move the Respond to Webhook node before the LLM node so the 202 response is sent immediately.

Mistake: Not including a processing ID in the immediate response, making it impossible for callers to retrieve results

How to avoid: Generate a processing ID (use $execution.id) and include it in both the 202 response and the stored result.

Mistake: Using n8n static data for result storage in a multi-worker queue mode setup

How to avoid: Use Postgres or Redis for result storage. Static data is not shared across worker instances in queue mode.

Mistake: Not handling duplicate requests, causing the same prompt to be processed multiple times

How to avoid: Accept a requestId or idempotencyKey from the caller and check for existing processing before starting a new LLM call.

Mistake: Forgetting that only one Respond to Webhook node can execute per webhook request

How to avoid: Place the single Respond to Webhook node early in the workflow for the immediate 202 response. Do not add another one after the LLM.

Best practices

  • Always return a 202 response immediately for LLM workflows that may take more than 3 seconds
  • Include a processing ID in the 202 response so callers can track and retrieve their results
  • Support both callback delivery and polling so callers can choose the method that fits their architecture
  • Implement idempotency checks to prevent duplicate LLM processing when callers retry timed-out requests
  • Store results in a database for polling, not in n8n static data, when running in queue mode with multiple workers
  • Clean up stored results after retrieval or after a TTL to prevent unbounded storage growth
  • Add error handling on the callback HTTP Request node in case the caller's URL is unreachable
  • Document the async API contract (202 response format, callback payload, polling endpoint) for API consumers

Still stuck?

Copy one of these prompts to get a personalized, step-by-step explanation.

ChatGPT Prompt

My n8n webhook workflow calls an LLM that takes 15-20 seconds. The caller times out before getting the response. How do I implement an async pattern that returns 202 immediately and delivers results via a callback URL?

n8n Prompt

Build a workflow with a Webhook that immediately returns 202 with a processing ID, then processes the request through an AI Agent node, and delivers the result via HTTP Request to a callback URL. Include deduplication for retry requests.

Frequently asked questions

Why does my webhook caller get a timeout error even though the workflow succeeds?

The LLM processing takes longer than the caller's HTTP timeout. The workflow runs to completion but the HTTP connection is already closed. Implement the 202 async pattern to return immediately and deliver results separately.

Can I use Respond to Webhook twice in the same workflow?

No, only one Respond to Webhook node can execute per webhook request. The first one to execute sends the response and closes the connection. Place it before the LLM node for the immediate 202 response.

What is the difference between HTTP 200 and 202?

HTTP 200 means the request was processed and the response contains the result. HTTP 202 means the request was accepted for processing but the result is not ready yet. Use 202 for async patterns to signal that the caller should check back later.

How does the caller retrieve the result after getting a 202?

Two common patterns: 1) The caller provides a callbackUrl in the request, and n8n POSTs the result to that URL when ready. 2) The caller polls a separate status endpoint using the processing ID until the result is available.

What if the callback URL is unreachable when the result is ready?

Add error handling on the HTTP Request node that delivers the callback. On failure, store the result in a database and optionally retry the callback after a delay. The result should always be available via the polling endpoint as a backup.

How do I prevent the same request from being processed twice when the caller retries?

Have the caller send a unique requestId or idempotencyKey. Check this ID against a cache at the start of the workflow. If it exists, return the existing processing ID without starting a new LLM call.

What timeout does Slack use for incoming webhooks?

Slack requires a response within 3 seconds. Any LLM processing that takes longer must use the async pattern: respond to Slack immediately with a 200, then use Slack's response_url to post the result later.
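
As a sketch of that Slack pattern: acknowledge within 3 seconds, then push the LLM result to the response_url included in Slack's slash-command or interactivity payload. Shown here as plain fetch; in n8n this would be an HTTP Request node, and the response_type value is a choice, not a requirement.

typescript
// Post the finished LLM answer back to Slack via response_url after the
// immediate acknowledgment has already been sent.
async function postToSlack(responseUrl: string, llmText: string): Promise<void> {
  await fetch(responseUrl, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      response_type: 'in_channel', // or 'ephemeral' for a reply only the user sees
      text: llmText
    })
  });
}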

Can RapidDev help build async webhook architectures in n8n?

Yes, RapidDev designs production-grade async patterns in n8n, including callback delivery, polling endpoints, deduplication, and result storage. Their team can build the complete async infrastructure for your LLM-powered webhooks.
