
How to Stop n8n from Cutting Off Long Language Model Responses

Long LLM responses get cut off in n8n when the max_tokens setting is too low, the model hits its output token limit, or n8n's execution timeout kills the workflow before the response arrives. Fix this by increasing max_tokens in the LLM node, raising n8n's execution timeout via N8N_EXECUTIONS_TIMEOUT, splitting long generation tasks into chunks, and using the HTTP Request node for direct API control over streaming and token limits.

What you'll learn

  • How to identify which limit is causing truncation (max_tokens, timeout, or model limit)
  • How to configure max_tokens correctly for each LLM provider in n8n
  • How to increase n8n execution timeouts for long-running LLM calls
  • How to split long generation tasks into sequential chunks that bypass all limits
Level: Advanced · Reading time: 9 min · Time to complete: 25-40 minutes · Requires: n8n 1.30+, any LLM node (OpenAI, Claude, Gemini) · March 2026 · RapidDev Engineering Team
Why n8n Cuts Off Long LLM Responses

There are three distinct reasons why LLM responses get truncated in n8n, and each requires a different fix. First, the max_tokens parameter in the LLM node caps the output length: if it is set to 1024, the model stops generating at 1024 tokens whether or not it has finished. Second, n8n's execution timeout (300 seconds by default) kills long-running workflows, and large LLM responses can take 30-60+ seconds to generate. Third, the model itself may hit its output token limit (e.g., Claude 3.5 Sonnet has an 8,192-token output limit by default). This tutorial addresses all three causes with specific fixes.

Prerequisites

  • A running n8n instance (v1.30 or later)
  • An active credential for at least one LLM provider
  • A workflow that generates long responses (articles, reports, documentation)
  • Access to n8n environment configuration (for timeout settings)

Step-by-step guide

Step 1: Diagnose which limit is causing the truncation

Open the failed or truncated execution in n8n, click the LLM node, and inspect its output. Look for three clues: (1) a finish_reason of 'length' (OpenAI) or a stop_reason of 'max_tokens' instead of 'end_turn' (Claude) means the max_tokens limit was hit; (2) an 'Execution timed out' error means the n8n timeout killed the workflow; (3) a response cut mid-sentence with no error means the model's built-in output limit was reached. Each requires a different fix.

typescript
// Check the LLM node output for these indicators:
//
// OpenAI: response.choices[0].finish_reason
//   'stop'   = completed normally
//   'length' = max_tokens limit reached (TRUNCATED)
//
// Claude: response.stop_reason
//   'end_turn'   = completed normally
//   'max_tokens' = output limit reached (TRUNCATED)
//
// Gemini: response.candidates[0].finishReason
//   'STOP'       = completed normally
//   'MAX_TOKENS' = limit reached (TRUNCATED)

Expected result: You identify whether the truncation is caused by max_tokens, execution timeout, or model output limits

Step 2: Increase max_tokens in the LLM node

The most common cause of truncation is the max_tokens setting being too low. In the LLM node configuration, find the Max Tokens (or Maximum Length) parameter and increase it. For the built-in OpenAI node, this is under Options > Maximum Number of Tokens. For the Claude node, it is Max Tokens to Sample. Set it based on your needs: 2048 for medium responses, 4096 for long responses, or 8192 for very long content like articles.

typescript
// Recommended max_tokens by use case:
//
//   Short answers (support replies):          512-1024
//   Medium content (summaries, explanations): 2048-4096
//   Long content (articles, reports):         4096-8192
//
// Maximum output by model:
//   GPT-4o:            16,384 output tokens
//   Claude 3.5 Sonnet:  8,192 output tokens
//   Claude 3 Opus:      4,096 output tokens
//   Gemini 2.0 Flash:   8,192 output tokens
//   Gemini 1.5 Pro:     8,192 output tokens

Expected result: The LLM generates longer responses without truncation at the token level

Step 3: Increase n8n execution timeout for long-running LLM calls

Large LLM responses (4000+ tokens) can take 30-120 seconds to generate, especially with larger models like GPT-4o or Claude Opus. If n8n's execution timeout is set too low, it kills the workflow before the response arrives. Set the N8N_EXECUTIONS_TIMEOUT environment variable to a higher value (in seconds). For LLM-heavy workflows, 600 seconds (10 minutes) is a safe starting point.

shell
# Environment variable configuration:
N8N_EXECUTIONS_TIMEOUT=600

# For Docker (docker-compose.yml):
# environment:
#   - N8N_EXECUTIONS_TIMEOUT=600

# Per-workflow override (in workflow settings):
# Settings > Timeout Workflow After > 600 seconds

Expected result: Workflows have enough time to wait for large LLM responses without being killed

Step 4: Use the HTTP Request node for direct API control

The built-in LLM nodes in n8n abstract away some API parameters. For maximum control over token limits and response handling, use the HTTP Request node to call the LLM API directly. This lets you set exact max_tokens values, access streaming endpoints, and read response metadata like finish_reason to detect truncation programmatically.

typescript
// HTTP Request node calling the Claude API directly:
// Method: POST
// URL: https://api.anthropic.com/v1/messages
// Headers:
//   x-api-key: {{ $env.ANTHROPIC_API_KEY }}
//   anthropic-version: 2023-06-01
//   content-type: application/json
// Body:
{
  "model": "claude-sonnet-4-20250514",
  "max_tokens": 8192,
  "messages": [
    {
      "role": "user",
      "content": "{{ $json.message }}"
    }
  ],
  "system": "You are a technical writer. Write complete, detailed responses. Never truncate or summarize unless explicitly asked."
}

Expected result: You have full control over max_tokens, model selection, and can inspect finish_reason in the response

Step 5: Split long generation tasks into sequential chunks

When you need responses longer than any single model's output limit (e.g., a 5000-word article), split the task into sections. Use a Code node to create an array of section prompts (Introduction, Section 1, Section 2, etc.), process them with SplitInBatches through the LLM node, and concatenate the results. Each chunk stays within token limits while the combined output can be arbitrarily long.

typescript
// Code node: Split a long article into section prompts
const topic = $input.first().json.topic;

const sections = [
  { section: 'Introduction', prompt: `Write a 500-word introduction about ${topic}. End with a transition to the first main section.` },
  { section: 'Background', prompt: `Write a 600-word background section about ${topic}. Cover the history and current state. Start from where the introduction left off.` },
  { section: 'Analysis', prompt: `Write a 700-word analysis section about ${topic}. Include data points and expert opinions.` },
  { section: 'Recommendations', prompt: `Write a 500-word recommendations section about ${topic}. Provide actionable advice.` },
  { section: 'Conclusion', prompt: `Write a 300-word conclusion about ${topic}. Summarize key points and end with a call to action.` }
];

return sections.map(s => ({ json: s }));

Expected result: A roughly 2,600-word article is generated in five section-sized chunks, bypassing any single-call output limit

Step 6: Add a truncation detection node after the LLM

Add a Code node after the LLM that checks whether the response was truncated. If it was, the node can automatically retry with a continuation prompt ('Continue from where you left off: [last 200 chars]') to get the remaining content. This handles edge cases where even high max_tokens settings are not quite enough.

typescript
const response = $input.first().json;

// Detect truncation based on provider
const finishReason = response.choices?.[0]?.finish_reason  // OpenAI
  || response.stop_reason                                  // Claude
  || response.candidates?.[0]?.finishReason                // Gemini
  || 'unknown';

const isTruncated = ['length', 'max_tokens', 'MAX_TOKENS'].includes(finishReason);
const text = response.choices?.[0]?.message?.content || response.content?.[0]?.text || response.text || '';

if (isTruncated) {
  // Get last 200 characters for continuation prompt
  const lastChunk = text.slice(-200);
  return [{
    json: {
      text,
      isTruncated: true,
      continuationPrompt: `Continue writing from exactly where this text ends. Do not repeat any content. Previous text ended with: "${lastChunk}"`
    }
  }];
}

return [{ json: { text, isTruncated: false } }];

Expected result: Truncated responses are detected and can be continued automatically

Complete working example

truncation-handler.js
// ====== Truncation Detector & Auto-Continuator — Code Node ======
// Place after the LLM node to detect and handle truncated responses

const response = $input.first().json;

// Extract text and finish reason across providers
const text = response.choices?.[0]?.message?.content         // OpenAI
  || response.content?.[0]?.text                             // Claude
  || response.candidates?.[0]?.content?.parts?.[0]?.text     // Gemini
  || response.output || response.text || '';

const finishReason = response.choices?.[0]?.finish_reason    // OpenAI
  || response.stop_reason                                    // Claude
  || response.candidates?.[0]?.finishReason                  // Gemini
  || 'unknown';

const isTruncated = [
  'length',       // OpenAI
  'max_tokens',   // Claude
  'MAX_TOKENS'    // Gemini
].includes(finishReason);

const usage = {
  inputTokens: response.usage?.prompt_tokens || response.usage?.input_tokens || 0,
  outputTokens: response.usage?.completion_tokens || response.usage?.output_tokens || 0,
  finishReason
};

if (isTruncated) {
  const lastChunk = text.slice(-300);
  return [{
    json: {
      text,
      isTruncated: true,
      usage,
      continuation: {
        needed: true,
        prompt: `Continue from where this text ends. Do not repeat content. The text ended with: "...${lastChunk}"`,
        previousLength: text.length
      }
    }
  }];
}

return [{
  json: {
    text,
    isTruncated: false,
    usage,
    continuation: { needed: false }
  }
}];

Common mistakes when stopping n8n from cutting off long LLM responses

Mistake: Leaving max_tokens at the default value (often 256 or 1024) when generating long content

How to avoid: Explicitly set max_tokens to match your content needs: 4096 for articles, 8192 for reports

Mistake: Confusing n8n's execution timeout with LLM generation time; they are separate limits

How to avoid: Increase both: max_tokens in the LLM node AND N8N_EXECUTIONS_TIMEOUT in environment variables

Mistake: Setting max_tokens higher than the model supports and expecting more output

How to avoid: Check the model's actual output limit. Claude 3.5 Sonnet maxes out at 8,192 output tokens regardless of what you set

Mistake: Using the Webhook Response Mode 'Immediately' for long LLM calls, which returns an empty response to the caller

How to avoid: Use 'Using Respond to Webhook Node' so the HTTP connection stays open until the LLM finishes

Best practices

  • Always check finish_reason/stop_reason in LLM responses to detect truncation programmatically
  • Set max_tokens to at least 2x your expected output length for a safety margin
  • Use per-workflow timeout settings instead of global N8N_EXECUTIONS_TIMEOUT for targeted control
  • For content longer than 8,000 tokens, split into sections and generate sequentially
  • Use the HTTP Request node instead of built-in LLM nodes when you need full control over parameters
  • Include 'Write a complete response. Do not truncate.' in your system prompt as a behavioral nudge
  • Monitor token usage to optimize cost — higher max_tokens does not increase cost, only actual tokens generated do
  • Set the Webhook Response Mode to 'Using Respond to Webhook Node' to prevent HTTP timeouts on long generations
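As a rough sketch of the 2x-margin rule above, the helper below doubles an estimated token count and clamps it to each model's hard output cap. The model keys, the 4096 fallback, and the ~4/3 tokens-per-word ratio are illustrative assumptions, not values taken from n8n or any provider SDK:

```javascript
// Sketch: choose a max_tokens value with a 2x safety margin,
// clamped to the model's hard output cap (illustrative values).
const MODEL_OUTPUT_CAPS = {
  'gpt-4o': 16384,
  'claude-3-5-sonnet': 8192,
  'gemini-2.0-flash': 8192,
};

function chooseMaxTokens(expectedWords, model) {
  // Rough heuristic: ~4/3 tokens per English word (assumption).
  const expectedTokens = Math.ceil((expectedWords * 4) / 3);
  const withMargin = expectedTokens * 2;           // 2x safety margin
  const cap = MODEL_OUTPUT_CAPS[model] ?? 4096;    // conservative fallback
  return Math.min(withMargin, cap);
}

console.log(chooseMaxTokens(1000, 'gpt-4o'));            // → 2668
console.log(chooseMaxTokens(5000, 'claude-3-5-sonnet')); // → 8192 (clamped)
```

When the clamped value is much lower than the value with margin, that is the signal to switch to the chunked-generation approach from Step 5 rather than raising max_tokens further.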

Still stuck?

Copy one of these prompts to get a personalized, step-by-step explanation.

ChatGPT Prompt

My n8n workflow cuts off long responses from the language model. The text stops mid-sentence. How do I fix this? I need to handle max_tokens limits, n8n execution timeouts, and model output caps.

n8n Prompt

Increase max_tokens in the LLM node (Options > Maximum Tokens) to 4096-8192. Set N8N_EXECUTIONS_TIMEOUT=600 in environment variables. Add a Code node after the LLM to check finish_reason — if it's 'length' or 'max_tokens', auto-continue with a follow-up prompt. For very long content, use SplitInBatches to generate sections sequentially.

Frequently asked questions

Does setting a higher max_tokens cost more money?

No. You are only charged for tokens actually generated, not the max_tokens limit. Setting max_tokens to 8192 but receiving a 2000-token response means you pay for 2000 output tokens. There is no penalty for setting the limit high.

What is the maximum output length for each major model?

GPT-4o supports up to 16,384 output tokens. Claude 3.5 Sonnet supports 8,192 output tokens. Gemini 2.0 Flash supports 8,192 output tokens. Gemini 1.5 Pro supports 8,192 output tokens. These are hard limits that cannot be exceeded in a single API call.

Can I use streaming to get longer responses?

Streaming does not increase the maximum output length — it delivers the same content token-by-token instead of all at once. However, streaming prevents HTTP timeouts because the connection receives data continuously. Use the HTTP Request node with streaming enabled for the best timeout behavior.
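To illustrate, here is a minimal sketch of accumulating streamed deltas into the full text. The event shape loosely follows Anthropic's content_block_delta events and the [DONE] sentinel follows OpenAI's convention; both are assumptions here, so check your provider's streaming documentation before relying on them:

```javascript
// Sketch: accumulate server-sent-event (SSE) deltas into the full response.
// Sample lines stand in for a live stream; shapes are illustrative.
const sampleStreamLines = [
  'data: {"type":"content_block_delta","delta":{"text":"Long "}}',
  'data: {"type":"content_block_delta","delta":{"text":"responses "}}',
  'data: {"type":"content_block_delta","delta":{"text":"arrive in pieces."}}',
  'data: [DONE]',
];

function accumulateStream(lines) {
  let fullText = '';
  for (const line of lines) {
    if (!line.startsWith('data: ') || line === 'data: [DONE]') continue;
    const event = JSON.parse(line.slice(6));
    if (event.type === 'content_block_delta') {
      fullText += event.delta.text; // each chunk keeps the connection alive
    }
  }
  return fullText;
}

console.log(accumulateStream(sampleStreamLines));
// → "Long responses arrive in pieces."
```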

How do I concatenate chunked responses from SplitInBatches?

After SplitInBatches completes all iterations, add a Code node that uses $('LLM Node').all() to get all outputs, then join them: $('LLM Node').all().map(item => item.json.text).join('\n\n'). This produces the complete concatenated text.
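As a sketch, the concatenation logic looks like this; the items array stands in for the output of n8n's $('LLM Node').all() so the snippet runs outside n8n, and the section texts are placeholders:

```javascript
// Sketch: concatenate chunked LLM outputs, as the final Code node would.
// `items` mimics the n8n item shape returned by $('LLM Node').all().
const items = [
  { json: { text: 'Introduction section...' } },
  { json: { text: 'Background section...' } },
  { json: { text: 'Conclusion section...' } },
];

// Join each chunk's text with a blank line between sections
const article = items.map(item => item.json.text).join('\n\n');
console.log(article);
```

Inside n8n, replace the items literal with $('LLM Node').all() and return [{ json: { article } }] from the Code node.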

Why does my response get cut off even with max_tokens set to 8192?

Check three things: (1) the model's actual output limit may be lower than 8192, (2) n8n's execution timeout may be killing the workflow, or (3) the Webhook response mode may be set to 'Immediately' which returns before the LLM finishes. Fix all three independently.

Can RapidDev help optimize my n8n workflow for long-form content generation?

Yes. RapidDev builds n8n content generation pipelines with automatic chunking, truncation recovery, and quality verification for teams producing articles, reports, and documentation at scale.
