
How to Avoid Exceeding Token Limits When Chaining LLM Calls in n8n


What you'll learn

  • How to estimate token counts using a Code node before each LLM call
  • How to truncate or summarize intermediate outputs to stay within limits
  • How to split large inputs into chunks and process them with SplitInBatches
  • How to set up fallback logic when a single chunk still exceeds the limit
Advanced · 9 min read · 30-45 minutes to complete · n8n 1.30+ (self-hosted and Cloud) · March 2026 · RapidDev Engineering Team
TL;DR

To avoid exceeding token limits when chaining LLM calls in n8n, count tokens before each call using a Code node, truncate or summarize intermediate outputs, and split long content into chunks processed sequentially with the SplitInBatches node. This prevents 400-level errors from providers like OpenAI and Anthropic that reject requests exceeding their context windows.

Managing Token Budgets Across Chained LLM Calls in n8n

When you chain multiple LLM nodes in an n8n workflow, each call adds to the cumulative token count. A summarization node feeding into an analysis node feeding into a formatting node can easily exceed the context window of your chosen model. This tutorial shows you how to estimate token counts between nodes, truncate or compress intermediate outputs, and structure your workflow so that each LLM call stays within its token budget.

Prerequisites

  • A running n8n instance (v1.30 or later)
  • An OpenAI or Anthropic API credential configured in n8n
  • Basic understanding of n8n expressions and the Code node
  • Familiarity with LLM token limits (e.g., GPT-4o: 128K tokens, Claude 3.5 Sonnet: 200K tokens)

Step-by-step guide

1

Add a Code node to estimate token count before the first LLM call

Insert a Code node between your data source and the first LLM node. This node approximates the token count of the input text using the common heuristic of roughly 4 characters per token for English text. While not exact, this estimate is close enough to prevent most overflows. Set the Code node to Run Once for All Items so it processes the combined input. The node outputs the estimated token count alongside the original text, letting downstream logic decide whether to proceed or truncate.

typescript
// Code node — Run Once for All Items
const items = $input.all();
const results = [];

for (const item of items) {
  const text = item.json.text || '';
  const estimatedTokens = Math.ceil(text.length / 4);
  results.push({
    json: {
      text: text,
      estimatedTokens: estimatedTokens,
      exceedsLimit: estimatedTokens > 120000
    }
  });
}

return results;

Expected result: Each item now has an estimatedTokens field and an exceedsLimit boolean flag.

2

Add an IF node to route oversized inputs to a chunking branch

After the token estimation Code node, add an IF node that checks the exceedsLimit field. When the value is true, route the item to a branch that splits and summarizes the text before sending it to the LLM. When false, route it directly to the LLM node. This prevents wasted API calls that would fail with a 400 error. Set the IF condition to check whether the expression {{ $json.exceedsLimit }} equals true. The true output connects to the chunking branch (the splitting Code node followed by SplitInBatches, built in the next step), and the false output connects directly to your LLM node.

Expected result: Items under the token limit go directly to the LLM node. Oversized items are routed to the chunking branch.

3

Split oversized text into chunks using a Code node and SplitInBatches

On the true branch of the IF node, add a Code node that splits the text into chunks of roughly 80,000 tokens (about 320,000 characters). Each chunk becomes a separate item. Then connect this to a SplitInBatches node with a batch size of 1, so each chunk is processed individually by the LLM. The SplitInBatches node ensures that only one chunk is sent to the LLM at a time, preventing parallel calls from overwhelming your rate limit. After the LLM processes each chunk, the results loop back through SplitInBatches until all chunks are processed.

typescript
// Code node — Run Once for All Items
const items = $input.all();
const CHUNK_SIZE = 320000; // ~80K tokens
const chunks = [];

for (const item of items) {
  const text = item.json.text || '';
  for (let i = 0; i < text.length; i += CHUNK_SIZE) {
    chunks.push({
      json: {
        chunk: text.substring(i, i + CHUNK_SIZE),
        chunkIndex: Math.floor(i / CHUNK_SIZE),
        totalChunks: Math.ceil(text.length / CHUNK_SIZE),
        originalId: item.json.id || 'unknown'
      }
    });
  }
}

return chunks;

Expected result: The text is split into manageable chunks, each under the token limit, processed one at a time through the LLM.

4

Summarize intermediate outputs to compress token usage between chained calls

When chaining multiple LLM calls, the output of one call becomes the input of the next, and outputs can be verbose. Insert a summarization step between chained LLM nodes to reduce the token count. Use a dedicated LLM call with a system prompt that instructs the model to compress the previous output into a concise summary. Set the max_tokens parameter on this summarization call to a low value like 500-1000 tokens, forcing the model to be brief. This compressed output then feeds into the next LLM node without risk of exceeding the context window.
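
The sketch below shows one way to prepare that summarization call in a Code node. The field names systemPrompt, userPrompt, and maxTokens are illustrative assumptions, not fixed n8n parameters: map them onto your LLM node's system prompt, user message, and max_tokens settings with expressions such as {{ $json.systemPrompt }}.

typescript
// Code node (Run Once for All Items) - illustrative sketch
// Builds the compression prompt for the intermediate summarization call.
// The output field names are assumptions; reference them from the
// downstream LLM node via expressions.
const items = $input.all();

return items.map(item => {
  const previousOutput = item.json.response || item.json.text || '';
  return {
    json: {
      systemPrompt:
        'Compress the following text into a concise summary. ' +
        'Preserve all facts, figures, and entity names. ' +
        'Respond in at most 300 words.',
      userPrompt: previousOutput,
      maxTokens: 800 // cap the summary so the next chained call stays small
    }
  };
});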

Expected result: Intermediate outputs are compressed to under 1000 tokens, keeping the total context for the next LLM call well within limits.

5

Merge chunked results back into a single output

After all chunks have been processed by the LLM and the SplitInBatches loop completes, add another Code node to merge the individual chunk results back into a single coherent output. This node collects all chunk responses, sorts them by chunkIndex, and concatenates the results. Connect this Code node to the done output of the SplitInBatches node. The merged result can then continue through the rest of your workflow as a single item, ready for the next chained LLM call or a final output step.

typescript
// Code node — Run Once for All Items
const items = $input.all();

const sorted = items.sort(
  (a, b) => (a.json.chunkIndex || 0) - (b.json.chunkIndex || 0)
);

const mergedText = sorted
  .map(item => item.json.response || '')
  .join('\n\n');

return [{
  json: {
    mergedResponse: mergedText,
    totalChunks: sorted.length
  }
}];

Expected result: A single item containing the merged response from all chunks, ready for the next stage of the workflow.

6

Add error handling for token limit errors from the LLM provider

Even with estimation, edge cases can still trigger token limit errors. Configure the On Error setting on each LLM node to Continue (using error output) instead of Stop Workflow. This routes failed items to a separate error-handling branch where you can log the error, re-chunk the input into smaller pieces, and retry. Add an Error Trigger node in a separate workflow to capture these failures and send notifications. On the error output, connect a Code node that halves the chunk size and re-routes the item back through the splitting logic.
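
A minimal sketch of that error-branch Code node is shown below. It assumes the failed item carries the rejected text in a chunk (or text) field and an optional chunkSize field; adapt the names to whatever your workflow actually emits.

typescript
// Code node on the error output (Run Once for All Items) - sketch
// Halves the chunk size and re-splits the failed input so it can be
// looped back through SplitInBatches. The `chunk`, `text`, and
// `chunkSize` field names are assumptions to adapt to your data.
const items = $input.all();
const results = [];

for (const item of items) {
  const text = item.json.chunk || item.json.text || '';
  const previousSize = item.json.chunkSize || 320000;
  const newSize = Math.max(Math.floor(previousSize / 2), 1000);

  for (let i = 0; i < text.length; i += newSize) {
    results.push({
      json: {
        chunk: text.substring(i, i + newSize),
        chunkIndex: Math.floor(i / newSize),
        chunkSize: newSize,
        retried: true
      }
    });
  }
}

return results;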

Expected result: Token limit errors are caught gracefully, items are re-chunked with smaller sizes, and the workflow continues without crashing.

Complete working example

token-limit-manager.js
// Code node: Token Estimator (place before each LLM call)
// Mode: Run Once for All Items

const MODEL_LIMITS = {
  'gpt-4o': 128000,
  'gpt-4o-mini': 128000,
  'claude-3-5-sonnet': 200000,
  'claude-3-5-haiku': 200000,
  'gemini-1.5-pro': 1000000
};

const CHARS_PER_TOKEN = 4;
const SAFETY_MARGIN = 0.85; // Use only 85% of limit

function estimateTokens(text) {
  return Math.ceil((text || '').length / CHARS_PER_TOKEN);
}

function chunkText(text, maxTokens) {
  const maxChars = maxTokens * CHARS_PER_TOKEN;
  const chunks = [];
  let start = 0;

  while (start < text.length) {
    let end = Math.min(start + maxChars, text.length);

    if (end < text.length) {
      const lastBreak = text.lastIndexOf('\n\n', end);
      if (lastBreak > start) {
        end = lastBreak;
      }
    }

    chunks.push(text.substring(start, end));
    start = end;
  }

  return chunks;
}

const items = $input.all();
const modelName = items[0]?.json?.model || 'gpt-4o';
const modelLimit = MODEL_LIMITS[modelName] || 128000;
const safeLimit = Math.floor(modelLimit * SAFETY_MARGIN);
const results = [];

for (const item of items) {
  const text = item.json.text || item.json.prompt || '';
  const tokens = estimateTokens(text);

  if (tokens <= safeLimit) {
    results.push({
      json: {
        ...item.json,
        estimatedTokens: tokens,
        needsChunking: false
      }
    });
  } else {
    const chunks = chunkText(text, safeLimit);
    for (let i = 0; i < chunks.length; i++) {
      results.push({
        json: {
          text: chunks[i],
          chunkIndex: i,
          totalChunks: chunks.length,
          estimatedTokens: estimateTokens(chunks[i]),
          needsChunking: true,
          originalId: item.json.id || $execution.id
        }
      });
    }
  }
}

return results;
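
In the IF node that follows this estimator, branch on the needsChunking flag: items where it is false can go straight to the LLM node, while chunked items flow through the SplitInBatches loop and the merge step from step 5.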

Common mistakes when managing token limits for chained LLM calls in n8n

Mistake: Using the same token limit for all models

How to avoid: Maintain a lookup table of model-specific limits and reference it dynamically based on which model the workflow uses.

Mistake: Forgetting to account for system prompt tokens in the budget

How to avoid: Subtract the estimated system prompt token count from the available budget before checking user content length.

Mistake: Splitting text at exact character boundaries, cutting words and sentences

How to avoid: Use lastIndexOf with paragraph or sentence delimiters to find natural break points near the target split position.

Mistake: Not setting max_tokens on LLM nodes, allowing verbose responses that overflow the next node

How to avoid: Always set max_tokens on every LLM node in the chain, especially intermediate steps, to cap response length.

Best practices

  • Always leave a 15-20% buffer below the model's stated token limit to account for system prompts and response tokens
  • Use model-specific token limits rather than a single hardcoded value across all LLM nodes
  • Split text on natural boundaries like paragraphs or sentences rather than fixed character counts
  • Use cheaper models like GPT-4o-mini for intermediate summarization steps to reduce costs
  • Log estimated vs actual token counts to calibrate your estimation heuristic over time (see the sketch after this list)
  • Set max_tokens on each LLM node to cap response length and prevent the total from exceeding limits on the next call
  • Use n8n's built-in execution data to track token usage per workflow run for monitoring
  • Consider caching LLM responses using the n8n static data feature to avoid redundant calls on retries
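
A minimal sketch of that calibration logging, assuming the upstream LLM node surfaces the provider's usage object (the usage.prompt_tokens path follows OpenAI's chat completion responses; other nodes and providers may report it under a different path):

typescript
// Code node (Run Once for All Items) - calibration sketch
// Compares the character-based estimate against the provider-reported
// prompt token count. The `usage.prompt_tokens` path is an assumption
// about how your LLM node exposes usage data; adjust it as needed.
const items = $input.all();

return items.map(item => {
  const estimated = item.json.estimatedTokens || 0;
  const actual = item.json.usage?.prompt_tokens ?? null;
  const ratio = actual ? +(estimated / actual).toFixed(2) : null;

  return {
    json: {
      estimatedTokens: estimated,
      actualPromptTokens: actual,
      estimateToActualRatio: ratio // tune CHARS_PER_TOKEN if this drifts
    }
  };
});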

Still stuck?

Copy one of these prompts to get a personalized, step-by-step explanation.

ChatGPT Prompt

I have an n8n workflow that chains 3 LLM calls. The input text is about 150K tokens. Help me design a token management strategy that splits the input, processes chunks, summarizes intermediate outputs, and merges results without exceeding any model's context window.

n8n Prompt

Build a workflow with a Webhook trigger that receives long text, estimates tokens with a Code node, splits oversized inputs into chunks, processes each chunk through an OpenAI Chat Model node with SplitInBatches, and merges results into a single output returned via Respond to Webhook.

Frequently asked questions

How do I know exactly how many tokens my text uses?

The most accurate method is to use the tiktoken library (for OpenAI models) or the provider's token counting endpoint. In n8n, you can call these via an HTTP Request node. The 4-characters-per-token heuristic works for rough estimates but can be off by 10-20% depending on the language and content.
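
For an exact count on a self-hosted instance, one option is to run a tokenizer package inside a Code node. The sketch below assumes the gpt-tokenizer npm package is installed in your n8n environment and whitelisted through the NODE_FUNCTION_ALLOW_EXTERNAL environment variable; treat both the package choice and its encode export as assumptions to verify for your setup.

typescript
// Code node (Run Once for All Items) - exact token counting sketch
// Assumes gpt-tokenizer is installed and allowed via
// NODE_FUNCTION_ALLOW_EXTERNAL=gpt-tokenizer on a self-hosted instance.
const { encode } = require('gpt-tokenizer');

const items = $input.all();

return items.map(item => {
  const text = item.json.text || '';
  return {
    json: {
      ...item.json,
      exactTokens: encode(text).length // BPE count, not a character heuristic
    }
  };
});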

Does n8n have a built-in token counter?

No, n8n does not include a built-in token counter as of v1.30. You need to implement counting in a Code node or call an external API. The Code node approach using character-based estimation is the simplest option.

What happens if I exceed the token limit on an OpenAI call in n8n?

OpenAI returns a 400 error with a message like 'This model's maximum context length is 128000 tokens. However, your messages resulted in 145000 tokens.' By default, n8n stops the workflow. Set the On Error option to Continue to handle this gracefully.

Can I use different models for different chunks to stay within limits?

Yes. You can use an IF node or Switch node to route chunks to different LLM nodes based on their token count. For example, send smaller chunks to GPT-4o-mini and larger ones to Claude 3.5 Sonnet which has a 200K context window.

How do I handle token limits when using the AI Agent node?

The AI Agent node manages its own conversation history. Use the Window Memory sub-node to limit how many previous messages are kept in context. Set the window size to a value that keeps total tokens under the model's limit.

Is there a way to monitor token usage across all my n8n workflows?

Use n8n's execution data to log token estimates per run. Create a separate monitoring workflow with a Cron trigger that queries execution data and aggregates token usage into a database or sends summary reports via email.

Should I use the Structured Output Parser to reduce response tokens?

Yes, the Structured Output Parser forces the model to return only the fields you define, which typically produces shorter responses than free-form text. This is especially useful between chained calls where you only need specific data points.

Can RapidDev help me build token-optimized LLM workflows in n8n?

Yes, RapidDev specializes in building production-grade n8n workflows with proper token management, error handling, and cost optimization. Their engineering team can design multi-model chains that stay within budget and handle edge cases.
