When Google Gemini returns very large responses in n8n, the workflow can crash with out-of-memory errors. Fix this by limiting max output tokens in the Gemini Chat Model sub-node, chunking long inputs, truncating oversized responses with a Code node, and increasing the Node.js heap size for your n8n instance.
Why Gemini Responses Crash n8n Workflows
Google Gemini models can generate extremely long responses — sometimes exceeding 30,000 characters — when given open-ended prompts or large context windows. n8n stores the entire response payload in memory as a JavaScript object. When multiple executions run simultaneously or a single response is large enough, Node.js hits its default heap limit (typically 1.4 GB) and the process crashes. This tutorial walks you through capping output length, handling large payloads safely, and configuring n8n to be more resilient.
Prerequisites
- A running n8n instance (self-hosted or n8n Cloud)
- Google Gemini API credentials configured in n8n
- An AI Agent or Basic LLM Chain workflow that uses the Gemini Chat Model sub-node
- Basic familiarity with n8n expressions and the Code node
Step-by-step guide
Cap Max Output Tokens on the Gemini Chat Model Sub-Node
Open your workflow and click on the AI Agent or Basic LLM Chain node. Under the Language Model section, click the Gemini Chat Model sub-node. In the Options section, set 'Maximum Number of Tokens' to a safe value such as 4096 or 8192. This prevents Gemini from generating unbounded responses that consume excessive memory. The exact limit depends on your use case — summarization tasks can often use 1024, while code generation may need 4096. Setting this value is the single most effective fix for memory-related crashes.
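If you call Gemini through an HTTP Request node instead of the Chat Model sub-node, the same cap is applied with the generationConfig.maxOutputTokens field of the JSON body sent to the generateContent endpoint. The sketch below is a minimal illustration only; the prompt text, temperature, and 4096 limit are example values, not recommendations.

```json
{
  "contents": [
    {
      "role": "user",
      "parts": [{ "text": "Summarize the attached report in under 500 words." }]
    }
  ],
  "generationConfig": {
    "maxOutputTokens": 4096,
    "temperature": 0.4
  }
}
```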
Expected result: Gemini responses are now capped at the specified token count, preventing runaway output sizes.
Add a Code Node to Truncate Oversized Responses
Even with token limits, edge cases can produce unexpectedly large payloads. Add a Code node after your LLM node to validate and truncate the response. Set the Code node to 'Run Once for All Items' mode so it can loop over every item with $input.all(). This acts as a safety net — if the response exceeds your maximum allowed character count, it gets truncated with an indicator that content was cut. This prevents downstream nodes from choking on massive strings.
```javascript
const MAX_CHARS = 50000;

for (const item of $input.all()) {
  const text = item.json.text || item.json.output || '';
  if (text.length > MAX_CHARS) {
    item.json.text = text.substring(0, MAX_CHARS) + '\n\n[Response truncated — exceeded ' + MAX_CHARS + ' characters]';
    item.json.was_truncated = true;
  } else {
    item.json.was_truncated = false;
  }
}

return $input.all();
```
Expected result: Any response exceeding 50,000 characters is safely truncated instead of crashing the workflow.
Increase Node.js Heap Size for Self-Hosted n8n
If you self-host n8n and frequently process large LLM responses, increase the Node.js memory limit. For Docker deployments, set the NODE_OPTIONS environment variable in your docker-compose.yml or container configuration. The default heap is about 1.4 GB; raising it to 4 GB gives you significant headroom for concurrent large responses. On n8n Cloud, you cannot change this directly — focus on token limits and truncation instead.
```yaml
# Docker Compose example
services:
  n8n:
    image: n8nio/n8n:latest
    environment:
      - NODE_OPTIONS=--max-old-space-size=4096
      - N8N_DEFAULT_BINARY_DATA_MODE=filesystem
    volumes:
      - n8n_data:/home/node/.n8n
      - n8n_files:/files

# Named volumes referenced above must also be declared at the top level
volumes:
  n8n_data:
  n8n_files:
```
Expected result: n8n can now handle larger in-memory payloads without crashing due to JavaScript heap out-of-memory errors.
Split Long Inputs Into Chunks Before Sending to Gemini
If you are sending large documents to Gemini for summarization or analysis, split them into smaller chunks using a Code node before the LLM call. This reduces the likelihood that Gemini generates a massive single response and also avoids hitting context length limits. Use the SplitInBatches node after the Code node to process chunks sequentially, then aggregate results in a final Code node.
```javascript
// Run Once for All Items
const fullText = $input.first().json.document_text || '';
const CHUNK_SIZE = 8000;
const chunks = [];

for (let i = 0; i < fullText.length; i += CHUNK_SIZE) {
  chunks.push({
    json: {
      chunk_index: Math.floor(i / CHUNK_SIZE),
      chunk_text: fullText.substring(i, i + CHUNK_SIZE),
      total_chunks: Math.ceil(fullText.length / CHUNK_SIZE)
    }
  });
}

return chunks;
```
Expected result: Large documents are split into manageable chunks, each producing a smaller Gemini response that does not risk memory overflow.
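The final aggregation step mentioned above deserves its own snippet. Below is a minimal sketch of a Code node (Run Once for All Items) that stitches the per-chunk Gemini outputs back together; the chunk_index field comes from the splitting code above, while reading text or output from each item is an assumption about how your LLM node names its result field.

```javascript
// Run Once for All Items: combine per-chunk LLM outputs into one document
const items = $input.all();

const combined = items
  // Restore the original order using the chunk_index set in the splitting step
  .sort((a, b) => (a.json.chunk_index ?? 0) - (b.json.chunk_index ?? 0))
  // Pull the LLM's text from whichever field your node produces (assumed names)
  .map(item => item.json.text || item.json.output || '')
  .join('\n\n');

return [{
  json: {
    combined_text: combined,
    chunk_count: items.length
  }
}];
```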
Add Error Handling to Catch Memory Failures Gracefully
Configure error handling on your AI Agent or LLM Chain node so that if a crash does occur, the workflow does not silently fail. Click the node, go to Settings, and enable 'Continue On Fail'. Then add an IF node after it to check for errors. Route error cases to a notification node (such as Slack or Email) so you know when a response was too large. This does not prevent the crash but ensures you are notified and can act on it.
```javascript
// In a Code node after the LLM node (with Continue On Fail enabled)
const items = $input.all();
const results = [];

for (const item of items) {
  if (item.json.error) {
    results.push({
      json: {
        status: 'error',
        error_message: item.json.error.message || 'Unknown LLM error',
        suggestion: 'Reduce prompt size or lower max tokens',
        timestamp: new Date().toISOString()
      }
    });
  } else {
    results.push(item);
  }
}

return results;
```
Expected result: Failed executions are caught and routed to a notification channel instead of silently dying.
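If you also want to branch with an IF node as described above, a single boolean condition on the error field is enough. The expression below is a sketch; the exact field to test depends on what your LLM node emits when 'Continue On Fail' is active.

```javascript
// IF node condition (Boolean): true routes the item to the notification branch
{{ Boolean($json.error) }}
```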
Complete working example
```javascript
// Code node: Run Once for Each Item
// Place this AFTER the AI Agent / LLM Chain node that uses Gemini

const MAX_CHARS = 50000;

// Step 1: Get the LLM response
const rawText = $json.text || $json.output || '';

// Step 2: Check if response is oversized
if (rawText.length > MAX_CHARS) {
  // Truncate and flag
  return {
    json: {
      text: rawText.substring(0, MAX_CHARS) +
        '\n\n[Truncated: original was ' + rawText.length + ' chars]',
      was_truncated: true,
      original_length: rawText.length,
      processing_note: 'Response exceeded safety limit'
    }
  };
}

// Step 3: If the response contains structured data, validate it
let parsed = null;
try {
  // Check if the response is JSON wrapped in markdown fences
  let cleanText = rawText;
  if (cleanText.startsWith('```json')) {
    cleanText = cleanText.replace(/^```json\n?/, '').replace(/\n?```$/, '');
  } else if (cleanText.startsWith('```')) {
    cleanText = cleanText.replace(/^```\n?/, '').replace(/\n?```$/, '');
  }
  parsed = JSON.parse(cleanText);
} catch (e) {
  // Not JSON, that is fine — return as plain text
  parsed = null;
}

return {
  json: {
    text: rawText,
    parsed_data: parsed,
    was_truncated: false,
    char_count: rawText.length,
    timestamp: new Date().toISOString()
  }
};
```
Common mistakes when fixing Workflow Crashes on Large Responses from Gemini in n8n
Mistake: Leaving max tokens unset and relying on the model to self-limit output length
How to avoid: Always set 'Maximum Number of Tokens' in the Gemini Chat Model sub-node Options to a value appropriate for your use case (e.g., 2048 for summaries, 4096 for code).
Mistake: Running many concurrent webhook-triggered Gemini workflows without concurrency limits
How to avoid: Set the workflow concurrency limit in Workflow Settings or use the N8N_CONCURRENCY_PRODUCTION_LIMIT environment variable to cap simultaneous executions (see the configuration sketch after this list).
Mistake: Storing full LLM responses in n8n's internal database, bloating execution data
How to avoid: Enable pruning of execution data and set EXECUTIONS_DATA_MAX_AGE to automatically clean up old execution records.
Mistake: Ignoring the 'JavaScript heap out of memory' error as a transient issue
How to avoid: This error means Node.js ran out of memory. Investigate the specific execution that caused it — it usually points to an unbounded LLM response or a loop processing too much data.
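The concurrency and execution-data advice above maps directly to environment variables on a self-hosted instance. The snippet below is a minimal sketch for Docker Compose; the limit of 5 concurrent executions and the 168-hour retention window are illustrative values you should tune for your own workload.

```yaml
services:
  n8n:
    image: n8nio/n8n:latest
    environment:
      # Cap simultaneous production executions so large responses cannot stack up
      - N8N_CONCURRENCY_PRODUCTION_LIMIT=5
      # Prune stored execution data so large LLM outputs do not bloat the database
      - EXECUTIONS_DATA_PRUNE=true
      - EXECUTIONS_DATA_MAX_AGE=168
```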
Best practices
- Always set a max output token limit on Gemini Chat Model sub-nodes — never leave it unbounded in production
- Use N8N_DEFAULT_BINARY_DATA_MODE=filesystem to reduce memory pressure from binary data
- Monitor memory usage with the n8n metrics endpoint (/metrics) if running self-hosted
- Set concurrency limits on webhook-triggered workflows to prevent multiple large responses from stacking up
- Use the SplitInBatches node to process large datasets sequentially instead of all at once
- Enable 'Continue On Fail' on LLM nodes and add downstream error routing
- Test with worst-case prompts during development to identify memory limits before production
- Consider using Gemini Flash for tasks that do not need long-form output — it tends to produce shorter responses than Pro
Still stuck?
Copy one of these prompts to get a personalized, step-by-step explanation.
I'm using n8n with a Google Gemini Chat Model sub-node in an AI Agent workflow. The workflow crashes with a 'JavaScript heap out of memory' error when Gemini returns very long responses. How can I prevent this? I need to cap output tokens, handle large responses safely, and configure n8n memory limits.
My n8n workflow crashes when the Gemini Chat Model returns large responses. Add a max token limit to the Gemini sub-node and a Code node after the AI Agent that truncates responses over 50000 characters. Also show me how to increase NODE_OPTIONS heap size in Docker.
Frequently asked questions
What is the maximum response size Gemini can return in n8n?
Gemini models such as Gemini 1.5 Flash and 1.5 Pro can return up to 8,192 output tokens by default (configurable up to the model maximum). At roughly four characters per token, that is 30,000+ characters of text. Without a cap, this can exceed n8n's available memory when combined with other execution data.
Does n8n Cloud have the same memory crash problem?
Yes, n8n Cloud instances have memory limits too. You cannot change NODE_OPTIONS on Cloud, so focus on capping max output tokens, truncating oversized responses in a Code node, and limiting workflow concurrency.
Will setting max tokens cause the response to be cut off mid-sentence?
Yes, if the model needs more tokens than the limit allows, the response will end abruptly. To mitigate this, include instructions in your system prompt like 'Keep your response under 2000 words' alongside the token limit.
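If you call the Gemini REST API directly (for example through an HTTP Request node), you can also detect the cutoff programmatically: the API sets the candidate's finishReason to 'MAX_TOKENS' when the cap was hit. Below is a minimal sketch of a downstream Code node check, assuming the raw API response reaches the node unchanged.

```javascript
// Run Once for Each Item: flag responses that hit the maxOutputTokens cap
const candidate = ($json.candidates || [])[0] || {};

return {
  json: {
    text: candidate.content?.parts?.[0]?.text || '',
    hit_token_limit: candidate.finishReason === 'MAX_TOKENS'
  }
};
```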
Can I stream Gemini responses in n8n to avoid loading the full response into memory?
n8n does not natively support streaming LLM responses through nodes. The full response is always loaded into memory before passing to the next node. Use token limits and truncation as your primary mitigation strategies.
How do I know if my crash was caused by a large Gemini response?
Check the n8n logs for 'FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory'. Then look at the execution history for the workflow that was running when the crash occurred. If the last successful execution had a very large output field, that is likely the cause.
Is RapidDev able to help configure n8n memory settings for production deployments?
Yes. RapidDev's engineering team can audit your n8n deployment, optimize memory settings, configure Docker resource limits, and set up monitoring for heap usage to prevent crashes in production AI workflows.
Talk to an Expert
Our team has built 600+ apps. Get personalized help with your project.
Book a free consultation