Learn how to prevent n8n from cutting off long language model responses with simple settings tweaks for smooth, complete AI outputs.

The most reliable way to stop n8n from cutting off long language-model responses is to increase its internal payload limits (especially N8N_PAYLOAD_SIZE_MAX) and to make sure the AI node or HTTP Request node that calls the model is not hitting its own timeout or token limits. In real workflows, the cutoff almost always happens because n8n hits a payload or timeout ceiling, not because the model stops.
n8n enforces limits to protect itself from huge JSON blobs. Two things matter most: the maximum payload size it will accept (N8N_PAYLOAD_SIZE_MAX) and how long it will wait on a single request before giving up. These are n8n server limits, not limits inside the workflow.
In production (Docker or server), set a higher payload limit. For large LLM outputs, 16–64 MB is typical.
Example for Docker Compose:
environment:
  - N8N_PAYLOAD_SIZE_MAX=64   # maximum payload size in MiB; allows large JSON responses
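For context, a minimal docker-compose.yml sketch might look like the following; the image reference and port mapping are illustrative and should be adapted to your own deployment:

services:
  n8n:
    image: docker.n8n.io/n8nio/n8n   # official n8n image
    ports:
      - "5678:5678"
    environment:
      - N8N_PAYLOAD_SIZE_MAX=64      # maximum payload size in MiB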
Example for a plain environment variable:
export N8N_PAYLOAD_SIZE_MAX=64
Restart n8n after changing this. Without this step, n8n continues to cut responses no matter what you do inside the workflow.
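If n8n runs in Docker, note that simply restarting the existing container will not pick up a new environment variable, because the environment is fixed when the container is created. A minimal sketch, assuming the Compose service and container are both named n8n and the official image is used:

# Docker Compose: re-creates the container with the updated environment block
docker compose up -d

# Standalone container: remove it and run it again with the new variable
docker stop n8n && docker rm n8n
docker run -d --name n8n -p 5678:5678 \
  -e N8N_PAYLOAD_SIZE_MAX=64 \
  docker.n8n.io/n8nio/n8n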
If you use the HTTP Request node or one of the AI nodes to call the model, raise that node's timeout so long generations are not cut off while the model is still writing.
Example of setting HTTP timeout to 180 seconds:
// in the node UI:
// Timeout: 180000 (milliseconds)
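For reference, this is roughly where the setting lands in an exported HTTP Request node; the exact parameter layout depends on the node version, so treat the field names as an assumption:

"parameters": {
  "url": "https://api.openai.com/v1/chat/completions",
  "options": {
    "timeout": 180000
  }
}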
Even if n8n can accept large responses, the model itself may be capped by its output-token limit (max_tokens) and by the provider's context window. For example, if you call a model via the HTTP Request node:
{
  "model": "gpt-4.1",
  "max_tokens": 8000,
  "messages": [
    { "role": "user", "content": "Write a long report..." }
  ]
}
If max_tokens is too small, the model stops early regardless of n8n settings.
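One way to catch this early, assuming an OpenAI-style chat completions response, is a small Code node placed right after the request that fails loudly when the model was stopped by its token limit (the field names follow the OpenAI response format):

// n8n Code node (JavaScript), "Run Once for All Items"
const out = [];
for (const item of $input.all()) {
  const choice = item.json.choices?.[0];
  if (choice?.finish_reason === 'length') {
    // the model hit max_tokens before finishing: raise max_tokens or split the task
    throw new Error('LLM response truncated: finish_reason is "length"');
  }
  out.push({ json: { text: choice?.message?.content ?? '' } });
}
return out;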
n8n does not support incrementally streamed LLM responses inside normal workflows. If you enable streaming with some APIs, the server sends partial chunks and may close the connection before the output is complete; n8n often treats whatever has arrived as the full response, which looks like a cutoff.
If you need truly long responses, disable streaming and request the full completion in a single non-streaming call, as sketched below.
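A minimal sketch of a non-streaming request body, assuming an OpenAI-style chat completions API:

{
  "model": "gpt-4.1",
  "max_tokens": 8000,
  "stream": false,
  "messages": [
    { "role": "user", "content": "Write a long report..." }
  ]
}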
Once an LLM output gets bigger than roughly 30–50 MB, it is safer to store the raw text in S3, a database, or the filesystem instead of flowing it through the n8n UI. n8n will accept the payload, but the UI may struggle to display it.
Pattern for this: generate the text, write the raw output to external storage with a dedicated node (for example an S3 or database node), and pass only a small reference (a file key or record ID) through the rest of the workflow.
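As a rough sketch of the last step, assuming the upload already happened in a previous node and that the field names (text, storageKey) are hypothetical, a Code node can strip the heavy text out of the item before it continues through the workflow:

// n8n Code node (JavaScript): keep a reference, drop the heavy text
const out = [];
for (const item of $input.all()) {
  const text = item.json.text ?? ''; // hypothetical field holding the raw LLM output
  out.push({
    json: {
      storageKey: item.json.storageKey, // hypothetical key returned by the storage node
      length: text.length,              // lightweight metadata only
      preview: text.slice(0, 500),      // short preview for the n8n UI
    },
  });
}
return out;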
If n8n is cutting off long LLM responses, increase N8N_PAYLOAD_SIZE_MAX, increase node timeouts, disable streaming, and ensure your model max_tokens is high enough. These are the real production‑level fixes. Once these limits are raised, n8n will reliably pass the full response through the workflow without truncation.