RapidDev - Software Development Agency

How to Handle Incomplete Responses from Cohere in n8n

Incomplete responses from Cohere in n8n happen when the model hits its max token limit, the API times out, or the connection drops mid-response. Fix this by increasing max tokens, extending HTTP timeouts, detecting incomplete responses via the finish_reason field, adding retry logic for partial outputs, and implementing a chunking strategy for long-form content generation.

What you'll learn

  • How to detect incomplete responses from Cohere using finish_reason and output analysis
  • How to configure proper max_tokens and timeout settings for Cohere calls
  • How to implement automatic retry and continuation for incomplete outputs
  • How to use chunking strategies for long content that exceeds single-response limits
Advanced · 9 min read · 20-25 minutes to complete · n8n 1.20+ with Cohere Chat Model sub-node or HTTP Request node · March 2026 · RapidDev Engineering Team
Why Cohere Returns Incomplete Responses in n8n

Cohere's Command models can produce incomplete responses for several reasons: (1) the max_tokens parameter is too low, causing the response to be cut off mid-sentence, (2) the API request times out before the model finishes generating, (3) network interruptions drop the connection, or (4) the model hits an internal processing limit. In n8n, these incomplete responses can cause cascading failures — downstream nodes expecting complete JSON get partial objects, text processing nodes receive cut-off content, and workflows that depend on structured output break silently with no error. The key challenge is detecting when a response is incomplete versus intentionally short.
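To make this failure mode concrete, here is a minimal sketch (the truncated string is illustrative data, not a real Cohere response) of what happens when a downstream node tries to parse truncated JSON:

```javascript
// A response cut off at max_tokens can end mid-string, so a downstream
// JSON.parse fails even though the Cohere node reported no error.
const truncated = '{"summary": "Q3 revenue grew by 12%", "highlights": ["strong';

let parsed = null;
try {
  parsed = JSON.parse(truncated);
} catch (e) {
  // The unterminated string makes JSON.parse throw; a workflow that
  // swallows this error continues silently with no usable data.
  parsed = null;
}
// parsed is still null: the partial output produced no object at all
```

This is why detection has to happen explicitly rather than relying on the absence of an error.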

Prerequisites

  • A running n8n instance with Cohere API credentials configured
  • A workflow using the Cohere Chat Model sub-node or HTTP Request node for Cohere
  • You are experiencing incomplete or truncated AI responses in your workflows
  • Basic understanding of n8n Code nodes and expression syntax

Step-by-step guide

1

Detect Incomplete Responses with a Validation Code Node

Add a Code node after your Cohere LLM node to detect incomplete responses. Check the finish_reason field — a value of 'MAX_TOKENS' indicates the response was truncated. Also look for text-level indicators: sentences ending mid-word, unclosed brackets in JSON, and trailing ellipsis. Enable 'Continue On Fail' on the Cohere node to catch timeout errors as well.

javascript
const json = $json;
const text = json.text || json.output || json.message?.content || '';
const finishReason = json.finish_reason || json.meta?.finish_reason || '';

// Check for obvious incompleteness indicators
const indicators = {
  max_tokens_hit: finishReason === 'MAX_TOKENS' || finishReason === 'COMPLETE_LENGTH',
  ends_mid_sentence: text.length > 0 && !text.trim().match(/[.!?}\]"']$/),
  unclosed_json: (text.match(/\{/g) || []).length > (text.match(/\}/g) || []).length,
  unclosed_array: (text.match(/\[/g) || []).length > (text.match(/\]/g) || []).length,
  very_short: text.length < 50 && !json.error,
  has_error: !!json.error
};

const isIncomplete = indicators.max_tokens_hit ||
  indicators.unclosed_json ||
  indicators.unclosed_array ||
  indicators.has_error;

return [{
  json: {
    text: text,
    is_complete: !isIncomplete,
    finish_reason: finishReason,
    completeness_indicators: indicators,
    char_count: text.length
  }
}];

Expected result: Incomplete responses are reliably detected with clear indicators of what went wrong.

2

Increase Max Tokens on the Cohere Chat Model Sub-Node

The most common cause of incomplete responses is the max_tokens limit being too low. In the Cohere Chat Model sub-node (connected to your AI Agent or Basic LLM Chain), look for the 'Max Tokens' or 'Maximum Number of Tokens' setting in Options. Cohere Command-R supports up to 4,096 output tokens by default. Set this to a value appropriate for your use case — summaries might need 500-1000, while detailed analyses might need 2048-4096. If using the HTTP Request node, include max_tokens in the request body.

json
// HTTP Request node body for Cohere Chat API
// POST https://api.cohere.ai/v1/chat
{
  "model": "command-r-plus",
  "message": "{{ $json.user_message }}",
  "preamble": "You are a helpful assistant.",
  "max_tokens": 4096,
  "temperature": 0.3,
  "chat_history": []
}

Expected result: Responses have enough token budget to complete fully without being cut off.
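If you prefer to derive a budget instead of guessing, a rough sizing helper can start from the expected output length in characters. The ~4 characters per token ratio is a common rule of thumb for English text, not a Cohere guarantee, so the headroom factor matters:

```javascript
// Sketch: estimate a max_tokens budget from expected output length.
// ~4 chars/token is a rough English-text heuristic; the headroom factor
// keeps normal length variation from hitting the limit.
function estimateMaxTokens(expectedChars, headroom = 1.5) {
  const approxTokens = Math.ceil(expectedChars / 4);
  // Cap at Command-R's default 4,096-token output limit
  return Math.min(Math.ceil(approxTokens * headroom), 4096);
}

// A ~2,000-character summary budgets about 750 tokens;
// very long targets are capped at 4096.
const summaryBudget = estimateMaxTokens(2000);
const reportBudget = estimateMaxTokens(20000);
```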

3

Extend HTTP Timeout for Long Cohere Responses

Cohere models, especially Command-R+, can take 20-60 seconds for complex prompts. If n8n's HTTP timeout is too short, the connection drops before the response completes. For self-hosted n8n, increase the timeout via the N8N_HTTP_TIMEOUT environment variable. When using the HTTP Request node, set the timeout in the node's Options section. For the Cohere Chat Model sub-node, the global timeout applies.

bash
# Environment variable for self-hosted n8n
N8N_HTTP_TIMEOUT=600000  # 10 minutes in milliseconds

# Or in the HTTP Request node Options:
# Timeout: 120000 (120 seconds)
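If you run n8n in Docker, the same variable can be passed to the container. This is a sketch based on n8n's standard Docker setup; adjust the port, volume name, and image tag to your deployment:

```shell
# Sketch: self-hosted n8n in Docker with an extended HTTP timeout
docker run -d --name n8n \
  -p 5678:5678 \
  -e N8N_HTTP_TIMEOUT=600000 \
  -v n8n_data:/home/node/.n8n \
  docker.n8n.io/n8nio/n8n
```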

Expected result: Long-running Cohere responses complete without being terminated by HTTP timeouts.

4

Implement Automatic Continuation for Truncated Responses

When a response is truncated due to max_tokens, you can automatically continue it by sending the partial response back to Cohere as conversation context with a prompt to continue. Use a loop structure: after detecting an incomplete response, add the partial text to the chat history and ask Cohere to continue from where it left off. Merge the parts in a final Code node.

javascript
// Code node: Build continuation prompt
// This runs when the IF node detects an incomplete response

const partialText = $json.text || '';
const originalPrompt = $json._original_prompt || '';
const continuationCount = ($json._continuation_count || 0) + 1;
const maxContinuations = 3;

if (continuationCount > maxContinuations) {
  // Reached max continuations — return what we have
  return [{
    json: {
      final_text: ($json._accumulated_text || '') + partialText,
      status: 'max_continuations_reached',
      parts: continuationCount
    }
  }];
}

// Build chat history with the partial response
const chatHistory = [
  { role: 'USER', message: originalPrompt },
  { role: 'CHATBOT', message: partialText }
];

return [{
  json: {
    message: 'Continue your response from exactly where you left off. Do not repeat any content.',
    chat_history: chatHistory,
    _original_prompt: originalPrompt,
    _accumulated_text: ($json._accumulated_text || '') + partialText,
    _continuation_count: continuationCount
  }
}];

Expected result: Truncated responses are automatically continued, with all parts merged into a complete response.
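Once the loop exits, the parts can be merged in a final Code node. Here is a self-contained sketch of the merge logic; in n8n it would read these fields from $json, so plain objects are used here to make it runnable standalone (field names match the continuation code above):

```javascript
// Sketch: merge the accumulated continuation parts into one response
function mergeParts(item) {
  const finalText = ((item._accumulated_text || '') + (item.text || '')).trim();
  return {
    final_text: finalText,
    parts: (item._continuation_count || 0) + 1
  };
}

// Example: one continuation round produced two parts
const merged = mergeParts({
  _accumulated_text: 'The report covers three quarters. ',
  text: 'Revenue grew steadily in each.',
  _continuation_count: 1
});
// merged.final_text joins both parts; merged.parts counts the rounds
```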

5

Use Chunking for Long Content Generation

For tasks that require very long outputs (multi-page documents, detailed reports), split the work into chunks upfront rather than relying on continuation. Use a Code node to break the task into subtasks, process each through Cohere separately, and merge the results. This approach is more reliable than continuation because each chunk gets a full token budget.

javascript
// Code node: Split a long content task into chunks
// Example: Generate a report with multiple sections

const topic = $json.report_topic;
const sections = [
  'Executive Summary',
  'Market Analysis',
  'Competitive Landscape',
  'Recommendations',
  'Conclusion'
];

const chunks = sections.map((section, index) => ({
  json: {
    section_name: section,
    section_index: index,
    total_sections: sections.length,
    prompt: `Write the "${section}" section of a report about ${topic}. ` +
      `This is section ${index + 1} of ${sections.length}. ` +
      'Write 200-400 words. Return ONLY the section content, no headers.',
    max_tokens: 1024
  }
}));

return chunks;

Expected result: Long content is generated in manageable chunks that each complete within Cohere's token limits.
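After each chunk has been generated, the sections can be reassembled in order. In n8n this would run in a Code node set to "Run Once for All Items", reading items via $input.all(); the sketch below uses plain objects so it is self-contained, with field names matching the chunking code above:

```javascript
// Sketch: reassemble per-section Cohere outputs into one document.
// Each item carries section_name, section_index, and the generated text.
const items = [
  { section_name: 'Conclusion', section_index: 4, text: 'Closing remarks.' },
  { section_name: 'Executive Summary', section_index: 0, text: 'Key findings.' }
];

const report = items
  .slice()                                           // avoid mutating the input
  .sort((a, b) => a.section_index - b.section_index) // restore section order
  .map(item => `## ${item.section_name}\n\n${item.text}`)
  .join('\n\n');
// report begins with the Executive Summary and ends with the Conclusion
```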

Complete working example

cohere-incomplete-response-handler.js
// Code node: Run Once for Each Item
// Complete handler for Cohere incomplete responses
// Place after the Cohere LLM node (with Continue On Fail enabled)

const json = $json;
const text = json.text || json.output || json.message?.content || '';
const finishReason = json.finish_reason || '';

// Detect error responses
if (json.error) {
  const isTimeout = json.error.message?.includes('ETIMEDOUT') ||
    json.error.message?.includes('timeout') ||
    json.error.message?.includes('ECONNRESET');

  return [{
    json: {
      text: '',
      status: isTimeout ? 'timeout' : 'error',
      error_message: json.error.message,
      should_retry: isTimeout,
      retry_with_lower_tokens: !isTimeout
    }
  }];
}

// Detect truncation via finish reason
if (finishReason === 'MAX_TOKENS' || finishReason === 'COMPLETE_LENGTH') {
  return [{
    json: {
      text: text,
      status: 'truncated',
      finish_reason: finishReason,
      should_continue: true,
      char_count: text.length
    }
  }];
}

// Detect truncation via content analysis
const openBraces = (text.match(/\{/g) || []).length;
const closeBraces = (text.match(/\}/g) || []).length;
const openBrackets = (text.match(/\[/g) || []).length;
const closeBrackets = (text.match(/\]/g) || []).length;

if (openBraces > closeBraces || openBrackets > closeBrackets) {
  return [{
    json: {
      text: text,
      status: 'structurally_incomplete',
      should_continue: true,
      unmatched: {
        braces: openBraces - closeBraces,
        brackets: openBrackets - closeBrackets
      }
    }
  }];
}

// Response appears complete
return [{
  json: {
    text: text,
    status: 'complete',
    finish_reason: finishReason || 'COMPLETE',
    should_continue: false,
    char_count: text.length
  }
}];

Common mistakes when handling Incomplete Responses from Cohere in n8n

Mistake: Assuming a response is complete just because no error was thrown.

How to avoid: A truncated response due to max_tokens is not an error — it returns normally with finish_reason 'MAX_TOKENS'. Always check finish_reason or validate the response structure.

Mistake: Setting max_tokens very high to avoid truncation, without realizing it increases cost and latency.

How to avoid: Set max_tokens based on actual needs. If your summaries are always under 500 tokens, use 600-800 as the limit. Only use 4096 when you genuinely need long-form output.

Mistake: Using continuation prompts that cause the model to repeat content it already generated.

How to avoid: Include the partial response in the chat_history and explicitly instruct: 'Continue from exactly where you left off. Do not repeat any previous content.'

Mistake: Not distinguishing between timeout-caused incompleteness and token-limit truncation.

How to avoid: Timeouts produce error objects with ETIMEDOUT messages. Token truncation produces normal responses with finish_reason 'MAX_TOKENS'. They require different handling: timeouts need a retry; truncation needs continuation or a higher limit.
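The timeout-versus-truncation distinction can be wired into a small routing helper that an IF or Switch node (or a Code node) evaluates. This is a sketch; the status values match the complete working example earlier in this guide:

```javascript
// Sketch: pick a recovery path from the handler's status field
function nextAction(result) {
  if (result.status === 'timeout') return 'retry';        // resend the same request
  if (result.status === 'truncated' ||
      result.status === 'structurally_incomplete') {
    return 'continue';                                    // enter the continuation loop
  }
  if (result.status === 'error') return 'alert';          // surface for manual review
  return 'proceed';                                       // response is complete
}
```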

Best practices

  • Always check the finish_reason field from Cohere to detect truncation — MAX_TOKENS means the response was cut off
  • Set max_tokens to the maximum your use case needs, not a blanket high value — unnecessarily high limits waste tokens
  • Implement structural validation for JSON responses — check that braces and brackets are balanced
  • Use chunking for tasks that require more output than a single response can provide
  • Limit automatic continuations to 3-5 rounds to prevent infinite loops and runaway token costs
  • Log incomplete response occurrences to identify prompts that consistently exceed limits
  • Set HTTP timeouts higher than you expect Cohere to need — 120 seconds is a safe baseline
  • Test with the worst-case prompt during development to find your actual max_tokens needs
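As a refinement of the structural-validation practice above, a string-aware balance check avoids the false positives that plain regex counting produces when braces appear inside JSON string values. A self-contained sketch, not part of n8n or Cohere:

```javascript
// Sketch: check whether braces/brackets in a JSON-ish response are balanced,
// ignoring any brace characters that occur inside string literals.
function isBalanced(text) {
  const stack = [];
  let inString = false;
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (ch === '\\') i++;                 // skip the escaped character
      else if (ch === '"') inString = false;
    } else if (ch === '"') {
      inString = true;
    } else if (ch === '{' || ch === '[') {
      stack.push(ch);
    } else if (ch === '}' || ch === ']') {
      const open = stack.pop();
      if ((ch === '}' && open !== '{') || (ch === ']' && open !== '[')) return false;
    }
  }
  // An open string at end-of-text is itself a truncation signal
  return stack.length === 0 && !inString;
}
```

For example, a brace inside a string value (`{"a": "b}"}`) is correctly treated as balanced, while a response cut off mid-array is flagged.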

Still stuck?

Copy one of these prompts to get a personalized, step-by-step explanation.

ChatGPT Prompt

My n8n workflow using Cohere returns incomplete responses — text is cut off mid-sentence and JSON is missing closing braces. How do I detect incomplete responses, increase max_tokens on the Cohere Chat Model sub-node, and implement automatic continuation for truncated outputs?

n8n Prompt

Fix incomplete Cohere responses in my n8n workflow. Add a Code node after the Cohere Chat Model that detects truncation via finish_reason and structural analysis, then show me how to build a continuation loop that sends the partial response back to Cohere to complete.

Frequently asked questions

What is the maximum number of output tokens for Cohere Command models?

Cohere Command-R supports up to 4,096 output tokens by default. Command-R+ has the same default but can handle longer outputs with extended context. Check Cohere's current documentation for the latest limits as these change with model updates.

How do I know if a response was incomplete due to timeout versus token limit?

Token-limited responses return normally with finish_reason 'MAX_TOKENS' and contain partial text. Timeout-caused incompleteness produces an error object with an ETIMEDOUT, ECONNRESET, or timeout message. Check for the error field first, then check finish_reason.

Can I use streaming with Cohere in n8n to avoid timeouts?

n8n's built-in Cohere Chat Model sub-node does not support streaming. If you use the HTTP Request node to call Cohere's streaming endpoint, you would need to handle Server-Sent Events in a Code node, which is complex. In most cases, increasing the timeout is simpler and more reliable.

Will automatic continuation produce seamless text?

Usually yes, if you include the partial response in the chat_history and instruct the model not to repeat. However, there may be slight style or coherence changes between parts. For critical content, review the merged output or use the chunking approach instead.

Does Cohere charge for tokens in incomplete responses?

Yes. Cohere charges for all tokens generated, including those in truncated responses. A response cut off at max_tokens still incurs the full token cost for the generated portion. This is why it is important to set appropriate limits rather than over-generating and truncating.

Can RapidDev help build reliable Cohere integrations in n8n?

Yes. RapidDev's engineering team can configure Cohere workflows in n8n with proper timeout handling, truncation detection, automatic continuation, and chunking strategies for long-form content generation.
