
How to Prevent Duplicate LLM Calls in the Same n8n Workflow


What you'll learn

  • How to identify duplicate LLM calls in n8n's execution history
  • How to implement a response cache using static data to avoid re-calling the same prompt
  • How to restructure branching workflows to call the LLM only once
  • How to configure Merge nodes to prevent upstream re-execution
Advanced · 9 min read · 25-35 minutes · n8n 1.30+, any LLM provider, Code node, IF node, Merge node · March 2026 · RapidDev Engineering Team
TL;DR

Duplicate LLM calls in n8n workflows waste API credits and increase latency. They happen when multiple branches call the same model with the same input, retry logic fires unnecessarily, or Merge nodes re-trigger upstream nodes. Fix this by implementing a response cache in a Code node, using conditional execution with the IF node to skip redundant calls, and restructuring workflows to call each LLM endpoint exactly once.

Why Duplicate LLM Calls Happen in n8n Workflows

Duplicate LLM calls are one of the most common and costly mistakes in n8n AI workflows. They occur in three scenarios: (1) Branching workflows where both branches independently call the same LLM with the same prompt. (2) Merge nodes that re-execute upstream nodes when combining data, causing the LLM node to fire twice. (3) Retry logic that retries an already-successful call because the error handling scope is too broad. Each duplicate call doubles your API cost and adds unnecessary latency. This tutorial shows how to identify and eliminate every type of duplication.

Prerequisites

  • A running n8n instance (self-hosted or cloud) on version 1.30 or later
  • An existing workflow that calls an LLM API (OpenAI, Claude, Mistral, etc.)
  • Basic understanding of n8n workflow execution order and branching
  • Familiarity with n8n's Merge node and IF node
  • Access to n8n's Execution History to inspect execution details

Step-by-step guide

1

Identify duplicate calls in execution history

Open a recent execution of your workflow in n8n's Execution History. Click on each LLM node (OpenAI, HTTP Request to Claude, AI Agent) and check the 'Input' and 'Output' tabs. If the same node appears to have run multiple times with identical input, you have a duplication problem. Also check the execution timeline — if the same LLM node shows up twice in the execution order, it was triggered twice. Look at the 'Items' count in the node's output: if it shows more items than expected, input items are being duplicated before reaching the LLM node.
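
If the timeline alone is not conclusive, a small diagnostic Code node placed immediately before the LLM node can make re-execution visible. The sketch below is optional and only uses standard Code node helpers ($getWorkflowStaticData, $execution.id, $input); the _executionId and _llmCallNumber fields it adds are purely for inspection and can be removed once the source of the duplication is found.

typescript
// Optional diagnostic: place this Code node immediately before the LLM node.
// It tags every item with the execution id and a per-execution firing count,
// so a Merge- or retry-triggered second run shows up as _llmCallNumber: 2.

const staticData = $getWorkflowStaticData('global');
if (!staticData.callLog) staticData.callLog = {};

const execId = $execution.id;
staticData.callLog[execId] = (staticData.callLog[execId] || 0) + 1;

return $input.all().map(item => ({
  json: {
    ...item.json,
    _executionId: execId,
    _llmCallNumber: staticData.callLog[execId]
  }
}));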

Expected result: You can identify which nodes are running multiple times and trace the duplication back to its source (branch, Merge, or retry).

2

Fix Merge node re-execution issues

The most common source of duplicate LLM calls is the Merge node. In 'Combine' mode, the Merge node can re-trigger upstream nodes if both inputs are not ready simultaneously. Set the join mode to 'Wait for Both' so the node waits for both inputs before continuing instead of re-executing upstream nodes. Alternatively, restructure the workflow so the LLM call happens before the branch point, not inside the branches.

typescript
// Merge node configuration to prevent re-execution:
// Mode: Combine
// Join Mode: Wait for Both
// This ensures both inputs arrive before the Merge fires,
// preventing re-execution of upstream nodes.

// Alternative: use 'Append' mode if you need all items from both branches
// without any re-triggering.

Expected result: The Merge node waits for both inputs before firing, preventing upstream LLM nodes from executing twice.

3

Implement a prompt-based cache using static data

Add a cache layer before your LLM node that checks if the same prompt was recently sent and a cached response exists. Use n8n's static data ($getWorkflowStaticData) to store prompt-response pairs with expiration timestamps. If a matching prompt is found in the cache, skip the LLM call and return the cached response. This is especially useful for workflows that receive the same questions repeatedly (FAQ bots, support agents).

typescript
// Code node — JavaScript
// Check cache before LLM call

const staticData = $getWorkflowStaticData('global');
if (!staticData.cache) {
  staticData.cache = {};
}

const prompt = $input.first().json.prompt || $input.first().json.message || '';
// Use the full normalized prompt as the key; truncating it can cause false
// cache hits on prompts that share the same first characters
const cacheKey = prompt.trim().toLowerCase();
const CACHE_TTL = 3600000; // 1 hour in milliseconds
const now = Date.now();

// Clean expired entries
for (const [key, entry] of Object.entries(staticData.cache)) {
  if (now - entry.timestamp > CACHE_TTL) {
    delete staticData.cache[key];
  }
}

// Check cache
if (staticData.cache[cacheKey] && (now - staticData.cache[cacheKey].timestamp < CACHE_TTL)) {
  return [{
    json: {
      response: staticData.cache[cacheKey].response,
      cached: true,
      cacheAge: Math.round((now - staticData.cache[cacheKey].timestamp) / 1000)
    }
  }];
}

// Cache miss — continue to LLM
return [{
  json: {
    prompt,
    cached: false,
    cacheKey
  }
}];

Expected result: Identical prompts within the TTL window return cached responses instantly without making an API call.

4

Route cached vs uncached requests with IF node

After the cache check Code node, add an IF node that checks the 'cached' field. If true, route directly to the response formatting node (skipping the LLM call). If false, route to the LLM node. After the LLM node, add another Code node that stores the response in the cache before continuing to the output. This creates a complete cache-hit/cache-miss branching pattern.

typescript
// IF node configuration:
// Condition: {{ $json.cached }} equals true
// True output → response formatting (skip LLM)
// False output → LLM node

// Code node after LLM (cache store):
// 'Cache Check' must match the name of the cache-check Code node from step 3
const staticData = $getWorkflowStaticData('global');
if (!staticData.cache) staticData.cache = {};

const cacheKey = $('Cache Check').first().json.cacheKey;
const response = $input.first().json.choices?.[0]?.message?.content
  || $input.first().json.content?.[0]?.text
  || $input.first().json.output
  || '';

staticData.cache[cacheKey] = {
  response,
  timestamp: Date.now()
};

return [{ json: { response, cached: false } }];

Expected result: Cached requests bypass the LLM entirely. Fresh requests go to the LLM and their responses are stored for future use.

5

Move LLM calls before branch points

If your workflow has branches that each need the same LLM output (e.g., one branch logs the response, another sends it to the user), restructure the workflow so the LLM call happens before the branch point. Use a single LLM node, then branch the output to multiple destinations. This is the simplest and most reliable way to eliminate duplication — call once, use the result many times.
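
There is no code to add for this step; the change is purely structural. As a reference, the before/after shape looks roughly like the comment sketch below (node names and field paths are illustrative and depend on which provider node you use):

typescript
// Before: each branch makes its own identical call
//   Trigger → Set prompt → OpenAI → Log to sheet
//                        → OpenAI → Send to user
//
// After: a single call whose output fans out to both destinations
//   Trigger → Set prompt → OpenAI → Log to sheet
//                                 → Send to user
//
// Both downstream nodes receive the same items; from any later node the
// response can also be referenced with an expression such as
//   {{ $('OpenAI').item.json.choices[0].message.content }}
// (adjust the node name and field path to your provider node).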

Expected result: The LLM node fires exactly once, and its output is distributed to all downstream branches that need it.

6

Add deduplication for batch inputs

When processing multiple items (e.g., from a SplitInBatches node or a spreadsheet import), duplicate prompts within the same batch waste API calls. Add a Code node before the LLM that deduplicates items by prompt content, calls the LLM only for unique prompts, and then maps responses back to all original items including duplicates.

typescript
// Code node — JavaScript
// Deduplicate items before LLM call

const items = $input.all();
const seen = new Map();
const uniqueItems = [];
const duplicateMap = []; // Maps original index to unique index

for (let i = 0; i < items.length; i++) {
  const prompt = (items[i].json.prompt || items[i].json.message || '').trim();

  if (seen.has(prompt)) {
    duplicateMap.push({ originalIndex: i, uniqueIndex: seen.get(prompt) });
  } else {
    seen.set(prompt, uniqueItems.length);
    duplicateMap.push({ originalIndex: i, uniqueIndex: uniqueItems.length });
    uniqueItems.push(items[i]);
  }
}

// Store the map for re-expansion after LLM call
const staticData = $getWorkflowStaticData('global');
staticData.duplicateMap = duplicateMap;
staticData.originalCount = items.length;

// Only unique items go to the LLM
return uniqueItems;

Expected result: Only unique prompts are sent to the LLM. Duplicate prompts are tracked and will be mapped to the same response after the LLM call.
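
The deduplication node above only sends unique prompts forward, so a matching Code node is needed after the LLM call to fan the responses back out to the original items. A minimal sketch is below; it assumes the LLM node emits one output item per unique prompt, in the same order the deduplication node produced them, so verify that for your provider node before relying on it.

typescript
// Code node after the LLM call: re-expands unique responses to the original items.
// Relies on the duplicateMap stored in static data by the deduplication node.

const staticData = $getWorkflowStaticData('global');
const duplicateMap = staticData.duplicateMap || [];
const responses = $input.all();

// Every original item gets the response produced for its unique prompt
const expanded = duplicateMap
  .sort((a, b) => a.originalIndex - b.originalIndex)
  .map(({ originalIndex, uniqueIndex }) => ({
    json: {
      originalIndex,
      ...(responses[uniqueIndex] ? responses[uniqueIndex].json : {})
    }
  }));

// Clear the map so it does not leak into later executions
delete staticData.duplicateMap;
delete staticData.originalCount;

return expanded;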

Complete working example

llm-dedup-cache.js
// Complete Code node: LLM call deduplication with caching
// Place before any LLM node to prevent duplicate API calls

const CACHE_TTL = 3600000; // 1 hour
const MAX_CACHE_SIZE = 500; // Max cached entries

const staticData = $getWorkflowStaticData('global');
if (!staticData.llmCache) {
  staticData.llmCache = {};
  staticData.cacheStats = { hits: 0, misses: 0 };
}

const now = Date.now();
const items = $input.all();
const cached = [];
const uncached = [];

// Clean expired entries
const keys = Object.keys(staticData.llmCache);
for (const key of keys) {
  if (now - staticData.llmCache[key].ts > CACHE_TTL) {
    delete staticData.llmCache[key];
  }
}

// Evict oldest entries if the cache is too large
if (Object.keys(staticData.llmCache).length > MAX_CACHE_SIZE) {
  const sorted = Object.entries(staticData.llmCache)
    .sort(([, a], [, b]) => a.ts - b.ts);
  const toRemove = sorted.slice(0, sorted.length - MAX_CACHE_SIZE);
  for (const [key] of toRemove) {
    delete staticData.llmCache[key];
  }
}

// Check each item
for (const item of items) {
  const prompt = (item.json.prompt || item.json.message || '').trim();
  // Full normalized prompt as the key; include the system prompt here too
  // if it varies between requests
  const key = prompt.toLowerCase();

  if (staticData.llmCache[key]) {
    staticData.cacheStats.hits++;
    cached.push({
      json: {
        ...item.json,
        response: staticData.llmCache[key].response,
        cached: true,
        cacheAge: Math.round((now - staticData.llmCache[key].ts) / 1000)
      }
    });
  } else {
    staticData.cacheStats.misses++;
    uncached.push({
      json: {
        ...item.json,
        cached: false,
        cacheKey: key
      }
    });
  }
}

// The Code node has a single output, so return every item and route on the
// 'cached' field with an IF node: true → skip the LLM, false → call the LLM
return [...cached, ...uncached];
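
The node above only reads the cache; fresh responses still need to be written back after the LLM call, as in Step 4. Below is a minimal store-side sketch matching the llmCache structure. It assumes the cache-check node is named 'LLM Dedup Cache' (adjust to your actual node name), that this Code node runs in 'Run Once for Each Item' mode, and that the response field paths match your provider (the same OpenAI/Claude/Agent variants used in Step 4).

typescript
// Code node after the LLM call, set to 'Run Once for Each Item'.
// Writes each fresh response into the same llmCache the dedup node reads.

const staticData = $getWorkflowStaticData('global');
if (!staticData.llmCache) staticData.llmCache = {};

const json = $input.item.json;
const response = json.choices?.[0]?.message?.content  // OpenAI-style output
  || json.content?.[0]?.text                          // Claude-style output
  || json.output                                      // AI Agent output
  || '';

// Item linking pairs this item with the upstream item that produced it,
// so the cacheKey computed before the LLM call can be read back here.
const cacheKey = $('LLM Dedup Cache').item.json.cacheKey;

if (cacheKey && response) {
  staticData.llmCache[cacheKey] = { response, ts: Date.now() };
}

return { json: { response, cached: false } };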

Common mistakes when preventing duplicate LLM calls in the same n8n workflow

Mistake: Using the Merge node in default mode, which re-triggers upstream nodes when inputs arrive at different times

How to avoid: Switch to 'Wait for Both' join mode or restructure to avoid Merge nodes between branches with LLM calls

Mistake: Caching responses indefinitely without a TTL, serving stale data for dynamic content

How to avoid: Set a CACHE_TTL (e.g., 1 hour) and clean expired entries on each execution

Mistake: Placing the same LLM call in multiple branches instead of calling once before the branch

How to avoid: Restructure: LLM call → branch point → use result in each branch

Mistake: Using prompt substring matching for cache keys, causing false cache hits on similar but different prompts

How to avoid: Use the full prompt (or a hash of it) as the cache key, not just the first N characters

Mistake: Not accounting for different system prompts when caching — same user prompt with different system prompts should not match

How to avoid: Include the system prompt in the cache key calculation: hash(systemPrompt + userPrompt)
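
To apply the last two points, the cache key can be derived from the full system prompt plus user prompt rather than a truncated substring. A minimal sketch using an inline FNV-1a string hash (no external modules, so it runs in any Code node) is shown below; the systemPrompt and prompt field names are illustrative and should match whatever your workflow actually passes in.

typescript
// Hash-based cache key: include the system prompt so the same user prompt
// with a different system prompt never produces a false cache hit.

function fnv1a(str) {
  // Simple non-cryptographic 32-bit FNV-1a hash, good enough for cache keys
  let hash = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16);
}

const json = $input.first().json;
const systemPrompt = json.systemPrompt || '';               // illustrative field name
const userPrompt = (json.prompt || json.message || '').trim();

const cacheKey = fnv1a(systemPrompt + '\n' + userPrompt.toLowerCase());

return [{ json: { ...json, cacheKey } }];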

Best practices

  • Call each LLM endpoint exactly once per workflow execution — move calls before branch points
  • Use the Merge node in 'Wait for Both' mode to prevent upstream re-execution
  • Implement a response cache for workflows that receive the same prompts repeatedly
  • Set a reasonable cache TTL (1 hour for general, 24 hours for static reference data)
  • Deduplicate batch inputs before sending to the LLM to avoid paying for identical prompts
  • Monitor cache hit rates to verify the cache is providing value
  • Limit cache size to prevent static data from growing unbounded
  • Use execution history to verify LLM nodes fire exactly once per execution

Still stuck?

Copy one of these prompts to get a personalized, step-by-step explanation.

ChatGPT Prompt

My n8n workflow is making duplicate calls to OpenAI — the same prompt is sent twice in a single execution, doubling my API costs. How do I identify and eliminate the duplicate calls?

n8n Prompt

I have an n8n workflow with a Merge node that seems to re-trigger my OpenAI node, causing it to run twice. How do I configure the Merge node to prevent this and implement caching for duplicate prompts?

Frequently asked questions

How can I tell if my workflow is making duplicate LLM calls?

Open an execution in n8n's Execution History and check the LLM node's input/output. If the node processed more items than expected, or if the same prompt appears twice in the input, you have duplicates. Also check your LLM provider's usage dashboard for unexpected spikes.

Does static data caching work in n8n Cloud?

Yes, static data works on n8n Cloud. It persists across executions within the same workflow. However, static data is stored in n8n's database, so very large caches (thousands of entries) may impact performance.

Should I use Redis or static data for LLM response caching?

For low-volume caching (under 500 entries), static data is simpler and requires no external service. For high-volume caching or multi-workflow shared caches, use Redis with the Redis node.

Can the IF node after a cache check cause both branches to execute?

No, the IF node routes each item to exactly one output (true or false). If a cached item goes to the true branch, it will not also go to the false branch. However, if you have both cached and uncached items in the same batch, different items may go to different branches.

How much money can I save by eliminating duplicate LLM calls?

If your workflow averages 2 duplicate calls per execution, you are paying 2x what you should. At 1,000 executions/month with GPT-4o, that could be $50-200/month in wasted API credits. Implementing caching for repeated prompts can save an additional 30-50%.

Can RapidDev audit my n8n workflows for duplicate LLM calls and optimize costs?

Yes, RapidDev performs workflow audits that identify duplicate API calls, implement caching strategies, and restructure branching logic to minimize LLM costs. Their team typically reduces API costs by 40-60% through deduplication and caching.
