How to implement sampling (LLM calls) in MCP

Sampling in MCP lets your server ask the AI client to generate text completions on the server's behalf. Use the sampling/createMessage request to have the client's AI model analyze data, summarize content, or make decisions as part of a tool workflow. This enables agentic patterns where the server uses AI capabilities without having its own model.

What you'll learn

  • What sampling is and how it fits into MCP architecture
  • How to send sampling/createMessage requests from a server
  • How to specify model preferences and system prompts
  • How to use sampling results in tool workflows
  • How to handle clients that do not support sampling
Advanced · 9 min read · 30-40 min to complete · Requires: MCP TypeScript SDK 1.x, a client that supports the sampling capability · March 2026 · RapidDev Engineering Team

Implementing Sampling in MCP Servers

Sampling is an advanced MCP capability that inverts the usual flow. Instead of the AI client calling tools on your server, your server asks the AI client to generate a completion. The server sends a sampling/createMessage request with messages and optional model preferences; the client routes it to its AI model and returns the generated text.
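
On the wire, this exchange is an ordinary JSON-RPC request sent from server to client. The shapes below follow the spec's CreateMessageRequest and CreateMessageResult; all field values are illustrative placeholders.

typescript
// Illustrative wire shapes for one sampling exchange (placeholder values).
// Server -> client request:
const request = {
  jsonrpc: "2.0",
  id: 1,
  method: "sampling/createMessage",
  params: {
    messages: [
      { role: "user", content: { type: "text", text: "Summarize: ..." } },
    ],
    systemPrompt: "You are a helpful assistant.",
    maxTokens: 200,
  },
};

// Client -> server result, returned after user approval and model generation:
const result = {
  role: "assistant",
  content: { type: "text", text: "The document argues that..." },
  model: "claude-sonnet-4-20250514", // the model the client actually used
  stopReason: "endTurn",
};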

This enables powerful patterns: a tool can analyze data using AI without bundling its own model, a workflow can combine database queries with AI-powered summarization, and multi-step agents can reason about intermediate results. The key constraint is that sampling requires client consent — the human user must approve the request.

Prerequisites

  • A working MCP server with the TypeScript or Python SDK
  • An MCP client that supports the sampling capability (the MCP Inspector can act as one during development)
  • Understanding of MCP tool registration and the request/response flow
  • Familiarity with LLM prompt design for structured outputs

Step-by-step guide

Step 1: Understand the sampling flow

In normal MCP flow, the client calls tools on the server. With sampling, the server sends a sampling/createMessage request to the client, asking it to generate a completion. The client presents this to the user for approval (human-in-the-loop), routes it to its AI model, and returns the generated message. The server can then use this AI-generated content in its tool response.

typescript
// The sampling flow:
// 1. Client calls tool on server
// 2. Server needs AI analysis as part of the tool
// 3. Server sends sampling/createMessage to client
// 4. Client shows user approval dialog
// 5. Client routes to its AI model
// 6. Client returns generated message to server
// 7. Server uses the result in its tool response

Expected result: You understand that sampling is a server-to-client request for AI completions, with human approval.

Step 2: Send a sampling request from a tool handler

Inside a tool handler, send a sampling/createMessage request and await the result. Provide the messages array (the conversation for the AI), optional model preferences, and a system prompt; the client returns a message containing the AI's response. In the TypeScript SDK, McpServer wraps a low-level Server that exposes this request as the createMessage method, reachable via server.server.

typescript
// TypeScript
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { z } from "zod";

const server = new McpServer({ name: "sampling-server", version: "1.0.0" });

server.registerTool("analyze-data", {
  description: "Analyze a dataset and return AI-generated insights",
  inputSchema: {
    data: z.string().describe("Data to analyze (JSON or CSV format)"),
    question: z.string().describe("Specific question about the data"),
  },
}, async ({ data, question }) => {
  // Ask the client's AI model to analyze the data
  try {
    const samplingResult = await server.server.createMessage({
      messages: [
        {
          role: "user",
          content: {
            type: "text",
            text: `Analyze this data and answer the question.\n\nData:\n${data}\n\nQuestion: ${question}\n\nProvide a concise, data-driven answer.`,
          },
        },
      ],
      maxTokens: 1000,
      systemPrompt: "You are a data analyst. Provide precise, factual analysis based only on the data provided.",
    });

    return {
      content: [{
        type: "text",
        text: `Analysis:\n${samplingResult.content.type === "text" ? samplingResult.content.text : "Non-text response"}`,
      }],
    };
  } catch (error) {
    return {
      content: [{ type: "text", text: `Error: Sampling not available — ${(error as Error).message}` }],
      isError: true,
    };
  }
});

Expected result: The tool sends data to the client's AI model for analysis and includes the AI response in its tool result.

Step 3: Specify model preferences for sampling

When requesting sampling, you can express preferences about which model the client should use: hints (preferred model names), cost priority, speed priority, and intelligence priority. Hints are evaluated in order and are typically matched as substrings of the client's available model names. The client treats all of these as suggestions — it ultimately decides which model to use based on user settings and availability.

typescript
// TypeScript — model preferences
const samplingResult = await server.server.createMessage({
  messages: [{
    role: "user",
    content: { type: "text", text: "Summarize this document..." },
  }],
  maxTokens: 500,
  modelPreferences: {
    hints: [
      { name: "claude-sonnet-4-20250514" }, // Preferred model
      { name: "claude" }, // Fallback: any Claude model
    ],
    costPriority: 0.3, // 0 = cost does not matter, 1 = minimize cost
    speedPriority: 0.7, // 0 = latency does not matter, 1 = minimize latency
    intelligencePriority: 0.5, // 0 = capability does not matter, 1 = maximize capability
  },
  systemPrompt: "Provide a brief summary in 2-3 sentences.",
});

Expected result: The client receives your model preferences and selects the most appropriate available model.

Step 4: Build a multi-step tool with sampling

Sampling is most powerful in multi-step tool workflows. A tool can fetch data from a database, use sampling to analyze it, then take action based on the analysis. This creates an agentic loop where the server orchestrates data access and the client's AI provides reasoning.

typescript
// TypeScript — multi-step tool with sampling
// Assumes `pool` is a pg.Pool created elsewhere in the server.
server.registerTool("smart-report", {
  description: "Generate an AI-analyzed report from database data",
  inputSchema: {
    tableName: z.string().describe("Table to analyze"),
    metric: z.string().describe("Metric to focus on"),
  },
}, async ({ tableName, metric }) => {
  try {
    // Table names cannot be parameterized, so reject anything
    // that is not a plain identifier before interpolating it
    if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(tableName)) {
      throw new Error(`Invalid table name: ${tableName}`);
    }

    // Step 1: Fetch data from database
    const data = await pool.query(
      `SELECT * FROM ${tableName} ORDER BY created_at DESC LIMIT 100`
    );

    // Step 2: Use sampling to analyze the data
    const analysis = await server.server.createMessage({
      messages: [{
        role: "user",
        content: {
          type: "text",
          text: `Analyze this data focusing on ${metric}:\n\n${JSON.stringify(data.rows, null, 2)}\n\nProvide:\n1. Key trends\n2. Anomalies\n3. Recommendations`,
        },
      }],
      maxTokens: 2000,
      systemPrompt: "You are a business analyst. Be specific and reference actual data values.",
    });

    const analysisText = analysis.content.type === "text" ? analysis.content.text : "";

    // Step 3: Return combined result
    return {
      content: [{
        type: "text",
        text: `Report for ${tableName} — ${metric}\n\nData points: ${data.rowCount}\n\n${analysisText}`,
      }],
    };
  } catch (error) {
    return {
      content: [{ type: "text", text: `Error: ${(error as Error).message}` }],
      isError: true,
    };
  }
});

Expected result: The tool fetches database data, sends it to the AI for analysis, and returns a combined report.

Step 5: Handle clients that do not support sampling

Not all MCP clients support sampling. Check the client's capabilities during initialization or handle the error gracefully when sampling fails. Provide a fallback path that returns raw data without AI analysis. For production servers that depend heavily on sampling, RapidDev can help design fallback strategies and client compatibility testing.

typescript
// TypeScript — graceful sampling fallback
// Assumes `fetchLogs` is implemented elsewhere in the server.
server.registerTool("summarize-logs", {
  description: "Summarize application logs, with AI analysis if available",
  inputSchema: {
    hours: z.number().min(1).max(72).default(24)
      .describe("Hours of logs to analyze"),
  },
}, async ({ hours }) => {
  const logs = await fetchLogs(hours);
  const summary = {
    totalEntries: logs.length,
    errors: logs.filter((l: any) => l.level === "error").length,
    warnings: logs.filter((l: any) => l.level === "warn").length,
  };

  // Try AI analysis, fall back to raw summary
  let analysis = "AI analysis not available (client does not support sampling).";
  try {
    const result = await server.server.createMessage({
      messages: [{
        role: "user",
        content: {
          type: "text",
          text: `Analyze these application logs and identify patterns:\n\n${JSON.stringify(logs.slice(0, 50), null, 2)}`,
        },
      }],
      maxTokens: 1000,
    });
    if (result.content.type === "text") {
      analysis = result.content.text;
    }
  } catch {
    // Sampling not supported or declined — use raw summary
  }

  return {
    content: [{
      type: "text",
      text: `Log Summary (last ${hours}h):\n${JSON.stringify(summary, null, 2)}\n\nAnalysis:\n${analysis}`,
    }],
  };
});

Expected result: The tool works with any client: AI-powered analysis when sampling is available, raw summary when it is not.
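
If you prefer an explicit capability check over try/catch, the TypeScript SDK records what the client declared during initialization. A minimal sketch, assuming the getClientCapabilities() accessor on the low-level Server in SDK 1.x:

typescript
// Check whether the connected client declared the sampling capability.
const clientCapabilities = server.server.getClientCapabilities();
const supportsSampling = clientCapabilities?.sampling !== undefined;

if (!supportsSampling) {
  // Skip the createMessage call entirely and return the raw summary instead.
}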

Complete working example

src/index.ts
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const server = new McpServer({
  name: "sampling-demo-server",
  version: "1.0.0",
});

// Tool with sampling for AI analysis
server.registerTool("analyze-text", {
  description: "Analyze text content using AI (requires client sampling support)",
  inputSchema: {
    text: z.string().min(1).describe("Text to analyze"),
    analysisType: z.enum(["sentiment", "summary", "key-points", "entities"])
      .describe("Type of analysis to perform"),
  },
}, async ({ text, analysisType }) => {
  const prompts: Record<string, string> = {
    sentiment: "Analyze the sentiment of this text. Rate as positive/negative/neutral with confidence.",
    summary: "Summarize this text in 2-3 sentences.",
    "key-points": "Extract the 3-5 most important points from this text.",
    entities: "Extract all named entities (people, places, organizations, dates) from this text.",
  };

  try {
    const result = await server.server.createMessage({
      messages: [{
        role: "user",
        content: {
          type: "text",
          text: `${prompts[analysisType]}\n\nText:\n${text}`,
        },
      }],
      maxTokens: 1000,
      modelPreferences: {
        intelligencePriority: 0.7,
        speedPriority: 0.5,
        costPriority: 0.3,
      },
    });

    const responseText = result.content.type === "text"
      ? result.content.text
      : "Non-text response received";

    return {
      content: [{
        type: "text",
        text: `${analysisType.toUpperCase()} Analysis:\n\n${responseText}`,
      }],
    };
  } catch (error) {
    return {
      content: [{
        type: "text",
        text: `Error: Sampling failed — ${(error as Error).message}. This tool requires a client that supports MCP sampling.`,
      }],
      isError: true,
    };
  }
});

const transport = new StdioServerTransport();
await server.connect(transport);
console.error("Sampling demo server running on stdio");

Common mistakes when implementing sampling (LLM calls) in MCP

Mistake: Assuming all clients support sampling

How to avoid: Always handle the case where sampling is not available. Provide a fallback or a clear error message.

Mistake: Sending too much data in sampling messages

How to avoid: AI models have token limits. Truncate or summarize data before including it in sampling messages, and send only the most relevant subset.

Mistake: Not setting maxTokens on sampling requests

How to avoid: Always set maxTokens to prevent excessive token usage. Use the minimum needed for your analysis type.

Mistake: Treating model hints as requirements

How to avoid: The hints array is advisory and the client chooses the model. Design prompts that work across different models.

Mistake: Using sampling for simple transformations

How to avoid: If you can do the operation with code (regex, math, string manipulation), do it in the handler. Reserve sampling for tasks that genuinely need AI reasoning.

Best practices

  • Always wrap sampling requests in try/catch with fallback behavior
  • Set appropriate maxTokens to control cost and response time
  • Use systemPrompt to constrain the AI's response format and behavior
  • Truncate large datasets before sending them in sampling messages (see the helper sketch after this list)
  • Use sampling for analysis, summarization, and reasoning — not string formatting
  • Provide model hints as suggestions, not requirements
  • Design tools to return useful results even when sampling is unavailable
  • Test with the MCP Inspector, which lets you review sampling requests and supply responses manually
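
For the truncation guideline above, a small helper keeps sampling payloads inside a rough budget. A minimal sketch; the four-characters-per-token ratio is a heuristic assumption, not a real tokenizer:

typescript
// Rough guard against oversized sampling payloads.
// Assumes ~4 characters per token; swap in a real tokenizer for accuracy.
function truncateForSampling(data: string, maxTokens: number): string {
  const maxChars = maxTokens * 4;
  if (data.length <= maxChars) return data;
  return `${data.slice(0, maxChars)}\n[...truncated ${data.length - maxChars} characters]`;
}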

Still stuck?

Copy one of these prompts to get a personalized, step-by-step explanation.

ChatGPT Prompt

I'm building an MCP server that needs to use the client's AI model for analysis inside a tool. Show me how to implement the sampling/createMessage flow with model preferences, error handling, and fallback behavior when the client does not support sampling.

MCP Prompt

Add sampling to my MCP tool [tool-name]. After the tool fetches [data], send a sampling/createMessage request to the client to analyze it. Include a system prompt for [analysis type], set maxTokens to [N], and handle the case where sampling is not available.

Frequently asked questions

What is the difference between sampling and just calling an LLM API directly?

Sampling uses the client's AI model, which means the server does not need its own API key, the user controls which model is used, and the human-in-the-loop approval ensures the server cannot use AI without consent. Calling an LLM directly requires your own API key and bypasses client control.

Does the user have to approve every sampling request?

The protocol is designed around human-in-the-loop approval. In practice, clients may implement automatic approval for trusted servers or let users set approval policies. But your server should assume approval is required and handle rejection gracefully.

Can I chain multiple sampling requests in one tool?

Yes. A tool can make multiple sequential sampling requests, using each response to inform the next. This enables multi-step reasoning. However, each request may require separate user approval, so minimize the number of calls.
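
A condensed sketch of that chaining pattern, reusing the createMessage call from the examples above (logText is a hypothetical variable holding the data under analysis):

typescript
// Two sequential sampling calls: a cheap triage first, a deeper summary only if needed.
const triage = await server.server.createMessage({
  messages: [{
    role: "user",
    content: { type: "text", text: `Does this log excerpt indicate an incident? Answer yes or no.\n\n${logText}` },
  }],
  maxTokens: 5,
});

if (triage.content.type === "text" && /yes/i.test(triage.content.text)) {
  const summary = await server.server.createMessage({
    messages: [{
      role: "user",
      content: { type: "text", text: `Summarize the incident for an on-call engineer:\n\n${logText}` },
    }],
    maxTokens: 500,
  });
  // ...include summary.content in the tool result
}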

Which MCP clients support sampling?

Client support for sampling is still evolving and varies by product and version, so check the client's capabilities object during initialization rather than assuming support. The MCP Inspector can respond to sampling requests during development; consult your host application's documentation for its current support.

Can I use sampling in Python FastMCP?

Yes. Use the context object (ctx: Context parameter) to access sampling. Call await ctx.sample() with your messages and model preferences. The pattern is the same as TypeScript but with Python syntax.
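
A minimal Python sketch of the same idea, based on FastMCP 2.x conventions (the tool name and prompt are illustrative; check your FastMCP version for the exact ctx.sample signature):

python
# FastMCP-style sketch; assumes FastMCP 2.x where ctx.sample is available.
from fastmcp import FastMCP, Context

mcp = FastMCP("sampling-demo")

@mcp.tool
async def summarize(text: str, ctx: Context) -> str:
    """Summarize text using the client's AI model via sampling."""
    result = await ctx.sample(
        f"Summarize this text in 2-3 sentences:\n\n{text}",
        system_prompt="You are a precise analyst.",
        max_tokens=500,
    )
    # ctx.sample returns a content block; .text holds the generated text
    return result.text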

How much does sampling cost?

Sampling uses the client's AI model, so token costs are billed to the client/user, not the server. This is one advantage of sampling over direct API calls — the server incurs no model costs. For enterprise MCP deployments where sampling costs need tracking, RapidDev can help implement usage monitoring.
