How to implement sampling (LLM calls) in MCP

Sampling in MCP lets your server ask the AI client to generate text completions on the server's behalf. Use the sampling/createMessage request to have the client's AI model analyze data, summarize content, or make decisions as part of a tool workflow. This enables agentic patterns where the server uses AI capabilities without having its own model.

What you'll learn

  • What sampling is and how it fits into MCP architecture
  • How to send sampling/createMessage requests from a server
  • How to specify model preferences and system prompts
  • How to use sampling results in tool workflows
  • How to handle clients that do not support sampling
Advanced · 9 min read · 30-40 min to complete · Requires: MCP TypeScript SDK 1.x, a client that supports the sampling capability · March 2026 · RapidDev Engineering Team

Implementing Sampling in MCP Servers

Sampling is an advanced MCP capability that inverts the usual flow. Instead of the AI client calling tools on your server, your server asks the AI client to generate a completion. The server sends a sampling/createMessage request with messages and optional model preferences; the client routes it to its AI model and returns the generated text.
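
On the wire, this exchange is an ordinary JSON-RPC request sent from server to client. The shapes below follow the spec's CreateMessageRequest and CreateMessageResult; all field values are illustrative placeholders.

typescript
// Illustrative wire shapes for one sampling exchange (placeholder values).
// Server -> client request:
const request = {
  jsonrpc: "2.0",
  id: 1,
  method: "sampling/createMessage",
  params: {
    messages: [
      { role: "user", content: { type: "text", text: "Summarize: ..." } },
    ],
    systemPrompt: "You are a helpful assistant.",
    maxTokens: 200,
  },
};

// Client -> server result, returned after user approval and model generation:
const result = {
  role: "assistant",
  content: { type: "text", text: "The document argues that..." },
  model: "claude-sonnet-4-20250514", // the model the client actually used
  stopReason: "endTurn",
};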

This enables powerful patterns: a tool can analyze data using AI without bundling its own model, a workflow can combine database queries with AI-powered summarization, and multi-step agents can reason about intermediate results. The key constraint is that sampling requires client consent — the human user must approve the request.

Prerequisites

  • A working MCP server with the TypeScript or Python SDK
  • An MCP client that supports the sampling capability (the MCP Inspector can act as one during development)
  • Understanding of MCP tool registration and the request/response flow
  • Familiarity with LLM prompt design for structured outputs

Step-by-step guide

Step 1: Understand the sampling flow

In normal MCP flow, the client calls tools on the server. With sampling, the server sends a sampling/createMessage request to the client, asking it to generate a completion. The client presents this to the user for approval (human-in-the-loop), routes it to its AI model, and returns the generated message. The server can then use this AI-generated content in its tool response.

typescript
// The sampling flow:
// 1. Client calls tool on server
// 2. Server needs AI analysis as part of the tool
// 3. Server sends sampling/createMessage to client
// 4. Client shows user approval dialog
// 5. Client routes to its AI model
// 6. Client returns generated message to server
// 7. Server uses the result in its tool response

Expected result: You understand that sampling is a server-to-client request for AI completions, with human approval.

Step 2: Send a sampling request from a tool handler

Inside a tool handler, send a sampling/createMessage request and await the result. Provide the messages array (the conversation for the AI), optional model preferences, and a system prompt; the client returns a message containing the AI's response. In the TypeScript SDK, McpServer wraps a low-level Server that exposes this request as the createMessage method, reachable via server.server.

typescript
// TypeScript
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { z } from "zod";

const server = new McpServer({ name: "sampling-server", version: "1.0.0" });

server.registerTool("analyze-data", {
  description: "Analyze a dataset and return AI-generated insights",
  inputSchema: {
    data: z.string().describe("Data to analyze (JSON or CSV format)"),
    question: z.string().describe("Specific question about the data"),
  },
}, async ({ data, question }) => {
  // Ask the client's AI model to analyze the data
  try {
    const samplingResult = await server.server.createMessage({
      messages: [
        {
          role: "user",
          content: {
            type: "text",
            text: `Analyze this data and answer the question.\n\nData:\n${data}\n\nQuestion: ${question}\n\nProvide a concise, data-driven answer.`,
          },
        },
      ],
      maxTokens: 1000,
      systemPrompt: "You are a data analyst. Provide precise, factual analysis based only on the data provided.",
    });

    return {
      content: [{
        type: "text",
        text: `Analysis:\n${samplingResult.content.type === "text" ? samplingResult.content.text : "Non-text response"}`,
      }],
    };
  } catch (error) {
    return {
      content: [{ type: "text", text: `Error: Sampling not available — ${(error as Error).message}` }],
      isError: true,
    };
  }
});

Expected result: The tool sends data to the client's AI model for analysis and includes the AI response in its tool result.

Step 3: Specify model preferences for sampling

When requesting sampling, you can express preferences about which model the client should use: hints (preferred model names), cost priority, speed priority, and intelligence priority. Hints are evaluated in order and are typically matched as substrings of the client's available model names. The client treats all of these as suggestions — it ultimately decides which model to use based on user settings and availability.

typescript
// TypeScript — model preferences
const samplingResult = await server.server.createMessage({
  messages: [{
    role: "user",
    content: { type: "text", text: "Summarize this document..." },
  }],
  maxTokens: 500,
  modelPreferences: {
    hints: [
      { name: "claude-sonnet-4-20250514" }, // Preferred model
      { name: "claude" }, // Fallback: any Claude model
    ],
    costPriority: 0.3, // 0 = cost does not matter, 1 = minimize cost
    speedPriority: 0.7, // 0 = latency does not matter, 1 = minimize latency
    intelligencePriority: 0.5, // 0 = capability does not matter, 1 = maximize capability
  },
  systemPrompt: "Provide a brief summary in 2-3 sentences.",
});

Expected result: The client receives your model preferences and selects the most appropriate available model.

Step 4: Build a multi-step tool with sampling

Sampling is most powerful in multi-step tool workflows. A tool can fetch data from a database, use sampling to analyze it, then take action based on the analysis. This creates an agentic loop where the server orchestrates data access and the client's AI provides reasoning.

typescript
// TypeScript — multi-step tool with sampling
// Assumes `pool` is a pg.Pool created elsewhere in the server.
server.registerTool("smart-report", {
  description: "Generate an AI-analyzed report from database data",
  inputSchema: {
    tableName: z.string().describe("Table to analyze"),
    metric: z.string().describe("Metric to focus on"),
  },
}, async ({ tableName, metric }) => {
  try {
    // Table names cannot be parameterized, so reject anything
    // that is not a plain identifier before interpolating it
    if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(tableName)) {
      throw new Error(`Invalid table name: ${tableName}`);
    }

    // Step 1: Fetch data from database
    const data = await pool.query(
      `SELECT * FROM ${tableName} ORDER BY created_at DESC LIMIT 100`
    );

    // Step 2: Use sampling to analyze the data
    const analysis = await server.server.createMessage({
      messages: [{
        role: "user",
        content: {
          type: "text",
          text: `Analyze this data focusing on ${metric}:\n\n${JSON.stringify(data.rows, null, 2)}\n\nProvide:\n1. Key trends\n2. Anomalies\n3. Recommendations`,
        },
      }],
      maxTokens: 2000,
      systemPrompt: "You are a business analyst. Be specific and reference actual data values.",
    });

    const analysisText = analysis.content.type === "text" ? analysis.content.text : "";

    // Step 3: Return combined result
    return {
      content: [{
        type: "text",
        text: `Report for ${tableName} — ${metric}\n\nData points: ${data.rowCount}\n\n${analysisText}`,
      }],
    };
  } catch (error) {
    return {
      content: [{ type: "text", text: `Error: ${(error as Error).message}` }],
      isError: true,
    };
  }
});

Expected result: The tool fetches database data, sends it to the AI for analysis, and returns a combined report.

Step 5: Handle clients that do not support sampling

Not all MCP clients support sampling. Check the client's capabilities during initialization or handle the error gracefully when sampling fails. Provide a fallback path that returns raw data without AI analysis. For production servers that depend heavily on sampling, RapidDev can help design fallback strategies and client compatibility testing.

typescript
// TypeScript — graceful sampling fallback
// Assumes `fetchLogs` is implemented elsewhere in the server.
server.registerTool("summarize-logs", {
  description: "Summarize application logs, with AI analysis if available",
  inputSchema: {
    hours: z.number().min(1).max(72).default(24)
      .describe("Hours of logs to analyze"),
  },
}, async ({ hours }) => {
  const logs = await fetchLogs(hours);
  const summary = {
    totalEntries: logs.length,
    errors: logs.filter((l: any) => l.level === "error").length,
    warnings: logs.filter((l: any) => l.level === "warn").length,
  };

  // Try AI analysis, fall back to raw summary
  let analysis = "AI analysis not available (client does not support sampling).";
  try {
    const result = await server.server.createMessage({
      messages: [{
        role: "user",
        content: {
          type: "text",
          text: `Analyze these application logs and identify patterns:\n\n${JSON.stringify(logs.slice(0, 50), null, 2)}`,
        },
      }],
      maxTokens: 1000,
    });
    if (result.content.type === "text") {
      analysis = result.content.text;
    }
  } catch {
    // Sampling not supported or declined — use raw summary
  }

  return {
    content: [{
      type: "text",
      text: `Log Summary (last ${hours}h):\n${JSON.stringify(summary, null, 2)}\n\nAnalysis:\n${analysis}`,
    }],
  };
});

Expected result: The tool works with any client: AI-powered analysis when sampling is available, raw summary when it is not.
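
If you prefer an explicit capability check over try/catch, the TypeScript SDK records what the client declared during initialization. A minimal sketch, assuming the getClientCapabilities() accessor on the low-level Server in SDK 1.x:

typescript
// Check whether the connected client declared the sampling capability.
const clientCapabilities = server.server.getClientCapabilities();
const supportsSampling = clientCapabilities?.sampling !== undefined;

if (!supportsSampling) {
  // Skip the createMessage call entirely and return the raw summary instead.
}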

Complete working example

src/index.ts
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const server = new McpServer({
  name: "sampling-demo-server",
  version: "1.0.0",
});

// Tool with sampling for AI analysis
server.registerTool("analyze-text", {
  description: "Analyze text content using AI (requires client sampling support)",
  inputSchema: {
    text: z.string().min(1).describe("Text to analyze"),
    analysisType: z.enum(["sentiment", "summary", "key-points", "entities"])
      .describe("Type of analysis to perform"),
  },
}, async ({ text, analysisType }) => {
  const prompts: Record<string, string> = {
    sentiment: "Analyze the sentiment of this text. Rate as positive/negative/neutral with confidence.",
    summary: "Summarize this text in 2-3 sentences.",
    "key-points": "Extract the 3-5 most important points from this text.",
    entities: "Extract all named entities (people, places, organizations, dates) from this text.",
  };

  try {
    const result = await server.server.createMessage({
      messages: [{
        role: "user",
        content: {
          type: "text",
          text: `${prompts[analysisType]}\n\nText:\n${text}`,
        },
      }],
      maxTokens: 1000,
      modelPreferences: {
        intelligencePriority: 0.7,
        speedPriority: 0.5,
        costPriority: 0.3,
      },
    });

    const responseText = result.content.type === "text"
      ? result.content.text
      : "Non-text response received";

    return {
      content: [{
        type: "text",
        text: `${analysisType.toUpperCase()} Analysis:\n\n${responseText}`,
      }],
    };
  } catch (error) {
    return {
      content: [{
        type: "text",
        text: `Error: Sampling failed — ${(error as Error).message}. This tool requires a client that supports MCP sampling.`,
      }],
      isError: true,
    };
  }
});

const transport = new StdioServerTransport();
await server.connect(transport);
console.error("Sampling demo server running on stdio");

Common mistakes when implementing sampling (LLM calls) in MCP

Mistake: Assuming all clients support sampling

How to avoid: Always handle the case where sampling is not available. Provide a fallback or a clear error message.

Mistake: Sending too much data in sampling messages

How to avoid: AI models have token limits. Truncate or summarize data before including it in sampling messages, and send only the most relevant subset.

Mistake: Not setting maxTokens on sampling requests

How to avoid: Always set maxTokens to prevent excessive token usage. Use the minimum needed for your analysis type.

Mistake: Treating model hints as requirements

How to avoid: The hints array is advisory and the client chooses the model. Design prompts that work across different models.

Mistake: Using sampling for simple transformations

How to avoid: If you can do the operation with code (regex, math, string manipulation), do it in the handler. Reserve sampling for tasks that genuinely need AI reasoning.

Best practices

  • Always wrap sampling requests in try/catch with fallback behavior
  • Set appropriate maxTokens to control cost and response time
  • Use systemPrompt to constrain the AI's response format and behavior
  • Truncate large datasets before sending them in sampling messages (see the helper sketch after this list)
  • Use sampling for analysis, summarization, and reasoning — not string formatting
  • Provide model hints as suggestions, not requirements
  • Design tools to return useful results even when sampling is unavailable
  • Test with the MCP Inspector, which lets you review sampling requests and supply responses manually
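
For the truncation guideline above, a small helper keeps sampling payloads inside a rough budget. A minimal sketch; the four-characters-per-token ratio is a heuristic assumption, not a real tokenizer:

typescript
// Rough guard against oversized sampling payloads.
// Assumes ~4 characters per token; swap in a real tokenizer for accuracy.
function truncateForSampling(data: string, maxTokens: number): string {
  const maxChars = maxTokens * 4;
  if (data.length <= maxChars) return data;
  return `${data.slice(0, maxChars)}\n[...truncated ${data.length - maxChars} characters]`;
}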

Still stuck?

Copy one of these prompts to get a personalized, step-by-step explanation.

ChatGPT Prompt

I'm building an MCP server that needs to use the client's AI model for analysis inside a tool. Show me how to implement the sampling/createMessage flow with model preferences, error handling, and fallback behavior when the client does not support sampling.

MCP Prompt

Add sampling to my MCP tool [tool-name]. After the tool fetches [data], send a sampling/createMessage request to the client to analyze it. Include a system prompt for [analysis type], set maxTokens to [N], and handle the case where sampling is not available.

Frequently asked questions

What is the difference between sampling and just calling an LLM API directly?

Sampling uses the client's AI model, which means the server does not need its own API key, the user controls which model is used, and the human-in-the-loop approval ensures the server cannot use AI without consent. Calling an LLM directly requires your own API key and bypasses client control.

Does the user have to approve every sampling request?

The protocol is designed around human-in-the-loop approval. In practice, clients may implement automatic approval for trusted servers or let users set approval policies. But your server should assume approval is required and handle rejection gracefully.

Can I chain multiple sampling requests in one tool?

Yes. A tool can make multiple sequential sampling requests, using each response to inform the next. This enables multi-step reasoning. However, each request may require separate user approval, so minimize the number of calls.
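
A condensed sketch of that chaining pattern, reusing the createMessage call from the examples above (logText is a hypothetical variable holding the data under analysis):

typescript
// Two sequential sampling calls: a cheap triage first, a deeper summary only if needed.
const triage = await server.server.createMessage({
  messages: [{
    role: "user",
    content: { type: "text", text: `Does this log excerpt indicate an incident? Answer yes or no.\n\n${logText}` },
  }],
  maxTokens: 5,
});

if (triage.content.type === "text" && /yes/i.test(triage.content.text)) {
  const summary = await server.server.createMessage({
    messages: [{
      role: "user",
      content: { type: "text", text: `Summarize the incident for an on-call engineer:\n\n${logText}` },
    }],
    maxTokens: 500,
  });
  // ...include summary.content in the tool result
}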

Which MCP clients support sampling?

Client support for sampling is still evolving and varies by product and version, so check the client's capabilities object during initialization rather than assuming support. The MCP Inspector can respond to sampling requests during development; consult your host application's documentation for its current support.

Can I use sampling in Python FastMCP?

Yes. Use the context object (ctx: Context parameter) to access sampling. Call await ctx.sample() with your messages and model preferences. The pattern is the same as TypeScript but with Python syntax.
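
A minimal Python sketch of the same idea, based on FastMCP 2.x conventions (the tool name and prompt are illustrative; check your FastMCP version for the exact ctx.sample signature):

python
# FastMCP-style sketch; assumes FastMCP 2.x where ctx.sample is available.
from fastmcp import FastMCP, Context

mcp = FastMCP("sampling-demo")

@mcp.tool
async def summarize(text: str, ctx: Context) -> str:
    """Summarize text using the client's AI model via sampling."""
    result = await ctx.sample(
        f"Summarize this text in 2-3 sentences:\n\n{text}",
        system_prompt="You are a precise analyst.",
        max_tokens=500,
    )
    # ctx.sample returns a content block; .text holds the generated text
    return result.text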

How much does sampling cost?

Sampling uses the client's AI model, so token costs are billed to the client/user, not the server. This is one advantage of sampling over direct API calls — the server incurs no model costs. For enterprise MCP deployments where sampling costs need tracking, RapidDev can help implement usage monitoring.
