Sampling in MCP lets your server request the AI client to generate text completions on the server's behalf. Use the sampling/createMessage request to ask the client's AI model to analyze data, summarize content, or make decisions as part of a tool workflow. This enables agentic patterns where the server uses AI capabilities without having its own model.
Implementing Sampling in MCP Servers
Sampling is an advanced MCP capability that inverts the usual flow. Instead of the AI client calling tools on your server, your server asks the AI client to generate a completion. The server sends a sampling/createMessage request with messages and optional model preferences, the client routes it to its AI model, and returns the generated text.
This enables powerful patterns: a tool can analyze data using AI without bundling its own model, a workflow can combine database queries with AI-powered summarization, and multi-step agents can reason about intermediate results. The key constraint is that sampling requires client consent — the human user must approve the request.
Prerequisites
- A working MCP server with the TypeScript or Python SDK
- An MCP client that supports the sampling capability (not all clients do; check the client feature support matrix in the MCP documentation)
- Understanding of MCP tool registration and the request/response flow
- Familiarity with LLM prompt design for structured outputs
Step-by-step guide
Understand the sampling flow
In normal MCP flow, the client calls tools on the server. With sampling, the server sends a sampling/createMessage request to the client, asking it to generate a completion. The client presents this to the user for approval (human-in-the-loop), routes it to its AI model, and returns the generated message. The server can then use this AI-generated content in its tool response.
// The sampling flow:
// 1. Client calls tool on server
// 2. Server needs AI analysis as part of the tool
// 3. Server sends sampling/createMessage to client
// 4. Client shows user approval dialog
// 5. Client routes to its AI model
// 6. Client returns generated message to server
// 7. Server uses the result in its tool response

Expected result: You understand that sampling is a server-to-client request for AI completions, with human approval.
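On the wire this is an ordinary JSON-RPC exchange, just in the reverse direction: the server issues the request and the client answers it. Below is a minimal sketch of the two messages with illustrative values; the result fields (role, content, model, stopReason) follow the sampling/createMessage schema.

// Server -> client request (values are illustrative)
const wireRequest = {
  jsonrpc: "2.0",
  id: 1,
  method: "sampling/createMessage",
  params: {
    messages: [
      { role: "user", content: { type: "text", text: "Summarize this document..." } },
    ],
    maxTokens: 500,
    systemPrompt: "Be concise.",
  },
};

// Client -> server response, after user approval and model generation
const wireResponse = {
  jsonrpc: "2.0",
  id: 1,
  result: {
    role: "assistant",
    content: { type: "text", text: "A two-sentence summary of the document." },
    model: "claude-sonnet-4-20250514", // whichever model the client actually used
    stopReason: "endTurn",
  },
};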
Send a sampling request from a tool handler
Inside a tool handler, send a sampling/createMessage request through the SDK. In the TypeScript SDK this is exposed as createMessage() on the underlying Server instance (server.server.createMessage). Provide the messages array (the conversation for the AI), optional model preferences, and a system prompt. The client returns a message containing the AI's response, which you can fold into your tool result.
// TypeScript
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { z } from "zod";

const server = new McpServer({ name: "sampling-server", version: "1.0.0" });

server.registerTool("analyze-data", {
  description: "Analyze a dataset and return AI-generated insights",
  inputSchema: {
    data: z.string().describe("Data to analyze (JSON or CSV format)"),
    question: z.string().describe("Specific question about the data"),
  },
}, async ({ data, question }) => {
  // Ask the client's AI model to analyze the data
  try {
    const samplingResult = await server.server.createMessage({
      messages: [
        {
          role: "user",
          content: {
            type: "text",
            text: `Analyze this data and answer the question.\n\nData:\n${data}\n\nQuestion: ${question}\n\nProvide a concise, data-driven answer.`,
          },
        },
      ],
      maxTokens: 1000,
      systemPrompt: "You are a data analyst. Provide precise, factual analysis based only on the data provided.",
    });

    return {
      content: [{
        type: "text",
        text: `Analysis:\n${samplingResult.content.type === "text" ? samplingResult.content.text : "Non-text response"}`,
      }],
    };
  } catch (error) {
    return {
      content: [{ type: "text", text: `Error: Sampling not available: ${(error as Error).message}` }],
      isError: true,
    };
  }
});

Expected result: The tool sends data to the client's AI model for analysis and includes the AI response in its tool result.
Specify model preferences for sampling
When requesting sampling, you can express preferences about which model the client should use. Specify hints (preferred model names), cost priority, speed priority, and intelligence priority. The client uses these as suggestions — it ultimately decides which model to use based on user settings and availability.
// TypeScript - model preferences
const samplingResult = await server.server.createMessage({
  messages: [{
    role: "user",
    content: { type: "text", text: "Summarize this document..." },
  }],
  maxTokens: 500,
  modelPreferences: {
    hints: [
      { name: "claude-sonnet-4-20250514" }, // Preferred model
      { name: "claude" },                   // Fallback: any Claude model
    ],
    costPriority: 0.3,         // 0 = cost does not matter, 1 = prioritize the cheapest model
    speedPriority: 0.7,        // 0 = speed does not matter, 1 = prioritize the fastest model
    intelligencePriority: 0.5, // 0 = capability does not matter, 1 = prioritize the smartest model
  },
  systemPrompt: "Provide a brief summary in 2-3 sentences.",
});

Expected result: The client receives your model preferences and selects the most appropriate available model.
Build a multi-step tool with sampling
Sampling is most powerful in multi-step tool workflows. A tool can fetch data from a database, use sampling to analyze it, then take action based on the analysis. This creates an agentic loop where the server orchestrates data access and the client's AI provides reasoning.
// TypeScript - multi-step tool with sampling
import { Pool } from "pg";

const pool = new Pool(); // connection settings come from the standard PG* environment variables

server.registerTool("smart-report", {
  description: "Generate an AI-analyzed report from database data",
  inputSchema: {
    tableName: z.string().describe("Table to analyze"),
    metric: z.string().describe("Metric to focus on"),
  },
}, async ({ tableName, metric }) => {
  try {
    // Step 1: Fetch data from the database.
    // Identifiers cannot be parameterized, so validate the table name
    // against a strict pattern (or an allowlist) to prevent SQL injection.
    if (!/^[a-zA-Z_][a-zA-Z0-9_]*$/.test(tableName)) {
      throw new Error(`Invalid table name: ${tableName}`);
    }
    const data = await pool.query(
      `SELECT * FROM ${tableName} ORDER BY created_at DESC LIMIT 100`
    );

    // Step 2: Use sampling to analyze the data
    const analysis = await server.server.createMessage({
      messages: [{
        role: "user",
        content: {
          type: "text",
          text: `Analyze this data focusing on ${metric}:\n\n${JSON.stringify(data.rows, null, 2)}\n\nProvide:\n1. Key trends\n2. Anomalies\n3. Recommendations`,
        },
      }],
      maxTokens: 2000,
      systemPrompt: "You are a business analyst. Be specific and reference actual data values.",
    });

    const analysisText = analysis.content.type === "text" ? analysis.content.text : "";

    // Step 3: Return the combined result
    return {
      content: [{
        type: "text",
        text: `Report for ${tableName} (${metric})\n\nData points: ${data.rowCount}\n\n${analysisText}`,
      }],
    };
  } catch (error) {
    return {
      content: [{ type: "text", text: `Error: ${(error as Error).message}` }],
      isError: true,
    };
  }
});

Expected result: The tool fetches database data, sends it to the AI for analysis, and returns a combined report.
Handle clients that do not support sampling
Not all MCP clients support sampling. Check the client's capabilities during initialization or handle the error gracefully when sampling fails. Provide a fallback path that returns raw data without AI analysis. For production servers that depend heavily on sampling, RapidDev can help design fallback strategies and client compatibility testing.
// TypeScript - graceful sampling fallback
server.registerTool("summarize-logs", {
  description: "Summarize application logs, with AI analysis if available",
  inputSchema: {
    hours: z.number().min(1).max(72).default(24)
      .describe("Hours of logs to analyze"),
  },
}, async ({ hours }) => {
  const logs = await fetchLogs(hours); // your application's log retrieval helper
  const summary = {
    totalEntries: logs.length,
    errors: logs.filter((l: any) => l.level === "error").length,
    warnings: logs.filter((l: any) => l.level === "warn").length,
  };

  // Try AI analysis, fall back to the raw summary
  let analysis = "AI analysis not available (client does not support sampling).";
  try {
    const result = await server.server.createMessage({
      messages: [{
        role: "user",
        content: {
          type: "text",
          text: `Analyze these application logs and identify patterns:\n\n${JSON.stringify(logs.slice(0, 50), null, 2)}`,
        },
      }],
      maxTokens: 1000,
    });
    if (result.content.type === "text") {
      analysis = result.content.text;
    }
  } catch {
    // Sampling not supported or declined; use the raw summary
  }

  return {
    content: [{
      type: "text",
      text: `Log Summary (last ${hours}h):\n${JSON.stringify(summary, null, 2)}\n\nAnalysis:\n${analysis}`,
    }],
  };
});

Expected result: The tool works with any client: AI-powered analysis when sampling is available, raw summary when it is not.
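If you prefer to detect support up front instead of catching the error, the TypeScript SDK exposes the capabilities the client declared during initialization. A short sketch, assuming the same server instance as above:

// Returns the capabilities the client declared during initialization
// (undefined until the session has been initialized).
const caps = server.server.getClientCapabilities();

if (!caps?.sampling) {
  // The client never declared the sampling capability:
  // skip createMessage entirely and return the raw summary directly.
}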
Complete working example
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const server = new McpServer({
  name: "sampling-demo-server",
  version: "1.0.0",
});

// Tool with sampling for AI analysis
server.registerTool("analyze-text", {
  description: "Analyze text content using AI (requires client sampling support)",
  inputSchema: {
    text: z.string().min(1).describe("Text to analyze"),
    analysisType: z.enum(["sentiment", "summary", "key-points", "entities"])
      .describe("Type of analysis to perform"),
  },
}, async ({ text, analysisType }) => {
  const prompts: Record<string, string> = {
    sentiment: "Analyze the sentiment of this text. Rate as positive/negative/neutral with confidence.",
    summary: "Summarize this text in 2-3 sentences.",
    "key-points": "Extract the 3-5 most important points from this text.",
    entities: "Extract all named entities (people, places, organizations, dates) from this text.",
  };

  try {
    const result = await server.server.createMessage({
      messages: [{
        role: "user",
        content: {
          type: "text",
          text: `${prompts[analysisType]}\n\nText:\n${text}`,
        },
      }],
      maxTokens: 1000,
      modelPreferences: {
        intelligencePriority: 0.7,
        speedPriority: 0.5,
        costPriority: 0.3,
      },
    });

    const responseText = result.content.type === "text"
      ? result.content.text
      : "Non-text response received";

    return {
      content: [{
        type: "text",
        text: `${analysisType.toUpperCase()} Analysis:\n\n${responseText}`,
      }],
    };
  } catch (error) {
    return {
      content: [{
        type: "text",
        text: `Error: Sampling failed: ${(error as Error).message}. This tool requires a client that supports MCP sampling.`,
      }],
      isError: true,
    };
  }
});

const transport = new StdioServerTransport();
await server.connect(transport);
console.error("Sampling demo server running on stdio");

Common mistakes when implementing sampling (LLM calls) in MCP
Assuming all clients support sampling
Why it's a problem: sampling is an optional capability, and a tool that depends on it will fail outright on clients that do not implement it.
How to avoid: Always handle the case where sampling is not available. Provide a fallback or clear error message.

Sending too much data in sampling messages
Why it's a problem: AI models have token limits, and oversized payloads get rejected or truncated.
How to avoid: Truncate or summarize data before including it in sampling messages. Send the most relevant subset.

Not setting maxTokens on sampling requests
Why it's a problem: without a limit, the client's model can generate long, expensive responses that slow down your tool.
How to avoid: Always set maxTokens to prevent excessive token usage. Use the minimum needed for your analysis type.

Treating model hints as requirements
Why it's a problem: the hints array is advisory; the client ultimately chooses the model.
How to avoid: Design prompts that work across different models.

Using sampling for simple transformations
Why it's a problem: every sampling call adds latency, token cost, and potentially a user-approval step for work code can do deterministically.
How to avoid: If you can do the operation with code (regex, math, string manipulation), do it in the handler. Reserve sampling for tasks that genuinely need AI reasoning (see the sketch below).
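To make the last mistake concrete, here is a small sketch contrasting deterministic work (plain code) with a judgment call that justifies a sampling round-trip; the ticket text is an illustrative placeholder:

// Don't: burn a sampling round-trip (and possibly a user approval) on this.
// Deterministic extraction belongs in plain code:
const email = "Contact: jane@example.com".match(/[\w.+-]+@[\w-]+\.[\w.]+/)?.[0];

// Do: use sampling where a judgment call is genuinely needed:
const ticketText = "The product stopped working and nobody has replied to my emails."; // placeholder input
const tone = await server.server.createMessage({
  messages: [{
    role: "user",
    content: {
      type: "text",
      text: `Classify the tone of this support ticket as angry, neutral, or satisfied:\n\n${ticketText}`,
    },
  }],
  maxTokens: 100,
});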
Best practices
- Always wrap sampling requests in try/catch with fallback behavior
- Set appropriate maxTokens to control cost and response time
- Use systemPrompt to constrain the AI's response format and behavior
- Truncate large datasets before sending them in sampling messages (a helper sketch follows this list)
- Use sampling for analysis, summarization, and reasoning — not string formatting
- Provide model hints as suggestions, not requirements
- Design tools to return useful results even when sampling is unavailable
- Test with the MCP Inspector, which lets you view your server's sampling requests and supply responses manually
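A minimal truncation helper for the "truncate large datasets" practice; the character budget is a rough assumption (characters are only a crude proxy for tokens), so adjust it for your target models:

// Cap the payload size before embedding data in a sampling message.
function truncateForSampling(text: string, maxChars = 8000): string {
  if (text.length <= maxChars) return text;
  return text.slice(0, maxChars) + "\n\n[... truncated ...]";
}

// Usage inside a tool handler:
// text: truncateForSampling(JSON.stringify(rows, null, 2))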
Still stuck?
Copy one of these prompts to get a personalized, step-by-step explanation.
I'm building an MCP server that needs to use the client's AI model for analysis inside a tool. Show me how to implement the sampling/createMessage flow with model preferences, error handling, and fallback behavior when the client does not support sampling.
Add sampling to my MCP tool [tool-name]. After the tool fetches [data], send a sampling/createMessage request to the client to analyze it. Include a system prompt for [analysis type], set maxTokens to [N], and handle the case where sampling is not available.
Frequently asked questions
What is the difference between sampling and just calling an LLM API directly?
Sampling uses the client's AI model, which means the server does not need its own API key, the user controls which model is used, and the human-in-the-loop approval ensures the server cannot use AI without consent. Calling an LLM directly requires your own API key and bypasses client control.
Does the user have to approve every sampling request?
The protocol requires human-in-the-loop approval. In practice, clients may implement automatic approval for trusted servers or let users set approval policies. But your server should assume approval is required and handle rejection gracefully.
Can I chain multiple sampling requests in one tool?
Yes. A tool can make multiple sequential sampling requests, using each response to inform the next. This enables multi-step reasoning. However, each request may require separate user approval, so minimize the number of calls.
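A condensed sketch of such a chain, where the second request builds on the first response (documentText stands in for your tool's input):

declare const documentText: string; // placeholder: input from your tool arguments

// Step 1: extract the key claims.
const claims = await server.server.createMessage({
  messages: [{
    role: "user",
    content: { type: "text", text: `List the key claims made in:\n\n${documentText}` },
  }],
  maxTokens: 500,
});
const claimsText = claims.content.type === "text" ? claims.content.text : "";

// Step 2: feed the extracted claims back for a follow-up judgment.
const toVerify = await server.server.createMessage({
  messages: [{
    role: "user",
    content: { type: "text", text: `Which of these claims most need fact-checking, and why?\n\n${claimsText}` },
  }],
  maxTokens: 500,
});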
Which MCP clients support sampling?
Client support for sampling is still evolving and varies by client. Check the client's declared capabilities object during initialization, and consult the client feature support matrix in the MCP documentation for an up-to-date list.
Can I use sampling in Python FastMCP?
Yes. Use the context object (ctx: Context parameter) to access sampling. Call await ctx.sample() with your messages and model preferences. The pattern is the same as TypeScript but with Python syntax.
How much does sampling cost?
Sampling uses the client's AI model, so token costs are billed to the client/user, not the server. This is one advantage of sampling over direct API calls — the server incurs no model costs. For enterprise MCP deployments where sampling costs need tracking, RapidDev can help implement usage monitoring.
Talk to an Expert
Our team has built 600+ apps. Get personalized help with your project.
Book a free consultation