MCP for Document Search | Tutorial

TL;DR

Use the MCP Filesystem server, optionally paired with a custom search server, to give AI assistants document search capabilities. The Filesystem server lets the AI list directories, read files, and find files by name pattern; a custom server adds full-text content search. Point the Filesystem MCP server at your documents folder, ask natural language questions, and the AI finds and summarizes relevant documents so you don't have to dig through files manually.

Quick facts about this guide
Fact	Value
Tool	MCP
Difficulty	Intermediate
Time required	15-20 min
Compatibility	MCP Filesystem Server, Claude Desktop / Cursor / Windsurf
Last updated	March 2026

Finding and Searching Documents with MCP

The simplest and most immediately useful MCP setup is connecting an AI assistant to your documents folder. Using the Filesystem MCP server, the AI can list files, read their contents, and search by name patterns. This tutorial shows how to set it up in minutes, then how to build a more advanced custom server with full-text content search for larger document collections.

Prerequisites

Claude Desktop, Cursor, or Windsurf with MCP support
A folder of documents (Markdown, text, PDF, code files) to search
Node.js 18+ (npx needs it to launch the Filesystem server; it is also required for custom server development)

Step-by-step guide

Configure the Filesystem MCP server for your documents

The Filesystem MCP server is one of the reference servers published under the @modelcontextprotocol npm scope. It provides tools to read files, list directories, search by filename, and get file metadata. Point it at your documents directory to give the AI access; it can only reach the directories you list. You can allow multiple directories by passing additional path arguments.

json

{
  "mcpServers": {
    "documents": {
      "command": "npx",
      "args": [
        "-y", "@modelcontextprotocol/server-filesystem",
        "/Users/you/Documents",
        "/Users/you/Projects/docs"
      ]
    }
  }
}

On macOS, Claude Desktop reads this from ~/Library/Application Support/Claude/claude_desktop_config.json. Cursor reads .cursor/mcp.json, and VS Code uses a "servers" key instead of "mcpServers". JSON does not allow comments, so keep notes like these out of the file itself.

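Expected result: The Filesystem MCP server exposes tools such as read_file, list_directory, search_files, and get_file_info. Exact tool names can differ between server versions, so check the tool list your client shows.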
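To make the "only the directories you list" idea concrete, here is a minimal sketch of an allowlist check. The helper name `isPathAllowed` is hypothetical and this is not the Filesystem server's actual implementation; it only illustrates why a request such as `Documents/../.ssh` should be rejected.

```typescript
import * as path from "node:path";

// Hypothetical helper: true when `requested` resolves inside one of `allowedRoots`.
function isPathAllowed(requested: string, allowedRoots: string[]): boolean {
  const target = path.resolve(requested);
  return allowedRoots.some((root) => {
    // A relative path that starts with ".." (or is absolute) points outside the root.
    const rel = path.relative(path.resolve(root), target);
    const escapes =
      rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel);
    return !escapes;
  });
}

const roots = ["/Users/you/Documents", "/Users/you/Projects/docs"];
console.log(isPathAllowed("/Users/you/Documents/notes/plan.md", roots)); // true
console.log(isPathAllowed("/Users/you/Documents/../.ssh/id_rsa", roots)); // false
```

This sketch compares resolved path strings only. It does not follow symlinks, so a production check would likely also resolve real paths before comparing.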

Search documents with natural language queries

Once connected, ask the AI questions about your documents. It will use the MCP tools to list directories, search for relevant files, and read their contents. Start with broad questions and narrow down. The AI uses search_files for filename patterns and read_file to examine promising results.

typescript

// Example questions to ask:

// File discovery:
// "What documents are in my Documents folder?"
// "Find all PDF files in my Projects directory"
// "List all Markdown files related to onboarding"

// Content search:
// "Find the document that talks about our pricing strategy"
// "Which files mention the Q1 2026 budget?"
// "Search for any documents about the API migration plan"

// Summarization:
// "Read the meeting-notes-march.md file and summarize the key decisions"
// "Compare the contents of proposal-v1.md and proposal-v2.md"
// "What are the action items mentioned across all files in the meetings/ folder?"

Expected result: The AI finds relevant documents by name and content, reads them, and answers your questions.
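Filename search usually comes down to matching a pattern against file names. The sketch below shows one way that could work, using a small glob-to-regex conversion that supports only `*` and `?`. The function names are illustrative assumptions, and the real search_files tool may match differently.

```typescript
// Convert a simple glob ("*" = any run of characters, "?" = one character)
// into a case-insensitive regular expression anchored to the whole name.
function globToRegExp(pattern: string): RegExp {
  // Escape regex metacharacters except "*" and "?", which are translated below.
  const escaped = pattern.replace(/[.+^${}()|[\]\\]/g, "\\$&");
  const body = escaped.replace(/\*/g, ".*").replace(/\?/g, ".");
  return new RegExp(`^${body}$`, "i");
}

// Hypothetical helper: keep the paths whose file name matches the glob.
function searchByName(paths: string[], pattern: string): string[] {
  const re = globToRegExp(pattern);
  return paths.filter((p) => {
    const name = p.split(/[\\/]/).pop() ?? p;
    return re.test(name);
  });
}

const files = ["docs/onboarding.md", "docs/pricing.pdf", "notes/Onboarding-checklist.MD"];
console.log(searchByName(files, "*.md"));
console.log(searchByName(files, "onboarding*"));
```

Matching on the file name alone keeps the search cheap, which is why a question like "Find all PDF files" can be answered without opening any file.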

Build a custom MCP server with full-text content search

The Filesystem server searches by filename, not content. For large document collections, build a custom MCP server that indexes file contents and provides full-text search. Index documents at startup, then expose a search_content tool that finds files containing specific terms or phrases. This is much faster than having the AI read every file.

typescript

// src/doc-search-server.ts
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";
import fs from "fs/promises";
import path from "path";

interface IndexEntry {
  path: string;
  content: string;
  modified: Date;
}

// In-memory index, filled once at startup.
const index: IndexEntry[] = [];

const TEXT_EXTENSIONS = ['.md', '.txt', '.json', '.ts', '.js', '.py', '.yaml', '.yml'];

// Walk the directory tree and index every text file.
async function indexDirectory(dir: string): Promise<void> {
  const entries = await fs.readdir(dir, { withFileTypes: true });
  for (const entry of entries) {
    // Join against the directory being walked so nested files resolve correctly.
    const fullPath = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      await indexDirectory(fullPath);
      continue;
    }
    if (!entry.isFile()) continue;
    const ext = path.extname(entry.name).toLowerCase();
    if (!TEXT_EXTENSIONS.includes(ext)) continue;
    try {
      const content = await fs.readFile(fullPath, 'utf-8');
      const stat = await fs.stat(fullPath);
      index.push({ path: fullPath, content, modified: stat.mtime });
    } catch (err) {
      // Skip unreadable files. Log to stderr: stdout carries the MCP protocol.
      console.error(`Skipped ${fullPath}: ${err}`);
    }
  }
}

const server = new McpServer({ name: "doc-search", version: "1.0.0" });

server.tool("search_content", "Search documents by content", {
  query: z.string().describe("Search term or phrase"),
  maxResults: z.number().default(10),
}, async ({ query, maxResults }) => {
  const lower = query.toLowerCase();
  const results = index
    .filter(e => e.content.toLowerCase().includes(lower))
    .slice(0, maxResults)
    .map(e => {
      // Return about 100 characters of context on each side of the first match.
      const idx = e.content.toLowerCase().indexOf(lower);
      const start = Math.max(0, idx - 100);
      const end = Math.min(e.content.length, idx + query.length + 100);
      return { file: e.path, excerpt: '...' + e.content.slice(start, end) + '...' };
    });
  return { content: [{ type: "text", text: JSON.stringify(results, null, 2) }] };
});

async function main() {
  // Each command-line argument is a directory to index.
  const dirs = process.argv.slice(2);
  for (const dir of dirs) {
    await indexDirectory(dir);
    console.error(`Indexed ${index.length} files so far (after ${dir})`);
  }
  await server.connect(new StdioServerTransport());
  console.error("Document search server running");
}
main().catch(e => { console.error(e); process.exit(1); });

Expected result: A custom MCP server that indexes file contents and provides fast full-text search.
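The search above returns the first matches in index order. If you would rather surface the most relevant files first, one option is to rank by how often the query occurs. The following is a sketch under that assumption; `rankedSearch` and `countOccurrences` are illustrative names and are not part of the MCP SDK.

```typescript
interface Doc { path: string; content: string }
interface Hit { file: string; matches: number; excerpt: string }

// Count non-overlapping occurrences of `needle` in `haystack`.
function countOccurrences(haystack: string, needle: string): number {
  if (needle.length === 0) return 0;
  let count = 0;
  let pos = haystack.indexOf(needle);
  while (pos !== -1) {
    count++;
    pos = haystack.indexOf(needle, pos + needle.length);
  }
  return count;
}

// Case-insensitive search that sorts hits by match count, highest first.
function rankedSearch(docs: Doc[], query: string, maxResults = 10, context = 100): Hit[] {
  const q = query.trim().toLowerCase();
  if (q === "") return []; // an empty query would otherwise match every file
  const hits: Hit[] = [];
  for (const doc of docs) {
    const lower = doc.content.toLowerCase();
    const first = lower.indexOf(q);
    if (first === -1) continue;
    const start = Math.max(0, first - context);
    const end = Math.min(doc.content.length, first + q.length + context);
    hits.push({
      file: doc.path,
      matches: countOccurrences(lower, q),
      excerpt: doc.content.slice(start, end),
    });
  }
  return hits.sort((a, b) => b.matches - a.matches).slice(0, maxResults);
}

const sample: Doc[] = [
  { path: "b.md", content: "Q1 budget" },
  { path: "a.md", content: "Budget review. The budget grew." },
];
console.log(rankedSearch(sample, "budget").map((h) => h.file));
```

Occurrence counting is a crude relevance signal, but it needs no extra dependencies and keeps the tool's output shape close to the original search_content result.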

Combine with AI for intelligent document summarization

The real power of MCP document search comes from combining search with AI summarization. The AI searches for relevant files, reads their contents, and then synthesizes answers from multiple documents. This creates a lightweight knowledge management system where you can ask questions across your entire document collection. For organizations with large document repositories, RapidDev builds custom MCP solutions that combine full-text search with vector embeddings for semantic search.

Example workflow the AI executes:

1. User asks: "What were the key decisions from last month's meetings?"
2. AI calls search_content with query "meeting" or "decisions"
3. AI reads the top matching files with read_file
4. AI synthesizes a summary across all meeting notes
5. AI returns a structured answer with source citations

To enable this, configure both servers:

json

{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/Users/you/Documents"]
    },
    "doc-search": {
      "command": "node",
      "args": ["dist/doc-search-server.js", "/Users/you/Documents"]
    }
  }
}

Clients may launch servers from a different working directory, so an absolute path to dist/doc-search-server.js is safer than the relative one shown here.

Expected result: AI searches across documents and synthesizes answers from multiple sources with citations.
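The model performs the synthesis itself, but it helps when the text it receives is already labeled by source. As a rough sketch, a hypothetical helper like `buildContext` below could assemble matched documents into one numbered, size-capped bundle that makes citations easy. The name, the numbering format, and the 4000-character default are all assumptions for illustration.

```typescript
interface SourceDoc { path: string; content: string }

// Concatenate documents into one text block, numbering each source so the
// answer can cite "[1]", "[2]", and so on. Stops once maxChars is used up.
function buildContext(
  docs: SourceDoc[],
  maxChars = 4000
): { text: string; sources: string[] } {
  const parts: string[] = [];
  const sources: string[] = [];
  let used = 0;
  for (const doc of docs) {
    const remaining = maxChars - used;
    if (remaining <= 0) break;
    const body = doc.content.slice(0, remaining); // truncate the last document if needed
    sources.push(doc.path);
    parts.push(`[${sources.length}] ${doc.path}\n${body}`);
    used += body.length;
  }
  return { text: parts.join("\n\n"), sources };
}

const bundle = buildContext([
  { path: "meetings/march-04.md", content: "Decision: ship the beta." },
  { path: "meetings/march-18.md", content: "Decision: delay pricing change." },
]);
console.log(bundle.text);
```

Capping the bundle size keeps a single tool response from crowding out the rest of the conversation; the limit only counts document text, not the source labels.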

Complete working example

src/doc-search-server.ts

import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";
import fs from "fs/promises";
import path from "path";

interface Doc { path: string; content: string; size: number; modified: string; }
const docs: Doc[] = [];

const EXTS = new Set(['.md', '.txt', '.json', '.ts', '.js', '.py', '.yaml', '.yml', '.csv']);
const MAX_BYTES = 5_000_000; // skip files larger than about 5 MB

// Recursively index every supported text file under `dir`.
async function indexDir(dir: string): Promise<void> {
  const entries = await fs.readdir(dir, { withFileTypes: true });
  for (const e of entries) {
    // Join against the directory being walked so nested files resolve correctly.
    const p = path.join(dir, e.name);
    if (e.isDirectory()) { await indexDir(p); continue; }
    if (!e.isFile() || !EXTS.has(path.extname(e.name).toLowerCase())) continue;
    try {
      // Check the size before reading so oversized files never enter memory.
      const stat = await fs.stat(p);
      if (stat.size > MAX_BYTES) { console.error(`Skipped (too large): ${p}`); continue; }
      const content = await fs.readFile(p, 'utf-8');
      docs.push({ path: p, content, size: stat.size, modified: stat.mtime.toISOString() });
    } catch {
      console.error(`Skipped (unreadable): ${p}`);
    }
  }
}

const server = new McpServer({ name: "doc-search", version: "1.0.0" });

server.tool("search_content", "Full-text search across all indexed documents", {
  query: z.string(), maxResults: z.number().default(10),
}, async ({ query, maxResults }) => {
  const q = query.toLowerCase();
  const hits = docs.filter(d => d.content.toLowerCase().includes(q)).slice(0, maxResults);
  const results = hits.map(d => {
    // Excerpt: about 100 characters on each side of the first match.
    const i = d.content.toLowerCase().indexOf(q);
    return { file: d.path, excerpt: d.content.slice(Math.max(0, i - 100), i + query.length + 100) };
  });
  return { content: [{ type: "text", text: results.length ? JSON.stringify(results, null, 2) : "No matches found." }] };
});

server.tool("list_indexed", "List all indexed documents with metadata", {}, async () => {
  const list = docs.map(d => ({ file: d.path, size: d.size, modified: d.modified }));
  return { content: [{ type: "text", text: JSON.stringify(list, null, 2) }] };
});

server.tool("read_document", "Read a document's full content", {
  filePath: z.string(),
}, async ({ filePath }) => {
  // Accept either the full indexed path or a trailing portion of it.
  const doc = docs.find(d => d.path === filePath || d.path.endsWith(filePath));
  if (!doc) return { content: [{ type: "text", text: "Document not found in index" }], isError: true };
  return { content: [{ type: "text", text: doc.content }] };
});

async function main() {
  // Each command-line argument is a directory to index.
  for (const dir of process.argv.slice(2)) await indexDir(dir);
  console.error(`Indexed ${docs.length} documents`);
  await server.connect(new StdioServerTransport());
  console.error("Document search MCP server ready");
}
main().catch(e => { console.error(e); process.exit(1); });

Common mistakes when using MCP for AI-powered document search

Why it's a problem: Granting the Filesystem server access to the entire home directory, exposing sensitive files

How to avoid: Only allow access to specific document directories. Never include .ssh, .env files, or credential directories.

Why it's a problem: Trying to index binary files (images, compiled code), causing encoding errors

How to avoid: Filter by file extension and only index text-based formats (md, txt, json, ts, js, py, yaml, csv).

Why it's a problem: Not handling large files that exceed memory limits during indexing

How to avoid: Set a file size limit (e.g., 5MB) and skip files that exceed it. Log skipped files to stderr.
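The three safeguards above can be folded into one decision made before a file is read. The sketch below assumes a hypothetical `shouldIndex` helper; the rule of skipping anything under a dot-prefixed path segment is a conservative assumption that happens to cover .ssh and .env, not a requirement of MCP.

```typescript
const TEXT_EXTENSIONS = new Set([".md", ".txt", ".json", ".ts", ".js", ".py", ".yaml", ".yml", ".csv"]);
const MAX_BYTES = 5 * 1024 * 1024; // 5 MB limit, as suggested above

type Decision = { index: true } | { index: false; reason: string };

// Decide whether a file should be indexed, given its path relative to the
// document root and its size in bytes.
function shouldIndex(relativePath: string, sizeBytes: number): Decision {
  const segments = relativePath.split(/[\\/]/);
  // Skip hidden files and anything under a hidden directory (.ssh, .env, .git).
  if (segments.some((s) => s.startsWith(".") && s !== "." && s !== "..")) {
    return { index: false, reason: "hidden file or directory" };
  }
  const name = segments[segments.length - 1] ?? "";
  const dot = name.lastIndexOf(".");
  const ext = dot > 0 ? name.slice(dot).toLowerCase() : "";
  if (!TEXT_EXTENSIONS.has(ext)) {
    return { index: false, reason: `unsupported extension "${ext}"` };
  }
  if (sizeBytes > MAX_BYTES) {
    return { index: false, reason: "file exceeds size limit" };
  }
  return { index: true };
}

console.log(shouldIndex("notes/plan.md", 2048));
console.log(shouldIndex(".ssh/config.txt", 512));
console.log(shouldIndex("exports/all-data.csv", 6 * 1024 * 1024));
```

Returning a reason alongside the decision makes it straightforward to log skipped files to stderr, as recommended above.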

Best practices

Limit Filesystem server access to specific directories containing documents only
Use full-text search for large collections instead of reading every file
Filter indexable file types to text-based formats only
Include file metadata (size, modified date) in search results for context
Return excerpts with surrounding context so the AI can judge relevance
Combine search-by-name (fast) with search-by-content (thorough) for best results
Set file size limits during indexing to prevent memory issues
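As an illustration of combining name-based and content-based search, the sketch below merges two result lists, removes duplicates, and puts files found by both methods first. `mergeHits` and the `via` field are hypothetical names chosen for this example.

```typescript
interface MergedHit { file: string; via: "name" | "content" | "both" }

// Merge filename hits and content hits into one de-duplicated list.
// Files matched by both methods rank first, then name-only, then content-only.
function mergeHits(nameHits: string[], contentHits: string[]): MergedHit[] {
  const names = new Set(nameHits);
  const contents = new Set(contentHits);
  const all = Array.from(new Set([...nameHits, ...contentHits]));
  const rank = { both: 0, name: 1, content: 2 };
  return all
    .map((file): MergedHit => {
      const inName = names.has(file);
      const inContent = contents.has(file);
      return { file, via: inName && inContent ? "both" : inName ? "name" : "content" };
    })
    .sort((a, b) => rank[a.via] - rank[b.via]);
}

console.log(mergeHits(["pricing.md", "pricing-old.md"], ["pricing.md", "q1-plan.md"]));
```

Recording how each file was found gives the AI a hint about which results deserve a full read first.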

Still stuck?

Copy one of these prompts to get a personalized, step-by-step explanation.

ChatGPT Prompt

Set up the Filesystem MCP server to give Claude Desktop access to my documents folder. Then show me how to build a custom MCP server that indexes file contents and provides full-text search with excerpts. Use TypeScript and the MCP SDK.

MCP Prompt

Build a document search MCP server that indexes Markdown, text, and code files at startup, provides search_content and list_indexed tools, and returns search results with excerpts and file paths.

Frequently asked questions

Can the AI read PDF files through MCP?

The basic Filesystem server reads text files only. For PDFs, build a custom server that uses a PDF parsing library like pdf-parse to extract text before indexing.

How many files can the document search server handle?

The in-memory index is a reasonable fit for collections of up to roughly 10,000 files, depending on file sizes and available memory. Beyond that, use a proper search engine like Elasticsearch or MeiliSearch as the backend.

Does the Filesystem server let the AI modify my documents?

The Filesystem server includes write tools. If you want read-only access, use a custom server that only exposes read and search tools.

Can I search across documents in different formats?

Yes, as long as you extract text from each format. Build a custom server with parsers for Markdown, plain text, JSON, YAML, CSV, and any other formats you use.

Can RapidDev build a custom document search solution?

Yes. RapidDev builds document search MCP servers that combine full-text search, vector embeddings, and metadata filtering for enterprise document collections.
