A server for document management and semantic search using AI embeddings, with local JSON storage.
A TypeScript-based Model Context Protocol (MCP) server that provides document management and semantic search capabilities. Upload documents, search them with AI embeddings, and integrate seamlessly with MCP clients like Claude Desktop.
# Run directly with npx (recommended)
npx @andrea9293/mcp-documentation-server
Add to your MCP client configuration (e.g., Claude Desktop):
{
  "mcpServers": {
    "documentation": {
      "command": "npx",
      "args": [
        "-y",
        "@andrea9293/mcp-documentation-server"
      ],
      "env": {
        "MCP_EMBEDDING_MODEL": "Xenova/all-MiniLM-L6-v2"
      }
    }
  }
}
All data is stored in the ~/.mcp-documentation-server/ directory.

Tool | Description |
---|---|
add_document | Add a document with title, content, and metadata |
search_documents | Search for chunks within a specific document. Returns a hint for LLMs on how to retrieve more context. |
get_context_window | Returns a window of chunks around a central chunk for a document |
list_documents | List all documents with their metadata |
get_document | Retrieve a complete document by ID |
delete_document | Delete a document by ID (removes all associated chunks) |
get_uploads_path | Get path to uploads folder |
list_uploads_files | List files in uploads folder |
process_uploads | Process uploaded files into documents |
{
  "tool": "add_document",
  "arguments": {
    "title": "Python Basics",
    "content": "Python is a high-level programming language...",
    "metadata": {
      "category": "programming",
      "tags": ["python", "tutorial"]
    }
  }
}
{
  "tool": "search_documents",
  "arguments": {
    "document_id": "doc-123",
    "query": "variable assignment",
    "limit": 5
  }
}
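As a rough illustration of what an embedding-based search like this involves, the sketch below ranks a document's chunks by cosine similarity to a query vector and keeps the top `limit` results. The `Chunk` shape and the `rankChunks` helper are hypothetical names introduced for this example; they are not taken from the server's code, and real embeddings would come from the configured model rather than hand-written vectors.

```typescript
// Hypothetical chunk shape; the server's real storage format may differ.
interface Chunk {
  index: number;
  text: string;
  embedding: number[];
}

// Cosine similarity between two vectors of the same length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction, so treat its similarity as 0.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every chunk against the query and return the best `limit` matches.
function rankChunks(
  query: number[],
  chunks: Chunk[],
  limit: number
): { chunk: Chunk; score: number }[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);
}
```

Calling `rankChunks(queryEmbedding, chunks, 5)` would mirror the `"limit": 5` argument in the request above, assuming the query has already been embedded with the same model as the chunks.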
{
  "tool": "get_context_window",
  "arguments": {
    "document_id": "doc-123",
    "chunk_index": 5,
    "before": 2,
    "after": 2
  }
}
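To make the `chunk_index`, `before`, and `after` arguments concrete, here is a minimal sketch of how a window of chunk indices could be computed. The `contextWindowIndices` helper is hypothetical, and clamping the window at the start and end of the document is an assumption about sensible behavior, not something the server documents.

```typescript
// Hypothetical helper: list the chunk indices covered by a context window.
// Assumes the window is clamped to the valid range [0, totalChunks - 1].
function contextWindowIndices(
  chunkIndex: number,
  before: number,
  after: number,
  totalChunks: number
): number[] {
  // An out-of-range centre chunk yields an empty window.
  if (totalChunks <= 0 || chunkIndex < 0 || chunkIndex >= totalChunks) {
    return [];
  }
  const start = Math.max(0, chunkIndex - before);
  const end = Math.min(totalChunks - 1, chunkIndex + after);
  const indices: number[] = [];
  for (let i = start; i <= end; i++) {
    indices.push(i);
  }
  return indices;
}
```

Under these assumptions, the request above (`chunk_index` 5 with 2 before and 2 after) would cover chunks 3 through 7 in a document that has at least 8 chunks.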
{
  "tool": "delete_document",
  "arguments": {
    "id": "doc-123"
  }
}
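Call get_uploads_path to get the uploads folder (~/.mcp-documentation-server/uploads/), place your files there, then call process_uploads to turn them into documents.

Supported file types: .txt, .md, .pdf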
All documents and uploads are stored locally in:
~/.mcp-documentation-server/
├── data/ # Document storage (JSON files)
└── uploads/ # Files to process (.txt, .md, .pdf)
Set via the MCP_EMBEDDING_MODEL environment variable:

- Xenova/all-MiniLM-L6-v2 (default) - Fast, good quality (384 dimensions)
- Xenova/paraphrase-multilingual-mpnet-base-v2 (recommended) - Best quality, multilingual (768 dimensions)

The system automatically manages the correct embedding dimension for each model. Embedding providers expose their dimension via getDimensions().
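The dimension bookkeeping described above can be pictured with a small sketch. Only the `getDimensions()` method name comes from this README; the `EmbeddingProvider` interface, the `MODEL_DIMENSIONS` table, and the `providerFor` and `hasExpectedDimension` helpers are hypothetical and may not match how the server is actually organized.

```typescript
// Hypothetical provider interface; only getDimensions() is named in the README.
interface EmbeddingProvider {
  getDimensions(): number;
}

// Illustrative table built from the two models listed in this section.
const MODEL_DIMENSIONS: Record<string, number> = {
  "Xenova/all-MiniLM-L6-v2": 384,
  "Xenova/paraphrase-multilingual-mpnet-base-v2": 768,
};

// Build a stub provider that only reports its dimension.
function providerFor(modelName: string): EmbeddingProvider {
  const dims = MODEL_DIMENSIONS[modelName];
  if (dims === undefined) {
    throw new Error(`Unknown embedding model: ${modelName}`);
  }
  return { getDimensions: () => dims };
}

// Check whether a stored vector has the length the current model produces.
function hasExpectedDimension(
  stored: number[],
  provider: EmbeddingProvider
): boolean {
  return stored.length === provider.getDimensions();
}
```

A length check like this catches the obvious case of a 384-dimension vector meeting a 768-dimension model. It cannot tell apart two different models that happen to share a dimension, so stored embeddings should still be regenerated whenever the model changes.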
⚠️ Important: Changing models requires re-adding all documents, as embeddings from different models are incompatible.
npx @andrea9293/mcp-documentation-server
npm install -g @andrea9293/mcp-documentation-server
mcp-documentation-server
git clone https://github.com/andrea9293/mcp-documentation-server.git
cd mcp-documentation-server
npm install
npm run build
npm start
After finding relevant chunks with search_documents, use get_context_window to retrieve additional context around those chunks. You can call get_context_window multiple times until you have enough context to answer your question.

# Development server with hot reload
npm run dev
# Build and test
npm run build
# Inspect tools with web UI
npm run inspect
git checkout -b feature/name
MIT - see LICENSE file
Built with FastMCP and TypeScript 🚀
Provides web search capabilities using the Baidu Search API, with features for content fetching and parsing.
Web search and webpage scraping using the Serper API.
Extracts basic chemical information about drugs and compounds from the PubChem API.
A Model Context Protocol Server for SearXNG
Integrates the Brave Search API for both web and local search capabilities. Requires a BRAVE_API_KEY.
Semantic search for Hex package documentation. Requires local Elixir and Mix installation.
Search the Old School RuneScape (OSRS) Wiki and access game data definitions.
Provides knowledge base search and dialogue completion using the Volcengine Knowledge Base service. Requires external credential configuration.
Query Shodan's database of internet-connected devices and vulnerabilities using the Shodan API.
Harvests scientific papers from arXiv and OpenAlex, providing real-time access to metadata and full text.