QuantConnect Docs
An MCP server for intelligent search and retrieval of QuantConnect PDF documentation.
QuantConnect PDF MCP Server
An advanced Model Context Protocol (MCP) server that provides intelligent search and retrieval capabilities for QuantConnect PDF documentation. This server converts PDFs to searchable markdown format and provides fast, context-aware search using TF-IDF scoring and proximity matching.
Features
- Intelligent PDF Processing: Automatically converts PDFs to structured markdown with proper formatting
- Fast Search Index: Uses inverted index with TF-IDF scoring for relevant results
- Context-Aware Results: Returns relevant excerpts with highlighted matches
- Caching System: Avoids reprocessing unchanged PDFs for better performance
- Proximity Matching: Boosts results where query terms appear close together
- Three MCP Tools: Search, list documents, and retrieve full content
Project Structure
QuantConnectServer/
├── server.py # Main MCP server with enhanced search
├── convert_pdfs.py # Standalone PDF conversion utility
├── requirements.txt # Python dependencies
├── README.md # This documentation
├── env/ # Python virtual environment
└── quantconnect-docs/ # PDF documents and converted markdown
    ├── Quantconnect-Local-Platform-Python-2.pdf
    ├── Quantconnect-Writing-Algorithms-Python-2.pdf
    └── markdown/ # Auto-generated markdown files
        ├── .pdf_cache.json # Processing cache
        ├── .search_index.pkl # Search index cache
        └── *.md files # Converted documents
Installation
Prerequisites
- Python 3.8 or higher
- pip package manager
Step 1: Install Dependencies
Install required packages:
pip install -r requirements.txt
The requirements.txt includes:
- mcp: Model Context Protocol library
- PyPDF2: PDF text extraction
- asyncio: Asynchronous processing
Step 2: Prepare Your Environment
Create a virtual environment (recommended):
python -m venv env
source env/bin/activate # On Windows: env\Scripts\activate
pip install -r requirements.txt
Configuration
Claude Desktop Setup
Find your Claude Desktop configuration file:
- macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
- Windows: %APPDATA%\Claude\claude_desktop_config.json
- Linux: ~/.config/claude/claude_desktop_config.json
Add this configuration (adjust paths to match your system):
{
  "mcpServers": {
    "quantconnect-pdf-server": {
      "command": "/path/to/your/project/env/bin/python3",
      "args": ["/path/to/your/project/server.py"],
      "env": {
        "QUANTCONNECT_PDF_FOLDER": "/path/to/your/project/quantconnect-docs",
        "QUANTCONNECT_MARKDOWN_FOLDER": "/path/to/your/project/quantconnect-docs/markdown"
      }
    }
  }
}
Environment Variables
- QUANTCONNECT_PDF_FOLDER: Directory containing your PDF files (required)
- QUANTCONNECT_MARKDOWN_FOLDER: Directory for converted markdown files (optional, defaults to PDF_FOLDER/markdown)
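The lookup described above can be sketched as follows. This is a minimal illustration and not the code in server.py: the helper name `resolve_folders` is hypothetical, and only the two variable names and the `markdown` default come from this README.

```python
import os
from pathlib import Path


def resolve_folders(environ=os.environ):
    """Return (pdf_folder, markdown_folder) read from the environment.

    Hypothetical helper: the real server may resolve these differently.
    """
    pdf_value = environ.get("QUANTCONNECT_PDF_FOLDER")
    if not pdf_value:
        # The PDF folder is documented as required, so fail early.
        raise RuntimeError("QUANTCONNECT_PDF_FOLDER is required")
    pdf_folder = Path(pdf_value).expanduser()

    # Optional override; otherwise fall back to <PDF_FOLDER>/markdown.
    md_value = environ.get("QUANTCONNECT_MARKDOWN_FOLDER")
    if md_value:
        markdown_folder = Path(md_value).expanduser()
    else:
        markdown_folder = pdf_folder / "markdown"
    return pdf_folder, markdown_folder
```

Passing a plain dict instead of `os.environ` makes the fallback easy to check without touching the real environment.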
Usage
Starting the Server
- Standalone testing:
export QUANTCONNECT_PDF_FOLDER="/path/to/your/pdfs"
python server.py
- With Claude Desktop: Restart Claude Desktop after configuration to load the MCP server
- Manual PDF conversion (optional):
python convert_pdfs.py [pdf_folder] [markdown_folder]
Testing the Integration
Test in Claude by asking:
- "Can you list the available QuantConnect documents?"
- "Search for information about backtesting in the QuantConnect docs"
- "What does the QuantConnect documentation say about indicators?"
- "Show me page 5 of the Local Platform documentation"
Available MCP Tools
The server provides three tools accessible through Claude:
1. search_quantconnect_docs
Purpose: Intelligent search through all QuantConnect documentation
Parameters:
- query (required): Search terms or topic to find
- max_results (optional): Number of results to return (default: 5)
Features:
- TF-IDF scoring for relevance ranking
- Proximity matching for multi-word queries
- Context extraction with highlighted matches
- Returns document excerpts with page numbers
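As a rough illustration of context extraction with highlighted matches, the sketch below cuts a window around the first match and marks every match inside it. The function name, the `**` markers, and the 60-character window are assumptions made for this example; the server's actual excerpt format is not documented here.

```python
import re


def extract_excerpt(text, terms, width=60):
    """Return a snippet around the first matching term, with matches wrapped in **.

    The ** markers and the default width are illustrative assumptions.
    """
    if not terms:
        return text[: 2 * width].strip()
    # One case-insensitive alternation over all (escaped) query terms.
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    match = pattern.search(text)
    if match is None:
        # No match: fall back to the start of the text.
        return text[: 2 * width].strip()
    start = max(0, match.start() - width)
    end = min(len(text), match.end() + width)
    snippet = text[start:end].strip()
    # Highlight every match inside the snippet, preserving original casing.
    return pattern.sub(lambda m: f"**{m.group(0)}**", snippet)
```

For example, `extract_excerpt("Run a backtest before going live.", ["backtest"])` returns `Run a **backtest** before going live.`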
2. list_quantconnect_docs
Purpose: List all available PDF documents in the collection
Parameters: None
Returns: Complete catalog of processed documents with metadata
3. get_document_content
Purpose: Retrieve full content from specific documents
Parameters:
- filename (required): Document name (with or without .md extension)
- page_number (optional): Specific page to retrieve
Use cases: Reading complete sections, accessing specific pages, extracting code examples
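The two parameters could be handled along these lines. This is a sketch under stated assumptions: both function names are hypothetical, and the `## Page N` delimiter is an assumed layout for the converted markdown, not something this README confirms.

```python
import re
from pathlib import Path


def normalize_filename(filename):
    """Accept 'doc' or 'doc.md' and always return 'doc.md'."""
    name = Path(filename).name  # drop any directory components
    return name if name.lower().endswith(".md") else f"{name}.md"


def get_page(markdown_text, page_number):
    """Return the text of one page, or None if the page is absent.

    ASSUMPTION: pages are delimited by lines of the form '## Page N'.
    """
    parts = re.split(r"^## Page (\d+)[ \t]*$", markdown_text, flags=re.MULTILINE)
    # With one capture group, re.split yields
    # [preamble, "1", body1, "2", body2, ...].
    pages = {int(num): body.strip() for num, body in zip(parts[1::2], parts[2::2])}
    return pages.get(page_number)
```

Returning `None` for a missing page lets the caller decide whether to report an error or fall back to the full document.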
Technical Architecture
Search Engine
- Inverted Index: Maps words to document locations for fast lookup
- TF-IDF Scoring: Balances term frequency with document rarity
- Proximity Boosting: Enhances results where query terms appear together
- Context Extraction: Provides relevant snippets around matches
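The components above can be sketched in a few dozen lines. This toy `TinyIndex` is not the server's `SearchIndex` class: the tokenizer, the smoothed IDF formula, the 5-token window, and the 0.5 boost are all assumptions chosen for illustration.

```python
import math
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyIndex:
    """Toy inverted index with TF-IDF scoring and a proximity boost."""

    def __init__(self, docs):
        # term -> {doc_id: [token positions]}
        self.postings = defaultdict(dict)
        self.lengths = {}
        for doc_id, text in docs.items():
            tokens = tokenize(text)
            self.lengths[doc_id] = len(tokens)
            for pos, tok in enumerate(tokens):
                self.postings[tok].setdefault(doc_id, []).append(pos)

    def search(self, query, max_results=5):
        terms = tokenize(query)
        n_docs = len(self.lengths)
        scores = defaultdict(float)
        for term in set(terms):
            hits = self.postings.get(term, {})
            if not hits:
                continue
            # Smoothed IDF: terms found in fewer documents weigh more.
            idf = math.log((1 + n_docs) / (1 + len(hits))) + 1
            for doc_id, positions in hits.items():
                tf = len(positions) / self.lengths[doc_id]
                scores[doc_id] += tf * idf
        # Multiply by 1.5 when two query terms sit close together.
        for doc_id in scores:
            scores[doc_id] *= 1 + self._proximity_bonus(terms, doc_id)
        ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
        return ranked[:max_results]

    def _proximity_bonus(self, terms, doc_id, window=5):
        """Return 0.5 if two distinct query terms occur within `window` tokens."""
        unique = list(dict.fromkeys(terms))
        for i, a in enumerate(unique):
            for b in unique[i + 1:]:
                pa = self.postings.get(a, {}).get(doc_id, [])
                pb = self.postings.get(b, {}).get(doc_id, [])
                if any(abs(x - y) <= window for x in pa for y in pb):
                    return 0.5
        return 0.0
```

Storing token positions in the postings is what makes the proximity check cheap: no document text has to be re-read at query time.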
Caching System
- PDF Processing Cache: Avoids reprocessing unchanged files using MD5 hashes
- Search Index Cache: Persists search index for faster startup
- Incremental Updates: Only processes new or modified PDFs
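A hash-based change check of this kind might look like the sketch below. The function names and the `{filename: md5}` cache layout are assumptions; the actual structure of `.pdf_cache.json` is not documented here.

```python
import hashlib
import json
from pathlib import Path


def file_md5(path, chunk_size=65536):
    """Hash a file in chunks so large PDFs are never fully loaded into memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def pdfs_needing_conversion(pdf_folder, cache_path):
    """Return (stale_pdfs, updated_cache).

    ASSUMPTION: the cache file is a JSON object mapping filename -> MD5 hex digest.
    """
    cache_path = Path(cache_path)
    cache = json.loads(cache_path.read_text()) if cache_path.exists() else {}
    stale, updated = [], {}
    for pdf in sorted(Path(pdf_folder).glob("*.pdf")):
        checksum = file_md5(pdf)
        updated[pdf.name] = checksum
        if cache.get(pdf.name) != checksum:
            # New file, or contents changed since the last run.
            stale.append(pdf)
    return stale, updated
```

In this sketch the caller writes `updated` back to the cache file only after conversion succeeds, so a failed run is retried on the next start.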
Performance Features
- Asynchronous Processing: Non-blocking PDF conversion and indexing
- Background Initialization: Server starts immediately while processing continues
- Efficient Storage: Markdown conversion reduces memory usage vs. raw PDF text
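Background initialization can be illustrated with plain asyncio. The `DocServer` class and its methods are hypothetical stand-ins, and the placeholder index replaces real PDF conversion; the sketch only shows how a server can accept requests while indexing runs in a worker thread.

```python
import asyncio


class DocServer:
    """Toy stand-in for the server; all names here are hypothetical."""

    def __init__(self):
        # Create the instance inside a running event loop (see usage below).
        self.ready = asyncio.Event()
        self.index = {}
        self._init_task = None

    def _build_index_blocking(self):
        # Placeholder for slow PDF conversion and indexing.
        return {"backtesting": ["doc.md"]}

    async def _initialize(self):
        # Run the blocking work in a thread so the event loop stays responsive.
        loop = asyncio.get_running_loop()
        self.index = await loop.run_in_executor(None, self._build_index_blocking)
        self.ready.set()

    async def start(self):
        # Schedule initialization and return immediately.
        self._init_task = asyncio.create_task(self._initialize())

    async def search(self, term):
        if not self.ready.is_set():
            return "Index is still being built; try again shortly."
        return self.index.get(term, [])


async def demo():
    server = DocServer()
    await server.start()
    early = await server.search("backtesting")  # index not ready yet
    await server.ready.wait()
    late = await server.search("backtesting")
    return early, late
```

Running `asyncio.run(demo())` returns the "still being built" message for the first search and the indexed result for the second.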
Troubleshooting
Common Issues
- Server not connecting
  - Verify absolute paths in Claude Desktop configuration
  - Check Python virtual environment activation
  - Ensure server.py has execute permissions
- PDFs not loading
  - Confirm QUANTCONNECT_PDF_FOLDER path exists
  - Check PDF file permissions and readability
  - Look for error messages in server output
- Search returning no results
  - Wait for initial PDF processing to complete
  - Check if markdown files were created successfully
  - Try broader search terms
- Performance issues
  - Ensure adequate disk space for markdown files
  - Check if antivirus is scanning the project folder
  - Consider moving cache files to faster storage
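For the path-related checks, a small script can flag the most common configuration mistakes. This is a hypothetical helper, not part of the project; it assumes the configuration layout shown under Claude Desktop Setup and treats every entry in `args` as a file path.

```python
from pathlib import Path


def check_server_entry(config, name="quantconnect-pdf-server"):
    """Return a list of human-readable problems for one mcpServers entry."""
    entry = config.get("mcpServers", {}).get(name)
    if entry is None:
        return [f"no mcpServers entry named {name!r}"]

    # (label, value, must_exist) triples for every path-like setting.
    candidates = [("command", entry.get("command", ""), True)]
    candidates += [(f"args[{i}]", a, True) for i, a in enumerate(entry.get("args", []))]
    for key, value in entry.get("env", {}).items():
        # The markdown folder is auto-generated, so it may not exist yet.
        candidates.append((f"env.{key}", value, key == "QUANTCONNECT_PDF_FOLDER"))

    problems = []
    for label, value, must_exist in candidates:
        path = Path(value)
        if not path.is_absolute():
            problems.append(f"{label}: {value!r} is not an absolute path")
        elif must_exist and not path.exists():
            problems.append(f"{label}: {value!r} does not exist")
    return problems
```

Load `claude_desktop_config.json` with `json.load` and pass the resulting dict in; an empty list means no path problems were found.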
Debug Mode
Run the server with debug output:
export QUANTCONNECT_PDF_FOLDER="/path/to/pdfs"
python server.py 2>&1 | tee server.log
Advanced Usage
Bulk PDF Processing
Process all PDFs without starting the server:
python convert_pdfs.py ./quantconnect-docs ./quantconnect-docs/markdown
Custom Search Queries
The search supports various query types:
- Single terms: backtesting
- Multi-word queries: custom indicator development
- Technical terms: OnData event handler
- Code concepts: Algorithm.Initialize method
Integration Examples
Ask Claude sophisticated questions like:
"Using the QuantConnect docs, show me step-by-step how to create a custom indicator with examples"
"What are all the different order types available and when should I use each one?"
"Find code examples of universe selection and explain the different approaches"
"Compare the local platform setup process with cloud deployment according to the documentation"
Contributing
To extend the server:
- Add new document formats: Extend the conversion system in server.py:236
- Improve search: Enhance the SearchIndex class for semantic search
- Add specialized tools: Create domain-specific search functions
- Performance optimization: Implement parallel processing or database storage
Version History
- v0.3.0: Enhanced search with TF-IDF scoring and proximity matching
- v0.2.0: Added caching system and background processing
- v0.1.0: Basic PDF to markdown conversion and simple search