QuantConnect PDF MCP Server

A Model Context Protocol (MCP) server that provides search and retrieval over QuantConnect PDF documentation. The server converts PDFs to searchable markdown and ranks results with TF-IDF scoring and proximity matching, returning excerpts with surrounding context.

Features

Intelligent PDF Processing: Automatically converts PDFs to structured markdown with proper formatting
Fast Search Index: Uses inverted index with TF-IDF scoring for relevant results
Context-Aware Results: Returns relevant excerpts with highlighted matches
Caching System: Avoids reprocessing unchanged PDFs for better performance
Proximity Matching: Boosts results where query terms appear close together
Three MCP Tools: Search, list documents, and retrieve full content

Project Structure

QuantConnectServer/
├── server.py           # Main MCP server with enhanced search
├── convert_pdfs.py     # Standalone PDF conversion utility
├── requirements.txt    # Python dependencies
├── README.md          # This documentation
├── env/               # Python virtual environment
└── quantconnect-docs/ # PDF documents and converted markdown
    ├── Quantconnect-Local-Platform-Python-2.pdf
    ├── Quantconnect-Writing-Algorithms-Python-2.pdf
    └── markdown/      # Auto-generated markdown files
        ├── .pdf_cache.json      # Processing cache
        ├── .search_index.pkl    # Search index cache
        └── *.md files           # Converted documents

Installation

Prerequisites

Python 3.8 or higher
pip package manager

Step 1: Install Dependencies

Install required packages:

pip install -r requirements.txt

The requirements.txt includes:

mcp - Model Context Protocol library
PyPDF2 - PDF text extraction
asyncio - Asynchronous processing (part of the Python standard library since Python 3.4, so no separate installation is needed)

Step 2: Prepare Your Environment

Create a virtual environment (recommended) and install the dependencies inside it:

python -m venv env
source env/bin/activate  # On Windows: env\Scripts\activate
pip install -r requirements.txt

Configuration

Claude Desktop Setup

Find your Claude Desktop configuration file:

macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json
Linux: ~/.config/claude/claude_desktop_config.json

Add this configuration (adjust paths to match your system):

{
  "mcpServers": {
    "quantconnect-pdf-server": {
      "command": "/path/to/your/project/env/bin/python3",
      "args": ["/path/to/your/project/server.py"],
      "env": {
        "QUANTCONNECT_PDF_FOLDER": "/path/to/your/project/quantconnect-docs",
        "QUANTCONNECT_MARKDOWN_FOLDER": "/path/to/your/project/quantconnect-docs/markdown"
      }
    }
  }
}

Environment Variables

QUANTCONNECT_PDF_FOLDER: Directory containing your PDF files (required)
QUANTCONNECT_MARKDOWN_FOLDER: Directory for converted markdown files (optional, defaults to PDF_FOLDER/markdown)
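
The defaulting rule above can be sketched as follows. This is a minimal illustration, not the code in server.py: the function name `resolve_folders` is hypothetical, and only the two variable names and the `PDF_FOLDER/markdown` fallback come from this README.

```python
import os
from pathlib import Path


def resolve_folders(environ=os.environ):
    """Return (pdf_folder, markdown_folder) from the documented variables.

    Hypothetical helper: server.py may resolve these differently.
    """
    pdf_value = environ.get("QUANTCONNECT_PDF_FOLDER")
    if not pdf_value:
        # The PDF folder is documented as required, so fail early.
        raise RuntimeError("QUANTCONNECT_PDF_FOLDER is not set")
    pdf_folder = Path(pdf_value).expanduser()

    # The markdown folder is optional: fall back to <PDF_FOLDER>/markdown.
    md_value = environ.get("QUANTCONNECT_MARKDOWN_FOLDER")
    if md_value:
        markdown_folder = Path(md_value).expanduser()
    else:
        markdown_folder = pdf_folder / "markdown"
    return pdf_folder, markdown_folder
```

Passing a plain dictionary instead of `os.environ` makes the rule easy to check without touching the real environment.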

Usage

Starting the Server

Standalone testing:

export QUANTCONNECT_PDF_FOLDER="/path/to/your/pdfs"
python server.py

With Claude Desktop: restart Claude Desktop after editing the configuration so that it loads the MCP server.

Manual PDF conversion (optional):

python convert_pdfs.py [pdf_folder] [markdown_folder]

Testing the Integration

Test in Claude by asking:

"Can you list the available QuantConnect documents?"
"Search for information about backtesting in the QuantConnect docs"
"What does the QuantConnect documentation say about indicators?"
"Show me page 5 of the Local Platform documentation"

Available MCP Tools

The server exposes three tools through Claude:

1. `search_quantconnect_docs`

Purpose: Search across all QuantConnect documentation

Parameters:

query (required): Search terms or topic to find
max_results (optional): Number of results to return (default: 5)

Features:

TF-IDF scoring for relevance ranking
Proximity matching for multi-word queries
Context extraction with highlighted matches
Returns document excerpts with page numbers
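
Context extraction with highlighted matches could look roughly like the sketch below. It is an assumption about the general approach, not the server's actual code: the function name `extract_snippet`, the `width` parameter, and the `**` highlight markers are all illustrative.

```python
import re


def extract_snippet(text, query, width=60):
    """Return text around the first query-term match, with matches in **bold**.

    Illustrative only: terms are matched as case-insensitive substrings.
    """
    terms = [re.escape(term) for term in query.split()]
    if not terms:
        return ""
    pattern = re.compile("|".join(terms), re.IGNORECASE)
    match = pattern.search(text)
    if match is None:
        return ""

    # Keep `width` characters of context on each side of the first match.
    start = max(0, match.start() - width)
    end = min(len(text), match.end() + width)
    snippet = text[start:end]

    # Highlight every query term that falls inside the window.
    highlighted = pattern.sub(lambda m: "**" + m.group(0) + "**", snippet)
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return prefix + highlighted + suffix
```

Substring matching keeps the sketch short; a word-boundary match would be stricter but needs extra care at the edges of the window.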

2. `list_quantconnect_docs`

Purpose: List all available PDF documents in the collection

Parameters: None

Returns: Complete catalog of processed documents with metadata

3. `get_document_content`

Purpose: Retrieve full content from specific documents

Parameters:

filename (required): Document name (with or without .md extension)
page_number (optional): Specific page to retrieve

Use cases: Reading complete sections, accessing specific pages, extracting code examples
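
Accepting the filename "with or without .md extension" amounts to a small normalization step. The sketch below shows one way to do it. `resolve_markdown_path` is a hypothetical name, and stripping directory components is an added safety assumption rather than documented behavior.

```python
from pathlib import Path


def resolve_markdown_path(markdown_folder, filename):
    """Map a user-supplied document name to a file in the markdown folder.

    Hypothetical helper, not taken from server.py.
    """
    # Keep only the final path component so "../x/guide" cannot escape the folder.
    name = Path(filename).name
    # Append the extension only when the caller left it off.
    if not name.lower().endswith(".md"):
        name += ".md"
    return Path(markdown_folder) / name
```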

Technical Architecture

Search Engine

Inverted Index: Maps words to document locations for fast lookup
TF-IDF Scoring: Balances term frequency with document rarity
Proximity Boosting: Enhances results where query terms appear together
Context Extraction: Provides relevant snippets around matches
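
To make the first three ideas concrete, here is a toy inverted index with TF-IDF scoring and a proximity boost. It is a sketch under stated assumptions, not the `SearchIndex` class from server.py: the class name `TinyIndex`, the smoothed IDF formula, the five-token window, and the 0.5 bonus are all illustrative choices.

```python
import math
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyIndex:
    """Toy inverted index: term -> {doc_id: [token positions]}."""

    def __init__(self):
        self.postings = defaultdict(dict)
        self.doc_lengths = {}

    def add(self, doc_id, text):
        tokens = tokenize(text)
        self.doc_lengths[doc_id] = len(tokens)
        for position, term in enumerate(tokens):
            self.postings[term].setdefault(doc_id, []).append(position)

    def search(self, query, max_results=5):
        terms = tokenize(query)
        n_docs = len(self.doc_lengths)
        scores = defaultdict(float)
        for term in set(terms):
            docs = self.postings.get(term, {})
            if not docs:
                continue
            # Smoothed IDF (an assumption): rarer terms weigh more, never negative.
            idf = math.log((1 + n_docs) / (1 + len(docs))) + 1.0
            for doc_id, positions in docs.items():
                tf = len(positions) / self.doc_lengths[doc_id]
                scores[doc_id] += tf * idf
        # Multiply in the proximity bonus for documents that matched at all.
        for doc_id in scores:
            scores[doc_id] *= 1.0 + self._proximity_bonus(doc_id, terms)
        ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
        return ranked[:max_results]

    def _proximity_bonus(self, doc_id, terms, window=5):
        """Return 0.5 if two distinct query terms occur within `window` tokens."""
        unique = list(dict.fromkeys(terms))
        for i, first in enumerate(unique):
            for second in unique[i + 1:]:
                a = self.postings.get(first, {}).get(doc_id, [])
                b = self.postings.get(second, {}).get(doc_id, [])
                if any(abs(x - y) <= window for x in a for y in b):
                    return 0.5
        return 0.0
```

Storing token positions in the postings is what makes the proximity check cheap: the same structure answers both "which documents contain this term" and "how close together are the terms".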

Caching System

PDF Processing Cache: Avoids reprocessing unchanged files using MD5 hashes
Search Index Cache: Persists search index for faster startup
Incremental Updates: Only processes new or modified PDFs
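
The hash-based cache check can be sketched as below. Only the use of MD5 hashes and a JSON cache file (`.pdf_cache.json`) comes from this README. The function names and the cache layout (a mapping from file name to digest) are assumptions made for illustration.

```python
import hashlib
import json
from pathlib import Path


def file_md5(path, chunk_size=1 << 20):
    """Hash a file in chunks so large PDFs are not read into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_cache(cache_path):
    """Return the {file name: md5} mapping, or an empty one if no cache exists."""
    cache_path = Path(cache_path)
    if cache_path.exists():
        return json.loads(cache_path.read_text())
    return {}


def changed_pdfs(pdf_folder, cache):
    """Return (path, digest) pairs for PDFs that are new or modified."""
    pending = []
    for pdf in sorted(Path(pdf_folder).glob("*.pdf")):
        digest = file_md5(pdf)
        if cache.get(pdf.name) != digest:
            pending.append((pdf, digest))
    return pending


def mark_processed(cache, cache_path, pdf, digest):
    """Record the hash only after the PDF has been converted successfully."""
    cache[pdf.name] = digest
    cache_path = Path(cache_path)
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    cache_path.write_text(json.dumps(cache, indent=2))
```

Recording the digest after conversion, rather than before, means a PDF whose conversion failed is retried on the next run.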

Performance Features

Asynchronous Processing: Non-blocking PDF conversion and indexing
Background Initialization: Server starts immediately while processing continues
Efficient Storage: Markdown conversion reduces memory usage vs. raw PDF text
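
Background initialization can be illustrated with plain asyncio. The sketch below is an assumption about the pattern, not an excerpt from server.py: `BackgroundIndex` and `build_index_blocking` are hypothetical names, and the "index" is a stand-in word count.

```python
import asyncio


def build_index_blocking(documents):
    """Stand-in for the slow, blocking conversion and indexing work."""
    return {name: len(text.split()) for name, text in documents.items()}


class BackgroundIndex:
    """Holds an index that is filled in while requests are already served."""

    def __init__(self):
        self.ready = asyncio.Event()
        self.index = {}

    async def initialize(self, documents):
        # Run the blocking work in a worker thread so the event loop stays free.
        loop = asyncio.get_running_loop()
        self.index = await loop.run_in_executor(None, build_index_blocking, documents)
        self.ready.set()

    async def search(self, name):
        if not self.ready.is_set():
            return "Index is still being built; try again shortly."
        return self.index.get(name)


async def main():
    state = BackgroundIndex()
    # Start indexing without waiting for it to finish.
    task = asyncio.create_task(state.initialize({"guide.md": "alpha beta gamma"}))
    early = await state.search("guide.md")  # answered before indexing has run
    await task
    late = await state.search("guide.md")
    return early, late


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The early request receives a "still building" message instead of blocking, which matches the troubleshooting advice to wait for initial PDF processing to complete.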

Troubleshooting

Common Issues

Server not connecting
- Verify that the paths in the Claude Desktop configuration are absolute
- Check that `command` points to the Python interpreter inside the virtual environment
- Ensure server.py is readable (execute permission is not required when it is launched through the Python interpreter)

PDFs not loading
- Confirm that the QUANTCONNECT_PDF_FOLDER path exists
- Check PDF file permissions and readability
- Look for error messages in the server output

Search returning no results
- Wait for initial PDF processing to complete
- Check whether the markdown files were created successfully
- Try broader search terms

Performance issues
- Ensure adequate disk space for the markdown files
- Check whether antivirus software is scanning the project folder
- Consider moving the cache files to faster storage
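
The path checks for the "Server not connecting" case can be automated with a short script. This is a hedged sketch: `check_server_config` is a hypothetical helper, it assumes the configuration shape shown under Claude Desktop Setup, and it treats every entry in `args` as a file path, which holds for that example but not for MCP servers in general.

```python
import json
from pathlib import Path


def check_server_config(config_text, server_name="quantconnect-pdf-server"):
    """Return a list of human-readable problems found in the config text."""
    config = json.loads(config_text)
    entry = config.get("mcpServers", {}).get(server_name)
    if entry is None:
        return [f"no '{server_name}' entry under mcpServers"]

    # Collect every value that is expected to be a filesystem path.
    paths = {"command": entry.get("command", "")}
    for i, arg in enumerate(entry.get("args", [])):
        paths[f"args[{i}]"] = arg
    for key, value in entry.get("env", {}).items():
        paths[f"env.{key}"] = value

    problems = []
    for label, value in paths.items():
        path = Path(value)
        if not path.is_absolute():
            problems.append(f"{label}: not an absolute path: {value}")
        elif label != "env.QUANTCONNECT_MARKDOWN_FOLDER" and not path.exists():
            # Assumption: the markdown folder is auto-generated and may not
            # exist yet, so only its absoluteness is checked.
            problems.append(f"{label}: path does not exist: {value}")
    return problems
```

To use it, read the Claude Desktop configuration file for your platform, pass its text to `check_server_config`, and print the returned list. An empty list means the paths look plausible.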

Debug Mode

Run the server with debug output:

export QUANTCONNECT_PDF_FOLDER="/path/to/pdfs"
python server.py 2>&1 | tee server.log

Advanced Usage

Bulk PDF Processing

Process all PDFs without starting the server:

python convert_pdfs.py ./quantconnect-docs ./quantconnect-docs/markdown

Custom Search Queries

The search supports various query types:

Single terms: backtesting
Multi-word queries: custom indicator development
Technical terms: OnData event handler
Code concepts: Algorithm.Initialize method

Integration Examples

Ask Claude sophisticated questions like:

"Using the QuantConnect docs, show me step-by-step how to create a custom indicator with examples"

"What are all the different order types available and when should I use each one?"

"Find code examples of universe selection and explain the different approaches"

"Compare the local platform setup process with cloud deployment according to the documentation"

Contributing

To extend the server:

Add new document formats: Extend the conversion system in server.py:236
Improve search: Enhance the SearchIndex class for semantic search
Add specialized tools: Create domain-specific search functions
Performance optimization: Implement parallel processing or database storage

Version History

v0.3.0: Enhanced search with TF-IDF scoring and proximity matching
v0.2.0: Added caching system and background processing
v0.1.0: Basic PDF to markdown conversion and simple search
