SEC Filings and Earnings Call
The MCP server provides end-to-end workflows for SEC filings and earnings call transcripts—including ticker resolution, document retrieval, OCR, embedding, on-disk resource discovery, and semantic search—exposed via MCP and powered by the same olmOCR and embedding backends as the vLLM backends.
Finance Data MCP
A Python-first toolkit for SEC filing ingestion, OCR-to-Markdown conversion, transcript collection, and retrieval across hybrid retrieval (dense + BM25) with reranking.
What this project does
- Downloads SEC filings and stores filing metadata.
- Converts filing PDFs to Markdown via olmOCR.
- Chunks and indexes filings/transcripts in Chroma.
- Supports:
- Hybrid search (dense + BM25 reciprocal-rank-fusion + reranker).
- Exposes workflows through:
- FastAPI (
server.py). - MCP server (
mcp_server.py).
- FastAPI (
Repository layout
finance_data/filings/: SEC download + helpers.finance_data/ocr/: olmOCR pipeline.finance_data/dataloader/: chunking, Chroma indexing, semantic + BM25 retrieval.finance_data/earnings_transcripts/: transcript fetch + persistence.finance_data/server_api/: API request/response models + batch helpers.server.py: FastAPI app.mcp_server.py: MCP entrypoint.docs/: setup and operations docs.
Quick start
1) Install dependencies
uv sync
For OCR/embedding flows:
uv sync --group ocr-md
For MCP workflows:
uv sync --group ocr-md --group mcp
2) Configure environment
Use .env or environment variables. Common settings:
SEC_API_ORGANIZATION,SEC_API_EMAILOLMOCR_SERVER,OLMOCR_MODEL,OLMOCR_WORKSPACEEMBEDDING_SERVER,EMBEDDING_MODELCHROMA_PERSIST_DIRMCP_HOST,MCP_PORT,MCP_NGROK_ALLOWED_HOSTS
See finance_data/settings.py for defaults.
3) Run services
Start model servers:
make vllm-olmocr-serve
make vllm-embd-serve
make vllm-reranker-serve
Start API:
make start-server
Start MCP:
uv run --group ocr-md --group mcp python mcp_server.py
Search capabilities
SEC filings API
- Hybrid (dense + BM25 + reranker):
POST /vector_store/search_sec_filings
Transcript API
- Hybrid (dense + BM25 + reranker):
POST /vector_store/search_transcripts
MCP tools
- Hybrid:
search_sec_filings_tool,search_transcripts_tool
Core workflows
SEC filing → Markdown
uv run python -m finance_data.filings.sec_data --ticker AMZN --year 2025
uv run python -m finance_data.ocr.olmocr_pipeline --pdf-dir sec_data/AMZN-2025
Embed and search filings (API)
curl -s -X POST "http://127.0.0.1:8081/vector_store/embed_sec_filings" \
-H "Content-Type: application/json" \
-d '{"ticker":"AMZN","year":"2025","filing_type":"10-K","force":false}'
curl -s -X POST "http://127.0.0.1:8081/vector_store/search_sec_filings" \
-H "Content-Type: application/json" \
-d '{"ticker":"AMZN","year":"2025","filing_type":"10-K","query":"operating income margin","top_k":5}'
Earnings transcripts
Fetch quarterly transcripts:
uv run python -m finance_data.earnings_transcripts.transcripts AMZN 2025
Embed + hybrid search transcripts:
curl -s -X POST "http://127.0.0.1:8081/vector_store/embed_transcripts" \
-H "Content-Type: application/json" \
-d '{"ticker":"AMZN","year":"2025","force":false}'
curl -s -X POST "http://127.0.0.1:8081/vector_store/search_transcripts" \
-H "Content-Type: application/json" \
-d '{"ticker":"AMZN","year":"2025","query":"AWS revenue growth","top_k":5}'
Docker
Use Makefile wrappers:
make docker-build
make docker-start
Stop/remove by API port:
make docker-stop
make docker-remove
Documentation
docs/README.mddocs/setup-and-operations.md
Verwandte Server
IP2Location.io
IP2Location.io API integration to retrieve the geolocation information for an IP address.
Sleeper Hit Studio Public Discovery
Read-only discovery for opted-in screenwriters and writer-approved public project assets.
yfinance MCP Server
Access up-to-date prices and news for stocks and cryptocurrencies.
MCP Deep Search
A server for performing deep web searches using the @just-every/search library, requiring API keys via an environment file.
Google Search
Web search and webpage content extraction using the Google Custom Search API.
RocketReach
Find emails, phone numbers, and enrich company data using the RocketReach API.
PaperMCP 智能学术论文检索系统
An academic paper search server powered by the OpenAlex API.
VelociRAG
Lightning-fast RAG for AI agents. 4-layer fusion (vector, BM25, graph, metadata), ONNX Runtime, sub-200ms search, no PyTorch.
Adzuna Job Search MCP
MCP server for Adzuna Job Search API - search jobs, analyze salaries, and research employers across 12 countries
Perplexity Search
Web search and chat completion powered by the Perplexity AI API.