SEC Filings and Earnings Call MCP Server
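This MCP server provides an end-to-end workflow for SEC filings and earnings call transcripts: ticker resolution, document retrieval, OCR, embedding, on-disk resource lookup, and semantic search. It is exposed over MCP and runs on the same vLLM-backed olmOCR and embedding backends.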
Finance Data MCP
A Python-first toolkit for SEC filing ingestion, OCR-to-Markdown conversion, transcript collection, and hybrid retrieval (dense + BM25) with reranking.
What this project does
- Downloads SEC filings and stores filing metadata.
- Converts filing PDFs to Markdown via olmOCR.
- Chunks and indexes filings/transcripts in Chroma.
- Supports:
  - Hybrid search (dense + BM25 with reciprocal rank fusion, plus a reranker).
- Exposes workflows through:
  - FastAPI (server.py).
  - MCP server (mcp_server.py).
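The reciprocal rank fusion step named in the hybrid search bullet can be sketched as follows. This is a minimal illustration of the general technique, not this project's implementation: the function name, the chunk IDs, and the constant k=60 (a commonly used default) are all assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a common default, not a value taken
    from this project.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties break on the ID so output is stable.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from the dense and BM25 retrievers.
dense_hits = ["chunk-3", "chunk-1", "chunk-7"]
bm25_hits = ["chunk-1", "chunk-9", "chunk-3"]

fused = reciprocal_rank_fusion([dense_hits, bm25_hits])
print(fused)
```

Documents that rank well in both lists rise to the top. In a pipeline like the one described here, the fused list would then be passed to the reranker.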
Repository layout
- finance_data/filings/: SEC download + helpers.
- finance_data/ocr/: olmOCR pipeline.
- finance_data/dataloader/: chunking, Chroma indexing, semantic + BM25 retrieval.
- finance_data/earnings_transcripts/: transcript fetch + persistence.
- finance_data/server_api/: API request/response models + batch helpers.
- server.py: FastAPI app.
- mcp_server.py: MCP entrypoint.
- docs/: setup and operations docs.
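To give a rough idea of what a chunking step before indexing can look like, here is a minimal sketch of fixed-size chunking with overlap. The function name, the character-based window, and the sizes are assumptions for illustration. The actual strategy in finance_data/dataloader/ may differ (for example, it could split on Markdown headings or tokens).

```python
def chunk_text(text, chunk_size=1000, overlap=200):
    """Split text into overlapping character windows.

    chunk_size and overlap are illustrative values, not project defaults.
    """
    if overlap < 0 or overlap >= chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "abcdefghij"
print(chunk_text(sample, chunk_size=4, overlap=2))
```

The overlap keeps a sentence that straddles a boundary visible in both neighboring chunks, which usually helps retrieval at the cost of some duplicated storage.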
Quick start
1) Install dependencies
uv sync
For OCR/embedding flows:
uv sync --group ocr-md
For MCP workflows:
uv sync --group ocr-md --group mcp
2) Configure environment
Use .env or environment variables. Common settings:
- SEC_API_ORGANIZATION, SEC_API_EMAIL
- OLMOCR_SERVER, OLMOCR_MODEL, OLMOCR_WORKSPACE
- EMBEDDING_SERVER, EMBEDDING_MODEL
- CHROMA_PERSIST_DIR
- MCP_HOST, MCP_PORT, MCP_NGROK_ALLOWED_HOSTS
See finance_data/settings.py for defaults.
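Before starting services, it can help to see which of these settings are actually set in the current environment. The sketch below only inspects environment variables. It does not read .env or finance_data/settings.py, and the helper name is hypothetical.

```python
import os

# Variable names as listed above; anything unset would presumably fall back
# to the defaults in finance_data/settings.py.
COMMON_SETTINGS = [
    "SEC_API_ORGANIZATION", "SEC_API_EMAIL",
    "OLMOCR_SERVER", "OLMOCR_MODEL", "OLMOCR_WORKSPACE",
    "EMBEDDING_SERVER", "EMBEDDING_MODEL",
    "CHROMA_PERSIST_DIR",
    "MCP_HOST", "MCP_PORT", "MCP_NGROK_ALLOWED_HOSTS",
]


def unset_settings(env=None, names=COMMON_SETTINGS):
    """Return the names that are missing or empty in the given mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


for name in unset_settings():
    print(f"{name} is not set in the environment")
```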
3) Run services
Start model servers:
make vllm-olmocr-serve
make vllm-embd-serve
make vllm-reranker-serve
Start API:
make start-server
Start MCP:
uv run --group ocr-md --group mcp python mcp_server.py
Search capabilities
SEC filings API
- Hybrid (dense + BM25 + reranker):
POST /vector_store/search_sec_filings
Transcript API
- Hybrid (dense + BM25 + reranker):
POST /vector_store/search_transcripts
MCP tools
- Hybrid:
search_sec_filings_tool, search_transcripts_tool
Core workflows
SEC filing → Markdown
uv run python -m finance_data.filings.sec_data --ticker AMZN --year 2025
uv run python -m finance_data.ocr.olmocr_pipeline --pdf-dir sec_data/AMZN-2025
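The two commands above can be chained from Python when several tickers need processing. This is a sketch under one assumption: that the download step writes PDFs to sec_data/<TICKER>-<YEAR>, as the example path suggests. The helper names are hypothetical.

```python
import subprocess


def build_pipeline_commands(ticker, year):
    """Return the download and OCR commands for one ticker/year."""
    # Assumed output layout, inferred from the sec_data/AMZN-2025 example.
    pdf_dir = f"sec_data/{ticker}-{year}"
    return [
        ["uv", "run", "python", "-m", "finance_data.filings.sec_data",
         "--ticker", ticker, "--year", str(year)],
        ["uv", "run", "python", "-m", "finance_data.ocr.olmocr_pipeline",
         "--pdf-dir", pdf_dir],
    ]


def run_pipeline(ticker, year):
    """Run download, then OCR; check=True stops at the first failure."""
    for cmd in build_pipeline_commands(ticker, year):
        subprocess.run(cmd, check=True)


# Example (requires the project environment and running model servers):
# run_pipeline("AMZN", 2025)
```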
Embed and search filings (API)
curl -s -X POST "http://127.0.0.1:8081/vector_store/embed_sec_filings" \
-H "Content-Type: application/json" \
-d '{"ticker":"AMZN","year":"2025","filing_type":"10-K","force":false}'
curl -s -X POST "http://127.0.0.1:8081/vector_store/search_sec_filings" \
-H "Content-Type: application/json" \
-d '{"ticker":"AMZN","year":"2025","filing_type":"10-K","query":"operating income margin","top_k":5}'
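The same search request can be issued from Python using only the standard library. This sketch mirrors the curl payload above. The function names are hypothetical, and it assumes the endpoint returns JSON without assuming any particular response schema.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8081"  # host and port from the curl examples


def build_search_request(ticker, year, filing_type, query, top_k=5):
    """Build the POST request for /vector_store/search_sec_filings."""
    payload = {
        "ticker": ticker,
        "year": year,
        "filing_type": filing_type,
        "query": query,
        "top_k": top_k,
    }
    return urllib.request.Request(
        f"{BASE_URL}/vector_store/search_sec_filings",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search_sec_filings(ticker, year, filing_type, query, top_k=5, timeout=60):
    """Send the request and decode the body as JSON (assumed format)."""
    request = build_search_request(ticker, year, filing_type, query, top_k)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example (requires the API server to be running):
# results = search_sec_filings("AMZN", "2025", "10-K", "operating income margin")
```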
Earnings transcripts
Fetch quarterly transcripts:
uv run python -m finance_data.earnings_transcripts.transcripts AMZN 2025
Embed + hybrid search transcripts:
curl -s -X POST "http://127.0.0.1:8081/vector_store/embed_transcripts" \
-H "Content-Type: application/json" \
-d '{"ticker":"AMZN","year":"2025","force":false}'
curl -s -X POST "http://127.0.0.1:8081/vector_store/search_transcripts" \
-H "Content-Type: application/json" \
-d '{"ticker":"AMZN","year":"2025","query":"AWS revenue growth","top_k":5}'
Docker
Use Makefile wrappers:
make docker-build
make docker-start
Stop/remove by API port:
make docker-stop
make docker-remove
Documentation
- docs/README.md
- docs/setup-and-operations.md