SEC Filings and Earnings Call MCP Server

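The MCP server provides an end-to-end workflow for SEC filings and earnings call transcripts: ticker resolution, document retrieval, OCR, embedding, on-disk resource lookup, and semantic search. It is exposed through MCP and powered by the same vLLM-served olmOCR and embedding backends.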

Finance Data MCP

A Python-first toolkit for SEC filing ingestion, OCR-to-Markdown conversion, transcript collection, and hybrid retrieval (dense + BM25) with reranking.

What this project does

  • Downloads SEC filings and stores filing metadata.
  • Converts filing PDFs to Markdown via olmOCR.
  • Chunks and indexes filings/transcripts in Chroma.
  • Supports:
    • Hybrid search: dense and BM25 results merged with reciprocal rank fusion, then passed through a reranker.
  • Exposes workflows through:
    • FastAPI (server.py).
    • MCP server (mcp_server.py).
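
To make the hybrid search bullet concrete, here is a minimal sketch of reciprocal rank fusion over a dense ranking and a BM25 ranking. It is not the project's implementation: the function name, the chunk ids, and the constant k=60 (a commonly used default) are assumptions for illustration, and the reranking step is left out.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each id scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed, commonly used constant.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id so the order is stable.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical retriever outputs, best hit first.
dense_hits = ["chunk-3", "chunk-1", "chunk-7"]
bm25_hits = ["chunk-1", "chunk-9", "chunk-3"]

fused = reciprocal_rank_fusion([dense_hits, bm25_hits])
print(fused)  # ['chunk-1', 'chunk-3', 'chunk-9', 'chunk-7']
```

Chunks found by both retrievers rise to the top because their reciprocal-rank scores add up. A reranker would then rescore only the head of the fused list.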

Repository layout

  • finance_data/filings/: SEC download + helpers.
  • finance_data/ocr/: olmOCR pipeline.
  • finance_data/dataloader/: chunking, Chroma indexing, semantic + BM25 retrieval.
  • finance_data/earnings_transcripts/: transcript fetch + persistence.
  • finance_data/server_api/: API request/response models + batch helpers.
  • server.py: FastAPI app.
  • mcp_server.py: MCP entrypoint.
  • docs/: setup and operations docs.

Quick start

1) Install dependencies

uv sync

For OCR/embedding flows:

uv sync --group ocr-md

For MCP workflows:

uv sync --group ocr-md --group mcp

2) Configure environment

Use .env or environment variables. Common settings:

  • SEC_API_ORGANIZATION, SEC_API_EMAIL
  • OLMOCR_SERVER, OLMOCR_MODEL, OLMOCR_WORKSPACE
  • EMBEDDING_SERVER, EMBEDDING_MODEL
  • CHROMA_PERSIST_DIR
  • MCP_HOST, MCP_PORT, MCP_NGROK_ALLOWED_HOSTS

See finance_data/settings.py for defaults.
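
Before starting services, it can help to see which of the settings above are unset and will therefore fall back to the defaults in finance_data/settings.py. The sketch below is a hypothetical helper, not part of the project. It inspects only the process environment, so values defined solely in a .env file will show up as unset unless they have been exported.

```python
import os

# Variable names copied from the list above.
COMMON_SETTINGS = [
    "SEC_API_ORGANIZATION", "SEC_API_EMAIL",
    "OLMOCR_SERVER", "OLMOCR_MODEL", "OLMOCR_WORKSPACE",
    "EMBEDDING_SERVER", "EMBEDDING_MODEL",
    "CHROMA_PERSIST_DIR",
    "MCP_HOST", "MCP_PORT", "MCP_NGROK_ALLOWED_HOSTS",
]


def unset_settings(env=None):
    """Return the common settings that are missing or empty in env."""
    env = os.environ if env is None else env
    return [name for name in COMMON_SETTINGS if not env.get(name)]


if __name__ == "__main__":
    for name in unset_settings():
        print(f"{name} is not set; the default will be used")
```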

3) Run services

Start model servers:

make vllm-olmocr-serve
make vllm-embd-serve
make vllm-reranker-serve

Start API:

make start-server

Start MCP:

uv run --group ocr-md --group mcp python mcp_server.py
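
Because the API depends on the model servers being up, a quick TCP check can confirm that something is listening before you send requests. This is a rough sketch using only the standard library. The function name is hypothetical, and port 8081 is assumed from the curl examples in this README; the model server ports are not stated here, so substitute your own.

```python
import socket


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # 8081 is the API port used by the curl examples in this README.
    print("API reachable:", is_listening("127.0.0.1", 8081))
```

A successful connection only shows that the port is open, not that the model has finished loading.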

Search capabilities

SEC filings API

  • Hybrid (dense + BM25 + reranker): POST /vector_store/search_sec_filings

Transcript API

  • Hybrid (dense + BM25 + reranker): POST /vector_store/search_transcripts

MCP tools

  • Hybrid: search_sec_filings_tool, search_transcripts_tool

Core workflows

SEC filing → Markdown

uv run python -m finance_data.filings.sec_data --ticker AMZN --year 2025
uv run python -m finance_data.ocr.olmocr_pipeline --pdf-dir sec_data/AMZN-2025

Embed and search filings (API)

curl -s -X POST "http://127.0.0.1:8081/vector_store/embed_sec_filings" \
  -H "Content-Type: application/json" \
  -d '{"ticker":"AMZN","year":"2025","filing_type":"10-K","force":false}'

curl -s -X POST "http://127.0.0.1:8081/vector_store/search_sec_filings" \
  -H "Content-Type: application/json" \
  -d '{"ticker":"AMZN","year":"2025","filing_type":"10-K","query":"operating income margin","top_k":5}'
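
The same search call can be made from Python. The sketch below mirrors the curl request above using only the standard library. The helper names are hypothetical, and the response is assumed to be JSON; its exact shape is not documented here.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8081"  # host and port taken from the curl examples


def build_post(path, payload):
    """Build a JSON POST request without sending it."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(path, payload, timeout=30):
    """Send the request and decode the body, assuming a JSON response."""
    with urllib.request.urlopen(build_post(path, payload), timeout=timeout) as resp:
        return json.load(resp)


search_request = build_post(
    "/vector_store/search_sec_filings",
    {
        "ticker": "AMZN",
        "year": "2025",
        "filing_type": "10-K",
        "query": "operating income margin",
        "top_k": 5,
    },
)

if __name__ == "__main__":
    # Requires the API to be running (make start-server).
    print(post_json("/vector_store/search_sec_filings", json.loads(search_request.data)))
```

The same helpers work for the embed and transcript endpoints by changing the path and payload.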

Earnings transcripts

Fetch quarterly transcripts:

uv run python -m finance_data.earnings_transcripts.transcripts AMZN 2025

Embed + hybrid search transcripts:

curl -s -X POST "http://127.0.0.1:8081/vector_store/embed_transcripts" \
  -H "Content-Type: application/json" \
  -d '{"ticker":"AMZN","year":"2025","force":false}'

curl -s -X POST "http://127.0.0.1:8081/vector_store/search_transcripts" \
  -H "Content-Type: application/json" \
  -d '{"ticker":"AMZN","year":"2025","query":"AWS revenue growth","top_k":5}'

Docker

Use Makefile wrappers:

make docker-build
make docker-start

Stop/remove by API port:

make docker-stop
make docker-remove

Documentation

  • docs/README.md
  • docs/setup-and-operations.md