arxiv-mcp-server

arXiv paper search and full-text reading

@cyanheads/arxiv-mcp-server

Search arXiv, fetch paper metadata, and read full-text content via MCP. STDIO & Streamable HTTP.

4 Tools · 2 Resources

npm Docker Version Framework

MCP SDK License TypeScript

Public Hosted Server: https://arxiv.caseyjhand.com/mcp


Tools

Four tools for searching and reading arXiv papers:

Tool NameDescription
arxiv_searchSearch arXiv papers by query with category and sort filters.
arxiv_get_metadataGet full metadata for one or more arXiv papers by ID.
arxiv_read_paperFetch the full text content of an arXiv paper from its HTML rendering.
arxiv_list_categoriesList arXiv category taxonomy, optionally filtered by group.

arxiv_search

Search for papers using free-text queries with field prefixes and boolean operators.

  • Field prefixes: ti: (title), au: (author), abs: (abstract), cat: (category), all: (all fields)
  • Boolean operators: AND, OR, ANDNOT
  • Optional category filter, sorting (relevance, submitted, updated), and pagination
  • Returns up to 50 results per request with full metadata including abstract

arxiv_get_metadata

Fetch full metadata for one or more papers by known arXiv ID.

  • Batch fetch up to 10 papers in a single request
  • Accepts both versioned (2401.12345v2) and unversioned (2401.12345) IDs
  • Legacy ID format supported (hep-th/9901001)
  • Reports not-found IDs separately from found papers

arxiv_read_paper

Read the full HTML content of an arXiv paper.

  • Tries native arXiv HTML first, falls back to ar5iv for broader coverage
  • Strips HTML head/boilerplate before truncation so the character budget targets paper content
  • max_characters defaults to 100,000; raw HTML can be 500KB-3MB+ for math-heavy papers
  • Returns raw HTML — no parsing or extraction; the LLM interprets content directly

arxiv_list_categories

List arXiv category codes and names for discovery.

  • ~155 categories across 8 top-level groups (cs, math, physics, q-bio, q-fin, stat, eess, econ)
  • Optional group filter to narrow results
  • Static data — always succeeds

Resources

URI PatternDescription
arxiv://paper/{paperId}Paper metadata by arXiv ID.
arxiv://categoriesFull arXiv category taxonomy.

Features

Built on @cyanheads/mcp-ts-core:

  • Declarative tool definitions — single file per tool, framework handles registration and validation
  • Unified error handling across all tools
  • Pluggable auth (none, jwt, oauth)
  • Structured logging with optional OpenTelemetry tracing
  • Runs locally (stdio/HTTP) from the same codebase

arXiv-specific:

  • Read-only, no authentication required — arXiv API is free, metadata is CC0
  • Rate-limited request queue enforcing arXiv's 3-second crawl delay
  • Retry with exponential backoff for transient failures
  • HTML content fallback chain: native arXiv HTML → ar5iv
  • Full arXiv category taxonomy embedded as static data

Getting Started

Public Hosted Instance

A public instance is available at https://arxiv.caseyjhand.com/mcp — no installation required. Point any MCP client at it via Streamable HTTP:

{
  "mcpServers": {
    "arxiv": {
      "type": "streamable-http",
      "url": "https://arxiv.caseyjhand.com/mcp"
    }
  }
}

Self-Hosted / Local

Add to your MCP client config (e.g., claude_desktop_config.json):

{
  "mcpServers": {
    "arxiv": {
      "type": "stdio",
      "command": "bunx",
      "args": ["@cyanheads/arxiv-mcp-server@latest"]
    }
  }
}

Prerequisites

Installation

  1. Clone the repository:
git clone https://github.com/cyanheads/arxiv-mcp-server.git
  1. Navigate into the directory:
cd arxiv-mcp-server
  1. Install dependencies:
bun install

Configuration

All configuration is optional — the server works out of the box with sensible defaults.

VariableDescriptionDefault
ARXIV_API_BASE_URLarXiv API base URL.https://export.arxiv.org/api
ARXIV_REQUEST_DELAY_MSMinimum delay between arXiv API requests (ms).3000
ARXIV_CONTENT_TIMEOUT_MSTimeout for HTML content fetches (ms).30000
ARXIV_API_TIMEOUT_MSTimeout for API search/metadata requests (ms).15000
MCP_TRANSPORT_TYPETransport: stdio or http.stdio
MCP_HTTP_PORTPort for HTTP server.3010
MCP_AUTH_MODEAuth mode: none, jwt, or oauth.none
MCP_LOG_LEVELLog level (RFC 5424).info

Running the Server

Local Development

  • Build and run the production version:

    bun run build
    bun run start:http   # or start:stdio
    
  • Run in development mode:

    bun run dev:stdio    # or dev:http
    
  • Run checks and tests:

    bun run devcheck     # Lints, formats, type-checks
    bun run test         # Runs test suite
    

Docker

docker build -t arxiv-mcp-server .
docker run -p 3010:3010 arxiv-mcp-server

Project Structure

DirectoryPurpose
src/mcp-server/tools/definitions/Tool definitions (*.tool.ts).
src/mcp-server/resources/definitions/Resource definitions (*.resource.ts).
src/services/Domain service integrations (ArxivService).
src/config/Environment variable parsing and validation with Zod.
tests/Unit and integration tests.
docs/Design document and directory structure.

Development Guide

See CLAUDE.md for development guidelines and architectural rules. The short version:

  • Handlers throw, framework catches — no try/catch in tool logic
  • Use ctx.log for domain-specific logging
  • Rate limiting is managed by ArxivService — don't add per-tool delays
  • arXiv API returns HTTP 200 for everything — check content-type and response body

Contributing

Issues and pull requests are welcome. Run checks before submitting:

bun run devcheck
bun test

License

Apache-2.0 — see LICENSE for details.

İlgili Sunucular