Harvests scientific papers from arXiv and OpenAlex, providing real-time access to metadata and full text.
A comprehensive Model Context Protocol (MCP) server that provides LLMs with real-time access to scientific papers from 6 major academic sources: arXiv, OpenAlex, PMC (PubMed Central), Europe PMC, bioRxiv/medRxiv, and CORE.
npm install
npm run build
To use this server with an MCP client (like Claude Desktop), add the following to your MCP client configuration:
Option 1: Using npx (recommended for AI tools like Claude)
{
"mcpServers": {
"scientific-papers": {
"command": "npx",
"args": [
"-y",
"@futurelab-studio/latest-science-mcp@latest"
]
}
}
}
Option 2: Global installation
npm install -g @futurelab-studio/latest-science-mcp
Then configure:
{
"mcpServers": {
"scientific-papers": {
"command": "latest-science-mcp"
}
}
}
# List arXiv categories
node dist/cli.js list-categories --source=arxiv
# List OpenAlex concepts
node dist/cli.js list-categories --source=openalex
# List PMC biomedical categories
node dist/cli.js list-categories --source=pmc
# List Europe PMC life science categories
node dist/cli.js list-categories --source=europepmc
# List bioRxiv/medRxiv categories (includes both servers)
node dist/cli.js list-categories --source=biorxiv
# List CORE academic categories
node dist/cli.js list-categories --source=core
# Get latest AI papers from arXiv
node dist/cli.js fetch-latest --source=arxiv --category=cs.AI --count=10
# Get latest biology papers from bioRxiv
node dist/cli.js fetch-latest --source=biorxiv --category="biorxiv:biology" --count=5
# Get latest immunology papers from PMC
node dist/cli.js fetch-latest --source=pmc --category=immunology --count=3
# Get latest papers from CORE by subject
node dist/cli.js fetch-latest --source=core --category=computer_science --count=5
# Search by concept name (OpenAlex)
node dist/cli.js fetch-latest --source=openalex --category="machine learning" --count=3
# Get top 20 cited papers in machine learning since 2024
node dist/cli.js fetch-top-cited --concept="machine learning" --since=2024-01-01 --count=20
# Get top cited papers by concept ID
node dist/cli.js fetch-top-cited --concept=C41008148 --since=2023-06-01 --count=10
# Search by keywords across all fields
node dist/cli.js search-papers --source=arxiv --query="machine learning" --count=10
# Search by paper title
node dist/cli.js search-papers --source=openalex --query="neural networks" --field=title --count=5
# Search by author name
node dist/cli.js search-papers --source=europepmc --query="John Smith" --field=author --count=10
# Search full-text content sorted by citations
node dist/cli.js search-papers --source=core --query="climate change" --field=fulltext --sortBy=citations --count=20
# Get arXiv paper by ID
node dist/cli.js fetch-content --source=arxiv --id=2401.12345
# Get bioRxiv paper by DOI
node dist/cli.js fetch-content --source=biorxiv --id="10.1101/2021.01.01.425001"
# Get PMC paper by ID
node dist/cli.js fetch-content --source=pmc --id=PMC8245678
# Get CORE paper by ID
node dist/cli.js fetch-content --source=core --id=12345678
# Show text content with preview
node dist/cli.js fetch-content --source=arxiv --id=2401.12345 --show-text --text-preview=500
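The fetch-content examples above pass a differently shaped ID for each source. As a rough client-side sanity check, the sketch below matches an ID against patterns derived only from the example IDs in this README. The `ID_PATTERNS` table and `looksLikeValidId` helper are hypothetical and not part of the server. The server's own validation may accept more forms than these.

```typescript
// Hypothetical helper: patterns are inferred from the README's example IDs only.
type Source = "arxiv" | "openalex" | "pmc" | "europepmc" | "biorxiv" | "core";

const ID_PATTERNS: Record<Source, RegExp[]> = {
  // New-style "2401.12345" / "1234.5678v2" and old-style "cs/0601001"
  arxiv: [/^\d{4}\.\d{4,5}(v\d+)?$/, /^[a-z-]+(\.[A-Z]{2})?\/\d{7}(v\d+)?$/],
  // "W2741809807" or the bare numeric part
  openalex: [/^W?\d+$/],
  // "PMC8245678" or the bare numeric part
  pmc: [/^(PMC)?\d+$/],
  // PMC-style IDs or a DOI
  europepmc: [/^(PMC)?\d+$/, /^10\.\d{4,9}\/\S+$/],
  // Full DOI "10.1101/2021.01.01.425001" or just its suffix
  biorxiv: [/^(10\.1101\/)?\d{4}\.\d{2}\.\d{2}\.\d+$/],
  // Numeric CORE ID
  core: [/^\d+$/],
};

function looksLikeValidId(source: Source, id: string): boolean {
  const candidate = id.trim();
  return ID_PATTERNS[source].some((pattern) => pattern.test(candidate));
}

console.log(looksLikeValidId("arxiv", "cs/0601001"));
console.log(looksLikeValidId("core", "PMC8245678"));
```

A check like this is only useful for catching an ID passed to the wrong source before a network call is made. It should not be treated as authoritative.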
list_categories
Lists available categories/concepts from any data source.
Parameters:
source: "arxiv" | "openalex" | "pmc" | "europepmc" | "biorxiv" | "core"
Returns:
A list of categories, each with id, name, and an optional description
Examples:
{
"name": "list_categories",
"arguments": {
"source": "biorxiv"
}
}
fetch_latest
Fetches the latest papers from any source for a given category with metadata only (no text extraction).
Parameters:
source: "arxiv" | "openalex" | "pmc" | "europepmc" | "biorxiv" | "core"
category: Category ID or concept name (varies by source)
count: Number of papers to fetch (default: 50, max: 200)
Category Examples by Source:
arXiv: "cs.AI", "physics.gen-ph", "math.CO"
OpenAlex: "artificial intelligence", "machine learning", "C41008148"
PMC: "immunology", "genetics", "neuroscience"
Europe PMC: "biology", "medicine", "cancer"
bioRxiv/medRxiv: "biorxiv:neuroscience", "medrxiv:psychiatry"
CORE: "computer_science", "mathematics", "physics"
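As an illustration of the parameters above, here is a minimal sketch of a client-side helper that assembles a fetch_latest tool call. The `buildFetchLatest` function is hypothetical and not part of the server. It clamps count into the documented range, which is an assumption about what a client might want. How the server itself treats out-of-range values is not stated here.

```typescript
// Hypothetical helper: not part of the server's published API.
type Source = "arxiv" | "openalex" | "pmc" | "europepmc" | "biorxiv" | "core";

interface FetchLatestCall {
  name: "fetch_latest";
  arguments: { source: Source; category: string; count: number };
}

const DEFAULT_COUNT = 50; // documented default
const MAX_COUNT = 200; // documented maximum

function buildFetchLatest(
  source: Source,
  category: string,
  count: number = DEFAULT_COUNT
): FetchLatestCall {
  const trimmed = category.trim();
  if (trimmed === "") {
    throw new Error("category must not be empty");
  }
  // Assumption: fall back to the default for non-finite input, round down,
  // and keep the value between 1 and the documented maximum.
  const safeCount = Number.isFinite(count)
    ? Math.min(MAX_COUNT, Math.max(1, Math.floor(count)))
    : DEFAULT_COUNT;
  return {
    name: "fetch_latest",
    arguments: { source, category: trimmed, count: safeCount },
  };
}

// bioRxiv/medRxiv categories carry a server prefix, as in the examples above.
console.log(JSON.stringify(buildFetchLatest("biorxiv", "biorxiv:neuroscience", 5), null, 2));
```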
Returns:
Papers with metadata only. The text field is empty (text: ""). Use fetch_content for full text.
fetch_top_cited
Fetches the top cited papers from OpenAlex for a given concept since a specific date.
Parameters:
concept: Concept name or OpenAlex concept ID
since: Start date in YYYY-MM-DD format
count: Number of papers to fetch (default: 50, max: 200)
search_papers
Searches for papers across multiple academic sources with field-specific search and sorting options.
Parameters:
source: "arxiv" | "openalex" | "europepmc" | "core"
query: Search query string (max 1500 characters)
field: "all" | "title" | "abstract" | "author" | "fulltext" (default: "all")
count: Number of results to return (default: 50, max: 200)
sortBy: "relevance" | "date" | "citations" (default: "relevance")
Search Capabilities by Source:
Example Queries:
Keywords: "machine learning", "climate change"
Exact phrase: "artificial intelligence" (use quotes for exact phrases)
Boolean operators: "deep learning AND neural networks" (arXiv supports this)
Author: "John Smith", "Smith J"
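To make the documented limits and defaults concrete, the sketch below validates search_papers arguments before sending them. `buildSearchPapers` is a hypothetical client-side helper. Rejecting bad input locally, rather than letting the server respond, is an assumption made for illustration. Note that the source type covers only the four sources listed for this tool.

```typescript
// Hypothetical helper: not part of the server's published API.
type SearchSource = "arxiv" | "openalex" | "europepmc" | "core";
type SearchField = "all" | "title" | "abstract" | "author" | "fulltext";
type SortBy = "relevance" | "date" | "citations";

interface SearchPapersArgs {
  source: SearchSource;
  query: string;
  field?: SearchField;
  count?: number;
  sortBy?: SortBy;
}

const MAX_QUERY_LENGTH = 1500; // documented maximum

function buildSearchPapers(args: SearchPapersArgs) {
  const query = args.query.trim();
  if (query.length === 0) {
    throw new Error("query must not be empty");
  }
  if (query.length > MAX_QUERY_LENGTH) {
    throw new Error(`query exceeds ${MAX_QUERY_LENGTH} characters`);
  }
  const count = args.count ?? 50; // documented default
  if (!Number.isInteger(count) || count < 1 || count > 200) {
    throw new Error("count must be an integer from 1 to 200");
  }
  return {
    name: "search_papers" as const,
    arguments: {
      source: args.source,
      query,
      field: args.field ?? "all", // documented default
      count,
      sortBy: args.sortBy ?? "relevance", // documented default
    },
  };
}

console.log(
  JSON.stringify(
    buildSearchPapers({ source: "core", query: "climate change", field: "fulltext", sortBy: "citations", count: 20 }),
    null,
    2
  )
);
```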
Returns:
Matching papers with an empty text field (text: ""). Use fetch_content for full text.
fetch_content
Fetches full metadata and text content for a specific paper by ID with complete text extraction.
Parameters:
source: Any of the 6 supported sources
id: Paper ID (format varies by source)
ID Formats by Source:
arXiv: "2401.12345", "cs/0601001", "1234.5678v2"
OpenAlex: "W2741809807" or numeric 2741809807
PMC: "PMC8245678" or "12345678"
Europe PMC: "PMC8245678", "12345678", or DOI
bioRxiv/medRxiv: "10.1101/2021.01.01.425001" or "2021.01.01.425001"
CORE: "12345678"
All tools return paper objects with the following structure:
{
id: string; // Paper ID
title: string; // Paper title
authors: string[]; // List of author names
date: string; // Publication date (ISO format)
pdf_url?: string; // PDF URL (if available)
text: string; // Extracted full text content
textTruncated?: boolean; // Warning: text was truncated due to size limits
textExtractionFailed?: boolean; // Warning: text extraction failed
}
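Because text can be empty, truncated, or missing after a failed extraction, a consumer will usually want to check the two warning flags before relying on it. The sketch below restates the structure above as an interface and adds a hypothetical `textStatus` helper. The status names and the order of the checks are illustrative choices, not something the server defines.

```typescript
// Mirrors the paper structure documented above.
interface Paper {
  id: string;
  title: string;
  authors: string[];
  date: string;
  pdf_url?: string;
  text: string;
  textTruncated?: boolean;
  textExtractionFailed?: boolean;
}

// Hypothetical classification of how much the text field can be trusted.
type TextStatus = "failed" | "empty" | "truncated" | "complete";

function textStatus(paper: Paper): TextStatus {
  if (paper.textExtractionFailed) return "failed";
  // Metadata-only results (for example from fetch_latest) carry text: "".
  if (paper.text.length === 0) return "empty";
  if (paper.textTruncated) return "truncated";
  return "complete";
}

const example: Paper = {
  id: "2401.12345",
  title: "Example title",
  authors: ["A. Author"],
  date: "2024-01-22",
  text: "",
};

console.log(textStatus(example));
```

A caller could use the "empty" result as the cue to follow up with fetch_content for that paper's ID.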
Each source has specialized text extraction approaches:
arXiv: arxiv.org/html, with ar5iv.labs.arxiv.org as a fallback
An advanced DOI resolver applies multiple fallback strategies.
API usage is rate limited per source to stay respectful of each provider.
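One common way to implement per-source rate limiting is to enforce a minimum interval between requests to the same source. The sketch below shows that idea only. The `SourceRateLimiter` class, the interval values, and the one-second default are all hypothetical and do not reflect the limits this server actually applies.

```typescript
// Hypothetical sketch of minimum-interval rate limiting, keyed by source name.
class SourceRateLimiter {
  private readonly minIntervalMs: Record<string, number | undefined>;
  private readonly now: () => number;
  private readonly nextAllowed = new Map<string, number>();

  constructor(
    minIntervalMs: Record<string, number | undefined>,
    now: () => number = () => Date.now()
  ) {
    this.minIntervalMs = minIntervalMs;
    this.now = now;
  }

  // Reserves the next request slot for a source and returns how many
  // milliseconds the caller should wait before sending that request.
  reserve(source: string): number {
    const interval = this.minIntervalMs[source] ?? 1000; // assumed default
    const current = this.now();
    const slot = Math.max(current, this.nextAllowed.get(source) ?? 0);
    this.nextAllowed.set(source, slot + interval);
    return slot - current;
  }
}

// Illustrative intervals only.
const limiter = new SourceRateLimiter({ arxiv: 3000, core: 500 });
console.log(limiter.reserve("arxiv"));
console.log(limiter.reserve("core"));
```

A caller would typically sleep for the returned delay (for example with a `setTimeout`-based promise) before issuing the request, so that bursts to one source queue up without slowing requests to the others.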
For enhanced CORE access, set environment variable:
export CORE_API_KEY="your-api-key"
# Run all tests
npm test
# Run integration tests
npm run test -- tests/integration
# Run end-to-end workflow tests
npm run test -- tests/e2e
# Run performance benchmarks
npm run test -- tests/integration/performance.test.ts
Source | Papers | Disciplines | Full-Text | Citation Data | Preprints | Search |
---|---|---|---|---|---|---|
arXiv | 2.3M+ | STEM | HTML โ | Limited | โ | โโโ |
OpenAlex | 200M+ | All | Variable | โโโ | โ | โโโ |
PMC | 7M+ | Biomedical | XML/HTML โ | Limited | โ | Limited |
Europe PMC | 40M+ | Life Sciences | HTML โ | Limited | โ | โโโ |
bioRxiv/medRxiv | 500K+ | Bio/Medical | HTML โ | Limited | โโโ | Limited |
CORE | 200M+ | All | PDF/HTML โ | Limited | โ | โโโ |
npm run build
# Test specific sources
node dist/cli.js list-categories --source=arxiv
node dist/cli.js fetch-latest --source=biorxiv --category="biorxiv:biology" --count=3
node dist/cli.js fetch-content --source=core --id=12345678
# Test search functionality
node dist/cli.js search-papers --source=arxiv --query="artificial intelligence" --count=5
node dist/cli.js search-papers --source=openalex --query="quantum computing" --field=title --count=3
# Run performance benchmarks
npm run test -- tests/integration/performance.test.ts
# Test memory usage
npm run test -- --reporter=verbose
Error handling is comprehensive across all sources.
Set the CORE_API_KEY environment variable for enhanced CORE access.
Choose count parameters deliberately (smaller for faster responses).
Use fetch_latest for discovery, fetch_content for detailed reading.
License: MIT
Ready to explore the world's scientific knowledge? Start with any of the 6 sources and discover papers across all academic disciplines!