Academia MCP
Search for scientific publications across ArXiv, ACL Anthology, HuggingFace Datasets, and Semantic Scholar.
MCP server with tools to search, fetch, analyze, and report on scientific papers and datasets.
Features
- ArXiv search and download
- ACL Anthology search
- Hugging Face datasets search
- Semantic Scholar citations and references
- Web search via Exa, Brave, or Tavily
- Web page crawler, LaTeX compilation, PDF reading
- Optional LLM-powered tools for document QA and research proposal workflows
Requirements
- Python 3.12+
Install
- Using pip (end users):
pip3 install academia-mcp
- For development (uv + Makefile):
uv venv .venv
make install
Quickstart
- Run over HTTP (default transport):
python -m academia_mcp --transport streamable-http
# OR
uv run -m academia_mcp --transport streamable-http
- Run over stdio (for local MCP clients like Claude Desktop):
python -m academia_mcp --transport stdio
# OR
uv run -m academia_mcp --transport stdio
Notes:
- Transports: stdio, sse, streamable-http.
- host/port are used for HTTP transports; ignored for stdio.
- Default port is 5056 (or PORT).
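The port and transport rules in the notes can be sketched as follows. This is only an illustration of the documented behaviour, not the server's actual code; the function names resolve_port and uses_host_and_port are hypothetical.

```python
import os

DEFAULT_PORT = 5056  # default stated in the notes above


def resolve_port(env=None):
    """Return the value of PORT if it is set, otherwise the documented default."""
    env = os.environ if env is None else env
    raw = env.get("PORT", "").strip()
    return int(raw) if raw else DEFAULT_PORT


def uses_host_and_port(transport):
    """host/port only matter for the HTTP transports; stdio ignores them."""
    return transport in {"sse", "streamable-http"}
```

For example, resolve_port({"PORT": "8080"}) yields 8080, while an empty environment falls back to 5056.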
Authentication
Academia MCP supports optional token-based authentication for HTTP transports (streamable-http and sse). Authentication is disabled by default to maintain backward compatibility.
Enabling Authentication
Set the ENABLE_AUTH environment variable to true:
export ENABLE_AUTH=true
export TOKENS_FILE=/path/to/tokens.json # Optional, defaults to ./tokens.json
Managing Tokens
Issue a new token:
academia_mcp auth issue-token --client-id=my-client --description="Production API client"
# Issue token with 30-day expiration
academia_mcp auth issue-token --client-id=test-client --expires-days=30
# Issue token with custom scopes
academia_mcp auth issue-token --client-id=admin --scopes="read,write,admin"
List active tokens:
academia_mcp auth list-tokens
Revoke a token:
academia_mcp auth revoke-token mcp_a1b2c3d4e5f6...
Using Tokens
Include the token in the Authorization header with the Bearer scheme or as a query parameter apiKey.
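A minimal sketch of both ways to attach a token, using only the Python standard library. The server URL (including the /mcp path) and the token value are placeholders assumed for illustration, and the helper names are hypothetical. Nothing is sent over the network; the code only builds the request and the URL.

```python
import urllib.parse
import urllib.request

# Assumptions: placeholder URL and token, not values defined by Academia MCP.
SERVER_URL = "http://localhost:5056/mcp"
TOKEN = "mcp_your_token_here"


def with_bearer_header(url, token):
    """Build a request that carries the token in the Authorization header."""
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def with_api_key_param(url, token):
    """Build a URL that carries the token in the apiKey query parameter."""
    parts = urllib.parse.urlsplit(url)
    query = urllib.parse.parse_qsl(parts.query, keep_blank_values=True)
    query.append(("apiKey", token))
    return urllib.parse.urlunsplit(parts._replace(query=urllib.parse.urlencode(query)))


request = with_bearer_header(SERVER_URL, TOKEN)
url_with_key = with_api_key_param(SERVER_URL, TOKEN)
```

The header form is usually preferable where the client allows it, since query parameters tend to end up in access logs and browser history.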
Security Notes:
- Tokens are displayed only once during issuance. Store them securely.
- Use HTTPS in production to protect tokens in transit.
- The tokens.json file is automatically created with restrictive permissions (mode 600).
- Tokens are stored in plaintext (standard practice for bearer tokens) - protect the tokens file.
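If the tokens file is copied or restored from a backup, its permissions may end up looser than mode 600. The sketch below shows one way to check and tighten them on a POSIX system. It runs against a throwaway file standing in for tokens.json (the "{}" content is only a placeholder), and the helper names are hypothetical.

```python
import os
import stat
import tempfile


def is_owner_only(path):
    """True when group and others have no permissions on the file."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & 0o077) == 0


def restrict_to_owner(path):
    """Set the file to mode 600 (owner read/write only)."""
    os.chmod(path, 0o600)


# Demo on a temporary stand-in for tokens.json.
with tempfile.TemporaryDirectory() as workdir:
    demo_path = os.path.join(workdir, "tokens.json")
    with open(demo_path, "w", encoding="utf-8") as handle:
        handle.write("{}")
    os.chmod(demo_path, 0o644)  # simulate a file that became too permissive
    was_private = is_owner_only(demo_path)
    restrict_to_owner(demo_path)
    now_private = is_owner_only(demo_path)
```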
Claude Desktop config
{
"mcpServers": {
"academia": {
"command": "python3",
"args": [
"-m",
"academia_mcp",
"--transport",
"stdio"
]
}
}
}
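If Claude Desktop already has other servers configured, the academia entry has to be merged into the existing mcpServers object rather than replacing the file. A small sketch of that merge, operating on an in-memory dictionary; the function name and the "other" server in the usage line are hypothetical, and the location of the config file is left out because it depends on the platform.

```python
import json

# The entry shown in the config above.
ACADEMIA_ENTRY = {
    "command": "python3",
    "args": ["-m", "academia_mcp", "--transport", "stdio"],
}


def add_academia_server(config):
    """Return a copy of `config` with the academia entry added under mcpServers."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["academia"] = ACADEMIA_ENTRY
    merged["mcpServers"] = servers
    return merged


existing = {"mcpServers": {"other": {"command": "other-server"}}}
print(json.dumps(add_academia_server(existing), indent=2))
```

The input dictionary is not modified, so the original config can be kept as a backup before writing the merged result.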
Available tools (one-liners)
- arxiv_search: Query arXiv with field-specific queries and filters.
- arxiv_download: Fetch a paper by ID and convert to structured text (HTML/PDF modes).
- anthology_search: Search ACL Anthology with fielded queries and optional date filtering.
- hf_datasets_search: Find Hugging Face datasets with filters and sorting.
- s2_get_citations: List papers citing a given arXiv paper (Semantic Scholar Graph).
- s2_get_references: List papers referenced by a given arXiv paper.
- visit_webpage: Fetch and normalize a web page.
- web_search: Unified search wrapper; available when at least one of Exa/Brave/Tavily keys is set.
- exa_web_search, brave_web_search, tavily_web_search: Provider-specific search.
- get_latex_templates_list, get_latex_template: Enumerate and fetch built-in LaTeX templates.
- compile_latex: Compile LaTeX to PDF in WORKSPACE_DIR.
- read_pdf: Extract text per page from a PDF.
- download_pdf_paper, review_pdf_paper: Download and optionally review PDFs (requires LLM + workspace).
- document_qa: Answer questions over provided document chunks (requires LLM).
- extract_bitflip_info, generate_research_proposals, score_research_proposals: Research proposal helpers (requires LLM).
Availability notes:
- Set WORKSPACE_DIR to enable compile_latex, read_pdf, download_pdf_paper, and review_pdf_paper.
- Set OPENROUTER_API_KEY to enable LLM tools (document_qa, review_pdf_paper, and bitflip tools).
- Set one or more of EXA_API_KEY, BRAVE_API_KEY, TAVILY_API_KEY to enable web_search and provider tools.
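The availability rules above can be written down as a small check to run before starting the server. This is one reading of the notes, not the server's own logic: in particular it assumes review_pdf_paper needs both WORKSPACE_DIR and OPENROUTER_API_KEY, while download_pdf_paper needs only WORKSPACE_DIR. The function name is hypothetical.

```python
import os

WEB_PROVIDERS = ("EXA", "BRAVE", "TAVILY")


def enabled_tool_groups(env):
    """Map each tool group from the notes to whether `env` enables it."""
    has_workspace = bool(env.get("WORKSPACE_DIR"))
    has_llm = bool(env.get("OPENROUTER_API_KEY"))
    # Providers whose API key is present, e.g. ["exa", "tavily"].
    providers = [p.lower() for p in WEB_PROVIDERS if env.get(f"{p}_API_KEY")]
    return {
        "compile_latex, read_pdf, download_pdf_paper": has_workspace,
        "review_pdf_paper": has_workspace and has_llm,
        "document_qa and bitflip tools": has_llm,
        "web_search": bool(providers),
        "provider tools": providers,
    }


if __name__ == "__main__":
    for group, state in enabled_tool_groups(os.environ).items():
        print(f"{group}: {state}")
```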
Environment variables
Set as needed, depending on which tools you use:
- OPENROUTER_API_KEY: required for LLM-related tools.
- BASE_URL: override OpenRouter base URL.
- DOCUMENT_QA_MODEL_NAME: override default model for document_qa.
- BITFLIP_MODEL_NAME: override default model for bitflip tools.
- TAVILY_API_KEY: enables Tavily in web_search.
- EXA_API_KEY: enables Exa in web_search and visit_webpage.
- BRAVE_API_KEY: enables Brave in web_search.
- WORKSPACE_DIR: directory for generated files (PDFs, temp artifacts).
- PORT: HTTP port (default 5056).
You can put these in a .env file in the project root.
Docker
Build the image:
docker build -t academia_mcp .
Run the server (HTTP):
docker run --rm -p 5056:5056 \
-e PORT=5056 \
-e OPENROUTER_API_KEY=your_key_here \
-e WORKSPACE_DIR=/workspace \
-v "$PWD/workdir:/workspace" \
academia_mcp
Or use the existing image: phoenix120/academia_mcp
Examples
Makefile targets
- make install: install the package in editable mode with uv
- make validate: run black, flake8, and mypy (strict)
- make test: run the test suite with pytest
- make publish: build and publish using uv
LaTeX/PDF requirements
Only needed for LaTeX/PDF tools. Ensure a LaTeX distribution is installed and that both pdflatex and latexmk are on PATH. On Debian/Ubuntu:
sudo apt install texlive-latex-base texlive-fonts-recommended texlive-latex-extra texlive-science latexmk