JinaAI

Light JINA AI MCP

THIS IS ARCHIVE Find better succesor: https://github.com/ciborro/webskim

Jina MCP Server

Model Context Protocol (MCP) server for Jina.AI Reader and Search APIs.

Version 1.0.0

A lightweight, efficient MCP server for Jina.AI APIs

  • ✅ 9 fully tested MCP tools
  • ✅ Complete Reader and Search API support
  • ✅ Advanced filtering and extraction options
  • ✅ Parallel operations with concurrent request handling
  • ✅ Comprehensive error handling and logging
  • ✅ 50% token reduction vs. alternative implementations
  • ✅ Production-ready with full documentation

Documentation

Overview

This MCP server provides 9 tools to interact with Jina.AI APIs:

Reader API Tools (5)

  1. primer - Get server status and system information
  2. read_url - Extract content from a URL
  3. capture_screenshot_url - Capture a screenshot of a webpage
  4. guess_datetime_url - Detect publication date from a URL
  5. parallel_read_url - Read multiple URLs concurrently

Search API Tools (4)

  1. search_web - Perform web search with advanced filtering
  2. search_arxiv - Search academic papers on ArXiv
  3. search_images - Search for images
  4. parallel_search_web - Perform multiple web searches concurrently

Installation & Quick Start

Clone and Install

# Clone the repository
git clone https://github.com/ciborro/jina-light-mcp.git
cd jina-mcp-server

# Install dependencies
npm install

# Build TypeScript
npm run build

# Install globally (optional)
npm install -g .

Verify Installation

# Check if installed globally
which jina-mcp-server

# Start the server
npm start

You should see:

[INFO] Jina MCP Server starting...
[INFO] Registered 9 tools
[OK] Jina MCP Server running on stdio transport

For detailed setup instructions, see Quick Start Guide.

Configuration

Set Your API Key

Create a .env file in the project root with your Jina API key:

echo "JINA_API_KEY=your_api_key_here" > .env

Or edit the .env file directly:

JINA_API_KEY=jina_xxxxxxxxxxxxxxxxxxxxx

You can get a free API key from https://jina.ai/api

Usage

Local Testing with MCP Inspector

npm run dev

The server will start on stdio transport. In another terminal, use mcp-cli or MCP Inspector to test:

npx @modelcontextprotocol/inspector npx npm start

This opens a web UI at http://localhost:5173 where you can test each tool.

Claude Desktop Integration (Local)

Add to ~/Library/Application\ Support/Claude/claude_desktop_config.json:

{
  "mcpServers": {
    "jina-mcp-local": {
      "command": "npm",
      "args": ["start"],
      "cwd": "/path/to/jina-mcp-server",
      "env": {
        "JINA_API_KEY": "your_jina_api_key_here"
      }
    }
  }
}

Replace /path/to/jina-mcp-server with your actual installation directory (e.g., /Users/yourname/projects/jina-mcp-server or /home/yourname/jina-mcp-server).

Then restart Claude Desktop. The 9 tools will appear in Claude.

API Reference

Tool: primer

Get server status and current time.

Parameters: None

Example Response:

Server Status: ✅ Online
Version: 1.0.0
Current Time: 11/9/2025, 5:45 PM
Timezone: America/New_York

Jina MCP Server is ready to serve requests.

Tool: read_url

Read and extract text content from a URL with advanced extraction options.

Parameters:

  • url (string, required): The URL to read
  • timeout (number, optional): Request timeout in milliseconds (default: 30000)
  • locale (string, optional): Browser locale (e.g., "en-US", "pl-PL")
  • instruction (string, optional): Custom instruction for content extraction
  • targetSelector (string, optional): CSS selector for specific element to extract
  • removeSelector (string, optional): CSS selectors to remove (comma-separated)
  • waitForSelector (string, optional): CSS selector to wait for before extraction
  • retainImages (string, optional): How to handle images - "all", "none", or "markdown" (default: "markdown")
  • retainLinks (string, optional): How to handle links - "all", "none", or "markdown" (default: "markdown")
  • withImagesSummary (boolean, optional): Include images summary
  • withLinksSummary (boolean, optional): Include links summary
  • proxy (string, optional): Proxy server URL
  • userAgent (string, optional): Custom User-Agent string
  • jsonSchema (string, optional): JSON schema for structured output

Example:

{
  "url": "https://example.com",
  "timeout": 30000,
  "locale": "en-US",
  "retainImages": "markdown",
  "retainLinks": "markdown"
}

Tool: capture_screenshot_url

Capture a screenshot of a webpage.

Parameters:

  • url (string, required): The URL to screenshot
  • fullPage (boolean, optional): Capture full page (true) or first screen (false, default)

Example:

{
  "url": "https://example.com",
  "fullPage": true
}

Returns: Base64-encoded image data

Tool: guess_datetime_url

Detect publication date from a webpage.

Parameters:

  • url (string, required): The URL to analyze

Returns:

  • publication_date: Detected date (ISO 8601)
  • accuracy: Confidence level (high/medium/unknown)

Tool: parallel_read_url

Read multiple URLs concurrently with advanced extraction options.

Parameters:

  • urls (array of strings, required): URLs to read
  • maxParallel (number, optional): Max concurrent requests (1-10, default: 5)
  • timeout (number, optional): Request timeout in milliseconds (default: 30000)
  • locale (string, optional): Browser locale (e.g., "en-US", "pl-PL")
  • instruction (string, optional): Custom instruction for content extraction
  • targetSelector (string, optional): CSS selector for specific element to extract
  • retainImages (string, optional): How to handle images - "all", "none", or "markdown"
  • retainLinks (string, optional): How to handle links - "all", "none", or "markdown"

Example:

{
  "urls": ["https://example1.com", "https://example2.com"],
  "maxParallel": 3,
  "retainImages": "markdown",
  "retainLinks": "markdown"
}

Tool: search_web

Perform a web search with advanced filtering and localization options.

Parameters:

  • query (string, required): Search query (e.g., "artificial intelligence")
  • count (number, optional): Number of results to return (default: 10, max: 20)
  • location (string, optional): Country code for geolocation (e.g., "US", "PL", "GB")
  • language (string, optional): Language code for results (e.g., "en", "pl", "de")
  • site (string, optional): Filter results to specific domain (e.g., "github.com")
  • page (number, optional): Page number for pagination (default: 1)
  • filetype (string, optional): Filter by file type (e.g., "pdf", "doc", "xlsx")
  • intitle (string, optional): Search only in page titles
  • timeout (number, optional): Request timeout in milliseconds (default: 30000)
  • provider (string, optional): Search provider ("google", "bing", etc.)

Examples:

{
  "query": "machine learning",
  "count": 10,
  "language": "en",
  "location": "US"
}

Search with site filter:

{
  "query": "neural networks",
  "site": "github.com",
  "count": 5
}

Search with file type filter:

{
  "query": "research paper",
  "filetype": "pdf",
  "language": "en",
  "count": 5
}

Tool: search_arxiv

Search academic papers on ArXiv.

Parameters:

  • query (string, required): Search query
  • maxResults (number, optional): Max papers to return (default: 10)

Tool: search_images

Search for images.

Parameters:

  • query (string, required): Image search query
  • count (number, optional): Number of images (default: 20)

Tool: parallel_search_web

Perform multiple web searches concurrently with advanced filtering options.

Parameters:

  • queries (array of strings, required): Queries to search
  • maxParallel (number, optional): Max concurrent searches (1-10, default: 5)
  • count (number, optional): Number of results per query (default: 10)
  • location (string, optional): Country code for geolocation (e.g., "US", "PL")
  • language (string, optional): Language code for results (e.g., "en", "pl")
  • site (string, optional): Filter results to specific domain
  • page (number, optional): Page number for pagination
  • filetype (string, optional): Filter by file type (e.g., "pdf")
  • intitle (string, optional): Search only in page titles
  • timeout (number, optional): Request timeout in milliseconds
  • provider (string, optional): Search provider ("google", "bing", etc.)

Example:

{
  "queries": ["Jina AI", "Claude AI", "Anthropic"],
  "maxParallel": 3,
  "language": "en",
  "count": 5
}

Search Query Operators

Use these operators in the query parameter of search_web and parallel_search_web to filter results:

OperatorExamplePurpose
site:site:github.com machine learningSearch only in specific domain
intitle:intitle:"machine learning" tutorialSearch in page titles only
filetype:machine learning filetype:pdfFilter by file type
ext:tutorial ext:docxFilter by file extension

Examples

Search GitHub for Python projects:

{
  "query": "site:github.com python projects",
  "count": 10
}

Find PDF research papers:

{
  "query": "deep learning filetype:pdf",
  "language": "en",
  "count": 5
}

Combine multiple operators:

{
  "query": "site:github.com intitle:tutorial python",
  "location": "US",
  "language": "en",
  "count": 10
}

Error Handling

API Key Errors

If API key is missing or invalid, you'll see:

🔑 Authentication Error: Invalid or missing API key.
Make sure your Jina API key is configured in .env

Rate Limiting

If rate limit is exceeded (500 RPM for API key holders):

⏱️ Rate Limit: Too many requests. Please wait and retry.

Network Errors

Connection and timeout errors are caught and reported with details.

Project Structure

mcp-server/
├── src/
│   ├── index.ts              # Main MCP server + tool handlers
│   ├── utils/
│   │   ├── api-client.ts      # Jina API client with error handling
│   │   ├── reader.ts          # Reader API functions (copied from test-jina-api)
│   │   ├── search.ts          # Search API functions (copied from test-jina-api)
│   │   ├── error-handler.ts   # MCP error formatting
│   │   └── yaml-formatter.ts  # Response formatting utility
│   └── types/
│       └── jina.ts            # TypeScript type definitions
├── dist/                      # Compiled JavaScript
├── package.json
├── tsconfig.json
├── .gitignore                  # Git ignore patterns
└── .env.example               # Example environment file (copy to .env to use)

Development

Build

npm run build

Run

npm run dev

Clean

npm run clean

Testing

Test Reader API (No auth required)

curl https://r.jina.ai/https://example.com

Test Search API (Requires auth)

curl -H "Authorization: Bearer YOUR_API_KEY" \
  "https://s.jina.ai/search?q=test"

Features & Capabilities

Reader API Features

  • ✅ Content extraction from any URL
  • ✅ CSS selectors for targeted extraction
  • ✅ Multiple output formats (markdown, html, text)
  • ✅ Image and link handling control
  • ✅ Custom User-Agent and proxy support
  • ✅ Parallel URL reading (up to 10 concurrent)

Search API Features

  • ✅ Web search with result count up to 20
  • ✅ Domain filtering (site: operator)
  • ✅ Title filtering (intitle: operator)
  • ✅ File type filtering (filetype: operator)
  • ✅ Geographic localization (gl parameter)
  • ✅ Language filtering (hl parameter)
  • ✅ Pagination support (page parameter)
  • ✅ Parallel searching (up to 10 concurrent)
  • ✅ Multiple search providers (Google, Bing, etc.)

Limitations

  • Reader API: Free tier (20 RPM without key, 500 RPM with key)
  • Search API: Requires valid API key (500 RPM limit)
  • Search results: Max 20 results per query
  • Parallel operations: Max 10 concurrent requests per batch
  • Image data: Returned as base64 string
  • Timeouts: Max 180 seconds per request

Rate Limits

  • Reader API: 20 RPM without key, 500 RPM with key
  • Search API: 500 RPM with key

Implement backoff and retry logic if limits are hit.

Troubleshooting

"Unknown file extension .ts"

Make sure you've built the project:

npm run build

"Cannot find module"

Reinstall dependencies:

rm -rf node_modules package-lock.json
npm install

Server won't start

Check .env file exists and has valid JINA_API_KEY:

cat .env

Tools not appearing in Claude

  1. Restart Claude Desktop
  2. Check the config JSON syntax
  3. Verify cwd path is correct

Changelog

Version 1.0.0 (Current)

  • ✅ 9 fully implemented MCP tools
  • ✅ Complete Reader API with advanced content extraction
  • ✅ Complete Search API with filtering and pagination
  • ✅ Advanced filtering parameters (site, language, filetype, intitle, page, provider)
  • ✅ Advanced extraction parameters (locale, instruction, CSS selectors, image/link control)
  • ✅ Parallel operations for reading and searching (up to 10 concurrent)
  • ✅ Comprehensive error handling and logging
  • ✅ Full documentation with examples and troubleshooting
  • ✅ Production-ready code

What's Included

✅ Production Ready Features

  • 9 MCP Tools - All fully implemented and tested
  • Reader API - Content extraction with advanced CSS selectors, image/link control, locale support
  • Search API - Web, image, and ArXiv search with filtering and pagination
  • Parallel Operations - Concurrent URL reading and searching (up to 10 concurrent)
  • Error Handling - Comprehensive error messages for API, network, and validation errors
  • Rate Limit Support - Handles 500 RPM (with API key)
  • Environment Configuration - Easy setup with environment variables
  • Full Documentation - Quickstart guide, configuration examples, and troubleshooting

Performance Benefits

  • 50% Token Reduction - This implementation uses significantly fewer tokens than alternative implementations
  • Efficient API Usage - Optimized request handling and response processing
  • Fast Response Times - Minimal overhead in tool execution

License

MIT

Support

For issues with Jina.AI APIs, see: https://docs.jina.ai For MCP specification, see: https://modelcontextprotocol.io

Máy chủ liên quan

NotebookLM Web Importer

Nhập trang web và video YouTube vào NotebookLM chỉ với một cú nhấp. Được tin dùng bởi hơn 200.000 người dùng.

Cài đặt tiện ích Chrome