Gemini OCR MCP Server

This project exposes an OCR (Optical Character Recognition) service through a FastMCP server, backed by the Google Gemini API. It extracts text from images supplied either as a file path or as a base64-encoded string.

Objective

Extract the text from an image (for example, a CAPTCHA) and convert it to plain text, e.g., fbVk

Features

File-based OCR: Extract text directly from an image file on your local system.
Base64 OCR: Extract text from a base64-encoded image string.
Easy to Use: Exposes OCR functionality as simple tools in an MCP server.
Powered by Gemini: Uses Google's Gemini models for text recognition.

Prerequisites

Python 3.8 or higher
A Google Gemini API Key. You can obtain one from Google AI Studio.

Setup and Installation

Clone the repository:

git clone https://github.com/WindoC/gemini-ocr-mcp
cd gemini-ocr-mcp

Install uv if it is not already installed. No separate virtual-environment step is needed, because uv sync creates and manages the environment:

# On macOS and Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# On Windows
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

Install the required dependencies:
```
uv sync
```

MCP Configuration Example

If you are running this as a server for a parent MCP application, you can configure it in your main MCP config.json.

Windows Example:

{
  "mcpServers": {
    "gemini-ocr-mcp": {
      "command": "uv",
      "args": [
        "--directory",
        "x:\\path\\to\\your\\project\\gemini-ocr-mcp",
        "run",
        "gemini-ocr-mcp.py"
      ],
      "env": {
        "GEMINI_MODEL": "gemini-2.5-flash-preview-05-20",
        "GEMINI_API_KEY": "YOUR_GEMINI_API_KEY"
      }
    }
  }
}

Linux/macOS Example:

{
  "mcpServers": {
    "gemini-ocr-mcp": {
      "command": "uv",
      "args": [
        "--directory",
        "/path/to/your/project/gemini-ocr-mcp",
        "run",
        "gemini-ocr-mcp.py"
      ],
      "env": {
        "GEMINI_MODEL": "gemini-2.5-flash-preview-05-20",
        "GEMINI_API_KEY": "YOUR_GEMINI_API_KEY"
      }
    }
  }
}

Note: Replace the placeholder path with the absolute path to your project directory, and replace YOUR_GEMINI_API_KEY with your own key.
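If you prefer to generate the entry rather than edit it by hand, the configuration above can be sketched in Python. This is only an illustration: the helper name build_server_entry is hypothetical and not part of this project, and the key and model values are the same placeholders used in the examples.

```python
import json
from pathlib import Path


def build_server_entry(project_dir: str, api_key: str, model: str) -> dict:
    """Build the "mcpServers" entry shown above for a given project directory."""
    # resolve() turns a relative path into an absolute one, as the note requires.
    absolute_dir = str(Path(project_dir).expanduser().resolve())
    return {
        "mcpServers": {
            "gemini-ocr-mcp": {
                "command": "uv",
                "args": ["--directory", absolute_dir, "run", "gemini-ocr-mcp.py"],
                "env": {"GEMINI_MODEL": model, "GEMINI_API_KEY": api_key},
            }
        }
    }


if __name__ == "__main__":
    # Placeholder values; substitute your own key and preferred model.
    entry = build_server_entry(
        ".", "YOUR_GEMINI_API_KEY", "gemini-2.5-flash-preview-05-20"
    )
    print(json.dumps(entry, indent=2))
```

Run from inside the cloned repository, this prints a block you can paste into the parent application's configuration. json.dumps also takes care of escaping the backslashes in Windows paths.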

Tools Provided

`ocr_image_file`

Performs OCR on a local image file.

Parameter: image_file (string): The absolute or relative path to the image file.
Returns: (string) The extracted text from the image.
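An MCP client normally issues the call for you, but it can help to see the shape of the request. The sketch below builds the JSON-RPC 2.0 message that the MCP "tools/call" method uses; the helper name build_tool_call and the image path are illustrative assumptions, not part of this project.

```python
import json


def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize an MCP "tools/call" request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# The image path is a placeholder, not a file shipped with the project.
request = build_tool_call(1, "ocr_image_file", {"image_file": "/path/to/captcha.png"})
print(request)
```

The argument key matches the image_file parameter documented above. A relative path would presumably be resolved against the server's working directory, so an absolute path is the safer choice.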

`ocr_image_base64`

Performs OCR on a base64-encoded image.

Parameter: base64_image (string): The base64-encoded string of the image.
Returns: (string) The extracted text from the image.
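To produce a value for base64_image, the image bytes need to be base64-encoded first. A minimal sketch using only the standard library follows. It assumes the tool expects plain base64 without a "data:" URI prefix, which this README does not state, and the helper names are hypothetical.

```python
import base64
from pathlib import Path


def encode_image_bytes(data: bytes) -> str:
    """Encode raw image bytes as an ASCII base64 string (no data: URI prefix)."""
    return base64.b64encode(data).decode("ascii")


def encode_image_file(path: str) -> str:
    """Read an image from disk and return its base64 encoding."""
    return encode_image_bytes(Path(path).read_bytes())


# The eight-byte PNG file signature, used here as stand-in data.
png_signature = b"\x89PNG\r\n\x1a\n"
arguments = {"base64_image": encode_image_bytes(png_signature)}
print(arguments)
```

For a real image, pass encode_image_file("/path/to/image.png") as the base64_image argument instead of the stand-in bytes.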
