Gemini OCR
Provides Optical Character Recognition (OCR) services using Google's Gemini API.
Gemini OCR MCP Server
This project provides a simple yet powerful OCR (Optical Character Recognition) service through a FastMCP server, leveraging the capabilities of the Google Gemini API. It allows you to extract text from images either by providing a file path or a base64 encoded string.
Objective
Extract the text from an image and convert it to plain text, e.g., fbVk
Features
- File-based OCR: Extract text directly from an image file on your local system.
- Base64 OCR: Extract text from a base64 encoded image string.
- Easy to Use: Exposes OCR functionality as simple tools in an MCP server.
- Powered by Gemini: Utilizes Google's advanced Gemini models for high-accuracy text recognition.
Prerequisites
- Python 3.8 or higher
- A Google Gemini API Key. You can obtain one from Google AI Studio.
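A quick preflight check for the two prerequisites above can be sketched as follows. This is only an illustration: the helper name check_prerequisites is hypothetical, and it assumes the API key is supplied through the GEMINI_API_KEY environment variable, as in the MCP configuration example in this README.

```python
import os
import sys


def check_prerequisites(env, version_info=sys.version_info):
    """Return a list of problems; an empty list means both prerequisites are met.

    Hypothetical helper, not part of the project.
    """
    problems = []
    # The README asks for Python 3.8 or higher.
    if tuple(version_info[:2]) < (3, 8):
        problems.append("Python 3.8 or higher is required")
    # Assumption: the key is passed via the GEMINI_API_KEY environment variable.
    if not env.get("GEMINI_API_KEY"):
        problems.append("GEMINI_API_KEY is not set")
    return problems


for problem in check_prerequisites(os.environ):
    print("Missing prerequisite:", problem)
```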
Setup and Installation
- Clone the repository:
    git clone https://github.com/WindoC/gemini-ocr-mcp
    cd gemini-ocr-mcp
- Create and activate a virtual environment:
    # Install uv standalone if needed
    ## On macOS and Linux.
    curl -LsSf https://astral.sh/uv/install.sh | sh
    ## On Windows.
    powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
- Install the required dependencies:
    uv sync
MCP Configuration Example
If you are running this as a server for a parent MCP application, you can configure it in your main MCP config.json.
Windows Example:
{
  "mcpServers": {
    "gemini-ocr-mcp": {
      "command": "uv",
      "args": [
        "--directory",
        "x:\\path\\to\\your\\project\\gemini-ocr-mcp",
        "run",
        "gemini-ocr-mcp.py"
      ],
      "env": {
        "GEMINI_MODEL": "gemini-2.5-flash-preview-05-20",
        "GEMINI_API_KEY": "YOUR_GEMINI_API_KEY"
      }
    }
  }
}
Linux/macOS Example:
{
  "mcpServers": {
    "gemini-ocr-mcp": {
      "command": "uv",
      "args": [
        "--directory",
        "/path/to/your/project/gemini-ocr-mcp",
        "run",
        "gemini-ocr-mcp.py"
      ],
      "env": {
        "GEMINI_MODEL": "gemini-2.5-flash-preview-05-20",
        "GEMINI_API_KEY": "YOUR_GEMINI_API_KEY"
      }
    }
  }
}
Note: Remember to replace the placeholder paths with the absolute path to your project directory.
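If you prefer to generate the entry rather than edit paths by hand, a small script along these lines might help. It is a sketch only: build_mcp_config is a hypothetical helper, not part of the project, and it simply mirrors the keys shown in the examples above. Resolving the path makes it absolute, and json.dumps takes care of escaping backslashes in Windows paths.

```python
import json
from pathlib import Path


def build_mcp_config(project_dir, api_key, model="gemini-2.5-flash-preview-05-20"):
    """Build the mcpServers entry shown in the README examples.

    Hypothetical helper: the structure is copied from the examples above.
    """
    # Resolve to an absolute path, as the note above requires.
    project_path = str(Path(project_dir).resolve())
    return {
        "mcpServers": {
            "gemini-ocr-mcp": {
                "command": "uv",
                "args": ["--directory", project_path, "run", "gemini-ocr-mcp.py"],
                "env": {"GEMINI_MODEL": model, "GEMINI_API_KEY": api_key},
            }
        }
    }


# Print a config for the current directory; paste the output into your MCP config.
print(json.dumps(build_mcp_config(".", "YOUR_GEMINI_API_KEY"), indent=2))
```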
Tools Provided
ocr_image_file
Performs OCR on a local image file.
- Parameter: image_file (string): The absolute or relative path to the image file.
- Returns: (string) The extracted text from the image.
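For reference, an MCP client invokes a tool by sending a JSON-RPC tools/call request. The sketch below builds such a request for ocr_image_file. Most MCP clients construct this message for you, so treat it as an illustration of the wire format rather than something you need to write. The helper name build_tool_call and the image path are placeholders.

```python
import json


def build_tool_call(tool_name, arguments, request_id=1):
    """Build a JSON-RPC 2.0 request for the MCP tools/call method.

    Hypothetical helper for illustration only.
    """
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Placeholder path; substitute a real image on your system.
request = build_tool_call("ocr_image_file", {"image_file": "/path/to/image.png"})
print(json.dumps(request, indent=2))
```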
ocr_image_base64
Performs OCR on a base64 encoded image.
- Parameter: base64_image (string): The base64 encoded string of the image.
- Returns: (string) The extracted text from the image.
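To produce a value for base64_image, the image bytes can be encoded with Python's standard library. The sketch below assumes the tool expects a plain base64 string without a "data:" URI prefix; the README does not say either way. The helper names and the commented-out file name are hypothetical.

```python
import base64
from pathlib import Path


def encode_image_bytes(data):
    """Encode raw image bytes as an ASCII base64 string."""
    return base64.b64encode(data).decode("ascii")


def encode_image_file(image_path):
    """Read an image file and return its contents as a base64 string."""
    return encode_image_bytes(Path(image_path).read_bytes())


# Example (hypothetical file name): arguments for the ocr_image_base64 tool.
# arguments = {"base64_image": encode_image_file("captcha.png")}
```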