Provides Optical Character Recognition (OCR) services using Google's Gemini API.
This project exposes an OCR (Optical Character Recognition) service through a FastMCP server backed by the Google Gemini API. It extracts text from an image supplied either as a file path or as a base64-encoded string.
Extract the text from the following image:
and convert it to plain text, e.g., fbVk
Clone the repository:
git clone https://github.com/WindoC/gemini-ocr-mcp
cd gemini-ocr-mcp
Create and activate a virtual environment:
# Install uv standalone if needed
## On macOS and Linux.
curl -LsSf https://astral.sh/uv/install.sh | sh
## On Windows.
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
Install the required dependencies:
uv sync
If you are running this as a server for a parent MCP application, you can configure it in your main MCP config.json.
Windows Example:
{
  "mcpServers": {
    "gemini-ocr-mcp": {
      "command": "uv",
      "args": [
        "--directory",
        "x:\\path\\to\\your\\project\\gemini-ocr-mcp",
        "run",
        "gemini-ocr-mcp.py"
      ],
      "env": {
        "GEMINI_MODEL": "gemini-2.5-flash-preview-05-20",
        "GEMINI_API_KEY": "YOUR_GEMINI_API_KEY"
      }
    }
  }
}
Linux/macOS Example:
{
  "mcpServers": {
    "gemini-ocr-mcp": {
      "command": "uv",
      "args": [
        "--directory",
        "/path/to/your/project/gemini-ocr-mcp",
        "run",
        "gemini-ocr-mcp.py"
      ],
      "env": {
        "GEMINI_MODEL": "gemini-2.5-flash-preview-05-20",
        "GEMINI_API_KEY": "YOUR_GEMINI_API_KEY"
      }
    }
  }
}
Note: Remember to replace the placeholder paths with the absolute path to your project directory.
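As a rough sketch, the entry above could also be generated by a short Python script so that the absolute path is filled in for you. The helper name build_server_entry is hypothetical and not part of this project; the keys and default model name simply mirror the examples above.

```python
import json
from pathlib import Path


def build_server_entry(project_dir: str, api_key: str,
                       model: str = "gemini-2.5-flash-preview-05-20") -> dict:
    """Return a config shaped like the examples above (hypothetical helper)."""
    # Resolve to an absolute path, since the note above asks for one.
    absolute_dir = str(Path(project_dir).expanduser().resolve())
    return {
        "mcpServers": {
            "gemini-ocr-mcp": {
                "command": "uv",
                "args": ["--directory", absolute_dir, "run", "gemini-ocr-mcp.py"],
                "env": {"GEMINI_MODEL": model, "GEMINI_API_KEY": api_key},
            }
        }
    }


if __name__ == "__main__":
    # Prints JSON that can be merged into the parent application's config.json.
    print(json.dumps(build_server_entry(".", "YOUR_GEMINI_API_KEY"), indent=2))
```

Run it from inside the cloned repository. json.dumps handles the backslash escaping that Windows paths need.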
ocr_image_file
Performs OCR on a local image file.
image_file (string): The absolute or relative path to the image file.
ocr_image_base64
Performs OCR on a base64 encoded image.
base64_image (string): The base64 encoded string of the image.
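To illustrate how the two tools above might be called, here is a minimal sketch that base64-encodes an image and builds the JSON-RPC tools/call request that MCP clients send. Most MCP clients build this request for you, so it is shown only to make the argument names concrete. The helpers encode_image and tool_call and the file name captcha.png are hypothetical. The sketch also assumes the server accepts a plain base64 string with no data-URI prefix, which the text above does not specify.

```python
import base64
import json
from pathlib import Path


def encode_image(path: str) -> str:
    """Read an image file and return its bytes as a base64 ASCII string."""
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")


def tool_call(name: str, arguments: dict, request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 tools/call request (hypothetical helper)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


if __name__ == "__main__":
    # "captcha.png" is a placeholder path, not a file shipped with the project.
    request = tool_call("ocr_image_file", {"image_file": "captcha.png"})
    print(json.dumps(request, indent=2))
```

For the base64 variant, pass {"base64_image": encode_image(path)} as the arguments to ocr_image_base64 instead.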