Gemini Gen MCP
MCP Server for Gemini Image and Audio generation using Google's Gemini AI models.
Features
This MCP server provides tools to:
- Generate images from text using Gemini's Flash Image model
- Generate audio from text using Gemini 2.5 Flash Preview TTS model
Installation
From PyPI
pip install gemini-gen-mcp
From Source
git clone https://github.com/ServiceStack/gemini-gen-mcp.git
cd gemini-gen-mcp
pip install -e .
Prerequisites
You need a Google Gemini API key to use this server. Get one from Google AI Studio.
Environment Variables
| Variable | Required | Default | Description |
|---|---|---|---|
| GEMINI_API_KEY | Yes | - | Your Google Gemini API key |
| GEMINI_DOWNLOAD_PATH | No | /tmp/gemini_gen_mcp | Directory where generated files are saved |
Set the environment variables:
export GEMINI_API_KEY='your-api-key-here'
export GEMINI_DOWNLOAD_PATH='/path/to/downloads' # optional
Generated files are organized by type and date:
- Images: $GEMINI_DOWNLOAD_PATH/images/YYYY-MM-DD/
- Audio: $GEMINI_DOWNLOAD_PATH/audios/YYYY-MM-DD/
Each generated file includes a companion .info.json file with generation metadata.
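The layout above can be sketched as a small helper that computes where a day's output lands and collects its metadata files. This is a minimal sketch, not the server's code: the function names output_dir and load_metadata are hypothetical, and it assumes the companion .info.json files sit in the same dated folder as the files they describe. The caller chooses the date, since this README does not say whether the server uses local time or UTC.

```python
import json
import os
from datetime import date
from pathlib import Path
from typing import Dict, Optional

# Default taken from the environment-variable table above.
DEFAULT_DOWNLOAD_PATH = "/tmp/gemini_gen_mcp"
KINDS = ("images", "audios")


def output_dir(kind: str, day: date, base: Optional[str] = None) -> Path:
    """Return <base>/<kind>/YYYY-MM-DD, where base defaults to $GEMINI_DOWNLOAD_PATH."""
    if kind not in KINDS:
        raise ValueError(f"kind must be one of {KINDS}, got {kind!r}")
    root = base or os.environ.get("GEMINI_DOWNLOAD_PATH", DEFAULT_DOWNLOAD_PATH)
    return Path(root) / kind / day.isoformat()


def load_metadata(kind: str, day: date, base: Optional[str] = None) -> Dict[str, dict]:
    """Parse every *.info.json file in that day's folder, keyed by filename.

    Assumption (hypothetical helper): metadata files live next to the generated files.
    """
    folder = output_dir(kind, day, base)
    if not folder.is_dir():
        return {}
    return {
        p.name: json.loads(p.read_text(encoding="utf-8"))
        for p in sorted(folder.glob("*.info.json"))
    }
```

For example, load_metadata("images", date.today()) returns the metadata for today's images, or an empty dict if nothing has been generated yet.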
Usage
Running the Server
Run the MCP server directly:
gemini-gen-mcp
Or as a Python module:
python -m gemini_gen_mcp.server
Using with Claude Desktop
See CLAUDE_CONFIG.md for detailed instructions.
Add the following to your claude_desktop_config.json:
{
"mcpServers": {
"gemini-gen": {
"description": "Gemini Image and Audio TTS generation",
"command": "uvx",
"args": [
"gemini-gen-mcp"
],
"env": {
"GEMINI_API_KEY": "$GEMINI_API_KEY"
}
}
}
}
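If you prefer to script this step, the sketch below merges the same entry into an existing config while preserving any other servers already listed. It assumes the file is plain JSON with a top-level mcpServers object, as in the snippet above. The helper names are hypothetical, and writing the literal key value instead of the "$GEMINI_API_KEY" placeholder is an assumption aimed at clients that may not expand environment variables; note that it stores the key in plain text. The config file's location varies by operating system, so the path is passed in explicitly.

```python
import json
import os
from pathlib import Path
from typing import Optional

SERVER_NAME = "gemini-gen"


def merge_gemini_gen_server(config: dict, api_key: str) -> dict:
    """Return a copy of `config` with the gemini-gen entry added or replaced.

    Other top-level keys and other entries under "mcpServers" are preserved.
    """
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers[SERVER_NAME] = {
        "description": "Gemini Image and Audio TTS generation",
        "command": "uvx",
        "args": ["gemini-gen-mcp"],
        "env": {"GEMINI_API_KEY": api_key},
    }
    merged["mcpServers"] = servers
    return merged


def update_config_file(config_path: Path, api_key: Optional[str] = None) -> None:
    """Read a JSON config (or start from empty), merge the entry, and write it back."""
    key = api_key or os.environ.get("GEMINI_API_KEY")
    if not key:
        raise RuntimeError("GEMINI_API_KEY is not set")
    config = (
        json.loads(config_path.read_text(encoding="utf-8"))
        if config_path.exists()
        else {}
    )
    merged = merge_gemini_gen_server(config, key)
    config_path.write_text(json.dumps(merged, indent=2) + "\n", encoding="utf-8")
```

Keeping the merge as a pure function makes it easy to inspect the result before anything is written to disk.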
Using with llms.py
Alternatively, paste this server configuration into the llms.py MCP Servers settings:
Name: gemini-gen
{
"description": "Gemini Image and Audio TTS generation",
"command": "uvx",
"args": [
"gemini-gen-mcp"
],
"env": {
"GEMINI_API_KEY": "$GEMINI_API_KEY"
}
}
Development Server
For development, you can run this server using uv:
{
"mcpServers": {
"gemini-gen": {
"command": "uv",
"args": [
"run",
"--directory",
"/path/to/ServiceStack/gemini-gen-mcp",
"gemini-gen-mcp"
],
"env": {
"GEMINI_API_KEY": "$GEMINI_API_KEY"
}
}
}
}
Available Tools
text_to_image
Generate images from text descriptions using Gemini's image generation models.
Parameters:
- prompt (string, required): Text description of the image to generate
- model (string, optional): Gemini model to use
  - gemini-2.5-flash-image (default)
  - gemini-3-pro-image-preview
- aspect_ratio (string, optional): Aspect ratio for the generated image (default: "1:1")
  - Supported: 1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9
- temperature (float, optional): Sampling temperature for image generation (default: 1.0)
- top_p (float, optional): Nucleus sampling parameter
Example:
{
"prompt": "A serene mountain landscape at sunset with a lake",
"model": "gemini-2.5-flash-image",
"aspect_ratio": "16:9",
"temperature": 1.0
}
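Before calling the tool from your own client, it can help to validate arguments locally. The sketch below builds the text_to_image arguments from the parameter list above and rejects values outside it. The function name text_to_image_args is hypothetical, and the allowed lists are copied from this README, so the check may be stricter than what the server actually accepts. The range checks on temperature (non-negative) and top_p (greater than 0, at most 1) follow from what those sampling parameters mean, not from documented server limits.

```python
from typing import Any, Dict, Optional

# Values copied from the parameter list above.
IMAGE_MODELS = ("gemini-2.5-flash-image", "gemini-3-pro-image-preview")
ASPECT_RATIOS = ("1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9")


def text_to_image_args(
    prompt: str,
    model: str = IMAGE_MODELS[0],
    aspect_ratio: str = "1:1",
    temperature: float = 1.0,
    top_p: Optional[float] = None,
) -> Dict[str, Any]:
    """Validate inputs client-side and build the text_to_image arguments dict."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if model not in IMAGE_MODELS:
        raise ValueError(f"unsupported model: {model!r}")
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect_ratio: {aspect_ratio!r}")
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    args: Dict[str, Any] = {
        "prompt": prompt,
        "model": model,
        "aspect_ratio": aspect_ratio,
        "temperature": temperature,
    }
    # top_p has no documented default, so only send it when the caller sets it.
    if top_p is not None:
        if not 0.0 < top_p <= 1.0:
            raise ValueError("top_p must be in (0, 1]")
        args["top_p"] = top_p
    return args
```

Calling text_to_image_args("A serene mountain landscape at sunset with a lake", aspect_ratio="16:9") produces the same arguments as the example JSON above.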
text_to_audio
Generate audio (speech) from text using Gemini's TTS models. Output is saved in WAV format.
Parameters:
- text (string, required): Text to convert to speech
- model (string, optional): Gemini TTS model to use
  - gemini-2.5-flash-preview-tts (default)
  - gemini-2.5-pro-preview-tts
- voice (string, optional): Voice to use for speech generation (default: "Kore")
Available Voices:
| Voice | Style | Voice | Style | Voice | Style |
|---|---|---|---|---|---|
| Zephyr | Bright | Puck | Upbeat | Charon | Informative |
| Kore | Firm | Fenrir | Excitable | Leda | Youthful |
| Orus | Firm | Aoede | Breezy | Callirrhoe | Easy-going |
| Autonoe | Bright | Enceladus | Breathy | Iapetus | Clear |
| Umbriel | Easy-going | Algieba | Smooth | Despina | Smooth |
| Erinome | Clear | Algenib | Gravelly | Rasalgethi | Informative |
| Laomedeia | Upbeat | Achernar | Soft | Alnilam | Firm |
| Schedar | Even | Gacrux | Mature | Pulcherrima | Forward |
| Achird | Friendly | Zubenelgenubi | Casual | Vindemiatrix | Gentle |
| Sadachbia | Lively | Sadaltager | Knowledgeable | Sulafat | Warm |
Example:
{
"text": "Hello, this is a test of the Gemini text to speech system.",
"model": "gemini-2.5-flash-preview-tts",
"voice": "Kore"
}
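Because the tool writes WAV files, you can inspect a result with Python's built-in wave module. This sketch (describe_wav is a hypothetical helper name) accepts either a file path from the audios folder or raw WAV bytes and reports channels, sample width, frame rate and duration. It reads those values from the file header rather than assuming any particular output format from the server; wave only handles uncompressed PCM data and raises wave.Error otherwise.

```python
import io
import wave
from pathlib import Path
from typing import Union


def describe_wav(source: Union[str, Path, bytes]) -> dict:
    """Report format and duration of a WAV given as a file path or raw bytes."""
    # wave.open accepts a filename or a binary file-like object.
    handle = io.BytesIO(source) if isinstance(source, bytes) else str(source)
    with wave.open(handle, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "frame_rate": rate,
            "duration_seconds": frames / rate if rate else 0.0,
        }
```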
Development
Set Up the Development Environment
# Clone the repository
git clone https://github.com/ServiceStack/gemini-gen-mcp.git
cd gemini-gen-mcp
# Install in editable mode with dependencies
pip install -e .
Running Tests
# Install test dependencies
pip install pytest pytest-asyncio
# Run tests (alternatively: uv run pytest tests -v)
npm test
License
This project is licensed under the MIT License - see the LICENSE file for details.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Support
For issues and questions, please use the GitHub Issues page.
Acknowledgments
- Built with FastMCP
- Powered by Google Gemini AI
Related Servers
Illumio MCP Server
Interact with the Illumio Policy Compute Engine (PCE) to manage workloads, labels, and analyze traffic flows.
Axiom MCP Server
Access Axiom logs through an MCP server. Requires an Axiom API token.
Unstoppable Domains MCP
AI-powered domain name management — search availability, check pricing, manage your portfolio, configure DNS, list domains for sale, and complete purchases via natural language across 400+ ICANN TLDs.
Subfeed
The Cloud for Agents
Bitrix24
The Bitrix24 MCP Server is designed to connect external systems to Bitrix24. It provides AI agents with standardized access to Bitrix24 features and data via the Model Context Protocol (MCP). The MCP server enables external AI systems to interact with Bitrix24 modules through a single standardized interface. You can connect the Bitrix24 MCP Server to the AI model you already use and manage Bitrix24 directly from it. The MCP server allows actions to be performed and data to be retrieved strictly within the access rights configured in your Bitrix24: the AI agent receives only the information and capabilities that are explicitly requested and authorized. Interaction with the Tasks module is supported (the list of supported modules and available actions is gradually expanding).
MCP Server for Kubernetes
A server for managing Kubernetes clusters using the Model Context Protocol.
sentry-mcp-rs
A fast, lightweight MCP server for Sentry, written in Rust.
Hygraph
Integrate Hygraph directly into MCP-compatible tools like Claude and Cursor, executing content operations via natural language.
Octodet Keycloak
Administer Keycloak by managing users, realms, roles, and other resources through an LLM interface.
CData Salesloft Server
A read-only MCP server by CData that enables LLMs to query live data from Salesloft.