Gemini Gen MCP

MCP Server for Gemini Image and Audio generation using Google's Gemini AI models.

Features

This MCP server provides tools to:

  • Generate images from text using Gemini image models (Gemini 2.5 Flash Image by default)
  • Generate speech audio from text using Gemini TTS models (Gemini 2.5 Flash Preview TTS by default)

Installation

From PyPI

pip install gemini-gen-mcp

From Source

git clone https://github.com/ServiceStack/gemini-gen-mcp.git
cd gemini-gen-mcp
pip install -e .

Prerequisites

You need a Google Gemini API key to use this server. Get one from Google AI Studio.

Environment Variables

| Variable | Required | Default | Description |
|---|---|---|---|
| GEMINI_API_KEY | Yes | - | Your Google Gemini API key |
| GEMINI_DOWNLOAD_PATH | No | /tmp/gemini_gen_mcp | Directory where generated files are saved |

Set the environment variables:

export GEMINI_API_KEY='your-api-key-here'
export GEMINI_DOWNLOAD_PATH='/path/to/downloads'  # optional

Generated files are organized by type and date:

  • Images: $GEMINI_DOWNLOAD_PATH/images/YYYY-MM-DD/
  • Audio: $GEMINI_DOWNLOAD_PATH/audios/YYYY-MM-DD/

Each generated file includes a companion .info.json file with generation metadata.
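The directory layout above is predictable, so scripts can locate the day's output without asking the server. The sketch below rebuilds the dated folder from GEMINI_DOWNLOAD_PATH (falling back to the documented default) and loads the companion metadata files. `output_dir` and `load_metadata` are hypothetical helpers, not part of this package, and the exact naming and fields of the `.info.json` files are an assumption; only the `.info.json` suffix comes from this README.

```python
from __future__ import annotations

import json
import os
from datetime import date
from pathlib import Path

# Default taken from the environment-variable table above.
DEFAULT_DOWNLOAD_PATH = "/tmp/gemini_gen_mcp"

# Output type -> subdirectory name, as listed above.
SUBDIRS = {"image": "images", "audio": "audios"}


def output_dir(kind: str, day: date | None = None) -> Path:
    """Return the dated folder for `kind` ("image" or "audio"), e.g. .../images/2025-01-02."""
    base = Path(os.environ.get("GEMINI_DOWNLOAD_PATH") or DEFAULT_DOWNLOAD_PATH)
    return base / SUBDIRS[kind] / (day or date.today()).isoformat()


def load_metadata(folder: Path) -> dict[str, object]:
    """Parse every *.info.json companion file in `folder`, keyed by file name.

    Assumption: companion files end in ".info.json"; their exact naming
    and fields are not documented in this README.
    """
    return {
        path.name: json.loads(path.read_text(encoding="utf-8"))
        for path in sorted(folder.glob("*.info.json"))
    }


if __name__ == "__main__":
    today = output_dir("image")
    print(today)
    if today.is_dir():
        for name, meta in load_metadata(today).items():
            print(name, meta)
```

`date.isoformat()` produces the `YYYY-MM-DD` form used in the folder names.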

Usage

Running the Server

Run the MCP server directly:

gemini-gen-mcp

Or as a Python module:

python -m gemini_gen_mcp.server
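MCP clients such as Claude Desktop start the server as a subprocess and exchange messages over stdin/stdout. Assuming the server uses MCP's standard stdio transport (newline-delimited JSON-RPC 2.0), this stdlib-only sketch performs the initialize handshake and a single `tools/call`. The protocol version string and the client name are assumptions, and `call_tool` is a hypothetical helper. In practice the official MCP Python SDK (`mcp` package, via `ClientSession` and `stdio_client`) handles this for you; the raw version just makes the message flow visible.

```python
from __future__ import annotations

import json
import subprocess

# Assumption: a protocol revision the server accepts (the SDK normally negotiates this).
PROTOCOL_VERSION = "2024-11-05"

# Notification (no "id"): tells the server the handshake is complete.
INITIALIZED_NOTIFICATION = {"jsonrpc": "2.0", "method": "notifications/initialized"}


def request(req_id: int, method: str, params: dict) -> dict:
    """Build a JSON-RPC 2.0 request object."""
    return {"jsonrpc": "2.0", "id": req_id, "method": method, "params": params}


def initialize_request(req_id: int = 1) -> dict:
    return request(req_id, "initialize", {
        "protocolVersion": PROTOCOL_VERSION,
        "capabilities": {},
        "clientInfo": {"name": "readme-example", "version": "0.1.0"},  # hypothetical client
    })


def call_tool_request(req_id: int, name: str, arguments: dict) -> dict:
    return request(req_id, "tools/call", {"name": name, "arguments": arguments})


def call_tool(name: str, arguments: dict, command: str = "gemini-gen-mcp") -> dict:
    """Start the server, run the handshake, call one tool and return its response.

    GEMINI_API_KEY must already be set; the child process inherits this environment.
    """
    proc = subprocess.Popen([command], stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE, encoding="utf-8")

    def send(message: dict) -> None:
        proc.stdin.write(json.dumps(message) + "\n")
        proc.stdin.flush()

    def receive(req_id: int) -> dict:
        # Skip blank lines and server-initiated messages until our response arrives.
        for line in proc.stdout:
            if not line.strip():
                continue
            message = json.loads(line)
            if message.get("id") == req_id and "method" not in message:
                return message
        raise RuntimeError("server exited before responding")

    try:
        send(initialize_request(1))
        receive(1)
        send(INITIALIZED_NOTIFICATION)
        send(call_tool_request(2, name, arguments))
        return receive(2)
    finally:
        proc.terminate()
        proc.wait()


if __name__ == "__main__":
    response = call_tool("text_to_image", {"prompt": "A red bicycle", "aspect_ratio": "1:1"})
    print(json.dumps(response.get("result", response), indent=2))
```

Running the `__main__` part calls the Gemini API, so it needs a valid key and may incur usage.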

Using with Claude Desktop

See CLAUDE_CONFIG.md for detailed instructions.

Add this to your claude_desktop_config.json:

{
  "mcpServers": {
    "gemini-gen": {
      "command": "gemini-gen-mcp",
      "env": {
        "GEMINI_API_KEY": "your-api-key-here"
      }
    }
  }
}
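If claude_desktop_config.json already lists other servers, the entry above has to be merged rather than pasted over the file. A minimal sketch, using hypothetical helpers `merge_gemini_gen_entry` and `update_config_file`: it keeps existing servers and top-level settings, and only adds GEMINI_DOWNLOAD_PATH when a path is given, since that variable is optional.

```python
from __future__ import annotations

import json
from pathlib import Path


def merge_gemini_gen_entry(config: dict, api_key: str,
                           download_path: str | None = None) -> dict:
    """Return a copy of `config` with the "gemini-gen" server entry added or replaced."""
    env = {"GEMINI_API_KEY": api_key}
    if download_path:
        # Optional: without it the server uses /tmp/gemini_gen_mcp.
        env["GEMINI_DOWNLOAD_PATH"] = download_path
    servers = dict(config.get("mcpServers", {}))  # keep any other configured servers
    servers["gemini-gen"] = {"command": "gemini-gen-mcp", "env": env}
    return {**config, "mcpServers": servers}


def update_config_file(config_path: Path, api_key: str,
                       download_path: str | None = None) -> dict:
    """Read claude_desktop_config.json (if present), merge the entry, write it back."""
    text = config_path.read_text(encoding="utf-8") if config_path.exists() else ""
    merged = merge_gemini_gen_entry(json.loads(text) if text.strip() else {},
                                    api_key, download_path)
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(merged, indent=2) + "\n", encoding="utf-8")
    return merged
```

The config file's location depends on the operating system; CLAUDE_CONFIG.md covers where to find it. Keeping the merge as a pure function makes it easy to check before anything is written to disk.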

Available Tools

text_to_image

Generate images from text descriptions using Gemini's image generation models.

Parameters:

  • prompt (string, required): Text description of the image to generate
  • model (string, optional): Gemini model to use
    • gemini-2.5-flash-image (default)
    • gemini-3-pro-image-preview
  • aspect_ratio (string, optional): Aspect ratio for the generated image (default: "1:1")
    • Supported: 1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9
  • temperature (float, optional): Sampling temperature for image generation (default: 1.0)
  • top_p (float, optional): Nucleus sampling parameter

Example:

{
  "prompt": "A serene mountain landscape at sunset with a lake",
  "model": "gemini-2.5-flash-image",
  "aspect_ratio": "16:9",
  "temperature": 1.0
}
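Client code can catch typos in model names or aspect ratios before a request reaches the server. The sketch below builds an `arguments` object like the example above, checking values against the lists in this section. `text_to_image_args` is a hypothetical helper; the server performs its own validation, which may differ, and the supported model list may change. The `(0, 1]` range for `top_p` is an assumption based on how nucleus sampling is usually defined.

```python
from __future__ import annotations

# Values copied from the parameter list above.
IMAGE_MODELS = ("gemini-2.5-flash-image", "gemini-3-pro-image-preview")
ASPECT_RATIOS = ("1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9")


def text_to_image_args(prompt: str, model: str = IMAGE_MODELS[0],
                       aspect_ratio: str = "1:1", temperature: float = 1.0,
                       top_p: float | None = None) -> dict:
    """Build a validated `arguments` object for the text_to_image tool."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if model not in IMAGE_MODELS:
        raise ValueError(f"unsupported model: {model!r}")
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect_ratio: {aspect_ratio!r}")
    args = {"prompt": prompt, "model": model,
            "aspect_ratio": aspect_ratio, "temperature": temperature}
    if top_p is not None:
        # Assumption: nucleus sampling expects a probability mass in (0, 1].
        if not 0.0 < top_p <= 1.0:
            raise ValueError("top_p must be in (0, 1]")
        args["top_p"] = top_p
    return args
```

Calling `text_to_image_args("A serene mountain landscape at sunset with a lake", aspect_ratio="16:9")` yields the same object as the example above.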

text_to_audio

Generate speech audio from text using Gemini's TTS models. Output is saved in WAV format.
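Because the output is a standard WAV file, Python's built-in `wave` module can inspect it with no extra dependencies. `wav_info` is a hypothetical helper, not part of this package. It reads the channel count, sample width, sample rate and duration from the file header instead of assuming particular values.

```python
from __future__ import annotations

import io
import os
import wave


def wav_info(source: str | os.PathLike | bytes) -> dict:
    """Read basic properties from a WAV file path or from raw WAV bytes."""
    fileobj = io.BytesIO(source) if isinstance(source, bytes) else open(source, "rb")
    with fileobj, wave.open(fileobj, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "frame_rate": rate,
            "duration_seconds": frames / rate,
        }
```

Pass it the path of a file under `$GEMINI_DOWNLOAD_PATH/audios/YYYY-MM-DD/` to check a generated clip's length.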

Parameters:

  • text (string, required): Text to convert to speech
  • model (string, optional): Gemini TTS model to use
    • gemini-2.5-flash-preview-tts (default)
    • gemini-2.5-pro-preview-tts
  • voice (string, optional): Voice to use for speech generation (default: "Kore")

Available Voices:

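| Voice | Style | Voice | Style | Voice | Style |
|---|---|---|---|---|---|
| Zephyr | Bright | Puck | Upbeat | Charon | Informative |
| Kore | Firm | Fenrir | Excitable | Leda | Youthful |
| Orus | Firm | Aoede | Breezy | Callirrhoe | Easy-going |
| Autonoe | Bright | Enceladus | Breathy | Iapetus | Clear |
| Umbriel | Easy-going | Algieba | Smooth | Despina | Smooth |
| Erinome | Clear | Algenib | Gravelly | Rasalgethi | Informative |
| Laomedeia | Upbeat | Achernar | Soft | Alnilam | Firm |
| Schedar | Even | Gacrux | Mature | Pulcherrima | Forward |
| Achird | Friendly | Zubenelgenubi | Casual | Vindemiatrix | Gentle |
| Sadachbia | Lively | Sadaltager | Knowledgeable | Sulafat | Warm |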

Example:

{
  "text": "Hello, this is a test of the Gemini text to speech system.",
  "model": "gemini-2.5-flash-preview-tts",
  "voice": "Kore"
}
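With 30 voices, picking one by style is easier in code than by scanning the table. The sketch below transcribes the voice table into a dictionary, looks voices up by style, and builds a validated `arguments` object for text_to_audio. `voices_with_style` and `text_to_audio_args` are hypothetical helpers, and the server remains the authority on which voices and models it accepts.

```python
from __future__ import annotations

TTS_MODELS = ("gemini-2.5-flash-preview-tts", "gemini-2.5-pro-preview-tts")

# Voice -> style, transcribed from the "Available Voices" table.
VOICE_STYLES = {
    "Zephyr": "Bright", "Puck": "Upbeat", "Charon": "Informative",
    "Kore": "Firm", "Fenrir": "Excitable", "Leda": "Youthful",
    "Orus": "Firm", "Aoede": "Breezy", "Callirrhoe": "Easy-going",
    "Autonoe": "Bright", "Enceladus": "Breathy", "Iapetus": "Clear",
    "Umbriel": "Easy-going", "Algieba": "Smooth", "Despina": "Smooth",
    "Erinome": "Clear", "Algenib": "Gravelly", "Rasalgethi": "Informative",
    "Laomedeia": "Upbeat", "Achernar": "Soft", "Alnilam": "Firm",
    "Schedar": "Even", "Gacrux": "Mature", "Pulcherrima": "Forward",
    "Achird": "Friendly", "Zubenelgenubi": "Casual", "Vindemiatrix": "Gentle",
    "Sadachbia": "Lively", "Sadaltager": "Knowledgeable", "Sulafat": "Warm",
}


def voices_with_style(style: str) -> list[str]:
    """Return the voices whose listed style matches `style` (case-insensitive), sorted."""
    return sorted(v for v, s in VOICE_STYLES.items() if s.lower() == style.lower())


def text_to_audio_args(text: str, model: str = TTS_MODELS[0], voice: str = "Kore") -> dict:
    """Build a validated `arguments` object for the text_to_audio tool."""
    if not text.strip():
        raise ValueError("text must not be empty")
    if model not in TTS_MODELS:
        raise ValueError(f"unsupported model: {model!r}")
    if voice not in VOICE_STYLES:
        raise ValueError(f"unknown voice: {voice!r}")
    return {"text": text, "model": model, "voice": voice}
```

For example, `voices_with_style("Informative")` returns `["Charon", "Rasalgethi"]`, and either name can be passed as `voice`.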

Development

Setup Development Environment

# Clone the repository
git clone https://github.com/ServiceStack/gemini-gen-mcp.git
cd gemini-gen-mcp

# Install in editable mode with dependencies
pip install -e .

Running Tests

# Install test dependencies
pip install pytest pytest-asyncio

# Run tests
npm test

# Alternatively, run pytest directly through uv
uv run pytest tests -v

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Support

For issues and questions, please use the GitHub Issues page.
