Cheema Text-to-Voice MCP Server
Free, open-source text-to-speech for AI assistants. Generate natural speech directly from Claude Desktop, Claude Code, n8n, or any MCP-compatible platform. No API keys. No cloud. Runs entirely on your machine.
What Can It Do?
Just ask your AI assistant to speak — it handles the rest:
"Say hello in French using the juliette voice"
"Convert this paragraph to speech and save it as intro.wav"
"Clone my voice from this recording and use it to read my essay"
5 built-in voices across 4 languages, plus instant voice cloning from a short audio sample.
Quick Start
1. Install Prerequisites
You need Python 3.10+ and espeak-ng:
# Ubuntu / Debian
sudo apt install espeak-ng
# macOS
brew install espeak-ng
# Windows
choco install espeak-ng
2. Clone & Install
git clone https://github.com/MuhammadTayyabIlyas/CHeema-Text-to-Voice-MCP-Server.git
cd CHeema-Text-to-Voice-MCP-Server
python -m venv venv
source venv/bin/activate # Linux/macOS
# venv\Scripts\activate # Windows
pip install -e .
pip install "mcp[cli]"
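Before connecting a client, you can sanity-check the prerequisites from Python. This is an illustrative sketch; `check_prereqs` is a hypothetical helper, not part of the project:

```python
import shutil
import sys

def check_prereqs() -> list:
    """Return a list of missing prerequisites (hypothetical helper)."""
    missing = []
    if sys.version_info < (3, 10):
        missing.append("Python 3.10+")
    if shutil.which("espeak-ng") is None:  # phonemizer backend
        missing.append("espeak-ng")
    return missing

if __name__ == "__main__":
    problems = check_prereqs()
    print("All prerequisites found" if not problems
          else "Missing: " + ", ".join(problems))
```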
3. Connect to Your AI Assistant
Pick your platform and follow the steps below.
Setup by Platform
Claude Desktop
Add this to your config file:
- macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
- Windows: %APPDATA%\Claude\claude_desktop_config.json
{
  "mcpServers": {
    "cheema-tts": {
      "command": "/full/path/to/CHeema-Text-to-Voice-MCP-Server/venv/bin/python",
      "args": ["/full/path/to/CHeema-Text-to-Voice-MCP-Server/mcp_server.py"],
      "env": {}
    }
  }
}
Restart Claude Desktop. You'll see the TTS tools appear in the tools menu.
Claude Code
claude mcp add cheema-tts -- /full/path/to/venv/bin/python /full/path/to/mcp_server.py
Then just ask Claude to generate speech in any conversation.
n8n
Start the server in SSE mode:
cd CHeema-Text-to-Voice-MCP-Server
source venv/bin/activate
python mcp_server.py --transport sse --host 127.0.0.1 --port 8000
In your n8n workflow:
- Add an AI Agent node with an MCP Client Tool
- Set connection type to SSE
- Enter the URL: http://127.0.0.1:8000/sse
- The agent can now call any TTS tool
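Before wiring up the workflow, you can verify the SSE endpoint is reachable. A minimal probe using only the standard library (`sse_endpoint_status` is an illustrative helper, not part of the server):

```python
import urllib.request

def sse_endpoint_status(url: str, timeout: float = 3.0) -> str:
    """Probe an SSE endpoint: 'ok' if it answers with an event stream,
    'unexpected' for any other response, 'unreachable' on connection errors."""
    req = urllib.request.Request(url, headers={"Accept": "text/event-stream"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            ctype = resp.headers.get("Content-Type", "")
            return "ok" if "text/event-stream" in ctype else "unexpected"
    except OSError:  # covers URLError, refused connections, timeouts
        return "unreachable"

if __name__ == "__main__":
    print(sse_endpoint_status("http://127.0.0.1:8000/sse"))
```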
Any Other MCP Client
Start the server with your preferred transport:
# SSE (for web platforms and remote clients)
python mcp_server.py --transport sse --host 0.0.0.0 --port 8000
# Streamable HTTP
python mcp_server.py --transport streamable-http --host 0.0.0.0 --port 8000
Connect your MCP client to:
- SSE: http://<your-host>:8000/sse
- HTTP: http://<your-host>:8000/mcp
Available Voices
| Voice | Language | Description |
|---|---|---|
| jo | English | Default — clear, natural female voice |
| dave | English | Male voice |
| greta | German | German female voice |
| juliette | French | French female voice |
| mateo | Spanish | Spanish male voice |
Use tts_list_speakers to see all voices including any custom ones you've added.
Voice Cloning
Clone any voice from a short audio sample:
- Record or find a WAV file — 3 to 15 seconds of clean speech
- Know the transcript — the exact words spoken in the recording
- Ask your AI assistant:
"Add a new speaker called 'alex' from /path/to/recording.wav — the transcript is 'This is what I said in the recording'"
Or call the tool directly:
tts_add_speaker(name="alex", wav_path="/path/to/recording.wav", ref_text="This is what I said in the recording")
Custom voices are saved permanently and available across restarts.
Tips for best results:
- Mono audio, 16-44 kHz sample rate
- 3-15 seconds of continuous, natural speech
- Minimal background noise
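The tips above can be checked programmatically before cloning. A sketch using the standard library's wave module (`check_reference_wav` is a hypothetical helper, not a tool the server ships):

```python
import wave

def check_reference_wav(path: str) -> list:
    """Validate a voice-cloning sample against the tips above:
    mono, 16-44 kHz sample rate, 3-15 seconds long."""
    issues = []
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        duration = wf.getnframes() / rate
        if wf.getnchannels() != 1:
            issues.append("audio should be mono")
        if not 16_000 <= rate <= 44_100:
            issues.append(f"sample rate {rate} Hz outside 16-44 kHz")
        if not 3.0 <= duration <= 15.0:
            issues.append(f"duration {duration:.1f}s outside 3-15 s")
    return issues
```

An empty list means the sample meets all three guidelines.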
Available Tools
| Tool | What It Does |
|---|---|
| tts_help | Shows a complete usage guide with examples — start here |
| tts_synthesize | Converts text to speech, saves a WAV file |
| tts_list_speakers | Lists all available voices |
| tts_list_models | Shows the active model and alternatives |
| tts_add_speaker | Clones a new voice from an audio sample |
tts_synthesize Parameters
| Parameter | Required | Default | Description |
|---|---|---|---|
| text | Yes | — | The text to convert to speech |
| speaker | No | "jo" | Which voice to use |
| output_filename | No | auto-generated | Custom filename for the WAV output |
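As a sketch, the arguments for a tts_synthesize call might look like the following (the exact wire format depends on your MCP client; the values here are placeholders):

```python
# Hypothetical arguments for a tts_synthesize tool call; only "text" is required.
request_args = {
    "text": "Welcome to the demo.",
    "speaker": "juliette",             # optional; defaults to "jo"
    "output_filename": "welcome.wav",  # optional; auto-generated if omitted
}
print(request_args)
```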
MCP Prompts (Templates)
| Prompt | Description |
|---|---|
| quick_speech | Fast speech generation — just provide text and an optional speaker |
| voice_clone_guide | Step-by-step walkthrough for adding a new voice |
These appear automatically in Claude Desktop's prompt picker.
Models
The default model (neutts-nano) works great on CPU. Larger models produce higher quality but need more resources.
| Model | Language | Size | Notes |
|---|---|---|---|
| neuphonic/neutts-nano | English | ~229M | Default — fast, good quality |
| neuphonic/neutts-air | English | ~552M | Higher quality, slower |
| neuphonic/neutts-nano-german | German | ~229M | German language |
| neuphonic/neutts-nano-french | French | ~229M | French language |
| neuphonic/neutts-nano-spanish | Spanish | ~229M | Spanish language |
| neuphonic/neutts-*-q4-gguf | varies | smaller | Quantized — faster, less memory |
| neuphonic/neutts-*-q8-gguf | varies | medium | Quantized — balanced |
Switch models using environment variables:
NEUTTS_BACKBONE="neuphonic/neutts-air" python mcp_server.py
Configuration
All settings are optional — defaults work out of the box.
| Variable | Default | Description |
|---|---|---|
| NEUTTS_BACKBONE | neuphonic/neutts-nano | HuggingFace model repo |
| NEUTTS_BACKBONE_DEVICE | cpu | cpu or cuda for GPU |
| NEUTTS_CODEC | neuphonic/neucodec | Audio codec model |
| NEUTTS_CODEC_DEVICE | cpu | cpu or cuda for GPU |
| NEUTTS_OUTPUT_DIR | ./output | Where WAV files are saved |
| NEUTTS_SAMPLES_DIR | ./samples | Built-in speaker samples |
| NEUTTS_SPEAKERS_DIR | ./speakers | Custom voice data |
| NEUTTS_TRANSPORT | stdio | stdio, sse, or streamable-http |
| NEUTTS_HOST | 127.0.0.1 | Bind address (SSE/HTTP only) |
| NEUTTS_PORT | 8000 | Bind port (SSE/HTTP only) |
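The environment-variable-with-default pattern in the table can be sketched like this (illustrative only; the server's actual settings code may differ):

```python
import os

# Defaults mirror the configuration table above.
DEFAULTS = {
    "NEUTTS_BACKBONE": "neuphonic/neutts-nano",
    "NEUTTS_BACKBONE_DEVICE": "cpu",
    "NEUTTS_CODEC": "neuphonic/neucodec",
    "NEUTTS_CODEC_DEVICE": "cpu",
    "NEUTTS_OUTPUT_DIR": "./output",
    "NEUTTS_SAMPLES_DIR": "./samples",
    "NEUTTS_SPEAKERS_DIR": "./speakers",
    "NEUTTS_TRANSPORT": "stdio",
    "NEUTTS_HOST": "127.0.0.1",
    "NEUTTS_PORT": "8000",
}

def load_settings() -> dict:
    """Resolve each setting from the environment, falling back to its default."""
    return {key: os.environ.get(key, default) for key, default in DEFAULTS.items()}
```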
GPU Acceleration
For faster synthesis on NVIDIA GPUs:
NEUTTS_BACKBONE_DEVICE=cuda NEUTTS_CODEC_DEVICE=cuda python mcp_server.py
Requires PyTorch with CUDA support.
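You can check whether CUDA is actually usable before flipping the switch. A small sketch (`pick_device` is a hypothetical helper; `torch.cuda.is_available()` is the standard PyTorch check):

```python
def pick_device() -> str:
    """Return 'cuda' when PyTorch reports a usable GPU, otherwise 'cpu'.
    Treats torch as an optional dependency."""
    try:
        import torch
        return "cuda" if torch.cuda.is_available() else "cpu"
    except ImportError:
        return "cpu"

print(f"Suggested NEUTTS_BACKBONE_DEVICE={pick_device()}")
```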
Running as a Service
For production use, create a systemd service so it starts automatically:
# /etc/systemd/system/cheema-tts.service
[Unit]
Description=Cheema Text-to-Voice MCP Server
After=network.target
[Service]
Type=simple
WorkingDirectory=/path/to/CHeema-Text-to-Voice-MCP-Server
ExecStart=/path/to/venv/bin/python mcp_server.py --transport sse --host 127.0.0.1 --port 8000
Restart=always
RestartSec=5
[Install]
WantedBy=multi-user.target
sudo systemctl enable --now cheema-tts
To expose it over HTTPS, put an Nginx or Caddy reverse proxy in front with these key settings for SSE:
- Disable proxy buffering (proxy_buffering off)
- Set a long read timeout (proxy_read_timeout 86400)
- Add an X-Accel-Buffering: no header
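Put together, a minimal Nginx location block might look like this. Treat it as a starting point, not a drop-in config; the upstream address and paths are placeholders for your setup:

```nginx
location /sse {
    proxy_pass http://127.0.0.1:8000;
    proxy_http_version 1.1;        # required for long-lived streaming connections
    proxy_set_header Connection "";
    proxy_buffering off;           # deliver SSE events immediately
    proxy_read_timeout 86400;      # keep idle streams open for a day
}
```

Note that X-Accel-Buffering: no is a response header the upstream application can send to disable buffering per-request; with proxy_buffering off set globally for the location, it is usually redundant.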
Troubleshooting
Server won't start?
- Check espeak-ng: espeak-ng --version
- Check dependencies: pip list | grep -E "mcp|neutts|torch|soundfile"
No audio output?
- Check the output/ directory for WAV files
- Verify the speaker name with tts_list_speakers
Slow first run?
- Normal — the first run downloads model weights from HuggingFace (~200-500MB). Cached after that.
Want GPU acceleration?
- Set NEUTTS_BACKBONE_DEVICE=cuda and NEUTTS_CODEC_DEVICE=cuda
How It Works
- The server loads the NeuTTS backbone model and audio codec on startup
- Speaker voice prints (.pt files) are loaded into memory
- When you request speech, the text is phonemized and combined with the speaker's voice reference
- The model generates speech tokens, which are decoded into a 24 kHz waveform
- Output is saved as a standard WAV file
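The last step, writing a 24 kHz waveform to a standard WAV file, can be sketched with the standard library. This is illustrative; the server's own writer (and sample format) may differ:

```python
import math
import wave

def save_wav(samples, path, rate=24_000):
    """Write mono float samples in [-1, 1] as a 16-bit PCM WAV file."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)   # mono
        wf.setsampwidth(2)   # 16-bit PCM
        wf.setframerate(rate)
        frames = b"".join(
            int(max(-1.0, min(1.0, s)) * 32767).to_bytes(2, "little", signed=True)
            for s in samples
        )
        wf.writeframes(frames)

# One second of a 440 Hz test tone at 24 kHz
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 24_000) for n in range(24_000)]
save_wav(tone, "tone.wav")
```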
Project Structure
CHeema-Text-to-Voice-MCP-Server/
├── mcp_server.py # MCP server entry point
├── neutts/ # NeuTTS engine
├── samples/ # Built-in speaker voices (.wav, .pt, .txt)
├── speakers/ # Custom cloned voices (auto-created)
├── output/ # Generated audio files (auto-created)
└── examples/ # Usage examples
Credits
- NeuTTS by Neuphonic — the TTS engine
- MCP Protocol by Anthropic — the AI tool standard
Author
Tayyab Ilyas — PhD Researcher & EdTech Founder
Building AI-powered tools for educators and researchers.
License
MIT License. The underlying NeuTTS models have their own licenses — see the NeuTTS repository for details.
Cheema Text-to-Voice MCP Server
Give your AI assistant a voice.