Advanced TTS MCP Server
A high-quality, feature-rich Text-to-Speech (TTS) server for generating natural and expressive speech with advanced controls.
A high-quality, feature-rich Text-to-Speech MCP server with native TypeScript implementation. Designed for professional applications requiring natural, expressive speech synthesis with advanced controls and zero external dependencies.
Features
Advanced Voice Control
- 10 High-Quality Voices - Male and female voices with distinct personalities
- Emotion Control - Neutral, happy, excited, calm, serious, casual, confident
- Dynamic Pacing - Natural, conversational, presentation, tutorial, narrative, fast, and slow modes
- Speed & Volume - Precise control from 0.25x to 3.0x speed, 0.1x to 2.0x volume
Professional Capabilities
- Streaming Audio - Real-time synthesis and playback
- Batch Processing - Handle multiple text segments efficiently
- Multiple Formats - WAV, MP3, FLAC, OGG output support
- Natural Speech Enhancement - Automatic pause insertion and emotion markers
- Queue Management - Handle multiple concurrent requests
MCP Integration
- 6 Powerful Tools - Complete synthesis, batch processing, voice management
- 2 Rich Resources - Voice capabilities and usage examples
- Real-time Status - Track processing progress and manage requests
- File Management - Save, list, and organize audio outputs
Quick Start
Option 1: Deploy to Smithery.ai (Recommended)
One-Click Deployment to Smithery Platform
- Deploy Now: Visit Smithery.ai and import this repository
- Configure: Set your preferred voice and speech settings
- Use Instantly: Access via Claude Desktop or any MCP-compatible client
Benefits:
- Zero setup required
- Automatic scaling and updates
- No model downloads needed
- Enterprise-grade hosting
Full Smithery Deployment Guide
Option 2: Local Installation
Prerequisites:
- Node.js 18+
Installation:
- Clone the repository
git clone https://github.com/samihalawa/advanced-tts-mcp.git
cd advanced-tts-mcp
- Install dependencies
npm install
- Configure Claude Desktop
Add to your claude_desktop_config.json:
{
  "mcpServers": {
    "advanced-tts": {
      "command": "node",
      "args": ["dist/index.js"],
      "cwd": "/path/to/advanced-tts-mcp"
    }
  }
}
- Start using!
# Build TypeScript
npm run build
# Start server
npm start
Restart Claude Desktop and start synthesizing with natural, expressive voices.
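If you would rather script the configuration step, the sketch below merges the `advanced-tts` entry into an existing configuration without disturbing other servers. The helper names `add_tts_server` and `update_config_file` are hypothetical, and because the location of `claude_desktop_config.json` differs by platform, the file path is left to the caller.

```python
import json
from pathlib import Path


def add_tts_server(config: dict, repo_path: str) -> dict:
    """Return a copy of config with the advanced-tts entry added or replaced."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    # Same entry as the JSON snippet above; only "cwd" varies per machine.
    servers["advanced-tts"] = {
        "command": "node",
        "args": ["dist/index.js"],
        "cwd": repo_path,
    }
    merged["mcpServers"] = servers
    return merged


def update_config_file(config_file: Path, repo_path: str) -> None:
    """Read, merge, and rewrite the config file at a caller-supplied path."""
    existing = json.loads(config_file.read_text()) if config_file.exists() else {}
    config_file.write_text(json.dumps(add_tts_server(existing, repo_path), indent=2))
```

Run `npm run build` before restarting Claude Desktop, since the entry points at `dist/index.js`.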
Available Voices
| Voice ID | Name | Gender | Description |
|---|---|---|---|
| af_heart | Heart | Female | Warm, friendly voice (default) |
| af_sky | Sky | Female | Clear, bright voice |
| af_bella | Bella | Female | Elegant, sophisticated voice |
| af_sarah | Sarah | Female | Professional, confident voice |
| af_nicole | Nicole | Female | Gentle, soothing voice |
| am_adam | Adam | Male | Strong, authoritative voice |
| am_michael | Michael | Male | Friendly, approachable voice |
| bf_emma | Emma | Female | Young, energetic voice |
| bf_isabella | Isabella | Female | Mature, expressive voice |
| bm_lewis | Lewis | Male | Deep, resonant voice |
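For client-side checks before calling the server, the voice table can be mirrored as a small lookup. This is only a sketch: the `VOICES` mapping copies the table above, while `voices_by_gender` and `resolve_voice` are hypothetical helpers, not part of the server's API. The `get_voices` tool remains the authoritative source.

```python
from typing import List, Optional

# Voice ID -> (name, gender), copied from the table above.
VOICES = {
    "af_heart": ("Heart", "Female"),
    "af_sky": ("Sky", "Female"),
    "af_bella": ("Bella", "Female"),
    "af_sarah": ("Sarah", "Female"),
    "af_nicole": ("Nicole", "Female"),
    "am_adam": ("Adam", "Male"),
    "am_michael": ("Michael", "Male"),
    "bf_emma": ("Emma", "Female"),
    "bf_isabella": ("Isabella", "Female"),
    "bm_lewis": ("Lewis", "Male"),
}
DEFAULT_VOICE = "af_heart"


def voices_by_gender(gender: str) -> List[str]:
    """List voice IDs whose gender matches, case-insensitively."""
    return [vid for vid, (_, g) in VOICES.items() if g.lower() == gender.lower()]


def resolve_voice(voice_id: Optional[str]) -> str:
    """Fall back to the default voice, or reject an unknown ID."""
    if voice_id is None:
        return DEFAULT_VOICE
    if voice_id not in VOICES:
        raise ValueError(f"unknown voice_id: {voice_id}")
    return voice_id
```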
Usage Examples
Basic Synthesis
# Simple text-to-speech
await synthesize_speech(
    text="Hello! Welcome to Advanced TTS.",
    voice_id="af_heart"
)
Emotional Expression
# Excited announcement
await synthesize_speech(
    text="This is amazing news! You're going to love this new feature!",
    voice_id="af_heart",
    emotion="excited",
    pacing="conversational",
    speed=1.1
)
Professional Presentation
# Tutorial narration
await synthesize_speech(
    text="Step one: Open your browser. Step two: Navigate to the website.",
    voice_id="am_adam",
    emotion="calm",
    pacing="tutorial",
    speed=0.9
)
Batch Processing
# Multiple segments with pauses
await batch_synthesize(
    segments=[
        "Welcome to our presentation.",
        "Today we'll cover three main topics.",
        "Let's begin with the first topic."
    ],
    voice_id="af_sarah",
    emotion="confident",
    pacing="presentation",
    merge_output=True,
    segment_pause=1.0,
    save_file=True
)
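When the input is one long document instead of a prepared list, something has to produce the `segments` argument. The sketch below splits text on sentence boundaries and packs sentences into segments up to a character limit. The helper `split_into_segments` is hypothetical, and applying the 10,000-character limit of `synthesize_speech` to each batch segment is an assumption.

```python
import re
from typing import List


def split_into_segments(text: str, max_chars: int = 10_000) -> List[str]:
    """Pack whole sentences into segments of at most max_chars characters.

    A single sentence longer than max_chars is kept whole, not cut mid-sentence.
    """
    # Split after ., ! or ? when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    segments: List[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            segments.append(current)
            current = sentence
    if current:
        segments.append(current)
    return segments
```

The result can be passed directly as `segments` in a `batch_synthesize` call like the one above.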
Available Tools
synthesize_speech
Convert text to natural speech with full control over voice characteristics.
Parameters:
- text - Text to synthesize (max 10,000 chars)
- voice_id - Voice selection (see table above)
- speed - Speech rate (0.25-3.0)
- emotion - Voice emotion (neutral, happy, excited, calm, serious, casual, confident)
- pacing - Speech style (natural, conversational, presentation, tutorial, narrative, fast, slow)
- volume - Audio volume (0.1-2.0)
- output_format - File format (wav, mp3, flac, ogg)
- save_file - Save to file (boolean)
- filename - Custom filename
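The documented limits can be checked on the client before a request is sent. The following is a minimal sketch assuming the ranges and allowed values listed above. The function name `validate_request` and the default values in its signature are assumptions, not the server's actual validation logic.

```python
from typing import List

EMOTIONS = {"neutral", "happy", "excited", "calm", "serious", "casual", "confident"}
PACING_STYLES = {"natural", "conversational", "presentation", "tutorial",
                 "narrative", "fast", "slow"}
OUTPUT_FORMATS = {"wav", "mp3", "flac", "ogg"}
MAX_TEXT_CHARS = 10_000


def validate_request(text: str, speed: float = 1.0, volume: float = 1.0,
                     emotion: str = "neutral", pacing: str = "natural",
                     output_format: str = "wav") -> List[str]:
    """Return a list of problems; an empty list means the request looks valid."""
    errors = []
    if not text or len(text) > MAX_TEXT_CHARS:
        errors.append(f"text must be 1-{MAX_TEXT_CHARS} characters")
    if not 0.25 <= speed <= 3.0:
        errors.append("speed must be between 0.25 and 3.0")
    if not 0.1 <= volume <= 2.0:
        errors.append("volume must be between 0.1 and 2.0")
    if emotion not in EMOTIONS:
        errors.append(f"unknown emotion: {emotion}")
    if pacing not in PACING_STYLES:
        errors.append(f"unknown pacing: {pacing}")
    if output_format not in OUTPUT_FORMATS:
        errors.append(f"unsupported output_format: {output_format}")
    return errors
```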
batch_synthesize
Process multiple text segments efficiently with optional merging.
Parameters:
- segments - List of text segments
- merge_output - Combine into a single file
- segment_pause - Pause between segments (0.0-5.0 s)
- All synthesis parameters from synthesize_speech
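To illustrate what `merge_output` combined with `segment_pause` amounts to, the sketch below joins raw audio segments with a run of silence between them. It assumes mono 16-bit PCM at 24 kHz, which is an assumption about the engine's output. `merge_with_pauses` is a hypothetical helper, not the server's implementation.

```python
from typing import List


def merge_with_pauses(segments: List[bytes], pause_s: float,
                      sample_rate: int = 24_000, sample_width: int = 2) -> bytes:
    """Concatenate mono PCM segments, inserting pause_s seconds of silence between them."""
    if not 0.0 <= pause_s <= 5.0:
        raise ValueError("segment_pause must be between 0.0 and 5.0 seconds")
    # Silence in signed PCM is all-zero samples.
    silence = b"\x00" * (int(pause_s * sample_rate) * sample_width)
    return silence.join(segments)
```

Silence is added only between segments, so a single-segment batch is returned unchanged.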
get_voices
Retrieve complete voice information and capabilities.
get_status
Check processing status for synthesis requests.
cancel_request
Cancel active synthesis operations.
list_output_files
Browse saved audio files with metadata.
Voice Controls
Emotions
- Neutral - Standard, professional tone
- Happy - Upbeat, cheerful expression
- Excited - Enthusiastic, energetic delivery
- Calm - Relaxed, soothing tone
- Serious - Formal, authoritative delivery
- Casual - Relaxed, conversational style
- Confident - Assured, professional tone
Pacing Styles
- Natural - Balanced, human-like rhythm
- Conversational - Casual discussion pace
- Presentation - Professional speaking rhythm
- Tutorial - Educational, clear delivery
- Narrative - Storytelling pace
- Fast - Quick delivery (1.2x base speed)
- Slow - Deliberate delivery (0.8x base speed)
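The fast and slow styles are described as multipliers on the base speed. One plausible reading, sketched below, is that the pacing multiplier is applied to the `speed` parameter and the result is clamped to the documented 0.25-3.0 range. Both the composition rule and the clamping are assumptions, and `effective_speed` is a hypothetical helper.

```python
# Multipliers stated in the list above; other styles are assumed to leave speed unchanged.
PACING_MULTIPLIERS = {"fast": 1.2, "slow": 0.8}
MIN_SPEED, MAX_SPEED = 0.25, 3.0


def effective_speed(speed: float, pacing: str) -> float:
    """Apply the pacing multiplier to speed, then clamp to the supported range."""
    combined = speed * PACING_MULTIPLIERS.get(pacing, 1.0)
    return max(MIN_SPEED, min(MAX_SPEED, combined))
```

Under these assumptions, requesting `speed=3.0` with `pacing="fast"` still yields 3.0, because the product exceeds the maximum.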
Audio Formats
| Format | Quality | Use Case |
|---|---|---|
| WAV | Uncompressed | Highest quality, editing |
| MP3 | Compressed | Web, streaming, sharing |
| FLAC | Lossless | Archival, high-quality storage |
| OGG | Compressed | Open source alternative |
Configuration
Environment Variables
# Model paths (optional)
KOKORO_MODEL_PATH=./kokoro-v1.0.onnx
KOKORO_VOICES_PATH=./voices-v1.0.bin
# Output settings
TTS_OUTPUT_DIR=./audio_output
TTS_MAX_QUEUE_SIZE=100
# Audio settings
TTS_DEFAULT_VOICE=af_heart
TTS_ENABLE_STREAMING=true
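The variables above can be read into a typed settings object. This is a minimal sketch, assuming the values shown are also the defaults when a variable is unset. The `EnvSettings` class and `load_settings` function are hypothetical names used for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class EnvSettings:
    model_path: str
    voices_path: str
    output_dir: str
    max_queue_size: int
    default_voice: str
    enable_streaming: bool


def load_settings(env: Mapping[str, str] = os.environ) -> EnvSettings:
    """Build settings from environment variables, falling back to assumed defaults."""
    streaming = env.get("TTS_ENABLE_STREAMING", "true").strip().lower()
    return EnvSettings(
        model_path=env.get("KOKORO_MODEL_PATH", "./kokoro-v1.0.onnx"),
        voices_path=env.get("KOKORO_VOICES_PATH", "./voices-v1.0.bin"),
        output_dir=env.get("TTS_OUTPUT_DIR", "./audio_output"),
        max_queue_size=int(env.get("TTS_MAX_QUEUE_SIZE", "100")),
        default_voice=env.get("TTS_DEFAULT_VOICE", "af_heart"),
        enable_streaming=streaming in {"1", "true", "yes"},
    )
```

Passing a plain dictionary instead of `os.environ` makes the loader easy to exercise in tests.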
Server Configuration
config = ServerConfig(
    model_path="./kokoro-v1.0.onnx",
    voices_path="./voices-v1.0.bin",
    output_dir="./audio_output",
    max_queue_size=100,
    enable_streaming=True,
    default_voice="af_heart"
)
Architecture
├── src/advanced_tts/
│   ├── __init__.py      # Package initialization
│   ├── server.py        # MCP server implementation
│   ├── engine.py        # Kokoro TTS engine wrapper
│   ├── models.py        # Data models and validation
│   └── utils.py         # Utility functions
├── pyproject.toml       # Project configuration
├── README.md            # Documentation
└── LICENSE              # MIT License
Contributing
Contributions welcome! Areas for improvement:
- Additional voice models
- Real-time streaming synthesis
- Advanced audio effects
- Multi-language support
- Performance optimizations
License
MIT License - see LICENSE for details.
Acknowledgments
- Kokoro TTS - High-quality neural voice synthesis
- MCP Protocol - Seamless AI model integration
- FastMCP - Efficient server framework
Developed by Sami Halawa
Transform your text into natural, expressive speech with Advanced TTS MCP Server.