MCP TTS VOICEVOX
A Text-to-Speech server that integrates with an external VOICEVOX engine.
A text-to-speech MCP server using VOICEVOX
Features
- Advanced playback control - Flexible audio processing with queue management, immediate playback, and synchronous/asynchronous control
- Prefetching - Pre-generates next audio for smooth playback
- Cross-platform support - Works on Windows, macOS, and Linux (including WSL environment audio playback)
- Stdio/HTTP support - Supports Stdio, SSE, and StreamableHttp
- Multiple speaker support - Individual speaker specification per segment
- Automatic text segmentation - Stable audio synthesis through automatic long text segmentation
- Independent client library - Provided as a separate package, @kajidog/voicevox-client
Requirements
- Node.js 18.0.0 or higher
- VOICEVOX Engine or compatible engine
Installation
npm install -g @kajidog/mcp-tts-voicevox
Usage
As MCP Server
1. Start VOICEVOX Engine
Start the VOICEVOX Engine and let it listen on the default port (http://localhost:50021).
2. Start MCP Server
Standard I/O mode (recommended):
npx @kajidog/mcp-tts-voicevox
HTTP server mode:
# Linux/macOS
MCP_HTTP_MODE=true npx @kajidog/mcp-tts-voicevox
# Windows PowerShell
$env:MCP_HTTP_MODE='true'; npx @kajidog/mcp-tts-voicevox
MCP Tools
speak - Text-to-speech
Converts text to speech and plays it.
Parameters:
- text: String (multiple texts separated by newlines, speaker specification in "1:text" format)
- speaker (optional): Speaker ID
- speedScale (optional): Playback speed
- immediate (optional): Whether to start playback immediately (default: true)
- waitForStart (optional): Whether to wait for playback to start (default: false)
- waitForEnd (optional): Whether to wait for playback to end (default: false)
Examples:
// Simple text
{ "text": "Hello\nIt's a nice day today" }
// Speaker specification
{ "text": "Hello", "speaker": 3 }
// Per-segment speaker specification
{ "text": "1:Hello\n3:It's a nice day today" }
// Immediate playback (bypass queue)
{
"text": "Emergency message",
"immediate": true,
"waitForEnd": true
}
// Wait for playback to complete (synchronous processing)
{
"text": "Wait for this audio playback to complete before next processing",
"waitForEnd": true
}
// Add to queue but don't auto-play
{
"text": "Wait for manual playback start",
"immediate": false
}
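The newline-separated text and the "1:text" speaker prefix can be illustrated with a small parser. This is only a sketch of how such input might be split into segments; `parseSpeakText` and `Segment` are hypothetical names, not part of the package, and the exact prefix rules (digits, then a colon, at the start of a line) are an assumption.

```typescript
// Hypothetical helper: not part of @kajidog/mcp-tts-voicevox.
interface Segment {
  text: string;
  speaker?: number; // undefined means "use the default speaker"
}

function parseSpeakText(input: string): Segment[] {
  return input
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line): Segment => {
      // Assumed prefix form: one or more digits, a colon, then the text.
      const match = /^(\d+):(.*)$/.exec(line);
      if (match === null) {
        return { text: line };
      }
      return { speaker: Number(match[1]), text: (match[2] ?? "").trim() };
    });
}

// Per-segment speaker specification, as in the example above.
const segments = parseSpeakText("1:Hello\n3:It's a nice day today");
```

Under this assumed rule, a line such as "12:30 is the meeting time" would be read as speaker 12, so a real implementation may need a stricter check.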
Advanced Playback Control Features
Immediate Playback (immediate: true)
Play audio immediately by bypassing the queue:
- Parallel operation with regular queue: Does not interfere with existing queue playback
- Multiple simultaneous playback: Multiple immediate playbacks can run simultaneously
- Ideal for urgent notifications: Prioritizes important messages
Synchronous Playback Control (waitForEnd: true)
Wait for playback completion to synchronize processing:
- Sequential processing: Runs the next step only after audio playback finishes
- Timing control: Enables coordination between audio and other processing
- UI synchronization: Align screen display with audio timing
// Example 1: Play urgent message immediately and wait for completion
{
"text": "Emergency! Please check immediately",
"immediate": true,
"waitForEnd": true
}
// Example 2: Step-by-step audio guide
{
"text": "Step 1: Please open the file",
"waitForEnd": true
}
// Next processing executes after the above audio completes
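From the client side, the step-by-step pattern above amounts to awaiting each speak call before issuing the next one. The sketch below assumes a hypothetical `callSpeak` function standing in for however your MCP client invokes the speak tool; only the `text` and `waitForEnd` arguments come from this document.

```typescript
// Arguments accepted by the speak tool, limited to the fields used here.
interface SpeakArgs {
  text: string;
  waitForEnd?: boolean;
}

// Hypothetical stand-in for the MCP client's tool invocation.
type CallSpeak = (args: SpeakArgs) => Promise<void>;

async function runGuide(steps: string[], callSpeak: CallSpeak): Promise<void> {
  for (const text of steps) {
    // With waitForEnd: true the call is expected to resolve only after
    // playback finishes, so awaiting it keeps the steps in order.
    await callSpeak({ text, waitForEnd: true });
  }
}
```

Injecting `callSpeak` keeps the sketch independent of any particular MCP client library.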
Other Tools
- generate_query - Generate query for speech synthesis
- synthesize_file - Generate audio file
- stop_speaker - Stop playback and clear queue
- get_speakers - Get speaker list
- get_speaker_detail - Get speaker details
Package Structure
@kajidog/mcp-tts-voicevox (this package)
- MCP Server - Communicates with MCP clients like Claude Desktop
- HTTP Server - Remote MCP communication via SSE/StreamableHTTP
@kajidog/voicevox-client (independent package)
- General-purpose library - Communication functionality with VOICEVOX Engine
- Cross-platform - Node.js and browser environment support
- Advanced playback control - Immediate playback, synchronous playback, and queue management features
MCP Configuration Examples
Claude Desktop Configuration
Add the following configuration to your claude_desktop_config.json file:
{
"mcpServers": {
"tts-mcp": {
"command": "npx",
"args": ["-y", "@kajidog/mcp-tts-voicevox"]
}
}
}
When SSE Mode is Required
If you need speech synthesis in SSE mode, you can use mcp-remote for SSE↔Stdio conversion:
- Claude Desktop Configuration
{
"mcpServers": {
"tts-mcp-proxy": {
"command": "npx",
"args": ["-y", "mcp-remote", "http://localhost:3000/sse"]
}
}
}
- Starting SSE Server
Mac/Linux:
MCP_HTTP_MODE=true MCP_HTTP_PORT=3000 npx @kajidog/mcp-tts-voicevox
Windows:
$env:MCP_HTTP_MODE='true'; $env:MCP_HTTP_PORT='3000'; npx @kajidog/mcp-tts-voicevox
## Environment Variables
### VOICEVOX Configuration
- `VOICEVOX_URL`: VOICEVOX Engine URL (default: `http://localhost:50021`)
- `VOICEVOX_DEFAULT_SPEAKER`: Default speaker ID (default: `1`)
- `VOICEVOX_DEFAULT_SPEED_SCALE`: Default playback speed (default: `1.0`)
### Playback Options Configuration
- `VOICEVOX_DEFAULT_IMMEDIATE`: Whether to start playback immediately when added to queue (default: `true`)
- `VOICEVOX_DEFAULT_WAIT_FOR_START`: Whether to wait for playback to start (default: `false`)
- `VOICEVOX_DEFAULT_WAIT_FOR_END`: Whether to wait for playback to end (default: `false`)
**Usage Examples:**
```bash
# Example 1: Wait for completion for all audio playback (synchronous processing)
export VOICEVOX_DEFAULT_WAIT_FOR_END=true
npx @kajidog/mcp-tts-voicevox
# Example 2: Wait for both playback start and end
export VOICEVOX_DEFAULT_WAIT_FOR_START=true
export VOICEVOX_DEFAULT_WAIT_FOR_END=true
npx @kajidog/mcp-tts-voicevox
# Example 3: Manual control (disable auto-play)
export VOICEVOX_DEFAULT_IMMEDIATE=false
npx @kajidog/mcp-tts-voicevox
```
These options allow fine-grained control of audio playback behavior according to application requirements.
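To make the defaults concrete, the following sketch resolves the three playback variables into an options object. It is not the server's actual parsing code: `resolvePlaybackDefaults` and `readBool` are hypothetical names, and treating only the strings `true` and `false` as valid values is an assumption. The fallback values are the documented defaults.

```typescript
interface PlaybackDefaults {
  immediate: boolean;
  waitForStart: boolean;
  waitForEnd: boolean;
}

type Env = Record<string, string | undefined>;

// Assumption: only "true" / "false" (case-insensitive) are recognized;
// any other value falls back to the documented default.
function readBool(value: string | undefined, fallback: boolean): boolean {
  if (value === undefined) return fallback;
  const normalized = value.trim().toLowerCase();
  if (normalized === "true") return true;
  if (normalized === "false") return false;
  return fallback;
}

function resolvePlaybackDefaults(env: Env): PlaybackDefaults {
  return {
    immediate: readBool(env["VOICEVOX_DEFAULT_IMMEDIATE"], true),
    waitForStart: readBool(env["VOICEVOX_DEFAULT_WAIT_FOR_START"], false),
    waitForEnd: readBool(env["VOICEVOX_DEFAULT_WAIT_FOR_END"], false),
  };
}

// Equivalent of Example 3: manual control, auto-play disabled.
const manual = resolvePlaybackDefaults({ VOICEVOX_DEFAULT_IMMEDIATE: "false" });
```

In Node.js, `process.env` can be passed as the `env` argument.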
### Server Configuration
- `MCP_HTTP_MODE`: Enable HTTP server mode (set to `true` to enable)
- `MCP_HTTP_PORT`: HTTP server port number (default: `3000`)
- `MCP_HTTP_HOST`: HTTP server host (default: `0.0.0.0`)
## Usage with WSL (Windows Subsystem for Linux)
This section describes how to connect from a WSL environment to an MCP server running on the Windows host.
### 1. Windows Host Configuration
**Starting MCP server with PowerShell:**
```powershell
$env:MCP_HTTP_MODE='true'; $env:MCP_HTTP_PORT='3000'; npx @kajidog/mcp-tts-voicevox
```
### 2. WSL Environment Configuration
**Check Windows host IP address:**
```bash
# Get Windows host IP address from WSL
ip route show | grep default | awk '{print $3}'
```
The address usually has the form `172.x.x.1`.
**Claude Code .mcp.json configuration example:**
```json
{
"mcpServers": {
"tts": {
"type": "sse",
"url": "http://172.29.176.1:3000/sse"
}
}
}
```
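The gateway lookup and the `.mcp.json` entry can be combined in a short script. This is a sketch under stated assumptions: `parseDefaultGateway` and `buildSseConfig` are hypothetical helpers, the sample `ip route` output is illustrative, and the parser takes the third field of the first line that starts with `default`, which approximates the `grep default | awk '{print $3}'` pipeline.

```typescript
// Hypothetical helpers for generating a .mcp.json entry.
function parseDefaultGateway(ipRouteOutput: string): string | undefined {
  for (const line of ipRouteOutput.split("\n")) {
    const fields = line.trim().split(/\s+/);
    // Third field of the default route, e.g. "default via 172.29.176.1 dev eth0".
    if (fields[0] === "default" && fields.length >= 3) {
      return fields[2];
    }
  }
  return undefined;
}

function buildSseConfig(host: string, port = 3000) {
  return {
    mcpServers: {
      tts: { type: "sse", url: `http://${host}:${port}/sse` },
    },
  };
}

// Illustrative output of `ip route show` inside WSL.
const sampleRoutes =
  "default via 172.29.176.1 dev eth0 proto kernel\n" +
  "172.29.176.0/20 dev eth0 proto kernel scope link src 172.29.180.12";

const gateway = parseDefaultGateway(sampleRoutes);
const config = gateway === undefined ? undefined : buildSseConfig(gateway);
```

Serializing `config` with `JSON.stringify(config, null, 2)` yields the same structure as the configuration example.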
**Important Points:**
- Within WSL, `localhost` and `127.0.0.1` refer to the WSL instance itself, so they cannot reach services on the Windows host
- Use the WSL gateway IP (usually `172.x.x.1`) to access the Windows host
- Ensure the port is not blocked by the Windows firewall
**Connection Test:**
```bash
# Check connection to Windows host MCP server from WSL
curl http://172.29.176.1:3000
```
If the server is reachable, it returns `404 Not Found`, because no route is defined for the root path.
## Troubleshooting
### Common Issues
1. **VOICEVOX Engine is not running**
```bash
curl http://localhost:50021/speakers
```
2. **Audio is not playing**
- Check system audio output device
- Check platform-specific audio playback tools:
- **Linux**: Requires one of `aplay`, `paplay`, `play`, `ffplay`
- **macOS**: `afplay` (pre-installed)
- **Windows**: PowerShell (pre-installed)
3. **Not recognized by MCP client**
- Check package installation: `npm list -g @kajidog/mcp-tts-voicevox`
- Check JSON syntax in configuration file
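The first check (whether the VOICEVOX Engine is running) can also be scripted. The sketch below requests the same `/speakers` path as the `curl` command and reports whether the request succeeded. `isEngineReachable` and `FetchLike` are hypothetical names; the fetch function is passed in so the check can be exercised without a running engine.

```typescript
// Minimal shape of the response this check relies on.
// The global fetch in Node.js 18+ satisfies this type.
type FetchLike = (url: string) => Promise<{ ok: boolean }>;

async function isEngineReachable(baseUrl: string, fetchFn: FetchLike): Promise<boolean> {
  try {
    // Strip trailing slashes so "http://host/" and "http://host" behave the same.
    const res = await fetchFn(`${baseUrl.replace(/\/+$/, "")}/speakers`);
    return res.ok;
  } catch {
    // Connection refused, DNS failure, and similar errors.
    return false;
  }
}
```

With Node.js 18 or higher, a call such as `isEngineReachable("http://localhost:50021", fetch)` should resolve to `true` when the engine is up.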
## License
ISC
## Developer Information
Instructions for developing this repository locally.
### Setup
1. Clone the repository:
```bash
git clone https://github.com/kajidog/mcp-tts-voicevox.git
cd mcp-tts-voicevox
```
2. Install [pnpm](https://pnpm.io/) (if not already installed).
3. Install dependencies:
```bash
pnpm install
```
### Main Development Commands
You can run the following commands in the project root.
- **Build all packages:**
```bash
pnpm build
```
- **Run all tests:**
```bash
pnpm test
```
- **Run all linters:**
```bash
pnpm lint
```
- **Start root server in development mode:**
```bash
pnpm dev
```
- **Start stdio interface in development mode:**
```bash
pnpm dev:stdio
```
These commands also run the corresponding tasks for the related packages in the workspace.