MCP TTS VOICEVOX

English | 日本語

A text-to-speech MCP server using VOICEVOX

Features

Advanced playback control - Flexible audio processing with queue management, immediate playback, and synchronous/asynchronous control
Prefetching - Pre-generates the next segment's audio for smooth playback
Cross-platform support - Works on Windows, macOS, and Linux (including audio playback from WSL)
Stdio/HTTP support - Supports Stdio, SSE, and Streamable HTTP
Multiple speaker support - Individual speaker specification per segment
Automatic text segmentation - Splits long text automatically for stable audio synthesis
Independent client library - Provided as a separate package @kajidog/voicevox-client
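
A minimal sketch of the idea behind automatic segmentation, assuming text is split on newlines and then packed sentence by sentence into segments of bounded length. The delimiter set, the 100-character default, and the `splitText` name are assumptions for illustration, not this package's implementation.

```typescript
// Hypothetical sketch: split on newlines, then pack whole sentences into
// segments of at most `maxLength` characters. The delimiter set and the
// default limit are assumptions, not this package's actual settings.
const SENTENCE = /[^。！？!?.]+[。！？!?.]*/g;

export function splitText(text: string, maxLength = 100): string[] {
  const segments: string[] = [];
  for (const line of text.split("\n")) {
    const sentences = line.match(SENTENCE) ?? [];
    let current = "";
    for (const sentence of sentences) {
      // Close the current segment if this sentence would push it over the limit.
      // A single sentence longer than maxLength is kept whole.
      if (current !== "" && current.length + sentence.length > maxLength) {
        segments.push(current.trim());
        current = "";
      }
      current += sentence;
    }
    if (current.trim() !== "") segments.push(current.trim());
  }
  return segments;
}
```

For example, `splitText("こんにちは。今日はいい天気ですね。", 10)` returns `["こんにちは。", "今日はいい天気ですね。"]`. Short segments are also what make prefetching useful: the next segment can be synthesized while the current one plays.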

Requirements

Node.js 18.0.0 or higher
VOICEVOX Engine or a compatible engine (such as AivisSpeech)

Installation

npm install -g @kajidog/mcp-tts-voicevox

Usage

As MCP Server

1. Start VOICEVOX Engine

Start the VOICEVOX Engine and make sure it is listening on the default port (http://localhost:50021).
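
Once the engine is listening, you can also call its HTTP API directly. VOICEVOX Engine synthesizes speech in two requests: POST /audio_query builds a synthesis query (timing, pitch, and so on) from text and a speaker ID, and POST /synthesis turns that query into WAV audio. The sketch below only builds the two URLs; the helper names are hypothetical and are not part of this package's API.

```typescript
// Sketch of VOICEVOX Engine's two-step synthesis flow (helper names are hypothetical):
//   1. POST /audio_query?text=...&speaker=N -> JSON synthesis query
//   2. POST /synthesis?speaker=N with that JSON as the body -> WAV audio
// Only the request URLs are built here; no request is sent.
const DEFAULT_ENGINE_URL = "http://localhost:50021";

export function audioQueryUrl(text: string, speaker: number, baseUrl = DEFAULT_ENGINE_URL): string {
  const url = new URL("/audio_query", baseUrl);
  url.searchParams.set("text", text); // URL-encoded automatically
  url.searchParams.set("speaker", String(speaker));
  return url.toString();
}

export function synthesisUrl(speaker: number, baseUrl = DEFAULT_ENGINE_URL): string {
  const url = new URL("/synthesis", baseUrl);
  url.searchParams.set("speaker", String(speaker));
  return url.toString();
}
```

With the built-in `fetch` in Node.js 18, you would POST to the first URL, then POST the returned JSON to the second with the header `Content-Type: application/json`; the response body is the WAV data.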

2. Start MCP Server

Standard I/O mode (recommended):

npx @kajidog/mcp-tts-voicevox

HTTP server mode:

# Linux/macOS
MCP_HTTP_MODE=true npx @kajidog/mcp-tts-voicevox

# Windows PowerShell
$env:MCP_HTTP_MODE='true'; npx @kajidog/mcp-tts-voicevox

MCP Tools

`speak` - Text-to-speech

Converts text to speech and plays it.

Parameters:

text: Text to speak. Separate multiple segments with newlines; prefix a segment with a speaker ID and a colon (e.g. "1:text") to set its speaker
speaker (optional): Speaker ID
speedScale (optional): Playback speed multiplier (1.0 = normal speed)
immediate (optional): Whether to start playback immediately (default: true)
waitForStart (optional): Whether to wait for playback to start (default: false)
waitForEnd (optional): Whether to wait for playback to end (default: false)
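
To make the `text` format concrete, here is a hypothetical parser for it, assuming each non-empty line is one segment and a leading run of digits followed by a colon selects that line's speaker. The `parseSegments` name and its exact rules are illustrative, not taken from the package.

```typescript
// Hypothetical sketch of the `text` convention described above: each non-empty
// line is one segment, and a leading "<digits>:" sets that line's speaker.
// Not the package's actual parser.
export interface Segment {
  speaker: number;
  text: string;
}

export function parseSegments(text: string, defaultSpeaker: number): Segment[] {
  const segments: Segment[] = [];
  for (const line of text.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue; // skip blank lines
    const match = /^(\d+):(.*)$/.exec(trimmed);
    if (match) {
      // "3:It's a nice day today" -> speaker 3
      segments.push({ speaker: Number(match[1]), text: (match[2] ?? "").trim() });
    } else {
      segments.push({ speaker: defaultSpeaker, text: trimmed });
    }
  }
  return segments;
}
```

Under this rule, a line that happens to start with digits and a colon (a time such as "10:30", for instance) would be read as a speaker prefix.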

Examples:

// Simple text
{ "text": "Hello\nIt's a nice day today" }

// Speaker specification
{ "text": "Hello", "speaker": 3 }

// Per-segment speaker specification
{ "text": "1:Hello\n3:It's a nice day today" }

// Immediate playback (bypass queue)
{
  "text": "Emergency message",
  "immediate": true,
  "waitForEnd": true
}

// Wait for playback to complete (synchronous processing)
{
  "text": "Wait for this audio playback to complete before next processing",
  "waitForEnd": true
}

// Add to queue but don't auto-play
{
  "text": "Wait for manual playback start",
  "immediate": false
}
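
For reference, MCP clients invoke tools with a JSON-RPC 2.0 `tools/call` request, and argument objects like the ones above travel in `params.arguments`. Clients such as Claude Desktop build this message for you; the sketch below, with a hypothetical `speakRequest` helper and an arbitrary `id`, only shows its shape.

```typescript
// Sketch: the JSON-RPC 2.0 request an MCP client sends to call the `speak` tool.
// The argument fields mirror the parameters listed above; `id` is arbitrary.
export interface SpeakArgs {
  text: string;
  speaker?: number;
  speedScale?: number;
  immediate?: boolean;
  waitForStart?: boolean;
  waitForEnd?: boolean;
}

export function speakRequest(id: number, args: SpeakArgs): string {
  // No indentation: the stdio transport expects one message per line,
  // and JSON.stringify escapes any newline inside `text` as \n.
  return JSON.stringify({
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "speak", arguments: args },
  });
}
```

Over the stdio transport, MCP messages are newline-delimited, so a multi-line `text` must stay escaped inside a single line, which compact `JSON.stringify` output guarantees.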

Advanced Playback Control Features

Immediate Playback (`immediate: true`)

Play audio immediately by bypassing the queue:

Parallel operation with regular queue: Does not interfere with existing queue playback
Multiple simultaneous playback: Multiple immediate playbacks can run simultaneously
Ideal for urgent notifications: Prioritizes important messages
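
The difference between the two playback paths can be modeled as a small scheduling function. This is a hypothetical model of the behavior described in this section, not the package's implementation: it assumes each clip's duration is known up front, queued requests play one after another, and immediate requests start at submission time without affecting the queue.

```typescript
// Hypothetical model of the two playback paths described above (not the
// package's implementation). Assumes each clip's duration is known up front.
// Times are in seconds.
export interface PlaybackRequest {
  id: string;
  submittedAt: number;
  duration: number;
  immediate: boolean; // true: bypass the queue
}

export interface Slot {
  id: string;
  start: number;
  end: number;
}

export function schedule(requests: PlaybackRequest[]): Slot[] {
  const slots: Slot[] = [];
  let queueFreeAt = 0; // when the regular queue finishes its current clip
  const ordered = requests.slice().sort((a, b) => a.submittedAt - b.submittedAt);
  for (const r of ordered) {
    // Queued clips wait for the previous queued clip; immediate clips never wait
    // and do not delay the queue, so they may overlap with anything.
    const start = r.immediate ? r.submittedAt : Math.max(r.submittedAt, queueFreeAt);
    const end = start + r.duration;
    if (!r.immediate) queueFreeAt = end;
    slots.push({ id: r.id, start, end });
  }
  return slots;
}
```

With clip A queued at t=0 (3 s) and clip B queued at t=1 (2 s), B starts at t=3; an immediate clip C submitted at t=1 starts right away and overlaps A.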

Synchronous Playback Control (`waitForEnd: true`)

Wait for playback to finish before continuing:

Sequential processing: The next step runs only after the audio finishes
Timing control: Coordinates audio with other processing
UI synchronization: Keeps on-screen updates aligned with the audio

// Example 1: Play urgent message immediately and wait for completion
{
  "text": "Emergency! Please check immediately",
  "immediate": true,
  "waitForEnd": true
}

// Example 2: Step-by-step audio guide
{
  "text": "Step 1: Please open the file",
  "waitForEnd": true
}
// The next step runs only after the audio above has finished
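
How `waitForStart` and `waitForEnd` change the moment a `speak` call returns can be summarized in a few lines. This is a hypothetical sketch based on the parameter descriptions above; the `returnsAt` name is illustrative, and it assumes that when both flags are set, waiting for the end takes precedence.

```typescript
// Hypothetical sketch of when a `speak` call returns, given the flags above.
// `slot` is the clip's playback window (seconds). Waiting for the end implies
// the start has already happened, so waitForEnd takes precedence.
export interface WaitFlags {
  waitForStart?: boolean;
  waitForEnd?: boolean;
}

export function returnsAt(
  submittedAt: number,
  slot: { start: number; end: number },
  flags: WaitFlags,
): number {
  if (flags.waitForEnd) return slot.end; // block until playback finishes
  if (flags.waitForStart) return slot.start; // block until playback begins
  return submittedAt; // return as soon as the request is accepted
}
```

In the step-by-step guide example, each call returns at the end of its own clip, so the following step cannot start speaking until the previous one is done.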

Other Tools

generate_query - Generate a query for speech synthesis
synthesize_file - Generate an audio file
stop_speaker - Stop playback and clear the queue
get_speakers - Get the list of speakers
get_speaker_detail - Get details for a speaker
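
Speaker IDs (for the `speaker` parameter and `VOICEVOX_DEFAULT_SPEAKER`) come from the engine. In VOICEVOX Engine's GET /speakers response, each character has one or more styles, and the numeric style `id` is what selects a voice. The sketch below flattens that response into a list of IDs; only the fields it reads are typed, the `listVoices` name is hypothetical, and it does not describe the output format of the `get_speakers` tool.

```typescript
// Sketch: flatten a VOICEVOX Engine GET /speakers response into the numeric
// style IDs used as speaker IDs. Only the fields read here are typed; the
// real response contains more. The function name is hypothetical.
export interface EngineSpeaker {
  name: string;
  styles: { name: string; id: number }[];
}

export interface Voice {
  id: number;
  label: string;
}

export function listVoices(speakers: EngineSpeaker[]): Voice[] {
  const voices: Voice[] = [];
  for (const speaker of speakers) {
    for (const style of speaker.styles) {
      voices.push({ id: style.id, label: `${speaker.name} (${style.name})` });
    }
  }
  return voices;
}
```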

Package Structure

@kajidog/mcp-tts-voicevox (this package)

MCP Server - Communicates with MCP clients like Claude Desktop
HTTP Server - Remote MCP communication via SSE/Streamable HTTP

@kajidog/voicevox-client (independent package)

General-purpose library - Handles communication with the VOICEVOX Engine
Cross-platform - Supports Node.js and browser environments
Advanced playback control - Immediate playback, synchronous playback, and queue management features

MCP Configuration Examples

Claude Desktop Configuration

Add the following configuration to your claude_desktop_config.json file:

{
  "mcpServers": {
    "tts-mcp": {
      "command": "npx",
      "args": ["-y", "@kajidog/mcp-tts-voicevox"]
    }
  }
}

When SSE Mode is Required

If you run the server in SSE mode, you can connect Claude Desktop to it through mcp-remote, which bridges SSE and stdio:

Claude Desktop Configuration

{
  "mcpServers": {
    "tts-mcp-proxy": {
      "command": "npx",
      "args": ["-y", "mcp-remote", "http://localhost:3000/sse"]
    }
  }
}

Starting the SSE Server

macOS/Linux:

MCP_HTTP_MODE=true MCP_HTTP_PORT=3000 npx @kajidog/mcp-tts-voicevox

Windows (PowerShell):

$env:MCP_HTTP_MODE='true'; $env:MCP_HTTP_PORT='3000'; npx @kajidog/mcp-tts-voicevox

AivisSpeech Configuration Example

To use AivisSpeech, point VOICEVOX_URL at the AivisSpeech Engine and set a default speaker ID that exists there:

```json
{
  "mcpServers": {
    "tts-mcp": {
      "command": "npx",
      "args": ["-y", "@kajidog/mcp-tts-voicevox"],
      "env": {
        "VOICEVOX_URL": "http://127.0.0.1:10101",
        "VOICEVOX_DEFAULT_SPEAKER": "888753764"
      }
    }
  }
}
```

Environment Variables

VOICEVOX Configuration

VOICEVOX_URL: VOICEVOX Engine URL (default: http://localhost:50021)
VOICEVOX_DEFAULT_SPEAKER: Default speaker ID (default: 1)
VOICEVOX_DEFAULT_SPEED_SCALE: Default playback speed (default: 1.0)

Playback Options Configuration

VOICEVOX_DEFAULT_IMMEDIATE: Whether to start playback immediately when added to queue (default: true)
VOICEVOX_DEFAULT_WAIT_FOR_START: Whether to wait for playback to start (default: false)
VOICEVOX_DEFAULT_WAIT_FOR_END: Whether to wait for playback to end (default: false)
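
A hypothetical sketch of resolving the VOICEVOX and playback variables above into a settings object with the documented defaults. How the real server parses values is an assumption here: only the exact strings "true" and "false" change a boolean default, and an empty or non-numeric value falls back to the default number. The `loadConfig` name is illustrative.

```typescript
// Hypothetical sketch: resolve the VOICEVOX and playback settings above from
// environment variables, falling back to the documented defaults.
// Assumption: only the exact strings "true"/"false" change a boolean default.
type Env = Record<string, string | undefined>;

export interface VoicevoxConfig {
  url: string;
  defaultSpeaker: number;
  defaultSpeedScale: number;
  immediate: boolean;
  waitForStart: boolean;
  waitForEnd: boolean;
}

function parseBool(value: string | undefined, fallback: boolean): boolean {
  if (value === "true") return true;
  if (value === "false") return false;
  return fallback;
}

function parseNumber(value: string | undefined, fallback: number): number {
  if (value === undefined || value.trim() === "") return fallback;
  const n = Number(value);
  return isNaN(n) ? fallback : n;
}

export function loadConfig(env: Env): VoicevoxConfig {
  return {
    url: env.VOICEVOX_URL ?? "http://localhost:50021",
    defaultSpeaker: parseNumber(env.VOICEVOX_DEFAULT_SPEAKER, 1),
    defaultSpeedScale: parseNumber(env.VOICEVOX_DEFAULT_SPEED_SCALE, 1.0),
    immediate: parseBool(env.VOICEVOX_DEFAULT_IMMEDIATE, true),
    waitForStart: parseBool(env.VOICEVOX_DEFAULT_WAIT_FOR_START, false),
    waitForEnd: parseBool(env.VOICEVOX_DEFAULT_WAIT_FOR_END, false),
  };
}
```

In a Node.js process this would be called as `loadConfig(process.env)`.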

Usage Examples:

# Example 1: Wait for every playback to finish (synchronous processing)
export VOICEVOX_DEFAULT_WAIT_FOR_END=true
npx @kajidog/mcp-tts-voicevox

# Example 2: Wait for both playback start and end
export VOICEVOX_DEFAULT_WAIT_FOR_START=true
export VOICEVOX_DEFAULT_WAIT_FOR_END=true
npx @kajidog/mcp-tts-voicevox

# Example 3: Manual control (disable auto-play)
export VOICEVOX_DEFAULT_IMMEDIATE=false
npx @kajidog/mcp-tts-voicevox

These options allow fine-grained control of audio playback behavior according to application requirements.

Server Configuration

MCP_HTTP_MODE: Set to true to enable HTTP server mode
MCP_HTTP_PORT: HTTP server port number (default: 3000)
MCP_HTTP_HOST: HTTP server host (default: 0.0.0.0)
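
These three variables decide how clients reach the server. A hypothetical sketch that resolves them with the documented defaults and derives the SSE endpoint used elsewhere in this README (the /sse path); the names are illustrative, and it assumes only the exact string "true" enables HTTP mode.

```typescript
// Hypothetical sketch: resolve the server settings above using the documented
// defaults. Assumes only the exact string "true" enables HTTP mode.
type Env = Record<string, string | undefined>;

export type Transport =
  | { kind: "stdio" }
  | { kind: "http"; host: string; port: number; sseUrl: string };

export function resolveTransport(env: Env): Transport {
  if (env.MCP_HTTP_MODE !== "true") return { kind: "stdio" };
  const port = Number(env.MCP_HTTP_PORT) || 3000; // unset, empty, or invalid -> 3000
  const host = env.MCP_HTTP_HOST ?? "0.0.0.0";
  // 0.0.0.0 means "listen on all interfaces"; a local client connects via localhost.
  const clientHost = host === "0.0.0.0" ? "localhost" : host;
  return { kind: "http", host, port, sseUrl: `http://${clientHost}:${port}/sse` };
}
```

Listening on 0.0.0.0 by default is also what lets the WSL setup below reach the server from outside the Windows host's loopback interface.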

Usage with WSL (Windows Subsystem for Linux)

How to connect from a WSL environment to an MCP server running on the Windows host.

1. Windows Host Configuration

Start the MCP server from PowerShell, using AivisSpeech as the engine:

$env:MCP_HTTP_MODE='true'; $env:MCP_HTTP_PORT='3000'; $env:VOICEVOX_URL='http://127.0.0.1:10101'; $env:VOICEVOX_DEFAULT_SPEAKER='888753764'; npx @kajidog/mcp-tts-voicevox

2. WSL Environment Configuration

Find the Windows host IP address:

# Get Windows host IP address from WSL
ip route show | grep default | awk '{print $3}'

It is usually of the form 172.x.x.1.
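
If you want to script this step, the pipeline above amounts to finding the default route and taking its third field. A sketch under that reading; the function names are hypothetical, and it checks for "default via" at the start of a line, which is slightly stricter than `grep default`.

```typescript
// Sketch of `ip route show | grep default | awk '{print $3}'`: find the
// default route and return its gateway (the Windows host, as seen from WSL).
export function defaultGateway(ipRouteOutput: string): string | undefined {
  for (const line of ipRouteOutput.split("\n")) {
    const fields = line.trim().split(/\s+/);
    if (fields[0] === "default" && fields[1] === "via") return fields[2];
  }
  return undefined; // no default route found
}

// Build the SSE endpoint for the MCP server on that host (port 3000 as above).
export function sseUrlFor(hostIp: string, port = 3000): string {
  return `http://${hostIp}:${port}/sse`;
}
```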

Claude Code .mcp.json configuration example (replace the IP address with the one found above):

{
  "mcpServers": {
    "tts": {
      "type": "sse",
      "url": "http://172.29.176.1:3000/sse"
    }
  }
}

Important Points:

Inside WSL, localhost and 127.0.0.1 refer to WSL itself, so they cannot reach services on the Windows host
Use the WSL gateway IP (usually 172.x.x.1) to reach the Windows host
Make sure Windows Firewall is not blocking the port

Connection Test:

# Check connection to Windows host MCP server from WSL
curl http://172.29.176.1:3000

If the connection works, the server returns 404 Not Found (the root path has no handler).

Troubleshooting

Common Issues

VOICEVOX Engine is not running: check that the engine responds (it returns a JSON list of speakers)
```
curl http://localhost:50021/speakers
```
Audio is not playing
- Check system audio output device
- Check that a platform-specific audio player is available:
  - Linux: one of aplay, paplay, play (SoX), or ffplay
  - macOS: afplay (pre-installed)
  - Windows: PowerShell (pre-installed)
The server is not recognized by the MCP client
- Check package installation: npm list -g @kajidog/mcp-tts-voicevox
- Check JSON syntax in configuration file
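
When audio does not play, it helps to know which command would run. A hypothetical sketch of choosing a player per platform from the tools listed above; the package's actual commands and flags are not documented here, so these are common invocations of each tool, and `playerCandidates` is an illustrative name.

```typescript
// Hypothetical sketch: candidate audio players per platform, following the
// tools listed above. The package's actual commands and flags may differ.
// `platform` uses Node.js process.platform values.
export interface PlayerCommand {
  command: string;
  args: string[];
}

export function playerCandidates(platform: string, wavPath: string): PlayerCommand[] {
  switch (platform) {
    case "darwin":
      return [{ command: "afplay", args: [wavPath] }];
    case "win32": {
      // Escape single quotes for a PowerShell single-quoted string.
      const quoted = wavPath.replace(/'/g, "''");
      return [
        {
          command: "powershell",
          args: ["-NoProfile", "-Command", `(New-Object Media.SoundPlayer '${quoted}').PlaySync()`],
        },
      ];
    }
    case "linux":
      // Tried in order; the first one installed would be used.
      return [
        { command: "aplay", args: [wavPath] },
        { command: "paplay", args: [wavPath] },
        { command: "play", args: [wavPath] },
        { command: "ffplay", args: ["-nodisp", "-autoexit", wavPath] },
      ];
    default:
      return []; // unsupported platform
  }
}
```

Each entry could be passed to Node.js `child_process.spawn(command, args)`; passing arguments as an array avoids shell quoting problems with the file path.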

License

ISC

Developer Information

Instructions for developing this repository locally.

Setup

Clone the repository:

git clone https://github.com/kajidog/mcp-tts-voicevox.git
cd mcp-tts-voicevox

Install pnpm (if not already installed).
Install dependencies:
```
pnpm install
```

Main Development Commands

You can run the following commands in the project root.

Build all packages:
```
pnpm build
```
Run all tests:
```
pnpm test
```
Run all linters:
```
pnpm lint
```
Start root server in development mode:
```
pnpm dev
```
Start stdio interface in development mode:
```
pnpm dev:stdio
```

These commands also run the corresponding tasks for the related packages in the workspace.
