Text-to-Speech (TTS)

A Text-to-Speech server supporting multiple backends like macOS say, ElevenLabs, Google Gemini, and OpenAI TTS.

GitHub

mcp-tts

MCP Server for TTS (Text-to-Speech)

What? 🤔

Adds Text-to-Speech to things like Claude Desktop and Cursor IDE.

It registers four TTS tools:

say_tts
elevenlabs_tts
google_tts
openai_tts

`say_tts`

Uses the macOS say binary to speak the text with built-in system voices

`elevenlabs_tts`

Uses the ElevenLabs text-to-speech API to speak the text with premium AI voices

`google_tts`

Uses Google's Gemini TTS models to speak the text with 30 high-quality voices. Available voices include:

Achernar, Achird, Algenib, Algieba, Alnilam, Aoede, Autonoe, Callirrhoe, Charon, Despina, Enceladus, Erinome, Fenrir, Gacrux, Iapetus, Kore, Laomedeia, Leda, Orus, Puck, Pulcherrima, Rasalgethi, Sadachbia, Sadaltager, Schedar, Sulafat, Umbriel, Vindemiatrix, Zephyr, Zubenelgenubi

`openai_tts`

Uses OpenAI's Text-to-Speech API to speak the text with 10 natural-sounding voices:

alloy (Warm, conversational, modern)
ash (Confident, assertive, slightly textured)
ballad (Gentle, melodious, slightly lyrical)
coral (Cheerful, fresh, upbeat)
echo (Neutral, calm, balanced)
fable (Storyteller-like, expressive)
nova (Clear, precise, slightly formal)
onyx (Deep, authoritative, resonant)
sage (Soothing, empathetic, reassuring)
shimmer (Bright, animated, playful)
verse (Versatile, expressive)

Supports three quality models:

gpt-4o-mini-tts - Default, optimized quality and speed
tts-1 - Standard quality, faster generation
tts-1-hd - High definition audio, premium quality

Additional features:

Speed control from 0.25x to 4.0x (default: 1.0x)
Custom voice instructions (e.g., "Speak in a cheerful and positive tone") via parameter or OPENAI_TTS_INSTRUCTIONS environment variable

Configuration

Sequential vs Concurrent TTS

By default, the TTS server enforces sequential speech operations - only one TTS request can play audio at a time. This prevents multiple agents from speaking simultaneously and creating an unintelligible cacophony. Subsequent requests will wait in a queue until the current speech completes.

Multi-Instance Protection: The mutex works both within a single MCP server process and across multiple Claude Desktop instances. When running multiple Claude Desktop terminals, they coordinate via a system-wide file lock to prevent overlapping speech.

To allow concurrent TTS operations (multiple speeches playing simultaneously):

Environment Variable:

export MCP_TTS_ALLOW_CONCURRENT=true

Command Line Flag:

mcp-tts --sequential-tts=false

Note: Concurrent TTS may result in overlapping audio that's difficult to understand. Use this option only when you explicitly want multiple TTS operations to run simultaneously.

Suppressing "Speaking:" Output

By default, TTS tools return a message like "Speaking: [text]" when speech completes. This can interfere with LLM responses. To suppress this output:

Environment Variable:

export MCP_TTS_SUPPRESS_SPEAKING_OUTPUT=true

Command Line Flag:

mcp-tts --suppress-speaking-output

When enabled, tools return "Speech completed" instead of echoing the spoken text.

Saving Audio to Disk

Save TTS audio output to files instead of (or in addition to) playing them:

Environment Variables:

export MCP_TTS_OUTPUT_DIR=/path/to/audio    # Save audio files to this directory
export MCP_TTS_NO_PLAY=true                  # Skip playback, only save (optional)

Command Line Flags:

mcp-tts --output-dir /path/to/audio          # Save and play
mcp-tts --output-dir /path/to/audio --no-play  # Save only, no playback

Files are saved with unique names: tts_{timestamp}_{hash}.{ext}

Provider	Format
macOS say	AIFF
ElevenLabs	MP3
Google TTS	WAV
OpenAI TTS	MP3

Getting Started

Install

go install github.com/blacktop/mcp-tts@latest

❱ mcp-tts --help

TTS (text-to-speech) MCP Server.

Provides multiple text-to-speech services via MCP protocol:

• say_tts - Uses macOS built-in 'say' command (macOS only)
• elevenlabs_tts - Uses ElevenLabs API for high-quality speech synthesis
• google_tts - Uses Google's Gemini TTS models for natural speech
• openai_tts - Uses OpenAI's TTS API with various voice options

Each tool supports different voices, rates, and configuration options.
Requires appropriate API keys for cloud-based services.

Designed to be used with the MCP (Model Context Protocol).

Usage:
  mcp-tts [flags]

Flags:
  -h, --help                       help for mcp-tts
      --no-play                    Skip playback, only save (requires --output-dir)
      --output-dir string          Save audio files to directory (env: MCP_TTS_OUTPUT_DIR)
      --sequential-tts             Enforce sequential TTS (prevent concurrent speech) (default true)
      --suppress-speaking-output   Suppress 'Speaking:' text output
  -v, --verbose                    Enable verbose debug logging

Configuration

Claude Desktop

Add to ~/Library/Application Support/Claude/claude_desktop_config.json:

{
  "mcpServers": {
    "say": {
      "command": "mcp-tts",
      "env": {
        "ELEVENLABS_API_KEY": "********",
        "ELEVENLABS_VOICE_ID": "1SM7GgM6IMuvQlz2BwM3",
        "GOOGLE_AI_API_KEY": "********",
        "OPENAI_API_KEY": "********",
        "OPENAI_TTS_INSTRUCTIONS": "Speak in a cheerful and positive tone",
        "MCP_TTS_SUPPRESS_SPEAKING_OUTPUT": "true",
        "MCP_TTS_ALLOW_CONCURRENT": "false"
      }
    }
  }
}

Claude Code

claude mcp add say \
  -e GOOGLE_AI_API_KEY=your_key \
  -e ELEVENLABS_API_KEY=your_key \
  -e OPENAI_API_KEY=your_key \
  -- mcp-tts

Codex CLI

codex mcp add say \
  --env GOOGLE_AI_API_KEY=your_key \
  --env ELEVENLABS_API_KEY=your_key \
  --env OPENAI_API_KEY=your_key \
  -- mcp-tts

Gemini CLI

gemini mcp add say mcp-tts \
  -e GOOGLE_AI_API_KEY=your_key \
  -e ELEVENLABS_API_KEY=your_key \
  -e OPENAI_API_KEY=your_key

Or manually add to ~/.gemini/settings.json (or .gemini/settings.json in project root):

{
  "mcpServers": {
    "say": {
      "command": ["mcp-tts"],
      "env": {
        "GOOGLE_AI_API_KEY": "..."
      }
    }
  }
}

Environment Variables

ELEVENLABS_API_KEY: Your ElevenLabs API key (required for elevenlabs_tts)
ELEVENLABS_VOICE_ID: ElevenLabs voice ID (optional, defaults to a built-in voice)
GOOGLE_AI_API_KEY or GEMINI_API_KEY: Your Google AI API key (required for google_tts)
OPENAI_API_KEY: Your OpenAI API key (required for openai_tts)
OPENAI_TTS_INSTRUCTIONS: Custom voice instructions for OpenAI TTS (optional, e.g., "Speak in a cheerful and positive tone")
MCP_TTS_SUPPRESS_SPEAKING_OUTPUT: Set to "true" to suppress "Speaking:" output (optional)
MCP_TTS_ALLOW_CONCURRENT: Set to "true" to allow concurrent TTS operations (optional, defaults to sequential)
MCP_TTS_OUTPUT_DIR: Directory to save audio files (optional)
MCP_TTS_NO_PLAY: Set to "true" to skip playback when saving (optional, requires MCP_TTS_OUTPUT_DIR)

Test

Test macOS TTS

❱ cat test/say.json | go run main.go --verbose

2025/03/23 22:41:49 INFO Starting MCP server name="Say TTS Service" version=1.0.0
2025/03/23 22:41:49 DEBU Say tool called request="{Request:{Method:tools/call Params:{Meta:<nil>}} Params:{Name:say_tts Arguments:map[text:Hello, world!] Meta:<nil>}}"
2025/03/23 22:41:49 DEBU Executing say command args="[--rate 200 Hello, world!]"
2025/03/23 22:41:49 INFO Speaking text text="Hello, world!"

{"jsonrpc":"2.0","id":3,"result":{"content":[{"type":"text","text":"Speaking: Hello, world!"}]}}

Test Google TTS

❱ cat test/google_tts.json | go run main.go --verbose

2025/05/23 18:26:45 INFO Starting MCP server name="Say TTS Service" version=""
2025/05/23 18:26:45 DEBU Google TTS tool called request="{...}"
2025/05/23 18:26:45 DEBU Generating TTS audio model=gemini-3.1-flash-tts-preview voice=Kore text="Hello! This is a test of Google's TTS API. How does it sound?"
2025/05/23 18:26:49 INFO Playing TTS audio via beep speaker bytes=181006
2025/05/23 18:26:53 INFO Speaking via Google TTS text="Hello! This is a test of Google's TTS API. How does it sound?" voice=Kore

{"jsonrpc":"2.0","id":4,"result":{"content":[{"type":"text","text":"Speaking: Hello! This is a test of Google's TTS API. How does it sound? (via Google TTS with voice Kore)"}]}}

Test OpenAI TTS

❱ cat test/openai_tts.json | go run main.go --verbose

2025/05/23 19:15:32 INFO Starting MCP server name="Say TTS Service" version=""
2025/05/23 19:15:32 DEBU OpenAI TTS tool called request="{...}"
2025/05/23 19:15:32 DEBU Generating OpenAI TTS audio model=tts-1 voice=nova speed=1.2 text="Hello! This is a test of OpenAI's text-to-speech API. I'm using the nova voice at 1.2x speed."
2025/05/23 19:15:34 DEBU Decoding MP3 stream from OpenAI
2025/05/23 19:15:34 DEBU Initializing speaker for OpenAI TTS sampleRate=22050
2025/05/23 19:15:36 INFO Speaking text via OpenAI TTS text="Hello! This is a test of OpenAI's text-to-speech API. I'm using the nova voice at 1.2x speed." voice=nova model=tts-1 speed=1.2

{"jsonrpc":"2.0","id":5,"result":{"content":[{"type":"text","text":"Speaking: Hello! This is a test of OpenAI's text-to-speech API. I'm using the nova voice at 1.2x speed. (via OpenAI TTS with voice nova)"}]}}

Test the mutex behavior with multiple TTS requests

# Sequential mode (default) - speeches play one after another
cat test/sequential.json | go run main.go --verbose

# Concurrent mode - allows overlapping speech  
cat test/sequential.json | go run main.go --verbose --sequential-tts=false

Skill: `speak`

This repo includes a speak skill that automatically announces plans, issues, and summaries aloud using TTS. Each project gets a unique voice so you can identify which project is speaking from another room.

Skills follow the Agent Skills open standard and work across Claude Code, Codex CLI, and Gemini CLI.

Install Skill

skills.sh

npx skills add https://github.com/blacktop/mcp-tts --skill speak

Claude Code

Via Plugin Marketplace (recommended):

claude plugin marketplace add blacktop/mcp-tts
claude plugin install speak@mcp-tts

Or manually:

mkdir -p ~/.claude/skills
git clone https://github.com/blacktop/mcp-tts.git /tmp/mcp-tts
cp -r /tmp/mcp-tts/skill ~/.claude/skills/speak

The skill is now available. Claude will use it automatically when relevant, or invoke directly with /speak.

Codex CLI

Using the skill-installer (within a Codex session):

$skill-installer install the speak skill from https://github.com/blacktop/mcp-tts --path skill

Or manually:

mkdir -p ~/.codex/skills
git clone https://github.com/blacktop/mcp-tts.git /tmp/mcp-tts
cp -r /tmp/mcp-tts/skill ~/.codex/skills/speak

Restart Codex after installing.

Gemini CLI

Gemini CLI uses extensions to bundle skills. Install this repo as an extension:

gemini extensions install https://github.com/blacktop/mcp-tts.git

This installs the speak skill.

Or manually (skill only):

mkdir -p ~/.gemini/skills
git clone https://github.com/blacktop/mcp-tts.git /tmp/mcp-tts
cp -r /tmp/mcp-tts/skill ~/.gemini/skills/speak

Note: Gemini CLI skills are experimental. Enable via /settings → search "Skills" → toggle on.

Shared Skills Directory (Optional)

To maintain one copy across all agents, run the install script:

git clone https://github.com/blacktop/mcp-tts.git
cd mcp-tts
./install-skill.sh

This copies the skill to ~/.agents/skills/speak and creates symlinks for Claude Code, Codex CLI, and Gemini CLI.

Verify Installation

Agent	Command
Claude Code	Ask "What skills are available?" or type `/speak`
Codex CLI	Skills load automatically on restart
Gemini CLI	`gemini extensions list` or check `/settings` for skills

How It Works

The skill triggers automatically after:

Planning complete - When a plan/todo list is finalized
Issue resolved - When a bug fix or error is resolved
Summary generated - When completing a major task

Providers fallback in order: google → openai → elevenlabs → say (macOS). If a provider fails due to missing API keys, it's marked unavailable and skipped in future attempts.

License

เซิร์ฟเวอร์ที่เกี่ยวข้อง

Poke-MCP

Fetches Pokémon data from the PokeAPI and exposes it through a standardized MCP interface.

Brokerage-MCP

An MCP server for brokerage functionalities, built with the MCP framework.

Amazon Ads MCP Server

Connect Amazon Ads to Claude or ChatGPT via Two Minute Reports MCP and get accurate insights on sponsored products, ACOS, keywords, and budget.

AGA MCP Server

Cryptographic runtime governance for AI agents. 20 tools. Sealed policy artifacts, continuous measurement, tamper-evident proof. Ed25519 + SHA-256.

LeadEnrich MCP

Waterfall lead enrichment for AI agents — cascades Apollo, Clearbit, and Hunter for maximum data coverage.

Ultra MCP SS

An MCP server for programmatic control of smartscreen.tv displays via HTTP and MCP commands, with YouTube integration.

Zo

Zo is your personal vibe server in the cloud with 50+ tools and integrations. Add texting, email, calendar, research and more to your harness easily.

Sysmetrics

Give your self-hosted agents 'situational awareness.' This MCP server provides a direct interface for agents to query Linux system telemetry, enabling autonomous resource monitoring, proactive alerting, and interactive troubleshooting via any MCP-compatible client.

Easy MCP AI for Wordpress

Free Complete Wordpress MCP. 100% Plugin, No external install needed.

AgentBroker MCP

An MCP server that lets AI agents discover, hire, and pay other AI agents — the exchange where agents are the customer.

Text-to-Speech (TTS)

mcp-tts

MCP Server for TTS (Text-to-Speech)

What? 🤔

say_tts

elevenlabs_tts

google_tts

openai_tts

Configuration

Sequential vs Concurrent TTS

Suppressing "Speaking:" Output

Saving Audio to Disk

Getting Started

Install

Configuration

Claude Desktop

Claude Code

Codex CLI

Gemini CLI

Environment Variables

Test

Test macOS TTS

Test Google TTS

Test OpenAI TTS

Test the mutex behavior with multiple TTS requests

Skill: speak

Install Skill

skills.sh

Claude Code

Codex CLI

Gemini CLI

Shared Skills Directory (Optional)

Verify Installation

How It Works

License

เซิร์ฟเวอร์ที่เกี่ยวข้อง

Poke-MCP

Brokerage-MCP

Amazon Ads MCP Server

AGA MCP Server

LeadEnrich MCP

Ultra MCP SS

Zo

Sysmetrics

Easy MCP AI for Wordpress

AgentBroker MCP

NotebookLM Web Importer

`say_tts`

`elevenlabs_tts`

`google_tts`

`openai_tts`

Skill: `speak`