supertonic3-mcp Server
Local, on-device TTS for Claude & Cursor. No API key, no cloud. 10 voices, 31 languages, ~820ms on Apple Silicon.
supertonic3-mcp
Local, on-device TTS for Claude & Cursor, powered by Supertonic 3. No API key. No cloud. An internal tool open-sourced by Halozen — we build AI compliance intelligence for construction.
Not affiliated with Supertone Inc.
Exposes three tools (speak, list_voices, and list_expressions) to Claude Desktop, Cursor, or any other MCP client over STDIO.
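Under the hood, an MCP client talks to the server over STDIO with JSON-RPC 2.0 messages, one JSON object per line. The sketch below builds the tools/call request a client would send to invoke speak. It assumes the initialize handshake has already completed. The helper name build_tool_call is hypothetical, and the argument values are illustrative.

```python
import json


def build_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as one newline-terminated line.

    The STDIO transport frames messages as newline-delimited JSON, so the
    payload itself must not contain raw newlines. json.dumps escapes any
    newline inside a string value as the two characters backslash and n.
    """
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # Compact separators keep the frame on a single, short line.
    return json.dumps(message, separators=(",", ":")) + "\n"
```

A client would write build_tool_call(2, "speak", {"text": "Hello", "voice_id": "M1"}) to the server's stdin and read the matching response line (same id) from its stdout.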
Quick start (time to hello world: under 3 min)
```sh
git clone https://github.com/nextic-tech/supertonic3-mcp && cd supertonic3-mcp
python3 -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"

# Optional: pre-download the model for offline use (~400 MB)
supertonic3-mcp preload

# Run the MCP server (STDIO)
supertonic3-mcp
```
Cursor MCP config
Add the following to .cursor/mcp.json (or via Cursor Settings → MCP), replacing the path with the absolute path to your clone:
```json
{
  "mcpServers": {
    "supertonic3": {
      "command": "/absolute/path/to/supertonic3-mcp/.venv/bin/supertonic3-mcp",
      "args": []
    }
  }
}
```
The first server start downloads the Supertonic model into ~/.cache/supertonic3/ unless you have already run supertonic3-mcp preload.
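If you prefer not to hand-edit the JSON, a small script can fill in the absolute path. This is a hypothetical helper (write_cursor_config is not part of the package). It assumes the virtualenv lives at .venv inside the cloned repo, and it merges into any existing mcpServers entries rather than overwriting them.

```python
import json
from pathlib import Path


def write_cursor_config(repo_dir: Path, project_dir: Path) -> Path:
    """Add or update the "supertonic3" entry in <project_dir>/.cursor/mcp.json.

    Hypothetical helper: assumes the server binary is at
    <repo_dir>/.venv/bin/supertonic3-mcp.
    """
    # Cursor needs an absolute command path, so resolve it up front.
    command = (repo_dir / ".venv" / "bin" / "supertonic3-mcp").resolve()
    config_path = project_dir / ".cursor" / "mcp.json"
    config_path.parent.mkdir(parents=True, exist_ok=True)

    # Keep any servers that are already configured.
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    servers["supertonic3"] = {"command": str(command), "args": []}

    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config_path
```

Calling write_cursor_config(Path("~/src/supertonic3-mcp").expanduser(), Path.cwd()) from your project root would produce the same structure as the config shown above.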
Tools
| Tool | Description |
|---|---|
| speak | Synthesizes text to a WAV file; returns the absolute path and metadata |
| list_voices | Lists built-in voices (voice_id, gender) |
| list_expressions | Lists inline expression tags (<laugh>, <breath>, …) with descriptions |
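Because expression tags are plain inline markup, a client can sanity-check them before calling speak. Here is a minimal sketch, assuming tags are lowercase words in angle brackets like <laugh>. The KNOWN_TAGS set is a hypothetical placeholder that you would populate from list_expressions.

```python
import re

# Illustrative subset only: the authoritative list comes from list_expressions.
KNOWN_TAGS = frozenset({"laugh", "breath"})

# Matches simple inline tags such as <laugh> or <breath>.
TAG_PATTERN = re.compile(r"<([a-z_]+)>")


def find_unknown_tags(text: str, known=KNOWN_TAGS) -> list:
    """Return tag names used in text that are not in known, in order of appearance."""
    return [name for name in TAG_PATTERN.findall(text) if name not in known]
```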
speak parameters
- text: 1–5000 characters; expression tags allowed.
- voice_id: optional (M1, F1, …).
- language: ISO 639-1 code (en, ko, ja, …). Defaults to en; for non-English text, always set language explicitly.
- speed: 0.7 to 2.0 (SDK range).
- play: if true, plays the audio on this machine via afplay (macOS) or aplay (Linux). Unsupported on Windows.
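The documented constraints can be checked on the client side before a request goes out. The sketch below mirrors them in a hypothetical SpeakArgs dataclass. The only default it assumes is the documented language="en"; voice_id, speed, and play are left out of the call when unset, so the server applies its own defaults.

```python
import re
from dataclasses import dataclass
from typing import Any, Dict, Optional

MAX_TEXT_CHARS = 5000
MIN_SPEED, MAX_SPEED = 0.7, 2.0


@dataclass
class SpeakArgs:
    """Hypothetical client-side mirror of the documented speak parameters."""

    text: str
    voice_id: Optional[str] = None   # e.g. "M1", "F1"; omitted from the call when None
    language: str = "en"             # documented default
    speed: Optional[float] = None    # omitted when None; no default is assumed here
    play: Optional[bool] = None      # True plays the audio on the server machine

    def validate(self) -> None:
        if not 1 <= len(self.text) <= MAX_TEXT_CHARS:
            raise ValueError(f"text must be 1-{MAX_TEXT_CHARS} characters, got {len(self.text)}")
        # ISO 639-1 codes are two lowercase letters such as "en", "ko", "ja".
        if not re.fullmatch(r"[a-z]{2}", self.language):
            raise ValueError(f"language must be an ISO 639-1 code, got {self.language!r}")
        if self.speed is not None and not MIN_SPEED <= self.speed <= MAX_SPEED:
            raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}, got {self.speed}")

    def to_arguments(self) -> Dict[str, Any]:
        """Validate, then build the arguments dict for a tools/call request."""
        self.validate()
        args: Dict[str, Any] = {"text": self.text, "language": self.language}
        for key in ("voice_id", "speed", "play"):
            value = getattr(self, key)
            if value is not None:
                args[key] = value
        return args
```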
WAV files are written to /tmp/supertonic_*.wav (macOS/Linux). Synthesis output paths are not supported on Windows in v1.0.
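The play option amounts to handing the WAV file to a platform audio player. The sketch below shows that per-platform choice: afplay on macOS, aplay on Linux, and nothing on Windows. The function names are hypothetical, and this is not the server's actual code.

```python
import subprocess
import sys
from typing import List


def playback_command(wav_path: str, platform: str = sys.platform) -> List[str]:
    """Return the player command line implied by play=True on this platform."""
    if platform == "darwin":
        return ["afplay", wav_path]
    if platform.startswith("linux"):
        return ["aplay", wav_path]
    raise NotImplementedError(f"play=True is not supported on {platform!r}")


def play_wav(wav_path: str) -> None:
    # check=True turns a non-zero exit from the player into CalledProcessError.
    subprocess.run(playback_command(wav_path), check=True)
```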
Example return:
Audio saved to /tmp/supertonic_abc123.wav (1.4s, voice: M1, lang: en)
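The output step can be approximated in three parts: reserve a unique supertonic_*.wav name, write 16-bit PCM, and report the duration. This is a minimal stdlib sketch. It assumes the synthesizer returns mono float samples in [-1, 1] along with a sample rate. save_wav is hypothetical, and the message format simply mimics the example above.

```python
import os
import struct
import tempfile
import wave
from typing import Sequence


def save_wav(samples: Sequence[float], sample_rate: int, voice: str, lang: str,
             out_dir: str = "/tmp") -> str:
    """Write mono float samples as 16-bit PCM and return a status line.

    Hypothetical sketch: the float-sample input and sample_rate are
    assumptions, not Supertonic's documented output format.
    """
    # mkstemp guarantees a unique name such as /tmp/supertonic_abc123.wav.
    fd, path = tempfile.mkstemp(prefix="supertonic_", suffix=".wav", dir=out_dir)
    os.close(fd)  # wave.open reopens the file by name
    # Clip each sample to [-1, 1] and scale it to a signed 16-bit integer.
    pcm = b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples
    )
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    duration = len(samples) / sample_rate
    return f"Audio saved to {os.path.abspath(path)} ({duration:.1f}s, voice: {voice}, lang: {lang})"
```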
Performance (this repo)
Measured on Apple M3, supertonic 1.3.1 — see benchmark/results.md.
| Scenario | Median FSL |
|---|---|
| Warm (model loaded) | ~0.82s |
| Cold (new TTS() per call) | ~0.81s |
FSL is measured from the synthesize() call until the WAV file has been written (no streaming; play=True not used).
Re-run: python benchmark/run.py
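The harness in benchmark/run.py is not reproduced here. As a hypothetical illustration of how a median figure like the ones above can be computed, time a callable several times with a monotonic clock and take the median. For the warm scenario, run_once would reuse an already-loaded engine; for the cold one, it would construct a new engine on each call.

```python
import statistics
import time
from typing import Callable, List


def median_latency(run_once: Callable[[], None], runs: int = 10) -> float:
    """Return the median wall-clock seconds of run_once over `runs` timed calls."""
    samples: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()  # monotonic, high-resolution clock
        run_once()  # e.g. synthesize() followed by writing the WAV
        samples.append(time.perf_counter() - start)
    # The median is less sensitive than the mean to one-off stalls (GC, disk).
    return statistics.median(samples)
```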
Offline use
```sh
supertonic3-mcp preload
```
Downloads ONNX weights atomically to ~/.cache/supertonic3/ and prints SHA256 checksums. After preload, synthesis works without network access.
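The write-then-rename pattern that makes a download atomic can be sketched with the standard library. This illustration makes some assumptions: the weights arrive as an iterable of byte chunks, and the function name is hypothetical. It is not the package's actual preload implementation.

```python
import hashlib
import os
import tempfile
from pathlib import Path
from typing import Iterable


def atomic_write_with_sha256(data_chunks: Iterable[bytes], dest: Path) -> str:
    """Stream chunks to dest atomically and return the SHA-256 hex digest.

    Bytes go to a temporary file in the same directory first. os.replace then
    swaps it into place, so an interrupted download never leaves a partial
    file at dest.
    """
    dest.parent.mkdir(parents=True, exist_ok=True)
    digest = hashlib.sha256()
    fd, tmp_name = tempfile.mkstemp(dir=dest.parent, prefix=dest.name + ".", suffix=".part")
    try:
        with os.fdopen(fd, "wb") as tmp:
            for chunk in data_chunks:
                digest.update(chunk)
                tmp.write(chunk)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure bytes reach disk before the rename
        os.replace(tmp_name, dest)  # atomic on POSIX within one filesystem
    except BaseException:
        # Remove the partial file on any failure, then re-raise.
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise
    return digest.hexdigest()
```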
Development
```sh
pip install -e ".[dev]"
pytest
```
Tests mock the Supertonic SDK (no network in CI).
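One way to keep tests offline is to put the SDK behind an injectable object and replace it with a mock. Below is a minimal pytest-style sketch using unittest.mock. speak_handler, engine.synthesize, and its (samples, sample_rate) return shape are all hypothetical, not the package's real interfaces.

```python
from typing import Optional
from unittest import mock


def speak_handler(engine, text: str, voice_id: Optional[str] = None) -> dict:
    """Hypothetical tool handler that delegates synthesis to an injected engine."""
    samples, sample_rate = engine.synthesize(text, voice_id=voice_id)
    return {"voice_id": voice_id, "duration_s": len(samples) / sample_rate}


def test_speak_handler_without_network():
    # A Mock stands in for the real SDK object, so no model download happens.
    engine = mock.Mock()
    engine.synthesize.return_value = ([0.0] * 22050, 44100)

    result = speak_handler(engine, "Hello", voice_id="F1")

    engine.synthesize.assert_called_once_with("Hello", voice_id="F1")
    assert result == {"voice_id": "F1", "duration_s": 0.5}
```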
Coming in v1.1
- listen(): Whisper speech-to-text (pip install supertonic3-mcp[stt])
- SSE transport + Docker image for remote agents
- PyPI publish workflow
License
MIT (this package). Supertonic SDK is MIT; model weights use OpenRAIL-M.
Disclaimer
AI-generated speech is not a substitute for certified safety, legal, or medical guidance. For demonstration purposes only.