Ringback MCP Server

Enables AI agents to initiate two-way voice calls and send tiered alerts to your phone using self-hosted, free telephony solutions.


ringback


Your AI agent can call your phone, and actually talk to you.

ringback gives an LLM (Claude, or any MCP client) tools to reach you on your phone, from a one-way "fierce" alert all the way to a live, interruptible voice conversation, using only free, self-hosted pieces. No paid telephony. No extra API key for the conversation: the model already driving the MCP is the voice on the line.

Highlights

  • 📞 Two-way voice calls: the agent rings your phone, you talk, it transcribes you and replies in speech. Barge-in: talk over it and it stops.
  • 🔔 Tiered alerts: a loud push (ntfy / Pushover) or a real SIP ring + chat message, fired only when the LLM judges it urgent.
  • 🆓 Free & self-hosted: pjsua2 + whisper.cpp + Piper neural TTS + a free Linphone SIP account. No Twilio, no per-minute fees.
  • 🧠 No conversation API key: the calling model is the brain; these tools are just its ears and mouth.

It ships two MCP servers, ringback-alert and ringback-voice:

Platform: macOS, Linux, and Windows (via WSL2 or Docker). TTS is Piper by default (same voice everywhere), falling back to the OS-native voice (say on macOS). The engine is headless: it never opens a local mic/speaker (all audio is WAV ↔ SIP/RTP), so no sound card is required. Setup guides: macOS · Linux · Windows · Docker.

| Server | Tools | What it does |
|---|---|---|
| ringback-alert | alert_me, alert_test, alert_status | Fire-and-forget notification: a loud push (ntfy / Pushover) and/or a SIP ring + chat message. |
| ringback-voice | call_start, converse, get_conversation, call_end, … | A real two-way phone conversation. Rings your phone; you talk, it transcribes you, the LLM replies in speech. Supports barge-in (talk over it and it stops). |

The criteria for when to contact you live in the tool descriptions; the calling LLM decides. These servers are just the mechanism.

What a call looks like

agent → call_start("Your nightly deploy failed. Want me to walk you through it?")
   📞 your phone rings; you pick up and hear the line
you   → "yeah, which step broke?"
agent → "The database migration. I can roll it back and retry. Want that?"
you   → "yes, do it"            ← you can also just talk over the agent to interrupt
agent → call_end()

The LLM calls call_start once, then converse(...) for each turn. Plain alerts are even simpler: one alert_me(...) call.


Let Claude Code install it for you

🤖 Easiest path: copy the prompt below and paste it into Claude Code. It'll clone, build, configure, and register everything, asking you only for what it needs (a free SIP account + your phone to answer a test call).

Set up the ringback MCP server for me. It lets you (the agent) call my phone when you
need a decision while I'm away. Repo: https://github.com/mohitbadwal/ringback
(runs on macOS, Linux, or Windows via WSL2/Docker).

Please:
1. Detect my OS and pick the path:
   - macOS                 → ./setup.sh (Homebrew); read docs/SETUP_MACOS.md
   - Linux or Windows-WSL2 → ./setup-linux.sh; read docs/SETUP_LINUX.md
   - Windows without WSL2  → use Docker; read docs/SETUP_DOCKER.md
2. Clone https://github.com/mohitbadwal/ringback, cd in, and read the README + the doc above.
3. Run the setup for my OS (compiles pjsua2 from source, which takes ~20–30 min; installs
   whisper + Piper TTS, downloads models, and creates voice.env).
4. I need a free Linphone SIP account (the phone line): walk me through signing up at
   https://subscribe.linphone.org and installing the Linphone app on my phone, then put my
   SIP id/username/password into voice.env (and set VOICE_DISPLAY_NAME to a caller-ID name).
5. Register it (per OS):
   - macOS:      claude mcp add ringback-voice --scope user -- "$PWD/run_voice_mcp.sh"
   - Linux/WSL2: claude mcp add ringback-voice --scope user -- python3 "$PWD/run_voice_mcp.py"
   - Docker:     see docs/SETUP_DOCKER.md (build image, convert creds to voice.docker.env, then register a `docker run -i` command)
6. macOS only: if a test call fails with error -32000 or a segfault, run ./fix_macos_twolevel.sh.
7. Tell me to start a fresh session, then call me to confirm two-way voice works.

Ask me whenever you need input (SIP credentials, my phone to answer the test call, etc.).

Prefer to do it by hand? Quick start and the full walkthrough are below.


Quick start

git clone https://github.com/mohitbadwal/ringback && cd ringback
./setup.sh        # installs EVERYTHING (toolchain, pjsua2, whisper model, deps) + creates voice.env
# edit voice.env → add your 3 SIP vars (free account: https://subscribe.linphone.org), then:
claude mcp add ringback-voice --scope user -- "$PWD/run_voice_mcp.sh"

Full walkthrough + env-var reference: Set up ringback-voice below.


Honest caveats (read first)

  • Cross-platform. macOS (native), Linux (native), Windows (via WSL2 or Docker). The engine is headless, so no sound card is needed. Native Windows (MSVC) is intentionally not supported; WSL2/Docker is the Windows path.
  • Not ChatGPT-realtime. The voice loop is record → whisper STT → LLM → Piper/say TTS, so expect ~1–2 s per turn. It's a reliable walkie-talkie with barge-in, not a streaming realtime voice.
  • The voice feature depends on GPL software (pjproject/pjsua2). This repo is Apache-2.0, but redistributing a bundle that links pjsua2 carries GPL obligations; see NOTICE. The ringback-alert server is unaffected.
  • Your machine must be awake and online, and for a voice call a Claude session must be live (it's the brain) for the duration.
  • Barge-in assumes low acoustic echo (handset or headset). On speakerphone, the TTS can echo into the mic and false-trigger "interruption." There's no echo cancellation in this path.
  • iOS push reality: a self-hosted/free push can't truly pierce Focus/Silent on iPhone except via Pushover's Critical Alerts (paid); see ringback-alert notes below.
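
To make the per-turn latency mentioned above concrete, here is a minimal sketch of the record → STT → LLM → TTS loop as four sequential stages with a timer around each. The stage functions are placeholders invented for illustration (they are not ringback's functions, and in practice the LLM step happens in the MCP client between tool calls). The point is only that the stages run one after another, so a turn costs the sum of its stages.

```python
import time
from typing import Callable

# Hypothetical stand-ins for the real stages; ringback's internals are not shown here.
def record_audio(_: str) -> str:
    return "turn.wav"

def transcribe(wav_path: str) -> str:
    return "yes, do it"

def ask_llm(user_text: str) -> str:
    return f"Okay, you said: {user_text}"

def synthesize(reply: str) -> str:
    return "reply.wav"

def run_turn(stages: list[tuple[str, Callable[[str], str]]], first_input: str = ""):
    """Run stages in order, feeding each output to the next; return (result, timings)."""
    timings: dict[str, float] = {}
    value = first_input
    for name, stage in stages:
        start = time.perf_counter()
        value = stage(value)
        timings[name] = time.perf_counter() - start
    return value, timings

STAGES = [("record", record_audio), ("stt", transcribe), ("llm", ask_llm), ("tts", synthesize)]

result, timings = run_turn(STAGES)
total = sum(timings.values())
print(f"turn produced {result!r} in {total:.3f}s: {timings}")
```

With real stages plugged in, the per-stage timings show where a slow turn is spending its time (for example, a larger whisper model in the stt stage).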

Architecture

  LLM (Claude)  ──MCP tools──▶  ringback-voice server (Python)
                                   │  call_start / converse / listen / speak
                                   ▼
                 pjsua2 (SIP+SRTP, built from source)  ──▶  Linphone SIP server
                   │  Piper/say → ffmpeg → WAV  (speak)         │ APNs VoIP push
                   │  record → whisper.cpp (listen)             ▼
                   └───────────────────────────────────▶  your iPhone rings

ringback-alert is simpler: it shells out to ntfy/Pushover HTTP and/or baresip for a SIP ring + chat message.


Prerequisites

  • One of: macOS, Linux, or Windows (via WSL2 or Docker)
  • A free Linphone SIP account (sip.linphone.org) and the Linphone iOS/Android app (for the ring/voice features)
  • Python 3.10+ (the pjsua2 bindings are built against whichever python3 you point at)

Set up ringback-voice: 4 steps

1. Clone + install everything:

git clone https://github.com/mohitbadwal/ringback && cd ringback
./setup.sh

setup.sh installs the toolchain, compiles pjsua2 from source (~20–30 min; no Homebrew formula exists for the bindings), relinks the pjproject dylibs to a two-level OpenSSL namespace (the macOS fix that makes SIP/SRTP work), downloads the whisper model, installs Piper + a voice, installs deps, and creates voice.env for you. Safe to re-run. (Override PYTHON_BIN / PJPROJECT_DIR / WHISPER_MODEL_NAME if your layout differs.)

On Linux? Use ./setup-linux.sh instead; it does the same build with apt/dnf and needs no OpenSSL relink. On Windows? Use WSL2 (docs/SETUP_WINDOWS.md) or Docker. Register the server with the cross-platform launcher python3 run_voice_mcp.py (the .sh is macOS-only).

Hit a snag on macOS? docs/SETUP_MACOS.md is a field-tested root-cause + troubleshooting guide (build target, the OpenSSL flat-namespace fix, whisper model, symptom→fix table).

2. Get a free SIP account (this is the phone that rings):

  • Sign up at https://subscribe.linphone.org (or tap Create account in the Linphone app). You get a username and password; your address is sip:<username>@sip.linphone.org.
  • Install the Linphone app on your iPhone, sign in, and confirm it shows Connected.
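
The address format above maps directly onto two of the variables in the next step. As a small sketch (the helper names sip_address and sip_username are made up for this example), this is how the full address and the bare username relate:

```python
def sip_address(username: str, domain: str = "sip.linphone.org") -> str:
    """Build a SIP URI of the form sip:<username>@<domain>."""
    username = username.strip()
    if not username or "@" in username or ":" in username:
        raise ValueError(f"not a bare SIP username: {username!r}")
    return f"sip:{username}@{domain}"

def sip_username(address: str) -> str:
    """Extract the part before '@' from a sip: URI."""
    if not address.startswith("sip:") or "@" not in address:
        raise ValueError(f"not a sip:user@host address: {address!r}")
    return address[len("sip:"):].split("@", 1)[0]

print(sip_address("yourname"))
print(sip_username("sip:yourname@sip.linphone.org"))
```

The first value is what goes in VOICE_SIP_ID and the second in VOICE_SIP_USER.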

3. Fill in voice.env (already created by setup.sh; just edit it). Only three vars are required:

export VOICE_SIP_ID="sip:yourname@sip.linphone.org"
export VOICE_SIP_USER="yourname"
export VOICE_SIP_PASS="your-password"

Full variable reference:

| Variable | Required | Default | What it is |
|---|---|---|---|
| VOICE_SIP_ID | ✅ | - | Your SIP address, e.g. sip:yourname@sip.linphone.org |
| VOICE_SIP_USER | ✅ | - | SIP username (the part before @) |
| VOICE_SIP_PASS | ✅ | - | Your SIP password |
| VOICE_SIP_CALLEE | - | = VOICE_SIP_ID | Address to call (normally yourself) |
| VOICE_SIP_PROXY | - | sip:sip.linphone.org;transport=tls | SIP registrar/proxy |
| WHISPER_MODEL | - | ~/.whisper-models/ggml-small.en.bin | STT model: base.en (fast) · small.en (default) · medium.en (accurate) |
| VOICE_TTS | - | auto | TTS engine: auto (Piper if installed, else OS voice) · piper · say · espeak · sapi |
| VOICE_PIPER_MODEL | - | ~/.piper-voices/en_US-lessac-medium.onnx | Piper voice (.onnx; needs the matching .onnx.json beside it) |
| VOICE_TTS_CMD | - | - | Custom TTS command template with {text}/{out} (overrides VOICE_TTS) |
| VOICE_NULL_AUDIO | - | auto | Force pjsua2 null audio device (auto = on except macOS; 1/0 to force) |
| RINGBACK_PRESENCE | - | - | Override watchdog idle/presence: present or absent (for Wayland/headless) |
| PJPROJECT_DIR | - | ~/build/pjproject-2.17 | pjsua2 build dir (auto-detected) |
| PYTHON_BIN | - | $(command -v python3) | Python that has pjsua2 (auto-detected) |
| OPENSSL_PREFIX | - | $(brew --prefix openssl@3) | OpenSSL libs (macOS; auto-detected) |
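
As an illustration of how the required and defaulted variables fit together, here is a sketch of a loader that checks the three required values and fills in the static defaults from the reference above. The function load_voice_config is hypothetical and is not how the server necessarily reads its settings; auto-detected values (PJPROJECT_DIR, PYTHON_BIN, OPENSSL_PREFIX) are deliberately left out.

```python
import os
from typing import Mapping

REQUIRED = ("VOICE_SIP_ID", "VOICE_SIP_USER", "VOICE_SIP_PASS")

# Static defaults copied from the variable reference; auto-detected ones are omitted.
DEFAULTS = {
    "VOICE_SIP_PROXY": "sip:sip.linphone.org;transport=tls",
    "WHISPER_MODEL": "~/.whisper-models/ggml-small.en.bin",
    "VOICE_TTS": "auto",
    "VOICE_PIPER_MODEL": "~/.piper-voices/en_US-lessac-medium.onnx",
    "VOICE_NULL_AUDIO": "auto",
}

def load_voice_config(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Validate required variables and apply defaults (illustrative only)."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise KeyError("missing required variables: " + ", ".join(missing))
    config = {name: env.get(name, default) for name, default in DEFAULTS.items()}
    config.update({name: env[name] for name in REQUIRED})
    # VOICE_SIP_CALLEE falls back to your own address.
    config["VOICE_SIP_CALLEE"] = env.get("VOICE_SIP_CALLEE") or config["VOICE_SIP_ID"]
    # Expand "~" in the two model paths.
    for key in ("WHISPER_MODEL", "VOICE_PIPER_MODEL"):
        config[key] = os.path.expanduser(config[key])
    return config

example = {
    "VOICE_SIP_ID": "sip:yourname@sip.linphone.org",
    "VOICE_SIP_USER": "yourname",
    "VOICE_SIP_PASS": "your-password",
}
cfg = load_voice_config(example)
print(cfg["VOICE_SIP_CALLEE"], cfg["VOICE_TTS"])
```

Passing a plain dict instead of os.environ keeps the check easy to try without touching your shell.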

4. Register + test:

# macOS:
claude mcp add ringback-voice --scope user -- "$PWD/run_voice_mcp.sh"
# Linux / Windows-WSL2 (cross-platform launcher):
claude mcp add ringback-voice --scope user -- python3 "$PWD/run_voice_mcp.py"
# Any OS via Docker (convert creds to voice.docker.env first; see docs/SETUP_DOCKER.md):
claude mcp add ringback-voice --scope user -- docker run -i --rm --network host --env-file voice.docker.env ringback

Then in a fresh Claude session say: "Use ringback-voice to call me and say hello." Your phone should ring.

Claude Desktop instead of Code? Add this to ~/Library/Application Support/Claude/claude_desktop_config.json (absolute path required; restart the app):

{ "mcpServers": { "ringback-voice": { "command": "/absolute/path/to/ringback/run_voice_mcp.sh" } } }
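
Because the path must be absolute, it can be easier to generate the entry than to type it. The sketch below builds the same JSON shape from a repo directory and merges it into an existing config without dropping other servers. The helper names are invented for this example, and it only prints the result; writing it to the config file is left to you.

```python
import json
from pathlib import Path

def desktop_config_entry(repo_dir: str) -> dict:
    """Return the mcpServers entry with an absolute launcher path."""
    launcher = Path(repo_dir).expanduser().resolve() / "run_voice_mcp.sh"
    return {"mcpServers": {"ringback-voice": {"command": str(launcher)}}}

def merge_into(existing: dict, entry: dict) -> dict:
    """Merge the entry into an existing config, keeping any other servers."""
    merged = dict(existing)
    servers = dict(merged.get("mcpServers", {}))
    servers.update(entry["mcpServers"])
    merged["mcpServers"] = servers
    return merged

entry = desktop_config_entry(".")
existing = {"mcpServers": {"other": {"command": "/bin/true"}}}
print(json.dumps(merge_into(existing, entry), indent=2))
```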

Set up ringback-alert (optional)

ringback-alert reads its config from the MCP client's env block (no file to source). Register it with the channels you want:

# Claude Code
claude mcp add ringback-alert --scope user \
  --env ALERT_CHANNEL=ntfy \
  --env NTFY_URL=https://ntfy.sh/your-long-random-topic \
  -- /opt/homebrew/bin/uv --directory "$PWD" run server.py

See alert.env.example for all variables (ntfy / Pushover / SIP ring). Use a long random ntfy topic: anyone who knows it can read/publish.
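
One way to get a topic that is long and random enough is Python's standard secrets module. This is only a sketch; the function name random_ntfy_url is made up here, and any unguessable string works equally well.

```python
import secrets

def random_ntfy_url(base: str = "https://ntfy.sh", nbytes: int = 24) -> str:
    """Return base/<topic> where topic is an unguessable URL-safe token."""
    topic = secrets.token_urlsafe(nbytes)  # 24 random bytes encode to 32 characters
    return f"{base.rstrip('/')}/{topic}"

url = random_ntfy_url()
print(f"NTFY_URL={url}")
```

Paste the printed value into the --env NTFY_URL=... argument when registering the server.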


Using ringback-voice (the conversation)

The LLM drives a simple loop:

reply = call_start("Hi, it's your assistant. Your deploy failed. Want details?")
# rings the phone, speaks the line, returns the user's first words
reply = converse("It failed on the database migration step. Want me to retry it?")
# speaks AND listens in one interruptible turn
# ... repeat converse() each turn ...
call_end()   # when the user says "bye" / hangs up

  • converse(text) speaks while listening. If you talk over it, it stops immediately and tells the LLM how far it got and what you said (barge-in).
  • get_conversation() returns the full transcript so far: both sides, plus where it got interrupted.
  • TTS reads text literally, so the tool descriptions instruct the model to speak plain-language summaries, never raw logs/codes; those go via alert_me as text instead.
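
The turn-taking and barge-in behaviour described above can be simulated without a phone. The sketch below uses a fake line that replays scripted user replies. The Turn fields (heard, interrupted, spoken) are an assumed result shape chosen for illustration, since the real tools' return format is not documented here, and FakeLine stands in for the actual call.

```python
from dataclasses import dataclass

@dataclass
class Turn:
    # Hypothetical result shape; the real tool's return format may differ.
    heard: str          # what the user said (transcribed)
    interrupted: bool   # True if the user talked over the agent
    spoken: str         # the part of the agent's text that was actually spoken

class FakeLine:
    """Stand-in for the phone line: replays scripted (reply, cut_at) pairs."""
    def __init__(self, replies):
        self._replies = list(replies)
        self.transcript = []

    def converse(self, text: str) -> Turn:
        heard, cut_at = self._replies.pop(0)
        spoken = text if cut_at is None else text[:cut_at]
        self.transcript.append(("agent", spoken))
        self.transcript.append(("user", heard))
        return Turn(heard=heard, interrupted=cut_at is not None, spoken=spoken)

def next_line(turn: Turn) -> str:
    """Decide what to say next; a real agent would use the LLM here."""
    if turn.interrupted:
        return f"Sorry, you cut in. You said: {turn.heard}"
    return "Anything else?"

def run_call(line: FakeLine, opening: str) -> list:
    turn = line.converse(opening)  # plays the role of call_start
    while "bye" not in turn.heard.lower():
        turn = line.converse(next_line(turn))
    return line.transcript  # roughly what get_conversation would report

line = FakeLine([("which step broke?", 20), ("ok", None), ("bye", None)])
for speaker, text in run_call(line, "Your deploy failed on the migration step."):
    print(f"{speaker}: {text}")
```

The first scripted reply interrupts after 20 characters, so the transcript records only the part of the opening line that was spoken before the user cut in.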

Whisper model accuracy/speed trade-off (set WHISPER_MODEL): base.en (fast/rough) → small.en (balanced, default) → medium.en (most accurate/slow).


Using ringback-alert (notifications)

alert_me(message, severity, title) with severity = info | warn | critical. Channels are set via ALERT_CHANNEL (a comma-separated list of ntfy, pushover, call):

  • ntfy: free push; loud but does not pierce iOS Focus/Silent unless whitelisted per-Focus.
  • Pushover: $5 one-time; true iOS Critical Alerts (pierces Focus/Silent, repeats until acknowledged) at critical.
  • call: free SIP ring + Linphone chat message via baresip; rings full-screen, only at critical by default.
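
The channel rules above amount to a small routing decision per alert. The following sketch shows one way to express it; channels_for is a hypothetical helper rather than the server's code, and it models only the default behaviour stated above (the SIP ring fires at critical only).

```python
SEVERITIES = ("info", "warn", "critical")
KNOWN_CHANNELS = ("ntfy", "pushover", "call")

def channels_for(alert_channel: str, severity: str) -> list[str]:
    """Pick the channels to fire for one alert, given an ALERT_CHANNEL value."""
    if severity not in SEVERITIES:
        raise ValueError(f"unknown severity: {severity!r}")
    configured = [c.strip().lower() for c in alert_channel.split(",") if c.strip()]
    selected = []
    for channel in configured:
        if channel not in KNOWN_CHANNELS:
            raise ValueError(f"unknown channel: {channel!r}")
        # The SIP ring is reserved for critical alerts by default.
        if channel == "call" and severity != "critical":
            continue
        selected.append(channel)
    return selected

print(channels_for("ntfy, call", "warn"))      # ['ntfy']
print(channels_for("ntfy, call", "critical"))  # ['ntfy', 'call']
```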

A built-in rate-limit guard (default 5/60s, per-process) stops a misfiring caller from spamming you.
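
A guard like this can be sketched as a sliding-window counter. Whether ringback uses a sliding or a fixed window is not stated, so treat the class below as an assumption-laden illustration of the 5-per-60-seconds default, not the actual implementation. The clock is injectable so the example runs instantly.

```python
import time
from collections import deque

class RateLimiter:
    """Allow at most `limit` events per `window` seconds (sliding window)."""
    def __init__(self, limit: int = 5, window: float = 60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self._stamps = deque()

    def allow(self) -> bool:
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.window:
            self._stamps.popleft()
        if len(self._stamps) >= self.limit:
            return False
        self._stamps.append(now)
        return True

# Demonstrate with a fake clock instead of waiting a real minute.
fake_now = [0.0]
limiter = RateLimiter(clock=lambda: fake_now[0])
results = [limiter.allow() for _ in range(6)]   # six alerts in the same second
fake_now[0] = 61.0
results.append(limiter.allow())                 # one more, a minute later
print(results)  # [True, True, True, True, True, False, True]
```

Because the state lives in the object, the limit is per process, which matches the "per-process" note above: restarting the server resets the count.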


Bundled skill: watchdog

skills/watchdog/ is a ready-to-use Claude skill built on these servers. It watches a condition you give it (a CI run, a deploy, a pod, a file) and escalates only when you're actually away from the laptop, along the ladder chat status → chat warning → ringback-alert push → ringback-voice call. Being away is judged by input-idle time (macOS, Linux, or Windows; see platform_compat.hid_idle_seconds). It never interrupts you while you're typing, and de-escalates the moment you touch the keyboard.

cp -r skills/watchdog ~/.claude/skills/watchdog   # install for Claude Code

Then: /watchdog <what to watch> | priority=<low|medium|critical>, where low = chat only, medium = may send a phone alert, critical = may place a call. Full design in skills/watchdog/SKILL.md.
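
The escalation rule can be pictured as a ladder with two caps: one from the priority and one from whether you are at the keyboard. The sketch below is an interpretation of the description above, not the skill's logic. The failures counter and the 300-second away_after threshold are hypothetical values chosen for the example; the real design is in skills/watchdog/SKILL.md.

```python
LADDER = ["chat_status", "chat_warning", "alert_push", "voice_call"]

# Highest rung each priority may reach: low = chat only, medium = phone alert, critical = call.
MAX_RUNG = {"low": "chat_warning", "medium": "alert_push", "critical": "voice_call"}

def next_action(priority: str, failures: int, idle_seconds: float,
                away_after: float = 300.0) -> str:
    """Choose the rung for the current check.

    failures: consecutive failed checks (hypothetical escalation signal).
    away_after: idle time after which the user counts as away (hypothetical value).
    """
    cap = LADDER.index(MAX_RUNG[priority])
    if idle_seconds < away_after:
        # User is at the keyboard: stay in chat, never go to the phone.
        cap = min(cap, LADDER.index("chat_warning"))
    wanted = min(max(failures - 1, 0), len(LADDER) - 1)
    return LADDER[min(wanted, cap)]

print(next_action("critical", failures=4, idle_seconds=600))  # voice_call
print(next_action("critical", failures=4, idle_seconds=10))   # chat_warning
print(next_action("medium", failures=4, idle_seconds=600))    # alert_push
```

Since the cap is recomputed on every check, touching the keyboard (idle time dropping back to near zero) immediately pulls the action back down to chat.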


Security

  • SIP credentials live only in your local, gitignored voice.env (and the baresip accounts file for ringback-alert), never in the repo, and not in the MCP client config when you can avoid it.
  • The voice server only ever calls the single SIP URI you configure; it cannot dial arbitrary numbers.
  • Treat ntfy topics as secrets (use a long random topic); don't put sensitive detail in alert bodies on public ntfy.sh.
  • See NOTICE for the GPL/pjproject licensing caveat before redistributing.

License

Apache-2.0 (see LICENSE), with an important GPL caveat for the voice component's pjproject dependency; see NOTICE.

Credits

Built on pjproject/pjsua2, whisper.cpp, baresip, ntfy, and Linphone.