reachy-mini-mcp
Control the Reachy Mini robot (or simulator) from Claude, ChatGPT, or any MCP-compatible client.
Reachy Mini MCP
Control the Reachy Mini robot (or simulator) from Claude, ChatGPT, or any MCP-compatible client.
Dry run (video)
A short "dry run" of the reachy_debug.py sequential demo runner (simulator): step announcements, movements, vision, and artifacts.
Watch the dry run video on GitHub Pages
How it works
AI Assistant --stdio--> MCP Server (reachy.py) --> ReachyMini SDK --> Robot / Simulator
The server exposes 16 tools, 4 prompts, and 4 resources via the Model Context Protocol. An AI assistant calls these tools to see through the robot's camera, move the robot, express emotions, play sounds, or detect audio direction -- no robotics knowledge needed on the AI side.
Installation
Prerequisites
- Python 3.13+
- Reachy Mini robot or the Reachy Mini simulator
- uv (recommended) or pip
Install the server
git clone https://github.com/ArturSkowronski/reachy-mini-mcp.git
cd reachy-mini-mcp
uv sync
Or with pip:
pip install -e .
Configure your MCP client
Add the server to your MCP client configuration. The exact location depends on the client:
Claude Desktop (~/Library/Application Support/Claude/claude_desktop_config.json on macOS):
{
"mcpServers": {
"reachy-mini": {
"command": "uv",
"args": ["--directory", "/path/to/reachy-mini-mcp", "run", "reachy.py"]
}
}
}
Claude Code (.mcp.json in your project root):
{
"mcpServers": {
"reachy-mini": {
"command": "uv",
"args": ["--directory", "/path/to/reachy-mini-mcp", "run", "reachy.py"]
}
}
}
Generic stdio transport:
uv run reachy.py
ElevenLabs TTS (optional)
To enable the speak_text tool, set these environment variables:
export ELEVENLABS_API_KEY="your-api-key"
export ELEVENLABS_VOICE_ID="your-voice-id"
#
# Optional override prefix (takes precedence):
# export REACHY_ELEVENLABS_API_KEY="your-api-key"
# export REACHY_ELEVENLABS_VOICE_ID="your-voice-id"
Default voice (premade/free-tier friendly): George with Voice ID JBFqnCBsd6RMkjVDRZzb.
Favorite voice (author preference): Horatius with Voice ID qXpMhyvQqiRxWQs4qSSB.
Optional overrides: ELEVENLABS_MODEL_ID (default: eleven_multilingual_v2), ELEVENLABS_OUTPUT_FORMAT (default: mp3_44100_128).
WAV support: if your ElevenLabs plan allows it, you can set ELEVENLABS_OUTPUT_FORMAT=wav_44100 to get WAV output instead of MP3.
MP3 vs WAV playback note
- Default TTS output is MP3 (
mp3_44100_128) because it works on lower ElevenLabs tiers. - Some Reachy audio backends/environments may not have MP3 decoding available. In that case, MP3 playback can fail even though WAV works.
- If you need ElevenLabs to return WAV directly (
wav_44100), ElevenLabs requires a higher tier (minimum Pro). - To force WAV output (when available), override the output format via environment variables:
REACHY_ELEVENLABS_OUTPUT_FORMAT=wav_44100(preferred, takes precedence)- or
ELEVENLABS_OUTPUT_FORMAT=wav_44100
Environment variables
General:
NO_COLOR: disable ANSI colors inreachy_debug.pyoutput.
Debug runner (reachy_debug.py):
REACHY_DEBUG_ANNOUNCE_PAUSE_S(default:0.6): pause after each announcement before running the step.REACHY_DEBUG_TTS_SPEED(default:0.8): ElevenLabs speech speed.
ElevenLabs (used by speak_text and reachy_debug.py announcements):
REACHY_ELEVENLABS_API_KEYorELEVENLABS_API_KEY(required for TTS): API key.REACHY_prefixed value takes precedence.REACHY_ELEVENLABS_VOICE_IDorELEVENLABS_VOICE_ID(optional): voice id. Defaults toJBFqnCBsd6RMkjVDRZzb(George) if not set.REACHY_ELEVENLABS_MODEL_IDorELEVENLABS_MODEL_ID(optional): model id (default:eleven_multilingual_v2).REACHY_ELEVENLABS_OUTPUT_FORMATorELEVENLABS_OUTPUT_FORMAT(optional): output format (default:mp3_44100_128, optionallywav_44100if your plan allows it).
Available tools
| Tool | Description |
|---|---|
capture_image | Capture a JPEG frame from the robot's HD camera |
scan_surroundings | Pan camera across multiple angles and return a panoramic set of images |
track_face | Detect a face via OpenCV and turn head to face it |
move_head | 6-DOF head positioning (x/y/z in mm, roll/pitch/yaw in degrees) |
move_antennas | Independent antenna control (-3.14 to 3.14 radians) |
look_at_point | Orient head toward a 3D point in world coordinates |
express_emotion | Emoji-driven emotion system with synchronized movements and sounds |
play_sound | Play built-in sounds (wake_up, go_sleep, confused1, impatient1, dance1, count) |
speak_text | Text-to-speech via ElevenLabs API, played through robot speaker |
detect_sound_direction | Microphone array direction-of-arrival + speech detection |
wake_up | Built-in greeting animation with sound |
go_to_sleep | Built-in farewell animation with sound |
nod | Nod head up and down to indicate "yes" or agreement |
shake_head | Shake head left and right to indicate "no" or disagreement |
reset_position | Return head and antennas to neutral rest pose |
do_barrel_roll | Choreographed head tilt + antenna wiggle sequence |
Resources
The server also exposes MCP resources that let AI assistants discover robot capabilities dynamically:
| Resource URI | Description |
|---|---|
reachy://emotions | Supported emoji-to-emotion mappings |
reachy://sounds | Available built-in sound names |
reachy://limits | Physical limits (antenna range, head DOF, camera specs) |
reachy://capabilities | All tools grouped by category (vision, movement, expression, audio, lifecycle) |
Vision
capture_image grabs a frame from Reachy Mini's wide-angle HD camera and returns it as inline JPEG content through the MCP protocol. The AI assistant receives the image directly in the conversation -- no file paths, no URLs, no extra setup.
scan_surroundings takes this further by panning the camera across multiple angles and returning all frames in a single response:
User: "Look around and describe the room"
Claude calls scan_surroundings(steps=5, yaw_range=120)
<- Robot pans from -60deg to +60deg in 5 steps
<- MCP returns 5 labeled JPEG frames + summary text
Claude: "Starting from the left I can see a window with blinds,
then a whiteboard, your desk with two monitors in the
center, a bookshelf to the right, and a door at the
far right."
The camera returns a standard BGR numpy frame from OpenCV, which gets JPEG-compressed and delivered as MCP ImageContent. Any multimodal AI model that supports image inputs can process it -- Claude, GPT-4o, Gemini, etc.
Tool annotations
Every tool carries semantic annotations that tell AI clients how to use it safely:
| Annotation | Meaning | Tools |
|---|---|---|
readOnlyHint | Doesn't change robot state | capture_image, detect_sound_direction |
idempotentHint=true | Safe to call repeatedly | capture_image, detect_sound_direction |
idempotentHint=false | Repeated calls repeat actions/costs | All movement/gesture/audio tools, including speak_text |
destructiveHint=false | No irreversible actions | All tools |
openWorldHint | Calls external services | speak_text (ElevenLabs API) |
Prompts
Pre-built prompt templates that guide AI assistants through common robot interaction scenarios:
| Prompt | Description |
|---|---|
greet_user | Wake up the robot, express happiness, and greet by name |
explore_room | Scan surroundings with the camera and describe the environment |
react_to_conversation | Use gestures and emotions to physically react during chat |
find_person | Use camera and face tracking to locate and follow a person |
Emotion system
express_emotion maps emoji characters to choreographed movements:
| Emoji | Emotion | Behavior |
|---|---|---|
😊 | happy | antennas up, cheerful pose, dance sound |
😕 | confused | head tilt, asymmetric antennas, confused sound |
😤 | impatient | rapid antenna movements, impatient sound |
😴 | sleepy | sleep pose with sound |
👋 | greeting | wake up animation |
🤔 | thinking | contemplative head tilt, antennas spread |
😮 | surprised | antennas high, head back |
😢 | sad | antennas and head down |
🎉 | celebrate | energetic wiggle, dance sound |
😐 | neutral | return to rest position |
Development
# Install with dev dependencies
uv sync --extra dev
# Run all tests
uv run pytest -v
# Unit tests only
uv run pytest -v -m "not integration"
# Integration tests only (MCP protocol layer)
uv run pytest -v -m integration
# Lint
uv run ruff check . && uv run ruff format --check .
# Set up pre-commit hooks
pre-commit install
Direct robot testing
For a one-click, full sequential debug demo (movement, gestures, audio, vision, tracking) with per-step status checks:
uv sync --extra reachy
# If you want to auto-spawn the simulator daemon, also install:
uv sync --extra reachy-sim
uv run python reachy_debug.py
reachy_debug.py now:
- Announces each upcoming test step (voice via ElevenLabs if configured, otherwise console fallback).
- Executes a full demo suite in sequence.
- Saves all captured images and a markdown run summary to
results/run-YYYYMMDD-HHMMSS/. - Generates a single report file for the run:
run_report.md.
GitHub Pages
This repo includes a static GitHub Pages site under docs/ (with the dry-run video embedded).
To publish: GitHub repo Settings -> Pages -> "Build and deployment" -> Source: "Deploy from a branch" -> Branch: main -> Folder: /docs.
Related Servers
mcp-server-inject-bender
Security through absurdity: transforms SQL injection and XSS attempts into harmless comedy responses using AI-powered humor defense.
Data Wallets MCP
It connects Agents to data wallet with DID and verifiable credentials
ADM1 MCP Server
Control anaerobic digestion modeling (ADM1) using natural language.
stella-mcp
MCP server for creating and manipulating Stella system dynamics models (.stmx files in XMILE format)
ImageMagick
An MCP server for image conversion using ImageMagick and darktable.
TI Mindmap HUB — MCP Server
TI Mindmap HUB MCP Server provides AI assistants with direct access to curated threat intelligence — reports, CVEs, IOCs, STIX bundles, and weekly briefings — through the Model Context Protocol.
O'RLY MCP
Generates O'RLY? (O'Reilly parody) book covers.
Mind Reasoner MCP Server
Mind Reasoner's MCP Server
BlazeMeter MCP Server
MCP Server for AI-driven BlazeMeter performance testing
Sound Effects MCP
Plays various sound effects for different situations.