Reachy Mini MCP
Control the Reachy Mini robot (or simulator) from Claude, ChatGPT, or any MCP-compatible client.
Dry run (video)
A short "dry run" of the `reachy_debug.py` sequential demo runner (simulator): step announcements, movements, vision, and saved artifacts.
Watch the dry run video on GitHub Pages
How it works
AI Assistant --stdio--> MCP Server (reachy.py) --> ReachyMini SDK --> Robot / Simulator
The server exposes 16 tools, 4 prompts, and 4 resources via the Model Context Protocol. An AI assistant calls these tools to see through the robot's camera, move the robot, express emotions, play sounds, or detect audio direction -- no robotics knowledge needed on the AI side.
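As an illustration of the flow, here is a minimal sketch of a bare-bones MCP client driving the server over stdio with the official `mcp` Python SDK; the repository path is a placeholder, and the tool name comes from the table below:

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Launch the server exactly as an MCP client config would.
server = StdioServerParameters(
    command="uv",
    args=["--directory", "/path/to/reachy-mini-mcp", "run", "reachy.py"],
)

async def main() -> None:
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])  # the 16 tool names
            await session.call_tool("wake_up", {})      # greeting animation

asyncio.run(main())
```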
Installation
Prerequisites
- Python 3.13+
- Reachy Mini robot or the Reachy Mini simulator
- uv (recommended) or pip
Install the server
git clone https://github.com/ArturSkowronski/reachy-mini-mcp.git
cd reachy-mini-mcp
uv sync
Or with pip:
pip install -e .
Configure your MCP client
Add the server to your MCP client configuration. The exact location depends on the client:
Claude Desktop (~/Library/Application Support/Claude/claude_desktop_config.json on macOS):
```json
{
  "mcpServers": {
    "reachy-mini": {
      "command": "uv",
      "args": ["--directory", "/path/to/reachy-mini-mcp", "run", "reachy.py"]
    }
  }
}
```
Claude Code (.mcp.json in your project root):
```json
{
  "mcpServers": {
    "reachy-mini": {
      "command": "uv",
      "args": ["--directory", "/path/to/reachy-mini-mcp", "run", "reachy.py"]
    }
  }
}
```
Generic stdio transport:
```sh
uv run reachy.py
```
ElevenLabs TTS (optional)
To enable the speak_text tool, set these environment variables:
```sh
export ELEVENLABS_API_KEY="your-api-key"
export ELEVENLABS_VOICE_ID="your-voice-id"

# Optional REACHY_-prefixed overrides (take precedence):
# export REACHY_ELEVENLABS_API_KEY="your-api-key"
# export REACHY_ELEVENLABS_VOICE_ID="your-voice-id"
```
Default voice (premade/free-tier friendly): George with voice ID `JBFqnCBsd6RMkjVDRZzb`.
Favorite voice (author preference): Horatius with voice ID `qXpMhyvQqiRxWQs4qSSB`.
Optional overrides: `ELEVENLABS_MODEL_ID` (default: `eleven_multilingual_v2`), `ELEVENLABS_OUTPUT_FORMAT` (default: `mp3_44100_128`).
WAV support: if your ElevenLabs plan allows it, you can set `ELEVENLABS_OUTPUT_FORMAT=wav_44100` to get WAV output instead of MP3.
MP3 vs WAV playback note
- Default TTS output is MP3 (`mp3_44100_128`) because it works on lower ElevenLabs tiers.
- Some Reachy audio backends/environments may not have MP3 decoding available. In that case, MP3 playback can fail even though WAV works.
- If you need ElevenLabs to return WAV directly (`wav_44100`), ElevenLabs requires a higher tier (minimum Pro).
- To force WAV output (when available), override the output format via environment variables:
  - `REACHY_ELEVENLABS_OUTPUT_FORMAT=wav_44100` (preferred, takes precedence)
  - or `ELEVENLABS_OUTPUT_FORMAT=wav_44100`
Environment variables
General:
- `NO_COLOR`: disable ANSI colors in `reachy_debug.py` output.

Debug runner (`reachy_debug.py`):
- `REACHY_DEBUG_ANNOUNCE_PAUSE_S` (default: `0.6`): pause after each announcement before running the step.
- `REACHY_DEBUG_TTS_SPEED` (default: `0.8`): ElevenLabs speech speed.

ElevenLabs (used by `speak_text` and `reachy_debug.py` announcements):
- `REACHY_ELEVENLABS_API_KEY` or `ELEVENLABS_API_KEY` (required for TTS): API key. The `REACHY_`-prefixed value takes precedence.
- `REACHY_ELEVENLABS_VOICE_ID` or `ELEVENLABS_VOICE_ID` (optional): voice ID. Defaults to `JBFqnCBsd6RMkjVDRZzb` (George) if not set.
- `REACHY_ELEVENLABS_MODEL_ID` or `ELEVENLABS_MODEL_ID` (optional): model ID (default: `eleven_multilingual_v2`).
- `REACHY_ELEVENLABS_OUTPUT_FORMAT` or `ELEVENLABS_OUTPUT_FORMAT` (optional): output format (default: `mp3_44100_128`; optionally `wav_44100` if your plan allows it).
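The `REACHY_`-prefixed variable always shadows its plain counterpart. A minimal sketch of that precedence rule, with an illustrative helper name (not the server's actual function):

```python
import os

def elevenlabs_setting(name: str, default: str | None = None) -> str | None:
    """REACHY_ELEVENLABS_<name> wins over ELEVENLABS_<name>, else the default."""
    return (
        os.environ.get(f"REACHY_ELEVENLABS_{name}")
        or os.environ.get(f"ELEVENLABS_{name}")
        or default
    )

api_key = elevenlabs_setting("API_KEY")                            # required for TTS
voice_id = elevenlabs_setting("VOICE_ID", "JBFqnCBsd6RMkjVDRZzb")  # George
model_id = elevenlabs_setting("MODEL_ID", "eleven_multilingual_v2")
output_format = elevenlabs_setting("OUTPUT_FORMAT", "mp3_44100_128")
```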
Available tools
| Tool | Description |
|---|---|
| `capture_image` | Capture a JPEG frame from the robot's HD camera |
| `scan_surroundings` | Pan camera across multiple angles and return a panoramic set of images |
| `track_face` | Detect a face via OpenCV and turn head to face it |
| `move_head` | 6-DOF head positioning (x/y/z in mm, roll/pitch/yaw in degrees) |
| `move_antennas` | Independent antenna control (-3.14 to 3.14 radians) |
| `look_at_point` | Orient head toward a 3D point in world coordinates |
| `express_emotion` | Emoji-driven emotion system with synchronized movements and sounds |
| `play_sound` | Play built-in sounds (wake_up, go_sleep, confused1, impatient1, dance1, count) |
| `speak_text` | Text-to-speech via ElevenLabs API, played through robot speaker |
| `detect_sound_direction` | Microphone array direction-of-arrival + speech detection |
| `wake_up` | Built-in greeting animation with sound |
| `go_to_sleep` | Built-in farewell animation with sound |
| `nod` | Nod head up and down to indicate "yes" or agreement |
| `shake_head` | Shake head left and right to indicate "no" or disagreement |
| `reset_position` | Return head and antennas to neutral rest pose |
| `do_barrel_roll` | Choreographed head tilt + antenna wiggle sequence |
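As a usage sketch, a client could combine these into a small gesture; the `move_head` argument names below are assumed from the table (x/y/z in mm, roll/pitch/yaw in degrees) rather than read from the server's schema, so treat `list_tools()` output as the authoritative source:

```python
from mcp import ClientSession

async def glance_left(session: ClientSession) -> None:
    """Raise the head slightly, turn 30 degrees left, then return to rest."""
    await session.call_tool(
        "move_head",
        {"x": 0, "y": 0, "z": 10, "roll": 0, "pitch": -10, "yaw": 30},
    )
    await session.call_tool("reset_position", {})
```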
Resources
The server also exposes MCP resources that let AI assistants discover robot capabilities dynamically:
| Resource URI | Description |
|---|---|
| `reachy://emotions` | Supported emoji-to-emotion mappings |
| `reachy://sounds` | Available built-in sound names |
| `reachy://limits` | Physical limits (antenna range, head DOF, camera specs) |
| `reachy://capabilities` | All tools grouped by category (vision, movement, expression, audio, lifecycle) |
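Resources are read by URI rather than called as tools. A sketch using the same SDK session (it assumes the resource is text-typed, as an emotion mapping would be):

```python
from mcp import ClientSession

async def discover_emotions(session: ClientSession) -> str:
    """Fetch the emoji-to-emotion mapping advertised by the server."""
    result = await session.read_resource("reachy://emotions")
    # Text resources carry their payload in the .text field.
    return result.contents[0].text
```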
Vision
capture_image grabs a frame from Reachy Mini's wide-angle HD camera and returns it as inline JPEG content through the MCP protocol. The AI assistant receives the image directly in the conversation -- no file paths, no URLs, no extra setup.
scan_surroundings takes this further by panning the camera across multiple angles and returning all frames in a single response:
User: "Look around and describe the room"
Claude calls scan_surroundings(steps=5, yaw_range=120)
<- Robot pans from -60deg to +60deg in 5 steps
<- MCP returns 5 labeled JPEG frames + summary text
Claude: "Starting from the left I can see a window with blinds,
then a whiteboard, your desk with two monitors in the
center, a bookshelf to the right, and a door at the
far right."
The camera returns a standard BGR numpy frame from OpenCV, which gets JPEG-compressed and delivered as MCP ImageContent. Any multimodal AI model that supports image inputs can process it -- Claude, GPT-4o, Gemini, etc.
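Under those assumptions (a BGR frame from OpenCV, the `ImageContent` type from the `mcp` Python SDK), one plausible encoding path looks like this; it is a sketch of the idea, not the server's exact code:

```python
import base64

import cv2
import numpy as np
from mcp.types import ImageContent

def frame_to_image_content(frame: np.ndarray, quality: int = 85) -> ImageContent:
    """JPEG-compress a BGR OpenCV frame and wrap it as inline MCP image content."""
    ok, jpeg = cv2.imencode(".jpg", frame, [cv2.IMWRITE_JPEG_QUALITY, quality])
    if not ok:
        raise RuntimeError("JPEG encoding failed")
    return ImageContent(
        type="image",
        data=base64.b64encode(jpeg.tobytes()).decode("ascii"),
        mimeType="image/jpeg",
    )
```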
Tool annotations
Every tool carries semantic annotations that tell AI clients how to use it safely:
| Annotation | Meaning | Tools |
|---|---|---|
| `readOnlyHint` | Doesn't change robot state | `capture_image`, `detect_sound_direction` |
| `idempotentHint=true` | Safe to call repeatedly | `capture_image`, `detect_sound_direction` |
| `idempotentHint=false` | Repeated calls repeat actions/costs | All movement/gesture/audio tools, including `speak_text` |
| `destructiveHint=false` | No irreversible actions | All tools |
| `openWorldHint` | Calls external services | `speak_text` (ElevenLabs API) |
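With the SDK's FastMCP server, annotations are attached at registration time. A sketch, assuming a recent `mcp` SDK whose `tool()` decorator accepts an `annotations` argument:

```python
from mcp.server.fastmcp import FastMCP
from mcp.types import ToolAnnotations

mcp = FastMCP("reachy-mini")

@mcp.tool(
    annotations=ToolAnnotations(
        readOnlyHint=True,      # doesn't change robot state
        idempotentHint=True,    # safe to call repeatedly
        destructiveHint=False,  # nothing irreversible
        openWorldHint=False,    # no external services
    )
)
def capture_image() -> str:
    """Capture a JPEG frame from the robot's HD camera."""
    ...
```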
Prompts
Pre-built prompt templates that guide AI assistants through common robot interaction scenarios:
| Prompt | Description |
|---|---|
| `greet_user` | Wake up the robot, express happiness, and greet by name |
| `explore_room` | Scan surroundings with the camera and describe the environment |
| `react_to_conversation` | Use gestures and emotions to physically react during chat |
| `find_person` | Use camera and face tracking to locate and follow a person |
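Clients fetch these with `get_prompt`. A sketch against the SDK session; the `name` argument for `greet_user` is an assumption, so check the prompt's declared arguments first:

```python
from mcp import ClientSession

async def greeting_messages(session: ClientSession):
    """Fetch the greet_user template, filled in for a specific person."""
    result = await session.get_prompt("greet_user", arguments={"name": "Ada"})
    return result.messages  # ready to hand to the model
```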
Emotion system
express_emotion maps emoji characters to choreographed movements:
| Emoji | Emotion | Behavior |
|---|---|---|
| 😊 | happy | antennas up, cheerful pose, dance sound |
| 😕 | confused | head tilt, asymmetric antennas, confused sound |
| 😤 | impatient | rapid antenna movements, impatient sound |
| 😴 | sleepy | sleep pose with sound |
| 👋 | greeting | wake up animation |
| 🤔 | thinking | contemplative head tilt, antennas spread |
| 😮 | surprised | antennas high, head back |
| 😢 | sad | antennas and head down |
| 🎉 | celebrate | energetic wiggle, dance sound |
| 😐 | neutral | return to rest position |
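Conceptually this is a lookup from emoji to a choreography routine. A minimal sketch of the pattern with illustrative names and values (not the server's actual data structures):

```python
from dataclasses import dataclass

@dataclass
class Choreography:
    head: tuple[float, float, float]   # roll/pitch/yaw in degrees
    antennas: tuple[float, float]      # left/right in radians
    sound: str | None = None           # built-in sound name, if any

EMOTIONS: dict[str, Choreography] = {
    "😊": Choreography(head=(0, -10, 0), antennas=(1.0, 1.0), sound="dance1"),
    "😕": Choreography(head=(15, 0, 0), antennas=(0.8, -0.3), sound="confused1"),
    "😐": Choreography(head=(0, 0, 0), antennas=(0.0, 0.0)),
}

def express(emoji: str) -> Choreography:
    """Resolve an emoji to its choreography, falling back to neutral."""
    return EMOTIONS.get(emoji, EMOTIONS["😐"])
```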
Development
```sh
# Install with dev dependencies
uv sync --extra dev

# Run all tests
uv run pytest -v

# Unit tests only
uv run pytest -v -m "not integration"

# Integration tests only (MCP protocol layer)
uv run pytest -v -m integration

# Lint
uv run ruff check . && uv run ruff format --check .

# Set up pre-commit hooks
pre-commit install
```
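The `integration` marker is what separates the two test layers. A sketch of how such a marked test might look; the `mcp_session` fixture and the async test plugin (anyio or pytest-asyncio) are assumptions about the suite, not its actual contents:

```python
import pytest

@pytest.mark.integration
async def test_server_advertises_all_tools(mcp_session):
    """Protocol-layer check: the server should expose all 16 tools."""
    result = await mcp_session.list_tools()
    assert len(result.tools) == 16
```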
Direct robot testing
For a one-click, full sequential debug demo (movement, gestures, audio, vision, tracking) with per-step status checks:
```sh
uv sync --extra reachy

# If you want to auto-spawn the simulator daemon, also install:
uv sync --extra reachy-sim

uv run python reachy_debug.py
```
`reachy_debug.py` now:
- Announces each upcoming test step (voice via ElevenLabs if configured, otherwise console fallback).
- Executes the full demo suite in sequence.
- Saves all captured images and a markdown run summary to `results/run-YYYYMMDD-HHMMSS/`.
- Generates a single report file for the run: `run_report.md`.
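The announce-then-execute rhythm is easy to picture. A sketch of the loop with illustrative names; only the `REACHY_DEBUG_ANNOUNCE_PAUSE_S` variable is taken from the list above:

```python
import os
import time

ANNOUNCE_PAUSE_S = float(os.environ.get("REACHY_DEBUG_ANNOUNCE_PAUSE_S", "0.6"))

def run_steps(steps):
    """Announce each (name, fn) step, pause, run it, and record pass/fail."""
    results = []
    for name, fn in steps:
        print(f"Next step: {name}")  # or spoken via ElevenLabs if configured
        time.sleep(ANNOUNCE_PAUSE_S)
        try:
            fn()
            results.append((name, "ok"))
        except Exception as exc:
            results.append((name, f"failed: {exc}"))
    return results
```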
GitHub Pages
This repo includes a static GitHub Pages site under docs/ (with the dry-run video embedded).
To publish: GitHub repo Settings -> Pages -> "Build and deployment" -> Source: "Deploy from a branch" -> Branch: main -> Folder: /docs.