reachy-mini-mcp

Control the Reachy Mini robot (or simulator) from Claude, ChatGPT, or any MCP-compatible client.

CI Python 3.13+ MCP Reachy Mini

Dry run (video)

A short dry run of the reachy_debug.py sequential demo runner against the simulator, showing step announcements, movements, vision, and the saved artifacts.

Watch the dry run video on GitHub Pages

How it works

AI Assistant  --stdio-->  MCP Server (reachy.py)  -->  ReachyMini SDK  -->  Robot / Simulator

The server exposes 16 tools, 4 prompts, and 4 resources via the Model Context Protocol. An AI assistant calls these tools to see through the robot's camera, move the robot, express emotions, play sounds, or detect audio direction -- no robotics knowledge needed on the AI side.

Installation

Prerequisites

Install the server

git clone https://github.com/ArturSkowronski/reachy-mini-mcp.git
cd reachy-mini-mcp
uv sync

Or with pip:

pip install -e .

Configure your MCP client

Add the server to your MCP client configuration. The exact location depends on the client:

Claude Desktop (~/Library/Application Support/Claude/claude_desktop_config.json on macOS):

{
  "mcpServers": {
    "reachy-mini": {
      "command": "uv",
      "args": ["--directory", "/path/to/reachy-mini-mcp", "run", "reachy.py"]
    }
  }
}

Claude Code (.mcp.json in your project root):

{
  "mcpServers": {
    "reachy-mini": {
      "command": "uv",
      "args": ["--directory", "/path/to/reachy-mini-mcp", "run", "reachy.py"]
    }
  }
}
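If you prefer to generate the entry rather than paste it, the following is a minimal sketch that builds the same JSON shown above and merges it into an existing configuration without dropping other servers. The helper names (build_mcp_config, merge_server) are hypothetical and not part of this repository; the repository path is a placeholder you must replace.

```python
import json


def build_mcp_config(repo_dir: str, server_name: str = "reachy-mini") -> dict:
    """Return an mcpServers entry that launches reachy.py through uv."""
    return {
        "mcpServers": {
            server_name: {
                "command": "uv",
                "args": ["--directory", repo_dir, "run", "reachy.py"],
            }
        }
    }


def merge_server(existing: dict, repo_dir: str) -> dict:
    """Add the reachy-mini entry to an existing config, keeping other servers."""
    merged = dict(existing)
    servers = dict(merged.get("mcpServers", {}))
    servers.update(build_mcp_config(repo_dir)["mcpServers"])
    merged["mcpServers"] = servers
    return merged


if __name__ == "__main__":
    # Placeholder path: point this at your own clone.
    print(json.dumps(build_mcp_config("/path/to/reachy-mini-mcp"), indent=2))
```

Write the printed JSON to the file your client reads (for example .mcp.json for Claude Code).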

Generic stdio transport:

uv run reachy.py

ElevenLabs TTS (optional)

To enable the speak_text tool, set these environment variables:

export ELEVENLABS_API_KEY="your-api-key"
export ELEVENLABS_VOICE_ID="your-voice-id"
# Optional REACHY_-prefixed overrides (take precedence over the variables above):
# export REACHY_ELEVENLABS_API_KEY="your-api-key"
# export REACHY_ELEVENLABS_VOICE_ID="your-voice-id"

Default voice (premade/free-tier friendly): George with Voice ID JBFqnCBsd6RMkjVDRZzb.

Favorite voice (author preference): Horatius with Voice ID qXpMhyvQqiRxWQs4qSSB.

Optional overrides: ELEVENLABS_MODEL_ID (default: eleven_multilingual_v2), ELEVENLABS_OUTPUT_FORMAT (default: mp3_44100_128).

WAV support: if your ElevenLabs plan allows it, you can set ELEVENLABS_OUTPUT_FORMAT=wav_44100 to get WAV output instead of MP3.

MP3 vs WAV playback note

  • Default TTS output is MP3 (mp3_44100_128) because it works on lower ElevenLabs tiers.
  • Some Reachy audio backends/environments may not have MP3 decoding available. In that case, MP3 playback can fail even though WAV works.
  • Getting WAV directly from ElevenLabs (wav_44100) requires a higher tier (Pro at minimum).
  • To force WAV output (when available), override the output format via environment variables:
    • REACHY_ELEVENLABS_OUTPUT_FORMAT=wav_44100 (preferred, takes precedence)
    • or ELEVENLABS_OUTPUT_FORMAT=wav_44100

Environment variables

General:

  • NO_COLOR: disable ANSI colors in reachy_debug.py output.

Debug runner (reachy_debug.py):

  • REACHY_DEBUG_ANNOUNCE_PAUSE_S (default: 0.6): pause after each announcement before running the step.
  • REACHY_DEBUG_TTS_SPEED (default: 0.8): ElevenLabs speech speed.

ElevenLabs (used by speak_text and reachy_debug.py announcements):

  • REACHY_ELEVENLABS_API_KEY or ELEVENLABS_API_KEY (required for TTS): API key. REACHY_ prefixed value takes precedence.
  • REACHY_ELEVENLABS_VOICE_ID or ELEVENLABS_VOICE_ID (optional): voice id. Defaults to JBFqnCBsd6RMkjVDRZzb (George) if not set.
  • REACHY_ELEVENLABS_MODEL_ID or ELEVENLABS_MODEL_ID (optional): model id (default: eleven_multilingual_v2).
  • REACHY_ELEVENLABS_OUTPUT_FORMAT or ELEVENLABS_OUTPUT_FORMAT (optional): output format (default: mp3_44100_128, optionally wav_44100 if your plan allows it).
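The precedence rule described above (REACHY_-prefixed value first, then the unprefixed value, then the documented default) can be sketched as follows. This is an illustration of the rule, not the server's actual code: the function name resolve_elevenlabs_setting is hypothetical, and treating an empty string as "unset" is an assumption.

```python
import os
from typing import Mapping, Optional

# Defaults as documented in this README. API_KEY has no default.
DEFAULTS = {
    "VOICE_ID": "JBFqnCBsd6RMkjVDRZzb",  # George
    "MODEL_ID": "eleven_multilingual_v2",
    "OUTPUT_FORMAT": "mp3_44100_128",
}


def resolve_elevenlabs_setting(
    name: str, env: Optional[Mapping[str, str]] = None
) -> Optional[str]:
    """Resolve one setting: REACHY_ELEVENLABS_<name>, then ELEVENLABS_<name>, then default."""
    env = os.environ if env is None else env
    for key in (f"REACHY_ELEVENLABS_{name}", f"ELEVENLABS_{name}"):
        value = env.get(key)
        if value:  # assumption: empty strings count as unset
            return value
    return DEFAULTS.get(name)


if __name__ == "__main__":
    for setting in ("API_KEY", "VOICE_ID", "MODEL_ID", "OUTPUT_FORMAT"):
        shown = "<set>" if setting == "API_KEY" and resolve_elevenlabs_setting(setting) else resolve_elevenlabs_setting(setting)
        print(setting, "=", shown)
```

Passing a plain dict as env makes the rule easy to check without touching the real environment.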

Available tools

| Tool | Description |
|------|-------------|
| capture_image | Capture a JPEG frame from the robot's HD camera |
| scan_surroundings | Pan camera across multiple angles and return a panoramic set of images |
| track_face | Detect a face via OpenCV and turn head to face it |
| move_head | 6-DOF head positioning (x/y/z in mm, roll/pitch/yaw in degrees) |
| move_antennas | Independent antenna control (-3.14 to 3.14 radians) |
| look_at_point | Orient head toward a 3D point in world coordinates |
| express_emotion | Emoji-driven emotion system with synchronized movements and sounds |
| play_sound | Play built-in sounds (wake_up, go_sleep, confused1, impatient1, dance1, count) |
| speak_text | Text-to-speech via ElevenLabs API, played through robot speaker |
| detect_sound_direction | Microphone array direction-of-arrival + speech detection |
| wake_up | Built-in greeting animation with sound |
| go_to_sleep | Built-in farewell animation with sound |
| nod | Nod head up and down to indicate "yes" or agreement |
| shake_head | Shake head left and right to indicate "no" or disagreement |
| reset_position | Return head and antennas to neutral rest pose |
| do_barrel_roll | Choreographed head tilt + antenna wiggle sequence |
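Under the hood, an MCP client invokes any of these tools by sending a JSON-RPC 2.0 tools/call request over the stdio transport, one JSON message per line. The sketch below only builds that message; it does not perform the initialize handshake a real client must complete first. The argument name yaw for move_head is an assumption inferred from the description above, so check the tool's published input schema before relying on it.

```python
import json
from typing import Optional


def tool_call_request(request_id: int, name: str, arguments: Optional[dict] = None) -> str:
    """Serialize a JSON-RPC 2.0 tools/call request as a single line."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }
    # json.dumps emits no raw newlines by default, which the stdio framing requires.
    return json.dumps(message)


if __name__ == "__main__":
    # "yaw" is a hypothetical argument name used for illustration only.
    print(tool_call_request(1, "move_head", {"yaw": 20}))
    print(tool_call_request(2, "nod"))
```

In practice an MCP client library builds and sends these messages for you; the sketch is only meant to show what crosses the pipe.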

Resources

The server also exposes MCP resources that let AI assistants discover robot capabilities dynamically:

| Resource URI | Description |
|--------------|-------------|
| reachy://emotions | Supported emoji-to-emotion mappings |
| reachy://sounds | Available built-in sound names |
| reachy://limits | Physical limits (antenna range, head DOF, camera specs) |
| reachy://capabilities | All tools grouped by category (vision, movement, expression, audio, lifecycle) |

Vision

capture_image grabs a frame from Reachy Mini's wide-angle HD camera and returns it as inline JPEG content over MCP. The AI assistant receives the image directly in the conversation -- no file paths, no URLs, no extra setup.

scan_surroundings takes this further by panning the camera across multiple angles and returning all frames in a single response:

User:  "Look around and describe the room"

       Claude calls scan_surroundings(steps=5, yaw_range=120)
       <- Robot pans from -60deg to +60deg in 5 steps
       <- MCP returns 5 labeled JPEG frames + summary text

Claude: "Starting from the left I can see a window with blinds,
         then a whiteboard, your desk with two monitors in the
         center, a bookshelf to the right, and a door at the
         far right."
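The pan in this example implies evenly spaced yaw angles centered on zero. A minimal sketch of that arithmetic, assuming the sweep is symmetric and includes both endpoints (the function name pan_angles is hypothetical and the server may space its frames differently):

```python
def pan_angles(steps: int, yaw_range: float) -> list[float]:
    """Return `steps` yaw angles in degrees spanning -yaw_range/2 .. +yaw_range/2."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    if steps == 1:
        return [0.0]  # a single frame looks straight ahead
    half = yaw_range / 2
    increment = yaw_range / (steps - 1)
    return [-half + i * increment for i in range(steps)]


if __name__ == "__main__":
    print(pan_angles(5, 120))
```

With steps=5 and yaw_range=120 this yields [-60.0, -30.0, 0.0, 30.0, 60.0], matching the -60deg to +60deg sweep in the transcript.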

The camera returns a standard BGR numpy frame from OpenCV, which gets JPEG-compressed and delivered as MCP ImageContent. Any multimodal AI model that supports image inputs can process it -- Claude, GPT-4o, Gemini, etc.
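For reference, MCP image content is a small JSON object carrying base64-encoded bytes and a MIME type. The sketch below wraps already-encoded JPEG bytes in that shape using only the standard library. In a real pipeline the bytes would typically come from an encoder such as OpenCV's cv2.imencode(".jpg", frame); that step is omitted here and the helper name to_image_content is hypothetical.

```python
import base64


def to_image_content(jpeg_bytes: bytes) -> dict:
    """Wrap JPEG bytes in the MCP image content shape (type, data, mimeType)."""
    return {
        "type": "image",
        "data": base64.b64encode(jpeg_bytes).decode("ascii"),
        "mimeType": "image/jpeg",
    }


if __name__ == "__main__":
    # Stand-in bytes beginning with the JPEG start-of-image marker; not a real picture.
    fake_jpeg = b"\xff\xd8\xff\xe0" + b"\x00" * 16
    content = to_image_content(fake_jpeg)
    print(content["mimeType"], len(content["data"]), "base64 characters")
```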

Tool annotations

Every tool carries semantic annotations that tell AI clients how to use it safely:

| Annotation | Meaning | Tools |
|------------|---------|-------|
| readOnlyHint | Doesn't change robot state | capture_image, detect_sound_direction |
| idempotentHint=true | Safe to call repeatedly | capture_image, detect_sound_direction |
| idempotentHint=false | Repeated calls repeat actions/costs | All movement/gesture/audio tools, including speak_text |
| destructiveHint=false | No irreversible actions | All tools |
| openWorldHint | Calls external services | speak_text (ElevenLabs API) |
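A client can use these hints to decide, for example, whether a failed call is safe to retry automatically. The sketch below encodes only the tools the table names explicitly and falls back to conservative values for everything else; that fallback and the retry policy are assumptions for illustration, not behavior defined by this server.

```python
# Conservative fallback for tools the table does not name individually (assumption).
DEFAULT_HINTS = {
    "readOnlyHint": False,
    "idempotentHint": False,
    "destructiveHint": False,  # the table lists destructiveHint=false for all tools
    "openWorldHint": False,
}

# Per-tool values taken from the table above.
OVERRIDES = {
    "capture_image": {"readOnlyHint": True, "idempotentHint": True},
    "detect_sound_direction": {"readOnlyHint": True, "idempotentHint": True},
    "speak_text": {"openWorldHint": True},
}


def hints_for(tool: str) -> dict:
    """Return the full hint set for a tool, applying overrides on top of defaults."""
    return {**DEFAULT_HINTS, **OVERRIDES.get(tool, {})}


def safe_to_retry(tool: str) -> bool:
    """Hypothetical client policy: retry only idempotent, non-destructive tools."""
    hints = hints_for(tool)
    return hints["idempotentHint"] and not hints["destructiveHint"]


if __name__ == "__main__":
    for name in ("capture_image", "nod", "speak_text"):
        print(name, "retry:", safe_to_retry(name), "network:", hints_for(name)["openWorldHint"])
```

Remember that annotations are hints: a client should treat them as advisory rather than as guarantees.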

Prompts

Pre-built prompt templates that guide AI assistants through common robot interaction scenarios:

| Prompt | Description |
|--------|-------------|
| greet_user | Wake up the robot, express happiness, and greet by name |
| explore_room | Scan surroundings with the camera and describe the environment |
| react_to_conversation | Use gestures and emotions to physically react during chat |
| find_person | Use camera and face tracking to locate and follow a person |

Emotion system

express_emotion maps emoji characters to choreographed movements:

| Emoji | Emotion | Behavior |
|-------|---------|----------|
| 😊 | happy | antennas up, cheerful pose, dance sound |
| 😕 | confused | head tilt, asymmetric antennas, confused sound |
| 😤 | impatient | rapid antenna movements, impatient sound |
| 😴 | sleepy | sleep pose with sound |
| 👋 | greeting | wake up animation |
| 🤔 | thinking | contemplative head tilt, antennas spread |
| 😮 | surprised | antennas high, head back |
| 😢 | sad | antennas and head down |
| 🎉 | celebrate | energetic wiggle, dance sound |
| 😐 | neutral | return to rest position |
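The emoji-to-emotion mapping above is a plain lookup table. A minimal sketch of such a lookup follows; the names EMOTIONS and emotion_for are hypothetical, and returning None for an unknown emoji is an assumption (the server may reject or default such input differently). The reachy://emotions resource remains the authoritative list.

```python
from typing import Optional

# Emoji-to-emotion names as listed in the table above.
EMOTIONS = {
    "😊": "happy",
    "😕": "confused",
    "😤": "impatient",
    "😴": "sleepy",
    "👋": "greeting",
    "🤔": "thinking",
    "😮": "surprised",
    "😢": "sad",
    "🎉": "celebrate",
    "😐": "neutral",
}


def emotion_for(emoji: str) -> Optional[str]:
    """Look up the emotion name for an emoji, or None if it is not supported."""
    # Drop surrounding whitespace and the optional emoji variation selector.
    key = emoji.strip().replace("\ufe0f", "")
    return EMOTIONS.get(key)


if __name__ == "__main__":
    for symbol in ("🎉", "😐", "🤖"):
        print(symbol, "->", emotion_for(symbol))
```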

Development

# Install with dev dependencies
uv sync --extra dev

# Run all tests
uv run pytest -v

# Unit tests only
uv run pytest -v -m "not integration"

# Integration tests only (MCP protocol layer)
uv run pytest -v -m integration

# Lint
uv run ruff check . && uv run ruff format --check .

# Set up pre-commit hooks
pre-commit install

Direct robot testing

For a one-click, full sequential debug demo (movement, gestures, audio, vision, tracking) with per-step status checks:

uv sync --extra reachy
# If you want to auto-spawn the simulator daemon, also install:
uv sync --extra reachy-sim
uv run python reachy_debug.py

reachy_debug.py:

  • Announces each upcoming test step (voice via ElevenLabs if configured, otherwise console fallback).
  • Executes a full demo suite in sequence.
  • Saves all captured images and a markdown run summary to results/run-YYYYMMDD-HHMMSS/.
  • Generates a single report file for the run: run_report.md.
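The results/run-YYYYMMDD-HHMMSS/ naming pattern maps directly onto a strftime format. The following sketch shows how to compute such a path, for example to locate the newest report from a script; the helper names are hypothetical and whether the runner uses local time or UTC is not stated here, so local time is an assumption.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def run_directory(root: str = "results", now: Optional[datetime] = None) -> Path:
    """Build a results/run-YYYYMMDD-HHMMSS path for the given (or current) time."""
    now = now or datetime.now()  # assumption: local time
    return Path(root) / f"run-{now:%Y%m%d-%H%M%S}"


def report_path(run_dir: Path) -> Path:
    """Path of the per-run report file inside a run directory."""
    return run_dir / "run_report.md"


if __name__ == "__main__":
    directory = run_directory()
    print(directory)
    print(report_path(directory))
```

Because the timestamp is zero-padded and ordered from year to second, sorting run directory names alphabetically also sorts them chronologically.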

GitHub Pages

This repo includes a static GitHub Pages site under docs/ (with the dry-run video embedded).

To publish: GitHub repo Settings -> Pages -> "Build and deployment" -> Source: "Deploy from a branch" -> Branch: main -> Folder: /docs.
