Reachy Mini MCP

Control the Reachy Mini robot (or simulator) from Claude, ChatGPT, or any MCP-compatible client.

Quickstart
uv sync --extra reachy
uv run reachy.py
Simulator demo
uv sync --extra reachy-sim
uv run python reachy_debug.py

Dry run (video)

A short "dry run" of the reachy_debug.py sequential demo runner (simulator): step announcements, movements, vision, and artifacts.

Watch the dry run video on GitHub Pages

How it works

AI Assistant  --stdio-->  MCP Server (reachy.py)  -->  ReachyMini SDK  -->  Robot / Simulator

The server exposes 16 tools, 4 prompts, and 4 resources via the Model Context Protocol. An AI assistant calls these tools to see through the robot's camera, move the robot, express emotions, play sounds, or detect audio direction -- no robotics knowledge needed on the AI side.
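
For illustration, a minimal Python client built on the official MCP SDK (the mcp package) can launch the server over stdio and list its tools. The checkout path below is a placeholder:

import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    # Spawn the server as a subprocess over stdio (path is a placeholder).
    server = StdioServerParameters(
        command="uv",
        args=["--directory", "/path/to/reachy-mini-mcp", "run", "reachy.py"],
    )
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])

asyncio.run(main())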

Installation

Prerequisites

  • Python 3.13 or newer
  • uv (or pip for an editable install)
  • A Reachy Mini robot, or the simulator for the demo

Install the server

git clone https://github.com/ArturSkowronski/reachy-mini-mcp.git
cd reachy-mini-mcp
uv sync

Or with pip:

pip install -e .

Configure your MCP client

Add the server to your MCP client configuration. The exact location depends on the client:

Claude Desktop (~/Library/Application Support/Claude/claude_desktop_config.json on macOS):

{
  "mcpServers": {
    "reachy-mini": {
      "command": "uv",
      "args": ["--directory", "/path/to/reachy-mini-mcp", "run", "reachy.py"]
    }
  }
}

Claude Code (.mcp.json in your project root):

{
  "mcpServers": {
    "reachy-mini": {
      "command": "uv",
      "args": ["--directory", "/path/to/reachy-mini-mcp", "run", "reachy.py"]
    }
  }
}

Generic stdio transport:

uv run reachy.py

ElevenLabs TTS (optional)

To enable the speak_text tool, set these environment variables:

export ELEVENLABS_API_KEY="your-api-key"
export ELEVENLABS_VOICE_ID="your-voice-id"
#
# Optional override prefix (takes precedence):
# export REACHY_ELEVENLABS_API_KEY="your-api-key"
# export REACHY_ELEVENLABS_VOICE_ID="your-voice-id"

Default voice (premade/free-tier friendly): George with Voice ID JBFqnCBsd6RMkjVDRZzb.

Favorite voice (author preference): Horatius with Voice ID qXpMhyvQqiRxWQs4qSSB.

Optional overrides: ELEVENLABS_MODEL_ID (default: eleven_multilingual_v2), ELEVENLABS_OUTPUT_FORMAT (default: mp3_44100_128).

WAV support: if your ElevenLabs plan allows it, you can set ELEVENLABS_OUTPUT_FORMAT=wav_44100 to get WAV output instead of MP3.

MP3 vs WAV playback note

  • Default TTS output is MP3 (mp3_44100_128) because it works on lower ElevenLabs tiers.
  • Some Reachy audio backends/environments may not have MP3 decoding available. In that case, MP3 playback can fail even though WAV works.
  • If you need ElevenLabs to return WAV directly (wav_44100), ElevenLabs requires a higher tier (minimum Pro).
  • To force WAV output (when available), override the output format via environment variables:
    • REACHY_ELEVENLABS_OUTPUT_FORMAT=wav_44100 (preferred, takes precedence)
    • or ELEVENLABS_OUTPUT_FORMAT=wav_44100
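
As a sketch of the precedence rule, the lookup reduces to something like the following (illustrative, not the server's actual code):

import os

# REACHY_-prefixed variables take precedence over the plain ElevenLabs names.
output_format = (
    os.environ.get("REACHY_ELEVENLABS_OUTPUT_FORMAT")
    or os.environ.get("ELEVENLABS_OUTPUT_FORMAT")
    or "mp3_44100_128"  # default; works on lower ElevenLabs tiers
)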

Environment variables

General:

  • NO_COLOR: disable ANSI colors in reachy_debug.py output.

Debug runner (reachy_debug.py):

  • REACHY_DEBUG_ANNOUNCE_PAUSE_S (default: 0.6): pause after each announcement before running the step.
  • REACHY_DEBUG_TTS_SPEED (default: 0.8): ElevenLabs speech speed.

ElevenLabs (used by speak_text and reachy_debug.py announcements):

  • REACHY_ELEVENLABS_API_KEY or ELEVENLABS_API_KEY (required for TTS): API key. REACHY_ prefixed value takes precedence.
  • REACHY_ELEVENLABS_VOICE_ID or ELEVENLABS_VOICE_ID (optional): voice id. Defaults to JBFqnCBsd6RMkjVDRZzb (George) if not set.
  • REACHY_ELEVENLABS_MODEL_ID or ELEVENLABS_MODEL_ID (optional): model id (default: eleven_multilingual_v2).
  • REACHY_ELEVENLABS_OUTPUT_FORMAT or ELEVENLABS_OUTPUT_FORMAT (optional): output format (default: mp3_44100_128, optionally wav_44100 if your plan allows it).

Available tools

  • capture_image: Capture a JPEG frame from the robot's HD camera
  • scan_surroundings: Pan camera across multiple angles and return a panoramic set of images
  • track_face: Detect a face via OpenCV and turn head to face it
  • move_head: 6-DOF head positioning (x/y/z in mm, roll/pitch/yaw in degrees)
  • move_antennas: Independent antenna control (-3.14 to 3.14 radians)
  • look_at_point: Orient head toward a 3D point in world coordinates
  • express_emotion: Emoji-driven emotion system with synchronized movements and sounds
  • play_sound: Play built-in sounds (wake_up, go_sleep, confused1, impatient1, dance1, count)
  • speak_text: Text-to-speech via ElevenLabs API, played through robot speaker
  • detect_sound_direction: Microphone array direction-of-arrival + speech detection
  • wake_up: Built-in greeting animation with sound
  • go_to_sleep: Built-in farewell animation with sound
  • nod: Nod head up and down to indicate "yes" or agreement
  • shake_head: Shake head left and right to indicate "no" or disagreement
  • reset_position: Return head and antennas to neutral rest pose
  • do_barrel_roll: Choreographed head tilt + antenna wiggle sequence
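
For illustration, inside an initialized ClientSession (see the client sketch under "How it works") a movement tool is invoked like any other MCP tool. The argument names here are assumptions based on the description above, not verified against the server:

# Hypothetical arguments: x/y/z in mm, roll/pitch/yaw in degrees.
result = await session.call_tool(
    "move_head",
    {"x": 0, "y": 0, "z": 10, "roll": 0, "pitch": -10, "yaw": 30},
)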

Resources

The server also exposes MCP resources that let AI assistants discover robot capabilities dynamically:

  • reachy://emotions: Supported emoji-to-emotion mappings
  • reachy://sounds: Available built-in sound names
  • reachy://limits: Physical limits (antenna range, head DOF, camera specs)
  • reachy://capabilities: All tools grouped by category (vision, movement, expression, audio, lifecycle)
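
With the same client session as in the earlier sketch, a resource is read by its URI; a minimal example using the mcp SDK:

from pydantic import AnyUrl

# Fetch the emoji-to-emotion mappings (session as in the client sketch above).
emotions = await session.read_resource(AnyUrl("reachy://emotions"))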

Vision

capture_image grabs a frame from Reachy Mini's wide-angle HD camera and returns it as inline JPEG content through the MCP protocol. The AI assistant receives the image directly in the conversation -- no file paths, no URLs, no extra setup.

scan_surroundings takes this further by panning the camera across multiple angles and returning all frames in a single response:

User:  "Look around and describe the room"

       Claude calls scan_surroundings(steps=5, yaw_range=120)
       <- Robot pans from -60deg to +60deg in 5 steps
       <- MCP returns 5 labeled JPEG frames + summary text

Claude: "Starting from the left I can see a window with blinds,
         then a whiteboard, your desk with two monitors in the
         center, a bookshelf to the right, and a door at the
         far right."

The camera returns a standard BGR numpy frame from OpenCV, which gets JPEG-compressed and delivered as MCP ImageContent. Any multimodal AI model that supports image inputs can process it -- Claude, GPT-4o, Gemini, etc.
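
A minimal sketch of that conversion (illustrative only; frame_to_image_content is a hypothetical helper, not the server's actual code):

import base64

import cv2

def frame_to_image_content(frame):
    # JPEG-compress the BGR frame, then base64-encode it for MCP ImageContent.
    ok, jpeg = cv2.imencode(".jpg", frame)
    if not ok:
        raise RuntimeError("JPEG encoding failed")
    return {
        "type": "image",
        "data": base64.b64encode(jpeg.tobytes()).decode("ascii"),
        "mimeType": "image/jpeg",
    }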

Tool annotations

Every tool carries semantic annotations that tell AI clients how to use it safely:

  • readOnlyHint (doesn't change robot state): capture_image, detect_sound_direction
  • idempotentHint=true (safe to call repeatedly): capture_image, detect_sound_direction
  • idempotentHint=false (repeated calls repeat actions/costs): all movement/gesture/audio tools, including speak_text
  • destructiveHint=false (no irreversible actions): all tools
  • openWorldHint (calls external services): speak_text (ElevenLabs API)
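
With the MCP Python SDK's FastMCP, annotations are attached when a tool is declared. A sketch under that assumption (not the repository's actual code):

from mcp.server.fastmcp import FastMCP
from mcp.types import ToolAnnotations

mcp = FastMCP("reachy-mini")

@mcp.tool(
    annotations=ToolAnnotations(
        readOnlyHint=True,      # doesn't change robot state
        idempotentHint=True,    # safe to call repeatedly
        destructiveHint=False,  # no irreversible actions
    )
)
def capture_image() -> str:
    # Placeholder body; the real tool returns inline JPEG image content.
    return "jpeg-frame-placeholder"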

Prompts

Pre-built prompt templates that guide AI assistants through common robot interaction scenarios:

  • greet_user: Wake up the robot, express happiness, and greet by name
  • explore_room: Scan surroundings with the camera and describe the environment
  • react_to_conversation: Use gestures and emotions to physically react during chat
  • find_person: Use camera and face tracking to locate and follow a person
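
Clients fetch these through the standard MCP prompts API; for example, with the session from the earlier sketch (the "name" argument is an assumption):

# Hypothetical prompt argument; inspect session.list_prompts() for the real schema.
prompt = await session.get_prompt("greet_user", {"name": "Ada"})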

Emotion system

express_emotion maps emoji characters to choreographed movements:

  • 😊 happy: antennas up, cheerful pose, dance sound
  • 😕 confused: head tilt, asymmetric antennas, confused sound
  • 😤 impatient: rapid antenna movements, impatient sound
  • 😴 sleepy: sleep pose with sound
  • 👋 greeting: wake up animation
  • 🤔 thinking: contemplative head tilt, antennas spread
  • 😮 surprised: antennas high, head back
  • 😢 sad: antennas and head down
  • 🎉 celebrate: energetic wiggle, dance sound
  • 😐 neutral: return to rest position
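
A single tool call is enough to trigger a full choreography; for example, with the client session from earlier (the argument name "emoji" is an assumption):

# Hypothetical argument name for the emoji parameter.
await session.call_tool("express_emotion", {"emoji": "😊"})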

Development

# Install with dev dependencies
uv sync --extra dev

# Run all tests
uv run pytest -v

# Unit tests only
uv run pytest -v -m "not integration"

# Integration tests only (MCP protocol layer)
uv run pytest -v -m integration

# Lint
uv run ruff check . && uv run ruff format --check .

# Set up pre-commit hooks
pre-commit install

Direct robot testing

For a one-click, full sequential debug demo (movement, gestures, audio, vision, tracking) with per-step status checks:

uv sync --extra reachy
# If you want to auto-spawn the simulator daemon, also install:
uv sync --extra reachy-sim
uv run python reachy_debug.py

The reachy_debug.py runner:

  • Announces each upcoming test step (voice via ElevenLabs if configured, otherwise console fallback).
  • Executes a full demo suite in sequence.
  • Saves all captured images and a markdown run summary to results/run-YYYYMMDD-HHMMSS/.
  • Generates a single report file for the run: run_report.md.

GitHub Pages

This repo includes a static GitHub Pages site under docs/ (with the dry-run video embedded).

To publish: GitHub repo Settings -> Pages -> "Build and deployment" -> Source: "Deploy from a branch" -> Branch: main -> Folder: /docs.
