MCP Voice Assistant
A voice-enabled AI personal assistant that leverages the Model Context Protocol (MCP) to integrate multiple tools and services through natural voice interactions.
Features
- Voice Input: Real-time speech-to-text using OpenAI Whisper
- Voice Output: High-quality text-to-speech using ElevenLabs (with pyttsx3 fallback)
- AI-Powered: Conversational AI with memory persistence
- Multiple Model Providers: Works with any LLM provider that supports tool calling (OpenAI, Anthropic, Groq, Llama, etc.)
- Multi-Tool Integration: Seamlessly connects to any MCP server
- Conversational Memory: Maintains context across interactions
- Extensible: Easy to add new MCP servers and capabilities
Architecture
+--------------+     +--------------+     +--------------+     +---------------+
|  User Voice  | --> |  Speech-to-  | --> |   LLM with   | --> |   Text-to-    |
|    Input     |     |  Text (STT)  |     |   MCPAgent   |     |  Speech (TTS) |
+--------------+     +--------------+     +--------------+     +---------------+
                         Whisper               |                  ElevenLabs
                                               v
                                       +---------------+
                                       |  MCP Servers  |
                                       +---------------+
                                       | - Linear      |
                                       | - Playwright  |
                                       | - Filesystem  |
                                       +---------------+
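The flow above can be sketched as a simple pipeline (stub functions for illustration only; the real project wires Whisper, an MCPAgent, and ElevenLabs into these stages):

```python
def transcribe(audio: bytes) -> str:
    """STT stage stub; the real assistant sends audio to OpenAI Whisper."""
    return audio.decode("utf-8")  # pretend the audio is already text

def run_agent(text: str, history: list[str]) -> str:
    """LLM + MCPAgent stage stub; MCP tools would be invoked here."""
    history.append(text)  # conversational memory across turns
    return f"You said: {text}"

def speak(text: str) -> str:
    """TTS stage stub; the real assistant uses ElevenLabs (pyttsx3 fallback)."""
    return text

def handle_utterance(audio: bytes, history: list[str]) -> str:
    """One pass through the pipeline: STT -> agent with memory -> TTS."""
    return speak(run_agent(transcribe(audio), history))
```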
Installation
Prerequisites
- Python 3.11+
- uv (Python package manager): pip install uv or pipx install uv
- Node.js (for MCP servers)
- System dependencies:
- macOS:
brew install portaudio
- Ubuntu/Debian:
sudo apt-get install portaudio19-dev
- Windows: PyAudio wheel includes PortAudio
Install from Source
# Clone the repository
git clone https://github.com/yourusername/mcp-voice-assistant.git
cd mcp-voice-assistant
# Create a virtual environment with uv
uv venv
# Activate the virtual environment
# On Linux/macOS:
source .venv/bin/activate
# On Windows:
# .venv\Scripts\activate
# Install in development mode
uv pip install -e .
# Or install directly
uv pip install .
Configuration
Environment Variables
Create a .env file in your project root (see .env.example for a complete template):
# Required
OPENAI_API_KEY=your-openai-api-key
# Optional but recommended for better voice output
ELEVENLABS_API_KEY=your-elevenlabs-api-key
# Optional - Model Provider Settings
# You can use any model provider that supports tool calling
ANTHROPIC_API_KEY=your-anthropic-api-key # For Claude models
GROQ_API_KEY=your-groq-api-key # For Groq models
# Model selection (defaults to gpt-4)
OPENAI_MODEL=gpt-4 # OpenAI: gpt-4, gpt-4-turbo, gpt-3.5-turbo
# Or use other providers:
# ANTHROPIC_MODEL=claude-3-5-sonnet-20240620 # Anthropic Claude
# GROQ_MODEL=llama3-8b-8192 # Groq LLama
# Voice Settings
ELEVENLABS_VOICE_ID=ZF6FPAbjXT4488VcRRnw # Default: Rachel voice
# Optional - Audio Configuration
VOICE_SILENCE_THRESHOLD=500 # Lower = more sensitive
VOICE_SILENCE_DURATION=1.5 # Seconds to wait after speech
# Optional - Assistant Configuration
ASSISTANT_SYSTEM_PROMPT="You are a helpful voice assistant..." # Customize personality
# Optional - MCP Server Specific
LINEAR_API_KEY=your-linear-api-key # For Linear integration
All environment variables can be overridden via command-line arguments when using the CLI.
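The VOICE_SILENCE_THRESHOLD and VOICE_SILENCE_DURATION settings drive a simple energy-based end-of-utterance rule. A minimal sketch, assuming fixed-size audio chunks (CHUNK_SECONDS and utterance_finished are illustrative names, not the project's internals):

```python
SILENCE_THRESHOLD = 500  # RMS level below which a chunk counts as silence
SILENCE_DURATION = 1.5   # seconds of continuous silence that ends an utterance
CHUNK_SECONDS = 0.1      # duration of one audio chunk (assumed for this sketch)

def utterance_finished(chunk_rms_values: list[float]) -> bool:
    """Return True once the trailing chunks have been quiet long enough."""
    needed = round(SILENCE_DURATION / CHUNK_SECONDS)  # e.g. 15 quiet chunks
    quiet = 0
    for rms in reversed(chunk_rms_values):
        if rms >= SILENCE_THRESHOLD:
            break  # speech found before enough silence accumulated
        quiet += 1
    return quiet >= needed
```

Lowering SILENCE_THRESHOLD makes quiet speech count as activity (more sensitive); raising SILENCE_DURATION makes the assistant wait longer before responding.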
MCP Server Configuration
The assistant loads MCP server configurations from mcp_servers.json in the project root. By default, it includes:
- playwright: Web automation and browser control
- linear: Task and project management
To add more servers, edit mcp_servers.json or copy mcp_servers.example.json, which includes additional servers such as:
- filesystem, github, gitlab, google-drive, postgres, sqlite, slack, memory, puppeteer, brave-search, fetch
Environment variables in the config (like ${GITHUB_PERSONAL_ACCESS_TOKEN}) are automatically substituted from your .env file.
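The ${VAR} substitution can be approximated with the standard library (a sketch of the behavior, not the project's actual loader):

```python
import os
import re

def substitute_config(value):
    """Recursively replace ${NAME} placeholders with environment values."""
    if isinstance(value, dict):
        return {k: substitute_config(v) for k, v in value.items()}
    if isinstance(value, list):
        return [substitute_config(v) for v in value]
    if isinstance(value, str):
        # Unset variables become empty strings in this sketch
        return re.sub(r"\$\{(\w+)\}",
                      lambda m: os.environ.get(m.group(1), ""), value)
    return value
```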
To override the default configuration programmatically:
config = {
"mcpServers": {
"your_server": {
"command": "npx",
"args": ["-y", "@your-org/mcp-server"],
"env": {"YOUR_API_KEY": "${YOUR_API_KEY}"}
}
}
}
Running the Assistant
After installation, run the assistant:
# Using uv
uv run python voice_assistant/agent.py
# Or using python directly
python voice_assistant/agent.py
# Override specific settings via command line
python voice_assistant/agent.py --model gpt-3.5-turbo --silence-threshold 300
# Provide all settings via command line (no .env needed)
python voice_assistant/agent.py \
--openai-api-key YOUR_KEY \
--elevenlabs-api-key YOUR_ELEVENLABS_KEY \
--model gpt-4 \
--voice-id ZF6FPAbjXT4488VcRRnw \
--silence-threshold 500 \
--silence-duration 1.5
# See all available options
python voice_assistant/agent.py --help
Note: Command-line arguments take precedence over environment variables.
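The precedence rule can be sketched with argparse (a simplified illustration covering two of the flags; resolve_settings is not the project's actual function):

```python
import argparse
import os

def resolve_settings(argv: list[str]) -> dict:
    """Resolve settings, letting CLI flags override environment variables."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--model")
    parser.add_argument("--silence-threshold", type=int)
    args = parser.parse_args(argv)
    return {
        # CLI value wins; otherwise fall back to the environment, then a default
        "model": args.model or os.environ.get("OPENAI_MODEL", "gpt-4"),
        "silence_threshold": (
            args.silence_threshold
            if args.silence_threshold is not None
            else int(os.environ.get("VOICE_SILENCE_THRESHOLD", "500"))
        ),
    }
```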
Changing Model Provider
The voice assistant supports multiple LLM providers through LangChain. Any model with tool calling capabilities can be used:
from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic
from langchain_groq import ChatGroq
# Using OpenAI (default)
assistant = VoiceAssistant(
openai_api_key="your-key",
model="gpt-4" # or gpt-4-turbo, gpt-3.5-turbo
)
# Using Anthropic Claude
llm = ChatAnthropic(
api_key="your-anthropic-key",
model="claude-3-5-sonnet-20240620"
)
assistant = VoiceAssistant(
llm=llm, # Pass custom LLM instance
elevenlabs_api_key="your-key"
)
# Using Groq
llm = ChatGroq(
api_key="your-groq-key",
model="llama3-8b-8192"
)
assistant = VoiceAssistant(
llm=llm,
elevenlabs_api_key="your-key"
)
Note: Only models with tool calling capabilities can be used. Check your model provider's documentation for supported models.
Changing Voice Settings
Pass different parameters when initializing:
assistant = VoiceAssistant(
openai_api_key="your-key",
elevenlabs_api_key="your-key",
elevenlabs_voice_id="different-voice-id", # Change voice
silence_threshold=300, # More sensitive
silence_duration=2.0, # Wait longer
model="gpt-3.5-turbo" # Faster model
)
Troubleshooting
Common Issues
- No Audio Input Detected
  - Check microphone permissions
  - Lower the silence_threshold value
  - Verify PyAudio: python -c "import pyaudio; pyaudio.PyAudio()"
- TTS Not Working
  - Verify API keys are set correctly
  - Check API quotas
  - System will fall back to pyttsx3 if ElevenLabs fails
- MCP Server Connection Issues
  - Ensure Node.js is installed
  - Check internet connection for npx downloads
  - Verify API keys for specific servers
- High Latency
  - Use a faster LLM model (e.g., gpt-3.5-turbo)
  - Reduce max_steps in MCPAgent
  - Consider using local models
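When tuning silence_threshold, it helps to know your microphone's ambient RMS level. A minimal helper for computing the RMS of raw 16-bit PCM chunks (the PyAudio capture loop is omitted; chunk_rms is an illustrative name, not part of the project's API):

```python
import math
import struct

def chunk_rms(chunk: bytes) -> float:
    """RMS level of a 16-bit little-endian PCM chunk.

    The result is on the same scale as the silence_threshold setting.
    """
    samples = struct.unpack(f"<{len(chunk) // 2}h", chunk)
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))
```

If ambient readings hover around, say, 400, a threshold of 500 will treat room noise as silence; readings above the threshold mean you should raise it.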
Contributing
We welcome contributions! Please see our Contributing Guidelines for details.
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Built on top of mcp-use
- Uses OpenAI Whisper for speech recognition
- Voice synthesis powered by ElevenLabs
- MCP servers from the Model Context Protocol ecosystem
Support
- π§ Email: your.email@example.com
- π¬ Discord: Join our server
- π Issues: GitHub Issues
- π Documentation: Full Docs