MCP Voice Assistant
A voice-enabled AI personal assistant that leverages the Model Context Protocol (MCP) to integrate multiple tools and services through natural voice interactions.
Features
- Voice Input: Real-time speech-to-text using OpenAI Whisper
- Voice Output: High-quality text-to-speech using ElevenLabs (with pyttsx3 fallback)
- AI-Powered: Conversational AI with memory persistence
- Multiple Model Providers: Works with any LLM provider that supports tool calling (OpenAI, Anthropic, Groq, Llama, etc.)
- Multi-Tool Integration: Seamlessly connects to any MCP server
- Conversational Memory: Maintains context across interactions
- Extensible: Easy to add new MCP servers and capabilities
Architecture
+--------------+     +--------------+     +--------------+     +---------------+
|  User Voice  | --> |  Speech-to-  | --> |   LLM with   | --> |   Text-to-    |
|    Input     |     |  Text (STT)  |     |   MCPAgent   |     |  Speech (TTS) |
+--------------+     +--------------+     +--------------+     +---------------+
                         Whisper               |                  ElevenLabs
                                               v
                                       +---------------+
                                       |  MCP Servers  |
                                       +---------------+
                                       | - Linear      |
                                       | - Playwright  |
                                       | - Filesystem  |
                                       +---------------+
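The flow above can be sketched as a simple pipeline (stub functions for illustration only; the real project wires Whisper, an MCPAgent, and ElevenLabs into these stages):

```python
def transcribe(audio: bytes) -> str:
    """STT stage stub; the real assistant sends audio to OpenAI Whisper."""
    return audio.decode("utf-8")  # pretend the audio is already text

def run_agent(text: str, history: list[str]) -> str:
    """LLM + MCPAgent stage stub; MCP tools would be invoked here."""
    history.append(text)  # conversational memory across turns
    return f"You said: {text}"

def speak(text: str) -> str:
    """TTS stage stub; the real assistant uses ElevenLabs (pyttsx3 fallback)."""
    return text

def handle_utterance(audio: bytes, history: list[str]) -> str:
    """One pass through the pipeline: STT -> agent with memory -> TTS."""
    return speak(run_agent(transcribe(audio), history))
```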
Installation
Prerequisites
- Python 3.11+
- uv (Python package manager): pip install uv or pipx install uv
- Node.js (for MCP servers)
- System dependencies:
- macOS:
brew install portaudio
- Ubuntu/Debian:
sudo apt-get install portaudio19-dev
- Windows: PyAudio wheel includes PortAudio
Install from Source
# Clone the repository
git clone https://github.com/yourusername/mcp-voice-assistant.git
cd mcp-voice-assistant
# Create a virtual environment with uv
uv venv
# Activate the virtual environment
# On Linux/macOS:
source .venv/bin/activate
# On Windows:
# .venv\Scripts\activate
# Install in development mode
uv pip install -e .
# Or install directly
uv pip install .
Configuration
Environment Variables
Create a .env file in your project root (see .env.example for a complete template):
# Required
OPENAI_API_KEY=your-openai-api-key
# Optional but recommended for better voice output
ELEVENLABS_API_KEY=your-elevenlabs-api-key
# Optional - Model Provider Settings
# You can use any model provider that supports tool calling
ANTHROPIC_API_KEY=your-anthropic-api-key # For Claude models
GROQ_API_KEY=your-groq-api-key # For Groq models
# Model selection (defaults to gpt-4)
OPENAI_MODEL=gpt-4 # OpenAI: gpt-4, gpt-4-turbo, gpt-3.5-turbo
# Or use other providers:
# ANTHROPIC_MODEL=claude-3-5-sonnet-20240620 # Anthropic Claude
# GROQ_MODEL=llama3-8b-8192 # Groq LLama
# Voice Settings
ELEVENLABS_VOICE_ID=ZF6FPAbjXT4488VcRRnw # Default: Rachel voice
# Optional - Audio Configuration
VOICE_SILENCE_THRESHOLD=500 # Lower = more sensitive
VOICE_SILENCE_DURATION=1.5 # Seconds to wait after speech
# Optional - Assistant Configuration
ASSISTANT_SYSTEM_PROMPT="You are a helpful voice assistant..." # Customize personality
# Optional - MCP Server Specific
LINEAR_API_KEY=your-linear-api-key # For Linear integration
All environment variables can be overridden via command-line arguments when using the CLI.
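The VOICE_SILENCE_THRESHOLD and VOICE_SILENCE_DURATION settings drive a simple energy-based end-of-utterance rule. A minimal sketch, assuming fixed-size audio chunks (CHUNK_SECONDS and utterance_finished are illustrative names, not the project's internals):

```python
SILENCE_THRESHOLD = 500  # RMS level below which a chunk counts as silence
SILENCE_DURATION = 1.5   # seconds of continuous silence that ends an utterance
CHUNK_SECONDS = 0.1      # duration of one audio chunk (assumed for this sketch)

def utterance_finished(chunk_rms_values: list[float]) -> bool:
    """Return True once the trailing chunks have been quiet long enough."""
    needed = round(SILENCE_DURATION / CHUNK_SECONDS)  # e.g. 15 quiet chunks
    quiet = 0
    for rms in reversed(chunk_rms_values):
        if rms >= SILENCE_THRESHOLD:
            break  # speech found before enough silence accumulated
        quiet += 1
    return quiet >= needed
```

Lowering SILENCE_THRESHOLD makes quiet speech count as activity (more sensitive); raising SILENCE_DURATION makes the assistant wait longer before responding.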
MCP Server Configuration
The assistant loads MCP server configurations from mcp_servers.json in the project root. By default, it includes:
- playwright: Web automation and browser control
- linear: Task and project management
To add more servers, edit mcp_servers.json or copy mcp_servers.example.json, which includes additional servers such as:
- filesystem, github, gitlab, google-drive, postgres, sqlite, slack, memory, puppeteer, brave-search, fetch
Environment variables in the config (like ${GITHUB_PERSONAL_ACCESS_TOKEN}) are automatically substituted from your .env file.
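The ${VAR} substitution can be approximated with the standard library (a sketch of the behavior, not the project's actual loader):

```python
import os
import re

def substitute_config(value):
    """Recursively replace ${NAME} placeholders with environment values."""
    if isinstance(value, dict):
        return {k: substitute_config(v) for k, v in value.items()}
    if isinstance(value, list):
        return [substitute_config(v) for v in value]
    if isinstance(value, str):
        # Unset variables become empty strings in this sketch
        return re.sub(r"\$\{(\w+)\}",
                      lambda m: os.environ.get(m.group(1), ""), value)
    return value
```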
To override the default configuration programmatically:
config = {
"mcpServers": {
"your_server": {
"command": "npx",
"args": ["-y", "@your-org/mcp-server"],
"env": {"YOUR_API_KEY": "${YOUR_API_KEY}"}
}
}
}
Running the Assistant
After installation, run the assistant:
# Using uv
uv run python voice_assistant/agent.py
# Or using python directly
python voice_assistant/agent.py
# Override specific settings via command line
python voice_assistant/agent.py --model gpt-3.5-turbo --silence-threshold 300
# Provide all settings via command line (no .env needed)
python voice_assistant/agent.py \
--openai-api-key YOUR_KEY \
--elevenlabs-api-key YOUR_ELEVENLABS_KEY \
--model gpt-4 \
--voice-id ZF6FPAbjXT4488VcRRnw \
--silence-threshold 500 \
--silence-duration 1.5
# See all available options
python voice_assistant/agent.py --help
Note: Command-line arguments take precedence over environment variables.
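The precedence rule can be sketched with argparse (a simplified illustration covering two of the flags; resolve_settings is not the project's actual function):

```python
import argparse
import os

def resolve_settings(argv: list[str]) -> dict:
    """Resolve settings, letting CLI flags override environment variables."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--model")
    parser.add_argument("--silence-threshold", type=int)
    args = parser.parse_args(argv)
    return {
        # CLI value wins; otherwise fall back to the environment, then a default
        "model": args.model or os.environ.get("OPENAI_MODEL", "gpt-4"),
        "silence_threshold": (
            args.silence_threshold
            if args.silence_threshold is not None
            else int(os.environ.get("VOICE_SILENCE_THRESHOLD", "500"))
        ),
    }
```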
Changing Model Provider
The voice assistant supports multiple LLM providers through LangChain. Any model with tool calling capabilities can be used:
from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic
from langchain_groq import ChatGroq
# Using OpenAI (default)
assistant = VoiceAssistant(
openai_api_key="your-key",
model="gpt-4" # or gpt-4-turbo, gpt-3.5-turbo
)
# Using Anthropic Claude
llm = ChatAnthropic(
api_key="your-anthropic-key",
model="claude-3-5-sonnet-20240620"
)
assistant = VoiceAssistant(
llm=llm, # Pass custom LLM instance
elevenlabs_api_key="your-key"
)
# Using Groq
llm = ChatGroq(
api_key="your-groq-key",
model="llama3-8b-8192"
)
assistant = VoiceAssistant(
llm=llm,
elevenlabs_api_key="your-key"
)
Note: Only models with tool calling capabilities can be used. Check your model provider's documentation for supported models.
Changing Voice Settings
Pass different parameters when initializing:
assistant = VoiceAssistant(
openai_api_key="your-key",
elevenlabs_api_key="your-key",
elevenlabs_voice_id="different-voice-id", # Change voice
silence_threshold=300, # More sensitive
silence_duration=2.0, # Wait longer
model="gpt-3.5-turbo" # Faster model
)
Troubleshooting
Common Issues
- No Audio Input Detected
  - Check microphone permissions
  - Lower the silence_threshold value
  - Verify PyAudio: python -c "import pyaudio; pyaudio.PyAudio()"
- TTS Not Working
  - Verify API keys are set correctly
  - Check API quotas
  - System will fall back to pyttsx3 if ElevenLabs fails
- MCP Server Connection Issues
  - Ensure Node.js is installed
  - Check internet connection for npx downloads
  - Verify API keys for specific servers
- High Latency
  - Use a faster LLM model (e.g., gpt-3.5-turbo)
  - Reduce max_steps in MCPAgent
  - Consider using local models
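When tuning silence_threshold, it helps to know your microphone's ambient RMS level. A minimal helper for computing the RMS of raw 16-bit PCM chunks (the PyAudio capture loop is omitted; chunk_rms is an illustrative name, not part of the project's API):

```python
import math
import struct

def chunk_rms(chunk: bytes) -> float:
    """RMS level of a 16-bit little-endian PCM chunk.

    The result is on the same scale as the silence_threshold setting.
    """
    samples = struct.unpack(f"<{len(chunk) // 2}h", chunk)
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))
```

If ambient readings hover around, say, 400, a threshold of 500 will treat room noise as silence; readings above the threshold mean you should raise it.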
Contributing
We welcome contributions! Please see our Contributing Guidelines for details.
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Built on top of mcp-use
- Uses OpenAI Whisper for speech recognition
- Voice synthesis powered by ElevenLabs
- MCP servers from the Model Context Protocol ecosystem
Support
- π§ Email: your.email@example.com
- π¬ Discord: Join our server
- π Issues: GitHub Issues
- π Documentation: Full Docs