MCP Voice Assistant
A voice-enabled AI personal assistant that leverages the Model Context Protocol (MCP) to integrate multiple tools and services through natural voice interactions.
Features
- Voice Input: Real-time speech-to-text using OpenAI Whisper
- Voice Output: High-quality text-to-speech using ElevenLabs (with pyttsx3 fallback)
- AI-Powered: Conversational AI with memory persistence
- Multiple Model Providers: Works with any LLM provider that supports tool calling (OpenAI, Anthropic, Groq, Llama, etc.)
- Multi-Tool Integration: Seamlessly connects to any MCP server
- Conversational Memory: Maintains context across interactions
- Extensible: Easy to add new MCP servers and capabilities
Architecture
```
┌─────────────┐     ┌──────────────┐     ┌─────────────┐     ┌──────────────┐
│ User Voice  │ --> │  Speech-to-  │ --> │  LLM with   │ --> │   Text-to-   │
│   Input     │     │  Text (STT)  │     │  MCPAgent   │     │ Speech (TTS) │
└─────────────┘     └──────────────┘     └──────┬──────┘     └──────────────┘
                        Whisper                 │               ElevenLabs
                                                │
                                        ┌───────▼──────┐
                                        │ MCP Servers  │
                                        ├──────────────┤
                                        │ • Linear     │
                                        │ • Playwright │
                                        │ • Filesystem │
                                        └──────────────┘
```
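In code terms, the pipeline is a simple loop. The sketch below is illustrative only and does not mirror the repository's actual implementation; `transcribe`, `run_agent`, and `speak` are hypothetical stand-ins for the Whisper, MCPAgent, and ElevenLabs/pyttsx3 stages.

```python
import asyncio


async def conversation_loop(transcribe, run_agent, speak):
    """Illustrative loop over the pipeline above (all three helpers are hypothetical).

    transcribe() -> str          : record audio and return Whisper's transcript (STT)
    run_agent(text) -> str (async): pass the transcript to the LLM + MCPAgent
    speak(text)                  : synthesize the reply with ElevenLabs or pyttsx3 (TTS)
    """
    while True:
        user_text = transcribe()                # Speech-to-Text
        if user_text.strip().lower() in {"quit", "exit"}:
            break
        reply = await run_agent(user_text)      # LLM with MCP tools
        speak(reply)                            # Text-to-Speech
```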
Installation
Prerequisites
- Python 3.11+
- uv (Python package manager): `pip install uv` or `pipx install uv`
- Node.js (for MCP servers)
- System dependencies:
  - macOS: `brew install portaudio`
  - Ubuntu/Debian: `sudo apt-get install portaudio19-dev`
  - Windows: the PyAudio wheel includes PortAudio
Install from Source
```bash
# Clone the repository
git clone https://github.com/yourusername/mcp-voice-assistant.git
cd mcp-voice-assistant

# Create a virtual environment with uv
uv venv

# Activate the virtual environment
# On Linux/macOS:
source .venv/bin/activate
# On Windows:
# .venv\Scripts\activate

# Install in development mode
uv pip install -e .

# Or install directly
uv pip install .
```
Configuration
Environment Variables
Create a .env file in your project root (see .env.example for a complete template):
```bash
# Required
OPENAI_API_KEY=your-openai-api-key

# Optional but recommended for better voice output
ELEVENLABS_API_KEY=your-elevenlabs-api-key

# Optional - Model Provider Settings
# You can use any model provider that supports tool calling
OPENAI_API_KEY=your-openai-api-key        # For OpenAI models
ANTHROPIC_API_KEY=your-anthropic-api-key  # For Claude models
GROQ_API_KEY=your-groq-api-key            # For Groq models

# Model selection (defaults to gpt-4)
OPENAI_MODEL=gpt-4                        # OpenAI: gpt-4, gpt-4-turbo, gpt-3.5-turbo
# Or use other providers:
# ANTHROPIC_MODEL=claude-3-5-sonnet-20240620  # Anthropic Claude
# GROQ_MODEL=llama3-8b-8192                   # Groq Llama

# Voice Settings
ELEVENLABS_VOICE_ID=ZF6FPAbjXT4488VcRRnw  # Default: Rachel voice

# Optional - Audio Configuration
VOICE_SILENCE_THRESHOLD=500               # Lower = more sensitive
VOICE_SILENCE_DURATION=1.5                # Seconds to wait after speech

# Optional - Assistant Configuration
ASSISTANT_SYSTEM_PROMPT="You are a helpful voice assistant..."  # Customize personality

# Optional - MCP Server Specific
LINEAR_API_KEY=your-linear-api-key        # For Linear integration
```
All environment variables can be overridden via command-line arguments when using the CLI.
MCP Server Configuration
The assistant loads MCP server configurations from mcp_servers.json in the project root. By default, it includes:
- playwright: Web automation and browser control
- linear: Task and project management
To add more servers, edit mcp_servers.json or copy mcp_servers.example.json which includes additional servers like:
- filesystem, github, gitlab, google-drive, postgres, sqlite, slack, memory, puppeteer, brave-search, fetch
Environment variables in the config (like ${GITHUB_PERSONAL_ACCESS_TOKEN}) are automatically substituted from your .env file.
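For illustration, `${VAR}` substitution can be done with a small loader like the one below. This is a hedged sketch of the mechanism only, not the project's actual loader, and `load_mcp_config` is a hypothetical helper name.

```python
import json
import os
import re

from dotenv import load_dotenv  # pip install python-dotenv


def load_mcp_config(path: str = "mcp_servers.json") -> dict:
    """Load the MCP server config and expand ${VAR} placeholders from the environment."""
    load_dotenv()  # pull variables from .env into os.environ
    with open(path) as f:
        raw = f.read()
    # Replace each ${NAME} with the matching environment variable (empty string if unset).
    expanded = re.sub(r"\$\{(\w+)\}", lambda m: os.environ.get(m.group(1), ""), raw)
    return json.loads(expanded)
```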
To override the default configuration programmatically:
```python
config = {
    "mcpServers": {
        "your_server": {
            "command": "npx",
            "args": ["-y", "@your-org/mcp-server"],
            "env": {"YOUR_API_KEY": "${YOUR_API_KEY}"}
        }
    }
}
```
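Because the assistant is built on mcp-use, a dict like this typically maps onto an `MCPClient`/`MCPAgent` pair. The sketch below shows that wiring under the assumption that you drive the agent directly; how `VoiceAssistant` itself accepts a custom config may differ, so check `voice_assistant/agent.py`.

```python
import asyncio

from langchain_openai import ChatOpenAI
from mcp_use import MCPAgent, MCPClient  # the library this assistant builds on

# Hedged sketch: turn the `config` dict defined above into an mcp-use agent.
client = MCPClient.from_dict(config)
agent = MCPAgent(llm=ChatOpenAI(model="gpt-4"), client=client, max_steps=30)


async def demo() -> None:
    # Ask the agent anything; it can call tools exposed by the configured servers.
    print(await agent.run("What tools do you have available?"))


asyncio.run(demo())
```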
Running the Assistant
After installation, run the assistant:
```bash
# Using uv
uv run python voice_assistant/agent.py

# Or using python directly
python voice_assistant/agent.py

# Override specific settings via command line
python voice_assistant/agent.py --model gpt-3.5-turbo --silence-threshold 300

# Provide all settings via command line (no .env needed)
python voice_assistant/agent.py \
  --openai-api-key YOUR_KEY \
  --elevenlabs-api-key YOUR_ELEVENLABS_KEY \
  --model gpt-4 \
  --voice-id ZF6FPAbjXT4488VcRRnw \
  --silence-threshold 500 \
  --silence-duration 1.5

# See all available options
python voice_assistant/agent.py --help
```
Note: Command-line arguments take precedence over environment variables.
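That precedence is easy to achieve by defaulting CLI flags to environment variables. A minimal sketch follows; the flag and variable names mirror the examples above, but the real parser lives in `voice_assistant/agent.py`.

```python
import argparse
import os


def parse_args() -> argparse.Namespace:
    """CLI flags default to environment variables, so an explicit flag always wins."""
    parser = argparse.ArgumentParser(description="MCP voice assistant")
    parser.add_argument("--openai-api-key", default=os.getenv("OPENAI_API_KEY"))
    parser.add_argument("--model", default=os.getenv("OPENAI_MODEL", "gpt-4"))
    parser.add_argument("--silence-threshold", type=int,
                        default=int(os.getenv("VOICE_SILENCE_THRESHOLD", "500")))
    parser.add_argument("--silence-duration", type=float,
                        default=float(os.getenv("VOICE_SILENCE_DURATION", "1.5")))
    return parser.parse_args()
```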
Changing Model Provider
The voice assistant supports multiple LLM providers through LangChain. Any model with tool calling capabilities can be used:
```python
from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic
from langchain_groq import ChatGroq

# Using OpenAI (default)
assistant = VoiceAssistant(
    openai_api_key="your-key",
    model="gpt-4"  # or gpt-4-turbo, gpt-3.5-turbo
)

# Using Anthropic Claude
llm = ChatAnthropic(
    api_key="your-anthropic-key",
    model="claude-3-5-sonnet-20240620"
)
assistant = VoiceAssistant(
    llm=llm,  # Pass custom LLM instance
    elevenlabs_api_key="your-key"
)

# Using Groq
llm = ChatGroq(
    api_key="your-groq-key",
    model="llama3-8b-8192"
)
assistant = VoiceAssistant(
    llm=llm,
    elevenlabs_api_key="your-key"
)
```
Note: Only models with tool calling capabilities can be used. Check your model provider's documentation for supported models.
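One hedged way to sanity-check a model before wiring it in is to try LangChain's `bind_tools` on it; chat models without tool-calling support typically raise `NotImplementedError`. The `ping` tool below is a throwaway probe defined only for this check.

```python
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI


@tool
def ping(text: str) -> str:
    """Trivial tool used only to probe tool-calling support."""
    return text


# Requires OPENAI_API_KEY to be set; binding does not send any API request.
llm = ChatOpenAI(model="gpt-4")
try:
    llm.bind_tools([ping])
    print("Tool calling supported.")
except NotImplementedError:
    print("This model does not support tool calling; pick a different one.")
```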
Changing Voice Settings
Pass different parameters when initializing:
```python
assistant = VoiceAssistant(
    openai_api_key="your-key",
    elevenlabs_api_key="your-key",
    elevenlabs_voice_id="different-voice-id",  # Change voice
    silence_threshold=300,   # More sensitive
    silence_duration=2.0,    # Wait longer
    model="gpt-3.5-turbo"    # Faster model
)
```
Troubleshooting
Common Issues
- No Audio Input Detected
  - Check microphone permissions
  - Lower the `silence_threshold` value
  - Verify PyAudio: `python -c "import pyaudio; pyaudio.PyAudio()"` (see the device-listing sketch after this list)
- TTS Not Working
  - Verify API keys are set correctly
  - Check API quotas
  - The system falls back to pyttsx3 if ElevenLabs fails
- MCP Server Connection Issues
  - Ensure Node.js is installed
  - Check internet connection for npx downloads
  - Verify API keys for the specific servers
- High Latency
  - Use a faster LLM model (e.g., `gpt-3.5-turbo`)
  - Reduce `max_steps` in MCPAgent
  - Consider using local models
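If the microphone checks above don't help, the following sketch (assuming PyAudio installed successfully) lists every input device PyAudio can see, which quickly shows whether your microphone is visible at all.

```python
import pyaudio

# List every capture-capable device PyAudio can see; if your microphone is
# missing here, the problem is OS permissions or drivers, not the assistant.
p = pyaudio.PyAudio()
for i in range(p.get_device_count()):
    info = p.get_device_info_by_index(i)
    if info.get("maxInputChannels", 0) > 0:
        print(f"[{i}] {info['name']} ({int(info['maxInputChannels'])} input channels)")
p.terminate()
```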
Contributing
We welcome contributions! Please see our Contributing Guidelines for details.
- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Built on top of mcp-use
- Uses OpenAI Whisper for speech recognition
- Voice synthesis powered by ElevenLabs
- MCP servers from the Model Context Protocol ecosystem
Support
- Email: your.email@example.com
- Discord: Join our server
- Issues: GitHub Issues
- Documentation: Full Docs