AgentHub
Self-hosted MCP platform with multiple tools, multi-agent orchestration, web UI, and ML capabilities
MCP Multi-Server Architecture
A Model Context Protocol (MCP) implementation with distributed multi-server architecture, Agent-to-Agent (A2A) protocol support, and ML-powered recommendations.
Prerequisites
- Python 3.10+
- 16GB+ RAM recommended
- One of:
  - Ollama installed, or
  - A local GGUF model file
1. Quick Start
Get the client running in 3 steps:
Install Dependencies
# Clone repository
git clone <repo-url>
cd mcp_a2a
# Create virtual environment
python -m venv .venv
# Activate (Linux/macOS)
source .venv/bin/activate
# Activate (Windows PowerShell)
.venv\Scripts\Activate.ps1
# Install requirements
pip install -r requirements.txt
Choose LLM Backend
Option A: Ollama (recommended starting point)
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# Start Ollama server
ollama serve
# Download a model (use 14B+ for best results)
ollama pull qwen2.5:14b
Option B: GGUF (local model files)
# Download a GGUF model (example)
wget https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf
# Register the model
# (After starting the client, register it with the :gguf add command)
Start the Client
python client.py
Access web UI at: http://localhost:9000
That's it! The client auto-discovers all MCP servers and tools.
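Auto-discovery here amounts to scanning the servers/ directory at startup. The following is a minimal sketch of that idea; the scanning logic and function name are illustrative, not the client's actual code:

# Hypothetical sketch: find every servers/*/server.py so it can be launched over stdio.
from pathlib import Path

def discover_servers(root: str = "servers") -> dict[str, Path]:
    """Map server name -> path to its server.py entry point."""
    found = {}
    for entry in sorted(Path(root).iterdir()):
        script = entry / "server.py"
        if entry.is_dir() and script.exists():
            found[entry.name] = script
    return found

if __name__ == "__main__":
    for name, script in discover_servers().items():
        print(f"discovered MCP server: {name} -> {script}")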
2. Using MCP Servers with Other Clients
Use these MCP servers with Claude Desktop, Cline, or any MCP-compatible client.
Example Configuration
Add to your MCP client config (e.g., claude_desktop_config.json):
{
"mcpServers": {
"code_review": {
"command": "/path/to/mcp_a2a/.venv/bin/python",
"args": ["/path/to/mcp_a2a/servers/code_review/server.py"]
},
"location": {
"command": "/path/to/mcp_a2a/.venv/bin/python",
"args": ["/path/to/mcp_a2a/servers/location/server.py"]
},
"plex": {
"command": "/path/to/mcp_a2a/.venv/bin/python",
"args": ["/path/to/mcp_a2a/servers/plex/server.py"]
},
"rag": {
"command": "/path/to/mcp_a2a/.venv/bin/python",
"args": ["/path/to/mcp_a2a/servers/rag/server.py"]
},
"system_tools": {
"command": "/path/to/mcp_a2a/.venv/bin/python",
"args": ["/path/to/mcp_a2a/servers/system_tools/server.py"]
},
"text_tools": {
"command": "/path/to/mcp_a2a/.venv/bin/python",
"args": ["/path/to/mcp_a2a/servers/text_tools/server.py"]
},
"todo": {
"command": "/path/to/mcp_a2a/.venv/bin/python",
"args": ["/path/to/mcp_a2a/servers/todo/server.py"]
},
"knowledge_base": {
"command": "/path/to/mcp_a2a/.venv/bin/python",
"args": ["/path/to/mcp_a2a/servers/knowledge_base/server.py"]
}
}
}
Windows paths:
"command": "C:\\path\\to\\mcp_a2a\\.venv\\Scripts\\python.exe"
Available servers:
- code_review - Code analysis (5 tools)
- location - Weather, time, location (3 tools)
- plex - Media library + ML recommendations (17 tools)
- rag - Vector search (4 tools)
- system_tools - System info (4 tools)
- text_tools - Text processing (7 tools)
- todo - Task management (6 tools)
- knowledge_base - Notes management (10 tools)
3. Client Configuration
Environment Variables
Create .env in project root:
# === LLM Backend ===
MAX_MESSAGE_HISTORY=30 # Chat history limit (default: 20)
LLM_TEMPERATURE=0.3 # Model temperature 0 to 1 (default: 0.3)
# === GGUF Configuration (if using GGUF backend) ===
GGUF_GPU_LAYERS=-1 # -1 = all GPU, 0 = CPU only, N = N layers on GPU
GGUF_CONTEXT_SIZE=4096 # Context window size
GGUF_BATCH_SIZE=512 # Batch size for processing
# === API Keys (optional services) ===
PLEX_URL=http://localhost:32400 # Plex server URL
PLEX_TOKEN=your_token_here # Get from Plex account settings
WEATHER_TOKEN=your_token_here # OpenWeatherMap API key
LANGSEARCH_TOKEN=your_token_here # LangSearch API key (https://langsearch.com)
# === A2A Protocol (optional distributed mode) ===
A2A_ENDPOINTS=http://localhost:8010 # Comma-separated endpoints
A2A_EXPOSED_TOOLS= # Tool categories to expose (empty = all)
# === Performance Tuning (optional) ===
CONCURRENT_LIMIT=3 # Parallel ingestion jobs (default: 1)
EMBEDDING_BATCH_SIZE=50 # Embeddings per batch (default: 20)
DB_FLUSH_BATCH_SIZE=50 # DB inserts per batch (default: 30)
# === Tool Control (optional) ===
DISABLED_TOOLS=knowledge_base:*,todo:* # Disable specific tools/categories
Configuration Details
LLM Backend:
- ollama: Uses the Ollama server (requires ollama serve to be running)
- gguf: Uses local GGUF model files (GPU recommended)
GGUF GPU Layers:
- -1: All layers on GPU (fastest; requires the model to fit in VRAM)
- 0: CPU only (slow, but works with any model size)
- 20: 20 layers on GPU (a balance for large models on limited VRAM)
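If the GGUF backend is built on llama-cpp-python (an assumption; the README does not name the library), these variables map roughly onto the Llama constructor:

# Illustrative only: assumes the GGUF backend wraps llama-cpp-python.
import os
from llama_cpp import Llama

llm = Llama(
    model_path="models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf",
    n_gpu_layers=int(os.getenv("GGUF_GPU_LAYERS", "-1")),  # -1 = all layers on GPU, 0 = CPU only
    n_ctx=int(os.getenv("GGUF_CONTEXT_SIZE", "4096")),      # context window size
    n_batch=int(os.getenv("GGUF_BATCH_SIZE", "512")),       # prompt batch size
)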
Performance Tuning:
- EMBEDDING_BATCH_SIZE=50 + DB_FLUSH_BATCH_SIZE=50 = ~6x faster RAG ingestion
- With 12GB of VRAM you can increase both to 100 for even faster processing
- CONCURRENT_LIMIT=2 enables parallel media ingestion
Disabled Tools:
- Format: category:tool_name or category:*
- Example: DISABLED_TOOLS=todo:delete_all_todo_items,system:*
- Disabled tools are hidden from the :tools list and return an error if called (the matching logic is sketched below)
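A minimal sketch of how the category:tool_name and category:* patterns can be matched; the helper name and the client's real logic are assumptions:

# Hypothetical helper: decide whether a tool is disabled by DISABLED_TOOLS.
import os

def is_disabled(category: str, tool: str) -> bool:
    patterns = [p.strip() for p in os.getenv("DISABLED_TOOLS", "").split(",") if p.strip()]
    return f"{category}:{tool}" in patterns or f"{category}:*" in patterns

# With DISABLED_TOOLS=knowledge_base:*,todo:delete_all_todo_items
# is_disabled("knowledge_base", "add_note")     -> True  (category wildcard)
# is_disabled("todo", "delete_all_todo_items")  -> True  (exact match)
# is_disabled("todo", "list_todo_items")        -> False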
Available Commands
These work in both CLI and web UI:
:commands - List all available commands
:clear history - Clear all chat history
:clear session <id> - Clear session
:sessions - List all sessions
:stop - Stop current operation
:stats - Show performance metrics
:tools - List available tools (hides disabled)
:tools --all - Show all tools including disabled
:tool <name> - Get tool description
:model - List all available models
:model <name> - Switch to a model (auto-detects backend)
:models - List models (legacy)
:sync - Sync to model in last_model.txt
:gguf add <path> - Register a GGUF model
:gguf remove <alias> - Remove a GGUF model
:gguf list - List registered GGUF models
:a2a on - Enable agent-to-agent mode
:a2a off - Disable agent-to-agent mode
:a2a status - Check A2A system status
:env - Show environment configuration
API Setup
Weather (OpenWeatherMap):
- Sign up at https://openweathermap.org/api
- Get an API key from account settings
- Add to .env: WEATHER_TOKEN=your_key
LangSearch (web search):
- Sign up at https://langsearch.com
- Get an API key from the dashboard
- Add to .env: LANGSEARCH_TOKEN=your_key
Plex Media Server:
- Open the Plex web interface
- Settings → Network → Show Advanced
- Copy the server URL (e.g., http://192.168.1.100:32400)
- Get the token: Settings → Account → Show XML → copy authToken
- Add to .env:
  PLEX_URL=http://your_server_ip:32400
  PLEX_TOKEN=your_token
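Before wiring these into .env, you can sanity-check the URL and token against the Plex HTTP API. The /identity endpoint and X-Plex-Token parameter are standard Plex API details, not part of this project, so verify against your own server:

# Quick sanity check for PLEX_URL / PLEX_TOKEN (assumes the standard Plex HTTP API).
import os
import requests

url = os.getenv("PLEX_URL", "http://localhost:32400")
token = os.getenv("PLEX_TOKEN", "")

resp = requests.get(f"{url}/identity", params={"X-Plex-Token": token}, timeout=5)
print(resp.status_code)  # 200 means the server is reachable and the token is accepted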
4. Adding Tools (Developer Guide)
Step 1: Create Tool Server
# Create server directory
mkdir servers/my_tool
# Create server file
touch servers/my_tool/server.py
Step 2: Implement Tool
# servers/my_tool/server.py
# FastMCP (from the official mcp SDK) handles tool registration and the stdio transport.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("my_tool-server")

@mcp.tool()
def my_function(arg1: str, arg2: int) -> str:
    """
    Short description of what this tool does.

    Detailed explanation of behavior, use cases, etc.

    Args:
        arg1: Description of arg1
        arg2: Description of arg2

    Returns:
        Description of return value
    """
    return f"Processed {arg1} with {arg2}"

if __name__ == "__main__":
    # Serves over stdio by default, matching the other servers in servers/
    mcp.run()
Step 3: Create Skill Documentation (Optional)
# Create skills directory
mkdir -p servers/my_tool/skills
# Create skill file
touch servers/my_tool/skills/my_feature.md
# My Feature Skill
This skill enables X functionality.
## When to Use
- Use case 1
- Use case 2
## Examples
User: "Do something"
Assistant: [calls my_function with appropriate args]
Step 4: Update Intent Patterns (Optional)
If your tool needs specific routing, update client/langgraph.py:
INTENT_PATTERNS = {
# ... existing patterns ...
"my_tool": {
"pattern": r'\bmy keyword\b|\bmy phrase\b',
"tools": ["my_function"],
"priority": 3
}
}
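For context, this is roughly how such patterns are typically evaluated: match the prompt against each regex and take the tools of the highest-priority hit. The routing in client/langgraph.py may differ, and the "weather" entry and tool names below are illustrative:

# Illustrative routing: pick the highest-priority intent whose regex matches the prompt.
import re

INTENT_PATTERNS = {
    "my_tool": {"pattern": r"\bmy keyword\b|\bmy phrase\b", "tools": ["my_function"], "priority": 3},
    "weather": {"pattern": r"\bweather\b|\bforecast\b", "tools": ["get_weather"], "priority": 2},
}

def route(prompt: str) -> list[str]:
    hits = [
        (cfg["priority"], cfg["tools"])
        for cfg in INTENT_PATTERNS.values()
        if re.search(cfg["pattern"], prompt, re.IGNORECASE)
    ]
    return max(hits, default=(0, []))[1]  # tools of the highest-priority match, or []

print(route("What's the weather in Vancouver?"))  # ['get_weather']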
Step 5: Add External MCP Servers (Optional)
To connect external or third-party MCP servers without writing a local server, create
client/external_servers.json. The client auto-discovers this file on startup — no code
changes needed.
SSE transport (remote HTTP-based servers):
{
"external_servers": {
"deepwiki": {
"transport": "sse",
"url": "https://mcp.deepwiki.com/mcp",
"enabled": true,
"notes": "DeepWiki — reads wiki content from GitHub repos"
}
}
}
Stdio transport (local process servers, e.g. IDE integrations):
{
"external_servers": {
"pycharm": {
"transport": "stdio",
"command": "/usr/lib/jvm/jdk-17.0.12-oracle-x64/bin/java",
"args": [
"-classpath",
"/path/to/pycharm/plugins/mcpserver/lib/mcpserver-frontend.jar",
"com.intellij.mcpserver.stdio.McpStdioRunnerKt"
],
"env": {
"IJ_MCP_SERVER_PORT": "64342"
},
"enabled": true,
"notes": "PyCharm MCP integration — requires PyCharm to be running"
}
}
}
Field reference:
| Field | Required | Description |
|---|---|---|
| transport | ✅ | "sse" or "stdio" |
| url | SSE only | Full URL to the SSE endpoint |
| command | stdio only | Path to the executable |
| args | stdio only | Command-line arguments as an array |
| env | No | Environment variables passed to the process |
| cwd | No | Working directory (defaults to project root) |
| enabled | No | false skips the server without removing it (default: true) |
| notes | No | Human-readable description, ignored by the client |
WSL2 note: If running the client in WSL2 and connecting to a Windows-hosted stdio process, set IJ_MCP_SERVER_HOST in env to the Windows host IP (find it with cat /etc/resolv.conf | grep nameserver). Stdio servers with IJ_MCP_SERVER_PORT are port-checked before registration; if the port is unreachable, the server is skipped cleanly rather than crashing the client.
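That port check can be as simple as a TCP connect attempt. A sketch of the idea, with the host, port, and skip message as illustrative placeholders rather than the client's actual code:

# Illustrative pre-registration check: skip a stdio server whose companion port is unreachable.
import socket

def port_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# e.g. the PyCharm entry above: host from IJ_MCP_SERVER_HOST (or localhost), port 64342
if not port_reachable("localhost", 64342):
    print("pycharm MCP server unreachable; skipping registration")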
Step 6: Test & Deploy
# Restart client (auto-discovers new server)
python client.py
# Test in CLI or web UI
> test my new tool
5. Distributed Mode (A2A Protocol)
Run tools on remote servers and expose them via HTTP.
Setup A2A Server
Terminal 1 - Start A2A server:
python a2a_server.py
Server starts on http://localhost:8010
Terminal 2 - Start client:
python client.py
Client auto-connects to A2A endpoints in .env
Control Exposed Tools
Use A2A_EXPOSED_TOOLS to control which categories are publicly accessible:
# Expose specific categories (comma-separated)
A2A_EXPOSED_TOOLS=plex,location,text_tools
# Expose everything (default)
A2A_EXPOSED_TOOLS=
# Available categories:
# plex, location, text_tools, system_tools, code_review,
# rag, todo, knowledge_base
Security:
- Empty = all 8 servers exposed (56 tools)
- Specified = only the listed categories are exposed (a filter sketch follows below)
- Exclude todo, knowledge_base, and rag to protect personal data
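The exposure rule boils down to "empty means everything, otherwise only the named categories". A sketch of that filter; the function name and category list handling are illustrative:

# Hypothetical filter: which tool categories does the A2A server expose?
import os

ALL_CATEGORIES = [
    "plex", "location", "text_tools", "system_tools",
    "code_review", "rag", "todo", "knowledge_base",
]

def exposed_categories() -> list[str]:
    raw = os.getenv("A2A_EXPOSED_TOOLS", "").strip()
    if not raw:                       # empty = expose everything
        return ALL_CATEGORIES
    wanted = {c.strip() for c in raw.split(",")}
    return [c for c in ALL_CATEGORIES if c in wanted]

print(exposed_categories())  # ['plex', 'location', 'text_tools'] for the example above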
Multi-Endpoint Support
Connect to multiple A2A servers:
# In .env
A2A_ENDPOINTS=http://localhost:8010,http://gpu-server:8020
Client aggregates tools from all successful connections.
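A sketch of that aggregation: query each endpoint's /tool-categories route (shown under "Check Available Tools" below) and merge the results, skipping endpoints that fail. The response shape and error handling here are assumptions:

# Illustrative aggregation over A2A_ENDPOINTS using the /tool-categories route.
import os
import requests

endpoints = [e.strip() for e in os.getenv("A2A_ENDPOINTS", "").split(",") if e.strip()]
available: dict[str, list[str]] = {}

for endpoint in endpoints:
    try:
        resp = requests.get(f"{endpoint}/tool-categories", timeout=5)
        resp.raise_for_status()
        available[endpoint] = resp.json()       # assumed to be a list of category names
    except requests.RequestException as err:
        print(f"skipping {endpoint}: {err}")    # failed endpoints are ignored, not fatal

print(available)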
Check Available Tools
Via HTTP:
curl http://localhost:8010/tool-categories
Via Client:
> :a2a status
6. Testing
Running Tests
The project includes a comprehensive test suite with 44+ tests covering unit, integration, and end-to-end scenarios.
Run all tests:
pytest
Run specific test categories:
pytest -m unit # Fast unit tests only
pytest -m integration # Integration tests
pytest -m e2e # End-to-end tests
Run with coverage:
pytest -c tests/pytest.coverage.ini
Run specific test file:
pytest tests/unit/test_session_manager.py
pytest tests/integration/test_websocket_flow.py
Test Reports
After running tests, reports are automatically generated in tests/results/:
- junit.xml - Test results for CI/CD (Jenkins, GitHub Actions)
- coverage.xml - Coverage data for Codecov/Coveralls
- test-report.html - Interactive HTML test report
- coverage-report.html - Coverage overview with per-file metrics
View HTML reports:
# Test results
open tests/results/test-report.html
# Coverage report
open tests/results/coverage-report.html
Test Structure
tests/
├── conftest.py # Shared fixtures & config
├── pytest.ini # Test configuration
├── unit/ # Fast, isolated tests
│ ├── test_session_manager.py
│ ├── test_models.py
│ ├── test_context_tracker.py
│ ├── test_intent_patterns.py
│ └── test_code_review_tools.py
├── integration/ # Multiple component tests
│ ├── test_websocket_flow.py
│ └── test_langgraph_agent.py
├── e2e/ # Full system tests
│ └── test_full_conversation.py
└── results/ # Generated reports
├── junit.xml
├── coverage.xml
├── test-report.html
├── coverage-report.html
└── generate_html.py
Writing Tests
Tests use pytest with async support and comprehensive fixtures:
import pytest
from unittest.mock import MagicMock
@pytest.mark.unit
def test_create_session(session_manager):
"""Test session creation"""
session_id = session_manager.create_session("Test Session")
assert session_id is not None
assert session_id > 0
@pytest.mark.integration
@pytest.mark.asyncio
async def test_full_workflow(session_manager, mock_llm):
"""Test complete conversation workflow"""
# Test implementation
pass
Available fixtures:
- session_manager - Temporary database session manager
- mock_llm - Mocked LLM for testing
- mock_websocket - Mocked WebSocket connection
- temp_dir - Temporary directory for test files
- sample_python_file - Sample code for testing
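Fixtures like these normally live in tests/conftest.py. A sketch of how temp_dir and sample_python_file might be defined; the project's actual fixture bodies may differ:

# tests/conftest.py (illustrative): definitions for two of the fixtures listed above.
import pytest

@pytest.fixture
def temp_dir(tmp_path):
    """Temporary directory for test files, cleaned up automatically by pytest."""
    return tmp_path

@pytest.fixture
def sample_python_file(tmp_path):
    """A small Python file that code-review tests can analyze."""
    path = tmp_path / "sample.py"
    path.write_text("def add(a, b):\n    return a + b\n")
    return path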
Test Dependencies
# Install test dependencies
pip install pytest pytest-asyncio pytest-cov pytest-timeout pytest-xdist
CI/CD Integration
GitHub Actions:
- name: Run tests
run: pytest
- name: Upload coverage
uses: codecov/codecov-action@v3
with:
files: tests/results/coverage.xml
GitLab CI:
test:
script:
- pytest
artifacts:
reports:
junit: tests/results/junit.xml
coverage_report:
coverage_format: cobertura
path: tests/results/coverage.xml
Coverage Goals
- Current coverage: ~90% of core client code
- Per-module coverage:
  - session_manager.py (90%+)
  - context_tracker.py (62%+)
  - models.py (40%+)
  - query_patterns.py (66%+)
Running Tests from PyCharm
- Right-click the tests/ folder → Run 'pytest in tests'
- Or use the green play button next to test functions
- Reports auto-generate in tests/results/
Troubleshooting Tests
Import errors:
- Ensure you're in the project root: cd /path/to/mcp_a2a
- Activate the virtual environment: source .venv/bin/activate
- Install dependencies: pip install -r requirements.txt
Async test failures:
- Install pytest-asyncio: pip install pytest-asyncio
- Tests are auto-marked with @pytest.mark.asyncio
Coverage not working:
- Install pytest-cov: pip install pytest-cov
- Use the coverage config: pytest -c tests/pytest.coverage.ini
7. Architecture
Multi-Server Design
8 specialized MCP servers communicate via stdio:
servers/
├── code_review/ 5 tools - Code analysis
├── knowledge_base/ 10 tools - Notes management
├── location/ 3 tools - Weather, time, location
├── plex/ 17 tools - Media + ML recommendations
├── rag/ 4 tools - Vector search
├── system_tools/ 4 tools - System info
├── text_tools/ 7 tools - Text processing
└── todo/ 6 tools - Task management
Total: 56 local tools
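To make the stdio relationship concrete, this is roughly how a client can open one of these servers and list its tools with the mcp Python SDK's stdio client; the server path is an example and the platform's own connection code may differ:

# Minimal sketch: connect to one stdio MCP server and list its tools.
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def list_tools() -> None:
    params = StdioServerParameters(command="python", args=["servers/todo/server.py"])
    async with stdio_client(params) as (read_stream, write_stream):
        async with ClientSession(read_stream, write_stream) as session:
            await session.initialize()
            result = await session.list_tools()
            for tool in result.tools:
                print(tool.name)

asyncio.run(list_tools())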
Directory Structure
mcp_a2a/
├── servers/ # MCP servers (stdio)
│ ├── plex/
│ │ ├── server.py
│ │ ├── ml_recommender.py
│ │ └── skills/
│ └── ...
├── a2a_server.py # A2A HTTP server
├── client.py # AI agent client
├── client/
│ ├── ui/
│ │ ├── index.html # Web UI
│ │ └── dashboard.html # Dashboard UI
│ ├── langgraph.py # Agent execution
│ ├── websocket.py # WebSocket server
│ └── ...
└── tools/ # Tool implementations
8. Example Prompts & Troubleshooting
Example Prompts
Weather:
> What's the weather in Vancouver?
Plex ML Recommendations:
> What should I watch tonight?
> Recommend unwatched SciFi movies
> Show recommender stats
Code Analysis:
> Analyze the code in /path/to/project
> Review this Python file for bugs
Task Management:
> Add "deploy feature" to my todos
> List my todos
Web Search (via LangSearch):
> Who won the 2024 NBA championship?
> Latest AI developments
RAG (Retrieval-Augmented Generation):
Automatic ingestion from web research:
> Write a report about quantum computing using
https://en.wikipedia.org/wiki/Quantum_computing and
https://en.wikipedia.org/wiki/Quantum_algorithm as sources
✅ Fetches both Wikipedia pages
✅ Automatically stores content in RAG (16 chunks, ~2.5s)
✅ Generates report using the content
💾 Content available for future searches
Retrieving stored content:
> Use the rag_search_tool to search for "quantum entanglement"
> What do you have in the RAG about algorithm complexity?
> Search the rag for information about superposition
Checking RAG status:
> What's in the rag database?
> Show rag stats
How RAG works:
- When you research topics from URLs, content is automatically chunked (350 tokens max), embedded with bge-large, and stored in SQLite
- URLs are deduplicated; the same page won't be stored twice
- Semantic search finds the most relevant content even if exact keywords don't match
- Search returns the top 5 results with similarity scores and source URLs (a compressed sketch of this pipeline follows)
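A compressed sketch of that pipeline: deduplicate by URL, chunk, embed, store, and rank by cosine similarity. The character-based chunking and the placeholder embed() function are simplifications of the real token-based, bge-large implementation:

# Illustrative RAG flow: dedupe by URL, chunk, embed, and rank by cosine similarity.
import numpy as np

store: dict[str, list[tuple[str, np.ndarray]]] = {}   # url -> [(chunk, embedding), ...]

def embed(text: str) -> np.ndarray:
    """Placeholder for the bge-large embedding call."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    vec = rng.standard_normal(1024)
    return vec / np.linalg.norm(vec)

def ingest(url: str, text: str, chunk_chars: int = 1400) -> None:
    if url in store:                                   # URL deduplication
        return
    chunks = [text[i:i + chunk_chars] for i in range(0, len(text), chunk_chars)]
    store[url] = [(c, embed(c)) for c in chunks]

def search(query: str, top_k: int = 5) -> list[tuple[float, str, str]]:
    q = embed(query)
    scored = [
        (float(q @ vec), chunk, url)
        for url, items in store.items()
        for chunk, vec in items
    ]
    return sorted(scored, reverse=True)[:top_k]        # top 5 with similarity and source URL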
Troubleshooting
Ollama models not appearing:
# Make sure Ollama is running
ollama serve
# Check models are downloaded
ollama list
# Restart client
python client.py
GGUF model won't load:
- Check model size vs VRAM (use models <7GB for 12GB VRAM)
- Reduce GPU layers: export GGUF_GPU_LAYERS=20
- Increase the load timeout: export GGUF_LOAD_TIMEOUT=300
- Use CPU only: export GGUF_GPU_LAYERS=0
Web UI won't load:
# Check ports are available: 8765, 8766, 9000
netstat -an | grep LISTEN
# Try localhost directly
http://localhost:9000
A2A server not connecting:
# Verify server is running
curl http://localhost:8010/.well-known/agent-card.json
# Check A2A_ENDPOINTS in .env
LangSearch not working:
- Verify LANGSEARCH_TOKEN in .env
- Check the API key at https://langsearch.com
- The system falls back to the LLM if the service is unavailable
RAG search returns wrong results:
- RAG uses semantic similarity: it returns the closest matches even if they're not exact
- If you search for content that doesn't exist, it returns the most similar available content
- Check what's in the database: > show rag stats
- Content is only stored after researching URLs or manually adding it via rag_add_tool
RAG ingestion is slow:
- Current performance: ~2.5 seconds for 16 chunks (10,000 characters)
- Embeddings: ~0.5-2s (concurrent processing with 5 workers)
- Database insert: ~0.02s (binary embeddings with batch inserts)
- If slower, check Ollama is running:
ollama list
Conversation history not working ("what was my last prompt" fails):
- Cause: Smaller models (around 7B parameters and below) often refuse to answer questions about conversation history, even when the data is present in their context
- Solution 1 (Recommended): Switch to a larger model with better instruction following:
ollama pull qwen2.5:14b
:model qwen2.5:14b
- Solution 2: Use models known for good instruction following:
  - qwen2.5:14b or qwen2.5:32b (excellent, 80-95% success rate)
  - llama3.1:8b (good, ~70% success rate)
  - mistral-nemo (good, ~70% success rate)
  - Avoid: qwen2.5:3b, qwen2.5:7b (~10-30% success rate)
- Why this happens: Smaller models prioritize their safety training ("don't claim knowledge you don't have") over system instructions, causing them to deny access to conversation history even when it's available in their context window
- Expected behavior with larger models:
You: "what's the weather?"
Bot: "It's sunny, 22°C"
You: "what was my last prompt?"
Bot: "Your last prompt was: what's the weather?"
Tools not appearing:
# Check tool is enabled
:tools --all
# Check DISABLED_TOOLS in .env
# Restart client
python client.py
License
MIT License