# AgentHub — MCP Platform

Self-hosted local MCP runtime with multiple tool servers, multi-agent orchestration, a web UI, and ML-powered media recommendations.

> ⚠️ **Experimental** — intended for personal and experimental use only, not for production deployment.
## Prerequisites

- Python 3.12+
- 16GB+ RAM recommended
- One of:
  - Ollama installed, **or**
  - a GGUF model file
## 1. Quick Start

Get the client running in three steps.

### Install dependencies

Clone the repository, then:

```bash
cd mcp-platform

# Create a virtual environment
python -m venv .venv

# Activate (Linux/macOS)
source .venv/bin/activate

# Activate (Windows PowerShell)
.venv\Scripts\activate

# Install requirements - this will take a while
pip install -r requirements.txt
```
### Choose an LLM backend

**Option A: Ollama (recommended to start)**

```bash
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Start the Ollama server
ollama serve

# Download a model (use 14B+ for best results)
ollama pull qwen2.5:14b-instruct-q4_K_M
```

**Option B: GGUF (local model files)**

```bash
# Download a GGUF model (example; note the /resolve/ path, not /blob/,
# so wget fetches the model file rather than an HTML page)
wget https://huggingface.co/TheRains/Qwen2.5-14B-Instruct-Q4_K_M-GGUF/resolve/main/qwen2.5-14b-instruct-q4_k_m.gguf
```

After starting the client, register the downloaded file with the `:gguf add` command.
### Start the client

```bash
python client.py
```

Open the web UI at http://localhost:9000.

That's it! The client auto-discovers all MCP servers and tools.
## 2. Using MCP Servers with Other Clients

These MCP servers also work with Claude Desktop, Cline, or any other MCP-compatible client.

### Example configuration

Add the servers to your MCP client config (e.g., `claude_desktop_config.json`):
```json
{
  "mcpServers": {
    "code_review": {
      "command": "/path/to/mcp_a2a/.venv/bin/python",
      "args": ["/path/to/mcp_a2a/servers/code_review/server.py"]
    },
    "location": {
      "command": "/path/to/mcp_a2a/.venv/bin/python",
      "args": ["/path/to/mcp_a2a/servers/location/server.py"]
    },
    "plex": {
      "command": "/path/to/mcp_a2a/.venv/bin/python",
      "args": ["/path/to/mcp_a2a/servers/plex/server.py"]
    },
    "rag": {
      "command": "/path/to/mcp_a2a/.venv/bin/python",
      "args": ["/path/to/mcp_a2a/servers/rag/server.py"]
    },
    "system_tools": {
      "command": "/path/to/mcp_a2a/.venv/bin/python",
      "args": ["/path/to/mcp_a2a/servers/system_tools/server.py"]
    },
    "text_tools": {
      "command": "/path/to/mcp_a2a/.venv/bin/python",
      "args": ["/path/to/mcp_a2a/servers/text_tools/server.py"]
    },
    "todo": {
      "command": "/path/to/mcp_a2a/.venv/bin/python",
      "args": ["/path/to/mcp_a2a/servers/todo/server.py"]
    },
    "knowledge_base": {
      "command": "/path/to/mcp_a2a/.venv/bin/python",
      "args": ["/path/to/mcp_a2a/servers/knowledge_base/server.py"]
    }
  }
}
```
On Windows, use the `Scripts` path instead:

```json
"command": "C:\\path\\to\\mcp_a2a\\.venv\\Scripts\\python.exe"
```
### Available servers

- `code_review` - Code analysis (5 tools)
- `location` - Weather, time, location (3 tools)
- `plex` - Media library + ML recommendations (17 tools) ⚠️ requires Plex env vars
- `rag` - Vector search (4 tools) ⚠️ requires Ollama + bge-large
- `system_tools` - System info (4 tools)
- `text_tools` - Text processing (7 tools)
- `todo` - Task management (6 tools)
- `knowledge_base` - Notes management (10 tools)
## 3. Client Configuration

### Environment variables

Create a `.env` file in the project root:
```bash
# === LLM Backend ===
OLLAMA_VISION_MODEL=qwen3-vl:8b-instruct
MAX_MESSAGE_HISTORY=30            # Chat history limit (default: 20)
LLM_TEMPERATURE=0.3               # Model temperature, 0 to 1 (default: 0.3)

# === GGUF Configuration (if using the GGUF backend) ===
GGUF_GPU_LAYERS=-1                # -1 = all GPU, 0 = CPU only, N = N layers on GPU
GGUF_CONTEXT_SIZE=4096            # Context window size
GGUF_BATCH_SIZE=512               # Batch size for processing

# === API Keys (optional services) ===
PLEX_URL=http://localhost:32400   # Plex server URL
PLEX_TOKEN=your_token_here        # Get from Plex account settings
TRILIUM_URL=http://localhost:8888
TRILIUM_TOKEN=your_token_here
SHASHIN_BASE_URL=http://localhost:6624/
SHASHIN_API_KEY=your_key_here
SERPER_API_KEY=your_key_here      # Serper image search (https://serper.dev/api-keys)
OLLAMA_TOKEN=your_token_here      # Ollama API key (https://ollama.com/settings/keys)

# === A2A Protocol (optional distributed mode) ===
A2A_ENDPOINTS=http://localhost:8010   # Comma-separated endpoints
A2A_EXPOSED_TOOLS=                    # Tool categories to expose (empty = all)

# === Performance Tuning (optional) ===
CONCURRENT_LIMIT=3                # Parallel ingestion jobs (default: 1)
EMBEDDING_BATCH_SIZE=50           # Embeddings per batch (default: 20)
DB_FLUSH_BATCH_SIZE=50            # DB inserts per batch (default: 30)

# === Tool Control (optional) ===
DISABLED_TOOLS=knowledge_base:*,todo:*   # Disable specific tools/categories
```
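The performance-tuning variables above group work into fixed-size batches before sending it to the embedding model or database. As a rough, hypothetical sketch of the idea (the `batched` helper below is illustrative, not the platform's actual code):

```python
import os


def batched(items: list[str], size: int) -> list[list[str]]:
    """Split a list into successive fixed-size batches."""
    return [items[start:start + size] for start in range(0, len(items), size)]


# Read the tunable, falling back to the documented default.
embed_batch = int(os.getenv("EMBEDDING_BATCH_SIZE", "20"))

chunks = [f"chunk-{i}" for i in range(45)]
batches = batched(chunks, embed_batch)
# Each batch goes to the embedding model in one call, so fewer,
# larger batches mean less per-call overhead during RAG ingestion.
```

Raising the batch sizes trades memory for throughput, which is why the tuning notes below suggest going to 100 only when 12GB of VRAM is available.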
### Recommended setup

Use Ollama for the easiest setup. Download and install it from https://ollama.com/download, then run:

```bash
ollama serve
```

Recommended LLM:

```bash
ollama pull llama3.2:3b-instruct-q8_0
```

**RAG requires Ollama + bge-large.** If `bge-large` has not been pulled, RAG ingestion and semantic search will not work:

```bash
ollama pull bge-large
```

**Image tools require Ollama + a vision model.** If a vision model has not been pulled, image tools will not work:

```bash
ollama pull qwen3-vl:8b-instruct
```
A minimal `.env` to get started with the core features:

```bash
# === Vision ===
OLLAMA_VISION_MODEL=qwen3-vl:8b-instruct

# === Disable unused servers ===
DISABLED_TOOLS=knowledge_base:*,todo:*,plex:*,image_tools:shashin_analyze,shashin_random,shashin_search

# === API Keys ===
OLLAMA_TOKEN=<token>     # Free at https://ollama.com; required for web_search_tool
SERPER_API_KEY=<key>     # Required for web_image_search_tool (https://serper.dev)
```
### Configuration details

**LLM backend:**

- `ollama`: uses the Ollama server (requires `ollama serve` to be running)
- `gguf`: uses local GGUF model files (GPU recommended)

**GGUF GPU layers:**

- `-1`: all layers on GPU (fastest; the model must fit in VRAM)
- `0`: CPU only (slow, but works with any model size)
- `20`: 20 layers on GPU (a balance for large models on limited VRAM)
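The README does not name the GGUF runtime it uses. Assuming it is llama-cpp-python (a common choice for GGUF files, but an assumption here), the three `GGUF_*` variables map onto the `Llama` constructor roughly like this:

```python
import os


def gguf_llama_kwargs(model_path: str) -> dict:
    """Map the GGUF_* .env settings onto llama-cpp-python's Llama(...)
    arguments. Hypothetical mapping; the platform's real loader may differ."""
    return {
        "model_path": model_path,
        "n_gpu_layers": int(os.getenv("GGUF_GPU_LAYERS", "-1")),  # -1 = offload everything
        "n_ctx": int(os.getenv("GGUF_CONTEXT_SIZE", "4096")),     # context window
        "n_batch": int(os.getenv("GGUF_BATCH_SIZE", "512")),      # prompt batch size
    }


kwargs = gguf_llama_kwargs("/models/qwen2.5-14b-instruct-q4_k_m.gguf")
# llm = llama_cpp.Llama(**kwargs)  # requires llama-cpp-python and the model file
```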
**Performance tuning:**

- `EMBEDDING_BATCH_SIZE=50` + `DB_FLUSH_BATCH_SIZE=50` gives roughly 6x faster RAG ingestion
- With 12GB VRAM, both can be raised to 100 for even faster processing
- `CONCURRENT_LIMIT=2` enables parallel media ingestion
**Disabled tools:**

- Format: `category:tool_name` or `category:*`
- Example: `DISABLED_TOOLS=todo:delete_all_todo_items,system:*`
- Disabled tools are hidden from the `:tools` list and return an error if called
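The wildcard format above is simple enough to sketch. A minimal, hypothetical parser (not the project's actual implementation):

```python
def parse_disabled(spec: str) -> set[str]:
    """Parse a DISABLED_TOOLS value like 'todo:*,system:get_time' into a set."""
    return {entry.strip() for entry in spec.split(",") if entry.strip()}


def is_disabled(category: str, tool: str, disabled: set[str]) -> bool:
    """A tool is disabled if named directly or if its whole category is starred."""
    return f"{category}:{tool}" in disabled or f"{category}:*" in disabled


disabled = parse_disabled("todo:delete_all_todo_items,system:*")
# is_disabled("system", "get_hardware_specs_tool", disabled) → True (category star)
# is_disabled("todo", "list_todo_items", disabled)           → False (only one todo tool named)
```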
### Feature requirements

Some features require additional setup before they will function. The table below summarizes what's needed:
| Feature | Required env vars | Additional setup |
|---|---|---|
| RAG ingestion & search | — | Ollama running + bge-large pulled |
| RAG reranking (optional) | — | bge-reranker-v2-m3 pulled — improves result ranking, falls back to cosine if absent |
| Plex media library | PLEX_URL, PLEX_TOKEN | Plex Media Server running |
| Plex ingestion & recommendations | PLEX_URL, PLEX_TOKEN | Ollama running + bge-large pulled |
| Ollama web search | OLLAMA_TOKEN | Ollama account + API key |
| A2A distributed mode | A2A_ENDPOINTS | Remote A2A server running |
### Available commands

These work in both the CLI and the web UI:

```text
:commands             - List all available commands
:clear sessions       - Clear all chat history
:clear session <id>   - Clear a single session
:sessions             - List all sessions
:stop                 - Stop the current operation
:stats                - Show performance metrics
:tools                - List available tools (hides disabled)
:tools --all          - Show all tools, including disabled
:tool <name>          - Get a tool description
:model                - List all available models
:model <name>         - Switch to a model (auto-detects backend)
:models               - List models (legacy)
:gguf add <path>      - Register a GGUF model
:gguf remove <alias>  - Remove a GGUF model
:gguf list            - List registered GGUF models
:a2a on               - Enable agent-to-agent mode
:a2a off              - Disable agent-to-agent mode
:a2a status           - Check A2A system status
:health               - Health overview of all servers and tools
:env                  - Show environment configuration
```
### API setup

**Ollama Search API (web search):**

1. Sign up at https://ollama.com/
2. Get an API key from https://ollama.com/settings/keys
3. Add it to `.env`: `OLLAMA_TOKEN=your_key`

**Plex Media Server:**

1. Open the Plex web interface
2. Settings → Network → Show Advanced
3. Copy the server URL (e.g., `http://192.168.1.100:32400`)
4. Get the token: Settings → Account → Show XML → copy `authToken`
5. Add both to `.env`:

```bash
PLEX_URL=http://your_server_ip:32400
PLEX_TOKEN=your_token
```

⚠️ Without `PLEX_URL` and `PLEX_TOKEN`, all Plex tools (library browsing, ingestion, ML recommendations) are unavailable. The server will load, but calls will return a configuration error.
## 4. Adding Tools (Developer Guide)

### Step 1: Create a tool server

```bash
mkdir servers/my_tool
touch servers/my_tool/server.py
```
### Step 2: Implement the tool

The original example mixed the low-level `mcp.server.Server` API with the `@mcp.tool()` decorator; the decorator belongs to `FastMCP`, which also handles the stdio transport for you:

```python
# servers/my_tool/server.py
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("my_tool-server")


@mcp.tool()
def my_function(arg1: str, arg2: int) -> str:
    """
    Short description of what this tool does.

    Args:
        arg1: Description of arg1
        arg2: Description of arg2

    Returns:
        Description of the return value
    """
    return f"Processed {arg1} with {arg2}"


if __name__ == "__main__":
    mcp.run()  # defaults to the stdio transport
```
### Step 3: Create skill documentation (optional)

```bash
mkdir -p servers/my_tool/skills
touch servers/my_tool/skills/my_feature.md
```
### Step 4: Update intent patterns (optional)

If your tool needs specific routing, add an entry to `INTENT_CATALOG` in `client/query_patterns.py`:

```python
{
    "name": "my_tool",
    "pattern": r'\bmy keyword\b|\bmy phrase\b',
    "tools": ["my_function"],
    "priority": 3,
    "web_search": False,
    "skills": False,
}
```
### Step 5: Add external MCP servers (optional)

To connect external or third-party MCP servers, create `client/external_servers.json`. The client auto-discovers this file on startup; no code changes are needed.

**SSE transport (remote HTTP event stream):**
```json
{
  "external_servers": {
    "deepwiki": {
      "transport": "sse",
      "url": "https://mcp.deepwiki.com/mcp",
      "enabled": true
    }
  }
}
```
**HTTP transport (streamable HTTP, e.g. authenticated APIs):**

```json
{
  "external_servers": {
    "neon": {
      "transport": "http",
      "url": "https://mcp.neon.tech/mcp",
      "enabled": true,
      "headers": { "Authorization": "Bearer <$TOKEN>" }
    }
  }
}
```
Header authentication uses the `ES_{SERVER_NAME}_{PLACEHOLDER}` convention in `.env`:

```bash
# Server "mcpserver" with <$TOKEN>   → ES_MCPSERVER_TOKEN
# Server "mcpserver" with <$API_KEY> → ES_MCPSERVER_API_KEY
ES_MCPSERVER_TOKEN=your_token_here
ES_MCPSERVER_API_KEY=your_api_key_here
```
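A sketch of how the `<$PLACEHOLDER>` substitution can work under that convention (hypothetical helper; the client's actual resolver may differ):

```python
import re


def resolve_headers(server_name: str, headers: dict[str, str],
                    env: dict[str, str]) -> dict[str, str]:
    """Replace <$PLACEHOLDER> tokens using the ES_{SERVER}_{PLACEHOLDER} convention.
    Unresolvable placeholders are left untouched."""
    def substitute(value: str) -> str:
        return re.sub(
            r"<\$([A-Z_]+)>",
            lambda m: env.get(f"ES_{server_name.upper()}_{m.group(1)}", m.group(0)),
            value,
        )
    return {key: substitute(val) for key, val in headers.items()}


resolved = resolve_headers(
    "neon",
    {"Authorization": "Bearer <$TOKEN>"},
    {"ES_NEON_TOKEN": "abc123"},
)
# resolved["Authorization"] == "Bearer abc123"
```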
**Stdio transport (local process servers):**

```json
{
  "external_servers": {
    "pycharm": {
      "transport": "stdio",
      "command": "/usr/lib/jvm/jdk-17/bin/java",
      "args": ["-classpath", "/path/to/mcpserver.jar", "com.intellij.mcpserver.stdio.McpStdioRunnerKt"],
      "env": { "IJ_MCP_SERVER_PORT": "64342" },
      "enabled": true
    }
  }
}
```
**Field reference:**

| Field | Required | Description |
|---|---|---|
| `transport` | ✅ | `"sse"`, `"http"`, or `"stdio"` |
| `url` | SSE/HTTP only | Full URL of the endpoint |
| `headers` | No | Request headers; use `<$PLACEHOLDER>` for secrets |
| `command` | stdio only | Path to the executable |
| `args` | stdio only | Command-line arguments |
| `env` | No | Environment variables passed to the process |
| `cwd` | No | Working directory (defaults to project root) |
| `enabled` | No | `false` skips the server without removing it (default: `true`) |
| `notes` | No | Human-readable description, ignored by the client |
**WSL2 note:** For stdio servers bridging to Windows, set `IJ_MCP_SERVER_HOST` in `env` to the Windows host IP (`cat /etc/resolv.conf | grep nameserver`).
### Step 6: Test and deploy

```bash
python client.py   # restart to auto-discover the new server
```
## 5. Distributed Mode (A2A Protocol)

Run tools on remote servers and expose them over HTTP.

### Set up the A2A server

```bash
# Terminal 1
python a2a_server.py   # starts on http://localhost:8010

# Terminal 2
python client.py       # auto-connects to the A2A endpoints in .env
```
### Control exposed tools

```bash
# Expose specific categories (comma-separated)
A2A_EXPOSED_TOOLS=plex,location,text_tools

# Expose everything (default)
A2A_EXPOSED_TOOLS=
```

**Security:** exclude `todo`, `knowledge_base`, and `rag` to protect personal data.
### Multi-endpoint support

```bash
A2A_ENDPOINTS=http://localhost:8010,http://gpu-server:8020
```
## 6. Testing

### Running tests

```bash
pytest                                # all tests
pytest -m unit                        # fast unit tests only
pytest -m integration                 # integration tests
pytest -m e2e                         # end-to-end tests
pytest -c tests/pytest.coverage.ini   # with coverage
```
### Test structure

```text
tests/
├── conftest.py
├── pytest.ini
├── unit/
│   ├── test_session_manager.py
│   ├── test_models.py
│   ├── test_context_tracker.py
│   ├── test_intent_patterns.py
│   └── test_code_review_tools.py
├── integration/
│   ├── test_websocket_flow.py
│   └── test_langgraph_agent.py
├── e2e/
│   └── test_full_conversation.py
└── results/
    ├── junit.xml
    ├── coverage.xml
    ├── test-report.html
    └── coverage-report.html
```
### CI/CD integration

GitHub Actions:

```yaml
- name: Run tests
  run: pytest

- name: Upload coverage
  uses: codecov/codecov-action@v3
  with:
    files: tests/results/coverage.xml
```
## 7. Architecture

### Multi-server design

```text
servers/
├── code_review/      5 tools  - Code analysis
├── knowledge_base/  10 tools  - Notes management
├── location/         3 tools  - Weather, time, location
├── plex/            17 tools  - Media + ML recommendations  [requires PLEX_URL + PLEX_TOKEN]
├── rag/              4 tools  - Vector search                [requires Ollama + bge-large]
├── system_tools/     4 tools  - System info
├── text_tools/       7 tools  - Text processing
└── todo/             6 tools  - Task management
```

Total: 56 local tools
### Directory structure

```text
mcp_a2a/
├── servers/
├── a2a_server.py
├── client.py
├── client/
│   ├── ui/
│   │   ├── index.html
│   │   └── dashboard.html
│   ├── langgraph.py
│   ├── query_patterns.py   ← Intent routing catalog
│   ├── search_client.py    ← Ollama web search & fetch
│   ├── websocket.py
│   └── ...
└── tools/
```
## 8. Intent Patterns & Troubleshooting

### Intent patterns

The client uses a pattern catalog (`client/query_patterns.py`) to route queries to the right tools without sending all 75+ tools to the LLM on every message. Each intent has a priority; the lower number wins when multiple patterns match.

**Priority 1 — specific, high-confidence routing**
| Intent | Example prompts | Tools | Key params |
|---|---|---|---|
| `analyze_image` | "Analyze this image: https://…/photo.jpg", "Describe /home/mike/img.png" | `analyze_image_tool` | `image_url` or `image_file_path` |
| `shashin_random` | "Show me a random photo", "Surprise me with a picture" | `shashin_random_tool` | none |
| `web_image_search` | "Show me a picture of Jorma Tommila", "What does the Eiffel Tower look like?" | `web_image_search_tool` | `query: str` |

`web_image_search` excludes queries containing "my" or "shashin"; those fall through to `shashin_search` instead.
**Priority 2 — contextual, content-aware routing**

| Intent | Example prompts | Tools | Key params |
|---|---|---|---|
| `github_review` | "Review github.com/user/repo", "Analyze this GitHub project" | `github_clone_repo`, `analyze_project`, `review_code` | `repo_url: str` |
| `file_analyst` | "Analyze /home/mike/budget.csv", "Open C:\Users\data.json" | `read_file_tool_handler` | `file_path: str` |
| `code_assistant` | "What's the tech stack?", "Fix the bug in auth.py", "List npm dependencies" | `analyze_project`, `get_project_dependencies`, `fix_code_file` | `project_path` or `file_path` |
| `shashin_analyze` | "Describe the beach pictures", "What does the sunset photo show?" | `shashin_search_tool`, `shashin_analyze_tool` | `term: str` → `image_id: uuid` |
| `shashin_search` | "Find photos of Noah", "Show my photos from Japan" | `shashin_search_tool` | `term: str`, `page: int` (default 0) |
| `plex_search` | "Find movies about time travel", "Scene where the hero escapes the prison" | `rag_search_tool`, `semantic_media_search_text`, `scene_locator_tool` | `query: str` |
| `rag` | "What do you know about quantum computing?", "How many items are in RAG?" | `rag_search_tool`, `rag_status_tool`, `rag_list_sources_tool` | `query: str` |
| `trilium` | "Search my notes for project ideas", "Create a note about today's meeting" | `search_notes`, `create_note`, `update_note_content` | `query` / `title` / `content` / `note_id` |
**Priority 3 — utility, lower specificity**

| Intent | Example prompts | Tools | Key params |
|---|---|---|---|
| `weather` | "What's the weather in Vancouver?", "Will it rain today?" | `get_location_tool`, `get_weather_tool` | location auto-resolved |
| `location` | "Where am I?", "What's my current location?" | `get_location_tool` | none |
| `time` | "What time is it?", "What's today's date?" | `get_time_tool` | none |
| `system` | "What's my GPU utilization?", "Show running processes" | `get_hardware_specs_tool`, `list_system_processes` | none |
| `ml_recommendation` | "What should I watch tonight?", "Train the recommender model" | `recommend_content`, `train_recommender`, `record_viewing` | `title` / `count` / `media_type` |
| `code` | "Debug this code", "Review and summarize this file" | `review_code`, `summarize_code_file`, `debug_fix` | `file_path` or inline code |
| `text` | "Summarize this", "Explain this concept simply" | `summarize_text_tool`, `explain_simplified_tool` | `text: str` |
| `todo` | "Add 'deploy feature' to my todos", "List my tasks" | `add_todo_item`, `list_todo_items`, `update_todo_item` | `text` / `item_id` / `status` |
| `knowledge` | "Remember that Mike prefers dark mode", "Search my notes for API keys" | `add_entry`, `search_entries`, `search_semantic` | `content` / `query` / `tag` |
| `ingest` | "Ingest 5 items from Plex", "Process subtitles now" | `ingest_movies`, `ingest_batch_tool` | `limit: int` |
| `a2a` | "Discover remote agents", "Send this task to the remote agent" | `send_a2a*`, `discover_a2a` | `agent_url` / `task` |
| `current_events` | "What's the latest news?", "What's happening in the world?" | web search only | query passed to search |
| `stock_price` | "What's NVIDIA trading at?", "Apple market cap today" | web search only | query passed to search |
**Conversational bypass**
Queries that start with personal statements ("I like…", "My favourite…"), filler words ("yes", "thanks"), creative tasks ("write me a poem"), or pronoun follow-ups ("what did he do?", "tell me more about them") bypass the catalog entirely — no tools are bound and the LLM answers from context.
**Overriding intent routing**

Prefix your message with `Using <tool_name>,` to bypass pattern matching entirely and force a specific tool:

```text
Using shashin_search_tool, find photos of Noah
Using web_image_search_tool, show me a picture of a red panda
```
### Troubleshooting

**Ollama models not appearing:**

```bash
ollama serve       # make sure the server is running
ollama list        # confirm models are pulled
python client.py   # restart the client
```
**RAG not working / embedding errors:**

- Ensure Ollama is running: `ollama serve`
- Confirm `bge-large` is available: `ollama list`
- If missing, pull it: `ollama pull bge-large`
- RAG requires Ollama for embeddings regardless of which LLM backend (Ollama or GGUF) you use for chat
**Plex tools returning errors:**

- Confirm `PLEX_URL` and `PLEX_TOKEN` are set in `.env`
- Verify the Plex server is reachable: `curl "$PLEX_URL/identity?X-Plex-Token=$PLEX_TOKEN"` (quoted so the shell does not interpret the `?`)
- See API setup above for how to locate your token
**GGUF model won't load:**

- Check model size vs. VRAM (use models under 7GB for 12GB VRAM)
- Reduce GPU layers: `export GGUF_GPU_LAYERS=20`
- CPU only: `export GGUF_GPU_LAYERS=0`
**Web UI won't load:**

```bash
netstat -an | grep LISTEN   # check ports 8765, 8766, 9000
```

**A2A server not connecting:**

```bash
curl http://localhost:8010/.well-known/agent-card.json
```
**Ollama Search not working:**

- Verify `OLLAMA_TOKEN` in `.env`
- Get an API key at https://ollama.com/settings/keys
- The system falls back to LLM knowledge if search is unavailable
**RAG search returns wrong results:**

- RAG uses semantic similarity; it returns the closest matches even when they are not exact
- Check what's in the database: `> show rag stats`
- Content is only stored after researching URLs or manually adding it via `rag_add_tool`
**RAG ingestion is slow:**

- Normal: ~2.5s for 16 chunks (10,000 characters)
- If slower, check that Ollama is running: `ollama list`
**Conversation history not working:**

- Smaller models (≤7B) often refuse to answer questions about conversation history
- Switch to a larger model: `:model qwen2.5:14b-instruct-q4_K_M`
- Models with good instruction following: `qwen2.5:14b` (80-95%), `llama3.1:8b` (~70%), `mistral-nemo` (~70%)
- Avoid for this use case: `qwen2.5:3b`, `qwen2.5:7b` (~10-30%)
**Query not routing to the right tool:**

- Intent patterns are matched in priority order (1 → 3)
- Use explicit phrasing: `Using shashin_search_tool, find photos of Noah` bypasses pattern matching entirely
- Check the active intents in `client/query_patterns.py`
**Tools not appearing:**

```text
:tools --all       # check whether the tool is disabled
# check DISABLED_TOOLS in .env
python client.py   # restart
```
## License

MIT License