MCP Platform
Local MCP runtime with multi-agent orchestration, distributed tool servers, and ML-powered media recommendations.
⚠️ Experimental — intended for personal and experimental use only, not for production deployment.
Prerequisites
- Python 3.12+
- 16GB+ RAM recommended
- One of:
- Ollama installed, OR
- A local GGUF model file
1. Quick Start
Get the client running in 3 steps:
Install Dependencies
Clone the repository, then:
cd mcp-platform
# Create virtual environment
python -m venv .venv
# Activate (Linux/macOS)
source .venv/bin/activate
# Activate (Windows PowerShell)
.venv\Scripts\Activate.ps1
# Install requirements - this will take a while
pip install -r requirements.txt
Choose LLM Backend
Option A: Ollama (recommended starting point)
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# Start Ollama server
ollama serve
# Download a model (use 14B+ for best results)
ollama pull qwen2.5:14b-instruct-q4_K_M
Option B: GGUF (local model files)
# Download a GGUF model (example)
wget https://huggingface.co/TheRains/Qwen2.5-14B-Instruct-Q4_K_M-GGUF/resolve/main/qwen2.5-14b-instruct-q4_k_m.gguf
# Register the model
# (After starting the client, use the `:gguf add <path>` command with the downloaded file)
Start the Client
python client.py
Access web UI at: http://localhost:9000
That's it! The client auto-discovers all MCP servers and tools.
2. Using MCP Servers with Other Clients
Use these MCP servers with Claude Desktop, Cline, or any MCP-compatible client.
Example Configuration
Add to your MCP client config (e.g., claude_desktop_config.json):
{
"mcpServers": {
"code_review": {
"command": "/path/to/mcp_a2a/.venv/bin/python",
"args": ["/path/to/mcp_a2a/servers/code_review/server.py"]
},
"location": {
"command": "/path/to/mcp_a2a/.venv/bin/python",
"args": ["/path/to/mcp_a2a/servers/location/server.py"]
},
"plex": {
"command": "/path/to/mcp_a2a/.venv/bin/python",
"args": ["/path/to/mcp_a2a/servers/plex/server.py"]
},
"rag": {
"command": "/path/to/mcp_a2a/.venv/bin/python",
"args": ["/path/to/mcp_a2a/servers/rag/server.py"]
},
"system_tools": {
"command": "/path/to/mcp_a2a/.venv/bin/python",
"args": ["/path/to/mcp_a2a/servers/system_tools/server.py"]
},
"text_tools": {
"command": "/path/to/mcp_a2a/.venv/bin/python",
"args": ["/path/to/mcp_a2a/servers/text_tools/server.py"]
},
"todo": {
"command": "/path/to/mcp_a2a/.venv/bin/python",
"args": ["/path/to/mcp_a2a/servers/todo/server.py"]
},
"knowledge_base": {
"command": "/path/to/mcp_a2a/.venv/bin/python",
"args": ["/path/to/mcp_a2a/servers/knowledge_base/server.py"]
}
}
}
Windows paths:
"command": "C:\\path\\to\\mcp_a2a\\.venv\\Scripts\\python.exe"
Available servers:
- code_review - Code analysis (5 tools)
- location - Weather, time, location (3 tools)
- plex - Media library + ML recommendations (17 tools) ⚠️ Requires Plex env vars
- rag - Vector search (4 tools) ⚠️ Requires Ollama + bge-large
- system_tools - System info (4 tools)
- text_tools - Text processing (7 tools)
- todo - Task management (6 tools)
- knowledge_base - Notes management (10 tools)
3. Client Configuration
Environment Variables
Create .env in project root:
# === LLM Backend ===
OLLAMA_VISION_MODEL=qwen3-vl:8b-instruct
MAX_MESSAGE_HISTORY=30 # Chat history limit (default: 20)
LLM_TEMPERATURE=0.3 # Model temperature 0 to 1 (default: 0.3)
# === GGUF Configuration (if using GGUF backend) ===
GGUF_GPU_LAYERS=-1 # -1 = all GPU, 0 = CPU only, N = N layers on GPU
GGUF_CONTEXT_SIZE=4096 # Context window size
GGUF_BATCH_SIZE=512 # Batch size for processing
# === API Keys (optional services) ===
PLEX_URL=http://localhost:32400 # Plex server URL
PLEX_TOKEN=your_token_here # Get from Plex account settings
TRILIUM_URL=http://localhost:8888
TRILIUM_TOKEN=your_token_here
SHASHIN_BASE_URL=http://localhost:6624/
SHASHIN_API_KEY=your_key_here
SERPER_API_KEY=your_key_here # Serper image search (https://serper.dev/api-keys)
OLLAMA_TOKEN=your_token_here # Ollama API key (https://ollama.com/settings/keys)
# === A2A Protocol (optional distributed mode) ===
A2A_ENDPOINTS=http://localhost:8010 # Comma-separated endpoints
A2A_EXPOSED_TOOLS= # Tool categories to expose (empty = all)
# === Performance Tuning (optional) ===
CONCURRENT_LIMIT=3 # Parallel ingestion jobs (default: 1)
EMBEDDING_BATCH_SIZE=50 # Embeddings per batch (default: 20)
DB_FLUSH_BATCH_SIZE=50 # DB inserts per batch (default: 30)
# === Tool Control (optional) ===
DISABLED_TOOLS=knowledge_base:*,todo:* # Disable specific tools/categories
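Numeric settings like these are typically read with fallbacks to their documented defaults. A minimal sketch (the helper names are illustrative, not the client's actual code):

```python
# Read numeric .env settings with their documented defaults.
import os

def env_int(name: str, default: int) -> int:
    raw = os.environ.get(name, "").strip()
    return int(raw) if raw else default

def env_float(name: str, default: float) -> float:
    raw = os.environ.get(name, "").strip()
    return float(raw) if raw else default

MAX_MESSAGE_HISTORY = env_int("MAX_MESSAGE_HISTORY", 20)
LLM_TEMPERATURE = env_float("LLM_TEMPERATURE", 0.3)
GGUF_GPU_LAYERS = env_int("GGUF_GPU_LAYERS", -1)      # -1 = all layers on GPU
GGUF_CONTEXT_SIZE = env_int("GGUF_CONTEXT_SIZE", 4096)
```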
Recommended Setup
Use Ollama for easy setup. Download and install Ollama at https://ollama.com/download and run:
ollama serve
Recommended LLM
ollama pull llama3.2:3b-instruct-q8_0
RAG requires Ollama + bge-large: If bge-large has not been pulled from Ollama, RAG ingestion and semantic search will not work.
ollama pull bge-large
Image tools require Ollama + a vision model: If a vision model has not been pulled from Ollama, image tools will not work.
ollama pull qwen3-vl:8b-instruct
A minimal .env to get started with the core features:
# === Vision ===
OLLAMA_VISION_MODEL=qwen3-vl:8b-instruct
# === Disable unused servers ===
DISABLED_TOOLS=knowledge_base:*,todo:*,plex:*,image_tools:shashin_analyze,shashin_random,shashin_search
# === API Keys ===
OLLAMA_TOKEN=<token> # Free at https://ollama.com — required for web_search_tool
SERPER_API_KEY=<key> # Required for web_image_search_tool (https://serper.dev)
Configuration Details
LLM Backend:
- ollama: Uses the Ollama server (requires ollama serve running)
- gguf: Uses local GGUF model files (GPU recommended)
GGUF GPU Layers:
- -1: All layers on GPU (fastest; requires the model to fit in VRAM)
- 0: CPU only (slow, but works with any model size)
- 20: 20 layers on GPU (a balance for large models on limited VRAM)
Performance Tuning:
- EMBEDDING_BATCH_SIZE=50 + DB_FLUSH_BATCH_SIZE=50 gives ~6x faster RAG ingestion
- For 12GB VRAM, these can be increased to 100 for even faster processing
- CONCURRENT_LIMIT=2 enables parallel media ingestion
Disabled Tools:
- Format: category:tool_name or category:*
- Example: DISABLED_TOOLS=todo:delete_all_todo_items,system:*
- Disabled tools are hidden from the :tools list and return an error if called
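The category:tool_name / category:* matching rules above can be sketched in a few lines (the function names are illustrative, not the client's internals):

```python
# Sketch of DISABLED_TOOLS matching: exact "category:tool" entries and
# "category:*" wildcards that disable a whole category.
def parse_disabled(spec: str) -> set[str]:
    """Split a comma-separated DISABLED_TOOLS value into patterns."""
    return {p.strip() for p in spec.split(",") if p.strip()}

def is_disabled(category: str, tool: str, disabled: set[str]) -> bool:
    """True if the tool matches an exact or wildcard disable pattern."""
    return f"{category}:{tool}" in disabled or f"{category}:*" in disabled

disabled = parse_disabled("todo:delete_all_todo_items,system:*")
```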
Feature Requirements
Some features require additional setup before they will function. The table below summarizes what's needed:
| Feature | Required env vars | Additional setup |
|---|---|---|
| RAG ingestion & search | — | Ollama running + bge-large pulled |
| RAG reranking (optional) | — | bge-reranker-v2-m3 pulled — improves result ranking, falls back to cosine if absent |
| Plex media library | PLEX_URL, PLEX_TOKEN | Plex Media Server running |
| Plex ingestion & recommendations | PLEX_URL, PLEX_TOKEN | Ollama running + bge-large pulled |
| Ollama web search | OLLAMA_TOKEN | Ollama account + API key |
| A2A distributed mode | A2A_ENDPOINTS | Remote A2A server running |
Available Commands
These work in both CLI and web UI:
:commands - List all available commands
:clear sessions - Clear all chat history
:clear session <id> - Clear session
:sessions - List all sessions
:stop - Stop current operation
:stats - Show performance metrics
:tools - List available tools (hides disabled)
:tools --all - Show all tools including disabled
:tool <name> - Get tool description
:model - List all available models
:model <name> - Switch to a model (auto-detects backend)
:models - List models (legacy)
:gguf add <path> - Register a GGUF model
:gguf remove <alias> - Remove a GGUF model
:gguf list - List registered GGUF models
:a2a on - Enable agent-to-agent mode
:a2a off - Disable agent-to-agent mode
:a2a status - Check A2A system status
:health - Health overview of all servers and tools
:env - Show environment configuration
API Setup
Ollama Search API (web search):
- Sign up at https://ollama.com/
- Get API key from https://ollama.com/settings/keys
- Add to .env: OLLAMA_TOKEN=your_key
Plex Media Server:
- Open Plex web interface
- Settings → Network → Show Advanced
- Copy server URL (e.g.,
http://192.168.1.100:32400) - Get token: Settings → Account → Show XML → Copy
authToken - Add to
.env:
PLEX_URL=http://your_server_ip:32400
PLEX_TOKEN=your_token
⚠️ Without PLEX_URL and PLEX_TOKEN, all Plex tools (library browsing, ingestion, ML recommendations) will be unavailable. The server will load, but calls will return a configuration error.
4. Adding Tools (Developer Guide)
Step 1: Create Tool Server
mkdir servers/my_tool
touch servers/my_tool/server.py
Step 2: Implement Tool
# servers/my_tool/server.py
# FastMCP provides the @tool() decorator and handles the stdio transport.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("my_tool-server")

@mcp.tool()
def my_function(arg1: str, arg2: int) -> str:
    """
    Short description of what this tool does.

    Args:
        arg1: Description of arg1
        arg2: Description of arg2

    Returns:
        Description of return value
    """
    return f"Processed {arg1} with {arg2}"

if __name__ == "__main__":
    mcp.run()  # stdio transport by default
Step 3: Create Skill Documentation (Optional)
mkdir -p servers/my_tool/skills
touch servers/my_tool/skills/my_feature.md
Step 4: Update Intent Patterns (Optional)
If your tool needs specific routing, add an entry to INTENT_CATALOG in client/query_patterns.py:
{
"name": "my_tool",
"pattern": r'\bmy keyword\b|\bmy phrase\b',
"tools": ["my_function"],
"priority": 3,
"web_search": False,
"skills": False,
}
Step 5: Add External MCP Servers (Optional)
To connect external or third-party MCP servers, create client/external_servers.json.
The client auto-discovers this file on startup — no code changes needed.
SSE transport (remote HTTP event stream):
{
"external_servers": {
"deepwiki": {
"transport": "sse",
"url": "https://mcp.deepwiki.com/mcp",
"enabled": true
}
}
}
HTTP transport (streamable HTTP, e.g. authenticated APIs):
{
"external_servers": {
"neon": {
"transport": "http",
"url": "https://mcp.neon.tech/mcp",
"enabled": true,
"headers": { "Authorization": "Bearer <$TOKEN>" }
}
}
}
Header authentication uses the ES_{SERVER_NAME}_{PLACEHOLDER} convention in .env:
# Server "mcpserver" with <$TOKEN> → ES_MCPSERVER_TOKEN
# Server "mcpserver" with <$API_KEY> → ES_MCPSERVER_API_KEY
ES_MCPSERVER_TOKEN=your_token_here
ES_MCPSERVER_API_KEY=your_api_key_here
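The substitution step can be sketched like so (a hypothetical helper, not the client's actual resolver): each <$PLACEHOLDER> in a header value is looked up as ES_{SERVER_NAME}_{PLACEHOLDER} in the environment.

```python
# Sketch: resolve <$PLACEHOLDER> header secrets from ES_* env vars.
import os
import re

def resolve_headers(server: str, headers: dict[str, str]) -> dict[str, str]:
    """Replace <$NAME> with the value of ES_{SERVER}_{NAME}, if set."""
    def sub(match: re.Match) -> str:
        env_name = f"ES_{server.upper()}_{match.group(1)}"
        return os.environ.get(env_name, match.group(0))  # leave unresolved as-is
    return {k: re.sub(r"<\$([A-Z0-9_]+)>", sub, v) for k, v in headers.items()}
```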
Stdio transport (local process servers):
{
"external_servers": {
"pycharm": {
"transport": "stdio",
"command": "/usr/lib/jvm/jdk-17/bin/java",
"args": ["-classpath", "/path/to/mcpserver.jar", "com.intellij.mcpserver.stdio.McpStdioRunnerKt"],
"env": { "IJ_MCP_SERVER_PORT": "64342" },
"enabled": true
}
}
}
Field reference:
| Field | Required | Description |
|---|---|---|
| transport | ✅ | "sse", "http", or "stdio" |
| url | SSE/HTTP only | Full URL to the endpoint |
| headers | No | Request headers — use <$PLACEHOLDER> for secrets |
| command | stdio only | Path to the executable |
| args | stdio only | Command-line arguments |
| env | No | Environment variables passed to the process |
| cwd | No | Working directory (defaults to project root) |
| enabled | No | false skips without removing (default: true) |
| notes | No | Human-readable description, ignored by the client |
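A config entry can be checked against these rules before startup. A hypothetical validator sketch (not the client's actual loader):

```python
# Sketch: validate one external_servers.json entry per the field reference.
def validate_entry(name: str, entry: dict) -> list[str]:
    """Return a list of error messages; empty means the entry is valid."""
    errors = []
    transport = entry.get("transport")
    if transport not in ("sse", "http", "stdio"):
        errors.append(f"{name}: transport must be sse, http, or stdio")
    if transport in ("sse", "http") and "url" not in entry:
        errors.append(f"{name}: url is required for {transport} transport")
    if transport == "stdio" and "command" not in entry:
        errors.append(f"{name}: command is required for stdio transport")
    return errors
```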
WSL2 note: For stdio servers bridging to Windows, set IJ_MCP_SERVER_HOST in env to the Windows host IP (find it with: grep nameserver /etc/resolv.conf).
Step 6: Test & Deploy
python client.py # restart to auto-discover new server
5. Distributed Mode (A2A Protocol)
Run tools on remote servers and expose them via HTTP.
Setup A2A Server
# Terminal 1
python a2a_server.py # starts on http://localhost:8010
# Terminal 2
python client.py # auto-connects to A2A endpoints in .env
Control Exposed Tools
# Expose specific categories (comma-separated)
A2A_EXPOSED_TOOLS=plex,location,text_tools
# Expose everything (default)
A2A_EXPOSED_TOOLS=
Security: Exclude todo, knowledge_base, rag to protect personal data.
Multi-Endpoint Support
A2A_ENDPOINTS=http://localhost:8010,http://gpu-server:8020
6. Testing
Running Tests
pytest # all tests
pytest -m unit # fast unit tests only
pytest -m integration # integration tests
pytest -m e2e # end-to-end tests
pytest -c tests/pytest.coverage.ini # with coverage
Test Structure
tests/
├── conftest.py
├── pytest.ini
├── unit/
│ ├── test_session_manager.py
│ ├── test_models.py
│ ├── test_context_tracker.py
│ ├── test_intent_patterns.py
│ └── test_code_review_tools.py
├── integration/
│ ├── test_websocket_flow.py
│ └── test_langgraph_agent.py
├── e2e/
│ └── test_full_conversation.py
└── results/
├── junit.xml
├── coverage.xml
├── test-report.html
└── coverage-report.html
CI/CD Integration
GitHub Actions:
- name: Run tests
run: pytest
- name: Upload coverage
uses: codecov/codecov-action@v3
with:
files: tests/results/coverage.xml
7. Architecture
Multi-Server Design
servers/
├── code_review/ 5 tools - Code analysis
├── knowledge_base/ 10 tools - Notes management
├── location/ 3 tools - Weather, time, location
├── plex/ 17 tools - Media + ML recommendations [requires PLEX_URL + PLEX_TOKEN]
├── rag/ 4 tools - Vector search [requires Ollama + bge-large]
├── system_tools/ 4 tools - System info
├── text_tools/ 7 tools - Text processing
└── todo/ 6 tools - Task management
Total: 56 local tools
Directory Structure
mcp_a2a/
├── servers/
├── a2a_server.py
├── client.py
├── client/
│ ├── ui/
│ │ ├── index.html
│ │ └── dashboard.html
│ ├── langgraph.py
│ ├── query_patterns.py ← Intent routing catalog
│ ├── search_client.py ← Ollama web search & fetch
│ ├── websocket.py
│ └── ...
└── tools/
8. Intent Patterns & Troubleshooting
Intent Patterns
The client uses a pattern catalog (client/query_patterns.py) to route queries to the right tools without sending all 75+ tools to the LLM on every message. Each intent has a priority — lower number wins when multiple patterns match.
Priority 1 — specific, high-confidence routing
| Intent | Example prompts | Tools | Key params |
|---|---|---|---|
| analyze_image | "Analyze this image: https://…/photo.jpg", "Describe /home/mike/img.png" | analyze_image_tool | image_url or image_file_path |
| shashin_random | "Show me a random photo", "Surprise me with a picture" | shashin_random_tool | none |
| web_image_search | "Show me a picture of Jorma Tommila", "What does the Eiffel Tower look like?" | web_image_search_tool | query: str |
web_image_search excludes queries containing "my" or "shashin" — those fall through to shashin_search instead.
Priority 2 — contextual, content-aware routing
| Intent | Example prompts | Tools | Key params |
|---|---|---|---|
| github_review | "Review github.com/user/repo", "Analyze this GitHub project" | github_clone_repo, analyze_project, review_code | repo_url: str |
| file_analyst | "Analyze /home/mike/budget.csv", "Open C:\Users\data.json" | read_file_tool_handler | file_path: str |
| code_assistant | "What's the tech stack?", "Fix the bug in auth.py", "List npm dependencies" | analyze_project, get_project_dependencies, fix_code_file | project_path or file_path |
| shashin_analyze | "Describe the beach pictures", "What does the sunset photo show?" | shashin_search_tool, shashin_analyze_tool | term: str → image_id: uuid |
| shashin_search | "Find photos of Noah", "Show my photos from Japan" | shashin_search_tool | term: str, page: int (default 0) |
| plex_search | "Find movies about time travel", "Scene where the hero escapes the prison" | rag_search_tool, semantic_media_search_text, scene_locator_tool | query: str |
| rag | "What do you know about quantum computing?", "How many items are in RAG?" | rag_search_tool, rag_status_tool, rag_list_sources_tool | query: str |
| trilium | "Search my notes for project ideas", "Create a note about today's meeting" | search_notes, create_note, update_note_content | query / title / content / note_id |
Priority 3 — utility, lower specificity
| Intent | Example prompts | Tools | Key params |
|---|---|---|---|
| weather | "What's the weather in Vancouver?", "Will it rain today?" | get_location_tool, get_weather_tool | location auto-resolved |
| location | "Where am I?", "What's my current location?" | get_location_tool | none |
| time | "What time is it?", "What's today's date?" | get_time_tool | none |
| system | "What's my GPU utilization?", "Show running processes" | get_hardware_specs_tool, list_system_processes | none |
| ml_recommendation | "What should I watch tonight?", "Train the recommender model" | recommend_content, train_recommender, record_viewing | title / count / media_type |
| code | "Debug this code", "Review and summarize this file" | review_code, summarize_code_file, debug_fix | file_path or inline code |
| text | "Summarize this", "Explain this concept simply" | summarize_text_tool, explain_simplified_tool | text: str |
| todo | "Add 'deploy feature' to my todos", "List my tasks" | add_todo_item, list_todo_items, update_todo_item | text / item_id / status |
| knowledge | "Remember that Mike prefers dark mode", "Search my notes for API keys" | add_entry, search_entries, search_semantic | content / query / tag |
| ingest | "Ingest 5 items from Plex", "Process subtitles now" | ingest_movies, ingest_batch_tool | limit: int |
| a2a | "Discover remote agents", "Send this task to the remote agent" | send_a2a*, discover_a2a | agent_url / task |
| current_events | "What's the latest news?", "What's happening in the world?" | web search only | query passed to search |
| stock_price | "What's NVIDIA trading at?", "Apple market cap today" | web search only | query passed to search |
Conversational bypass
Queries that start with personal statements ("I like…", "My favourite…"), filler words ("yes", "thanks"), creative tasks ("write me a poem"), or pronoun follow-ups ("what did he do?", "tell me more about them") bypass the catalog entirely — no tools are bound and the LLM answers from context.
Overriding intent routing
Prefix your message with Using <tool_name>, to bypass pattern matching entirely and force a specific tool:
Using shashin_search_tool, find photos of Noah
Using web_image_search_tool, show me a picture of a red panda
Troubleshooting
Ollama models not appearing:
ollama serve # make sure the Ollama server is running
ollama list # confirm the model appears here
python client.py # restart the client to re-detect models
RAG not working / embedding errors:
- Ensure Ollama is running: ollama serve
- Confirm bge-large is available: ollama list
- If missing, pull it: ollama pull bge-large
- RAG requires Ollama for embeddings regardless of which LLM backend (Ollama or GGUF) you use for chat
Plex tools returning errors:
- Confirm PLEX_URL and PLEX_TOKEN are set in .env
- Verify the Plex server is reachable: curl "$PLEX_URL/identity?X-Plex-Token=$PLEX_TOKEN"
- See API Setup for how to locate your token
GGUF model won't load:
- Check model size vs VRAM (use models <7GB for 12GB VRAM)
- Reduce GPU layers: export GGUF_GPU_LAYERS=20
- CPU only: export GGUF_GPU_LAYERS=0
Web UI won't load:
netstat -an | grep LISTEN # check ports 8765, 8766, 9000
A2A server not connecting:
curl http://localhost:8010/.well-known/agent-card.json
Ollama Search not working:
- Verify OLLAMA_TOKEN in .env
- Get the API key at https://ollama.com/settings/keys
- System falls back to LLM knowledge if unavailable
RAG search returns wrong results:
- RAG uses semantic similarity — returns closest matches even if not exact
- Check what's in the database by asking: show rag stats
- Content is only stored after researching URLs or manually adding via rag_add_tool
RAG ingestion is slow:
- Normal: ~2.5s for 16 chunks (10,000 characters)
- If slower, check that Ollama is running: ollama list
Conversation history not working:
- Smaller models (≤7B) often refuse to answer questions about conversation history
- Switch to a larger model: :model qwen2.5:14b-instruct-q4_K_M
- Models with good instruction following: qwen2.5:14b (80-95%), llama3.1:8b (~70%), mistral-nemo (~70%)
- Avoid for this use case: qwen2.5:3b, qwen2.5:7b (~10-30%)
Query not routing to the right tool:
- Intent patterns are matched in priority order (1 → 3)
- Use explicit phrasing: "Using shashin_search_tool, find photos of Noah" bypasses pattern matching entirely
- Check active intents in client/query_patterns.py
Tools not appearing:
:tools --all # check if disabled
# check DISABLED_TOOLS in .env
python client.py # restart
License
MIT License