AgentHub

Self-hosted MCP platform: a local MCP runtime with multi-agent orchestration, distributed tool servers, a web UI, and ML-powered media recommendations.

⚠️ Experimental — intended for personal and experimental use only, not for production deployment.


Prerequisites

  • Python 3.12+
  • 16GB+ RAM recommended
  • One of:
    • Ollama installed OR
    • GGUF file

1. Quick Start

Get the client running in 3 steps:

Install Dependencies

Clone the repository, then:

cd mcp-platform

# Create virtual environment
python -m venv .venv

# Activate (Linux/macOS)
source .venv/bin/activate

# Activate (Windows PowerShell)
.venv\Scripts\Activate.ps1

# Install requirements - this will take a while
pip install -r requirements.txt

Choose LLM Backend

Option A: Ollama (recommended starting point)

# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Start Ollama server
ollama serve

# Download a model (use 14B+ for best results)
ollama pull qwen2.5:14b-instruct-q4_K_M

Option B: GGUF (local model files)

# Download a GGUF model (example)
wget https://huggingface.co/TheRains/Qwen2.5-14B-Instruct-Q4_K_M-GGUF/resolve/main/qwen2.5-14b-instruct-q4_k_m.gguf

# Register the model
# (After starting the client, run `:gguf add` with the path to the downloaded file)

Start the Client

python client.py

Access web UI at: http://localhost:9000

That's it! The client auto-discovers all MCP servers and tools.


2. Using MCP Servers with Other Clients

Use these MCP servers with Claude Desktop, Cline, or any MCP-compatible client.

Example Configuration

Add to your MCP client config (e.g., claude_desktop_config.json):

{
    "mcpServers": {
        "code_review": {
            "command": "/path/to/mcp_a2a/.venv/bin/python",
            "args": ["/path/to/mcp_a2a/servers/code_review/server.py"]
        },
        "location": {
            "command": "/path/to/mcp_a2a/.venv/bin/python",
            "args": ["/path/to/mcp_a2a/servers/location/server.py"]
        },
        "plex": {
            "command": "/path/to/mcp_a2a/.venv/bin/python",
            "args": ["/path/to/mcp_a2a/servers/plex/server.py"]
        },
        "rag": {
            "command": "/path/to/mcp_a2a/.venv/bin/python",
            "args": ["/path/to/mcp_a2a/servers/rag/server.py"]
        },
        "system_tools": {
            "command": "/path/to/mcp_a2a/.venv/bin/python",
            "args": ["/path/to/mcp_a2a/servers/system_tools/server.py"]
        },
        "text_tools": {
            "command": "/path/to/mcp_a2a/.venv/bin/python",
            "args": ["/path/to/mcp_a2a/servers/text_tools/server.py"]
        },
        "todo": {
            "command": "/path/to/mcp_a2a/.venv/bin/python",
            "args": ["/path/to/mcp_a2a/servers/todo/server.py"]
        },
        "knowledge_base": {
            "command": "/path/to/mcp_a2a/.venv/bin/python",
            "args": ["/path/to/mcp_a2a/servers/knowledge_base/server.py"]
        }
    }
}

Windows paths:

"command": "C:\\path\\to\\mcp_a2a\\.venv\\Scripts\\python.exe"

Available servers:

  • code_review - Code analysis (5 tools)
  • location - Weather, time, location (3 tools)
  • plex - Media library + ML recommendations (17 tools) ⚠️ Requires Plex env vars
  • rag - Vector search (4 tools) ⚠️ Requires Ollama + bge-large
  • system_tools - System info (4 tools)
  • text_tools - Text processing (7 tools)
  • todo - Task management (6 tools)
  • knowledge_base - Notes management (10 tools)

3. Client Configuration

Environment Variables

Create .env in project root:

# === LLM Backend ===
OLLAMA_VISION_MODEL=qwen3-vl:8b-instruct
MAX_MESSAGE_HISTORY=30          # Chat history limit (default: 20)
LLM_TEMPERATURE=0.3             # Model temperature 0 to 1 (default: 0.3)

# === GGUF Configuration (if using GGUF backend) ===
GGUF_GPU_LAYERS=-1              # -1 = all GPU, 0 = CPU only, N = N layers on GPU
GGUF_CONTEXT_SIZE=4096          # Context window size
GGUF_BATCH_SIZE=512             # Batch size for processing

# === API Keys (optional services) ===
PLEX_URL=http://localhost:32400  # Plex server URL
PLEX_TOKEN=your_token_here       # Get from Plex account settings
TRILIUM_URL=http://localhost:8888
TRILIUM_TOKEN=your_token_here
SHASHIN_BASE_URL=http://localhost:6624/
SHASHIN_API_KEY=your_key_here
SERPER_API_KEY=your_key_here     # Serper image search (https://serper.dev/api-keys)
OLLAMA_TOKEN=your_token_here     # Ollama API key (https://ollama.com/settings/keys)

# === A2A Protocol (optional distributed mode) ===
A2A_ENDPOINTS=http://localhost:8010  # Comma-separated endpoints
A2A_EXPOSED_TOOLS=                   # Tool categories to expose (empty = all)

# === Performance Tuning (optional) ===
CONCURRENT_LIMIT=3              # Parallel ingestion jobs (default: 1)
EMBEDDING_BATCH_SIZE=50         # Embeddings per batch (default: 20)
DB_FLUSH_BATCH_SIZE=50          # DB inserts per batch (default: 30)

# === Tool Control (optional) ===
DISABLED_TOOLS=knowledge_base:*,todo:*  # Disable specific tools/categories

Recommended Setup

Use Ollama for easy setup. Download and install Ollama at https://ollama.com/download and run:

ollama serve

Recommended LLM

ollama pull llama3.2:3b-instruct-q8_0

RAG requires Ollama + bge-large: if bge-large has not been pulled from Ollama, RAG ingestion and semantic search will not work.

ollama pull bge-large

Image tools require Ollama + a vision model: if no vision model has been pulled from Ollama, image tools will not work.

ollama pull qwen3-vl:8b-instruct

A minimal .env to get started with the core features:

# === Vision ===
OLLAMA_VISION_MODEL=qwen3-vl:8b-instruct

# === Disable unused servers ===
DISABLED_TOOLS=knowledge_base:*,todo:*,plex:*,image_tools:shashin_analyze,shashin_random,shashin_search

# === API Keys ===
OLLAMA_TOKEN=<token>      # Free at https://ollama.com — required for web_search_tool
SERPER_API_KEY=<key>      # Required for web_image_search_tool (https://serper.dev)

Configuration Details

LLM Backend:

  • ollama: Uses Ollama server (requires ollama serve running)
  • gguf: Uses local GGUF model files (GPU recommended)

GGUF GPU Layers:

  • -1: Use all GPU (fastest; requires the model to fit in VRAM)
  • 0: CPU only (slow but works with any model size)
  • 20: Use 20 layers on GPU (balance for large models on limited VRAM)
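As an illustrative sketch (not the client's actual loader), the GGUF_* variables map naturally onto llama-cpp-python's `n_gpu_layers`, `n_ctx`, and `n_batch` arguments; the helper name below is hypothetical:

```python
import os

def gguf_load_kwargs() -> dict:
    """Translate the GGUF_* env vars into llama-cpp-python keyword
    arguments. Hypothetical helper for illustration; the defaults
    mirror the .env example above."""
    return {
        "n_gpu_layers": int(os.getenv("GGUF_GPU_LAYERS", "-1")),  # -1 = offload all layers
        "n_ctx": int(os.getenv("GGUF_CONTEXT_SIZE", "4096")),     # context window size
        "n_batch": int(os.getenv("GGUF_BATCH_SIZE", "512")),      # prompt batch size
    }

# Usage (sketch): Llama(model_path="model.gguf", **gguf_load_kwargs())
```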

Performance Tuning:

  • EMBEDDING_BATCH_SIZE=50 + DB_FLUSH_BATCH_SIZE=50 = ~6x faster RAG ingestion
  • With 12GB VRAM, these can be increased to 100 for even faster processing
  • CONCURRENT_LIMIT=2 enables parallel media ingestion
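The batch-size knobs simply control how many items are grouped per embedding call or database flush; a minimal sketch of the batching pattern (hypothetical helper, not the client's actual code):

```python
def batched(items, size):
    """Yield consecutive slices of at most `size` items, the way
    EMBEDDING_BATCH_SIZE groups embedding requests and
    DB_FLUSH_BATCH_SIZE groups database inserts."""
    for i in range(0, len(items), size):
        yield items[i:i + size]
```

For example, 120 chunks with a batch size of 50 yields batches of 50, 50, and 20.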

Disabled Tools:

  • Format: category:tool_name or category:*
  • Example: DISABLED_TOOLS=todo:delete_all_todo_items,system:*
  • Hidden from :tools list, return error if called
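A sketch of how such a spec can be matched (hypothetical function; the client's actual parser may differ):

```python
def is_disabled(spec: str, category: str, tool: str) -> bool:
    """Return True if category:tool is disabled by a DISABLED_TOOLS
    spec such as 'todo:delete_all_todo_items,system:*'."""
    for entry in (e.strip() for e in spec.split(",") if e.strip()):
        cat, _, name = entry.partition(":")
        # '*' disables the whole category; otherwise match the exact tool
        if cat == category and name in ("*", tool):
            return True
    return False
```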

Feature Requirements

Some features require additional setup before they will function. The table below summarizes what's needed:

| Feature | Required env vars | Additional setup |
|---|---|---|
| RAG ingestion & search | (none) | Ollama running + bge-large pulled |
| RAG reranking (optional) | (none) | bge-reranker-v2-m3 pulled; improves result ranking, falls back to cosine if absent |
| Plex media library | PLEX_URL, PLEX_TOKEN | Plex Media Server running |
| Plex ingestion & recommendations | PLEX_URL, PLEX_TOKEN | Ollama running + bge-large pulled |
| Ollama web search | OLLAMA_TOKEN | Ollama account + API key |
| A2A distributed mode | A2A_ENDPOINTS | Remote A2A server running |

Available Commands

These work in both CLI and web UI:

:commands              - List all available commands
:clear sessions        - Clear all chat history
:clear session <id>    - Clear session
:sessions              - List all sessions
:stop                  - Stop current operation
:stats                 - Show performance metrics
:tools                 - List available tools (hides disabled)
:tools --all           - Show all tools including disabled
:tool <name>           - Get tool description
:model                 - List all available models
:model <name>          - Switch to a model (auto-detects backend)
:models                - List models (legacy)
:gguf add <path>       - Register a GGUF model
:gguf remove <alias>   - Remove a GGUF model
:gguf list             - List registered GGUF models
:a2a on                - Enable agent-to-agent mode
:a2a off               - Disable agent-to-agent mode
:a2a status            - Check A2A system status
:health                - Health overview of all servers and tools
:env                   - Show environment configuration

API Setup

Ollama Search API (web search):

  1. Sign up at https://ollama.com/
  2. Get API key from https://ollama.com/settings/keys
  3. Add to .env: OLLAMA_TOKEN=your_key

Plex Media Server:

  1. Open Plex web interface
  2. Settings → Network → Show Advanced
  3. Copy server URL (e.g., http://192.168.1.100:32400)
  4. Get token: Settings → Account → Show XML → Copy authToken
  5. Add to .env:
   PLEX_URL=http://your_server_ip:32400
   PLEX_TOKEN=your_token

⚠️ Without PLEX_URL and PLEX_TOKEN, all Plex tools (library browsing, ingestion, ML recommendations) will be unavailable. The server will load but calls will return a configuration error.


4. Adding Tools (Developer Guide)

Step 1: Create Tool Server

mkdir servers/my_tool
touch servers/my_tool/server.py

Step 2: Implement Tool

# servers/my_tool/server.py
from mcp.server.fastmcp import FastMCP

# FastMCP provides the @tool() decorator; the low-level
# mcp.server.Server API does not.
mcp = FastMCP("my_tool-server")

@mcp.tool()
def my_function(arg1: str, arg2: int) -> str:
    """
    Short description of what this tool does.

    Args:
        arg1: Description of arg1
        arg2: Description of arg2

    Returns:
        Description of return value
    """
    return f"Processed {arg1} with {arg2}"

if __name__ == "__main__":
    mcp.run()  # stdio transport by default

Step 3: Create Skill Documentation (Optional)

mkdir -p servers/my_tool/skills
touch servers/my_tool/skills/my_feature.md

Step 4: Update Intent Patterns (Optional)

If your tool needs specific routing, add an entry to INTENT_CATALOG in client/query_patterns.py:

{
    "name": "my_tool",
    "pattern": r'\bmy keyword\b|\bmy phrase\b',
    "tools": ["my_function"],
    "priority": 3,
    "web_search": False,
    "skills": False,
}
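Routing itself can be pictured as a lowest-priority-wins scan over the catalog. The sketch below is a simplification of what client/query_patterns.py drives; the catalog entries here are made up for illustration:

```python
import re

# Abbreviated stand-in for INTENT_CATALOG (entries are illustrative)
CATALOG = [
    {"name": "analyze_image", "pattern": r"\banalyze this image\b", "priority": 1, "tools": ["analyze_image_tool"]},
    {"name": "weather", "pattern": r"\bweather\b|\brain\b", "priority": 3, "tools": ["get_weather_tool"]},
    {"name": "my_tool", "pattern": r"\bmy keyword\b", "priority": 3, "tools": ["my_function"]},
]

def route(query: str):
    """Return the matching intent with the lowest priority number, or None."""
    matches = [i for i in CATALOG if re.search(i["pattern"], query, re.IGNORECASE)]
    return min(matches, key=lambda i: i["priority"]) if matches else None
```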

Step 5: Add External MCP Servers (Optional)

To connect external or third-party MCP servers, create client/external_servers.json. The client auto-discovers this file on startup — no code changes needed.

SSE transport (remote HTTP event stream):

{
    "external_servers": {
        "deepwiki": {
            "transport": "sse",
            "url": "https://mcp.deepwiki.com/mcp",
            "enabled": true
        }
    }
}

HTTP transport (streamable HTTP, e.g. authenticated APIs):

{
    "external_servers": {
        "neon": {
            "transport": "http",
            "url": "https://mcp.neon.tech/mcp",
            "enabled": true,
            "headers": { "Authorization": "Bearer <$TOKEN>" }
        }
    }
}

Header authentication uses the ES_{SERVER_NAME}_{PLACEHOLDER} convention in .env:

# Server "mcpserver" with <$TOKEN>   → ES_MCPSERVER_TOKEN
# Server "mcpserver" with <$API_KEY> → ES_MCPSERVER_API_KEY
ES_MCPSERVER_TOKEN=your_token_here
ES_MCPSERVER_API_KEY=your_api_key_here
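Conceptually, resolution substitutes each <$PLACEHOLDER> token with the matching ES_{SERVER_NAME}_{PLACEHOLDER} env value; a sketch of that substitution (the client's implementation may differ):

```python
import os
import re

def resolve_headers(server: str, headers: dict) -> dict:
    """Replace <$NAME> tokens in header values with ES_{SERVER}_{NAME}
    env values, leaving unresolved tokens intact."""
    def fill(value: str) -> str:
        return re.sub(
            r"<\$([A-Z0-9_]+)>",
            lambda m: os.environ.get(f"ES_{server.upper()}_{m.group(1)}", m.group(0)),
            value,
        )
    return {key: fill(val) for key, val in headers.items()}
```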

Stdio transport (local process servers):

{
    "external_servers": {
        "pycharm": {
            "transport": "stdio",
            "command": "/usr/lib/jvm/jdk-17/bin/java",
            "args": ["-classpath", "/path/to/mcpserver.jar", "com.intellij.mcpserver.stdio.McpStdioRunnerKt"],
            "env": { "IJ_MCP_SERVER_PORT": "64342" },
            "enabled": true
        }
    }
}

Field reference:

| Field | Required | Description |
|---|---|---|
| transport | Yes | "sse", "http", or "stdio" |
| url | SSE/HTTP only | Full URL to the endpoint |
| headers | No | Request headers; use <$PLACEHOLDER> for secrets |
| command | stdio only | Path to the executable |
| args | stdio only | Command-line arguments |
| env | No | Environment variables passed to the process |
| cwd | No | Working directory (defaults to project root) |
| enabled | No | false skips without removing (default: true) |
| notes | No | Human-readable description, ignored by client |

WSL2 note: For stdio servers bridging to Windows, set IJ_MCP_SERVER_HOST in env to the Windows host IP (cat /etc/resolv.conf | grep nameserver).

Step 6: Test & Deploy

python client.py   # restart to auto-discover new server

5. Distributed Mode (A2A Protocol)

Run tools on remote servers and expose them via HTTP.

Setup A2A Server

# Terminal 1
python a2a_server.py        # starts on http://localhost:8010

# Terminal 2
python client.py            # auto-connects to A2A endpoints in .env

Control Exposed Tools

# Expose specific categories (comma-separated)
A2A_EXPOSED_TOOLS=plex,location,text_tools

# Expose everything (default)
A2A_EXPOSED_TOOLS=

Security: Exclude todo, knowledge_base, rag to protect personal data.
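The filtering amounts to an allow-list check where an empty value means allow-all; a hypothetical sketch:

```python
import os

def exposed_categories(all_categories):
    """Return the tool categories to expose over A2A.
    An empty A2A_EXPOSED_TOOLS means expose everything."""
    raw = os.getenv("A2A_EXPOSED_TOOLS", "").strip()
    if not raw:
        return list(all_categories)
    allowed = {c.strip() for c in raw.split(",")}
    return [c for c in all_categories if c in allowed]
```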

Multi-Endpoint Support

A2A_ENDPOINTS=http://localhost:8010,http://gpu-server:8020

6. Testing

Running Tests

pytest                              # all tests
pytest -m unit                      # fast unit tests only
pytest -m integration               # integration tests
pytest -m e2e                       # end-to-end tests
pytest -c tests/pytest.coverage.ini # with coverage

Test Structure

tests/
├── conftest.py
├── pytest.ini
├── unit/
│   ├── test_session_manager.py
│   ├── test_models.py
│   ├── test_context_tracker.py
│   ├── test_intent_patterns.py
│   └── test_code_review_tools.py
├── integration/
│   ├── test_websocket_flow.py
│   └── test_langgraph_agent.py
├── e2e/
│   └── test_full_conversation.py
└── results/
    ├── junit.xml
    ├── coverage.xml
    ├── test-report.html
    └── coverage-report.html

CI/CD Integration

GitHub Actions:

- name: Run tests
  run: pytest
- name: Upload coverage
  uses: codecov/codecov-action@v3
  with:
    files: tests/results/coverage.xml

7. Architecture

Multi-Server Design

servers/
├── code_review/       5 tools  - Code analysis
├── knowledge_base/   10 tools  - Notes management
├── location/          3 tools  - Weather, time, location
├── plex/             17 tools  - Media + ML recommendations  [requires PLEX_URL + PLEX_TOKEN]
├── rag/               4 tools  - Vector search               [requires Ollama + bge-large]
├── system_tools/      4 tools  - System info
├── text_tools/        7 tools  - Text processing
└── todo/              6 tools  - Task management

Total: 56 local tools

Directory Structure

mcp_a2a/
├── servers/
├── a2a_server.py
├── client.py
├── client/
│   ├── ui/
│   │   ├── index.html
│   │   └── dashboard.html
│   ├── langgraph.py
│   ├── query_patterns.py  ← Intent routing catalog
│   ├── search_client.py   ← Ollama web search & fetch
│   ├── websocket.py
│   └── ...
└── tools/

8. Intent Patterns & Troubleshooting

Intent Patterns

The client uses a pattern catalog (client/query_patterns.py) to route queries to the right tools without sending all 75+ tools to the LLM on every message. Each intent has a priority — lower number wins when multiple patterns match.

Priority 1 — specific, high-confidence routing

| Intent | Example prompts | Tools | Key params |
|---|---|---|---|
| analyze_image | "Analyze this image: https://…/photo.jpg", "Describe /home/mike/img.png" | analyze_image_tool | image_url or image_file_path |
| shashin_random | "Show me a random photo", "Surprise me with a picture" | shashin_random_tool | none |
| web_image_search | "Show me a picture of Jorma Tommila", "What does the Eiffel Tower look like?" | web_image_search_tool | query: str |
web_image_search excludes queries containing "my" or "shashin" — those fall through to shashin_search instead.

Priority 2 — contextual, content-aware routing

| Intent | Example prompts | Tools | Key params |
|---|---|---|---|
| github_review | "Review github.com/user/repo", "Analyze this GitHub project" | github_clone_repo, analyze_project, review_code | repo_url: str |
| file_analyst | "Analyze /home/mike/budget.csv", "Open C:\Users\data.json" | read_file_tool_handler | file_path: str |
| code_assistant | "What's the tech stack?", "Fix the bug in auth.py", "List npm dependencies" | analyze_project, get_project_dependencies, fix_code_file | project_path or file_path |
| shashin_analyze | "Describe the beach pictures", "What does the sunset photo show?" | shashin_search_tool, shashin_analyze_tool | term: str, image_id: uuid |
| shashin_search | "Find photos of Noah", "Show my photos from Japan" | shashin_search_tool | term: str, page: int (default 0) |
| plex_search | "Find movies about time travel", "Scene where the hero escapes the prison" | rag_search_tool, semantic_media_search_text, scene_locator_tool | query: str |
| rag | "What do you know about quantum computing?", "How many items are in RAG?" | rag_search_tool, rag_status_tool, rag_list_sources_tool | query: str |
| trilium | "Search my notes for project ideas", "Create a note about today's meeting" | search_notes, create_note, update_note_content | query / title / content / note_id |

Priority 3 — utility, lower specificity

| Intent | Example prompts | Tools | Key params |
|---|---|---|---|
| weather | "What's the weather in Vancouver?", "Will it rain today?" | get_location_tool, get_weather_tool | location auto-resolved |
| location | "Where am I?", "What's my current location?" | get_location_tool | none |
| time | "What time is it?", "What's today's date?" | get_time_tool | none |
| system | "What's my GPU utilization?", "Show running processes" | get_hardware_specs_tool, list_system_processes | none |
| ml_recommendation | "What should I watch tonight?", "Train the recommender model" | recommend_content, train_recommender, record_viewing | title / count / media_type |
| code | "Debug this code", "Review and summarize this file" | review_code, summarize_code_file, debug_fix | file_path or inline code |
| text | "Summarize this", "Explain this concept simply" | summarize_text_tool, explain_simplified_tool | text: str |
| todo | "Add 'deploy feature' to my todos", "List my tasks" | add_todo_item, list_todo_items, update_todo_item | text / item_id / status |
| knowledge | "Remember that Mike prefers dark mode", "Search my notes for API keys" | add_entry, search_entries, search_semantic | content / query / tag |
| ingest | "Ingest 5 items from Plex", "Process subtitles now" | ingest_movies, ingest_batch_tool | limit: int |
| a2a | "Discover remote agents", "Send this task to the remote agent" | send_a2a*, discover_a2a | agent_url / task |
| current_events | "What's the latest news?", "What's happening in the world?" | web search only | query passed to search |
| stock_price | "What's NVIDIA trading at?", "Apple market cap today" | web search only | query passed to search |

Conversational bypass

Queries that start with personal statements ("I like…", "My favourite…"), filler words ("yes", "thanks"), creative tasks ("write me a poem"), or pronoun follow-ups ("what did he do?", "tell me more about them") bypass the catalog entirely — no tools are bound and the LLM answers from context.
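An illustrative subset of the bypass triggers can be sketched as a single anchored regex; the real heuristics live in the client and are more extensive:

```python
import re

# Illustrative subset of bypass triggers, not the client's actual list
BYPASS = re.compile(
    r"^\s*(i like|my favourite|yes\b|thanks\b|write me|"
    r"what did (he|she|they)\b|tell me more)",
    re.IGNORECASE,
)

def is_conversational(query: str) -> bool:
    """True if the query should skip the intent catalog entirely."""
    return bool(BYPASS.match(query))
```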

Overriding intent routing

Prefix your message with Using <tool_name>, to bypass pattern matching entirely and force a specific tool:

Using shashin_search_tool, find photos of Noah
Using web_image_search_tool, show me a picture of a red panda

Troubleshooting

Ollama models not appearing:

ollama serve
ollama list
python client.py

RAG not working / embedding errors:

  • Ensure Ollama is running: ollama serve
  • Confirm bge-large is available: ollama list
  • If missing, pull it: ollama pull bge-large
  • RAG requires Ollama for embeddings regardless of which LLM backend (Ollama or GGUF) you use for chat

Plex tools returning errors:

  • Confirm PLEX_URL and PLEX_TOKEN are set in .env
  • Verify the Plex server is reachable: curl "$PLEX_URL/identity?X-Plex-Token=$PLEX_TOKEN"
  • See API Setup for how to locate your token

GGUF model won't load:

  • Check model size vs VRAM (use models <7GB for 12GB VRAM)
  • Reduce GPU layers: export GGUF_GPU_LAYERS=20
  • CPU only: export GGUF_GPU_LAYERS=0

Web UI won't load:

netstat -an | grep LISTEN   # check ports 8765, 8766, 9000

A2A server not connecting:

curl http://localhost:8010/.well-known/agent-card.json

Ollama Search not working:

  • Confirm OLLAMA_TOKEN is set in .env (see API Setup)
  • Verify the key at https://ollama.com/settings/keys

RAG search returns wrong results:

  • RAG uses semantic similarity — returns closest matches even if not exact
  • Check what's in the database: > show rag stats
  • Content is only stored after researching URLs or manually adding via rag_add_tool

RAG ingestion is slow:

  • Normal: ~2.5s for 16 chunks (10,000 characters)
  • If slower, check Ollama is running: ollama list

Conversation history not working:

  • Smaller models (≤7B) often refuse to answer questions about conversation history
  • Switch to a larger model: :model qwen2.5:14b-instruct-q4_K_M
  • Models with good instruction following: qwen2.5:14b (80-95%), llama3.1:8b (~70%), mistral-nemo (~70%)
  • Avoid for this use case: qwen2.5:3b, qwen2.5:7b (~10-30%)

Query not routing to the right tool:

  • Intent patterns are matched in priority order (1 → 3)
  • Use explicit phrasing: "Using shashin_search_tool, find photos of Noah" bypasses pattern matching entirely
  • Check active intents in client/query_patterns.py

Tools not appearing:

:tools --all        # check if disabled
# check DISABLED_TOOLS in .env
python client.py    # restart

License

MIT License
