LLM Router
One MCP server. Every AI model. Smart routing.
Route text, image, video, and audio tasks to 20+ AI providers — automatically picking the best model for the job based on your budget and active profile.
Quick Start • How It Works • Providers • Profiles • Budget Control • Provider Setup
The Problem
You use Claude Code (or any MCP client). You also have access to GPT-4o, Gemini, Perplexity, DALL-E, Runway, ElevenLabs — but switching between them is manual, slow, and expensive.
LLM Router gives your AI assistant one unified interface to all of them — and it automatically picks the right one based on what you're doing and what you can afford.
You: "Research the latest AI funding rounds"
Router: → Perplexity Sonar Pro (search-augmented, best for current facts)
You: "Generate a hero image for the landing page"
Router: → Flux Pro via fal.ai (best quality/cost for images)
You: "Write unit tests for the auth module"
Router: → Claude Sonnet (top coding model, within budget)
You: "Create a 5-second product demo clip"
Router: → Kling 2.0 via fal.ai (best value for short video)
How It Saves You Real Money
Here's the key insight: not every task needs the same model.
When you use Claude Code without a router, every single request — whether it's "what does this function do?" or "redesign this entire architecture" — goes to the same expensive model. That's like hiring a surgeon to change a lightbulb.
LLM Router classifies each task automatically and sends it to the cheapest model that can handle it well:
"What does os.path.join do?" → Gemini Flash ($0.000001 — literally free)
"Refactor the auth module" → Claude Sonnet ($0.003)
"Design the full system arch" → Claude Opus ($0.015)
| Task type | Without Router | With Router | Savings |
|---|---|---|---|
| Simple queries (60% of work) | Opus — $0.015 | Haiku/Gemini Flash — $0.0001 | 99% |
| Moderate tasks (30% of work) | Opus — $0.015 | Sonnet — $0.003 | 80% |
| Complex tasks (10% of work) | Opus — $0.015 | Opus — $0.015 | 0% |
| Blended monthly estimate | ~$50/mo | ~$8–15/mo | 70–85% |
💡 With Ollama: Route simple tasks to a free local model (llama3.2, qwen2.5-coder) and the savings become even more dramatic — those 60% of simple tasks cost $0.
The router pays for itself in the first hour of use.
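The blended estimate above is plain weighted arithmetic; a quick sanity check in Python, using the illustrative per-request costs from the table:

```python
# Illustrative per-request costs (USD) from the savings table above.
workload = [
    (0.60, 0.0001),   # simple queries -> Haiku / Gemini Flash
    (0.30, 0.003),    # moderate tasks -> Sonnet
    (0.10, 0.015),    # complex tasks  -> Opus
]

without_router = 0.015                                   # everything goes to Opus
with_router = sum(share * cost for share, cost in workload)

savings = 1 - with_router / without_router
print(f"${with_router:.5f} per request vs ${without_router:.3f} -> {savings:.0%} saved")
# -> $0.00246 per request vs $0.015 -> 84% saved
```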
Quick Start
Option A: PyPI (Recommended)
pip install claude-code-llm-router
Option B: Claude Code Plugin
claude plugin add ypollak2/llm-router
Option C: Manual Install
git clone https://github.com/ypollak2/llm-router.git
cd llm-router
uv sync
./scripts/install.sh # registers as MCP server in Claude Code
Get Running in 3 Steps
Enable Global Auto-Routing
Make the router evaluate every prompt across all projects:
# From the MCP tool:
llm_setup(action='install_hooks')
# Or from the CLI:
llm-router-install-hooks
This installs hooks + rules to ~/.claude/ so every Claude Code session auto-routes tasks to the optimal model.
Start for free: Google's Gemini API has a free tier with 1M tokens/day — no credit card needed. Groq also offers a generous free tier with ultra-fast inference.
What You Get
- 27 MCP tools — Smart routing, text, image, video, audio, streaming, setup, quality analytics, usage monitoring, cache management
- /route skill — Smart task classification and routing in one command
- Smart classifier — Auto-picks Claude Haiku/Sonnet/Opus based on complexity
- Prompt classification cache — SHA-256 exact-match LRU cache (1000 entries, 1h TTL) for instant repeat classifications
- Auto-route hook — Multi-layer UserPromptSubmit classifier: routes every prompt (including codebase questions) through Haiku/Ollama first; heuristic scoring (instant) → Ollama local LLM (free, ~1s) → cheap API (Gemini Flash/GPT-4o-mini, ~$0.0001) → auto fallback. Hooks self-update after pip upgrade — no reinstall needed.
- Streaming responses — llm_stream tool for long-running tasks, shows output as it arrives
- Usage auto-refresh — PostToolUse hook detects stale Claude subscription data (>15 min) and nudges for refresh
- Savings awareness — Every 5th routed task, shows estimated Claude API costs and rate limit capacity saved
- Rate limit detection — Catches 429/rate_limit errors with smart cooldowns (15s for rate limits vs 60s for hard failures)
- Key validation — llm_setup(action='test') validates API keys with minimal LLM calls (~$0.0001 each)
- Claude subscription monitoring — Live session/weekly usage from claude.ai
- Codex desktop integration — Route tasks to local OpenAI Codex (free)
- LLM Orchestrator agent — Autonomous multi-step task decomposition across models
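The prompt classification cache in the list above can be pictured as a TTL-bounded, exact-match LRU keyed on a SHA-256 of the prompt. A minimal sketch (class and method names are hypothetical, not the project's actual implementation):

```python
import hashlib
import time
from collections import OrderedDict

class ClassificationCache:
    """Exact-match LRU: SHA-256 key, bounded size, per-entry TTL."""

    def __init__(self, max_entries=1000, ttl_seconds=3600):
        self.max_entries = max_entries
        self.ttl = ttl_seconds
        self._store = OrderedDict()  # key -> (timestamp, classification)

    @staticmethod
    def _key(prompt: str) -> str:
        return hashlib.sha256(prompt.encode("utf-8")).hexdigest()

    def get(self, prompt: str):
        key = self._key(prompt)
        entry = self._store.get(key)
        if entry is None:
            return None
        ts, value = entry
        if time.time() - ts > self.ttl:      # expired -> treat as miss
            del self._store[key]
            return None
        self._store.move_to_end(key)         # refresh LRU position
        return value

    def put(self, prompt: str, classification) -> None:
        key = self._key(prompt)
        self._store[key] = (time.time(), classification)
        self._store.move_to_end(key)
        while len(self._store) > self.max_entries:
            self._store.popitem(last=False)  # evict least recently used
```

Because matching is exact, identical repeat prompts classify instantly while any wording change falls through to the classifier chain.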
How It Works
Architecture
Routing Decision Flow
Benchmark-Driven Routing
Model chains are ranked using weekly-refreshed data from four authoritative sources, so the router always sends your task to the current best model for that task type.
Current Top Models by Task
| Task | 🥇 Premium | 🥈 Balanced | 🥉 Budget |
|---|---|---|---|
| 💻 Code | DeepSeek-R1, o3, Opus | DeepSeek Chat, GPT-4o, Sonnet | Flash, DeepSeek, Haiku |
| 🔍 Analyze | DeepSeek-R1, GPT-4o, Sonnet | DeepSeek-R1, GPT-4o, Gemini Pro | Flash, DeepSeek, Haiku |
| ❓ Query | DeepSeek Chat, GPT-4o, Gemini Pro | DeepSeek Chat, GPT-4o, Gemini Pro | Flash, DeepSeek, Haiku |
| ✍️ Generate | DeepSeek Chat, GPT-4o, Gemini Pro | DeepSeek Chat, GPT-4o, Gemini Pro | Flash, DeepSeek, Haiku |
| 🔎 Research | Perplexity Pro, Perplexity, GPT-4o | Perplexity Pro, Perplexity, GPT-4o | Perplexity, Flash, Haiku |
Bold = first model tried when Claude quota is high (> 85%) or in subscription mode. Full benchmark data, scoring weights, raw scores, and sources: docs/BENCHMARKS.md. 🔄 Updated every Monday via GitHub Actions — distributed to all users on the next pip upgrade.
How rankings are computed
Arena Hard win-rate ──┐
Aider code pass rate ──┼── weighted by task type ──► quality score ──► quality-cost tier sort
HuggingFace MMLU/MATH──┤ ↓
LiteLLM pricing ──┘ within 5% quality band → cheapest model first
Quality-cost sorting: models within 5% quality of each other are grouped into a tier. Within that tier, the cheapest model sorts first. This means GPT-4o ($0.006/1K) leads over Sonnet ($0.009/1K) when their quality difference is under 5%, and DeepSeek Chat ($0.0007/1K) leads over everyone when it's within the top quality band.
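The tier sort can be sketched in a few lines (the function name and data shape are assumptions for illustration):

```python
def quality_cost_sort(models, band=0.05):
    """Sort models by quality tier, then by cost within each ~5% quality band."""
    ranked = sorted(models, key=lambda m: -m["quality"])
    ordered, i = [], 0
    while i < len(ranked):
        top = ranked[i]["quality"]
        # everyone within `band` of the tier leader belongs to this tier
        tier = [m for m in ranked[i:] if m["quality"] >= top * (1 - band)]
        ordered.extend(sorted(tier, key=lambda m: m["cost_per_1k"]))
        i += len(tier)
    return [m["name"] for m in ordered]

# Illustrative quality/cost figures, not real benchmark numbers.
models = [
    {"name": "sonnet",        "quality": 0.96, "cost_per_1k": 0.009},
    {"name": "gpt-4o",        "quality": 0.95, "cost_per_1k": 0.006},
    {"name": "deepseek-chat", "quality": 0.93, "cost_per_1k": 0.0007},
    {"name": "haiku",         "quality": 0.80, "cost_per_1k": 0.001},
]
print(quality_cost_sort(models))
# -> ['deepseek-chat', 'gpt-4o', 'sonnet', 'haiku']
```

With these illustrative figures, DeepSeek Chat lands inside the top 5% quality band and sorts first, matching the behavior described above.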
Auto-Route Hook — Every Prompt, Cheaper Model First
The UserPromptSubmit hook intercepts all prompts — not just explicit routing requests — and classifies them before your top-tier model sees them. Simple tasks go straight to Haiku or a local Ollama model; only genuinely complex work escalates.
What gets routed
| Prompt | Classified as | Model used |
|---|---|---|
why doesn't the router work? | analyze/moderate | Haiku |
how does benchmarks.py work? | query/simple | Ollama / Haiku |
fix the bug in profiles.py | code/moderate | Haiku / Sonnet |
implement a distributed cache | code/complex | Sonnet / Opus |
write a blog post about LLMs | generate/moderate | Haiku / Gemini Flash |
git status (raw shell command) | (skipped — terminal op) | — |
Classification chain (stops at first success)
1. Heuristic scoring (instant, free) → high-confidence patterns route immediately
2. Ollama local LLM (free, ~1s) → catches what heuristics miss
3. Cheap API (~$0.0001) → Gemini Flash / GPT-4o-mini fallback
4. Query catch-all (instant, free) → any remaining question → Haiku
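Stopping at the first confident answer is the classic fallback-chain pattern; a toy sketch (the classifier lambdas are stand-ins, not the real heuristics):

```python
def classify(prompt, classifiers):
    """Try each classifier in cost order; return the first confident answer."""
    for name, fn in classifiers:
        result = fn(prompt)              # None means "not confident, pass along"
        if result is not None:
            return name, result
    return "catch-all", "query/simple"   # any remaining question -> Haiku

# Stand-in layers for heuristics, Ollama, and a cheap API call.
chain = [
    ("heuristic", lambda p: "code/moderate" if "fix" in p else None),
    ("ollama",    lambda p: "analyze/moderate" if "why" in p else None),
    ("cheap-api", lambda p: None),       # pretend the API was also unsure
]

print(classify("fix the bug in profiles.py", chain))
# -> ('heuristic', 'code/moderate')
print(classify("what's the capital of France?", chain))
# -> ('catch-all', 'query/simple')
```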
Self-updating hooks
Hook scripts are versioned (# llm-router-hook-version: N). On every MCP server startup, if the bundled version in the installed package is newer than what's in ~/.claude/hooks/, it's automatically overwritten. Existing users get classification improvements automatically after pip install --upgrade claude-code-llm-router — no need to re-run llm-router-install-hooks.
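That version check reduces to comparing two integer headers; a sketch (helper names are hypothetical):

```python
import re

VERSION_RE = re.compile(r"#\s*llm-router-hook-version:\s*(\d+)")

def hook_version(script_text: str) -> int:
    """Extract the integer version from a hook script's header, 0 if absent."""
    match = VERSION_RE.search(script_text)
    return int(match.group(1)) if match else 0

def needs_update(bundled: str, installed: str) -> bool:
    """Overwrite the installed hook only when the bundled copy is newer."""
    return hook_version(bundled) > hook_version(installed)

bundled = "# llm-router-hook-version: 7\n..."
installed = "# llm-router-hook-version: 5\n..."
print(needs_update(bundled, installed))  # -> True
```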
Smart Routing (Claude Code Models)
Use Claude Code's own models (Haiku/Sonnet/Opus) without extra API keys via the smart classifier:
llm_classify("What is the capital of France?")
→ [S] simple (99%) → haiku
llm_classify("Write a REST API with auth and pagination")
→ [M] moderate (98%) → sonnet
llm_classify("Design a distributed CQRS architecture")
→ [C] complex (85%) → opus
Complexity-First Routing
Complexity drives model selection — this is the real savings mechanism. You don't need opus for "what time is it?" and you don't want haiku for architecture design. Budget pressure is a late safety net, not the primary router.
# In .env
QUALITY_MODE=balanced # best | balanced | conserve
MIN_MODEL=haiku # floor: never route below this
| Claude Usage | Effect |
|---|---|
| 0-85% | No downshift — complexity routing handles efficiency |
| 85-95% | Downshift by 1 tier + suggest external fallback |
| 95%+ | Downshift by 2 tiers + recommend external (Codex, OpenAI, Gemini) |
Budget pressure comes from real Claude subscription data (session %, weekly %) fetched live from claude.ai. The router also factors in time until session reset — if you're at 90% but the session resets in 5 minutes, no downshift needed.
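One way to sketch the downshift logic, with thresholds from the table (the exact reset-proximity cutoff is an assumption based on the 5-minute example above):

```python
def downshift_tiers(session_pct: float, minutes_to_reset: float) -> int:
    """How many model tiers to drop under budget pressure.

    Pressure is waived when the session resets soon -- the remaining quota
    will refresh before it can realistically be exhausted.
    """
    if minutes_to_reset <= 10:     # assumed cutoff, per the "resets in 5 minutes" example
        return 0
    if session_pct >= 95:
        return 2                   # e.g. opus -> haiku, plus external recommendation
    if session_pct >= 85:
        return 1                   # e.g. opus -> sonnet, plus fallback suggestion
    return 0                       # complexity routing alone decides

print(downshift_tiers(90, minutes_to_reset=5))    # -> 0 (reset imminent)
print(downshift_tiers(90, minutes_to_reset=120))  # -> 1
print(downshift_tiers(97, minutes_to_reset=120))  # -> 2
```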
External Fallback
When Claude quota is tight (85%+), the router ranks available external models:
llm_classify("Design auth architecture")
# -> complex -> sonnet (downshifted from opus)
# pressure: [========..] 90%
# >> fallback: codex/gpt-5.4 (free, preserves Claude quota)
- Codex (local): Free — uses your OpenAI desktop subscription
- OpenAI API: GPT-4o, o3 (ranked by quality, filtered by budget)
- Gemini API: gemini-2.5-pro, gemini-2.5-flash
Per-provider budgets via LLM_ROUTER_BUDGET_OPENAI=10.00, LLM_ROUTER_BUDGET_GEMINI=5.00.
Claude Subscription Monitoring
Live usage data from your claude.ai account — no guessing:
+----------------------------------------------------------+
| Claude Subscription (Live) |
+----------------------------------------------------------+
| Session [====........] 35% resets in 3h 7m |
| Weekly (all) [===.........] 23% resets Fri 01:00 PM |
| Sonnet only [===.........] 26% resets Wed 10:00 AM |
+----------------------------------------------------------+
| OK 35% pressure -- full model selection |
+----------------------------------------------------------+
Fetched via Playwright from claude.ai's internal JSON API (same data the settings page uses). One browser_evaluate call, cached in memory for routing decisions.
Providers
Text & Code LLMs
| Provider | Models | Free Tier | Best For |
|---|---|---|---|
| 🦙 Ollama | Any local model | Yes (free forever) | Privacy, zero cost, offline use |
| Google Gemini | 2.5 Pro, 2.5 Flash | Yes (1M tokens/day) | Generation, long context |
| Groq | Llama 3.3, Mixtral | Yes | Ultra-fast inference |
| OpenAI | GPT-4o, GPT-4o-mini, o3 | No | Code, analysis, reasoning |
| Perplexity | Sonar, Sonar Pro | No | Research, current events |
| Anthropic | Claude Sonnet, Haiku | No | Nuanced writing, safety |
| Deepseek | V3, Reasoner | Yes (limited) | Cost-effective reasoning |
| Mistral | Large, Small | Yes (limited) | Multilingual |
| Together | Llama 3, CodeLlama | Yes (limited) | Open-source models |
| xAI | Grok 3 | No | Real-time information |
| Cohere | Command R+ | Yes (trial) | RAG, enterprise search |
🦙 Ollama runs models locally — no API key, no cost, no data sent externally. Full Ollama setup guide →
Image Generation
| Provider | Models | Best For |
|---|---|---|
| Google Gemini | Imagen 3 | High quality, integrated with text models |
| fal.ai | Flux Pro, Flux Dev | Quality/cost ratio, fast generation |
| OpenAI | DALL-E 3, DALL-E 2 | Prompt adherence, text in images |
| Stability AI | Stable Diffusion 3 | Fine control, open weights |
Video Generation
| Provider | Models | Best For |
|---|---|---|
| Google Gemini | Veo 2 | Integrated with Gemini ecosystem |
| Runway | Gen-3 Alpha | Professional quality, motion control |
| fal.ai | Kling, minimax | Value, fast generation |
| Replicate | Various | Open-source video models |
Audio & Voice
| Provider | Models | Best For |
|---|---|---|
| ElevenLabs | Multilingual v2 | Voice cloning, highest quality |
| OpenAI | TTS-1, TTS-1-HD | Cost-effective text-to-speech |
20+ providers and growing. See docs/PROVIDERS.md for full setup guides with API key links.
MCP Tools
Once installed, Claude Code gets these 27 tools:
| Tool | What It Does |
|---|---|
| Smart Routing | |
llm_classify | Classify complexity + recommend model with time-aware budget pressure |
llm_route | Auto-classify, then route to the best external LLM |
llm_track_usage | Report Claude Code token usage for budget tracking |
| Text & Code | |
llm_query | General questions — auto-routed to the best text LLM |
llm_research | Search-augmented answers via Perplexity |
llm_generate | Creative content — writing, summaries, brainstorming |
llm_analyze | Deep reasoning — analysis, debugging, problem decomposition |
llm_code | Coding tasks — generation, refactoring, algorithms |
llm_edit | Route code-edit reasoning to a cheap model → returns exact {file, old_string, new_string} pairs for Claude to apply |
| Media | |
llm_image | Image generation — Gemini Imagen, DALL-E, Flux, or SD |
llm_video | Video generation — Gemini Veo, Runway, Kling, etc. |
llm_audio | Voice/audio — TTS via ElevenLabs or OpenAI |
| Orchestration | |
llm_orchestrate | Multi-step pipelines across multiple models |
llm_pipeline_templates | List available orchestration templates |
| Cache | |
llm_cache_stats | View cache hit rate, entries, memory estimate, evictions |
llm_cache_clear | Clear the classification cache |
| Streaming | |
llm_stream | Stream LLM responses for long-running tasks — output as it arrives |
| Monitoring & Setup | |
llm_check_usage | Check live Claude subscription usage (session %, weekly %) |
llm_update_usage | Feed live usage data from claude.ai into the router |
llm_codex | Route tasks to local Codex desktop agent (free, uses OpenAI sub) |
llm_setup | Discover API keys, add providers, get setup guides, validate keys, install global hooks |
llm_quality_report | Routing accuracy, classifier stats, savings metrics, downshift rate |
llm_set_profile | Switch routing profile (budget / balanced / premium) |
llm_usage | Unified dashboard — Claude sub, Codex, APIs, savings in one view |
llm_health | Check provider availability and circuit breaker status |
llm_providers | List all supported and configured providers |
| Session Memory | |
llm_save_session | Summarize + persist current session for cross-session context injection |
Context injection: text tools (llm_query, llm_research, llm_generate, llm_analyze, llm_code) automatically prepend recent conversation history and previous session summaries to every external LLM call — so GPT-4o, Gemini, and Perplexity receive the same context you have. Pass context="..." to add caller-supplied context on top. Controlled by LLM_ROUTER_CONTEXT_ENABLED (default: on).
Routing Profiles
Three built-in profiles map to task complexity. Model order is pressure-aware — the router dynamically reorders chains based on live Claude subscription usage.
| Budget (simple) | Balanced (medium) | Premium (complex) | |
|---|---|---|---|
| Text | Ollama → Haiku → cheap | Sonnet → DeepSeek → GPT-4o | Opus → Sonnet → o3 |
| Research | Perplexity Sonar | Perplexity Sonar Pro | Perplexity Sonar Pro |
| Code | Ollama → Haiku → DeepSeek | Sonnet → DeepSeek → GPT-4o | Opus → Sonnet → DeepSeek-R1 → o3 |
| Image | Flux Dev, Imagen Fast | Flux Pro, Imagen 3, DALL-E 3 | Imagen 3, DALL-E 3 |
| Video | minimax, Veo 2 | Kling, Veo 2, Runway Turbo | Veo 2, Runway Gen-3 |
| Audio | OpenAI TTS | ElevenLabs | ElevenLabs |
Quota-aware chain reordering
Claude Pro/Max tokens are treated as free — the router uses them first. As quota is consumed, chains automatically reorder to preserve remaining Claude budget:
| Claude usage | Chain order |
|---|---|
| 0–84% | Claude first (free under subscription) |
| 85–98% | DeepSeek/Codex → cheap externals → Claude last |
| ≥ 99% (hard cap) | DeepSeek → Codex → cheap → paid — zero Claude |
| Research (any) | Perplexity always first (web-grounded) |
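The table above reduces to a small selection function; a sketch with illustrative chains (the provider lists are simplified stand-ins, not the router's real chains):

```python
def chain_for(task: str, claude_pct: float) -> list[str]:
    """Pick a provider chain given task type and live Claude usage."""
    if task == "research":
        # Web-grounded tasks always lead with Perplexity, regardless of quota.
        return ["perplexity", "gemini-flash", "haiku"]
    if claude_pct >= 99:           # hard cap: keep Claude out entirely
        return ["deepseek", "codex", "gemini-flash", "gpt-4o"]
    if claude_pct >= 85:           # preserve remaining quota: Claude goes last
        return ["deepseek", "codex", "gemini-flash", "claude"]
    return ["claude", "deepseek", "gemini-flash"]  # subscription tokens are "free"

print(chain_for("code", 40))    # -> ['claude', 'deepseek', 'gemini-flash']
print(chain_for("code", 99.5))  # -> ['deepseek', 'codex', 'gemini-flash', 'gpt-4o']
```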
Claude Code subscription mode
If you use Claude Code (Pro/Max), set LLM_ROUTER_CLAUDE_SUBSCRIPTION=true in .env. The router will never route to Anthropic via API — you're already on Claude, so API routing would require a separate key and add duplicate billing. Instead, every task routes to the best non-Claude alternative:
# In .env
LLM_ROUTER_CLAUDE_SUBSCRIPTION=true # no ANTHROPIC_API_KEY needed
At normal quota (< 85%), chains lead with the highest-quality available model. At high quota (> 85%), DeepSeek takes over — quality 1.0 benchmark score at ~1/8th the cost of GPT-4o:
| Low quota (< 85%) | High quota (> 85%) | |
|---|---|---|
| BUDGET/CODE | DeepSeek Chat | DeepSeek Chat |
| BALANCED/CODE | DeepSeek Chat | DeepSeek Chat |
| BALANCED/ANALYZE | DeepSeek Reasoner | DeepSeek Reasoner |
| PREMIUM/CODE | o3 | DeepSeek Reasoner |
| PREMIUM/ANALYZE | DeepSeek Reasoner | DeepSeek Reasoner |
Switch profile anytime:
llm_set_profile("budget") # Development, drafts, exploration
llm_set_profile("balanced") # Production work, client deliverables
llm_set_profile("premium") # Critical tasks, maximum quality
Budget Control
Set a monthly budget to prevent overspending:
# In .env
LLM_ROUTER_MONTHLY_BUDGET=50 # USD, 0 = unlimited
The router:
- Tracks real-time spend across all providers in SQLite
- Blocks requests when the monthly budget is reached
- Shows budget status in llm_usage
llm_usage("month")
## Usage Summary (month)
Calls: 142
Tokens: 240,000 in + 80,000 out = 320,000 total
Cost: $3.4200
Avg latency: 1200ms
### Budget Status
Monthly budget: $50.00
Spent this month: $3.4200 (6.8%)
Remaining: $46.5800
Multi-Step Orchestration
Chain tasks across different models in a pipeline:
llm_orchestrate("Research AI trends and write a report", template="research_report")
Built-in templates:
| Template | Steps | Pipeline |
|---|---|---|
research_report | 3 | Research → Analyze → Write |
competitive_analysis | 4 | Multi-source research → SWOT → Report |
content_pipeline | 4 | Research → Draft → Review → Polish |
code_review_fix | 3 | Review → Fix → Test |
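Under the hood, a template is an ordered list of steps where each step's output feeds the next; a minimal sketch of that pattern (the step lambdas are placeholders for real model calls):

```python
def run_pipeline(task: str, steps):
    """Feed each step's output into the next; return the final result."""
    result = task
    for name, step in steps:
        result = step(result)    # in the real router, each step hits a model
    return result

# Placeholders standing in for Perplexity -> analysis model -> writing model.
research_report = [
    ("research", lambda t: f"facts({t})"),
    ("analyze",  lambda t: f"insights({t})"),
    ("write",    lambda t: f"report({t})"),
]

print(run_pipeline("AI trends", research_report))
# -> report(insights(facts(AI trends)))
```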
Configuration
Environment Variables
# Required: at least one provider
GEMINI_API_KEY=AIza... # Free tier! https://aistudio.google.com/apikey
OPENAI_API_KEY=sk-proj-...
PERPLEXITY_API_KEY=pplx-...
# Optional: more providers (add as many as you want)
ANTHROPIC_API_KEY=sk-ant-...
DEEPSEEK_API_KEY=...
GROQ_API_KEY=gsk_...
FAL_KEY=...
ELEVENLABS_API_KEY=...
# Router config
LLM_ROUTER_PROFILE=balanced # budget | balanced | premium
LLM_ROUTER_MONTHLY_BUDGET=0 # USD, 0 = unlimited
LLM_ROUTER_CLAUDE_SUBSCRIPTION=false # true = you're a Claude Code Pro/Max user;
# anthropic/* excluded, router uses non-Claude models
# Smart routing (Claude Code model selection)
DAILY_TOKEN_BUDGET=0 # tokens/day, 0 = unlimited
QUALITY_MODE=balanced # best | balanced | conserve
MIN_MODEL=haiku # floor: haiku | sonnet | opus
See .env.example for the full list of supported providers.
Claude Code Integration
After running ./scripts/install.sh, your ~/.claude.json will include:
{
"mcpServers": {
"llm-router": {
"command": "uv",
"args": ["run", "--directory", "/path/to/llm-router", "llm-router"]
}
}
}
Development
# Install with dev dependencies
uv sync --extra dev
# Run tests
uv run pytest -v
# Run integration tests (requires real API keys)
uv run pytest tests/test_integration.py -v
# Lint
uv run ruff check src/
Roadmap
See ROADMAP.md for the detailed roadmap with phases and priorities.
Completed (v0.1–v0.5)
- Core text LLM routing (10+ providers)
- Configurable profiles (budget / balanced / premium)
- Cost tracking with SQLite
- Health checks with circuit breaker
- Image generation (Gemini Imagen 3, DALL-E, Flux, SD)
- Video generation (Gemini Veo 2, Runway, Kling, minimax)
- Audio/voice routing (ElevenLabs, OpenAI TTS)
- Monthly budget enforcement
- Multi-step orchestration with pipeline templates
- Claude Code plugin with orchestrator agent and /route skill
- Freemium tier gating
- CI with GitHub Actions
- Smart complexity-first routing (simple->haiku, moderate->sonnet, complex->opus)
- Live Claude subscription monitoring (session %, weekly %, Sonnet %)
- Time-aware budget pressure (factors in session reset proximity)
- External fallback ranking when Claude is tight (Codex, OpenAI, Gemini)
- Codex desktop integration (local agent, free via OpenAI subscription)
- Unified usage dashboard (Claude sub + Codex + APIs + savings)
- llm_setup tool for API discovery and secure key management
- Per-provider budget limits
- ASCII box-drawing dashboard (terminal-friendly, no Unicode issues)
- Prompt classification cache (SHA-256 exact-match, in-memory LRU, 1h TTL)
- llm_cache_stats + llm_cache_clear MCP tools
- Auto-route hook (UserPromptSubmit heuristic classifier, zero-latency)
- Rate limit detection with smart cooldowns (15s rate limit vs 60s hard failure)
- llm_setup(action='test') — API key validation with minimal LLM calls
- Streaming responses (llm_stream tool + call_llm_stream() async generator)
- Usage auto-refresh hook (PostToolUse staleness detection + usage pulse wiring)
- Published to PyPI as claude-code-llm-router
- Multi-layer auto-classification: scoring heuristic → Ollama local LLM (qwen3.5) → cheap API (Gemini Flash/GPT-4o-mini)
- Savings awareness (PostToolUse hook tracks routed calls, periodic cost savings reminders)
- Structural context compaction (5 strategies: whitespace, comments, dedup, truncation, stack traces)
- Quality logging (routing_decisions table + llm_quality_report tool)
- Savings persistence (JSONL + SQLite import, lifetime analytics)
- Gemini media APIs (Imagen 3 images, Veo 2 video)
- Global hook installer (llm_setup(action='install_hooks') + llm-router-install-hooks CLI)
- Global routing rules (auto-installed to ~/.claude/rules/llm-router.md)
- Session context injection (ring buffer + SQLite summaries, injected into all text tools)
- llm_save_session MCP tool (auto-summarize + persist session for future context)
- Cross-session memory (previous session summaries prepended to external LLM calls)
- Auto-update routing rules (version header + silent update on MCP startup after pip upgrade)
- Token arbitrage enforcement — routing hint override bug fixed; simple tasks now correctly route to cheap models
- Claude Code subscription mode (LLM_ROUTER_CLAUDE_SUBSCRIPTION) — exclude Anthropic from chains; route to DeepSeek/Gemini/GPT-4o instead
- Quality-cost tier sorting — within 5% quality band, prefer cheaper model (GPT-4o over Sonnet, DeepSeek over everyone when near-equal quality)
- DeepSeek Reasoner in cheap tier — $0.0014/1K leads at >85% pressure (was treated as "paid" tier alongside o3 at $0.025)
- Codex injection fix — no longer injected at position 0 when subscription mode removes Claude from chain (caused 300s timeouts)
- Codex task filtering — excluded from RESEARCH (no web access) and QUERY (too slow) chains
Completed (v0.7)
- Availability-aware routing — P95 latency from the routing_decisions table folded into the benchmark quality score. Penalty range 0.0–0.50 (<5s=0, <15s=0.03, <60s=0.10, <180s=0.30, ≥180s=0.50). 60s cache prevents repeated DB hits per routing cycle.
- Codex cold-start defaults — _COLD_START_LATENCY_MS applies a pessimistic 60–90s P95 before any history exists, preventing Codex from being placed first in chains on a fresh install.
- llm_edit MCP tool — Routes code-edit reasoning to a cheap CODE model. Reads files locally (32 KB cap), gets {file, old_string, new_string} JSON back, and returns formatted instructions for Claude to apply mechanically. Keeps Opus out of the "what to change" loop.
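The P95 penalty is a simple step function over the thresholds listed above; as a sketch:

```python
def latency_penalty(p95_seconds: float) -> float:
    """Quality-score penalty from observed P95 latency (v0.7 thresholds)."""
    if p95_seconds < 5:
        return 0.0
    if p95_seconds < 15:
        return 0.03
    if p95_seconds < 60:
        return 0.10
    if p95_seconds < 180:
        return 0.30
    return 0.50

print(latency_penalty(3))    # -> 0.0
print(latency_penalty(75))   # -> 0.3
```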
Next Up (v0.8 — Evaluation & Learning)
- Classification outcome tracking (was the routed model's response good?)
- A/B testing framework for routing decisions
- Adaptive routing based on historical success rates
Contributing
We welcome contributions! See CONTRIBUTING.md for guidelines.
Key areas where help is needed:
- Adding new provider integrations
- Improving routing intelligence
- Testing across different MCP clients
- Documentation and examples
License
MIT — use it however you want.