LLM Router
One MCP server. Every AI model. Smart routing.
Route text, image, video, and audio tasks to 20+ AI providers (OpenAI, Gemini, Perplexity, Anthropic, fal, ElevenLabs, Runway) — automatically picking the best model for the job based on task complexity, your budget, and the active profile, with provider failover.
Quick Start • How It Works • Providers • Profiles • Budget Control • Provider Setup
The Problem
You use Claude Code (or any MCP client). You also have access to GPT-4o, Gemini, Perplexity, DALL-E, Runway, ElevenLabs — but switching between them is manual, slow, and expensive.
LLM Router gives your AI assistant one unified interface to all of them — and it automatically picks the right one based on what you're doing and what you can afford.
You: "Research the latest AI funding rounds"
Router: → Perplexity Sonar Pro (search-augmented, best for current facts)
You: "Generate a hero image for the landing page"
Router: → Flux Pro via fal.ai (best quality/cost for images)
You: "Write unit tests for the auth module"
Router: → Claude Sonnet (top coding model, within budget)
You: "Create a 5-second product demo clip"
Router: → Kling 2.0 via fal.ai (best value for short video)
How It Saves You Real Money
Here's the key insight: not every task needs the same model.
When you use Claude Code without a router, every single request — whether it's "what does this function do?" or "redesign this entire architecture" — goes to the same expensive model. That's like hiring a surgeon to change a lightbulb.
LLM Router classifies each task automatically and sends it to the cheapest model that can handle it well:
"What does os.path.join do?" → Gemini Flash ($0.000001 — effectively free)
"Refactor the auth module" → Claude Sonnet ($0.003)
"Design the full system arch" → Claude Opus ($0.015)
| Task type | Without Router | With Router | Savings |
|---|---|---|---|
| Simple queries (60% of work) | Opus — $0.015 | Haiku/Gemini Flash — $0.0001 | 99% |
| Moderate tasks (30% of work) | Opus — $0.015 | Sonnet — $0.003 | 80% |
| Complex tasks (10% of work) | Opus — $0.015 | Opus — $0.015 | 0% |
| Blended monthly estimate | ~$50/mo | ~$8–15/mo | 70–85% |
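The blended estimate above is simple arithmetic. A quick sketch, using the task mix and per-task costs from the table:

```python
# Task mix and per-task costs taken from the table above
mix = [
    (0.60, 0.0001),   # simple queries  -> Haiku / Gemini Flash
    (0.30, 0.003),    # moderate tasks  -> Sonnet
    (0.10, 0.015),    # complex tasks   -> Opus
]
baseline = 0.015      # every request on Opus, no router

blended = sum(share * cost for share, cost in mix)
savings = 1 - blended / baseline
print(f"blended ${blended:.5f}/task, savings {savings:.0%}")  # ~84%
```

At this mix, the blended per-task cost is about $0.0025 versus $0.015, landing inside the 70–85% range quoted above.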
💡 With Ollama: Route simple tasks to a free local model (`llama3.2`, `qwen2.5-coder`) and the savings become even more dramatic — those 60% of simple tasks cost $0.
The router pays for itself in the first hour of use.
Quick Start
Option A: PyPI (Recommended)
pip install claude-code-llm-router
Option B: Claude Code Plugin
claude plugin add ypollak2/llm-router
Option C: Manual Install
git clone https://github.com/ypollak2/llm-router.git
cd llm-router
uv sync
./scripts/install.sh # registers as MCP server in Claude Code
Get Running in 3 Steps
Enable Global Auto-Routing
Make the router evaluate every prompt across all projects:
# From the MCP tool:
llm_setup(action='install_hooks')
# Or from the CLI:
llm-router-install-hooks
This installs hooks + rules to ~/.claude/ so every Claude Code session auto-routes tasks to the optimal model.
Start for free: Google's Gemini API has a free tier with 1M tokens/day — no credit card needed. Groq also offers a generous free tier with ultra-fast inference.
What You Get
- 24 MCP tools — Smart routing, text, image, video, audio, streaming, setup, quality analytics, usage monitoring, cache management
- `/route` skill — Smart task classification and routing in one command
- Smart classifier — Auto-picks Claude Haiku/Sonnet/Opus based on complexity
- Prompt classification cache — SHA-256 exact-match LRU cache (1000 entries, 1h TTL) for instant repeat classifications
- Auto-route hook — Multi-layer `UserPromptSubmit` classifier: routes every prompt (including codebase questions) through Haiku/Ollama first; heuristic scoring (instant) → Ollama local LLM (free, ~1s) → cheap API (Gemini Flash/GPT-4o-mini, ~$0.0001) → auto fallback. Hooks self-update after `pip upgrade` — no reinstall needed.
- Streaming responses — `llm_stream` tool for long-running tasks, shows output as it arrives
- Usage auto-refresh — `PostToolUse` hook detects stale Claude subscription data (>15 min) and nudges for a refresh
- Savings awareness — Every 5th routed task, shows estimated Claude API costs and rate limit capacity saved
- Rate limit detection — Catches 429/rate_limit errors with smart cooldowns (15s for rate limits vs 60s for hard failures)
- Key validation — `llm_setup(action='test')` validates API keys with minimal LLM calls (~$0.0001 each)
- Claude subscription monitoring — Live session/weekly usage from claude.ai
- Codex desktop integration — Route tasks to local OpenAI Codex (free)
- LLM Orchestrator agent — Autonomous multi-step task decomposition across models
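The classification cache described above (SHA-256 exact-match LRU with a TTL) can be sketched in a few lines. This is illustrative only; the class and method names are assumptions, not the shipped implementation, though the defaults mirror the README (1000 entries, 1h TTL):

```python
import hashlib
import time
from collections import OrderedDict

class ClassificationCache:
    """Sketch: exact-match LRU cache keyed by SHA-256 of the prompt."""

    def __init__(self, max_entries=1000, ttl=3600):
        self.max_entries, self.ttl = max_entries, ttl
        self._store = OrderedDict()  # key -> (expires_at, value)

    @staticmethod
    def _key(prompt: str) -> str:
        return hashlib.sha256(prompt.encode("utf-8")).hexdigest()

    def get(self, prompt):
        key = self._key(prompt)
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() > expires_at:      # expired: drop the entry
            del self._store[key]
            return None
        self._store.move_to_end(key)           # mark as recently used
        return value

    def put(self, prompt, value):
        key = self._key(prompt)
        self._store[key] = (time.monotonic() + self.ttl, value)
        self._store.move_to_end(key)
        if len(self._store) > self.max_entries:
            self._store.popitem(last=False)    # evict least recently used
```

Hashing the exact prompt means only byte-identical repeats hit the cache, which keeps lookups O(1) and avoids any risk of near-match misclassification.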
How It Works
Architecture
Routing Decision Flow
Benchmark-Driven Routing
Model chains are ranked using weekly-refreshed data from four authoritative sources, so the router always sends your task to the current best model for that task type.
Current Top Models by Task
| Task | 🥇 Premium | 🥈 Balanced | 🥉 Budget |
|---|---|---|---|
| 💻 Code | DeepSeek-R1, o3, Opus | DeepSeek Chat, GPT-4o, Sonnet | Flash, DeepSeek, Haiku |
| 🔍 Analyze | DeepSeek-R1, GPT-4o, Sonnet | DeepSeek-R1, GPT-4o, Gemini Pro | Flash, DeepSeek, Haiku |
| ❓ Query | DeepSeek Chat, GPT-4o, Gemini Pro | DeepSeek Chat, GPT-4o, Gemini Pro | Flash, DeepSeek, Haiku |
| ✍️ Generate | DeepSeek Chat, GPT-4o, Gemini Pro | DeepSeek Chat, GPT-4o, Gemini Pro | Flash, DeepSeek, Haiku |
| 🔎 Research | Perplexity Pro, Perplexity, GPT-4o | Perplexity Pro, Perplexity, GPT-4o | Perplexity, Flash, Haiku |
Bold = first model tried when Claude quota is high (> 85%) or in subscription mode. Full benchmark data, scoring weights, raw scores, and sources: docs/BENCHMARKS.md. 🔄 Updated every Monday via GitHub Actions — distributed to all users on the next `pip upgrade`.
How rankings are computed
Arena Hard win-rate ──┐
Aider code pass rate ──┼── weighted by task type ──► quality score ──► quality-cost tier sort
HuggingFace MMLU/MATH──┤ ↓
LiteLLM pricing ──┘ within 5% quality band → cheapest model first
Quality-cost sorting: models within 5% quality of each other are grouped into a tier. Within that tier, the cheapest model sorts first. This means GPT-4o ($0.006/1K) leads over Sonnet ($0.009/1K) when their quality difference is under 5%, and DeepSeek Chat ($0.0007/1K) leads over everyone when it's within the top quality band.
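The tier sort above can be sketched as follows. The function name, quality scores, and prices here are illustrative, not the shipped code:

```python
def rank_by_quality_cost(models, band=0.05):
    """Group models whose quality is within `band` of the tier leader,
    then sort each tier cheapest-first. `models`: (name, quality, $/1K)."""
    ranked = []
    remaining = sorted(models, key=lambda m: -m[1])    # best quality first
    while remaining:
        leader_quality = remaining[0][1]
        # models within the band form one tier (a prefix of the sorted list)
        tier = [m for m in remaining if leader_quality - m[1] <= band * leader_quality]
        remaining = remaining[len(tier):]
        ranked += sorted(tier, key=lambda m: m[2])     # cheapest wins the tier
    return [name for name, _, _ in ranked]
```

With illustrative numbers, a cheaper model within 5% quality of the leader sorts first: `rank_by_quality_cost([("sonnet", 0.96, 0.009), ("gpt-4o", 0.94, 0.006), ("haiku", 0.70, 0.0008)])` puts GPT-4o ahead of Sonnet, with Haiku in a lower tier.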
Auto-Route Hook — Every Prompt, Cheaper Model First
The UserPromptSubmit hook intercepts all prompts — not just explicit routing requests — and classifies them before your top-tier model sees them. Simple tasks go straight to Haiku or a local Ollama model; only genuinely complex work escalates.
What gets routed
| Prompt | Classified as | Model used |
|---|---|---|
why doesn't the router work? | analyze/moderate | Haiku |
how does benchmarks.py work? | query/simple | Ollama / Haiku |
fix the bug in profiles.py | code/moderate | Haiku / Sonnet |
implement a distributed cache | code/complex | Sonnet / Opus |
write a blog post about LLMs | generate/moderate | Haiku / Gemini Flash |
git status (raw shell command) | (skipped — terminal op) | — |
Classification chain (stops at first success)
1. Heuristic scoring (instant, free) → high-confidence patterns route immediately
2. Ollama local LLM (free, ~1s) → catches what heuristics miss
3. Cheap API (~$0.0001) → Gemini Flash / GPT-4o-mini fallback
4. Query catch-all (instant, free) → any remaining question → Haiku
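The stop-at-first-success chain boils down to a loop. A minimal sketch, with the stage signature being an assumption rather than the shipped API:

```python
def classify_with_chain(prompt, stages):
    """Run classifier stages in order; the first non-None result wins.
    `stages` is a list of (name, callable) pairs, each returning a
    (task_type, model) tuple or None when it can't decide."""
    for name, stage in stages:
        result = stage(prompt)
        if result is not None:
            return name, result
    # layer 4: query catch-all, any remaining question goes to Haiku
    return "catch-all", ("query", "haiku")
```

A stage that can't classify simply returns `None` and the prompt falls through to the next, cheaper-than-Opus layer.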
Self-updating hooks
Hook scripts are versioned (# llm-router-hook-version: N). On every MCP server startup, if the bundled version in the installed package is newer than what's in ~/.claude/hooks/, it's automatically overwritten. Existing users get classification improvements automatically after pip install --upgrade claude-code-llm-router — no need to re-run llm-router-install-hooks.
Smart Routing (Claude Code Models)
Use Claude Code's own models (Haiku/Sonnet/Opus) without extra API keys via the smart classifier:
llm_classify("What is the capital of France?")
→ [S] simple (99%) → haiku
llm_classify("Write a REST API with auth and pagination")
→ [M] moderate (98%) → sonnet
llm_classify("Design a distributed CQRS architecture")
→ [C] complex (85%) → opus
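A toy version of the heuristic layer behind those classifications might look like this. The keyword patterns are illustrative assumptions, not the shipped scoring rules:

```python
import re

# Toy keyword patterns, illustrative only
COMPLEX = re.compile(r"\b(architecture|distributed|cqrs|scalab)", re.I)
MODERATE = re.compile(r"\b(rest api|refactor|implement|auth|pagination)\b", re.I)

def classify(prompt: str):
    """Map a prompt to (complexity, model) by pattern matching."""
    if COMPLEX.search(prompt):
        return "complex", "opus"
    if MODERATE.search(prompt):
        return "moderate", "sonnet"
    return "simple", "haiku"
```

Pattern matching is instant and free, which is why it runs before any local or API-based classifier.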
Complexity-First Routing
Complexity drives model selection — this is the real savings mechanism. You don't need opus for "what time is it?" and you don't want haiku for architecture design. Budget pressure is a late safety net, not the primary router.
# In .env
QUALITY_MODE=balanced # best | balanced | conserve
MIN_MODEL=haiku # floor: never route below this
| Claude Usage | Effect |
|---|---|
| 0-85% | No downshift — complexity routing handles efficiency |
| 85-95% | Downshift by 1 tier + suggest external fallback |
| 95%+ | Downshift by 2 tiers + recommend external (Codex, OpenAI, Gemini) |
Budget pressure comes from real Claude subscription data (session %, weekly %) fetched live from claude.ai. The router also factors in time until session reset — if you're at 90% but the session resets in 5 minutes, no downshift needed.
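The downshift-as-safety-net logic reduces to a small function. Thresholds come from the table above; the 10-minute grace window for an imminent reset is an assumption for illustration:

```python
TIERS = ["haiku", "sonnet", "opus"]

def apply_pressure(model: str, usage_pct: float, minutes_to_reset: float) -> str:
    """Downshift the chosen Claude tier only under high subscription usage,
    and skip the downshift when the session reset is imminent."""
    if minutes_to_reset <= 10:          # reset imminent: no downshift needed
        return model
    index = TIERS.index(model)
    if usage_pct >= 95:
        index = max(0, index - 2)       # 95%+: down two tiers
    elif usage_pct >= 85:
        index = max(0, index - 1)       # 85-95%: down one tier
    return TIERS[index]
```

Complexity still picks the starting tier; this function only intervenes late, which is why at 90% usage with a reset five minutes away Opus stays Opus.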
External Fallback
When Claude quota is tight (85%+), the router ranks available external models:
llm_classify("Design auth architecture")
# -> complex -> sonnet (downshifted from opus)
# pressure: [========..] 90%
# >> fallback: codex/gpt-5.4 (free, preserves Claude quota)
- Codex (local): Free — uses your OpenAI desktop subscription
- OpenAI API: GPT-4o, o3 (ranked by quality, filtered by budget)
- Gemini API: gemini-2.5-pro, gemini-2.5-flash
Per-provider budgets via LLM_ROUTER_BUDGET_OPENAI=10.00, LLM_ROUTER_BUDGET_GEMINI=5.00.
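Reading those per-provider budget variables is a one-liner over the environment. A sketch, assuming only the `LLM_ROUTER_BUDGET_<PROVIDER>` naming shown above:

```python
import os

PREFIX = "LLM_ROUTER_BUDGET_"

def provider_budgets(environ=None) -> dict:
    """Collect per-provider monthly budgets (USD) from LLM_ROUTER_BUDGET_* vars."""
    environ = os.environ if environ is None else environ
    return {
        key[len(PREFIX):].lower(): float(value)
        for key, value in environ.items()
        if key.startswith(PREFIX)
    }
```

With `LLM_ROUTER_BUDGET_OPENAI=10.00` and `LLM_ROUTER_BUDGET_GEMINI=5.00` set, this yields `{"openai": 10.0, "gemini": 5.0}`.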
Claude Subscription Monitoring
Live usage data from your claude.ai account — no guessing:
+----------------------------------------------------------+
| Claude Subscription (Live) |
+----------------------------------------------------------+
| Session [====........] 35% resets in 3h 7m |
| Weekly (all) [===.........] 23% resets Fri 01:00 PM |
| Sonnet only [===.........] 26% resets Wed 10:00 AM |
+----------------------------------------------------------+
| OK 35% pressure -- full model selection |
+----------------------------------------------------------+
Fetched via Playwright from claude.ai's internal JSON API (same data the settings page uses). One browser_evaluate call, cached in memory for routing decisions.
Providers
Text & Code LLMs
| Provider | Models | Free Tier | Best For |
|---|---|---|---|
| 🦙 Ollama | Any local model | Yes (free forever) | Privacy, zero cost, offline use |
| Google Gemini | 2.5 Pro, 2.5 Flash | Yes (1M tokens/day) | Generation, long context |
| Groq | Llama 3.3, Mixtral | Yes | Ultra-fast inference |
| OpenAI | GPT-4o, GPT-4o-mini, o3 | No | Code, analysis, reasoning |
| Perplexity | Sonar, Sonar Pro | No | Research, current events |
| Anthropic | Claude Sonnet, Haiku | No | Nuanced writing, safety |
| Deepseek | V3, Reasoner | Yes (limited) | Cost-effective reasoning |
| Mistral | Large, Small | Yes (limited) | Multilingual |
| Together | Llama 3, CodeLlama | Yes (limited) | Open-source models |
| xAI | Grok 3 | No | Real-time information |
| Cohere | Command R+ | Yes (trial) | RAG, enterprise search |
🦙 Ollama runs models locally — no API key, no cost, no data sent externally. Full Ollama setup guide →
Image Generation
| Provider | Models | Best For |
|---|---|---|
| Google Gemini | Imagen 3 | High quality, integrated with text models |
| fal.ai | Flux Pro, Flux Dev | Quality/cost ratio, fast generation |
| OpenAI | DALL-E 3, DALL-E 2 | Prompt adherence, text in images |
| Stability AI | Stable Diffusion 3 | Fine control, open weights |
Video Generation
| Provider | Models | Best For |
|---|---|---|
| Google Gemini | Veo 2 | Integrated with Gemini ecosystem |
| Runway | Gen-3 Alpha | Professional quality, motion control |
| fal.ai | Kling, minimax | Value, fast generation |
| Replicate | Various | Open-source video models |
Audio & Voice
| Provider | Models | Best For |
|---|---|---|
| ElevenLabs | Multilingual v2 | Voice cloning, highest quality |
| OpenAI | TTS-1, TTS-1-HD | Cost-effective text-to-speech |
20+ providers and growing. See docs/PROVIDERS.md for full setup guides with API key links.
MCP Tools
Once installed, Claude Code gets the following tools:
| Tool | What It Does |
|---|---|
| Smart Routing | |
llm_classify | Classify complexity + recommend model with time-aware budget pressure |
llm_route | Auto-classify, then route to the best external LLM |
llm_track_usage | Report Claude Code token usage for budget tracking |
| Text & Code | |
llm_query | General questions — auto-routed to the best text LLM |
llm_research | Search-augmented answers via Perplexity |
llm_generate | Creative content — writing, summaries, brainstorming |
llm_analyze | Deep reasoning — analysis, debugging, problem decomposition |
llm_code | Coding tasks — generation, refactoring, algorithms |
llm_edit | Route code-edit reasoning to a cheap model → returns exact {file, old_string, new_string} pairs for Claude to apply |
| Media | |
llm_image | Image generation — Gemini Imagen, DALL-E, Flux, or SD |
llm_video | Video generation — Gemini Veo, Runway, Kling, etc. |
llm_audio | Voice/audio — TTS via ElevenLabs or OpenAI |
| Orchestration | |
llm_orchestrate | Multi-step pipelines across multiple models |
llm_pipeline_templates | List available orchestration templates |
| Cache | |
llm_cache_stats | View cache hit rate, entries, memory estimate, evictions |
llm_cache_clear | Clear the classification cache |
| Streaming | |
llm_stream | Stream LLM responses for long-running tasks — output as it arrives |
| Monitoring & Setup | |
llm_check_usage | Check live Claude subscription usage (session %, weekly %) |
llm_update_usage | Feed live usage data from claude.ai into the router |
llm_codex | Route tasks to local Codex desktop agent (free, uses OpenAI sub) |
llm_setup | Discover API keys, add providers, get setup guides, validate keys, install global hooks |
llm_quality_report | Routing accuracy, classifier stats, savings metrics, downshift rate |
llm_set_profile | Switch routing profile (budget / balanced / premium) |
llm_usage | Unified dashboard — Claude sub, Codex, APIs, savings in one view |
llm_health | Check provider availability and circuit breaker status |
llm_providers | List all supported and configured providers |
| Session Memory | |
llm_save_session | Summarize + persist current session for cross-session context injection |
Context injection: text tools (`llm_query`, `llm_research`, `llm_generate`, `llm_analyze`, `llm_code`) automatically prepend recent conversation history and previous session summaries to every external LLM call — so GPT-4o, Gemini, and Perplexity receive the same context you have. Pass `context="..."` to add caller-supplied context on top. Controlled by `LLM_ROUTER_CONTEXT_ENABLED` (default: on).
Routing Profiles
Three built-in profiles map to task complexity. Model order is pressure-aware — the router dynamically reorders chains based on live Claude subscription usage.
| Budget (simple) | Balanced (medium) | Premium (complex) | |
|---|---|---|---|
| Text | Ollama → Haiku → cheap | Sonnet → DeepSeek → GPT-4o | Opus → Sonnet → o3 |
| Research | Perplexity Sonar | Perplexity Sonar Pro | Perplexity Sonar Pro |
| Code | Ollama → Haiku → DeepSeek | Sonnet → DeepSeek → GPT-4o | Opus → Sonnet → DeepSeek-R1 → o3 |
| Image | Flux Dev, Imagen Fast | Flux Pro, Imagen 3, DALL-E 3 | Imagen 3, DALL-E 3 |
| Video | minimax, Veo 2 | Kling, Veo 2, Runway Turbo | Veo 2, Runway Gen-3 |
| Audio | OpenAI TTS | ElevenLabs | ElevenLabs |
Quota-aware chain reordering
Claude Pro/Max tokens are treated as free — the router uses them first. As quota is consumed, chains automatically reorder to preserve remaining Claude budget:
| Claude usage | Chain order |
|---|---|
| 0–84% | Claude first (free under subscription) |
| 85–98% | DeepSeek/Codex → cheap externals → Claude last |
| ≥ 99% (hard cap) | DeepSeek → Codex → cheap → paid — zero Claude |
| Research (any) | Perplexity always first (web-grounded) |
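The reordering rules in the table can be sketched as a single function. The `anthropic/` model-name prefix is an assumption used here to tell Claude models apart:

```python
def reorder_chain(chain, claude_usage_pct):
    """Reorder a model chain by live Claude quota (thresholds from the table)."""
    claude = [m for m in chain if m.startswith("anthropic/")]
    others = [m for m in chain if not m.startswith("anthropic/")]
    if claude_usage_pct < 85:
        return claude + others       # Claude first (free under subscription)
    if claude_usage_pct < 99:
        return others + claude       # preserve remaining Claude budget
    return others                    # hard cap: zero Claude
```

Research chains would bypass this entirely since Perplexity always leads there.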
Claude Code subscription mode
If you use Claude Code (Pro/Max), set LLM_ROUTER_CLAUDE_SUBSCRIPTION=true in .env. The router will never route to Anthropic via API — you're already on Claude, so API routing would require a separate key and add duplicate billing. Instead, every task routes to the best non-Claude alternative:
# In .env
LLM_ROUTER_CLAUDE_SUBSCRIPTION=true # no ANTHROPIC_API_KEY needed
At normal quota (< 85%), chains lead with the highest-quality available model. At high quota (> 85%), DeepSeek takes over — quality 1.0 benchmark score at ~1/8th the cost of GPT-4o:
| Low quota (< 85%) | High quota (> 85%) | |
|---|---|---|
| BUDGET/CODE | DeepSeek Chat | DeepSeek Chat |
| BALANCED/CODE | DeepSeek Chat | DeepSeek Chat |
| BALANCED/ANALYZE | DeepSeek Reasoner | DeepSeek Reasoner |
| PREMIUM/CODE | o3 | DeepSeek Reasoner |
| PREMIUM/ANALYZE | DeepSeek Reasoner | DeepSeek Reasoner |
Switch profile anytime:
llm_set_profile("budget") # Development, drafts, exploration
llm_set_profile("balanced") # Production work, client deliverables
llm_set_profile("premium") # Critical tasks, maximum quality
Budget Control
Set a monthly budget to prevent overspending:
# In .env
LLM_ROUTER_MONTHLY_BUDGET=50 # USD, 0 = unlimited
The router:
- Tracks real-time spend across all providers in SQLite
- Blocks requests when the monthly budget is reached
- Shows budget status in `llm_usage`
llm_usage("month")
## Usage Summary (month)
Calls: 142
Tokens: 240,000 in + 80,000 out = 320,000 total
Cost: $3.4200
Avg latency: 1200ms
### Budget Status
Monthly budget: $50.00
Spent this month: $3.4200 (6.8%)
Remaining: $46.5800
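The SQLite-backed enforcement amounts to summing this month's spend and refusing calls past the limit. A sketch; the `calls` table and `cost_usd` column are hypothetical names for illustration:

```python
import datetime
import sqlite3

def month_spend(conn: sqlite3.Connection) -> float:
    """Sum recorded costs since the first of the current month."""
    start = datetime.date.today().replace(day=1).isoformat()
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(cost_usd), 0) FROM calls WHERE date >= ?", (start,)
    ).fetchone()
    return total

def enforce_budget(conn: sqlite3.Connection, monthly_budget: float) -> None:
    """Raise before making a call once the budget is exhausted (0 = unlimited)."""
    if monthly_budget and month_spend(conn) >= monthly_budget:
        raise RuntimeError("monthly budget reached: request blocked")
```

Treating 0 as unlimited matches the `LLM_ROUTER_MONTHLY_BUDGET=0` convention shown above.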
Multi-Step Orchestration
Chain tasks across different models in a pipeline:
llm_orchestrate("Research AI trends and write a report", template="research_report")
Built-in templates:
| Template | Steps | Pipeline |
|---|---|---|
research_report | 3 | Research → Analyze → Write |
competitive_analysis | 4 | Multi-source research → SWOT → Report |
content_pipeline | 4 | Research → Draft → Review → Polish |
code_review_fix | 3 | Review → Fix → Test |
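A pipeline template is just an ordered list of steps where each step's output feeds the next. A minimal sketch, with hypothetical step names mirroring the table above:

```python
# Hypothetical step lists, mirroring the template table above
TEMPLATES = {
    "research_report": ["research", "analyze", "generate"],
    "code_review_fix": ["review", "fix", "test"],
}

def run_pipeline(task, template, run_step):
    """Run a template: `run_step(step, text)` routes one step to a model
    and returns its output, which becomes the next step's input."""
    output = task
    for step in TEMPLATES[template]:
        output = run_step(step, output)
    return output
```

In the real router each `run_step` would dispatch to a different provider (e.g. Perplexity for research, a cheap model for drafting), but the chaining logic is this simple.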
Configuration
Environment Variables
# Required: at least one provider
GEMINI_API_KEY=AIza... # Free tier! https://aistudio.google.com/apikey
OPENAI_API_KEY=sk-proj-...
PERPLEXITY_API_KEY=pplx-...
# Optional: more providers (add as many as you want)
ANTHROPIC_API_KEY=sk-ant-...
DEEPSEEK_API_KEY=...
GROQ_API_KEY=gsk_...
FAL_KEY=...
ELEVENLABS_API_KEY=...
# Router config
LLM_ROUTER_PROFILE=balanced # budget | balanced | premium
LLM_ROUTER_MONTHLY_BUDGET=0 # USD, 0 = unlimited
LLM_ROUTER_CLAUDE_SUBSCRIPTION=false # true = you're a Claude Code Pro/Max user;
# anthropic/* excluded, router uses non-Claude models
# Smart routing (Claude Code model selection)
DAILY_TOKEN_BUDGET=0 # tokens/day, 0 = unlimited
QUALITY_MODE=balanced # best | balanced | conserve
MIN_MODEL=haiku # floor: haiku | sonnet | opus
See .env.example for the full list of supported providers.
Claude Code Integration
After running ./scripts/install.sh, your ~/.claude.json will include:
{
"mcpServers": {
"llm-router": {
"command": "uv",
"args": ["run", "--directory", "/path/to/llm-router", "llm-router"]
}
}
}
Development
# Install with dev dependencies
uv sync --extra dev
# Run tests
uv run pytest -v
# Run integration tests (requires real API keys)
uv run pytest tests/test_integration.py -v
# Lint
uv run ruff check src/
Roadmap
See ROADMAP.md for the detailed roadmap with phases and priorities.
Completed (v0.1–v0.5)
- Core text LLM routing (10+ providers)
- Configurable profiles (budget / balanced / premium)
- Cost tracking with SQLite
- Health checks with circuit breaker
- Image generation (Gemini Imagen 3, DALL-E, Flux, SD)
- Video generation (Gemini Veo 2, Runway, Kling, minimax)
- Audio/voice routing (ElevenLabs, OpenAI TTS)
- Monthly budget enforcement
- Multi-step orchestration with pipeline templates
- Claude Code plugin with orchestrator agent and /route skill
- Freemium tier gating
- CI with GitHub Actions
- Smart complexity-first routing (simple->haiku, moderate->sonnet, complex->opus)
- Live Claude subscription monitoring (session %, weekly %, Sonnet %)
- Time-aware budget pressure (factors in session reset proximity)
- External fallback ranking when Claude is tight (Codex, OpenAI, Gemini)
- Codex desktop integration (local agent, free via OpenAI subscription)
- Unified usage dashboard (Claude sub + Codex + APIs + savings)
- `llm_setup` tool for API discovery and secure key management
- Per-provider budget limits
- ASCII box-drawing dashboard (terminal-friendly, no Unicode issues)
- Prompt classification cache (SHA-256 exact-match, in-memory LRU, 1h TTL)
- `llm_cache_stats` + `llm_cache_clear` MCP tools
- Auto-route hook (`UserPromptSubmit` heuristic classifier, zero-latency)
- Rate limit detection with smart cooldowns (15s rate limit vs 60s hard failure)
- `llm_setup(action='test')` — API key validation with minimal LLM calls
- Streaming responses (`llm_stream` tool + `call_llm_stream()` async generator)
- Usage auto-refresh hook (`PostToolUse` staleness detection + usage pulse wiring)
- Published to PyPI as `claude-code-llm-router`
- Multi-layer auto-classification: scoring heuristic → Ollama local LLM (qwen3.5) → cheap API (Gemini Flash/GPT-4o-mini)
- Savings awareness (PostToolUse hook tracks routed calls, periodic cost savings reminders)
- Structural context compaction (5 strategies: whitespace, comments, dedup, truncation, stack traces)
- Quality logging (`routing_decisions` table + `llm_quality_report` tool)
- Savings persistence (JSONL + SQLite import, lifetime analytics)
- Gemini media APIs (Imagen 3 images, Veo 2 video)
- Global hook installer (`llm_setup(action='install_hooks')` + `llm-router-install-hooks` CLI)
- Global routing rules (auto-installed to `~/.claude/rules/llm-router.md`)
- Session context injection (ring buffer + SQLite summaries, injected into all text tools)
- `llm_save_session` MCP tool (auto-summarize + persist session for future context)
- Cross-session memory (previous session summaries prepended to external LLM calls)
- Auto-update routing rules (version header + silent update on MCP startup after pip upgrade)
- Token arbitrage enforcement — routing hint override bug fixed; simple tasks now correctly route to cheap models
- Claude Code subscription mode (`LLM_ROUTER_CLAUDE_SUBSCRIPTION`) — exclude Anthropic from chains; route to DeepSeek/Gemini/GPT-4o instead
- Quality-cost tier sorting — within 5% quality band, prefer cheaper model (GPT-4o over Sonnet, DeepSeek over everyone when near-equal quality)
- DeepSeek Reasoner in cheap tier — $0.0014/1K leads at >85% pressure (was treated as "paid" tier alongside o3 at $0.025)
- Codex injection fix — no longer injected at position 0 when subscription mode removes Claude from chain (caused 300s timeouts)
- Codex task filtering — excluded from RESEARCH (no web access) and QUERY (too slow) chains
Completed (v0.7)
- Availability-aware routing — P95 latency from the `routing_decisions` table folded into the benchmark quality score. Penalty range 0.0–0.50 (<5s=0, <15s=0.03, <60s=0.10, <180s=0.30, ≥180s=0.50). A 60s cache prevents repeated DB hits per routing cycle.
- Codex cold-start defaults — `_COLD_START_LATENCY_MS` applies a pessimistic 60–90s P95 before any history exists, preventing Codex from being placed first in chains on a fresh install.
- `llm_edit` MCP tool — Routes code-edit reasoning to a cheap CODE model. Reads files locally (32 KB cap), gets `{file, old_string, new_string}` JSON back, returns formatted instructions for Claude to apply mechanically. Keeps Opus out of the "what to change" loop.
Next Up (v0.8 — Evaluation & Learning)
- Classification outcome tracking (was the routed model's response good?)
- A/B testing framework for routing decisions
- Adaptive routing based on historical success rates
Contributing
We welcome contributions! See CONTRIBUTING.md for guidelines.
Key areas where help is needed:
- Adding new provider integrations
- Improving routing intelligence
- Testing across different MCP clients
- Documentation and examples
License
MIT — use it however you want.