LLM Router

Multi-LLM routing MCP server — route text, image, video, and audio tasks to 20+ providers (OpenAI, Gemini, Perplexity, Anthropic, fal, ElevenLabs, Runway) with automatic complexity-based model selection, budget control, and provider failover.

A local control plane for AI coding tools.
Routes tasks to the cheapest model that can do the job well.
Protects quota. Enforces policy. Tracks spend. Falls back on failure.


Why This Exists

AI coding assistants route every task — simple questions, complex architecture — to the same expensive model. You pay full price for work that a cheaper model handles equally well.

llm-router sits between your AI tool and the LLM providers. It classifies each task by complexity, picks the cheapest capable model, and falls back through a provider chain on failure. You don't change your workflow. The router handles model selection automatically.

Use this if:

  • You use Claude Code, Codex CLI, Gemini CLI, or Pi and want to reduce spend
  • You want automatic fallback when a provider is down or rate-limited
  • You want local Ollama models tried first (free) before paid APIs
  • You want visibility into token spend across providers

Don't use this if:

  • You always want the best possible model regardless of cost
  • You don't use MCP-compatible tools
  • You need guaranteed latency (routing adds classification overhead)


Quick Start

1. Install

pip install llm-routing
llm-router install

Package name: llm-routing on PyPI. CLI command: llm-router.

2. Add providers (optional)

export OPENAI_API_KEY="sk-..."      # GPT-4o, o3
export GEMINI_API_KEY="AIza..."     # Gemini Flash/Pro (free tier available)
export OLLAMA_BASE_URL="http://localhost:11434"  # Local models (free)

Works with zero API keys on Claude Code Pro/Max subscriptions — routing uses MCP tools that call external models only when beneficial.

3. Verify

llm-router install --check   # Preview what will be installed
llm-router health            # Check provider connectivity

In Claude Code, ask a simple question. The session-end summary shows routing decisions and savings.


How It Works

User prompt
    │
    ▼
┌───────────────────────┐
│ Complexity Classifier │  ← Heuristic (free, instant) or Ollama/Flash ($0.0001)
└───────────┬───────────┘
            │
            ▼
┌───────────────────────┐
│  Free-First Router    │  ← Tries cheapest model first, walks up the chain
│                       │
│  Ollama (free)        │
│  → Codex (prepaid)    │
│  → Gemini Flash       │
│  → GPT-4o / Claude    │
└───────────┬───────────┘
            │
            ▼
┌───────────────────────┐
│  Guards (parallel)    │  ← Circuit breaker, budget pressure, quality check
└───────────┬───────────┘
            │
            ▼
      Response + cost logged to local SQLite
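
The fallback walk in the middle box is essentially a loop over the chain that skips providers whose circuit breaker is open. A minimal sketch of that idea, assuming a hypothetical `provider.complete()` call (llm-router's real internals differ):

```python
# Illustrative free-first failover with a circuit breaker; not llm-router's actual code.
import time
from dataclasses import dataclass

@dataclass
class Breaker:
    failures: int = 0
    opened_at: float = 0.0
    threshold: int = 3      # consecutive failures before opening
    cooldown: float = 60.0  # seconds before retrying an open provider

    def available(self) -> bool:
        return self.failures < self.threshold or time.time() - self.opened_at > self.cooldown

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = time.time()

def route(prompt: str, chain: list, breakers: dict) -> str:
    # Walk the chain cheapest-first; skip providers with an open breaker.
    for provider in chain:
        breaker = breakers.setdefault(provider.name, Breaker())
        if not breaker.available():
            continue
        try:
            return provider.complete(prompt)  # hypothetical provider API
        except Exception:
            breaker.record_failure()
    raise RuntimeError("all providers in the chain failed")
```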

Routing examples

| Task | Complexity | Chain |
|---|---|---|
| "What does this error mean?" | Simple | Ollama → Codex → Gemini Flash → Groq |
| "Implement OAuth" | Moderate | Ollama → Codex → GPT-4o → Gemini Pro |
| "Design distributed tracing" | Complex | Ollama → Codex → o3 → Claude Opus |

Classification is free (regex heuristics catch ~70% of tasks) or near-free (local Ollama / Gemini Flash for ambiguous cases).
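
To make the heuristic tier concrete, here is a toy classifier in the same spirit. The patterns and tier names are invented for illustration; the project's actual rules are more extensive:

```python
import re

# Invented example patterns; llm-router's real heuristics differ.
SIMPLE = re.compile(r"^(what|why|when|explain|define)\b|error mean", re.IGNORECASE)
COMPLEX = re.compile(r"\b(design|architect|distributed|migrate|refactor)\b", re.IGNORECASE)

def classify(prompt: str) -> str:
    """Return a coarse complexity tier, or 'ambiguous' to defer to a cheap LLM."""
    if COMPLEX.search(prompt):
        return "complex"
    if SIMPLE.search(prompt) and len(prompt) < 200:
        return "simple"
    return "ambiguous"  # falls through to Ollama / Gemini Flash classification

assert classify("What does this error mean?") == "simple"
assert classify("Design distributed tracing") == "complex"
```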


Host Support

| Host | Auto-Routing | MCP Tools | Savings Potential |
|---|---|---|---|
| Claude Code | Full (hooks) | 60 tools | 60–80% |
| Codex CLI | Full (hooks) | 60 tools | 60–80% |
| Gemini CLI | Full (hooks) | 60 tools | 50–70% |
| VS Code / Cursor | Manual | 60 tools | 30–50% |
| Any MCP client | Manual | 60 tools | Varies |

Full = hooks intercept prompts and route automatically. No workflow change needed. Manual = MCP tools are available; you invoke them explicitly (e.g., call llm_query).
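
Under the hood, a manual invocation is a standard MCP `tools/call` request. A sketch of the payload as a Python dict (the `prompt` argument name is an assumption; check the actual schema with `tools/list`):

```python
# Hypothetical llm_query invocation, shaped as an MCP JSON-RPC request.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "llm_query",
        "arguments": {"prompt": "What does this error mean?"},  # argument name assumed
    },
}
```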

llm-router install                    # Claude Code (default)
llm-router install --host codex       # Codex CLI
llm-router install --host gemini-cli  # Gemini CLI
llm-router install --host vscode      # VS Code
llm-router install --host cursor      # Cursor

See docs/HOST_SUPPORT_MATRIX.md for full details on each host.


What You Can Do

| Use case | How |
|---|---|
| Route simple questions to free local models | Auto (hooks) or llm_query |
| Protect Claude subscription quota | Budget pressure monitoring + auto-downgrade |
| Fall back across providers on failure | Automatic chain with circuit breakers |
| Track token spend and savings | llm_usage, llm_savings, session-end reports |
| Enforce routing policy for your team | LLM_ROUTER_POLICY=aggressive |
| Generate images/video/audio | llm_image, llm_video, llm_audio |
| Run multi-step research pipelines | llm_orchestrate with templates |
| Bulk-edit files with cheap models | llm_fs_edit_many |

Providers

Routing chains are built from your configured providers. You only need one.

Text LLM Providers

| Provider | Models | Cost | Setup |
|---|---|---|---|
| Ollama | gemma4, qwen3.5, llama3, etc. | Free (local) | OLLAMA_BASE_URL |
| OpenAI | GPT-4o, o3, GPT-4o-mini | Paid API | OPENAI_API_KEY |
| Google | Gemini Flash, Pro | Free tier + paid | GEMINI_API_KEY |
| Anthropic | Claude Sonnet, Opus, Haiku | Paid API or subscription | ANTHROPIC_API_KEY or subscription |
| xAI | Grok-3 | Paid API | XAI_API_KEY |
| DeepSeek | DeepSeek Chat, Reasoner | Paid API (ultra-cheap) | DEEPSEEK_API_KEY |
| Mistral | Mistral Large, Small | Paid API | MISTRAL_API_KEY |
| Cohere | Command R+ | Paid API | COHERE_API_KEY |
| Perplexity | Sonar Pro (web-grounded) | Paid API | PERPLEXITY_API_KEY |
| Groq | Fast inference (Llama, Mixtral) | Free tier | GROQ_API_KEY |
| Together | Open-source models | Paid API | TOGETHER_API_KEY |
| HuggingFace | Open-source models | Free tier + paid | HF_TOKEN |
| Codex | GPT-5.4, o3 (prepaid desktop) | Included with Codex CLI | Auto-detected |

Media Providers

| Provider | Type | Setup |
|---|---|---|
| fal | Image (Flux), Video (Kling) | FAL_KEY |
| Stability | Image (Stable Diffusion 3) | STABILITY_API_KEY |
| ElevenLabs | Audio / TTS | ELEVENLABS_API_KEY |
| Runway | Video (Gen-3) | RUNWAY_API_KEY |
| Replicate | Various open-source models | REPLICATE_API_TOKEN |

See docs/PROVIDERS.md for setup instructions and model recommendations.


Routing Policies

Control how aggressively the router offloads to cheap models.

| Policy | Confidence Threshold | Typical Savings | Best For |
|---|---|---|---|
| Aggressive | 2 | 60–75% | Maximum cost reduction |
| Balanced (default) | 4 | 35–45% | Cost/quality tradeoff |
| Conservative | 6 | 10–15% | Quality over cost |

export LLM_ROUTER_POLICY=aggressive     # Or: balanced, conservative
export LLM_ROUTER_ENFORCE=smart          # smart | hard | soft | off
export LLM_ROUTER_PROFILE=balanced       # budget | balanced | premium

LLM_ROUTER_ENFORCE controls how strictly the auto-route hook blocks direct model use (see the sketch after this list):

  • smart — route when confident, pass through when uncertain
  • hard — always route, block unrouted tool calls
  • soft — suggest routing, never block
  • off — disable hook enforcement
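
Read as code, the four modes reduce to a small decision. This sketch only illustrates the semantics above; the real hook's confidence logic is not shown here:

```python
# Illustrative decision logic for the four enforcement modes.
def enforce(mode: str, confident: bool) -> str:
    """What the auto-route hook does with an intercepted prompt."""
    if mode == "off":
        return "pass-through"     # hook disabled entirely
    if mode == "soft":
        return "suggest-routing"  # advise, never block
    if mode == "hard":
        return "route"            # always route, block direct model use
    # mode == "smart": route only when the classifier is confident
    return "route" if confident else "pass-through"
```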

MCP Tools (60)

llm-router exposes 60 MCP tools organized by function:

| Category | Tools | Examples |
|---|---|---|
| Routing & classification | 7 | llm_route, llm_classify, llm_auto, llm_stream |
| Text generation | 6 | llm_query, llm_code, llm_analyze, llm_research |
| Media generation | 3 | llm_image, llm_video, llm_audio |
| Pipeline orchestration | 2 | llm_orchestrate, llm_pipeline_templates |
| Admin & monitoring | 20+ | llm_usage, llm_budget, llm_health, llm_savings |
| Filesystem operations | 4 | llm_fs_find, llm_fs_edit_many |
| Subscription tracking | 3 | llm_check_usage, llm_refresh_claude_usage |

Slim mode (LLM_ROUTER_SLIM=routing or core) reduces registered tools to save context tokens in constrained environments.

Full Tool Reference


Savings: How It Works

Savings are calculated by comparing actual spend against a baseline of routing every task to Claude Sonnet/Opus.

Methodology (sketched in code after this list):

  1. Each routed task logs: model used, tokens consumed, estimated cost
  2. A baseline cost is computed as if the same tokens were processed by the most expensive model in the chain
  3. Savings = (baseline - actual) / baseline
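
In code, the calculation is a few lines. A sketch using the stated approximations; the per-token prices below are placeholders, not LiteLLM's actual tables:

```python
# Sketch of the savings formula; prices are placeholders (USD per 1M tokens).
BASELINE_PRICE = 15.00  # placeholder: most expensive model in the chain
ACTUAL_PRICE = 0.10     # placeholder: model the task actually routed to

def estimate_tokens(text: str) -> int:
    return len(text) // 4  # the len(text)/4 approximation noted below

def savings(text: str) -> float:
    tokens = estimate_tokens(text)
    baseline = tokens / 1_000_000 * BASELINE_PRICE
    actual = tokens / 1_000_000 * ACTUAL_PRICE
    return (baseline - actual) / baseline

print(f"{savings('some routed prompt' * 100):.0%}")  # 99% with these placeholder prices
```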

Assumptions and limitations:

  • Baseline assumes you would have used Opus/Sonnet for everything (worst case)
  • Token estimates use len(text) / 4 approximation, not exact tokenizer counts
  • Cost data comes from LiteLLM's pricing tables (may lag provider price changes)
  • Savings vary significantly by workload — code-heavy sessions route more to cheap models
  • The router itself adds small overhead (classification costs ~$0.0001 per ambiguous task)

Observed range: 35–80% savings depending on policy and task mix. The "87%" figure in some docs represents a single-user peak over a specific development period, not a guaranteed outcome.


Trust, Privacy, and Local-First Design

llm-router runs entirely on your machine. There is no hosted proxy, no telemetry, no account required.

| What | Where | Details |
|---|---|---|
| Your prompts | Sent to configured providers | Exactly like using those providers directly |
| API keys | .env or ~/.llm-router/config.yaml | Local files, never transmitted |
| Usage logs | ~/.llm-router/usage.db | Unencrypted SQLite (filesystem permissions); see the sketch below |
| Classification cache | In-memory | Cleared on process restart |
| Hook scripts | ~/.claude/hooks/ | Local shell scripts, inspectable |
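
Because the usage log is plain SQLite, you can inspect it with the Python standard library. Listing the tables first avoids guessing at the schema:

```python
import sqlite3
from pathlib import Path

# Open the local usage log read-only and list its tables.
db = Path.home() / ".llm-router" / "usage.db"
conn = sqlite3.connect(f"file:{db}?mode=ro", uri=True)
for (name,) in conn.execute("SELECT name FROM sqlite_master WHERE type='table'"):
    print(name)
conn.close()
```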

What we do:

  • Scrub API keys from structured logs
  • Detect hook deadlocks before installation
  • Store all data locally in ~/.llm-router/
  • Respect provider rate limits and TOS

What you should know:

  • Prompts are sent to whichever provider the router selects — review your provider's privacy policy
  • Usage logs (SQLite) are not encrypted at rest — use full-disk encryption if needed
  • The router cannot prevent model jailbreaks or prompt injection at the provider level

See SECURITY.md for responsible disclosure policy and docs/SECURITY_DESIGN.md for the full threat model.


Configuration

Minimal setup — only configure what you have:

# Provider keys (set any combination)
export OPENAI_API_KEY="sk-proj-..."
export GEMINI_API_KEY="AIza..."
export OLLAMA_BASE_URL="http://localhost:11434"
export OLLAMA_BUDGET_MODELS="gemma4:latest,qwen3.5:latest"

# Routing behavior
export LLM_ROUTER_PROFILE="balanced"       # budget | balanced | premium
export LLM_ROUTER_POLICY="balanced"        # aggressive | balanced | conservative
export LLM_ROUTER_ENFORCE="smart"          # smart | hard | soft | off

For teams or environments where .env is restricted:

# User-level config (no project .env needed)
mkdir -p ~/.llm-router && chmod 700 ~/.llm-router
cat > ~/.llm-router/config.yaml << 'EOF'
openai_api_key: "sk-proj-..."
gemini_api_key: "AIza..."
ollama_base_url: "http://localhost:11434"
llm_router_profile: "balanced"
EOF
chmod 600 ~/.llm-router/config.yaml

Documentation

| Document | Purpose |
|---|---|
| Quick Start (2 min) | Fastest path to working routing |
| Getting Started | Full setup walkthrough |
| Host Support Matrix | Per-host feature comparison |
| Providers | Provider setup and model recommendations |
| Tool Reference | All 60 MCP tools with examples |
| Architecture | Internal design and module structure |
| Troubleshooting | Common issues and fixes |
| Security Design | Threat model and data handling |

Contributing

Contributions welcome. See CONTRIBUTING.md for full guidelines.

git clone https://github.com/ypollak2/llm-router.git
cd llm-router
uv sync --extra dev
uv run pytest tests/ -q         # Run tests (1700+)
uv run ruff check src/ tests/   # Lint

Package Names

| Name | What it is |
|---|---|
| llm-routing | Current PyPI package (pip install llm-routing) |
| llm-router | CLI command and GitHub repo name |
| claude-code-llm-router | Deprecated legacy package (redirects to llm-routing) |

Issues · PyPI · Changelog

MIT License
