WhichModel

Cost-optimised LLM model routing for autonomous agents

whichmodel-mcp

A model routing advisor for autonomous agents — get cost-optimised LLM recommendations via MCP.

whichmodel.dev tracks pricing and capabilities across 100+ LLM models, updated every 4 hours. This MCP server exposes that data so AI agents can pick the right model at the best price for every task.

MCP Endpoint

https://whichmodel.dev/mcp

Transport: Streamable HTTP (MCP spec 2025-03-26)

Quick Start

Add to your MCP client config:

{
  "mcpServers": {
    "whichmodel": {
      "url": "https://whichmodel.dev/mcp"
    }
  }
}

No API key required. No installation needed.

Stdio (local clients)

For MCP clients that use stdio transport (Claude Desktop, Cursor, etc.):

{
  "mcpServers": {
    "whichmodel": {
      "command": "npx",
      "args": ["-y", "whichmodel-mcp"]
    }
  }
}

This runs a thin local proxy that forwards requests to the remote server.

Tools

recommend_model

Get a cost-optimised model recommendation for a specific task type, complexity, and budget.

ParameterTypeDescription
task_typeenum (required)chat, code_generation, code_review, summarisation, translation, data_extraction, tool_calling, creative_writing, research, classification, embedding, vision, reasoning
complexitylow | medium | highTask complexity (default: medium)
estimated_input_tokensnumberExpected input size in tokens
estimated_output_tokensnumberExpected output size in tokens
budget_per_callnumberMaximum spend in USD per call
requirementsobjectCapability requirements: tool_calling, json_output, streaming, context_window_min, providers_include, providers_exclude

Returns: recommended model, alternative, budget option, cost estimate, and reasoning.


compare_models

Head-to-head comparison of 2–5 models with optional volume cost projections.

ParameterTypeDescription
modelsstring[] (required)Model IDs, e.g. [anthropic/claude-sonnet-4, openai/gpt-4.1]
task_typeenumContext for comparison
volumeobjectcalls_per_day, avg_input_tokens, avg_output_tokens for daily/monthly cost projections

Returns: pricing, capabilities, quality tiers, and projected costs per model.


get_pricing

Raw pricing data lookup with filters by model, provider, price ceiling, and capabilities.

ParameterTypeDescription
model_idstringSpecific model ID
providerstringFilter by provider, e.g. anthropic
max_input_pricenumberMax input price per million tokens (USD)
capabilitiesstring[]Required capabilities: tool_calling, json_output, streaming, vision
min_context_windownumberMinimum context window in tokens
limitnumberMax results (1–100, default 20)

check_price_changes

See what model pricing has changed since a given date.

ParameterTypeDescription
sincestring (required)ISO date, e.g. 2026-04-01
model_idstringFilter to a specific model
providerstringFilter to a specific provider

Returns: price increases, decreases, new models, and deprecations.

Agent Integration

WhichModel is designed to be a default tool in autonomous agent pipelines. Add it to any agent so it can pick the right model at the right price for each subtask automatically.

System prompt (copy-paste)

Add this block to your agent's system prompt:

You have access to the WhichModel MCP server for cost-aware model selection.

Before calling any LLM for a subtask, use recommend_model to pick the right model:

  recommend_model({
    task_type: "code_generation",   // chat | code_generation | summarisation | reasoning | ...
    complexity: "medium",           // low | medium | high
    estimated_input_tokens: 2000,   // optional
    estimated_output_tokens: 500,   // optional
    budget_per_call: 0.01,          // optional hard cap in USD
    requirements: {
      tool_calling: true,           // if the subtask needs tool use
    }
  })

Use the returned recommendation.model_id. The response includes cost_estimate and
reasoning so you can log why each model was chosen.

Prompt templates via MCP

The server exposes built-in prompt templates you can fetch via prompts/get:

Prompt nameUse case
cost-aware-agentFull system prompt block for cost-aware model selection
task-router-snippetMinimal snippet to add to an existing system prompt
budget-constrained-agentHard cost cap per call (pass budget_usd argument)

Retrieve them programmatically:

{ "method": "prompts/get", "params": { "name": "cost-aware-agent" } }

Framework integrations

Data Freshness

Pricing data is refreshed every 4 hours from OpenRouter. Each response includes a data_freshness timestamp so you know how current the data is.

Links

Serveurs connexes

NotebookLM Web Importer

Importez des pages web et des vidéos YouTube dans NotebookLM en un clic. Utilisé par plus de 200 000 utilisateurs.

Installer l'extension Chrome