WhichModel

Cost-optimised LLM model routing for autonomous agents

whichmodel-mcp

A model routing advisor for autonomous agents — get cost-optimised LLM recommendations via MCP.

whichmodel.dev tracks pricing and capabilities across 100+ LLM models, updated every 4 hours. This MCP server exposes that data so AI agents can pick the right model at the best price for every task.

MCP Endpoint

https://whichmodel.dev/mcp

Transport: Streamable HTTP (MCP spec 2025-03-26)

Quick Start

Add to your MCP client config:

{
  "mcpServers": {
    "whichmodel": {
      "url": "https://whichmodel.dev/mcp"
    }
  }
}

No API key required. No installation needed.

Stdio (local clients)

For MCP clients that use stdio transport (Claude Desktop, Cursor, etc.):

{
  "mcpServers": {
    "whichmodel": {
      "command": "npx",
      "args": ["-y", "whichmodel-mcp"]
    }
  }
}

This runs a thin local proxy that forwards requests to the remote server.

Tools

`recommend_model`

Get a cost-optimised model recommendation for a specific task type, complexity, and budget.

Parameter	Type	Description
`task_type`	enum (required)	`chat`, `code_generation`, `code_review`, `summarisation`, `translation`, `data_extraction`, `tool_calling`, `creative_writing`, `research`, `classification`, `embedding`, `vision`, `reasoning`
`complexity`	`low` \| `medium` \| `high`	Task complexity (default: `medium`)
`estimated_input_tokens`	number	Expected input size in tokens
`estimated_output_tokens`	number	Expected output size in tokens
`budget_per_call`	number	Maximum spend in USD per call
`requirements`	object	Capability requirements: `tool_calling`, `json_output`, `streaming`, `context_window_min`, `providers_include`, `providers_exclude`

Returns: recommended model, alternative, budget option, cost estimate, and reasoning.

`compare_models`

Head-to-head comparison of 2–5 models with optional volume cost projections.

Parameter	Type	Description
`models`	string[] (required)	Model IDs, e.g. `[anthropic/claude-sonnet-4, openai/gpt-4.1]`
`task_type`	enum	Context for comparison
`volume`	object	`calls_per_day`, `avg_input_tokens`, `avg_output_tokens` for daily/monthly cost projections

Returns: pricing, capabilities, quality tiers, and projected costs per model.

`get_pricing`

Raw pricing data lookup with filters by model, provider, price ceiling, and capabilities.

Parameter	Type	Description
`model_id`	string	Specific model ID
`provider`	string	Filter by provider, e.g. `anthropic`
`max_input_price`	number	Max input price per million tokens (USD)
`capabilities`	string[]	Required capabilities: `tool_calling`, `json_output`, `streaming`, `vision`
`min_context_window`	number	Minimum context window in tokens
`limit`	number	Max results (1–100, default 20)

`check_price_changes`

See what model pricing has changed since a given date.

Parameter	Type	Description
`since`	string (required)	ISO date, e.g. `2026-04-01`
`model_id`	string	Filter to a specific model
`provider`	string	Filter to a specific provider

Returns: price increases, decreases, new models, and deprecations.

Agent Integration

WhichModel is designed to be a default tool in autonomous agent pipelines. Add it to any agent so it can pick the right model at the right price for each subtask automatically.

System prompt (copy-paste)

Add this block to your agent's system prompt:

You have access to the WhichModel MCP server for cost-aware model selection.

Before calling any LLM for a subtask, use recommend_model to pick the right model:

  recommend_model({
    task_type: "code_generation",   // chat | code_generation | summarisation | reasoning | ...
    complexity: "medium",           // low | medium | high
    estimated_input_tokens: 2000,   // optional
    estimated_output_tokens: 500,   // optional
    budget_per_call: 0.01,          // optional hard cap in USD
    requirements: {
      tool_calling: true,           // if the subtask needs tool use
    }
  })

Use the returned recommendation.model_id. The response includes cost_estimate and
reasoning so you can log why each model was chosen.

Prompt templates via MCP

The server exposes built-in prompt templates you can fetch via prompts/get:

Prompt name	Use case
`cost-aware-agent`	Full system prompt block for cost-aware model selection
`task-router-snippet`	Minimal snippet to add to an existing system prompt
`budget-constrained-agent`	Hard cost cap per call (pass `budget_usd` argument)

Retrieve them programmatically:

{ "method": "prompts/get", "params": { "name": "cost-aware-agent" } }

Framework integrations

LangChain: langchain-whichmodel — WhichModelRouter chain
Haystack: whichmodel-haystack — WhichModelRouter component

Data Freshness

Pricing data is refreshed every 4 hours from OpenRouter. Each response includes a data_freshness timestamp so you know how current the data is.

Servidores relacionados

Scout Monitoring MCP

patrocinador

Put performance and error data directly in the hands of your AI assistant.

Alpha Vantage MCP Server

patrocinador

Access financial market data: realtime & historical stock, ETF, options, forex, crypto, commodities, fundamentals, technical indicators, & more

MCP Playground

A playground for MCP implementations featuring multiple microservices, including news and weather examples.

APIWeaver

Dynamically creates MCP servers from web API configurations, integrating any REST API, GraphQL endpoint, or web service into MCP-compatible tools.

iOS Device Control

An MCP server to control iOS simulators and real devices, enabling AI assistant integration on macOS.

Layered Code

An AI-assisted web development tool for creating, modifying, and deploying code through natural language conversations.

Hetzner Cloud MCP Server — (Cloud API + SSH)

Hetzner Cloud MCP Server — two management layers (Cloud API + SSH) with 60 tools. Manage server power, snapshots, firewalls, DNS, plus SSH into servers for service control, log viewing, Nginx management, MySQL queries, and system monitoring. Self-hosted PHP, MIT licensed.

MCP Spring Boot Actuator

Spring Boot Actuator MCP server — analyzes health, metrics, environment, beans, and startup endpoints. Detects configuration issues and security risks with actionable recommendations.

AC to Automation Converter

An AI-powered system that converts Acceptance Criteria (AC) from QA specifications into automated browser testing workflows.

Minecraft MCP Server

A Python MCP server to control a Minecraft server via RCON using FastMCP.

QuantConnect

A server for local interactions with the QuantConnect API.

MCP Server AI extension

Provides AI extension capabilities via the Model Context Protocol.