WhichModel
Cost-optimised LLM model routing for autonomous agents
whichmodel-mcp
A model routing advisor for autonomous agents — get cost-optimised LLM recommendations via MCP.
whichmodel.dev tracks pricing and capabilities across 100+ LLM models, updated every 4 hours. This MCP server exposes that data so AI agents can pick the right model at the best price for every task.
MCP Endpoint
https://whichmodel.dev/mcp
Transport: Streamable HTTP (MCP spec 2025-03-26)
Quick Start
Add to your MCP client config:
{
"mcpServers": {
"whichmodel": {
"url": "https://whichmodel.dev/mcp"
}
}
}
No API key required. No installation needed.
Stdio (local clients)
For MCP clients that use stdio transport (Claude Desktop, Cursor, etc.):
{
"mcpServers": {
"whichmodel": {
"command": "npx",
"args": ["-y", "whichmodel-mcp"]
}
}
}
This runs a thin local proxy that forwards requests to the remote server.
Tools
recommend_model
Get a cost-optimised model recommendation for a specific task type, complexity, and budget.
| Parameter | Type | Description |
|---|---|---|
task_type | enum (required) | chat, code_generation, code_review, summarisation, translation, data_extraction, tool_calling, creative_writing, research, classification, embedding, vision, reasoning |
complexity | low | medium | high | Task complexity (default: medium) |
estimated_input_tokens | number | Expected input size in tokens |
estimated_output_tokens | number | Expected output size in tokens |
budget_per_call | number | Maximum spend in USD per call |
requirements | object | Capability requirements: tool_calling, json_output, streaming, context_window_min, providers_include, providers_exclude |
Returns: recommended model, alternative, budget option, cost estimate, and reasoning.
compare_models
Head-to-head comparison of 2–5 models with optional volume cost projections.
| Parameter | Type | Description |
|---|---|---|
models | string[] (required) | Model IDs, e.g. [anthropic/claude-sonnet-4, openai/gpt-4.1] |
task_type | enum | Context for comparison |
volume | object | calls_per_day, avg_input_tokens, avg_output_tokens for daily/monthly cost projections |
Returns: pricing, capabilities, quality tiers, and projected costs per model.
get_pricing
Raw pricing data lookup with filters by model, provider, price ceiling, and capabilities.
| Parameter | Type | Description |
|---|---|---|
model_id | string | Specific model ID |
provider | string | Filter by provider, e.g. anthropic |
max_input_price | number | Max input price per million tokens (USD) |
capabilities | string[] | Required capabilities: tool_calling, json_output, streaming, vision |
min_context_window | number | Minimum context window in tokens |
limit | number | Max results (1–100, default 20) |
check_price_changes
See what model pricing has changed since a given date.
| Parameter | Type | Description |
|---|---|---|
since | string (required) | ISO date, e.g. 2026-04-01 |
model_id | string | Filter to a specific model |
provider | string | Filter to a specific provider |
Returns: price increases, decreases, new models, and deprecations.
Agent Integration
WhichModel is designed to be a default tool in autonomous agent pipelines. Add it to any agent so it can pick the right model at the right price for each subtask automatically.
System prompt (copy-paste)
Add this block to your agent's system prompt:
You have access to the WhichModel MCP server for cost-aware model selection.
Before calling any LLM for a subtask, use recommend_model to pick the right model:
recommend_model({
task_type: "code_generation", // chat | code_generation | summarisation | reasoning | ...
complexity: "medium", // low | medium | high
estimated_input_tokens: 2000, // optional
estimated_output_tokens: 500, // optional
budget_per_call: 0.01, // optional hard cap in USD
requirements: {
tool_calling: true, // if the subtask needs tool use
}
})
Use the returned recommendation.model_id. The response includes cost_estimate and
reasoning so you can log why each model was chosen.
Prompt templates via MCP
The server exposes built-in prompt templates you can fetch via prompts/get:
| Prompt name | Use case |
|---|---|
cost-aware-agent | Full system prompt block for cost-aware model selection |
task-router-snippet | Minimal snippet to add to an existing system prompt |
budget-constrained-agent | Hard cost cap per call (pass budget_usd argument) |
Retrieve them programmatically:
{ "method": "prompts/get", "params": { "name": "cost-aware-agent" } }
Framework integrations
- LangChain:
langchain-whichmodel—WhichModelRouterchain - Haystack:
whichmodel-haystack—WhichModelRoutercomponent
Data Freshness
Pricing data is refreshed every 4 hours from OpenRouter. Each response includes a data_freshness timestamp so you know how current the data is.
Links
- Website: whichmodel.dev
- MCP endpoint: https://whichmodel.dev/mcp
- Discovery: https://whichmodel.dev/.well-known/mcp.json
관련 서버
Scout Monitoring MCP
스폰서Put performance and error data directly in the hands of your AI assistant.
Alpha Vantage MCP Server
스폰서Access financial market data: realtime & historical stock, ETF, options, forex, crypto, commodities, fundamentals, technical indicators, & more
Azure DevOps MCP Server for Cursor
Integrate Azure DevOps with Cursor IDE. Provides over 60 tools using PAT authentication, no CLI needed.
Universal Crypto MCP
Enable AI agents to interact with any EVM blockchain through natural language.
Flow MCP
A set of tools for interacting with the Flow blockchain through the Model Context Protocol.
APIWeaver
Dynamically creates MCP servers from web API configurations, integrating any REST API, GraphQL endpoint, or web service into MCP-compatible tools.
MCP LaTeX Server
Create, edit, and manage LaTeX files. Requires an external LaTeX distribution like MiKTeX, TeX Live, or MacTeX.
AI Pair Programmer (Ruby)
AI-powered tools for code review, brainstorming, performance analysis, and security review in Ruby.
Raysurfer Code Caching
MCP server for LLM output caching and reuse. Caches and retrieves code from prior AI agent executions, delivering cached outputs up to 30x faster.
PHP MCP Server
Provides semantic PHP code analysis and refactoring tools, enabling AI assistants to perform safe, intelligent code transformations at the AST level.
MCP HTTP Requests
A comprehensive HTTP client for API testing, web automation, and security testing with detailed logging.
Framer Plugin MCP Server
Create and manage Framer plugins with web3 capabilities.