LLMTest
LLM proxy that benchmarks AI models on real prompts and finds cheaper, faster alternatives across 340+ models.
LLMTest MCP Server
MCP server that benchmarks AI models on your actual prompts and finds cheaper, faster alternatives. Works with Claude Code, Cursor, Windsurf, and any MCP-compatible tool.
Quick Start
1. Get your API key
Sign up at llmtest.io and grab your API key from the dashboard.
2. Add to your tool
Claude Code:
claude mcp add llmtest -- npx llmtest-mcp
Then set your key:
export LLMTEST_API_KEY=llmt_your_key_here
Cursor / Windsurf / Other MCP clients:
Add to your MCP config file:
{
"mcpServers": {
"llmtest": {
"command": "npx",
"args": ["llmtest-mcp"],
"env": {
"LLMTEST_API_KEY": "llmt_your_key_here"
}
}
}
}
3. Talk to your AI
Just ask in natural language:
- "Check my LLMTest status"
- "Find cheaper models for my AI calls"
- "Run a benchmark on my blog-writer flow"
- "What models are trending?"
How It Works
LLMTest is a proxy that sits between your app and AI providers. Point your app at https://llmtest.io/v1 instead of calling OpenAI/Anthropic directly, and LLMTest tracks your usage, benchmarks alternatives, and suggests cost savings.
This MCP server gives your AI assistant access to LLMTest's tools so it can manage everything for you.
Available Tools
| Tool | Description |
|---|---|
status | Show proxy status and activity summary |
list_flows | List all AI flows with cost and latency stats |
get_suggestions | Get pending model-switch recommendations |
update_suggestion | Accept or dismiss a suggestion |
run_benchmark | Benchmark a flow against challenger models |
optimize_prompt | Rewrite a flow's prompt and find a cheaper model that still works |
seed_samples | Add test prompts for pre-launch benchmarking |
list_samples | Show stored test samples per flow |
list_new_models | Show new and trending models |
get_account | Check credit balance and usage |
get_autopilot_status | Check whether autopilot is on and whether the account is eligible |
enable_autopilot | Turn on weekly auto-optimization with safety gates + drift-based auto-revert |
disable_autopilot | Turn off autopilot (existing optimizations stay active) |
list_active_optimizations | List auto-accepted optimizations still inside their 24h revert window |
revert_optimization | Roll an auto-accepted optimization back to the previous prompt |
Autopilot
Autopilot automatically optimizes your flows on a weekly cadence. Changes that pass every safety gate go live with a 24-hour revert window. Drift detection keeps checking after that and rolls back if quality slips.
To enable from your IDE: ask your AI assistant something like "enable LLMTest autopilot". It will call enable_autopilot. Use get_autopilot_status to confirm prerequisites.
Prerequisites (checked per flow each cycle):
- Autopilot enabled on the account
- Email verified
- Account age ≥ 14 days (trust ramp)
- Flow has ≥ 20 real calls in the last 7 days
- Flow not optimized by autopilot in the last 14 days (cooldown)
- Positive credit balance (~$1–2 per run)
Safety gates (all must pass for auto-accept): 95% CI lower bound > 50% win rate, multi-judge agreement ≥ 80%, ≥ 20% total savings, no length-bias warning, golden-set regression check.
Revert: 24h window after auto-accept. After that, only drift detection can roll back.
Typical Workflow
Pre-launch (no traffic yet):
- Tell your AI: "I'm building a support chatbot using gpt-4o"
- It seeds realistic test samples with
seed_samples - It runs
run_benchmarkto compare models - It shows you
get_suggestionswith cheaper alternatives
Post-launch (with real traffic):
- Route your AI calls through
https://llmtest.io/v1 - LLMTest monitors usage and auto-benchmarks when flows hit 50+ calls
- Ask "any cost-saving suggestions?" to see recommendations
- Accept a suggestion and update your code
Environment Variables
| Variable | Required | Description |
|---|---|---|
LLMTEST_API_KEY | Yes | Your API key from llmtest.io/dashboard |
LLMTEST_BASE_URL | No | Custom API URL (defaults to https://llmtest.io) |
Links
License
MIT
Related Servers
Alpha Vantage MCP Server
sponsorAccess financial market data: realtime & historical stock, ETF, options, forex, crypto, commodities, fundamentals, technical indicators, & more
Generic API MCP Server
A generic server to interact with any REST API, allowing you to query data, create items, and call methods.
Nextflow Developer Tools
An MCP server for Nextflow development and testing, which requires a local clone of the Nextflow Git repository.
MCP Server Starter
A starter project for building MCP servers with TypeScript and Bun.
PocketLantern
Blocker-aware decision layer for AI coding agents. Adds source-linked, time-sensitive blockers to AI technical choices — breaking changes, EOLs, lock-in, pricing shifts, and migration risk.
mcbedrock-mcp
Gives your AI assistants access to Minecraft Bedrock Edition scripting and addon documentation.
Bazel MCP Server
Exposes the Bazel build system to AI agents, enabling them to build, query, test, and manage dependencies.
AGS Extend SDK MCP Server
An MCP server to help AI assistants to answer questions and generate AccelByte Extend SDK code more effectively .
Clangaroo
Provides fast C++ code intelligence for LLMs using the clangd language server.
Feishu MCP Server
An MCP server with built-in Feishu OAuth authentication, deployable on Cloudflare Workers.
SolHunt Solana Wallet Intelligence
Solana wallet health analysis platform and top-notch dev tool. Helps people and agents to recover their SOLs from burner and old wallets super securely. Features a complete trustless recovery flow natively via MCP: preview yields, build unsigned transactions, and sign locally.