quelllm-mcp Server

Query a catalog of 250+ open-weights LLMs — list, compare, estimate VRAM and API-vs-self-hosted cost — directly from Claude Code, Cursor or any MCP client.

quelllm-mcp

MCP server exposing the quelllm.fr catalog of 190+ open-weights LLMs via Model Context Protocol tools. Use it from Claude Code, Cursor, Continue, or any MCP-compatible client to query models, compare them, estimate VRAM, and compute API vs self-hosted cost.

Tools exposed

Tool	Description
`list_models(filter_origin?, filter_family?, max_params_b?)`	List models with filters (origin code, family, max params in B)
`get_model(model_id)`	Full record for one model (params, VRAM per quant, context window, family, tags, license, URLs)
`compare(model_a_id, model_b_id)`	Side-by-side comparison with verdict
`estimate_vram(model_id, quant)`	VRAM in GB at chosen quant + recommended GPU/Mac tiers
`estimate_cost(input_tokens_per_month, output_tokens_per_month, ...)`	Cost in EUR: either the full table of API providers vs self-hosted hardware, or the result for one specific id
`search_models(query, limit?)`	Fuzzy search by name, family, tag, author
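For intuition about the kind of number `estimate_vram` returns, a common back-of-the-envelope rule is: weight memory is roughly parameters times bits per weight divided by 8, plus some headroom for the KV cache and activations. The sketch below implements that generic rule only. It is not the formula used by quelllm-mcp, and the bit widths per quant label, the 20% overhead factor, and the name `rough_vram_gb` are assumptions chosen for illustration.

```python
# Rough VRAM rule of thumb; NOT the formula used by quelllm-mcp.
# Bit widths and the overhead factor below are illustrative assumptions.
BITS_PER_WEIGHT = {"fp16": 16, "q8": 8, "q4": 4}


def rough_vram_gb(params_b: float, quant: str, overhead: float = 1.2) -> float:
    """Estimate VRAM in GB for a model with `params_b` billion parameters."""
    bits = BITS_PER_WEIGHT[quant]
    weights_gb = params_b * bits / 8  # 1B parameters at 8 bits is about 1 GB
    return round(weights_gb * overhead, 1)


print(rough_vram_gb(24, "q4"))  # 14.4
```

Under these assumptions a 24B model at 4-bit lands just under 16 GB, which is why the real tool's per-model figures are worth checking before buying hardware.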

Install

Install from source (not yet on PyPI):

pip install git+https://github.com/MGM-FALCON/quelllm-mcp.git

Or run without installing, using uv:

uvx --from git+https://github.com/MGM-FALCON/quelllm-mcp.git quelllm-mcp

For local development:

git clone https://github.com/MGM-FALCON/quelllm-mcp.git
cd quelllm-mcp
pip install -e .

Use with Claude Code

Add to ~/.claude.json or a project's .mcp.json. If you installed with pip:

{
  "mcpServers": {
    "quelllm": {
      "command": "quelllm-mcp"
    }
  }
}

Or zero-install with uvx:

{
  "mcpServers": {
    "quelllm": {
      "command": "uvx",
      "args": ["--from", "git+https://github.com/MGM-FALCON/quelllm-mcp.git", "quelllm-mcp"]
    }
  }
}
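If you would rather not hand-edit JSON, the entry can be merged programmatically. This is a minimal sketch, assuming the target file is a JSON object with an optional `mcpServers` key as in the snippets above; the helper name `add_quelllm_server` is hypothetical and not part of quelllm-mcp.

```python
import json
from pathlib import Path


def add_quelllm_server(config_path: Path, use_uvx: bool = False) -> dict:
    """Merge a `quelllm` entry into an MCP config file, keeping other keys."""
    # Start from the existing config if there is one, else from an empty object.
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    if use_uvx:
        entry = {
            "command": "uvx",
            "args": [
                "--from",
                "git+https://github.com/MGM-FALCON/quelllm-mcp.git",
                "quelllm-mcp",
            ],
        }
    else:
        entry = {"command": "quelllm-mcp"}
    # Only the "quelllm" server is added or overwritten; other servers stay.
    config.setdefault("mcpServers", {})["quelllm"] = entry
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

For example, `add_quelllm_server(Path(".mcp.json"))` writes the pip variant into the current project, and `use_uvx=True` writes the zero-install variant.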

Use with Claude Desktop

Edit ~/Library/Application Support/Claude/claude_desktop_config.json (macOS):

{
  "mcpServers": {
    "quelllm": {
      "command": "quelllm-mcp"
    }
  }
}

Use with Cursor / Continue / Cline

Most MCP clients accept the same JSON config:

{
  "command": "quelllm-mcp"
}

Example queries (from your client)

> Which Mistral LLMs can run on an RTX 5070 Ti 16GB?
→ list_models(filter_family='Mistral', max_params_b=24)
→ estimate_vram('mistral-small-24b', 'q4')

> Compare Llama 3.3 70B vs Qwen 2.5 32B
→ compare('llama33-70b', 'qwen25-32b')

> I use 10M input tokens + 2.5M output tokens per month. How much would I pay with OpenAI vs DeepSeek?
→ estimate_cost(10_000_000, 2_500_000)
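The API side of a cost comparison like the last one is plain per-token arithmetic: tokens divided by one million, times the price per million tokens, summed over input and output. A minimal sketch of that arithmetic follows. The prices are placeholders, not the values hardcoded in quelllm-mcp, and `monthly_api_cost_eur` is a hypothetical helper rather than the server's implementation.

```python
def monthly_api_cost_eur(
    input_tokens: int,
    output_tokens: int,
    price_in_per_m: float,
    price_out_per_m: float,
) -> float:
    """Monthly API cost in EUR, given prices per million tokens."""
    cost_in = input_tokens / 1_000_000 * price_in_per_m
    cost_out = output_tokens / 1_000_000 * price_out_per_m
    return round(cost_in + cost_out, 2)


# Placeholder prices (EUR per million tokens), for illustration only.
print(monthly_api_cost_eur(10_000_000, 2_500_000, 1.0, 4.0))  # 20.0
```

The self-hosted side of the comparison also depends on hardware and usage assumptions, which this sketch does not attempt to model.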

Data source

All data pulled from quelllm.fr/api/ (CC BY 4.0, no key, CORS-enabled). Cached locally for 1h to avoid rate-limiting.
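A one-hour cache of this kind can be pictured as a small time-to-live wrapper around the fetch function. The sketch below illustrates the idea and is not the server's actual code: the `TTLCache` class, the injected `fetch` callable, and the in-memory storage are all assumptions (the real cache may well live on disk).

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Cache fetch results per URL and refetch once an entry is too old."""

    def __init__(
        self,
        fetch: Callable[[str], Any],
        ttl_seconds: float = 3600.0,  # 1 hour, matching the README
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, url: str) -> Any:
        now = self._clock()
        hit = self._entries.get(url)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh: no network call
        value = self._fetch(url)
        self._entries[url] = (now, value)
        return value


# Stand-in fetch function so the example never touches the network.
cache = TTLCache(fetch=lambda url: {"source": url})
first = cache.get("https://quelllm.fr/api/")
second = cache.get("https://quelllm.fr/api/")
print(first is second)  # True: the second call is served from the cache
```

Injecting the clock and the fetch function keeps the cache testable without waiting an hour or hitting the API.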

API pricing data (GPT-5, Claude Opus 4.7, Gemini 2.5, DeepSeek, Mistral) and hardware pricing (RTX 50-series, Mac M4) are hardcoded as of 2026-05; re-verify them every six months.

License

MIT — see LICENSE.

Contributing

Source: https://github.com/MGM-FALCON/quelllm-mcp. Issues and PRs are welcome, particularly:

API pricing updates (every six months)
Hardware additions (new GPUs, Mac Mx series)
New tools (e.g. find_alternatives_to(model_id), recommend_gpu(budget_eur))

Tests

A pytest smoke suite lives under tests/. It covers all 6 tools and the v1.1.0 output invariants, never touches the network (local fixture + mocked httpx), and stubs the mcp SDK when it isn't importable — so it also runs on Python 3.9.

pip install -e ".[test]"
pytest
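Stubbing an optional dependency so tests can import the package without it can be done by registering a placeholder module in `sys.modules` before the import happens. The sketch below shows that general technique under stated assumptions: `ensure_module` is a hypothetical helper, it only handles top-level module names, and the actual test suite may stub the `mcp` SDK differently.

```python
import importlib
import importlib.util
import sys
import types


def ensure_module(name: str) -> types.ModuleType:
    """Return the real top-level module if importable, else register an empty stub."""
    if name in sys.modules:
        return sys.modules[name]
    if importlib.util.find_spec(name) is not None:
        return importlib.import_module(name)
    # Not installed: later `import name` statements will receive this stub.
    stub = types.ModuleType(name)
    sys.modules[name] = stub
    return stub


# Typically called from conftest.py, before the package under test is imported.
mcp_module = ensure_module("mcp")
print(mcp_module.__name__)  # mcp
```

A real stub usually also needs whatever attributes the code under test accesses at import time; which attributes those are depends on the package and is not shown here.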

Author

Mohamed Meguedmi (LinkedIn · Hugging Face), founder of La Gazette IA and QuelLLM.fr.