LLMKit
AI cost tracking MCP server with 11 tools for spend analytics, budget enforcement, and session costs across Claude Code, Cursor, and Cline.
Know what your AI agents cost.
Open-source API gateway for AI providers. Logs every request with token counts and dollar costs.
Budget limits reject requests before they reach the provider, not after.
```
$ npx @f3d1/llmkit-cli -- python my_agent.py
$0.0215 total   3 requests   4.2s   ~$18.43/hr
claude-sonnet-4-20250514   1 req    $0.0156  ████████████████████
gpt-4o                     2 reqs   $0.0059  ███████░░░░░░░░░░░░░
```
Works with Python, Ruby, Go, Rust, or anything else that calls the OpenAI or Anthropic API. One command, no code changes.
Get started
- Create an account at llmkit.sh (free while in beta)
- Create an API key in the Keys tab
- Pick a method below
CLI
Wrap any command. The CLI intercepts API calls, forwards them through the proxy, and prints a cost summary when the process exits.
```bash
npx @f3d1/llmkit-cli -- python my_agent.py
```
Use -v for per-request costs as they happen, --json for machine-readable output.
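How the CLI intercepts calls isn't specified here. One common way to get the same "no code changes" effect is to launch the child process with the SDKs' base-URL environment variables pointed at a proxy; the official OpenAI and Anthropic Python SDKs read `OPENAI_BASE_URL` and `ANTHROPIC_BASE_URL` when no `base_url` is passed. A minimal sketch assuming that approach; `run_through_proxy` and `PROXY_URL` are hypothetical, not the CLI's actual implementation:

```python
import os
import subprocess
import sys

# Hypothetical proxy address (the self-host dev proxy port); not the CLI's real value.
PROXY_URL = "http://localhost:8787/v1"


def run_through_proxy(cmd: list[str], proxy_url: str = PROXY_URL) -> subprocess.CompletedProcess:
    """Run `cmd` with SDK base-URL env vars redirected to a proxy (sketch)."""
    env = os.environ.copy()
    # Both official Python SDKs fall back to these variables for their base URL.
    env["OPENAI_BASE_URL"] = proxy_url
    env["ANTHROPIC_BASE_URL"] = proxy_url
    return subprocess.run(cmd, env=env, capture_output=True, text=True)


if __name__ == "__main__":
    # The child process sees the redirected base URL without any code changes.
    result = run_through_proxy(
        [sys.executable, "-c", "import os; print(os.environ['OPENAI_BASE_URL'])"]
    )
    print(result.stdout.strip())
```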
Python
```bash
pip install llmkit-sdk
```
With the proxy (budget enforcement, logging, dashboard):
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.llmkit.sh/v1",
    api_key="llmk_your_key_here",
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "hello"}],
)
```
Without the proxy (local cost estimation, zero setup):
```python
from llmkit import tracked
from openai import OpenAI

client = OpenAI(http_client=tracked())

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "hello"}],
)
# costs estimated locally from bundled pricing table
```
tracked() wraps your HTTP client and estimates costs from token usage. No proxy needed. Works with any SDK that accepts http_client.
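Local estimation comes down to multiplying the token counts in a response's `usage` field by per-token prices. A minimal sketch, assuming OpenAI-style `prompt_tokens`/`completion_tokens` fields and prices in USD per million tokens; `PRICES`, its numbers, `Cost`, and `estimate_cost` are illustrative placeholders, not LLMKit's bundled pricing table or API:

```python
from dataclasses import dataclass

# Illustrative prices in USD per 1M tokens (placeholders, not real pricing data).
PRICES = {
    "example-model": {"input": 2.50, "output": 10.00},
}


@dataclass
class Cost:
    input_cost: float
    output_cost: float

    @property
    def total_cost(self) -> float:
        return self.input_cost + self.output_cost


def estimate_cost(model: str, usage: dict, prices: dict = PRICES) -> Cost:
    """Estimate request cost from an OpenAI-style `usage` object."""
    rate = prices[model]
    input_cost = usage.get("prompt_tokens", 0) * rate["input"] / 1_000_000
    output_cost = usage.get("completion_tokens", 0) * rate["output"] / 1_000_000
    return Cost(input_cost, output_cost)


cost = estimate_cost("example-model", {"prompt_tokens": 1_000, "completion_tokens": 500})
print(f"${cost.total_cost:.4f}")  # prints $0.0075
```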
Framework integrations (LangChain, LlamaIndex, Pydantic AI):
```python
from llmkit.integrations.langchain import LLMKitCallbackHandler

handler = LLMKitCallbackHandler()
# `chain` is your existing LangChain runnable
chain.invoke("...", config={"callbacks": [handler]})
print(f"${handler.total_cost:.4f}")
```
TypeScript
```bash
npm install @f3d1/llmkit-sdk
```
```typescript
import { LLMKit } from '@f3d1/llmkit-sdk'

const kit = new LLMKit({ apiKey: process.env.LLMKIT_KEY })
const agent = kit.session()

const res = await agent.chat({
  provider: 'anthropic',
  model: 'claude-sonnet-4-20250514',
  messages: [{ role: 'user', content: 'summarize this document' }],
})

console.log(res.content)
console.log(res.cost) // { inputCost: 0.003, outputCost: 0.015, totalCost: 0.018, currency: 'USD' }
```
Streaming, CostTracker, and a Vercel AI SDK provider are also available.
MCP Server
Query AI costs from Claude Code, Cline, or Cursor:
```json
{
  "mcpServers": {
    "llmkit": {
      "command": "npx",
      "args": ["@f3d1/llmkit-mcp-server"],
      "env": { "LLMKIT_API_KEY": "llmk_your_key_here" }
    }
  }
}
```
11 tools: 6 proxy tools (need an API key) and 5 local tools (no key; auto-detect Claude Code, Cline, and Cursor):
- Proxy: llmkit_usage_stats, llmkit_cost_query, llmkit_budget_status, llmkit_session_summary, llmkit_list_keys, llmkit_health
- Local: llmkit_local_session, llmkit_local_projects, llmkit_local_cache, llmkit_local_forecast, llmkit_local_agents
SessionEnd hook: auto-log session costs when Claude Code exits. Add to settings.json:
```json
{
  "hooks": {
    "SessionEnd": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "npx @f3d1/llmkit-mcp-server --hook"
          }
        ]
      }
    ]
  }
}
```
Parses the session transcript and prints a cost summary. No API key needed.
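The hook's parser isn't shown here. A minimal sketch of the idea, assuming a JSONL transcript in which assistant entries carry Anthropic-style `message.usage` token counts (`input_tokens`, `output_tokens`, `cache_creation_input_tokens`, `cache_read_input_tokens`) and a `message.model` name; that layout and `sum_transcript_usage` are assumptions, not the hook's actual code:

```python
import json
from collections import Counter
from typing import Iterable

USAGE_FIELDS = (
    "input_tokens",
    "output_tokens",
    "cache_creation_input_tokens",
    "cache_read_input_tokens",
)


def sum_transcript_usage(lines: Iterable[str]) -> dict:
    """Sum token usage per model across JSONL transcript lines (assumed schema)."""
    totals: dict[str, Counter] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partial or corrupt lines instead of failing the hook
        message = entry.get("message")
        if not isinstance(message, dict) or "usage" not in message:
            continue  # user turns, tool results, etc. carry no usage
        counter = totals.setdefault(message.get("model", "unknown"), Counter())
        for field in USAGE_FIELDS:
            counter[field] += message["usage"].get(field, 0) or 0
    return {model: dict(c) for model, c in totals.items()}
```

Passing an open file works directly, e.g. `with open(path) as f: sum_transcript_usage(f)`; the per-model totals can then be priced with a cache-aware rate table.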
GitHub Action
Cap AI spend in CI. The action runs your command through the CLI, tracks cost, and fails the job if it exceeds the budget.
```yaml
- uses: smigolsmigol/llmkit/.github/actions/llmkit-budget@main
  with:
    command: python agent.py
    budget-usd: '5.00'
    post-comment: 'true'
```
Posts a cost report as a PR comment. Outputs total-cost, total-requests, budget-exceeded, and summary-json for downstream steps.
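The budget gate itself is a comparison plus step outputs: GitHub Actions reads step outputs from `name=value` lines appended to the file named by the `GITHUB_OUTPUT` environment variable, and a non-zero exit code fails the step. A minimal sketch of that mechanism; `report_budget` is hypothetical and not the action's source, the output names mirror the ones listed above, and the `summary-json` payload shape is made up for illustration:

```python
import json
import os
from typing import Optional


def report_budget(total_cost: float, total_requests: int, budget_usd: float,
                  output_path: Optional[str] = None) -> int:
    """Write step outputs and return an exit code (1 if the budget was exceeded)."""
    exceeded = total_cost > budget_usd
    outputs = {
        "total-cost": f"{total_cost:.4f}",
        "total-requests": str(total_requests),
        "budget-exceeded": "true" if exceeded else "false",
        # Illustrative payload; the action's real summary-json shape may differ.
        "summary-json": json.dumps({"total_cost": total_cost, "budget_usd": budget_usd}),
    }
    path = output_path or os.environ.get("GITHUB_OUTPUT")
    if path:
        with open(path, "a", encoding="utf-8") as f:
            for name, value in outputs.items():
                f.write(f"{name}={value}\n")
    return 1 if exceeded else 0
```

In a workflow step this would end with `raise SystemExit(report_budget(...))`, so an over-budget run fails the job while downstream steps can still read the outputs.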
Why LLMKit
Most cost tracking tools only offer "soft limits" that alert after the fact, and agents blow past them in the first hour. LLMKit estimates the cost of every request before forwarding it: if the request would exceed the budget, it is rejected before it reaches the provider. Budgets are scoped per key or per session.
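Rejecting before the provider call can be done by reserving the estimated cost up front and settling it with the actual cost afterwards, so several in-flight requests can't all pass the same check. A minimal single-process sketch; `Budget`, `BudgetExceeded`, and the reserve/settle flow are illustrative assumptions, not LLMKit's proxy code (a real gateway needs shared, atomic state across workers):

```python
import threading


class BudgetExceeded(Exception):
    """Raised when a request's estimated cost would push spend over the limit."""


class Budget:
    """A per-key or per-session budget with up-front reservations (sketch)."""

    def __init__(self, limit_usd: float):
        self.limit_usd = limit_usd
        self.spent_usd = 0.0     # settled, actual cost
        self.reserved_usd = 0.0  # estimates held for in-flight requests
        self._lock = threading.Lock()

    def reserve(self, estimated_usd: float) -> float:
        """Hold the estimate, or raise before the provider is ever called."""
        with self._lock:
            committed = self.spent_usd + self.reserved_usd
            if committed + estimated_usd > self.limit_usd:
                raise BudgetExceeded(
                    f"estimated ${estimated_usd:.4f} would exceed the ${self.limit_usd:.2f} limit"
                )
            self.reserved_usd += estimated_usd
            return estimated_usd

    def settle(self, reserved_usd: float, actual_usd: float) -> None:
        """Release the hold and record what the request actually cost."""
        with self._lock:
            self.reserved_usd -= reserved_usd
            self.spent_usd += actual_usd


budget = Budget(limit_usd=0.05)
hold = budget.reserve(0.03)  # passes: nothing spent or reserved yet
# ... forward the request to the provider here ...
budget.settle(hold, actual_usd=0.02)
```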
Tag requests with a session ID or end-user ID to track costs per agent, per conversation, per user. The dashboard and MCP server surface this data in real time. Cost anomaly detection alerts when a single request costs 3x the recent median.
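The anomaly rule can be expressed as a rolling-median check. A minimal sketch, assuming a fixed-size window of recent request costs and an "at least 3x" comparison; the window size, the minimum sample count, and `CostAnomalyDetector` are assumptions, not LLMKit's implementation:

```python
from collections import deque
from statistics import median


class CostAnomalyDetector:
    def __init__(self, window: int = 50, factor: float = 3.0, min_samples: int = 5):
        self.recent = deque(maxlen=window)  # most recent request costs in USD
        self.factor = factor
        self.min_samples = min_samples      # no alerts until a baseline exists

    def observe(self, cost_usd: float) -> bool:
        """Record a request cost; return True if it is >= factor x the recent median."""
        is_anomaly = False
        if len(self.recent) >= self.min_samples:
            baseline = median(self.recent)
            # A zero baseline (e.g. all cached or free calls) would flag everything.
            is_anomaly = baseline > 0 and cost_usd >= self.factor * baseline
        self.recent.append(cost_usd)
        return is_anomaly
```

Using the median rather than the mean keeps one expensive outlier from raising the baseline and masking the next one.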
11 providers through one interface: Anthropic, OpenAI, Google Gemini, Groq, Together, Fireworks, DeepSeek, Mistral, xAI, Ollama, OpenRouter. Configure fallback chains with a single header (x-llmkit-fallback: anthropic,openai,gemini).
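A fallback chain tries providers in header order and returns the first success. A minimal sketch, assuming any exception from a provider call means "try the next one"; `parse_fallback_header`, `call_with_fallback`, and the `call_provider` callable are hypothetical stand-ins for the proxy's routing, not its actual code:

```python
from typing import Callable


def parse_fallback_header(value: str) -> list[str]:
    """'anthropic, openai,gemini' -> ['anthropic', 'openai', 'gemini']"""
    return [p.strip() for p in value.split(",") if p.strip()]


def call_with_fallback(header_value: str, call_provider: Callable[[str], dict]) -> dict:
    """Try each provider in order; raise only if every one fails."""
    errors = {}
    for provider in parse_fallback_header(header_value):
        try:
            return call_provider(provider)
        except Exception as exc:  # a real router would fall back only on retryable errors
            errors[provider] = exc
    raise RuntimeError(f"all providers failed: {sorted(errors)}")
```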
Runs on Cloudflare Workers at the edge. Pricing is cache-aware for the 7 providers that support prompt caching, and the pricing table covers 730+ models across all providers.
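Cache-aware pricing means cached input tokens are billed at different rates from regular input. For Anthropic, 5-minute cache writes cost 1.25x the base input price and cache reads cost 0.1x, and `input_tokens` in the usage object counts only the uncached input. A minimal sketch of that arithmetic using Anthropic's usage field names; `anthropic_cost` is hypothetical, the prices in the test are placeholders, and other providers use different multipliers:

```python
CACHE_WRITE_MULTIPLIER = 1.25  # Anthropic 5-minute cache writes
CACHE_READ_MULTIPLIER = 0.10   # Anthropic cache hits


def anthropic_cost(usage: dict, input_per_mtok: float, output_per_mtok: float) -> float:
    """Cost in USD from an Anthropic-style usage object, cache tokens included."""
    per_input_token = input_per_mtok / 1_000_000
    cost = usage.get("input_tokens", 0) * per_input_token
    cost += usage.get("cache_creation_input_tokens", 0) * per_input_token * CACHE_WRITE_MULTIPLIER
    cost += usage.get("cache_read_input_tokens", 0) * per_input_token * CACHE_READ_MULTIPLIER
    cost += usage.get("output_tokens", 0) * output_per_mtok / 1_000_000
    return cost
```

A pricing table that ignored the cache fields would undercount the first request (the write premium) and overcount every cache hit after it.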
Automatic prompt caching for Anthropic: the proxy injects cache breakpoints on system prompts and conversation history. Later requests that reuse the same system prompt pay about 90% less for the cached input tokens. Zero config, zero code changes.
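Anthropic's Messages API marks cache breakpoints with `"cache_control": {"type": "ephemeral"}` on content blocks and allows up to 4 breakpoints per request; prompts shorter than the model's minimum cacheable length are not cached. A minimal sketch that adds breakpoints to the system prompt and the last message; `add_cache_breakpoints` is hypothetical, and the proxy's actual placement rules may differ:

```python
import copy


def _as_blocks(content):
    """Normalize string content to a list of text blocks."""
    if isinstance(content, str):
        return [{"type": "text", "text": content}]
    return content


def add_cache_breakpoints(request: dict) -> dict:
    """Return a copy of a Messages API body with breakpoints on system and last message.

    Does not count breakpoints the caller already set (the API allows at most 4).
    """
    body = copy.deepcopy(request)  # never mutate the caller's request
    if body.get("system"):
        system = _as_blocks(body["system"])
        system[-1]["cache_control"] = {"type": "ephemeral"}
        body["system"] = system
    if body.get("messages"):
        last = body["messages"][-1]
        blocks = _as_blocks(last["content"])
        if blocks:
            blocks[-1]["cache_control"] = {"type": "ephemeral"}
        last["content"] = blocks
    return body
```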
Framework integrations: drop-in cost tracking for LangChain, LlamaIndex, and Pydantic AI via callback handlers. Works alongside the httpx transport for direct SDK use.
470+ tests, ClusterFuzzLite fuzzing, and a 6-stage security pipeline (gitleaks, semgrep, CodeQL, bandit, pip-audit, pnpm audit). OpenSSF Scorecard 8.3, higher than React, Django, Kubernetes, and every AI gateway competitor.
Public API endpoints (no auth required):
- /v1/pricing/compare: compare cost across all 730+ models for a given token count
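The endpoint's parameters aren't documented here, but the computation it describes is a ranking over a price table. A minimal local sketch; the model names and prices are placeholders, and `compare_models` is hypothetical, not the endpoint's implementation:

```python
def compare_models(prices: dict, input_tokens: int, output_tokens: int) -> list[tuple[str, float]]:
    """Rank models by cost (USD) for a fixed token count, cheapest first."""
    costs = {
        model: (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000
        for model, p in prices.items()
    }
    return sorted(costs.items(), key=lambda item: item[1])


# Placeholder prices in USD per 1M tokens
PRICES = {
    "model-a": {"input": 3.00, "output": 15.00},
    "model-b": {"input": 0.15, "output": 0.60},
    "model-c": {"input": 1.00, "output": 4.00},
}

for model, cost in compare_models(PRICES, input_tokens=10_000, output_tokens=2_000):
    print(f"{model:10} ${cost:.4f}")
```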
Security
LLMKit handles your API keys. We take that seriously.
| Layer | What |
|---|---|
| Encryption | Provider keys: AES-256-GCM, random IV, context-bound AAD |
| Hashing | User API keys: SHA-256, never stored in plaintext |
| Runtime | Cloudflare Workers: no filesystem, no .env, nothing to exfiltrate |
| Supply chain | All CI actions pinned to commit SHAs, explicit least-privilege permissions |
| Provenance | npm packages published with Sigstore provenance via GitHub Actions OIDC |
| Pre-commit | 19 secret patterns + credential file blocking + gitleaks |
| CI pipeline | gitleaks, semgrep, pnpm audit, pip-audit, bandit, KeyGuard |
| AI exclusion | .cursorignore + .claudeignore block AI tools from reading secrets |
Full details in SECURITY.md.
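Storing only a SHA-256 hash of each user API key lets the server verify a presented key without being able to recover it. A minimal stdlib sketch covering only the hashing row (not AES-256-GCM); the `llmk_` prefix comes from the examples above, while the key length and the function names are assumptions, not LLMKit's code:

```python
import hashlib
import hmac
import secrets


def generate_key(prefix: str = "llmk_") -> str:
    """Create a random API key (32 random bytes is an arbitrary choice here)."""
    return prefix + secrets.token_urlsafe(32)


def hash_key(api_key: str) -> str:
    """Hex SHA-256 digest; this is what gets stored instead of the key."""
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()


def verify_key(presented: str, stored_hash: str) -> bool:
    """Compare digests in constant time to avoid leaking timing information."""
    return hmac.compare_digest(hash_key(presented), stored_hash)
```

A fast unsalted hash is reasonable here because the keys are long random strings; human-chosen passwords would instead need a slow, salted KDF.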
Packages
| Package | Description |
|---|---|
| llmkit-sdk (PyPI) | Python SDK: tracked() transport, cost estimation, streaming, sessions |
| @f3d1/llmkit-sdk (npm) | TypeScript client, CostTracker, streaming |
| @f3d1/llmkit-cli | npx @f3d1/llmkit-cli -- <cmd>: zero-code cost tracking for any language |
| @f3d1/llmkit-proxy | Hono-based CF Workers proxy: auth, budgets, routing, logging |
| @f3d1/llmkit-ai-sdk-provider | Vercel AI SDK v6 custom provider |
| @f3d1/llmkit-mcp-server | 11 tools: proxy analytics, local costs (Claude Code + Cline + Cursor) |
| @f3d1/llmkit-shared | Types, pricing table (11 providers, 730+ models), cost calculation |
Self-host
```bash
git clone https://github.com/smigolsmigol/llmkit
cd llmkit && pnpm install && pnpm build
cd packages/proxy
echo 'DEV_MODE=true' > .dev.vars
pnpm dev
# proxy running at http://localhost:8787
```
Deploy to Cloudflare Workers:
```bash
npx wrangler login
npx wrangler secret put SUPABASE_URL
npx wrangler secret put SUPABASE_KEY
npx wrangler secret put ENCRYPTION_KEY
npx wrangler deploy
```
Testing
470+ tests across TypeScript and Python: cost calculation, budget enforcement, crypto, reservations, pricing accuracy, streaming, transport hooks, contract tests, and integration tests. CI runs on every push with a 6-stage security pipeline.
Audit logging
Per-request logging with timestamps, model attribution, cost tracking, per-end-user attribution (x-llmkit-user-id), tool invocation logging, and CSV export with a SHA-256 integrity hash. This data can support record-keeping requirements but does not constitute regulatory compliance.
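An integrity hash on an export lets a reviewer check that the CSV hasn't been altered since it was produced. A minimal sketch; the column names and `export_with_hash`/`verify_export` are illustrative, not LLMKit's export format:

```python
import csv
import hashlib
import io

FIELDS = ["timestamp", "model", "user_id", "cost_usd"]  # illustrative columns


def export_with_hash(rows: list[dict]) -> tuple[str, str]:
    """Serialize rows to CSV and return (csv_text, sha256_hex)."""
    buf = io.StringIO(newline="")
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    text = buf.getvalue()
    return text, hashlib.sha256(text.encode("utf-8")).hexdigest()


def verify_export(csv_text: str, expected_hash: str) -> bool:
    """Any edit to the CSV bytes changes the digest."""
    return hashlib.sha256(csv_text.encode("utf-8")).hexdigest() == expected_hash
```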
Listed on
- LobeHub
- Glama
- MCP Registry (official)
- Smithery
- AgentHotspot
- TensorBlock awesome-mcp-servers
- awesome-cloudflare
- Pricing comparison (730+ models)
- Cost calculator
Star this repo if you find it useful.