tokentoll
Scan codebases for LLM API calls and estimate monthly costs. Compare costs between git refs to catch cost regressions during code review.
tokentoll
Catch LLM cost changes in code review. Infracost for LLM spend.
A CLI tool and GitHub Action that statically analyzes your code for LLM API calls, estimates their cost, and shows you the cost impact of every change in your terminal or as a PR comment. Zero runtime dependencies.
The Problem
A single model swap from gpt-4o-mini to gpt-4o increases costs 15x.
A new API call in a hot path can add $10,000/month to your bill.
These changes hide in normal code review.
tokentoll finds LLM API calls in your code, estimates their cost, and shows you the cost impact of every change before it hits production.
Quick Start
pip install tokentoll
# Scan current directory for LLM API calls and their costs
tokentoll scan .
# Show cost impact of your last commit
tokentoll diff HEAD~1
# Compare two branches
tokentoll diff main..feature-branch
GitHub Action
name: LLM Cost Diff
on:
pull_request:
paths:
- "**.py"
permissions:
pull-requests: write
jobs:
cost-diff:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
- uses: Jwrede/[email protected]
What It Detects
| SDK | Patterns | Status |
|---|---|---|
| OpenAI | chat.completions.create, responses.create | Supported |
| Anthropic | messages.create, messages.stream | Supported |
| Google GenAI | models.generate_content | Supported |
| LiteLLM | completion, acompletion | Supported |
| LangChain | ChatOpenAI, ChatAnthropic, init_chat_model | Supported |
| Zhipu AI | ZhipuAiClient, ZhipuAI (GLM models) | Supported |
| JS/TS SDKs | Planned |
Example Output
tokentoll scan
LLM API Calls Detected
============================================================
File: src/agents/summarizer.py
Line 42: openai client.chat.completions.create
Model: gpt-4o | Max tokens: 4096
Est. cost/call: $0.03 | Monthly (1000 calls/month per call site): $26.50
Line 78: openai client.chat.completions.create
Model: gpt-4o-mini | Max tokens: 1000
Est. cost/call: $0.000301 | Monthly (1000 calls/month per call site): $0.30
--
Total estimated monthly cost: $26.80
1000 calls/month per call site
tokentoll diff
LLM Cost Diff: main..feature-branch
============================================================
+ ADDED src/agents/rewriter.py:35
openai | Model: gpt-4o
Est. cost/call: $0.03 | Monthly: +$26.50
~ MODIFIED src/agents/summarizer.py:42
openai | Model: gpt-4o -> gpt-4o-mini
Est. cost/call: $0.03 -> $0.000301 | Monthly: -$26.20
--
Monthly cost impact: +$0.30
Added: 1 | Changed: 1 | Removed: 0
1000 calls/month per call site
How It Works
Source Code (.py files)
|
v
+-------------+ +------------------+
| AST Scanner |---->| SDK Detectors |
| (ast.parse) | | OpenAI, Anthropic|
+-------------+ | Google, LiteLLM |
| LangChain |
+------------------+
|
v
+------------------+
| Pricing Engine |
| 2200+ models |
| Auto-cached |
+------------------+
|
+-----------+-----------+
| |
v v
+------------+ +-------------+
| Scan Report| | Diff Engine |
| (costs) | | (old vs new) |
+------------+ +-------------+
| |
v v
+------------+ +-------------+
| Table/JSON | | Table/JSON/ |
| | | PR Comment |
+------------+ +-------------+
- Parses Python files using the
astmodule to find LLM API calls - Multi-pass constant propagation resolves model names through variables,
os.getenv()fallbacks, class attributes, constructor args, dict contents, and**kwargsunpacking - Looks up pricing from a local cache (sourced from LiteLLM, 2200+ models)
- For diff mode: compares calls between two git refs and computes the cost delta
- Outputs a cost report as a table, JSON, or GitHub PR comment
CLI Reference
tokentoll scan [PATH...] [--format table|json|markdown] [--calls-per-month N] [--config PATH]
tokentoll diff [REF] [--base REF] [--head REF] [--format table|json|markdown|github-comment] [--config PATH]
tokentoll update # Update bundled pricing data
MCP Server
tokentoll includes an MCP (Model Context Protocol) server that lets Claude Code and other MCP hosts check the cost impact of LLM code changes directly from an agent conversation.
Install
pip install tokentoll[mcp]
Register with Claude Code
claude mcp add --transport stdio tokentoll -- tokentoll-mcp
Tools
| Tool | Description |
|---|---|
scan | Find LLM API calls in a directory and estimate monthly costs. Accepts a path and optional calls_per_month. |
diff | Compare LLM costs between two git refs. Accepts base_ref and optional head_ref (defaults to HEAD). |
Both tools return JSON output.
Example use case
Claude Code can check the cost impact of its own changes before committing.
For example, after swapping a model from gpt-4o to gpt-4o-mini, the agent
can call the diff tool against HEAD to verify the cost reduction before
creating the commit.
Pricing Data
Pricing is bundled and works offline. To update to the latest prices:
tokentoll update
Pricing data is sourced from LiteLLM's model_prices_and_context_window.json
and covers 300+ models across OpenAI, Anthropic, Google, AWS Bedrock,
Azure, and more.
Dynamic Model Defaults
When tokentoll encounters a call where the model name is a variable it cannot resolve, it applies a sensible per-SDK default so you still get cost estimates:
| SDK | Default Model |
|---|---|
| OpenAI | gpt-4o |
| Anthropic | claude-sonnet-4-20250514 |
| Google GenAI | gemini-2.0-flash |
| LiteLLM | gpt-4o |
| LangChain | gpt-4o |
| Zhipu AI | zai/glm-4.6 |
These defaults are shown as gpt-4o (default) in scan output. You can override
them per-project or per-path using a .tokentoll.yml config file (see below).
Configuration
Create a .tokentoll.yml in your project root to customize behavior.
tokentoll automatically finds this file by walking up from the scanned directory.
# Default model for all dynamic (unresolved) calls
default_model: gpt-4o
# Per-SDK defaults (override the built-in defaults above)
default_models:
openai: gpt-4o-mini
anthropic: claude-haiku-3-20240307
# Assumed calls per month per call site
calls_per_month: 5000
# Skip cost estimation entirely for dynamic (unresolved) models. When true,
# calls whose model name cannot be resolved statically are reported with no
# cost rather than priced against a default. Useful for projects that prefer
# silence over a guess.
skip_dynamic_models: false
# Exclude paths from scanning (prefix match or glob pattern)
exclude:
- tests/
- examples/
- docs/
- "*_test.py"
# Per-path overrides (longest prefix match)
overrides:
- path: src/agents/
default_model: gpt-4o
calls_per_month: 10000
- path: src/azure/
skip_dynamic_models: true
Resolution order for dynamic model defaults: per-SDK config (default_models) >
generic config (default_model) > built-in SDK defaults.
You can also pass --config path/to/.tokentoll.yml to use a specific config file.
Token Estimation
By default, tokentoll estimates token counts using a characters/4 heuristic. For more accurate estimates, install tiktoken:
pip install tiktoken
When tiktoken is available, tokentoll uses the correct tokenizer encoding for
each model. Unknown models fall back to cl100k_base. Tiktoken is lazy-loaded
and encoders are cached, so there is no startup penalty if you don't need it.
Smart Variable Resolution
Real codebases rarely pass model names as string literals. tokentoll's multi-pass constant propagation engine follows:
DEFAULT_MODEL = os.getenv("MODEL", "gpt-4o")
class Config:
model: str = DEFAULT_MODEL
config = Config()
kwargs = {"model": config.model, "max_tokens": 2000}
client.chat.completions.create(**kwargs)
# tokentoll resolves: model="gpt-4o", max_tokens=2000
- Variable assignments (
MODEL = "gpt-4o") os.getenv()/os.environ.get()fallback values- Function default parameters
- Class attribute defaults
- Constructor argument propagation
- Dict literal and subscript contents
**kwargsunpacking
Roadmap
- Context-aware call frequency (planned): infer calls/month from surrounding code (FastAPI route handlers = high traffic, scripts = low, loops = multiplied) instead of assuming uniform volume across all call sites.
- JS/TS support (planned): detect LLM calls in JavaScript and TypeScript files.
- Cost alerts: configurable thresholds that fail CI when a PR exceeds a cost delta.
Limitations
- Cannot resolve models loaded from external config files or databases at runtime.
These calls use per-SDK defaults (configurable via
.tokentoll.yml). - Token estimates use a characters/4 heuristic unless tiktoken is installed.
- Monthly estimates assume uniform call volume per call site (configurable via
--calls-per-month,.tokentoll.yml, or per-path overrides). Use theexcludeoption to skip test and example files. - Python only for now (JS/TS support planned).
License
MIT
संबंधित सर्वर
Alpha Vantage MCP Server
प्रायोजकAccess financial market data: realtime & historical stock, ETF, options, forex, crypto, commodities, fundamentals, technical indicators, & more
Assay
The firewall for MCP tool calls. Block unsafe calls, audit every decision, replay anything. Deterministic policy enforcement with replayable evidence bundles.
mcp-hosts-installer
MCP server that installs and registers other MCP servers in Cursor, VS Code, or Claude Desktop from npm, PyPI, or a local folder (via npx).
mcp-of-mcps
MCP of MCPs is a meta-server that merges all your MCP servers into a single smart endpoint. It gives AI agents instant tool discovery, selective schema loading, and massively cheaper execution, so you stop wasting tokens and time. With persistent tool metadata, semantic search, and direct code execution between tools, it turns chaotic multi-server setups into a fast, efficient, hallucination-free workflow. It also automatically analyzes the tools output schemas if not exist and preserves them across sessions for consistent behavior.
MCP Performance Analysis Server
A server for detecting critical performance issues in code, providing concise analysis and output.
GitLab MR & Confluence Linker
Analyzes GitLab merge requests and links them to Confluence documentation.
MCP Server Test
An example MCP server deployable on Cloudflare Workers without authentication.
Codacy
Access the Codacy API to analyze code quality, coverage, and security for your repositories.
Code Editor
Enables AI assistants to write, edit, and manage code files directly in a specified directory, respecting .gitignore patterns.
BuiltWith
Query the BuiltWith API to discover the technology stacks of websites. Requires a BuiltWith API key.
Starknet MCP
An MCP server providing access to various Starknet RPC methods.