cxpak

Spends CPU cycles so you don't spend tokens. The LLM gets a briefing packet instead of a flashlight in a dark room.

A Rust CLI that indexes codebases using tree-sitter and produces token-budgeted context bundles for LLMs.

Installation

# Via Homebrew (macOS/Linux)
brew tap Barnett-Studios/tap
brew install cxpak

# Via cargo
cargo install cxpak

How to Use cxpak

There are four ways to use cxpak, from simplest to most powerful:

1. CLI (no setup required)

Run cxpak directly on any git repo:

# Structured repo summary within a token budget
cxpak overview --tokens 50k .

# Trace a symbol through the dependency graph
cxpak trace --tokens 50k "handle_request" .

# Show changes with dependency context
cxpak diff --tokens 50k .

# More options
cxpak overview --tokens 50k --out context.md .       # Write to file
cxpak overview --tokens 50k --focus src/api .         # Focus on a directory
cxpak overview --tokens 50k --format json .           # JSON or XML output
cxpak trace --tokens 50k --all "MyError" .            # Full graph traversal
cxpak diff --tokens 50k --git-ref main .              # Diff against a branch
cxpak diff --tokens 50k --since "1 week" .            # Diff by time range
cxpak overview --tokens 50k --timing .                # Show pipeline timing
cxpak clean .                                         # Clear cache

2. MCP Server (for Claude Code, Cursor, and other AI tools)

Run cxpak as an MCP server so your AI tool gets live access to 11 codebase tools — including relevance scoring, query expansion, and schema-aware context packing.

Claude Code — add to .mcp.json in your project root (or ~/.claude/.mcp.json globally):

{
  "mcpServers": {
    "cxpak": {
      "command": "cxpak",
      "args": ["serve", "--mcp", "."]
    }
  }
}

Restart Claude Code after adding the config. The cxpak tools will appear automatically.

Cursor — add to .cursor/mcp.json in your project:

{
  "mcpServers": {
    "cxpak": {
      "command": "cxpak",
      "args": ["serve", "--mcp", "."]
    }
  }
}

Any MCP client — run cxpak serve --mcp . over stdio. It speaks JSON-RPC 2.0.
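As a sketch, a tools/call request over stdio follows the standard MCP JSON-RPC shape below. The argument names shown are illustrative; consult the tool schemas the server reports during initialization for the authoritative parameters.

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "cxpak_overview",
    "arguments": { "tokens": 50000, "focus": "src/api" }
  }
}
```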

Once configured, your AI tool can call these tools:

Tool                      Description
cxpak_auto_context        One-call optimal context for any task
cxpak_overview            Structured repo summary
cxpak_trace               Trace a symbol through dependencies
cxpak_stats               Language stats and token counts
cxpak_diff                Show changes with dependency context
cxpak_context_for_task    Score and rank files by relevance to a task
cxpak_pack_context        Pack selected files into a token-budgeted bundle
cxpak_search              Regex search with context lines
cxpak_blast_radius        Analyze change impact with risk scores
cxpak_api_surface         Extract public API surface
cxpak_context_diff        Show what changed since last auto_context call

All tools accept a focus parameter (a path prefix) that scopes results.

Note: The MCP server, embeddings, and all features are included by default. No extra feature flags needed.

3. Claude Code Plugin (auto-triggers + slash commands)

The plugin wraps cxpak as skills and slash commands. Skills auto-trigger when Claude detects relevant questions; slash commands give you direct control.

Install:

/plugin marketplace add Barnett-Studios/cxpak
/plugin install cxpak

The plugin installs cxpak automatically via Homebrew (or cargo) if not already on PATH.

Skills (auto-invoked):

Skill               Triggers when you...
codebase-context    Ask about project structure, architecture, how components relate
diff-context        Ask to review changes, prepare a PR description, understand what changed

Commands (user-invoked):

Command                  Description
/cxpak:overview          Generate a structured repo summary
/cxpak:trace <symbol>    Trace a symbol through the dependency graph
/cxpak:diff              Show changes with dependency context
/cxpak:clean             Remove .cxpak/ cache and output files

4. HTTP Server (for custom integrations)

Run cxpak as a persistent HTTP server with a hot index:

# Start HTTP server (default port 3000)
cxpak serve .
cxpak serve --port 8080 .

# Watch for file changes and keep index hot
cxpak watch .

Endpoint                          Description
GET /health                       Health check
GET /stats                        Language stats and token counts
GET /overview?tokens=50000        Structured repo summary
GET /trace?target=handle_request  Trace a symbol through dependencies
GET /diff?git_ref=HEAD~1          Show changes with dependency context
POST /search                      Regex search with context
POST /blast_radius                Change impact analysis
POST /api_surface                 Public API extraction
POST /auto_context                One-call optimal context
POST /context_diff                Session delta

What You Get

The overview command produces a structured briefing with these sections:

  • Project Metadata — file counts, languages, estimated tokens
  • Directory Tree — full file listing
  • Module / Component Map — files with their public symbols
  • Dependency Graph — import relationships between files
  • Key Files — full content of README, config files, manifests
  • Function / Type Signatures — every public symbol's signature
  • Git Context — recent commits, file churn, contributors

Each section has a budget allocation. When content exceeds its budget, it's truncated with the most important items preserved first.

Context Quality

cxpak applies intelligent context management to maximize the usefulness of every token:

Progressive Degradation — When content exceeds the budget, symbols are progressively reduced through 5 detail levels (Full → Trimmed → Documented → Signature → Stub). High-relevance files keep full detail while low-relevance dependencies are summarized. Selected files never degrade below Documented; dependencies can be dropped entirely as a last resort.
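The degradation policy above can be sketched in Python. This is a minimal illustration under assumed data shapes (hypothetical per-level token counts per file), not cxpak's actual Rust implementation:

```python
# Sketch of progressive degradation: reduce detail on low-relevance files
# first until the bundle fits the budget. All field names are assumptions.
DETAIL_LEVELS = ["Full", "Trimmed", "Documented", "Signature", "Stub"]
DOCUMENTED = DETAIL_LEVELS.index("Documented")  # floor for selected files

def degrade_until_fit(files, budget):
    """Each file: {'path', 'role', 'relevance', 'tokens': [count per level]}.
    Returns {path: level_index}, degrading least-relevant files first."""
    level = {f["path"]: 0 for f in files}
    total = lambda: sum(f["tokens"][level[f["path"]]] for f in files)
    for f in sorted(files, key=lambda f: f["relevance"]):
        # Selected files never drop below Documented; dependencies may
        # degrade all the way to Stub.
        floor = DOCUMENTED if f["role"] == "selected" else len(DETAIL_LEVELS) - 1
        while total() > budget and level[f["path"]] < floor:
            level[f["path"]] += 1
    return level
```

Note that dropping dependency files entirely (the documented last resort) is omitted here for brevity.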

Concept Priority — Symbols are ranked by type: functions/methods (1.0) > structs/classes (0.86) > API surface (0.71) > configuration (0.57) > documentation (0.43) > constants (0.29). This determines degradation order — functions survive longest.

Query Expansion — When using context_for_task, queries are expanded with ~30 core synonym mappings (e.g., "auth" → authentication, login, jwt, oauth) plus 8 domain-specific maps (Web, Database, Auth, Infra, Testing, API, Mobile, ML) activated automatically by detecting file patterns in the repo.
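The expansion step amounts to a synonym lookup per query term. A sketch, using a tiny illustrative subset of mappings rather than cxpak's actual ~30-entry table:

```python
# Sketch of synonym-based query expansion. The SYNONYMS table here is an
# assumption for illustration, not cxpak's real mapping.
SYNONYMS = {
    "auth": ["authentication", "login", "jwt", "oauth"],
    "db":   ["database", "sql", "schema", "migration"],
}

def expand_query(query):
    terms = query.lower().split()
    expanded = list(terms)
    for t in terms:
        expanded.extend(SYNONYMS.get(t, []))  # keep original terms, add synonyms
    return expanded
```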

Context Annotations — Each packed file gets a language-aware comment header showing its relevance score, role (selected/dependency), signal breakdown, and detail level. The LLM knows exactly why each file was included and how much detail it's seeing.

Chunk Splitting — Symbols exceeding 4000 tokens are split into labeled chunks (e.g., handler [1/3]) that degrade independently. Each chunk carries the parent signature for context.
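The chunk-splitting rule can be sketched as follows; token counting is reduced to a pre-tokenized list here, an assumption for illustration:

```python
# Sketch of chunk splitting: a symbol over the limit is cut into labeled
# chunks (e.g. "handler [1/3]") that can degrade independently, each
# carrying the parent signature for context.
CHUNK_LIMIT = 4000

def split_symbol(name, signature, body_tokens):
    if len(body_tokens) <= CHUNK_LIMIT:
        return [{"label": name, "signature": signature, "tokens": body_tokens}]
    n = -(-len(body_tokens) // CHUNK_LIMIT)  # ceiling division
    return [
        {
            "label": f"{name} [{i + 1}/{n}]",
            "signature": signature,
            "tokens": body_tokens[i * CHUNK_LIMIT:(i + 1) * CHUNK_LIMIT],
        }
        for i in range(n)
    ]
```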

Data Layer Awareness

cxpak understands the data layer of your codebase and uses that knowledge to build richer dependency graphs.

Schema Detection — SQL (CREATE TABLE, CREATE VIEW, stored procedures), Prisma schema files, and other database DSLs are parsed to extract table definitions, column names, foreign key references, and view dependencies.

ORM Detection — Django models, SQLAlchemy mapped classes, TypeORM entities, and ActiveRecord models are recognized and linked to their underlying table definitions.

Typed Dependency Graph — Every edge in the dependency graph carries one of 9 semantic types:

Edge Type           Meaning
import              Standard language import / require
foreign_key         Table FK reference to another table file
view_reference      SQL view references a source table
trigger_target      Trigger defined on a table
index_target        Index defined on a table
function_reference  Stored function references a table
embedded_sql        Application code contains inline SQL referencing a table
orm_model           ORM model class maps to a table file
migration_sequence  Migration file depends on its predecessor

Non-import edges are surfaced in the dependency graph output and in pack context annotations:

// score: 0.82 | role: dependency | parent: src/api/orders.py (via: embedded_sql)

Migration Support — Migration sequences are detected for Rails, Alembic, Flyway, Django, Knex, Prisma, and Drizzle. Each migration is linked to its predecessor so cxpak can trace the full migration chain.

Embedded SQL Linking — When application code (Python, TypeScript, Rust, etc.) contains inline SQL strings that reference known tables, cxpak creates embedded_sql edges connecting those files to the table definition files. This means context_for_task and pack_context will automatically pull in relevant schema files when you ask about database-related tasks.

Schema-Aware Query Expansion — When the Database domain is detected, table names and column names from the schema index are added as expansion terms. Queries for "orders" or "user_id" will match files that reference those identifiers even if the query term doesn't appear literally in the file path or symbol names.

Intelligence

cxpak includes graph-based intelligence features that go beyond static analysis.

PageRank File Importance — Every file in the dependency graph is scored 0.0–1.0 using PageRank over the import graph. Files that are transitively imported by many others rank higher. PageRank is used as signal #6 in relevance scoring (weight 0.17) and drives degradation priority via the formula 0.6 × pagerank + 0.2 × concept_priority + 0.2 × file_role. Symbol-level importance is computed as file_pagerank × symbol_weight, where symbol_weight is 1.0 (public + referenced), 0.7 (public), or 0.3 (private).
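The two formulas in this paragraph, written out as plain arithmetic (the weights are exactly as documented above; this is a sketch, not cxpak's source):

```python
# Degradation priority and symbol importance, per the formulas above.
def degradation_priority(pagerank, concept_priority, file_role):
    return 0.6 * pagerank + 0.2 * concept_priority + 0.2 * file_role

def symbol_importance(file_pagerank, public, referenced):
    # symbol_weight: 1.0 public + referenced, 0.7 public only, 0.3 private
    if public and referenced:
        weight = 1.0
    elif public:
        weight = 0.7
    else:
        weight = 0.3
    return file_pagerank * weight
```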

Blast Radius Analysis — The cxpak_blast_radius MCP tool takes a set of changed files and returns categorized affected files: direct_dependents, transitive_dependents, test_files, and schema_dependents, each with a risk score. Risk is calculated as hop_decay × edge_weight × pagerank × test_penalty, clamped to [0, 1]. This tells you which parts of the codebase are most likely to break when you change a file.
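The risk formula above is a straight product with clamping; a one-function sketch:

```python
# Risk score as documented: hop_decay * edge_weight * pagerank * test_penalty,
# clamped to the [0, 1] interval.
def risk_score(hop_decay, edge_weight, pagerank, test_penalty):
    return max(0.0, min(1.0, hop_decay * edge_weight * pagerank * test_penalty))
```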

API Surface Extraction — The cxpak_api_surface MCP tool extracts the public API of a codebase: public symbols sorted by PageRank, HTTP routes (12 frameworks including Express, Actix, Axum, Flask, Django, FastAPI, Spring, Gin, Echo, Fiber, Rails, and Phoenix), gRPC services, and GraphQL types. Output is token-budgeted.

Test File Mapping — cxpak automatically maps source files to their test files using naming conventions for 6 languages (Rust, TypeScript/JavaScript, Python, Java, Go, Ruby) plus a catch-all pattern, supplemented by import analysis. The pack_context tool auto-includes test files when the include_tests parameter is set. Blast radius uses the test map to populate the test_files category.
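Convention-based mapping looks roughly like the sketch below. The patterns shown are common community conventions and an assumption for illustration; cxpak's exact rule set (and its import-analysis supplement) may differ:

```python
# Sketch of naming-convention test mapping for a few languages, with a
# catch-all fallback. Patterns are illustrative assumptions.
import os

def candidate_test_files(source_path):
    d, base = os.path.split(source_path)
    stem, ext = os.path.splitext(base)
    if ext == ".rs":
        return [os.path.join("tests", f"{stem}_test.rs")]  # Rust integration tests
    if ext in (".ts", ".js"):
        return [os.path.join(d, f"{stem}.test{ext}"),
                os.path.join(d, f"{stem}.spec{ext}")]
    if ext == ".py":
        return [os.path.join(d, f"test_{stem}.py")]
    return [os.path.join(d, f"{stem}_test{ext}")]  # catch-all pattern
```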

Auto Context

cxpak_auto_context is the hero feature of v1.0.0 — one call that delivers optimal context for any task. Give it a task description and token budget; it returns everything the LLM needs.

10-step pipeline:

  1. Query expansion — expands the task description with synonyms and domain-specific terms
  2. Relevance scoring — scores every file against the expanded query using 7 weighted signals
  3. Seed selection — picks the top-scoring files as seeds for graph traversal
  4. Noise filtering — 3 layers remove low-value files: blocklist (generated/vendored), similarity dedup (near-duplicate content), and relevance floor (below minimum score). Files removed by each layer are reported in filtered_out for transparency
  5. Test inclusion — maps seed files to their test files via naming conventions and import analysis
  6. Schema linking — pulls in schema files connected to seeds via typed dependency edges
  7. Blast radius — identifies files at risk from the seed set, sorted by risk score
  8. API surface — extracts public symbols and HTTP routes from seed files
  9. Budget allocation — fill-then-overflow priority packing: seeds first, then tests, schema, blast radius, and API surface until the budget is exhausted
  10. Annotations — each packed file gets a language-aware comment header with score, role, signals, and detail level

Noise filtering applies three independent layers. The filtered_out field in the response lists every file removed and which layer caught it, so you can audit what was excluded and why.

Token-budgeted output uses fill-then-overflow priority packing: high-priority categories (seeds, tests) fill first; lower-priority categories (blast radius, API surface) overflow into remaining budget. Content that doesn't fit is progressively degraded through 5 detail levels before being dropped.
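Fill-then-overflow packing can be sketched as a greedy pass over priority-ordered categories (progressive degradation of non-fitting content is omitted here for brevity; data shapes are assumptions):

```python
# Sketch of fill-then-overflow priority packing: high-priority categories
# fill first; later categories overflow into whatever budget remains.
def pack(categories, budget):
    """categories: ordered list of (name, [(path, token_count), ...]).
    Returns (packed_paths, unused_budget)."""
    packed, remaining = [], budget
    for name, items in categories:
        for path, tokens in items:
            if tokens <= remaining:
                packed.append(path)
                remaining -= tokens
    return packed, remaining
```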

Embeddings

cxpak supports semantic embeddings as the 7th scoring signal (embedding_similarity, weight 0.15), improving relevance scoring for queries that don't share exact keywords with file content.
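Blending the embedding signal into the score reduces to a weighted cosine similarity. A sketch, where the 0.15 weight is as documented and everything else (signal shapes, fallback behavior) is an illustrative assumption:

```python
# Sketch: cosine similarity as the 7th signal, weight 0.15, with graceful
# fallback to the deterministic signals when vectors are unavailable.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def blended_score(deterministic_signals, query_vec=None, file_vec=None):
    """deterministic_signals: list of (value, weight) pairs."""
    score = sum(v * w for v, w in deterministic_signals)
    if query_vec is not None and file_vec is not None:
        score += 0.15 * cosine(query_vec, file_vec)
    return score
```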

Local (zero config) — On first use, cxpak downloads the all-MiniLM-L6-v2 model (~30 MB) and runs inference locally via candle. No API keys needed.

BYOK (Bring Your Own Key) — For higher-quality embeddings, configure a remote provider in .cxpak.json:

{
  "embeddings": {
    "provider": "openai",
    "model": "text-embedding-3-small",
    "api_key_env": "OPENAI_API_KEY",
    "base_url": "https://api.openai.com/v1",
    "dimensions": 1536,
    "batch_size": 100
  }
}

Supported providers: openai, voyageai, cohere. Set api_key_env to the environment variable holding your API key.

Graceful fallback — If embedding computation fails for any reason (model download error, API timeout, missing key), cxpak falls back to the 6 deterministic scoring signals with zero impact on the rest of the pipeline.

Context Diff

cxpak_context_diff shows what changed in the codebase since the last cxpak_auto_context call, enabling efficient session-length workflows.

Tracked changes:

  • Modified files — files with content changes since the snapshot
  • New files — files added since the snapshot
  • Deleted files — files removed since the snapshot
  • Symbol changes — functions, types, and other symbols added, removed, or modified
  • Graph edge changes — new or removed dependency relationships

The output includes a human-readable recommendation summarizing what changed and whether a fresh auto_context call is warranted.
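Conceptually, the file-level portion of the diff is a comparison of two snapshot maps. A sketch, where content hashes stand in for whatever fingerprint cxpak actually records:

```python
# Sketch of snapshot diffing: classify files as new, deleted, or modified
# by comparing {path: content_hash} maps from two points in time.
def diff_snapshots(before, after):
    return {
        "new": sorted(set(after) - set(before)),
        "deleted": sorted(set(before) - set(after)),
        "modified": sorted(p for p in set(before) & set(after)
                           if before[p] != after[p]),
    }
```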

Stable API

v1.0.0 establishes semver for the MCP API. Tool names, required parameters, and response structures are stable in 1.x.

Pack Mode

When a repo exceeds the token budget, cxpak automatically switches to pack mode:

  • The overview stays within budget (one file, fits in one LLM prompt)
  • A .cxpak/ directory is created with full untruncated detail files
  • Truncated sections in the overview get pointers to their detail files

repo/
  .cxpak/
    tree.md          # complete directory tree
    modules.md       # every file, every symbol
    dependencies.md  # full import graph
    signatures.md    # every public signature
    key-files.md     # full key file contents
    git.md           # full git history

Detail file extensions match --format: .md for markdown, .json for json, .xml for xml.

The overview tells the LLM what exists. The detail files let it drill in on demand. .cxpak/ is automatically added to .gitignore.

If the repo fits within budget, you get a single file with everything — no .cxpak/ directory needed.

Caching

cxpak caches parse results in .cxpak/cache/ to speed up re-runs. The cache is keyed on file modification time and size — when a file changes, it's automatically re-parsed.
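The invalidation rule is simple: a cache entry is stale whenever a file's modification time or size changes. A sketch of the key (the on-disk cache layout is cxpak's own; only the rule is illustrated):

```python
# Sketch of an mtime+size cache key. A mismatch against the stored key
# means the file must be re-parsed.
import os

def cache_key(path):
    st = os.stat(path)
    return (path, st.st_mtime_ns, st.st_size)

def needs_reparse(path, cached_key):
    return cache_key(path) != cached_key
```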

To clear the cache and all output files:

cxpak clean .

Supported Languages (42)

Tier 1 — Full extraction (functions, classes, methods, imports, exports): Rust, TypeScript, JavaScript, Python, Java, Go, C, C++, Ruby, C#, Swift, Kotlin, Bash, PHP, Dart, Scala, Lua, Elixir, Zig, Haskell, Groovy, Objective-C, R, Julia, OCaml, MATLAB

Tier 2 — Structural extraction (selectors, headings, keys, blocks, targets, etc.): CSS, SCSS, Markdown, JSON, YAML, TOML, Dockerfile, HCL/Terraform, Protobuf, Svelte, Makefile, HTML, GraphQL, XML

Database DSLs: SQL, Prisma

Tree-sitter grammars are compiled in. All 42 languages are enabled by default; individual languages can be toggled with cargo feature flags:

# Only Rust and Python support
cargo install cxpak --no-default-features --features lang-rust,lang-python

License

MIT


About

Built and maintained by Barnett Studios — building products, teams, and systems that last. Part-time technical leadership for startups and scale-ups.
