ken MCP Server

Tìm kiếm mã nguồn kết hợp nhanh cho tác nhân - hoàn toàn bằng Go, tệp nhị phân tĩnh đơn lẻ, 5 từ vựng + nhúng ngữ nghĩa Model2Vec + kết hợp RRF + bộ xếp hạng lại nhận biết mã, với thuật toán truy xuất được chuyển đổi nguyên văn từ semble

Tài liệu

ken

Fast hybrid code search for agents. Pure Go, single static binary, drop-in MCP-compatible with MinishLab/semble — same tool schemas, same output format, install steps swapped to a Go binary.

CI License: MIT Go Reference Go 1.26+

ken is a Go port of semble: BM25 lexical + Model2Vec semantic embeddings + RRF fusion + a code-aware reranker, with the retrieval algorithm ported verbatim from semble's search.py + ranking/*.py.

Why ken

  • ~97% recall@10 in the default (hybrid) mode0.967 NL / 0.995 symbol on semble's 1,251-query benchmark, vs grep's ~99.9% — while costing an agent ~46× fewer tokens than grep + Read (4,120 vs 189,773 median tokens on NL queries — measured in the same default hybrid mode). For "find the chunk that answers this," that's a 1–2 order-of-magnitude token win at near-parity recall. (Reproduce: docs/BENCH.md.)
  • Single static binary. Pure Go, no cgo, no Python interpreter on cold start, no GIL on indexing. Cross-compiles to Linux / macOS / Windows (amd64/arm64) for free.
  • Drop-in for semble. Same search / find_related MCP tool schemas and the same markdown-string wire format — swap the command: path and existing agents work unchanged.
  • Local, CPU-only. Embedding inference, BM25, and fusion all run on the CPU. No API keys, no GPU, no vector DB, air-gapped friendly.

One knob controls recall. The 82–91% figure in the token-budget tables is the BM25-only fallback ken runs in when no embedding model is installed. ken-mcp fetches the model automatically on first run (~60 MB, pure-Go, no Python — serves bm25 until it lands, then upgrades to the ~97% hybrid path; KEN_MCP_AUTO_FETCH=0 to disable). For the CLI, run ken download-model once. Exhaustive enumeration (refactors, pre-rename audits) still belongs to grep; ken is for "find the chunk that answers this."

Where to start

  • ARCHITECTURE.md — current-state map: module layout, runtime/concurrency model, data flow, invariants. Start here for the code.
  • docs/USERS.md — agent users. Install ken-mcp, point your agent at it, use the nine tools. 5-minute on-ramp.
  • docs/DEVELOPERS.md — SDK authors and tuners. The mcp.Run embedded-corpus library, prebuilt indices, fs.FS indexing, custom chunkers, tuning rerank, performance expectations.
  • docs/DESIGN.md + docs/internal/DECISIONS.md — algorithm spec + every architectural decision (ADRs).
  • docs/BENCH.md — benchmark reproduction (NDCG, token-budget recall, the hybrid-vs-BM25 decomposition).

Quickstart

Install via a package manager:

# macOS / Linux (Homebrew) — installs both `ken` and `ken-mcp`:
brew install --cask townsendmerino/tap/ken
# Windows (Scoop):
scoop bucket add townsendmerino https://github.com/townsendmerino/scoop-bucket
scoop install ken

Or with Go:

# Install both binaries (Go 1.26+).
go install github.com/townsendmerino/ken/cmd/ken@latest
go install github.com/townsendmerino/ken/cmd/ken-mcp@latest

# Download the default Model2Vec model (~60 MB, one-time). Pure Go, no Python.
# (ken-mcp auto-fetches this on first run; the CLI needs it explicitly.)
# This is the single biggest retrieval-quality lever — it puts you on the ~97% path.
ken download-model

# Search any local repo from the CLI.
ken search /path/to/myrepo "save model to disk" --model ~/.ken/model

Or skip the model and use lexical-only mode (BM25-only costs ~14 pp recall@10 vs the hybrid default — see docs/BENCH.md):

ken search /path/to/myrepo "validateToken" --mode bm25

Pre-built binaries for macOS, Linux, and Windows (amd64/arm64) are attached to each release.tar.gz for macOS/Linux, .zip for Windows.

As of v0.3, ken index <path> defaults to watch mode — it stays alive and re-indexes on change (2 s debounce); --no-watch restores build-once-and-exit. ken-mcp always watches, so an agent editing the repo mid-session sees its own changes without a restart. ken also respects nested .gitignore files (per-directory, matching git).

Install as an MCP server

ken-mcp speaks JSON-RPC over stdio and serves the same two core tools (search, find_related) semble does, with the same arg shapes and markdown output.

# Claude Code
claude mcp add ken -s user -- /absolute/path/to/ken-mcp
// ~/.cursor/mcp.json  (or .cursor/mcp.json) — also .vscode/mcp.json with "servers"
{ "mcpServers": { "ken": { "command": "/absolute/path/to/ken-mcp" } } }
# ~/.codex/config.toml
[mcp_servers.ken]
command = "/absolute/path/to/ken-mcp"
// ~/.opencode/config.json
{ "mcp": { "ken": { "type": "local", "command": ["/absolute/path/to/ken-mcp"] } } }

Core environment variables

VariableDefaultPurpose
KEN_MCP_DEFAULT_REPO(unset)Pre-indexed source; lets tools omit the repo arg.
KEN_MCP_MODEhybridbm25 / semantic / hybrid. Serves bm25 while the model is missing — fetched on first run by default (see KEN_MCP_AUTO_FETCH).
KEN_MCP_MODEL_DIR~/.ken/modelPath to a Model2Vec snapshot containing model.safetensors. Falls back to ~/.ken/model (where ken download-model writes) when unset.
KEN_MCP_AUTO_FETCH1On first run with a model-needing mode and no model present, fetch potion-code-16M (~60 MB) in the background, serving bm25 until it lands then upgrading to hybrid. 0 disables (serve bm25, warn).
KEN_MCP_CHUNKERregexregex / treesitter / line / markdown. See Choosing a chunker.
KEN_MCP_CACHE_SIZE16LRU bound on the repo→Index cache.
KEN_MCP_LOG_LEVELwarndebug / info / warn / error. All logs go to stderr; stdout is the JSON-RPC channel (details).
KEN_ALLOW_PRIVATE_CLONE_TARGETS0Off by default: for http(s) repo URLs, ken rejects loopback / link-local / RFC1918 addresses (SSRF guard). Set 1 to allow internal git hosts.

The full env reference — including the KEN_DB_* database variables — is in docs/USERS.md and docs/db-indexing.md. For agents that should route between ken and grep deliberately (rather than ken's default "prefer ken" instruction), see the routing snippet in docs/USERS.md.

Tools

Both core tools return a formatted markdown string identical to semble's _format_results output. (ken-mcp also exposes seven structural tools — definition, references, callers, outline, symbols, recently_changed, status — plus reindex_db when a database is configured; see docs/USERS.md.)

search

ArgTypeRequiredDefaultDescription
querystringNatural language or code query.
repostringhttps:// / http:// URL or local directory. Required if no KEN_MCP_DEFAULT_REPO.
modehybrid|semantic|bm25hybridSearch mode.
top_kint5Number of results.

find_related

ArgTypeRequiredDefaultDescription
file_pathstringPath as it appears in a search result.
lineint (1-indexed)A line inside the chunk to seed the similarity search.
repostringSame as for search.
top_kint5Number of similar chunks.

What ken indexes

ken's hybrid retrieval is calibrated for source code (Python / Go / TypeScript / Java / Rust have language-aware chunking; others fall back to the line chunker) and documentation (markdown chunked on heading boundaries, code blocks/tables kept atomic, frontmatter handled). Mixed code-and-docs corpora route per file by extension.

It also indexes database schemas alongside code — static .sql files (with migration-history folding) and live introspection of Postgres / SQLite / MySQL / MariaDB — so an agent answering "how do users get authenticated" gets the Go function, the SQL it runs, the users table definition, and the FK relationships in one ranked list. Full reference (Tier-1/Tier-2, row sampling, LISTEN/NOTIFY, the reindex_db tool, PII stance, all KEN_DB_* vars): docs/db-indexing.md.

For plain prose with no code or structured docs, BM25 mode (--mode=bm25) carries the load; the semantic model is code-trained and unvalidated on literary text.

How it works

gitignore-respecting walk
    → regex chunker (Python / Go / TS / Java / Rust) with line-chunker fallback
    → BM25 (Lucene variant, k1=1.5, b=0.75)  +  Model2Vec semantic (cosine over a dense matrix)
    → α-weighted RRF fusion (α auto-detected: 0.3 for symbol queries, 0.5 for NL)
    → file-coherence boost + query-type boosts (definition / embedded-symbol / stem-match)
    → path penalties (test files, compat / legacy, `.d.ts`) + file-saturation decay
    → top-k

The retrieval algorithm is a verbatim port of semble's search.py + ranking/*.py; see docs/DESIGN.md §7 for every constant and pipeline-order subtlety, and §4 for the Model2Vec inference contract (three-tensor safetensors, the mapping[] indirection, the float64 precision that's load-bearing for cosine parity).

Comparison to semble

Propertysembleken
Language / distributionPython · uvx / pipGo · single static binary
Cold start~500 ms (interpreter + numpy + model)~10–20 ms ken search over a tiny index
Retrieval algorithmreference implementationverbatim port (constants + pipeline order from search.py + ranking/*.py)
NDCG@10 on semble's benchmark0.8540.842 hybrid (gap 0.012, full 63 repos × 1,251 queries)
Recall@10 on agent queries(not measured)~0.97 hybrid (0.967 NL / 0.995 symbol); BM25-only fallback ~0.84
Tokens to recall@10(not measured)~46× fewer than grep+Read on NL queries (4,120 vs 189,773 median, hybrid)
MCP serveryesyes — drop-in (same schemas + wire format)
Binary sizen/arelease (slim) ken ~22 MB · ken-mcp ~38 MB
Requires huggingface-cliyesnoken download-model fetches direct from HF

Full methodology, the per-ablation breakdown (semantic-raw matches semble within 0.003, validating the embedding + tokenizer + ANN port), the CoIR-CSN-Python external anchor, and every footnote are in docs/BENCH.md.

Compared to other agent code-search tools

The crowded part of this category splits on one axis: what you have to run. ken's bet is that the embedding model belongs inside the binary — pure-Go Model2Vec inference, no cgo — so there's nothing else to stand up: no embedding daemon, no vector database, no API key, air-gapped. The two closest points of comparison:

  • grepai — the closest architectural analog: a single Go binary with a file watcher and an MCP server, 100% local. It offloads embeddings to a separate Ollama server (you install + run Ollama and pull a model).
  • claude-context (Zilliz) — the most visible: hybrid BM25 + dense search, but backed by a vector database (self-hosted Milvus via Docker, or managed Zilliz Cloud) and an embedding provider (OpenAI / VoyageAI / Gemini API, or local Ollama).
kengrepaiclaude-context
Runtimesingle static Go binary (no cgo)single Go binaryNode/TS (npm)
Embeddingsin-process, pure Go (Model2Vec)external Ollama daemonexternal provider (OpenAI / Voyage / Gemini, or Ollama)
External services needednone — auto-fetches a ~60 MB model, then runs offlineOllama (daemon + model)vector DB (Milvus/Docker or Zilliz Cloud) + an embedding API/daemon
RetrievalBM25 + dense + RRF + code-aware rerankdense + call graphshybrid (BM25 + dense)
Recall / NDCG0.967 recall@10 · 0.842 NDCG@10, with a reproduction harnessnot publishednot published
Token savings~46× vs grep+Read, measured + reproduciblenot publishedvendor-claimed −39% vs a baseline
Speedindex ~1.6 s / 13 k chunks; hybrid search p50 ~1.5 ms (measured)vendor: "10 k files in seconds, ms queries"depends on the vector DB + network
Languages (structural)13 (tree-sitter)10chunk-level, language-agnostic
LicenseMITMITMIT

Two honest caveats. First, ken's numbers ship with reproduction commands (docs/BENCH.md); the cells marked "not published" mean we found no standard-benchmark figure to cite and have not independently benchmarked the others' speed — architecture, dependencies, and license are the verifiable axes (as of June 2026). Second, the tools optimize for different things — grepai adds call-graph tracing; claude-context leans on a managed vector DB for scale-out. ken's specific claim is near-grep recall at ~1–2 orders of magnitude fewer tokens, from one binary with no external services, every number reproducible.

Choosing a chunker

The default regex chunker handles most cases well. The opt-in treesitter chunker (--chunker=treesitter / KEN_MCP_CHUNKER=treesitter, pure-Go gotreesitter) measurably wins for Kotlin, Zig, TypeScript, Java, PHP and loses on Python, C, Rust, Lua, Scala — net Δ −0.004 NDCG overall (within noise), so it stays opt-in. The full per-language recommendation table is in docs/BENCH.md; the default-stays-regex rationale is ADR-011.

For SDK authors: ship docs as a single binary

The mcp.Run library lets you bake a //go:embed corpus + the Model2Vec model into one static MCP server binary — no backend, no vector DB, no per-query network egress, version-pinned by build artifact. ~20 lines of main.go, go build, push to a GitHub release; users brew install and add one line to their agent config. The walker and indexer take any fs.FS (embed.FS, fstest.MapFS, tarball-backed), which also gives agent sandboxing by construction.

Full guide — the canonical pattern, prebuilt indices for fast cold start, the binary-size contract, and the opt-in mcp/db package — is in docs/DEVELOPERS.md.

Live demos (downloadable mcp.Run binaries over real codebases, with audit transcripts): demos/v0.1.0 release — Kubernetes v1.31.0 (59,795 chunks) and PostgreSQL 17.0 (64,506 chunks). Writeup: I shipped two downloadable code search binaries. The audit caught two bugs..

Roadmap

The risk register with explicit triggers is in docs/DESIGN.md §10; the living 1.0-readiness tracker is docs/internal/road-to-1.0.md. Retrieval is treated as closed for 1.0 (the relevance curve is flat); remaining work is polish + onboarding (getting fresh installs onto the hybrid path) + distribution.

How this was built

ken is a port. The retrieval algorithm is verbatim from MinishLab/semble (Python); the Go implementation was written by Claude under fixed constraints: pure Go / no cgo, algorithm constants ported verbatim and never tuned, original source wins whenever Claude's reconstruction diverges from semble's live code. That last rule caught five material errors during the rerank-pipeline port — each a confident-sounding hallucination that was wrong when checked against the Python source. The discipline of always checking, the verbatim-port rule, and the 11k-input tokenizer parity harness (which surfaced three bugs an 18-case spot-check missed) are human-supplied. Every architectural decision is recorded in docs/internal/DECISIONS.md.

Acknowledgments

ken stands on MinishLab's shoulders — the retrieval algorithm, the model, the whole embedding-table approach are theirs.

  • semble — the original Python implementation. © Thomas van Dongen, MIT.
  • model2vec — the static-embedding library whose three-tensor format ken implements. © Thomas van Dongen, MIT.
  • potion-code-16M — model weights, distilled from nomic-ai/CodeRankEmbed (MIT), itself from Snowflake/snowflake-arctic-embed-m-long (Apache-2.0). © Minish Lab. Redistributed per NOTICE.

License

ken is MIT-licensed. It bundles attribution for the redistributed model weights in NOTICE and a generated dependency-license list in THIRD_PARTY_LICENSES.md; every link in the provenance chain is permissive (MIT, Apache-2.0, MPL-2.0). See docs/DESIGN.md §6.

For contributors: CLAUDE.md has the build/test/formatting conventions and the project's invariants (precision contract, stdout/stderr contract).