ChunkHound
A local-first semantic code search tool with vector and regex capabilities, designed for AI assistants.
Local-first codebase intelligence
Your AI assistant searches code but doesn't understand it. ChunkHound researches your codebase—extracting architecture, patterns, and institutional knowledge at any scale. Integrates via MCP.
Features
- cAST Algorithm - Research-backed semantic code chunking
- Multi-Hop Semantic Search - Discovers interconnected code relationships beyond direct matches
- Semantic search - Natural language queries like "find authentication code"
- Regex search - Pattern matching without API keys
- Local-first - Your code stays on your machine
- 32 languages with structured parsing
  - Programming (via Tree-sitter): Python, JavaScript, TypeScript, JSX, TSX, Java, Kotlin, Groovy, C, C++, C#, Go, Rust, Haskell, Swift, Bash, MATLAB, Makefile, Objective-C, PHP, Dart, Lua, Vue, Svelte, Zig
  - Configuration: JSON, YAML, TOML, HCL, Markdown
  - Text-based (custom parsers): Text files, PDF
- MCP integration - Works with Claude, VS Code, Cursor, Windsurf, Zed, etc.
- Real-time indexing - Automatic file watching, smart diffs, seamless branch switching, and explicit backend selection (watchdog, watchman, polling)
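To make the idea of syntax-aware chunking concrete, here is a minimal sketch in Python. It is not ChunkHound's implementation and not the cAST algorithm itself: it uses the standard-library `ast` module instead of Tree-sitter, handles Python source only, and the names `chunk_source`, `_pieces`, `_start` and the `max_chars` budget are assumptions made for illustration. Oversized classes are split at statement boundaries, then neighboring pieces are merged greedily up to the budget, so a chunk never cuts a function in half.

```python
import ast


def _start(node: ast.stmt) -> int:
    """First source line of a statement, including any decorators above it."""
    decorators = getattr(node, "decorator_list", [])
    return min([node.lineno] + [d.lineno for d in decorators])


def _pieces(nodes, lines, max_chars):
    """Yield source text for each statement, splitting oversized classes."""
    for node in nodes:
        text = "\n".join(lines[_start(node) - 1 : node.end_lineno])
        if len(text) > max_chars and isinstance(node, ast.ClassDef):
            # Split step: emit the class header, then recurse into the body.
            header_end = _start(node.body[0]) - 1
            yield "\n".join(lines[_start(node) - 1 : header_end])
            yield from _pieces(node.body, lines, max_chars)
        else:
            # Functions and simple statements are kept whole, even if large.
            yield text


def chunk_source(source: str, max_chars: int = 400) -> list[str]:
    """Hypothetical helper: chunk Python source along syntactic boundaries."""
    lines = source.splitlines()
    tree = ast.parse(source)
    chunks, current = [], ""
    for piece in _pieces(tree.body, lines, max_chars):
        if not piece:
            continue
        candidate = f"{current}\n{piece}" if current else piece
        if current and len(candidate) > max_chars:
            # Merge step: the budget is exhausted, so start a new chunk.
            chunks.append(current)
            current = piece
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Calling `chunk_source(text, max_chars=1200)` on a file's contents returns a list of strings that could each be embedded separately. This simplified version drops blank lines and comments that sit between statements; a production chunker would need to preserve them.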
Documentation
Visit chunkhound.github.io for complete guides.
Requirements
- Python 3.10+
- uv package manager
- API keys (optional - regex search works without any keys)
  - Embeddings: VoyageAI (recommended) | OpenAI | Local with Ollama
  - LLM (for Code Research): Claude Code CLI or Codex CLI (no API key needed) | Anthropic | OpenAI | Grok (xAI)
Installation
# Install uv if needed
curl -LsSf https://astral.sh/uv/install.sh | sh
# Install ChunkHound
uv tool install chunkhound
Quick Start
- Create .chunkhound.json in project root
{
  "embedding": {
    "provider": "voyageai",
    "api_key": "your-voyageai-key"
  },
  "llm": {
    "provider": "claude-code-cli"
  }
}
Note: Use "codex-cli" instead if you prefer Codex. Both work equally well and require no API key.
- Index your codebase
chunkhound index
For configuration, IDE setup, and advanced usage, see the documentation.
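If you would rather generate the Quick Start configuration from a script than write it by hand, a minimal Python sketch could look like the following. The helper names `build_config` and `write_config` are hypothetical and not part of ChunkHound; the JSON keys and provider strings are the ones shown in the Quick Start example above.

```python
import json
from pathlib import Path


def build_config(voyage_key: str, llm_provider: str = "claude-code-cli") -> dict:
    """Hypothetical helper: build the Quick Start configuration as a dict."""
    return {
        "embedding": {"provider": "voyageai", "api_key": voyage_key},
        # "codex-cli" is the alternative mentioned in the Quick Start note.
        "llm": {"provider": llm_provider},
    }


def write_config(project_root: Path, config: dict) -> Path:
    """Write the configuration to .chunkhound.json in the project root."""
    path = project_root / ".chunkhound.json"
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return path
```

For example, `write_config(Path.cwd(), build_config("your-voyageai-key"))` writes the file into the current directory, after which `chunkhound index` can be run as in the second step. Since the file contains an API key, consider keeping it out of version control.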
Why ChunkHound?
| Approach | Capability | Scale | Maintenance |
|---|---|---|---|
| Keyword Search | Exact matching | Fast | None |
| Traditional RAG | Semantic search | Scales | Re-index files |
| Knowledge Graphs | Relationship queries | Expensive | Continuous sync |
| ChunkHound | Semantic + Regex + Code Research | Automatic | Incremental + realtime |
Ideal for:
- Large monorepos with cross-team dependencies
- Security-sensitive codebases (local-only, no cloud)
- Multi-language projects needing consistent search
- Offline/air-gapped development environments
License
MIT