ChromaDB
Provides AI assistants with persistent memory using ChromaDB vector storage.
ChromaDB MCP Server š§
A Model Context Protocol (MCP) server that gives AI assistants persistent memory through ChromaDB vector storage. Now with EXIF extraction, Watch Folders, and Duplicate Detection - the ultimate tool for creators!
⨠Features
Core
- Persistent AI Memory: Your AI assistant remembers past conversations and solutions
- Vector Search: Find similar code patterns, configurations, and documentation instantly
- Local First: Run everything on your own hardware, no cloud dependencies
š Batch Processing
- Fast Batch Ingest: Process entire directories in seconds (500+ files)
- 77 File Types: Photos, CAD, documents, data files, code
- Quick Load/Unload: Temporary collections for rapid workflows
- Export/Import: Backup and transfer collections as JSON
šø Photo Features (NEW in v3.0)
- EXIF Extraction: Camera, lens, exposure, GPS location, date taken
- Search by Camera: "Find photos shot with my Canon 5D"
- Search by Location: GPS coordinates embedded and searchable
- Search by Date: "Find photos from vacation 2024"
šļø Watch Folders (NEW in v3.0)
- Auto-Ingest: Drop files in watched folders, auto-add to ChromaDB
- Hands-Free: Perfect for incoming photo dumps, downloads
- Filter by Type: Watch only for specific file types
š Duplicate Detection (NEW in v3.0)
- Find Duplicates: Hash-based detection across directories
- Reclaim Space: See exactly how much space duplicates waste
- Compare Files: Check if two files are identical
- Perceptual Hashing: Find similar (not just identical) images
š Quick Start
Prerequisites
- Bun (JavaScript runtime)
- Docker (for ChromaDB)
- Claude Desktop (or any MCP client)
Installation
-
Clone the repository
git clone https://github.com//vespo92/chromadblocal-mcp-server.git cd chromadb-mcp-server -
Install dependencies
bun install -
Start ChromaDB
docker run -d \ --name chromadb-local \ -p 8001:8000 \ -v ~/chromadb-data:/chroma/chroma \ -e IS_PERSISTENT=TRUE \ chromadb/chroma:latest -
Initialize collections
bun run setup -
Configure Claude Desktop
Add to
~/Library/Application Support/Claude/claude_desktop_config.json:{ "mcpServers": { "chromadb-context": { "command": "bun", "args": ["run", "/path/to/chromadb-mcp-server/index.js"], "env": { "CHROMADB_URL": "http://localhost:8001" } } } } -
Restart Claude Desktop and start building your knowledge base!
š¬ Usage Examples
Once configured, interact naturally with your AI:
Store Knowledge
- "Store this Docker configuration in ChromaDB for future reference"
- "Save this React component pattern with tags: hooks, authentication"
- "Remember this solution for GPU passthrough issues"
Retrieve Information
- "Search ChromaDB for Python async examples"
- "Find similar component patterns to this one"
- "What solutions do we have for Docker networking issues?"
Build Context
- "Add this API documentation to the project_docs collection"
- "Store these test patterns for our testing suite"
š Batch File Processing
The killer feature! Process massive amounts of files instantly for AI-powered search and retrieval.
Quick Load Workflow (Fastest)
Perfect for "load, process, discard" workflows:
You: "Quick load my photos from /home/photos/vacation2024"
AI: Creates temp collection, ingests 500 photos in seconds
You: "Find photos with mountains or beaches"
AI: Returns matching photos with metadata
You: "Unload the collection"
AI: Cleans up, frees memory
Supported File Types
| Category | Extensions | Metadata Extracted |
|---|---|---|
| Images | .jpg, .jpeg, .png, .heic, .raw, .cr2, .nef, .arw, .tiff, .gif, .webp | Dimensions, size, format |
| CAD | .stl, .obj, .dxf, .dwg, .step, .iges, .fbx, .blend, .skp, .scad | Vertices, faces, format |
| Documents | .pdf, .txt, .md, .doc, .docx, .rtf | Full text content |
| Data | .json, .yaml, .xml, .csv, .toml, .ini | Parsed content |
| Code | .js, .ts, .py, .go, .rs, .java, .cpp, .c, .php, .rb + 20 more | Full source code |
Batch Processing Examples
"Scan /projects/cad-files to see what's there"
"Batch ingest all STL files from /3d-prints into the 'print_library' collection"
"Quick load my Downloads folder, find anything mentioning 'invoice'"
"Export the photo_archive collection to backup.json"
"Import backup.json into a new collection called 'restored_photos'"
Processing Speed
- Quick Load: ~200 files in 2-3 seconds
- Batch Ingest: ~500 files in 5-10 seconds (with full metadata)
- Concurrent Processing: 10-20 parallel file operations
- No external dependencies: Pure JavaScript/Bun processing
š Available Collections
| Collection | Description | Use Case |
|---|---|---|
home_automation | Smart home configs & automations | Home Assistant, IoT scripts |
code_snippets | Reusable code patterns | Functions, hooks, utilities |
configurations | System & app configs | Docker, Kubernetes, services |
troubleshooting | Problem solutions | Fixes, workarounds, debugging |
project_docs | Project documentation | APIs, architecture, guides |
learning_notes | Learning insights | Tutorials, concepts, notes |
š ļø MCP Tools
search_context
Search for relevant information across collections
Parameters:
- query: Search query
- collection: (optional) Specific collection to search
- limit: (optional) Number of results
store_context
Store new information with metadata
Parameters:
- content: The content to store
- metadata: Tags, categories, descriptions
- collection: Target collection
list_collections
List all available collections and their metadata
find_similar_patterns
Find code patterns similar to provided example
Batch Processing Tools
scan_directory
Preview files in a directory before ingesting
Parameters:
- path: Directory to scan
- categories: Filter by type (images, cad, documents, data, code)
- extensions: Filter by extension (.jpg, .stl, etc.)
- recursive: Include subdirectories (default: true)
batch_ingest
Bulk ingest files into ChromaDB with full metadata
Parameters:
- path: Source directory
- collection: Target collection name
- categories: File types to include
- max_files: Limit number of files
quick_load
š FAST: Rapidly load files for temporary processing
Parameters:
- path: Directory to load
- name: Collection name (auto-generated if omitted)
- categories: File types to include
unload_collection
Delete a collection (cleanup after quick_load)
Parameters:
- collection: Name of collection to delete
export_collection
Export collection to JSON file
Parameters:
- collection: Collection to export
- output_path: File path for JSON output
import_collection
Import collection from JSON file
Parameters:
- input_path: JSON file to import
- collection: Override collection name
- overwrite: Delete existing first (default: false)
get_collection_info
Get detailed stats about a collection
Parameters:
- collection: Collection name
ingest_file
Ingest a single file with metadata extraction
Parameters:
- path: File to ingest
- collection: Target collection
list_file_types
Show all supported file extensions
EXIF & Photo Tools
extract_exif
Extract detailed EXIF metadata from photos
Parameters:
- path: Path to JPEG or TIFF image
Returns: Camera, lens, exposure, GPS, date taken
Watch Folder Tools
watch_folder
Start auto-ingesting new files from a folder
Parameters:
- path: Folder to watch
- collection: Target collection (default: auto_ingest)
- categories: File types to watch
- include_exif: Extract EXIF from photos (default: true)
stop_watch
Stop watching a folder
Parameters:
- path: Folder to stop watching
list_watchers
List all active folder watchers
Duplicate Detection Tools
find_duplicates
Scan directory for duplicate files
Parameters:
- path: Directory to scan
- hash_method: "partial" (fast), "full" (thorough), "perceptual" (images)
- categories: File types to check
Returns: Duplicate groups with wasted space info
compare_files
Check if two files are duplicates
Parameters:
- file1: First file path
- file2: Second file path
find_collection_duplicates
Find duplicate entries in a ChromaDB collection
Parameters:
- collection: Collection name
š§ Configuration
Environment Variables
CHROMADB_URL=http://localhost:8001 # ChromaDB server URL
Custom Collections
Add new collections in setup-home-collections.js:
await createCollection('ml_experiments', {
description: 'Machine learning experiments and results'
});
š¦ Project Structure
chromadb-mcp-server/
āāā index.js # MCP server with 22 tools
āāā batch-processor.js # Fast batch file processing engine
āāā exif-extractor.js # EXIF metadata extraction for photos
āāā watch-folder.js # Auto-ingest watch folder system
āāā duplicate-detector.js # Duplicate file detection
āāā setup-home-collections.js # Collection initialization
āāā test-chromadb.js # Connection test script
āāā test-mcp.js # MCP functionality test
āāā test-batch-processor.js # Batch processing tests
āāā HOME-AI-SETUP.md # Detailed setup guide
āāā package.json # Project dependencies
āāā README.md # This file
š¤ Contributing
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
See CONTRIBUTING.md for more details.
š License
This project is licensed under the MIT License - see the LICENSE file for details.
š Acknowledgments
- Anthropic for the MCP specification
- Chroma for the excellent vector database
- The open-source community for inspiration and support
š What's Next?
- ā
Export/import collectionsDONE! - ā
Batch file processingDONE! - ā
EXIF metadata extractionDONE in v3.0! - ā
Watch folders / auto-ingestDONE in v3.0! - ā
Duplicate detectionDONE in v3.0! - Cloud sync capabilities
- Multi-user support
- Web UI for collection management
- AI-powered image descriptions (what's in the photo)
- 3D print analysis (volume, time estimates)
Built with ā¤ļø for the Home AI Community
Related Servers
Fabi Analyst Agent MCP
Fabi MCP is an autonomous agent that handles end-to-end data analysis tasks from natural language requests, automatically discovering data schemas, generating sql or python code, executing queries, and presenting insights.
GeoServer MCP Server
Connects Large Language Models to the GeoServer REST API, enabling AI assistants to interact with geospatial data and services.
Dynamics 365 MCP Server by CData
A read-only MCP server by CData that enables LLMs to query live data from Dynamics 365. Requires the CData JDBC Driver for Dynamics 365.
ClickHouse
An MCP server for interacting with a ClickHouse database.
Hologres
Connect to a Hologres instance, get table metadata, query and analyze data.
XiYan MCP Server
A server that enables natural language queries to databases using XiyanSQL.
Shardeum MCP Server
An MCP server for interacting with the Shardeum blockchain.
Domainkits.com MCP
Domain intelligence tools - NS reverse lookup, newly registered domain search and more
LanceDB Node.js Vector Search
Vector search using the LanceDB vector database and Ollama embedding models.
RudderStack
Customer data pipeline inspection, debugging, and configuration changes from tools like Claude Desktop and Cursor