Local Flow
A minimal, local, GPU-accelerated RAG server for document ingestion and querying. It actually works. Ships with more dependencies than the Vatican's import list. Runs on WSL2 and Windows out of the box (mostly).
Architecture
MCP Server + FAISS + SentenceTransformers + LangChain + FastMCP
Vector database stored in ./vector_db (or wherever RAG_DATA_DIR points). Don't delete it unless you enjoy re-indexing everything.
JSON-RPC over stdin/stdout because apparently that's how we communicate with AI tools now.
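For the curious, here is a rough sketch of what one of those stdio messages looks like. The `tools/call` method name and the argument shape follow general MCP conventions and are illustrative; Cursor constructs these for you, so you never have to write one by hand.

```python
import json

# Illustrative JSON-RPC 2.0 request an MCP client might write to the
# server's stdin to invoke the query_context tool. Field names beyond
# the JSON-RPC envelope are assumptions, not the server's exact schema.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "query_context",
        "arguments": {"query": "What does this thing actually do?", "top_k": 5},
    },
}

# One JSON message per line over stdin/stdout.
line = json.dumps(request)
print(line)
```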
Quick Start
Because slow start isn't good enough for all you accelerationists.
1. Platform
- Windows: Native Windows setup with CUDA toolkit → see INSTALL_WINDOWS.md
- WSL2: There used to be a guide for installing the CUDA stack inside WSL2, but that's masochism -- now the config just calls PowerShell from WSL
2. Install Dependencies
Assuming you already have the CUDA Toolkit and CUDA Runtime installed. If you don't, see INSTALL_WINDOWS.md. Again.
# Clone the repo somewhere
git clone <repo_url>
# Create virtual environment (shocking, I know)
python -m venv flow-env
# Activate venv
flow-env\Scripts\activate.bat
# Install everything
pip install sentence-transformers langchain-community langchain-text-splitters faiss-cpu pdfplumber requests beautifulsoup4 gitpython nbformat pydantic fastmcp
# PyTorch with CUDA (check https://pytorch.org/get-started/locally/ for your version)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
# -- With CUDA 12.9 installed, I selected the 12.8 build and used `cu128`
Note: Using faiss-cpu because faiss-gpu is apparently allergic to recent CUDA versions. Your embeddings will still use GPU. Chill.
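A quick sanity check that the GPU half of this actually worked (assumes the venv is activated; the fallback message is just a hint, not part of the tooling):

```shell
# Confirm PyTorch imports and can see the GPU. Expect "True" on a working
# CUDA setup; "False" means you got the CPU wheel or a driver problem.
python -c "import torch; print(torch.__version__, torch.cuda.is_available())" \
  || echo "torch not importable -- did you activate the venv?"
```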
3. Configure MCP in Cursor
Add this to your mcp.json file:
Windows (%APPDATA%\Cursor\User\globalStorage\cursor.mcp\mcp.json):
Adjust paths to your setup (or it won't work, unsurprisingly).
{
"mcpServers": {
"LocalFlow": {
"command": "C:\\Users\\user.name\\Documents\\git\\local_flow\\flow-env\\Scripts\\python.exe",
"args": ["C:\\Users\\user.name\\Documents\\git\\local_flow\\rag_mcp_server.py"],
"env": {
"RAG_DATA_DIR": "C:\\Users\\user.name\\Documents\\flow_db"
},
"scopes": ["rag_read", "rag_write"],
"tools": ["add_source", "query_context", "list_sources", "remove_source"]
}
}
}
WSL2 (~/.cursor/mcp.json):
{
"mcpServers": {
"LocalFlow": {
"command": "powershell.exe",
"args": [
"-Command",
"$env:RAG_DATA_DIR='C:\\Users\\user.name\\Documents\\flow_db'; & 'C:\\Users\\user.name\\Documents\\git\\local_flow\\flow-env\\Scripts\\python.exe' 'C:\\Users\\user.name\\Documents\\git\\local_flow\\rag_mcp_server.py'"
],
"scopes": ["rag_read", "rag_write"],
"tools": ["add_source", "query_context", "list_sources", "remove_source"]
}
}
}
Server runs on http://localhost:8081. Revolutionary stuff.
4. Restart Cursor
Because restarting always fixes everything, right?
Usage
Adding Documents
Tell Cursor to use the add_source tool:
PDFs:
- Source type: pdf
- Path: /path/to/your/document.pdf (Linux) or C:\path\to\document.pdf (Windows)
- Source ID: Whatever makes you happy
Web Pages:
- Source type: webpage
- URL: https://stackoverflow.com/questions/definitely-not-copy-pasted
- Source ID: Optional
Git Repositories:
- Source type: git_repo
- URL: https://github.com/someone/hopefully-documented.git or a local path
- Source ID: Optional identifier
Like magic, but with more dependencies.
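Put together, an add_source call ends up looking something like this (argument names here are illustrative, not the server's exact schema -- you describe the source in plain English and Cursor fills this in):

```json
{
  "name": "add_source",
  "arguments": {
    "source_type": "pdf",
    "path": "/path/to/your/document.pdf",
    "source_id": "my-doc"
  }
}
```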
Querying (who knew it could be so complicated to ask a simple question)
Use the query_context tool:
- Query: "What does this thing actually do?"
- Top K: How many results you want (default: 5)
- Source IDs: Filter to specific sources (optional)
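As with add_source, the resulting tool call is roughly shaped like the following (field names are assumptions for illustration; `source_ids` is only needed when filtering):

```json
{
  "name": "query_context",
  "arguments": {
    "query": "What does this thing actually do?",
    "top_k": 5,
    "source_ids": ["my-doc"]
  }
}
```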
Managing Sources
- list_sources - See what you've fed the machine
- remove_source - Pretend to delete things (metadata only, embeddings stick around like bad memories)
Features
- ✅ GPU acceleration (most of the time)
- ✅ Arbitrary text (PDFs, web pages, Git repos)
- ✅ Local vector DB
- ✅ Source filtering (TODO: nested vector DBs for faster re-indexing so we can modify RAG params)
- ❌ Your sanity (sold separately)
Troubleshooting
Universal Issues
"Tool not found": Did you restart Cursor? Restart Cursor.
"CUDA out of memory": Your GPU is having feelings. Try smaller batch sizes or less ambitious documents.
"It's not working": That's not a question. But yes, welcome to local AI tooling.
Platform-Specific Issues
For detailed troubleshooting:
- Windows: Check INSTALL_WINDOWS.md
- WSL2: Check INSTALL_WSL2.md
Both have extensive troubleshooting sections because, let's face it, you'll need them.