Local Flow
Minimal, local, GPU-accelerated RAG server for document ingestion and querying. It actually works. Ships with more dependencies than the Vatican's import list. Runs on WSL2 and Windows out of the box (mostly).
Architecture
MCP Server + FAISS + SentenceTransformers + LangChain + FastMCP
Vector database stored in ./vector_db (or wherever RAG_DATA_DIR points). Don't delete it unless you enjoy re-indexing everything.
JSON-RPC over stdin/stdout because apparently that's how we communicate with AI tools now.
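If you're wondering how those pieces hang together, here's a minimal sketch. This is not the repo's actual code: the tool name matches, but the model choice, in-memory chunk storage, and everything else are illustrative.
# Sketch only: a FastMCP tool server wiring SentenceTransformers into FAISS.
import faiss
from fastmcp import FastMCP
from sentence_transformers import SentenceTransformer

mcp = FastMCP("LocalFlow")
model = SentenceTransformer("all-MiniLM-L6-v2")  # embeds on GPU when available
index = faiss.IndexFlatL2(model.get_sentence_embedding_dimension())
chunks: list[str] = []  # parallel to the index rows; the real server persists these

@mcp.tool()
def query_context(query: str, top_k: int = 5) -> list[str]:
    """Return the top_k stored chunks most similar to the query."""
    vec = model.encode([query]).astype("float32")
    _, ids = index.search(vec, top_k)
    return [chunks[i] for i in ids[0] if i != -1]

if __name__ == "__main__":
    mcp.run()  # speaks JSON-RPC over stdin/stdout by default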
Quick Start
Because slow start isn't good enough for all you accelerationists.
1. Platform
- Windows: Native Windows setup with CUDA toolkit → see INSTALL_WINDOWS.md
- WSL2: There used to be a guide for installing the full CUDA stack inside WSL2, but that's masochism -- the config below just calls PowerShell from WSL instead.
2. Install Dependencies
Assuming you already have the CUDA Toolkit and CUDA Runtime installed. If you don't, see INSTALL_WINDOWS.md. Again.
# Clone the repo somewhere
git clone <repo_url>
# Create virtual environment (shocking, I know)
python -m venv flow-env
# Activate venv
flow-env\Scripts\activate.bat
# Install everything
pip install sentence-transformers langchain-community langchain-text-splitters faiss-cpu pdfplumber requests beautifulsoup4 gitpython nbformat pydantic fastmcp
# PyTorch with CUDA (check https://pytorch.org/get-started/locally/ for your version)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
# -- On CUDA 12.9, the closest available build was 12.8, so I used `cu128`
Note: Using faiss-cpu because faiss-gpu is apparently allergic to recent CUDA versions. Your embeddings will still use GPU. Chill.
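Before blaming FAISS, it's worth a ten-second check that pip actually gave you the CUDA build of PyTorch and that the GPU is visible:
# Sanity check: CUDA build installed and GPU visible.
import torch
print(torch.__version__)          # should end in +cu121 (or whichever build you picked)
print(torch.cuda.is_available())  # False means you got the CPU wheel
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))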
3. Configure MCP in Cursor
Add this to your mcp.json file:
Windows (%APPDATA%\Cursor\User\globalStorage\cursor.mcp\mcp.json):
Adjust paths to your setup (or it won't work, unsurprisingly).
{
  "mcpServers": {
    "LocalFlow": {
      "command": "C:\\Users\\user.name\\Documents\\git\\local_flow\\flow-env\\Scripts\\python.exe",
      "args": ["C:\\Users\\user.name\\Documents\\git\\local_flow\\rag_mcp_server.py"],
      "env": {
        "RAG_DATA_DIR": "C:\\Users\\user.name\\Documents\\flow_db"
      },
      "scopes": ["rag_read", "rag_write"],
      "tools": ["add_source", "query_context", "list_sources", "remove_source"]
    }
  }
}
WSL2 (~/.cursor/mcp.json):
{
  "mcpServers": {
    "LocalFlow": {
      "command": "powershell.exe",
      "args": [
        "-Command",
        "$env:RAG_DATA_DIR='C:\\Users\\user.name\\Documents\\flow_db'; & 'C:\\Users\\user.name\\Documents\\git\\local_flow\\flow-env\\Scripts\\python.exe' 'C:\\Users\\user.name\\Documents\\git\\local_flow\\rag_mcp_server.py'"
      ],
      "scopes": ["rag_read", "rag_write"],
      "tools": ["add_source", "query_context", "list_sources", "remove_source"]
    }
  }
}
Server runs on http://localhost:8081. Revolutionary stuff.
4. Restart Cursor
Because restarting always fixes everything, right?
Usage
Adding Documents
Tell Cursor to use the add_source tool:
PDFs:
- Source type: pdf
- Path: /path/to/your/document.pdf (Linux) or C:\path\to\document.pdf (Windows)
- Source ID: Whatever makes you happy
Web Pages:
- Source type: webpage
- URL: https://stackoverflow.com/questions/definitely-not-copy-pasted
- Source ID: Optional
Git Repositories:
- Source type: git_repo
- URL: https://github.com/someone/hopefully-documented.git or local path
- Source ID: Optional identifier
Like magic, but with more dependencies.
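Under the hood, a tool call is just an MCP JSON-RPC request. The argument names below (source_type, path, source_id) are a guess at the schema based on the fields above, so trust the server's tool listing over this sketch:
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "add_source",
    "arguments": {
      "source_type": "pdf",
      "path": "/path/to/your/document.pdf",
      "source_id": "my-doc"
    }
  }
}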
Querying (who knew it could be so complicated to ask a simple question)
Use the query_context tool:
- Query: "What does this thing actually do?"
- Top K: How many results you want (default: 5)
- Source IDs: Filter to specific sources (optional)
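A query looks much the same on the wire; again, the exact argument names (query, top_k, source_ids) are assumptions:
{
  "jsonrpc": "2.0",
  "id": 2,
  "method": "tools/call",
  "params": {
    "name": "query_context",
    "arguments": {
      "query": "What does this thing actually do?",
      "top_k": 5,
      "source_ids": ["my-doc"]
    }
  }
}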
Managing Sources
- list_sources - See what you've fed the machine
- remove_source - Pretend to delete things (metadata only; embeddings stick around like bad memories)
Features
- ✅ GPU acceleration (most of the time)
- ✅ Arbitrary text (PDFs, web pages, Git repos)
- ✅ Local vector DB
- ✅ Source filtering (TODO: nested vector DBs for faster re-indexing so we can modify RAG params)
- ❌ Your sanity (sold separately)
Troubleshooting
Universal Issues
"Tool not found": Did you restart Cursor? Restart Cursor.
"CUDA out of memory": Your GPU is having feelings. Try smaller batch sizes or less ambitious documents.
"It's not working": That's not a question. But yes, welcome to local AI tooling.
Platform-Specific Issues
For detailed troubleshooting:
- Windows: Check INSTALL_WINDOWS.md
- WSL2: Check INSTALL_WSL2.md
Both have extensive troubleshooting sections because, let's face it, you'll need them.