Local Flow
A minimal, local, GPU-accelerated RAG server for document ingestion and querying. It actually works. Ships with more dependencies than the Vatican's import list. Runs on WSL2 and Windows out-of-the-box (mostly).
Architecture
MCP Server + FAISS + SentenceTransformers + LangChain + FastMCP
Vector database stored in ./vector_db (or wherever RAG_DATA_DIR points). Don't delete it unless you enjoy re-indexing everything.
JSON-RPC over stdin/stdout because apparently that's how we communicate with AI tools now.
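For the curious, those stdio messages are plain JSON-RPC 2.0, one JSON object per line. A rough sketch of a tool-call request (the method name and params shape here are illustrative assumptions, not FastMCP's exact wire format):

```python
import json

# Hypothetical JSON-RPC 2.0 request, roughly what an MCP client writes to the
# server's stdin to invoke a tool. Envelope fields follow JSON-RPC; the
# method/params layout is an assumption for illustration.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "query_context",
        "arguments": {"query": "What does this thing do?", "top_k": 5},
    },
}

# One message per line is a common stdio framing.
wire = json.dumps(request)

# The server parses it back and dispatches on the tool name.
parsed = json.loads(wire)
print(parsed["params"]["name"])
```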
Quick Start
Because slow start isn't good enough for all you accelerationists.
1. Platform
- Windows: Native Windows setup with CUDA toolkit → see INSTALL_WINDOWS.md
- WSL2: There used to be a guide for installing the CUDA stack inside WSL2, but that's masochism -- the config below just calls PowerShell from WSL instead
2. Install Dependencies
Assumes you already have the CUDA Toolkit and CUDA Runtime installed. If you don't, see INSTALL_WINDOWS.md (again).
# Clone the repo somewhere and cd into it
git clone <repo_url>
cd local_flow
# Create virtual environment (shocking, I know)
python -m venv flow-env
# Activate venv
flow-env\Scripts\activate.bat
# Install everything
pip install sentence-transformers langchain-community langchain-text-splitters faiss-cpu pdfplumber requests beautifulsoup4 gitpython nbformat pydantic fastmcp
# PyTorch with CUDA (check https://pytorch.org/get-started/locally/ for your version)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
# On CUDA 12.9 I selected 12.8 and used the cu128 index (.../whl/cu128) instead
Note: Using faiss-cpu because faiss-gpu is apparently allergic to recent CUDA versions. Your embeddings will still use GPU. Chill.
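To confirm the embeddings will actually hit the GPU, a quick sanity check (guarded so it also runs on machines where torch isn't installed yet):

```python
def cuda_status() -> str:
    """Report whether PyTorch can see a CUDA device; degrades gracefully without torch."""
    try:
        import torch
    except ImportError:
        return "torch not installed"
    if torch.cuda.is_available():
        return f"CUDA OK: {torch.cuda.get_device_name(0)}"
    return "CPU only (embeddings will be slow)"

print(cuda_status())
```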
3. Configure MCP in Cursor
Add this to your mcp.json file:
Windows (%APPDATA%\Cursor\User\globalStorage\cursor.mcp\mcp.json):
Adjust paths to your setup (or it won't work, unsurprisingly).
{
"mcpServers": {
"LocalFlow": {
"command": "C:\\Users\\user.name\\Documents\\git\\local_flow\\flow-env\\Scripts\\python.exe",
"args": ["C:\\Users\\user.name\\Documents\\git\\local_flow\\rag_mcp_server.py"],
"env": {
"RAG_DATA_DIR": "C:\\Users\\user.name\\Documents\\flow_db"
},
"scopes": ["rag_read", "rag_write"],
"tools": ["add_source", "query_context", "list_sources", "remove_source"]
}
}
}
WSL2 (~/.cursor/mcp.json):
{
"mcpServers": {
"LocalFlow": {
"command": "powershell.exe",
"args": [
"-Command",
"$env:RAG_DATA_DIR='C:\\Users\\user.name\\Documents\\flow_db'; & 'C:\\Users\\user.name\\Documents\\git\\local_flow\\flow-env\\Scripts\\python.exe' 'C:\\Users\\user.name\\Documents\\git\\local_flow\\rag_mcp_server.py'"
],
"scopes": ["rag_read", "rag_write"],
"tools": ["add_source", "query_context", "list_sources", "remove_source"]
}
}
}
Server runs on http://localhost:8081. Revolutionary stuff.
4. Restart Cursor
Because restarting always fixes everything, right?
Usage
Adding Documents
Tell Cursor to use the add_source tool:
PDFs:
- Source type: pdf
- Path: /path/to/your/document.pdf (Linux) or C:\path\to\document.pdf (Windows)
- Source ID: Whatever makes you happy
Web Pages:
- Source type: webpage
- URL: https://stackoverflow.com/questions/definitely-not-copy-pasted
- Source ID: Optional
Git Repositories:
- Source type: git_repo
- URL: https://github.com/someone/hopefully-documented.git or a local path
- Source ID: Optional identifier
Like magic, but with more dependencies.
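Each of the bullet-point fields above just becomes a tool argument. A hypothetical add_source call per source type (argument names are inferred from the bullets, not verified against the server's actual schema):

```python
# Hypothetical argument dicts for the add_source tool; the key names
# (source_type, path, url, source_id) mirror the bullets above and are assumptions.
pdf_call = {"source_type": "pdf", "path": "/path/to/your/document.pdf", "source_id": "my-pdf"}
web_call = {"source_type": "webpage", "url": "https://stackoverflow.com/questions/definitely-not-copy-pasted"}
git_call = {"source_type": "git_repo", "url": "https://github.com/someone/hopefully-documented.git"}

# Whatever else varies, every call needs a source type.
for call in (pdf_call, web_call, git_call):
    assert "source_type" in call
```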
Querying (who knew asking a simple question could be so complicated)
Use the query_context tool:
- Query: "What does this thing actually do?"
- Top K: How many results you want (default: 5)
- Source IDs: Filter to specific sources (optional)
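What Top K and the source filter actually do can be sketched in a few lines of numpy (a toy stand-in for the real FAISS index, not the server's implementation):

```python
import numpy as np

# Toy stand-in for the FAISS index: three stored chunk embeddings with source ids.
sources = ["pdf-1", "pdf-1", "web-1"]
embeddings = np.array([[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]])

def query_context(query_vec, top_k=5, source_ids=None):
    """Return indices of the top_k most similar chunks, optionally filtered by source."""
    # Cosine similarity of the query against every stored embedding.
    sims = embeddings @ query_vec / (
        np.linalg.norm(embeddings, axis=1) * np.linalg.norm(query_vec)
    )
    order = list(np.argsort(-sims))      # best match first
    if source_ids is not None:           # the optional source filter
        order = [i for i in order if sources[i] in source_ids]
    return [int(i) for i in order[:top_k]]

print(query_context(np.array([1.0, 0.1]), top_k=2))               # [0, 1]
print(query_context(np.array([1.0, 0.1]), source_ids=["web-1"]))  # [2]
```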
Managing Sources
- list_sources - See what you've fed the machine
- remove_source - Pretend to delete things (metadata only; embeddings stick around like bad memories)
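Why removal is metadata-only: the FAISS index is effectively append-only, so the server can drop a source's bookkeeping while its vectors stay put. A toy illustration (these data structures are assumptions, not the server's actual internals):

```python
# Toy store: metadata maps source_id -> vector row numbers; vectors is the "index".
metadata = {"my-pdf": [0, 1], "web-1": [2]}
vectors = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]

def remove_source(source_id):
    """Forget the source's metadata; its embeddings remain in the index."""
    metadata.pop(source_id, None)

remove_source("my-pdf")
print(len(metadata), len(vectors))  # 1 3 -- one source left, all vectors still there
```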
Features
- ✅ GPU acceleration (most of the time)
- ✅ Arbitrary text (PDFs, web pages, Git repos)
- ✅ Local vector DB
- ✅ Source filtering (TODO: nested vector DBs for faster re-indexing so we can modify RAG params)
- ❌ Your sanity (sold separately)
Troubleshooting
Universal Issues
"Tool not found": Did you restart Cursor? Restart Cursor.
"CUDA out of memory": Your GPU is having feelings. Try smaller batch sizes or less ambitious documents.
"It's not working": That's not a question. But yes, welcome to local AI tooling.
Platform-Specific Issues
For detailed troubleshooting:
- Windows: Check INSTALL_WINDOWS.md
- WSL2: Check INSTALL_WSL2.md
Both have extensive troubleshooting sections because, let's face it, you'll need them.