# Local Flow

A minimal, local, GPU-accelerated RAG server for document ingestion and querying. It actually works. Ships with more dependencies than the Vatican's import list. Runs on WSL2 and Windows out-of-the-box (mostly).
## Architecture

MCP Server + FAISS + SentenceTransformers + LangChain + FastMCP

The vector database is stored in `./vector_db` (or wherever `RAG_DATA_DIR` points). Don't delete it unless you enjoy re-indexing everything.

JSON-RPC over stdin/stdout, because apparently that's how we communicate with AI tools now.
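For the curious, a tool invocation on the wire is a JSON-RPC 2.0 request roughly like the one below (the `tools/call` shape follows the MCP spec; the argument names here are illustrative, not necessarily this server's exact schema):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "query_context",
    "arguments": { "query": "What does this thing actually do?", "top_k": 5 }
  }
}
```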
## Quick Start

Because slow start isn't good enough for all you accelerationists.
### 1. Platform

- Windows: Native Windows setup with CUDA toolkit → see INSTALL_WINDOWS.md
- WSL2: There used to be a guide for installing the CUDA stack on WSL2, but that's masochism -- the config now just calls PowerShell from WSL.
### 2. Install Dependencies

This assumes you already have the CUDA Toolkit and CUDA Runtime installed. If you don't, see INSTALL_WINDOWS.md. Again.
```shell
# Clone the repo somewhere
git clone <repo_url>

# Create virtual environment (shocking, I know)
python -m venv flow-env

# Activate venv
flow-env\Scripts\activate.bat

# Install everything
pip install sentence-transformers langchain-community langchain-text-splitters faiss-cpu pdfplumber requests beautifulsoup4 gitpython nbformat pydantic fastmcp

# PyTorch with CUDA (check https://pytorch.org/get-started/locally/ for your version)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
# On CUDA 12.9 I selected 12.8 and used the cu128 index instead
```

Note: Using `faiss-cpu` because `faiss-gpu` is apparently allergic to recent CUDA versions. Your embeddings will still use the GPU. Chill.
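To sanity-check the CPU-index/GPU-embeddings split, a quick probe (the helper below is mine, not part of this repo; `all-MiniLM-L6-v2` is just a common default model name):

```python
# Probe which device embeddings will run on. faiss-cpu keeps the index
# on the CPU either way; only the embedding model cares about CUDA.
def pick_device():
    try:
        import torch
        return "cuda" if torch.cuda.is_available() else "cpu"
    except ImportError:
        return "cpu"

if __name__ == "__main__":
    print(f"Embeddings will run on: {pick_device()}")
    # from sentence_transformers import SentenceTransformer
    # model = SentenceTransformer("all-MiniLM-L6-v2", device=pick_device())
```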
### 3. Configure MCP in Cursor

Add this to your `mcp.json` file. Adjust paths to your setup (or it won't work, unsurprisingly).

Windows (`%APPDATA%\Cursor\User\globalStorage\cursor.mcp\mcp.json`):
```json
{
  "mcpServers": {
    "LocalFlow": {
      "command": "C:\\Users\\user.name\\Documents\\git\\local_flow\\flow-env\\Scripts\\python.exe",
      "args": ["C:\\Users\\user.name\\Documents\\git\\local_flow\\rag_mcp_server.py"],
      "env": {
        "RAG_DATA_DIR": "C:\\Users\\user.name\\Documents\\flow_db"
      },
      "scopes": ["rag_read", "rag_write"],
      "tools": ["add_source", "query_context", "list_sources", "remove_source"]
    }
  }
}
```
WSL2 (`~/.cursor/mcp.json`):
```json
{
  "mcpServers": {
    "LocalFlow": {
      "command": "powershell.exe",
      "args": [
        "-Command",
        "$env:RAG_DATA_DIR='C:\\Users\\user.name\\Documents\\flow_db'; & 'C:\\Users\\user.name\\Documents\\git\\local_flow\\flow-env\\Scripts\\python.exe' 'C:\\Users\\user.name\\Documents\\git\\local_flow\\rag_mcp_server.py'"
      ],
      "scopes": ["rag_read", "rag_write"],
      "tools": ["add_source", "query_context", "list_sources", "remove_source"]
    }
  }
}
```
Server runs on http://localhost:8081. Revolutionary stuff.
### 4. Restart Cursor

Because restarting always fixes everything, right?
## Usage

### Adding Documents

Tell Cursor to use the `add_source` tool:
**PDFs:**
- Source type: `pdf`
- Path: `/path/to/your/document.pdf` (Linux) or `C:\path\to\document.pdf` (Windows)
- Source ID: Whatever makes you happy
**Web Pages:**
- Source type: `webpage`
- URL: `https://stackoverflow.com/questions/definitely-not-copy-pasted`
- Source ID: Optional
**Git Repositories:**
- Source type: `git_repo`
- URL: `https://github.com/someone/hopefully-documented.git` or local path
- Source ID: Optional identifier
Like magic, but with more dependencies.
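Under the hood, ingestion boils down to split → embed → index. The splitter below is a stripped-down stand-in for LangChain's text splitters (function name and defaults are illustrative, not this server's actual internals):

```python
# Split text into overlapping chunks, the way RAG ingestion pipelines
# typically do before embedding. Overlap keeps context across boundaries.
def split_text(text, chunk_size=500, overlap=50):
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks
```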
### Querying (who knew it could be so complicated to ask a simple question)

Use the `query_context` tool:

- Query: "What does this thing actually do?"
- Top K: How many results you want (default: 5)
- Source IDs: Filter to specific sources (optional)
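Conceptually, querying embeds the question and ranks stored chunks by similarity. The real server delegates this to FAISS, but the idea fits in a few lines (helper names here are mine):

```python
import math

# Toy version of top-k retrieval: rank document vectors by cosine
# similarity to the query vector. FAISS does this at scale; the math
# is the same.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec, doc_vecs, k=5):
    ranked = sorted(range(len(doc_vecs)),
                    key=lambda i: cosine(query_vec, doc_vecs[i]),
                    reverse=True)
    return ranked[:k]
```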
### Managing Sources

- `list_sources` - See what you've fed the machine
- `remove_source` - Pretend to delete things (metadata only; embeddings stick around like bad memories)
## Features
- ✅ GPU acceleration (most of the time)
- ✅ Arbitrary text (PDFs, web pages, Git repos)
- ✅ Local vector DB
- ✅ Source filtering (TODO: nested vector DBs for faster re-indexing so we can modify RAG params)
- ❌ Your sanity (sold separately)
## Troubleshooting

### Universal Issues

"Tool not found": Did you restart Cursor? Restart Cursor.

"CUDA out of memory": Your GPU is having feelings. Try smaller batch sizes or less ambitious documents.

"It's not working": That's not a question. But yes, welcome to local AI tooling.
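For the out-of-memory case: `SentenceTransformer.encode` does take a `batch_size` argument, so dropping it is the cheapest fix. The retry loop below is just an illustrative pattern, not something this server implements (`encode_fn` is a stand-in for the real encode call):

```python
# Illustrative OOM-recovery pattern: halve the batch size and retry.
# encode_fn stands in for SentenceTransformer.encode.
def encode_with_backoff(encode_fn, texts, batch_size=64, floor=1):
    while True:
        try:
            return encode_fn(texts, batch_size=batch_size)
        except RuntimeError:  # torch raises RuntimeError on CUDA OOM
            if batch_size <= floor:
                raise
            batch_size //= 2  # back off and try again with a smaller batch
```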
### Platform-Specific Issues

For detailed troubleshooting:

- Windows: Check INSTALL_WINDOWS.md
- WSL2: Check INSTALL_WSL2.md
Both have extensive troubleshooting sections because, let's face it, you'll need them.