Local Flow
Minimal, local, GPU-accelerated RAG server for document ingestion and querying. It actually works. Ships with more dependencies than the Vatican's import list. Runs on Windows and WSL.
Architecture
MCP Server + FAISS + SentenceTransformers + LangChain + FastMCP
Vector database stored in ./vector_db (or wherever RAG_DATA_DIR points). Don't delete it unless you enjoy re-indexing everything. The default resolves to a Windows directory; if you're running from WSL, edit RAG_DATA_DIR in the config directly, because passing it as an argument doesn't always work.
JSON-RPC over stdin/stdout, but we log everything over stderr because we're not cowards.
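The stdout/stderr split matters more than it sounds: any stray print to stdout corrupts the protocol stream. A minimal sketch of the idea (illustrative only; the real dispatch lives in rag_mcp_server.py via FastMCP, and these function names are invented):

```python
import json
import sys


def handle_request(request: dict) -> dict:
    """Toy dispatcher: real tools (add_source, query_context, ...) live in the server."""
    method = request.get("method", "unknown")
    # Logs go to stderr so they never pollute the JSON-RPC stream on stdout.
    print(f"handling {method}", file=sys.stderr)
    return {"jsonrpc": "2.0", "id": request.get("id"), "result": {"ok": True, "method": method}}


def serve() -> None:
    """Read one JSON-RPC request per line from stdin, reply on stdout only."""
    for line in sys.stdin:
        print(json.dumps(handle_request(json.loads(line))), flush=True)
```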
Quick Start
Because slow start isn't good enough for all you accelerationists.
1. Platform
- Windows: Native Windows setup with CUDA toolkit → see INSTALL_WINDOWS.md
- WSL2: There used to be a guide for installing the CUDA stack on WSL2, but that's masochism -- the config now calls Windows Python from WSL.
2. Install Dependencies
Assuming you already have the CUDA Toolkit and CUDA Runtime installed. If you don't, see INSTALL_WINDOWS.md. Again.
git clone <repo_url>
# shocking, I know
python -m venv flow-env
flow-env\Scripts\activate.bat
pip install sentence-transformers langchain-community langchain-text-splitters faiss-cpu pdfplumber requests beautifulsoup4 gitpython nbformat pydantic fastmcp
# PyTorch with CUDA (check https://pytorch.org/get-started/locally/ for your version)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
# For CUDA 12.9 I selected the 12.8 build, i.e. `cu128`
Note: Using faiss-cpu because faiss-gpu is allergic to recent CUDA versions.
3. Configure MCP in Cursor
Add this to your mcp.json file - also accessible via the "MCP settings" menu:
Windows (%APPDATA%\Cursor\User\globalStorage\cursor.mcp\mcp.json):
Adjust paths to your setup (or it won't work, unsurprisingly).
{
  "mcpServers": {
    "LocalFlow": {
      "command": "C:\\Users\\user.name\\Documents\\git\\local_flow\\flow-env\\Scripts\\python.exe",
      "args": ["C:\\Users\\user.name\\Documents\\git\\local_flow\\rag_mcp_server.py"],
      "env": {
        "RAG_DATA_DIR": "C:\\Users\\user.name\\Documents\\flow_db"
      },
      "scopes": ["rag_read", "rag_write"],
      "tools": ["add_source", "query_context", "list_sources", "remove_source"]
    }
  }
}
WSL2 (~/.cursor/mcp.json):
{
  "mcpServers": {
    "LocalFlow": {
      "command": "/mnt/c/Users/your.name/Documents/git/local_flow/flow-env/Scripts/python.exe",
      "args": [
        "C:\\Users\\your.name\\Documents\\git\\local_flow\\rag_mcp_server.py"
      ],
      "env": {
        "RAG_DATA_DIR": "C:\\Users\\your.name\\Documents\\flow_db"
      }
    }
  }
}
When using the WSL config, a "cannot execute binary file" error indicates WSL interop is disabled. Fix it:
# Add to /etc/wsl.conf
[interop]
enabled = true
appendWindowsPath = true
Then restart WSL from PowerShell: wsl --shutdown. A "UNC paths not supported" warning is related. If the fix doesn't persist across restarts, you can manually register the interop handler from your target WSL2 distribution:
sudo sh -c 'echo ":WSLInterop:M::MZ::/init:PF" > /proc/sys/fs/binfmt_misc/register'
If RAG_DATA_DIR isn't being picked up (the vector_db path shows \\wsl.localhost\... in logs), hardcode the fallback in rag_mcp_server.py -- the current fallback is my local path:
VECTOR_DB_PATH = os.environ.get("RAG_DATA_DIR") or "C:\\Users\\your.name\\Documents\\flow_db"
Server runs on http://localhost:8081.
4. Restart Cursor
If you're a coward.
Usage
Adding Documents
Tell Cursor to use the add_source tool, like magic, but with more dependencies.
PDFs:
- Source type: pdf
- Path: /path/to/your/document.pdf (Linux) or C:\path\to\document.pdf (Windows)
- Source ID: Whatever makes you happy (optional)
Web Pages:
- Source type: webpage
- URL: https://stackoverflow.com/questions/definitely-not-copy-pasted
- Source ID: Optional
Git Repositories:
- Source type: git_repo
- URL: https://github.com/someone/vibed/tree.git or local path
- Source ID: Optional
Querying (examples below)
Use the query_context tool:
- Query: "What does this thing actually do?"
- Top K: How many results you want (default: 5)
- Source IDs: Filter to specific sources (optional)
Managing Sources
- list_sources - See what you've fed the machine
- remove_source - Pretend to delete things (metadata only)
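The "pretend to delete" behaviour looks roughly like this: removing a source drops its bookkeeping entry, while the embedded vectors stay in the FAISS index until you rebuild it. A hedged sketch (class and attribute names invented for illustration):

```python
class SourceRegistry:
    """Tracks source metadata separately from the vector index."""

    def __init__(self) -> None:
        self.sources: dict[str, dict] = {}  # source_id -> metadata
        self.vector_count = 0               # stand-in for the FAISS index size

    def add_source(self, source_id: str, source_type: str, chunks: int) -> None:
        self.sources[source_id] = {"type": source_type, "chunks": chunks}
        self.vector_count += chunks

    def remove_source(self, source_id: str) -> None:
        # Metadata only: the chunks' embeddings remain in the index.
        self.sources.pop(source_id, None)

    def list_sources(self) -> list[str]:
        return sorted(self.sources)
```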
Troubleshooting
Universal Issues
"Tool not found": Did you restart Cursor? Restart Cursor. "CUDA out of memory": Your GPU is having feelings. Try smaller batch sizes or less ambitious documents. "It's not working": That's not a question. But yes, I agree.
Examples of stuff that you can index
Your prompts should indicate one of the following behaviours:
- Listing the sources currently available.
- Indexing a new source (given some local path)
- Removing a source (from metadata, not embeddings)
- Querying existing source(s) given some prompt, from which keywords/phrases are generated. You can modulate parameters such as top_k to request a larger sample of the top-ranked document chunks.
Source example
"What sources are available in the local Flow tool?"
Azure & Machine Learning
| Source ID | Type | Description | Chunks |
|---|---|---|---|
| azure_ml_api_docs | | Azure Machine Learning API documentation | 4,713 |
| azureml_examples_local_new | Git Repo | Azure ML examples repository | 24,290 |
| grounding_dino_repo | Git Repo | GroundingDINO repository | 240 |
| folder_9 | Folder | HuggingFace Hub documentation | 19,706 |
Vivado / FPGA Development
| Source ID | Type | Description | Chunks |
|---|---|---|---|
| vivado_2019_1_ip_guide | PDF | UG896 - Vivado IP User Guide | 154 |
| vivado_2019_1_design_flows | PDF | UG892 - Vivado Design Flows Overview | 170 |
| vivado_2019_1_synthesis_guide | PDF | UG901 - Vivado Synthesis Guide | 320 |
| pdf_11 | PDF | UG894 - Vivado TCL Scripting | 164 |
| pdf_8 | PDF | ADEAS IP License Config | 9 |
Query example 1
"Give me an example of a synthesis parameter I could edit in Vivado to improve my build speed?" -> Ran query_context:LocalFlow -> Query:
{
  "query": "synthesis parameters settings options improve build speed runtime performance",
  "source_ids": ["vivado_2019_1_synthesis_guide"],
  "top_k": 10
}
-> Response (raw, this is used to inform the answer):
{
  "results": [
    {
      "content": "MAX_FANOUT option described in Chapter 2, Synthesis Attributes.\nNote: The -fanout_limit switch does not impact control signals (such as set, reset, clock\nenable): use MAX_FANOUT to replicate these signals if needed.\n-directive: Replaces the -effort_level option. When specified, this option runs\n°\nVivado synthesis with different optimizations. See Table 1-2 for a list of all\nstrategies and settings. Values are:\n- Default: Default settings. See Table 1-2.\n- RuntimeOptimized: Performs fewer timing optimizations and eliminates some\nRTL optimizations to reduce synthesis run time.\n- AreaOptimized_high: Performs general area optimizations including forcing\nternary adder implementation, applying new thresholds for use of carry chain in\ncomparators, and implementing area-optimized multiplexers.\nSynthesis 12\nSend Feedback\nUG901 (v2019.1) June 12, 2019 www.xilinx.com",
      "source": "C:\\Users\\helios.lyons\\Documents\\flow_src\\ug901-20191-vivado-synthesis.pdf",
      "metadata": {
        "source": "C:\\Users\\helios.lyons\\Documents\\flow_src\\ug901-20191-vivado-synthesis.pdf",
        "page": 12,
        "source_type": "pdf",
        "source_id": "vivado_2019_1_synthesis_guide"
      }
    }
    [...]
-> Prompt completion (using chunks from the response):
Based on the Vivado Synthesis Guide (UG901), here's a great example:
-directive RuntimeOptimized
This is the most impactful single parameter for reducing synthesis time. It performs fewer timing optimizations and eliminates some RTL optimizations to reduce run time.