Local Flow
A minimal, local, GPU-accelerated RAG server for document ingestion and querying.
Minimal, local RAG with GPU acceleration. It actually works. Ships with more dependencies than the Vatican's import list. Runs on Windows and WSL.
Architecture
MCP Server + FAISS + SentenceTransformers + LangChain + FastMCP
The vector database is stored in ./vector_db (or wherever RAG_DATA_DIR points). Don't delete it unless you enjoy re-indexing everything. The default is a Windows directory. If you're using it with WSL, edit the RAG_DATA_DIR fallback in rag_mcp_server.py, because the value passed via the config doesn't always get picked up.
JSON-RPC over stdin/stdout, but we log everything to stderr because we're not cowards.
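The stdout/stderr split matters because anything else printed to stdout would corrupt the JSON-RPC stream. A minimal sketch of that pattern, assuming newline-delimited messages; the logger name and the `write_message` helper are hypothetical, and FastMCP presumably handles the real framing for you:

```python
import json
import logging
import sys

# Send all log output to stderr so stdout carries nothing but JSON-RPC messages.
logging.basicConfig(
    stream=sys.stderr,
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)
log = logging.getLogger("local_flow_sketch")  # hypothetical logger name


def write_message(message: dict, out=sys.stdout) -> None:
    """Write one JSON-RPC message as a single line and flush immediately."""
    log.info("sending message id=%s", message.get("id"))
    out.write(json.dumps(message) + "\n")
    out.flush()
```

If you add your own debugging to the server, use the logger (or `print(..., file=sys.stderr)`) rather than a bare `print`.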
Quick Start
Because slow start isn't good enough for all you accelerationists.
1. Platform
- Windows: Native Windows setup with CUDA toolkit → See INSTALL_WINDOWS.md
- WSL2: Used to have a guide for installing the CUDA stack on WSL2, but I'm thinking that's masochism -- now we have a config which calls Windows Python from WSL
2. Install Dependencies
Assuming you already have CUDA Toolkit and CUDA Runtime installed. If you don't, see INSTALL_WINDOWS.md, again.
git clone <repo_url>
# shocking, I know
python -m venv flow-env
flow-env\Scripts\activate.bat
pip install sentence-transformers langchain-community langchain-text-splitters faiss-cpu pdfplumber requests beautifulsoup4 gitpython nbformat pydantic fastmcp
# PyTorch with CUDA (check https://pytorch.org/get-started/locally/ for your version)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
# e.g. with CUDA 12.9 installed (12.8 selected on that page), I used `cu128` instead of `cu121`
Note: Using faiss-cpu because faiss-gpu is allergic to recent CUDA versions.
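Before wiring anything into Cursor, it can help to confirm the install actually took. A small sketch, assuming the packages above were installed into the active venv; `check_environment` is a hypothetical helper, not part of this repo:

```python
import importlib.util


def check_environment() -> dict:
    """Report which key packages are importable and whether PyTorch sees a GPU."""
    report = {
        name: importlib.util.find_spec(name) is not None
        for name in ("torch", "faiss", "sentence_transformers")
    }
    report["cuda_available"] = False
    if report["torch"]:
        try:
            import torch  # imported lazily so the check runs without PyTorch

            report["cuda_available"] = bool(torch.cuda.is_available())
        except Exception:
            # A broken install counts as "no GPU" for the purposes of this check.
            report["cuda_available"] = False
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

If `cuda_available` comes back `False` with `torch` present, the usual suspect is a CPU-only wheel or a `cu…` index URL that doesn't match your driver.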
3. Configure MCP in Cursor
Add this to your mcp.json file - also accessible via the "MCP settings" menu:
Windows (%APPDATA%\Cursor\User\globalStorage\cursor.mcp\mcp.json):
Adjust paths to your setup (or it won't work, unsurprisingly).
{
  "mcpServers": {
    "LocalFlow": {
      "command": "C:\\Users\\user.name\\Documents\\git\\local_flow\\flow-env\\Scripts\\python.exe",
      "args": ["C:\\Users\\user.name\\Documents\\git\\local_flow\\rag_mcp_server.py"],
      "env": {
        "RAG_DATA_DIR": "C:\\Users\\user.name\\Documents\\flow_db"
      },
      "scopes": ["rag_read", "rag_write"],
      "tools": ["add_source", "query_context", "list_sources", "remove_source"]
    }
  }
}
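Since a wrong path is the most likely failure here, a quick sanity check on the config can save a restart or two. This is a sketch under assumptions: `check_mcp_config` is a hypothetical helper, and the three checks simply mirror the example config (a `python.exe` command, `rag_mcp_server.py` in `args`, and `RAG_DATA_DIR` in `env`).

```python
import json


def check_mcp_config(config: dict) -> list:
    """Return human-readable problems found in an mcp.json-style dict."""
    problems = []
    servers = config.get("mcpServers")
    if not isinstance(servers, dict) or not servers:
        return ["no 'mcpServers' object found"]
    for name, entry in servers.items():
        command = entry.get("command", "")
        if not command.lower().endswith("python.exe"):
            problems.append(f"{name}: 'command' should point at the venv's python.exe")
        args = entry.get("args") or []
        if not any(str(arg).endswith("rag_mcp_server.py") for arg in args):
            problems.append(f"{name}: 'args' should include rag_mcp_server.py")
        if "RAG_DATA_DIR" not in entry.get("env", {}):
            problems.append(f"{name}: 'env' does not set RAG_DATA_DIR")
    return problems


def check_mcp_file(path: str) -> list:
    """Load an mcp.json file from disk and run the same checks."""
    with open(path, encoding="utf-8") as handle:
        return check_mcp_config(json.load(handle))
```

An empty list means the config has the expected shape; it does not prove the paths exist on your machine.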
WSL2 (~/.cursor/mcp.json):
{
  "mcpServers": {
    "LocalFlow": {
      "command": "/mnt/c/Users/your.name/Documents/git/local_flow/flow-env/Scripts/python.exe",
      "args": [
        "C:\\Users\\your.name\\Documents\\git\\local_flow\\rag_mcp_server.py"
      ],
      "env": {
        "RAG_DATA_DIR": "C:\\Users\\your.name\\Documents\\flow_db"
      }
    }
  }
}
When using the WSL config, a "cannot execute binary file" error indicates that WSL interop is disabled. Fix it:
# Add to /etc/wsl.conf
[interop]
enabled = true
appendWindowsPath = true
Then restart WSL from PowerShell: wsl --shutdown. "UNC paths not supported" is a related warning. If the fix doesn't persist across restarts, you can manually register the interop handler by running the following in your target WSL2 distribution.
sudo sh -c 'echo ":WSLInterop:M::MZ::/init:PF" > /proc/sys/fs/binfmt_misc/register'
If RAG_DATA_DIR isn't being picked up (vector_db path shows \\wsl.localhost\... in logs), hardcode the fallback in rag_mcp_server.py -- the current fallback is my local path:
VECTOR_DB_PATH = os.environ.get("RAG_DATA_DIR") or "C:\\Users\\your.name\\Documents\\flow_db"
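To make that failure easier to spot, the fallback can be paired with a check for WSL-style UNC paths. A sketch only: `looks_like_wsl_unc` and `resolve_data_dir` are hypothetical helpers, not functions from rag_mcp_server.py, and the prefixes checked are the two WSL share names I'm aware of.

```python
def looks_like_wsl_unc(path: str) -> bool:
    """True for WSL network paths (the wsl.localhost and older wsl$ shares)."""
    normalized = path.replace("/", "\\").lower()
    return normalized.startswith(("\\\\wsl.localhost\\", "\\\\wsl$\\"))


def resolve_data_dir(environ, fallback: str) -> str:
    """Prefer RAG_DATA_DIR, but use the fallback if it is unset or a WSL UNC path."""
    candidate = environ.get("RAG_DATA_DIR") or fallback
    if looks_like_wsl_unc(candidate):
        return fallback
    return candidate
```

Called as `resolve_data_dir(os.environ, "C:\\Users\\your.name\\Documents\\flow_db")`, this behaves like the one-liner above, except that a `\\wsl.localhost\...` value is also rejected.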
Server runs on http://localhost:8081.
4. Restart Cursor
If you're a coward.
Usage
Adding Documents
Tell Cursor to use the add_source tool, like magic, but with more dependencies.
PDFs:
- Source type: pdf
- Path: /path/to/your/document.pdf (Linux) or C:\path\to\document.pdf (Windows)
- Source ID: Whatever makes you happy (Optional)
Web Pages:
- Source type: webpage
- URL: https://stackoverflow.com/questions/definitely-not-copy-pasted
- Source ID: Optional
Git Repositories:
- Source type: git_repo
- URL: https://github.com/someone/vibed/tree.git or local path
- Source ID: Optional
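The three source types above can be summarized as a small data model. This is an illustrative sketch, not the server's actual schema: the `SourceRequest` class and its `location` field are hypothetical, and the real add_source tool may accept more types or different argument names.

```python
from dataclasses import dataclass
from typing import Optional

# The three source types described above; the real server may accept more.
SOURCE_TYPES = ("pdf", "webpage", "git_repo")


@dataclass
class SourceRequest:
    source_type: str
    location: str                    # hypothetical name: a file path or a URL
    source_id: Optional[str] = None  # optional, as in the lists above

    def problems(self) -> list:
        """Return reasons this request looks malformed (empty list if fine)."""
        found = []
        if self.source_type not in SOURCE_TYPES:
            found.append(f"unknown source type: {self.source_type!r}")
        if not self.location.strip():
            found.append("location is empty")
        is_url = self.location.startswith(("http://", "https://"))
        if self.source_type == "webpage" and not is_url:
            found.append("webpage sources need an http(s) URL")
        return found
```

For example, `SourceRequest("git_repo", "C:\\path\\to\\repo").problems()` returns an empty list, since a git_repo source may be a URL or a local path.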
Querying (examples below)
Use the query_context tool:
- Query: "What does this thing actually do?"
- Top K: How many results you want (default: 5)
- Source IDs: Filter to specific sources (optional)
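Cursor builds the tool call for you, but it can be useful to see roughly what goes over the wire. A sketch assuming the standard MCP `tools/call` request shape and the argument names that appear in the query example further down (`query`, `top_k`, `source_ids`); `build_query_request` is a hypothetical helper, and a real client would also perform the MCP initialization handshake first.

```python
import json
from typing import Optional


def build_query_request(
    query: str,
    top_k: int = 5,
    source_ids: Optional[list] = None,
    request_id: int = 1,
) -> str:
    """Serialize a JSON-RPC 2.0 'tools/call' request for query_context."""
    arguments = {"query": query, "top_k": top_k}
    if source_ids:
        # Only include the filter when the caller actually wants one.
        arguments["source_ids"] = source_ids
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "query_context", "arguments": arguments},
    }
    return json.dumps(request)
```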
Managing Sources
- list_sources - See what you've fed the machine
- remove_source - Pretend to delete things (metadata only)
Troubleshooting
Universal Issues
- "Tool not found": Did you restart Cursor? Restart Cursor.
- "CUDA out of memory": Your GPU is having feelings. Try smaller batch sizes or less ambitious documents.
- "It's not working": That's not a question. But yes, I agree.
Examples of stuff that you can index
Your prompts should indicate one of the following behaviours:
- Listing the sources currently available.
- Indexing a new source (given some local path)
- Removing a source (from metadata, not embeddings)
- Querying existing source(s) given some prompt, from which keywords/phrases are generated. You can modulate parameters such as top_k to request a larger sample of the top-ranked document chunks.
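To make the effect of top_k and source filtering concrete, here is a brute-force stand-in for what a vector search does. It is a sketch only: the real server uses FAISS, and the chunk layout (`source_id`, `vector`, `content` keys) and function names here are assumptions for illustration.

```python
import math


def cosine(a, b) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_chunks(query_vec, chunks, top_k=5, source_ids=None):
    """Rank chunks by similarity, optionally restricted to the given source IDs."""
    candidates = [
        chunk for chunk in chunks
        if source_ids is None or chunk["source_id"] in source_ids
    ]
    ranked = sorted(
        candidates,
        key=lambda chunk: cosine(query_vec, chunk["vector"]),
        reverse=True,
    )
    return ranked[:top_k]
```

A larger top_k returns more (and progressively less relevant) chunks; a source filter shrinks the candidate pool before ranking.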
Source example
"What sources are available in the local Flow tool?"
Azure & Machine Learning
| Source ID | Type | Description | Chunks |
|---|---|---|---|
| azure_ml_api_docs | | Azure Machine Learning API documentation | 4,713 |
| azureml_examples_local_new | Git Repo | Azure ML examples repository | 24,290 |
| grounding_dino_repo | Git Repo | GroundingDINO repository | 240 |
| folder_9 | Folder | HuggingFace Hub documentation | 19,706 |
Vivado / FPGA Development
| Source ID | Type | Description | Chunks |
|---|---|---|---|
| vivado_2019_1_ip_guide | | UG896 - Vivado IP User Guide | 154 |
| vivado_2019_1_design_flows | | UG892 - Vivado Design Flows Overview | 170 |
| vivado_2019_1_synthesis_guide | | UG901 - Vivado Synthesis Guide | 320 |
| pdf_11 | | UG894 - Vivado TCL Scripting | 164 |
| pdf_8 | | ADEAS IP License Config | 9 |
Query example 1
"Give me an example of a synthesis parameter I could edit in Vivado to improve my build speed?" -> Ran query_context:LocalFlow -> Query:
"query": "synthesis parameters settings options improve build speed runtime performance"
"source_ids": [
"vivado_2019_1_synthesis_guide"
],
"top_k": 10
-> Response (raw, this is used to inform the answer):
{
"results": [
{
"content": "MAX_FANOUT option described in Chapter 2, Synthesis Attributes.\nNote: The -fanout_limit switch does not impact control signals (such as set, reset, clock\nenable): use MAX_FANOUT to replicate these signals if needed.\n-directive: Replaces the -effort_level option. When specified, this option runs\n°\nVivado synthesis with different optimizations. See Table 1-2 for a list of all\nstrategies and settings. Values are:\n- Default: Default settings. See Table 1-2.\n- RuntimeOptimized: Performs fewer timing optimizations and eliminates some\nRTL optimizations to reduce synthesis run time.\n- AreaOptimized_high: Performs general area optimizations including forcing\nternary adder implementation, applying new thresholds for use of carry chain in\ncomparators, and implementing area-optimized multiplexers.\nSynthesis 12\nSend Feedback\nUG901 (v2019.1) June 12, 2019 www.xilinx.com",
"source": "C:\\Users\\helios.lyons\\Documents\\flow_src\\ug901-20191-vivado-synthesis.pdf",
"metadata": {
"source": "C:\\Users\\helios.lyons\\Documents\\flow_src\\ug901-20191-vivado-synthesis.pdf",
"page": 12,
"source_type": "pdf",
"source_id": "vivado_2019_1_synthesis_guide"
}
[...]
-> Prompt completion (using chunks from the response):
Based on the Vivado Synthesis Guide (UG901), here's a great example:
-directive RuntimeOptimized
This is the most impactful single parameter for reducing synthesis time. It performs fewer timing optimizations and eliminates some RTL optimizations to reduce run time.
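If you want to inspect a raw response like the one above outside of Cursor, a few lines are enough to turn it into something readable. A sketch using only the fields visible in that response (`results`, `content`, and `metadata` with `page` and `source_id`); `summarize_results` is a hypothetical helper.

```python
import json


def summarize_results(raw_response: str, max_chars: int = 80) -> list:
    """Turn a query_context response into short 'source_id p.N: snippet' lines."""
    lines = []
    for result in json.loads(raw_response).get("results", []):
        meta = result.get("metadata", {})
        # Collapse newlines and repeated spaces left over from PDF extraction.
        snippet = " ".join(result.get("content", "").split())[:max_chars]
        lines.append(
            f"{meta.get('source_id', '?')} p.{meta.get('page', '?')}: {snippet}"
        )
    return lines
```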