Local Flow
A minimal, local, GPU-accelerated RAG server for document ingestion and querying. It actually works. Ships with more dependencies than the Vatican's import list. Runs on Windows and WSL.
Architecture
MCP Server + FAISS + SentenceTransformers + LangChain + FastMCP
Vector database stored in ./vector_db (or wherever RAG_DATA_DIR points). Don't delete it unless you enjoy re-indexing everything. The default is a Windows directory; if you're running from WSL, set RAG_DATA_DIR explicitly (or edit the fallback in the server), because the env argument doesn't always come through.
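A sketch of how that resolution presumably works (illustrative, not the server's exact code):

```python
import os
from pathlib import Path

# RAG_DATA_DIR wins if set; otherwise fall back to ./vector_db in the
# working directory. expanduser() handles paths like ~/flow_db.
db_dir = Path(os.environ.get("RAG_DATA_DIR", "./vector_db")).expanduser()
db_dir.mkdir(parents=True, exist_ok=True)
print(f"Vector DB lives in: {db_dir.resolve()}")
```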
JSON-RPC over stdin/stdout, but we log everything over stderr because we're not cowards.
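That split matters: one stray print() to stdout corrupts the protocol stream. A minimal stdlib sketch of the framing (the real server delegates this to FastMCP):

```python
import json
import logging
import sys

# stdout is reserved for JSON-RPC frames; all logging goes to stderr
logging.basicConfig(stream=sys.stderr, level=logging.INFO)

def jsonrpc_response(result, request_id):
    """Build a JSON-RPC 2.0 response frame."""
    return {"jsonrpc": "2.0", "id": request_id, "result": result}

frame = jsonrpc_response({"status": "ok"}, request_id=1)
sys.stdout.write(json.dumps(frame) + "\n")
logging.info("answered request %d", 1)
```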
Quick Start
Because slow start isn't good enough for all you accelerationists.
1. Platform
- Windows: Native Windows setup with CUDA toolkit → see INSTALL_WINDOWS.md
- WSL2: There used to be a guide for installing the CUDA stack inside WSL2, but that's masochism -- the config now calls Windows Python from WSL instead
2. Install Dependencies
This assumes you already have the CUDA Toolkit and CUDA Runtime installed. If you don't, see INSTALL_WINDOWS.md. Again.
git clone <repo_url>
# shocking, I know
python -m venv flow-env
flow-env\Scripts\activate.bat
pip install sentence-transformers langchain-community langchain-text-splitters faiss-cpu pdfplumber requests beautifulsoup4 gitpython nbformat pydantic fastmcp
# PyTorch with CUDA (check https://pytorch.org/get-started/locally/ for your version)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
# With CUDA 12.9 installed, I selected the closest build and used `cu128`
Note: Using faiss-cpu because faiss-gpu is allergic to recent CUDA versions.
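The trade-off is smaller than it sounds: faiss-cpu only does the similarity search, while embedding still runs on the GPU through PyTorch. The search itself is brute-force L2 distance over the stored vectors -- roughly what faiss.IndexFlatL2 computes, sketched here in NumPy (the sizes and the 384-dim assumption are hypothetical):

```python
import numpy as np

def top_k_l2(db, query, k=5):
    """Brute-force L2 nearest neighbours -- what faiss.IndexFlatL2 computes."""
    dists = ((db - query) ** 2).sum(axis=1)
    order = np.argsort(dists)[:k]
    return order, dists[order]

# Hypothetical store: 1,000 chunks embedded to 384 dims
rng = np.random.default_rng(0)
db = rng.standard_normal((1000, 384)).astype("float32")
query = rng.standard_normal(384).astype("float32")
idx, dists = top_k_l2(db, query, k=5)
```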
3. Configure MCP in Cursor
Add this to your mcp.json file - also accessible via the "MCP settings" menu:
Windows (%APPDATA%\Cursor\User\globalStorage\cursor.mcp\mcp.json):
Adjust paths to your setup (or it won't work, unsurprisingly).
{
"mcpServers": {
"LocalFlow": {
"command": "C:\\Users\\user.name\\Documents\\git\\local_flow\\flow-env\\Scripts\\python.exe",
"args": ["C:\\Users\\user.name\\Documents\\git\\local_flow\\rag_mcp_server.py"],
"env": {
"RAG_DATA_DIR": "C:\\Users\\user.name\\Documents\\flow_db"
},
"scopes": ["rag_read", "rag_write"],
"tools": ["add_source", "query_context", "list_sources", "remove_source"]
}
}
}
WSL2 (~/.cursor/mcp.json):
{
"mcpServers": {
"LocalFlow": {
"command": "/mnt/c/Users/your.name/Documents/git/local_flow/flow-env/Scripts/python.exe",
"args": [
"C:\\Users\\your.name\\Documents\\git\\local_flow\\rag_mcp_server.py"
],
"env": {
"RAG_DATA_DIR": "C:\\Users\\your.name\\Documents\\flow_db"
}
}
}
}
When using the WSL config, the error cannot execute binary file means WSL interop is disabled. Fix it:
# Add to /etc/wsl.conf
[interop]
enabled = true
appendWindowsPath = true
Then restart WSL from PowerShell: wsl --shutdown. A related warning is UNC paths are not supported. If the interop setting doesn't persist across restarts, you can register it manually from your target WSL2 distribution:
sudo sh -c 'echo ":WSLInterop:M::MZ::/init:PF" > /proc/sys/fs/binfmt_misc/register'
If RAG_DATA_DIR isn't being picked up (the vector_db path shows \\wsl.localhost\... in the logs), hardcode the fallback in rag_mcp_server.py -- the current fallback is my local path:
VECTOR_DB_PATH = os.environ.get("RAG_DATA_DIR") or "C:\\Users\\your.name\\Documents\\flow_db"
Server runs on http://localhost:8081.
4. Restart Cursor
If you're a coward.
Usage
Adding Documents
Tell Cursor to use the add_source tool. Like magic, but with more dependencies.
PDFs:
- Source type: pdf
- Path: /path/to/your/document.pdf (Linux) or C:\path\to\document.pdf (Windows)
- Source ID: Whatever makes you happy (optional)
Web Pages:
- Source type: webpage
- URL: https://stackoverflow.com/questions/definitely-not-copy-pasted
- Source ID: Optional
Git Repositories:
- Source type: git_repo
- URL: https://github.com/someone/vibed/tree.git or local path
- Source ID: Optional
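Whatever the source type, ingestion boils down to splitting the extracted text into overlapping chunks before embedding. LangChain's RecursiveCharacterTextSplitter does this with smarter boundaries; the naive version, with made-up sizes, looks like:

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Fixed-size chunks with overlap, so context isn't cut mid-sentence.
    (Hypothetical sizes -- the server's actual splitter settings may differ.)"""
    step = chunk_size - overlap
    return [text[start:start + chunk_size]
            for start in range(0, max(len(text) - overlap, 1), step)]

chunks = chunk_text("lorem ipsum " * 200)  # ~2,400 characters of filler
```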
Querying (examples below)
Use the query_context tool:
- Query: "What does this thing actually do?"
- Top K: How many results you want (default: 5)
- Source IDs: Filter to specific sources (optional)
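Conceptually, source_ids is just a metadata filter around the vector search, and top_k caps how many ranked chunks come back. A hypothetical sketch (field names mirror the server's result metadata; scores here are distances, lower = closer):

```python
def query_context(hits, source_ids=None, top_k=5):
    """Keep hits from the requested sources only, then the top_k best scores."""
    if source_ids is not None:
        hits = [h for h in hits if h["source_id"] in source_ids]
    return sorted(hits, key=lambda h: h["score"])[:top_k]

# Hypothetical hits, shaped like the server's responses
hits = [
    {"source_id": "vivado_2019_1_synthesis_guide", "score": 0.12},
    {"source_id": "pdf_11", "score": 0.08},
    {"source_id": "vivado_2019_1_synthesis_guide", "score": 0.35},
]
```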
Managing Sources
- list_sources - See what you've fed the machine
- remove_source - Pretend to delete things (metadata only)
Troubleshooting
Universal Issues
"Tool not found": Did you restart Cursor? Restart Cursor. "CUDA out of memory": Your GPU is having feelings. Try smaller batch sizes or less ambitious documents. "It's not working": That's not a question. But yes, I agree.
Examples of stuff that you can index
Your prompts should indicate one of the following behaviours:
- Listing the sources currently available.
- Indexing a new source (given some local path)
- Removing a source (from metadata, not embeddings)
- Querying existing source(s) given some prompt, from which keywords/phrases are generated. You can tune parameters such as top_k to request a larger sample of the top-ranked document chunks.
Source example
"What sources are available in the local Flow tool?"
Azure & Machine Learning
| Source ID | Type | Description | Chunks |
|---|---|---|---|
| azure_ml_api_docs | | Azure Machine Learning API documentation | 4,713 |
| azureml_examples_local_new | Git Repo | Azure ML examples repository | 24,290 |
| grounding_dino_repo | Git Repo | GroundingDINO repository | 240 |
| folder_9 | Folder | HuggingFace Hub documentation | 19,706 |
Vivado / FPGA Development
| Source ID | Type | Description | Chunks |
|---|---|---|---|
| vivado_2019_1_ip_guide | PDF | UG896 - Vivado IP User Guide | 154 |
| vivado_2019_1_design_flows | PDF | UG892 - Vivado Design Flows Overview | 170 |
| vivado_2019_1_synthesis_guide | PDF | UG901 - Vivado Synthesis Guide | 320 |
| pdf_11 | PDF | UG894 - Vivado TCL Scripting | 164 |
| pdf_8 | PDF | ADEAS IP License Config | 9 |
Query example 1
"Give me an example of a synthesis parameter I could edit in Vivado to improve my build speed?" -> Ran query_context:LocalFlow -> Query:
"query": "synthesis parameters settings options improve build speed runtime performance"
"source_ids": [
"vivado_2019_1_synthesis_guide"
],
"top_k": 10
-> Response (raw, this is used to inform the answer):
{
  "results": [
    {
      "content": "MAX_FANOUT option described in Chapter 2, Synthesis Attributes.\nNote: The -fanout_limit switch does not impact control signals (such as set, reset, clock\nenable): use MAX_FANOUT to replicate these signals if needed.\n-directive: Replaces the -effort_level option. When specified, this option runs\nVivado synthesis with different optimizations. See Table 1-2 for a list of all\nstrategies and settings. Values are:\n- Default: Default settings. See Table 1-2.\n- RuntimeOptimized: Performs fewer timing optimizations and eliminates some\nRTL optimizations to reduce synthesis run time.\n- AreaOptimized_high: Performs general area optimizations including forcing\nternary adder implementation, applying new thresholds for use of carry chain in\ncomparators, and implementing area-optimized multiplexers.\nSynthesis 12\nSend Feedback\nUG901 (v2019.1) June 12, 2019 www.xilinx.com",
      "source": "C:\\Users\\helios.lyons\\Documents\\flow_src\\ug901-20191-vivado-synthesis.pdf",
      "metadata": {
        "source": "C:\\Users\\helios.lyons\\Documents\\flow_src\\ug901-20191-vivado-synthesis.pdf",
        "page": 12,
        "source_type": "pdf",
        "source_id": "vivado_2019_1_synthesis_guide"
      }
    },
    [...]
  ]
}
-> Prompt completion (using chunks from the response):
Based on the Vivado Synthesis Guide (UG901), here's a great example:
-directive RuntimeOptimized
This is the most impactful single parameter for reducing synthesis time. It performs fewer timing optimizations and eliminates some RTL optimizations to reduce run time.