Local Flow

A minimal, local, GPU-accelerated RAG server for document ingestion and querying.

Minimal, local RAG with GPU acceleration. It actually works, and ships with more dependencies than the Vatican's import list. Runs on Windows and WSL.

Architecture

MCP Server + FAISS + SentenceTransformers + LangChain + FastMCP

The vector database is stored in ./vector_db, or wherever RAG_DATA_DIR points. Don't delete it unless you enjoy re-indexing everything. The default resolves to a Windows directory; if you're running from WSL, set RAG_DATA_DIR explicitly, because the env var passed through the config doesn't always work.

JSON-RPC over stdin/stdout, but we log everything over stderr because we're not cowards.
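The stdout/stderr split above can be sketched as follows. This is illustrative only: `log` and `send_response` are hypothetical names, not the actual functions in rag_mcp_server.py.

```python
import json
import sys

def log(msg: str) -> None:
    # Diagnostics go to stderr so they never corrupt the JSON-RPC stream on stdout.
    print(msg, file=sys.stderr)

def send_response(result: dict, request_id: int) -> str:
    # A JSON-RPC 2.0 response, written as a single line on stdout.
    payload = json.dumps({"jsonrpc": "2.0", "id": request_id, "result": result})
    sys.stdout.write(payload + "\n")
    sys.stdout.flush()
    return payload

log("starting up")  # visible in Cursor's MCP logs, invisible to the protocol
```

Anything printed to stdout that isn't a JSON-RPC message will break the client's parser, which is why all logging must go to stderr.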

Quick Start

Because slow start isn't good enough for all you accelerationists.

1. Platform

  • Windows: Native Windows setup with CUDA toolkit → See INSTALL_WINDOWS.md
  • WSL2: There used to be a guide for installing the CUDA stack inside WSL2, but that's masochism. The config below calls Windows Python from WSL instead.

2. Install Dependencies

This assumes you already have the CUDA Toolkit and CUDA runtime installed. If you don't, see INSTALL_WINDOWS.md. Again.

git clone <repo_url>
# shocking, I know
python -m venv flow-env
flow-env\Scripts\activate.bat
pip install sentence-transformers langchain-community langchain-text-splitters faiss-cpu pdfplumber requests beautifulsoup4 gitpython nbformat pydantic fastmcp

# PyTorch with CUDA (check https://pytorch.org/get-started/locally/ for your version)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
# On CUDA 12.9 I selected the 12.8 build, i.e. `cu128`

Note: Using faiss-cpu because faiss-gpu is allergic to recent CUDA versions.
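Before blaming the server, it's worth confirming PyTorch can actually see your GPU. A small check; the `cuda_ready` helper is illustrative, not part of the repo:

```python
import importlib.util

def cuda_ready() -> str:
    # Report whether torch is installed and whether it can see a CUDA device.
    spec = importlib.util.find_spec("torch")
    if spec is None:
        return "torch not installed"
    import torch
    return "cuda available" if torch.cuda.is_available() else "cpu only"

print(cuda_ready())
```

If this prints "cpu only", your torch wheel doesn't match your CUDA version; reinstall from the correct index URL above.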

3. Configure MCP in Cursor

Add this to your mcp.json file - also accessible via the "MCP settings" menu:

Windows (%APPDATA%\Cursor\User\globalStorage\cursor.mcp\mcp.json):

Adjust paths to your setup (or it won't work, unsurprisingly).

{
  "mcpServers": {
    "LocalFlow": {
      "command": "C:\\Users\\user.name\\Documents\\git\\local_flow\\flow-env\\Scripts\\python.exe",
      "args": ["C:\\Users\\user.name\\Documents\\git\\local_flow\\rag_mcp_server.py"],
      "env": {
        "RAG_DATA_DIR": "C:\\Users\\user.name\\Documents\\flow_db"
      },
      "scopes": ["rag_read", "rag_write"],
      "tools": ["add_source", "query_context", "list_sources", "remove_source"]
    }
  }
}

WSL2 (~/.cursor/mcp.json):

{
  "mcpServers": {
    "LocalFlow": {
      "command": "/mnt/c/Users/your.name/Documents/git/local_flow/flow-env/Scripts/python.exe",
      "args": [
        "C:\\Users\\your.name\\Documents\\git\\local_flow\\rag_mcp_server.py"
      ],
      "env": {
        "RAG_DATA_DIR": "C:\\Users\\your.name\\Documents\\flow_db"
      }
    }
  }
}

When using the WSL config, a `cannot execute binary file` error means WSL interop is disabled. Fix it:

# Add to /etc/wsl.conf
[interop]
enabled = true
appendWindowsPath = true

Then restart WSL from PowerShell: wsl --shutdown. A `UNC paths are not supported` warning is related. If the interop setting doesn't persist across restarts, register it manually from your target WSL2 distribution:

sudo sh -c 'echo ":WSLInterop:M::MZ::/init:PF" > /proc/sys/fs/binfmt_misc/register'

If RAG_DATA_DIR isn't being picked up (the vector_db path shows \\wsl.localhost\... in the logs), hardcode the fallback in rag_mcp_server.py -- the current fallback is my local path:

VECTOR_DB_PATH = os.environ.get("RAG_DATA_DIR") or "C:\\Users\\your.name\\Documents\\flow_db"
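A slightly more defensive version of that fallback can reject UNC-style \\wsl paths outright, since Windows Python can't use them. The `resolve_db_path` helper is a sketch, not code from the repo:

```python
import os

def resolve_db_path(default: str = "C:\\Users\\your.name\\Documents\\flow_db") -> str:
    # Prefer RAG_DATA_DIR, but fall back to the default when it is unset
    # or points at a UNC-style WSL path (\\wsl.localhost\... or \\wsl$\...)
    # that Windows Python cannot open.
    path = os.environ.get("RAG_DATA_DIR", "")
    if not path or path.startswith(r"\\wsl"):
        return default
    return path
```

This keeps the server usable even when the env var leaks through in WSL form.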

Server runs on http://localhost:8081.

4. Restart Cursor

If you're a coward.

Usage

Adding Documents

Tell Cursor to use the add_source tool. Like magic, but with more dependencies.

PDFs:

  • Source type: pdf
  • Path: /path/to/your/document.pdf (Linux) or C:\path\to\document.pdf (Windows)
  • Source ID: Whatever makes you happy (Optional)

Web Pages:

  • Source type: webpage
  • URL: https://stackoverflow.com/questions/definitely-not-copy-pasted
  • Source ID: Optional

Git Repositories:

  • Source type: git_repo
  • URL: https://github.com/someone/vibed/tree.git or local path
  • Source ID: Optional
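Under the hood, each of these tool invocations is an MCP tools/call request over JSON-RPC. Here is a sketch of what a client might send for add_source; the argument names are assumptions based on the parameters listed above, not the server's verified schema:

```python
import json

# Hypothetical tools/call request for add_source. The "arguments" keys
# (source_type, path, source_id) are assumed from the parameter list above.
request = {
    "jsonrpc": "2.0",
    "id": 7,
    "method": "tools/call",
    "params": {
        "name": "add_source",
        "arguments": {
            "source_type": "pdf",
            "path": "C:\\path\\to\\document.pdf",
            "source_id": "my_pdf",  # optional
        },
    },
}
print(json.dumps(request, indent=2))
```

In practice Cursor builds this for you; the shape only matters if you want to drive the server from a script.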

Querying (examples below)

Use the query_context tool:

  • Query: "What does this thing actually do?"
  • Top K: How many results you want (default: 5)
  • Source IDs: Filter to specific sources (optional)

Managing Sources

  • list_sources - See what you've fed the machine
  • remove_source - Pretend to delete things (metadata only)

Troubleshooting

Universal Issues

"Tool not found": Did you restart Cursor? Restart Cursor. "CUDA out of memory": Your GPU is having feelings. Try smaller batch sizes or less ambitious documents. "It's not working": That's not a question. But yes, I agree.

Examples of stuff that you can index

Your prompts should indicate one of the following behaviours:

  1. Listing the sources currently available.
  2. Indexing a new source (given some local path)
  3. Removing a source (from metadata, not embeddings)
  4. Querying existing source(s) with a prompt, from which keywords/phrases are generated. You can tune parameters such as top_k to request more of the top-ranked document chunks.

Source example

"What sources are available in the local Flow tool?"
Azure & Machine Learning

| Source ID                  | Type     | Description                              | Chunks |
|----------------------------|----------|------------------------------------------|--------|
| azure_ml_api_docs          | PDF      | Azure Machine Learning API documentation | 4,713  |
| azureml_examples_local_new | Git Repo | Azure ML examples repository             | 24,290 |
| grounding_dino_repo        | Git Repo | GroundingDINO repository                 | 240    |
| folder_9                   | Folder   | HuggingFace Hub documentation            | 19,706 |

Vivado / FPGA Development

| Source ID                     | Type | Description                          | Chunks |
|-------------------------------|------|--------------------------------------|--------|
| vivado_2019_1_ip_guide        | PDF  | UG896 - Vivado IP User Guide         | 154    |
| vivado_2019_1_design_flows    | PDF  | UG892 - Vivado Design Flows Overview | 170    |
| vivado_2019_1_synthesis_guide | PDF  | UG901 - Vivado Synthesis Guide       | 320    |
| pdf_11                        | PDF  | UG894 - Vivado TCL Scripting         | 164    |
| pdf_8                         | PDF  | ADEAS IP License Config              | 9      |

Query example 1

"Give me an example of a synthesis parameter I could edit in Vivado to improve my build speed?" -> Ran query_context:LocalFlow -> Query:

"query": "synthesis parameters settings options improve build speed runtime performance"
"source_ids": [
  "vivado_2019_1_synthesis_guide"
],
"top_k": 10

-> Response (raw, this is used to inform the answer):

{
  "results": [
    {
      "content": "MAX_FANOUT option described in Chapter 2, Synthesis Attributes.\nNote: The -fanout_limit switch does not impact control signals (such as set, reset, clock\nenable): use MAX_FANOUT to replicate these signals if needed.\n-directive: Replaces the -effort_level option. When specified, this option runs\n°\nVivado synthesis with different optimizations. See Table 1-2 for a list of all\nstrategies and settings. Values are:\n- Default: Default settings. See Table 1-2.\n- RuntimeOptimized: Performs fewer timing optimizations and eliminates some\nRTL optimizations to reduce synthesis run time.\n- AreaOptimized_high: Performs general area optimizations including forcing\nternary adder implementation, applying new thresholds for use of carry chain in\ncomparators, and implementing area-optimized multiplexers.\nSynthesis 12\nSend Feedback\nUG901 (v2019.1) June 12, 2019 www.xilinx.com",
      "source": "C:\\Users\\helios.lyons\\Documents\\flow_src\\ug901-20191-vivado-synthesis.pdf",
      "metadata": {
        "source": "C:\\Users\\helios.lyons\\Documents\\flow_src\\ug901-20191-vivado-synthesis.pdf",
        "page": 12,
        "source_type": "pdf",
        "source_id": "vivado_2019_1_synthesis_guide"
      }
[...]

-> Prompt completion (using chunks from the response):

Based on the Vivado Synthesis Guide (UG901), here's a great example:
-directive RuntimeOptimized
This is the most impactful single parameter for reducing synthesis time. It performs fewer timing optimizations and eliminates some RTL optimizations to reduce run time.

Related Servers