Local Flow

A minimal, local, GPU-accelerated RAG server for document ingestion and querying.

It actually works. Ships with more dependencies than the Vatican's import list. Runs on WSL2 and Windows out of the box (mostly).

Architecture

MCP Server + FAISS + SentenceTransformers + LangChain + FastMCP

Vector database stored in ./vector_db (or wherever RAG_DATA_DIR points). Don't delete it unless you enjoy re-indexing everything.

JSON-RPC over stdin/stdout because apparently that's how we communicate with AI tools now.
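Concretely, a tool call is just a JSON-RPC 2.0 request written to the server's stdin, one message per line. A minimal sketch (the method and parameter names here are illustrative, not the server's exact MCP schema):

```python
import json

# A JSON-RPC 2.0 request like the ones Cursor writes to the server's stdin.
# Method/params are illustrative; the real MCP schema differs in detail.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "query_context",
        "arguments": {"query": "What does this thing actually do?", "top_k": 5},
    },
}

line = json.dumps(request)        # one request per line over stdio
decoded = json.loads(line)        # the server parses it back the same way
print(decoded["params"]["name"])  # → query_context
```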

Quick Start

Because slow start isn't good enough for all you accelerationists.

1. Platform

  • Windows: Native Windows setup with CUDA toolkit → See INSTALL_WINDOWS.md
  • WSL2: There used to be a guide for installing the full CUDA stack inside WSL2, but that's masochism; the provided config simply calls PowerShell from WSL instead

2. Install Dependencies

Assuming you already have the CUDA Toolkit and CUDA Runtime installed. If you don't, see INSTALL_WINDOWS.md. Again.

# Clone the repo somewhere
git clone <repo_url>

# Create virtual environment (shocking, I know)
python -m venv flow-env

# Activate venv
flow-env\Scripts\activate.bat

# Install everything 
pip install sentence-transformers langchain-community langchain-text-splitters faiss-cpu pdfplumber requests beautifulsoup4 gitpython nbformat pydantic fastmcp

# PyTorch with CUDA (check https://pytorch.org/get-started/locally/ for your version)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
# For CUDA 12.9, I selected the 12.8 build and used `cu128` instead

Note: Using faiss-cpu because faiss-gpu is apparently allergic to recent CUDA versions. Your embeddings will still use GPU. Chill.
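To confirm the install actually picked up your GPU, here's a quick sanity check (pure Python, degrades gracefully if torch isn't importable yet):

```python
import importlib.util

def cuda_ready() -> bool:
    """Return True if PyTorch is installed and reports a usable CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch
    return torch.cuda.is_available()

print("GPU ready:", cuda_ready())
```

If this prints `False` after a successful install, revisit the PyTorch index URL above; a CPU-only wheel is the usual culprit.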

3. Configure MCP in Cursor

Add this to your mcp.json file:

Windows (%APPDATA%\Cursor\User\globalStorage\cursor.mcp\mcp.json):

Adjust paths to your setup (or it won't work, unsurprisingly).

{
  "mcpServers": {
    "LocalFlow": {
      "command": "C:\\Users\\user.name\\Documents\\git\\local_flow\\flow-env\\Scripts\\python.exe",
      "args": ["C:\\Users\\user.name\\Documents\\git\\local_flow\\rag_mcp_server.py"],
      "env": {
        "RAG_DATA_DIR": "C:\\Users\\user.name\\Documents\\flow_db"
      },
      "scopes": ["rag_read", "rag_write"],
      "tools": ["add_source", "query_context", "list_sources", "remove_source"]
    }
  }
}

WSL2 (~/.cursor/mcp.json):

{
  "mcpServers": {
    "LocalFlow": {
      "command": "powershell.exe",
      "args": [
        "-Command", 
        "$env:RAG_DATA_DIR='C:\\Users\\user.name\\Documents\\flow_db'; & 'C:\\Users\\user.name\\Documents\\git\\local_flow\\flow-env\\Scripts\\python.exe' 'C:\\Users\\user.name\\Documents\\git\\local_flow\\rag_mcp_server.py'"
      ],
      "scopes": ["rag_read", "rag_write"],
      "tools": ["add_source", "query_context", "list_sources", "remove_source"]
    }
  }
}

Server runs on http://localhost:8081. Revolutionary stuff.

4. Restart Cursor

Because restarting always fixes everything, right?

Usage

Adding Documents

Tell Cursor to use the add_source tool:

PDFs:

  • Source type: pdf
  • Path: /path/to/your/document.pdf (Linux) or C:\path\to\document.pdf (Windows)
  • Source ID: Whatever makes you happy

Web Pages:

  • Source type: webpage
  • URL: https://stackoverflow.com/questions/definitely-not-copy-pasted
  • Source ID: Optional

Git Repositories:

  • Source type: git_repo
  • URL: https://github.com/someone/hopefully-documented.git or local path
  • Source ID: Optional identifier

Like magic, but with more dependencies.
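Under the hood, ingestion boils down to splitting text into overlapping chunks before embedding. A stdlib-only approximation of what the LangChain text splitter does (the sizes are illustrative, not the server's actual settings):

```python
def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks with overlap, roughly what a
    character-based text splitter does before embedding."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

chunks = chunk_text("x" * 1200, size=500, overlap=50)
print(len(chunks))  # → 3 (spans 0-500, 450-950, 900-1200)
```

The overlap is there so a sentence straddling a chunk boundary still lands whole in at least one chunk.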

Querying (who knew it could be so complicated to ask a simple question)

Use the query_context tool:

  • Query: "What does this thing actually do?"
  • Top K: How many results you want (default: 5)
  • Source IDs: Filter to specific sources (optional)
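Conceptually, query_context embeds your question and returns the top-k most similar chunks. A toy cosine-similarity version with stdlib math (in reality SentenceTransformers produces the vectors and FAISS does the search):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def top_k(query: list[float], chunks: dict[str, list[float]], k: int = 5) -> list[str]:
    """Rank chunk IDs by cosine similarity to the query vector, best first."""
    return sorted(chunks, key=lambda cid: cosine(query, chunks[cid]), reverse=True)[:k]

vectors = {"a": [1.0, 0.0], "b": [0.7, 0.7], "c": [0.0, 1.0]}
print(top_k([1.0, 0.1], vectors, k=2))  # → ['a', 'b']
```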

Managing Sources

  • list_sources - See what you've fed the machine
  • remove_source - Pretend to delete things (metadata only, embeddings stick around like bad memories)
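In other words, removal only drops the bookkeeping entry; the vectors sit in the index until a full re-index. A sketch of the idea (the internal structures shown are an assumption, not the server's actual code):

```python
# Hypothetical registry: source metadata in a dict, vectors in the FAISS index.
sources = {"docs-pdf": {"type": "pdf", "chunks": [0, 1, 2]}}
index_vectors = {0: [0.1, 0.2], 1: [0.3, 0.4], 2: [0.5, 0.6]}  # stand-in for FAISS

def remove_source(source_id: str) -> None:
    """Drop the metadata entry only; embeddings stay behind like bad memories."""
    sources.pop(source_id, None)

remove_source("docs-pdf")
print("docs-pdf" in sources)   # → False
print(len(index_vectors))      # → 3 (vectors untouched)
```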

Features

  • ✅ GPU acceleration (most of the time)
  • ✅ Arbitrary text (PDFs, web pages, Git repos)
  • ✅ Local vector DB
  • ✅ Source filtering (TODO: nested vector DBs for faster re-indexing so we can modify RAG params)
  • ❌ Your sanity (sold separately)

Troubleshooting

Universal Issues

"Tool not found": Did you restart Cursor? Restart Cursor.

"CUDA out of memory": Your GPU is having feelings. Try smaller batch sizes or less ambitious documents.

"It's not working": That's not a question. But yes, welcome to local AI tooling.

Platform-Specific Issues

For detailed troubleshooting:

  • Windows: Check INSTALL_WINDOWS.md
  • WSL2: Check INSTALL_WSL2.md

Both have extensive troubleshooting sections because, let's face it, you'll need them.

Related Servers