Prompt Lab MCP Server

Prompt optimization loops and regression test suites for Claude Code, with a companion web UI.

The agent runs inside your Claude Code session and owns all LLM work — scoring responses, proposing improved prompts, applying suggestions. The server holds workspace state and keeps the agent and the Prompt Lab UI in sync.


Quick start

Copy mcp-connect.json from this repo into your project as .mcp.json:

{
  "mcpServers": {
    "prompt-lab": {
      "type": "http",
      "url": "https://prompt-lab-mcp.up.railway.app/mcp"
    }
  }
}

Claude Code connects automatically on next start. Verify with /mcp.


Example session

# 1. Open a workspace — agent shares the UI URL
start_web_app()
→ "Open https://prompt-lab-mcp.vercel.app?s=abc123 to follow along."

# 2. Register an API key
register_api_key(workspaceId, "sk-ant-...")

# 3. Set a system prompt and a test case
set_system_prompt(workspaceId, "You are a concise customer support agent...")
add_test_cases(workspaceId, [{
  query: "How do I reset my password?",
  targetAnswer: "Click 'Forgot password' on the login page and follow the email link."
}])

# 4. Run the optimization loop
loop_optimization(workspaceId, threshold=85)
→ Iteration 1 — score 58: response too long, no mention of email link
→ Iteration 2 — score 74: better, but missing the exact step
→ Iteration 3 — score 91: SUCCESS — prompt updated to require step-by-step answers

The UI shows each iteration's score, the agent's reasoning, and the revised system prompt in real time.


How it works

Prompt Lab UI (Vercel)
  ↕  HTTP
Prompt Lab MCP Server (Railway)
  ↕  MCP
Claude Code (your machine)
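
To make the MCP hop concrete: MCP messages are JSON-RPC 2.0, and a tool invocation uses the `tools/call` method. The following is a minimal sketch of the request body a client could send for `set_system_prompt`. The helper name `buildToolCall`, the request id, and the workspace id are illustrative; the argument names are taken from the tool descriptions in this README, not from the server's published schema.

```typescript
// Shape of a JSON-RPC 2.0 request for an MCP tool call.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper: wraps a tool name and its arguments in a tools/call request.
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>,
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Placeholder workspace id; a real one is handed out when the workspace is created.
const request = buildToolCall(1, "set_system_prompt", {
  workspaceId: "my-workspace",
  systemPrompt: "You are a concise customer support agent...",
});
```

In practice Claude Code builds these requests itself; the sketch only shows what travels over the wire.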

MCP tools

Setup

| Tool | Description |
| --- | --- |
| start_web_app(workspaceId?) | Creates a workspace and returns the Prompt Lab UI URL. |
| register_api_key(workspaceId, apiKey, provider?) | Registers an API key for test runs. Provider is auto-detected from the key prefix. |
| list_models(workspaceId) | Lists available models based on registered keys. |
| set_test_model(workspaceId, model) | Sets the model for test runs. Syncs to the UI model selector. |
| delete_session(workspaceId) | Deletes a workspace and all its state. Irreversible. |
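
Provider auto-detection from the key prefix could work along these lines. This is a sketch, not the server's actual logic: the `sk-ant-` prefix comes from the example session, while the `detectProvider` name, the provider labels, and the second prefix are assumptions made for illustration.

```typescript
type Provider = "anthropic" | "openai";

// Assumed prefix table; longer prefixes come first so "sk-ant-" wins over "sk-".
const PREFIXES: Array<[string, Provider]> = [
  ["sk-ant-", "anthropic"], // prefix shown in the example session
  ["sk-", "openai"],        // assumption, not confirmed by this README
];

// Hypothetical helper: an explicit provider overrides detection,
// mirroring the optional `provider?` argument of register_api_key.
function detectProvider(apiKey: string, explicit?: Provider): Provider | undefined {
  if (explicit) return explicit;
  const match = PREFIXES.find(([prefix]) => apiKey.startsWith(prefix));
  return match ? match[1] : undefined;
}
```

Returning `undefined` for an unknown prefix leaves it to the caller to ask for an explicit provider instead of guessing.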

Templates

Templates are global: they are not tied to a workspace and appear in the UI dropdowns as soon as they are saved.

| Tool | Description |
| --- | --- |
| save_template(name, testCases) | Saves a test suite template. Appears in the UI "Load test suite…" dropdown. |
| save_system_prompt_template(name, content) | Saves a system prompt template. Appears in the UI "Load template…" dropdown. |

Workspace state

| Tool | Description |
| --- | --- |
| get_workspace_state(workspaceId) | Reads the full workspace: system prompt, test cases, results, suggestions, model. |
| set_system_prompt(workspaceId, systemPrompt) | Sets the system prompt without incrementing the iteration counter. |
| add_test_cases(workspaceId, testCases, replace?) | Adds test cases. replace=true overwrites all existing ones. |
| post_test_result(workspaceId, testCaseId, response, score, reasoning, model) | Stores one scored test result. |
| post_prompt_suggestion(workspaceId, prompt, reasoning, expectedGain?) | Queues a revised prompt for review in the UI. |
| apply_suggestion(workspaceId, suggestionId) | Applies a pending suggestion and increments the iteration counter. |
| get_regression_status(workspaceId, threshold?) | Pass/fail summary across all test cases for the current system prompt. |
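
The difference between `set_system_prompt` and `apply_suggestion`, and the effect of `replace=true`, can be modelled with a small in-memory sketch. The `Workspace` shape and its field names are hypothetical, inferred from the tool descriptions; the real server's schema may differ, and removing an applied suggestion from the queue is an assumption.

```typescript
interface TestCase { query: string; targetAnswer: string }
interface Suggestion { id: string; prompt: string; reasoning: string }

// Hypothetical workspace shape; field names are illustrative.
interface Workspace {
  systemPrompt: string;
  iteration: number;
  testCases: TestCase[];
  suggestions: Suggestion[];
}

// Sets the prompt directly; the iteration counter is left untouched.
function setSystemPrompt(ws: Workspace, systemPrompt: string): Workspace {
  return { ...ws, systemPrompt };
}

// Appends test cases, or overwrites the existing ones when replace is true.
function addTestCases(ws: Workspace, cases: TestCase[], replace = false): Workspace {
  return { ...ws, testCases: replace ? [...cases] : [...ws.testCases, ...cases] };
}

// Applies a pending suggestion: swaps in its prompt and bumps the counter.
// Dropping the suggestion from the queue afterwards is an assumption.
function applySuggestion(ws: Workspace, suggestionId: string): Workspace {
  const suggestion = ws.suggestions.find((s) => s.id === suggestionId);
  if (!suggestion) throw new Error(`Unknown suggestion: ${suggestionId}`);
  return {
    ...ws,
    systemPrompt: suggestion.prompt,
    iteration: ws.iteration + 1,
    suggestions: ws.suggestions.filter((s) => s.id !== suggestionId),
  };
}
```

Each function returns a new object rather than mutating its input, which keeps the before and after states easy to compare.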

Optimization

Requires a workspace with at least one test case.

| Tool | Description |
| --- | --- |
| start_optimization_session(workspaceId, threshold?, maxIterations?) | Single pass: scores test cases, posts one suggestion, then waits for user review in the UI. |
| loop_optimization(workspaceId, threshold?, maxIterations?) | Automated loop: iterates until all scores meet the threshold or max iterations is reached. |
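
The control flow of `loop_optimization` might look roughly like this. The sketch injects the LLM work as plain functions (`score` and `propose` are hypothetical stand-ins for what the agent does), and the default values for `threshold` and `maxIterations` are assumptions, since this README does not state them.

```typescript
type ScoreFn = (prompt: string, query: string) => number;      // score, assumed to be 0-100
type ProposeFn = (prompt: string, scores: number[]) => string;  // returns a revised prompt

interface LoopResult { prompt: string; iterations: number; success: boolean }

function loopOptimization(
  initialPrompt: string,
  queries: string[],
  score: ScoreFn,
  propose: ProposeFn,
  threshold = 85,     // assumed default
  maxIterations = 5,  // assumed default
): LoopResult {
  if (queries.length === 0) throw new Error("At least one test case is required");
  let prompt = initialPrompt;
  for (let i = 1; i <= maxIterations; i++) {
    const scores = queries.map((q) => score(prompt, q));
    // Success only when every individual score meets the threshold.
    if (scores.every((s) => s >= threshold)) {
      return { prompt, iterations: i, success: true };
    }
    // Otherwise revise the prompt, unless this was the last allowed iteration.
    if (i < maxIterations) prompt = propose(prompt, scores);
  }
  return { prompt, iterations: maxIterations, success: false };
}
```

Passing the scorer and proposer in as functions keeps the loop itself deterministic and easy to exercise with stubs.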

Regression

| Tool | Description |
| --- | --- |
| run_regression_testsuite(workspaceId, threshold?) | Single pass: scores all test cases, no prompt changes. |
| loop_regression(workspaceId, threshold?) | Automated loop: repeats until every individual score meets the threshold. A high average that masks one failing case is not a pass. |
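
The pass criterion for `loop_regression` is per case, not average-based. A small sketch of the distinction follows; the function name and the sample scores are made up for illustration.

```typescript
// A run passes only if every individual score meets the threshold.
function regressionPasses(scores: number[], threshold: number): boolean {
  return scores.length > 0 && scores.every((s) => s >= threshold);
}

const sampleScores = [98, 97, 70];
// The average is about 88.3, which is above a threshold of 85.
const sampleAverage = sampleScores.reduce((sum, s) => sum + s, 0) / sampleScores.length;
// Still a failure: the single 70 is below the threshold.
const samplePasses = regressionPasses(sampleScores, 85);
```

Treating an empty score list as a failure is a choice made in this sketch so that a run with no results never counts as a pass.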

Archive

| Tool | Description |
| --- | --- |
| pull_ui_history(workspaceId) | Fetches all session summaries and regression runs pushed by the UI. |

Self-hosting

Deploy to Railway and set these environment variables:

| Variable | Description |
| --- | --- |
| UPSTASH_REDIS_REST_URL | Upstash Redis URL for persistence |
| UPSTASH_REDIS_REST_TOKEN | Upstash Redis token |
| OVERHANG_PROMPT_LAB_URL | URL of your Prompt Lab UI deployment |

To run the server locally:

npm install
npm run dev    # starts on :3000

MCP endpoint: http://localhost:3000/mcp
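
When self-hosting, it can help to fail fast if a variable is missing. The following is a minimal sketch that assumes all three variables listed above are required, which this README does not confirm. `missingEnv` is a hypothetical helper; in a Node process you would pass it `process.env`.

```typescript
const REQUIRED_VARS = [
  "UPSTASH_REDIS_REST_URL",
  "UPSTASH_REDIS_REST_TOKEN",
  "OVERHANG_PROMPT_LAB_URL",
] as const;

// Returns the names of required variables that are unset or empty.
function missingEnv(env: Record<string, string | undefined>): string[] {
  return REQUIRED_VARS.filter((name) => !env[name]);
}
```

Calling this once at startup and refusing to start when the returned list is non-empty surfaces configuration mistakes before the first request arrives.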


License

MIT — see LICENSE.

© 2026 Jurek Föllmer