playwright-trace-decoder-mcp

MCP server for unpacking and analyzing Playwright trace.zip archives

An MCP server that unpacks and structures Playwright trace.zip archives so AI agents can perform root-cause analysis on CI failures — without drowning in raw JSON or blowing up the context window.

🤔 The Problem

When a Playwright test fails in CI, you get a trace.zip. It's a binary blob. LLMs can't read it natively, and dumping the raw contents exceeds the context window. Engineers end up copying log snippets into ChatGPT manually like it's 2022.

This MCP server solves that: 16 focused tools that expose exactly the signal an agent needs to diagnose a failure, with pagination and ARIA compression to keep token costs low.

🛠️ Tools

Tools are grouped by how an agent should sequence them when diagnosing a failure.

Inspection — read trace data

| Tool | Arguments | What it returns |
| --- | --- | --- |
| get_test_metadata | trace_path | Browser, platform, viewport, test title, wall-clock start time |
| get_trace_summary | trace_path | Failing action + top-level error + total action count |
| get_action_timeline | trace_path, limit, offset | Paginated list of all actions with API names, locators, and timings |
| get_filtered_network_logs | trace_path, limit, offset | Only 4xx/5xx responses; static assets (CSS, JS, fonts, images) stripped |
| get_console_errors | trace_path, limit, offset | JS exceptions and warnings from the browser console |
| get_element_state_at_failure | trace_path | Failing locator, error message, and raw before/after metadata |
| extract_trace_metadata_strict | trace_path | Format version, retry session breakdown, HAR payload mode (embed/attach/omit) |
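
To make the get_filtered_network_logs row concrete, here is a minimal sketch of that kind of filter. The `NetworkEntry` shape, the `filterNetworkLog` name, and the extension list are assumptions for illustration; the server's real rules may differ.

```typescript
// Hypothetical entry shape; the real HAR-derived records carry more fields.
interface NetworkEntry {
  url: string;
  status: number;
}

// Assumed list of static-asset extensions (CSS, JS, fonts, images).
const STATIC_ASSET = /\.(css|js|mjs|woff2?|ttf|otf|png|jpe?g|gif|svg|webp|ico)$/i;

function filterNetworkLog(entries: readonly NetworkEntry[]): NetworkEntry[] {
  return entries.filter((entry) => {
    // Keep only client and server errors.
    if (entry.status < 400 || entry.status > 599) return false;
    // Drop the query string and fragment before checking the extension.
    const path = entry.url.replace(/[?#].*$/, "");
    return !STATIC_ASSET.test(path);
  });
}
```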

All list-returning tools support limit (1–500, default 50) and offset pagination with a has_more flag.
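
The pagination contract can be sketched as follows. Only limit, offset, and has_more come from the description above; the `Page` shape, the `paginate` name, and the clamping of out-of-range values are assumptions.

```typescript
// Hypothetical page envelope; only limit, offset and has_more are documented.
interface Page<T> {
  items: T[];
  offset: number;
  limit: number;
  has_more: boolean;
}

function paginate<T>(all: readonly T[], limit = 50, offset = 0): Page<T> {
  // Clamp to the documented 1-500 range (assumed behaviour for bad input).
  const safeLimit = Math.min(500, Math.max(1, Math.floor(limit)));
  const safeOffset = Math.max(0, Math.floor(offset));
  const items = all.slice(safeOffset, safeOffset + safeLimit);
  return {
    items,
    offset: safeOffset,
    limit: safeLimit,
    // More data exists if this page did not reach the end of the list.
    has_more: safeOffset + items.length < all.length,
  };
}
```

An agent would keep requesting pages, advancing offset by limit, until has_more is false.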

trace_path accepts either an absolute local path or an HTTPS URL — the server downloads the file automatically and caches it for the session.

DOM / UI analysis

| Tool | Arguments | What it returns |
| --- | --- | --- |
| get_aria_accessibility_tree | trace_path, action_index? | ARIA accessibility tree as compact YAML (~90% fewer tokens than raw HTML). Defaults to the snapshot at the failed action. |
| get_dom_mutation_delta | trace_path, action_index | Set-diff of ARIA lines before vs after a specific action: added/removed elements only, not two full DOM dumps |
| get_screenshot_at_failure | trace_path, screenshot_index? | Base64 JPEG screenshot closest to the moment of failure. Use when ARIA tree is empty (captcha, blank page). screenshot_index lets you walk the full visual timeline. |
| analyze_race_conditions | trace_path | Network requests that were in-flight when an interaction or assertion fired |
| correlate_dom_and_network | trace_path | For each action where a fetch completed and the DOM mutated within ±100ms: triggering URL, response status, body snippet, and exact nodes added/removed |
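
The set-diff behind get_dom_mutation_delta can be illustrated with a small sketch. It assumes each ARIA snapshot is a YAML string with one element per line; `ariaDelta` and `AriaDelta` are hypothetical names, not the server's API.

```typescript
interface AriaDelta {
  added: string[];
  removed: string[];
}

function ariaDelta(beforeYaml: string, afterYaml: string): AriaDelta {
  // Compare trimmed, non-empty lines so indentation changes alone are ignored.
  const toSet = (yaml: string): Set<string> =>
    new Set(
      yaml
        .split("\n")
        .map((line) => line.trim())
        .filter((line) => line.length > 0),
    );
  const before = toSet(beforeYaml);
  const after = toSet(afterYaml);
  return {
    added: Array.from(after).filter((line) => !before.has(line)),
    removed: Array.from(before).filter((line) => !after.has(line)),
  };
}
```

Because only the differing lines are returned, the result stays small even when the page itself is large.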

Root-cause analysis

| Tool | Arguments | What it returns |
| --- | --- | --- |
| get_causal_chain_for_failure | trace_path, lookback_ms? | Chronological chain of preceding actions, network errors, and console errors leading to the failure (default window: 5 s) |
| generate_error_signature | trace_path | Stable 12-char SHA-1 hash of the normalized error; use to group duplicate failures across parallel CI runs |
| compare_traces | passing_trace_path, failing_trace_path | LCS-aligned action sequence between a passing and failing run: structural divergence, timing anomalies (>500 ms), unmatched actions, network delta |
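
The lookback window of get_causal_chain_for_failure amounts to a time-range filter followed by a chronological sort. The sketch below assumes a flat, hypothetical `TraceEvent` record; the real tool merges several event streams.

```typescript
type EventKind = "action" | "network_error" | "console_error";

// Hypothetical merged event record.
interface TraceEvent {
  kind: EventKind;
  timestamp_ms: number;
  detail: string;
}

function causalChain(
  events: readonly TraceEvent[],
  failure_ms: number,
  lookback_ms = 5000, // documented default window: 5 s
): TraceEvent[] {
  const windowStart = failure_ms - lookback_ms;
  return events
    .filter((e) => e.timestamp_ms >= windowStart && e.timestamp_ms <= failure_ms)
    .sort((a, b) => a.timestamp_ms - b.timestamp_ms);
}
```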

Performance analysis

| Tool | Arguments | What it returns |
| --- | --- | --- |
| detect_performance_anomalies | trace_path, slow_action_threshold_ms?, frame_drop_threshold_ms? | Ranked list of slow actions and frame drops with suspected_cause (main thread blocked / network saturation / navigation timeout). Also reports p50/p95 action duration and a memory leak flag. |

💬 Suggested agent workflow

get_trace_summary              ← what failed?
get_causal_chain_for_failure   ← what led up to it?
get_aria_accessibility_tree    ← what did the page look like?
get_screenshot_at_failure      ← ARIA empty? get the actual screenshot
get_dom_mutation_delta         ← what changed right before the failure?
analyze_race_conditions        ← was a network request still pending?
correlate_dom_and_network      ← which fetch caused which DOM change?
compare_traces                 ← flaky? compare to a passing run
detect_performance_anomalies   ← timeout but no JS error? check for Long Tasks

🚀 Setup

Build from source

git clone https://github.com/vola-trebla/playwright-trace-decoder-mcp.git
cd playwright-trace-decoder-mcp
npm install
npm run build

Add to your MCP client

Claude Desktop (on macOS: ~/Library/Application Support/Claude/claude_desktop_config.json)

{
  "mcpServers": {
    "playwright-trace-decoder": {
      "command": "node",
      "args": ["/absolute/path/to/playwright-trace-decoder-mcp/dist/index.js"]
    }
  }
}

Cursor (.cursor/mcp.json) or VS Code (.vscode/mcp.json; VS Code names the top-level key servers rather than mcpServers)

{
  "mcpServers": {
    "playwright-trace-decoder": {
      "command": "node",
      "args": ["/absolute/path/to/playwright-trace-decoder-mcp/dist/index.js"]
    }
  }
}

Claude Code

claude mcp add playwright-trace-decoder \
  node /absolute/path/to/playwright-trace-decoder-mcp/dist/index.js

Docker

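docker build -t playwright-trace-decoder-mcp .

Then point your MCP client at the container. Because the host directory is mounted at /traces, local trace_path values must use that container-side prefix:
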
{
  "mcpServers": {
    "playwright-trace-decoder": {
      "command": "docker",
      "args": ["run", "--rm", "-i", "-v", "/path/to/traces:/traces", "playwright-trace-decoder-mcp"]
    }
  }
}

💬 Example usage

Basic failure analysis

Ask your agent:

"The CI run failed. Here's the trace: /tmp/trace.zip. What went wrong and why?"

The agent calls get_trace_summary, then get_causal_chain_for_failure, then get_aria_accessibility_tree, drilling deeper as needed, so you never have to copy-paste anything.

When the page was blank or redirected

"The ARIA tree is empty. Can you show me what was actually on screen when it failed?"

The agent calls get_screenshot_at_failure and gets the JPEG taken closest to the moment of failure — useful for catching captchas, error pages, or unexpected redirects.

Flakiness diagnosis

"This test passes locally but fails in CI. Compare these two traces and tell me what was different."

The agent calls compare_traces, which LCS-aligns both action sequences and surfaces the first structural divergence, timing anomalies, and network requests that only appeared in the failing run.
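
For readers unfamiliar with LCS alignment, the following is a minimal sketch of the technique over two lists of action names. It is a textbook longest-common-subsequence alignment, not the server's code; `alignActions` and the `Alignment` shape are hypothetical.

```typescript
interface Alignment {
  matched: Array<[number, number]>; // index pairs: [passing, failing]
  onlyInPassing: number[];
  onlyInFailing: number[];
}

function alignActions(passing: readonly string[], failing: readonly string[]): Alignment {
  const n = passing.length;
  const m = failing.length;
  // dp[i][j] = LCS length of passing[i..] and failing[j..]
  const dp: number[][] = Array.from({ length: n + 1 }, () => new Array<number>(m + 1).fill(0));
  for (let i = n - 1; i >= 0; i--) {
    for (let j = m - 1; j >= 0; j--) {
      dp[i][j] =
        passing[i] === failing[j]
          ? dp[i + 1][j + 1] + 1
          : Math.max(dp[i + 1][j], dp[i][j + 1]);
    }
  }
  // Walk the table from the start to recover the alignment.
  const result: Alignment = { matched: [], onlyInPassing: [], onlyInFailing: [] };
  let i = 0;
  let j = 0;
  while (i < n && j < m) {
    if (passing[i] === failing[j]) {
      result.matched.push([i, j]);
      i++;
      j++;
    } else if (dp[i + 1][j] >= dp[i][j + 1]) {
      result.onlyInPassing.push(i++);
    } else {
      result.onlyInFailing.push(j++);
    }
  }
  while (i < n) result.onlyInPassing.push(i++);
  while (j < m) result.onlyInFailing.push(j++);
  return result;
}
```

The first index in onlyInPassing or onlyInFailing marks where the two runs structurally diverge; timing comparisons would then be made across the matched pairs.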

Grouping duplicate failures across parallel CI runs

"We have 12 failed traces from this pipeline. Are they all the same failure?"

Call generate_error_signature on each — identical signatures mean identical root cause, no need to read every trace.
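
A signature of this kind can be sketched as below. The 12-character SHA-1 prefix comes from the tool description; the normalization rules (stripping ANSI colour codes, replacing numbers, collapsing whitespace, lowercasing) are assumptions, and the server's own rules may differ.

```typescript
import { createHash } from "node:crypto";

function errorSignature(message: string): string {
  const normalized = message
    .replace(/\x1b\[[0-9;]*m/g, "") // strip ANSI colour codes first (they contain digits)
    .replace(/\d+/g, "<n>") // timeouts, ids and line numbers vary between runs
    .replace(/\s+/g, " ")
    .trim()
    .toLowerCase();
  // Keep the first 12 hex characters of the SHA-1 digest.
  return createHash("sha1").update(normalized).digest("hex").slice(0, 12);
}
```

With this normalization, two timeouts that differ only in their millisecond value or a numeric element id hash to the same signature.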

Diagnosing which API call caused a DOM change

"The modal appeared but I don't know which fetch triggered it."

correlate_dom_and_network joins the HAR log and DOM snapshots automatically. Example output:

{
  "total_correlations": 1,
  "correlations": [
    {
      "action_id": "4:Locator.click",
      "triggering_request_url": "https://api.example.com/cart/items",
      "response_status_code": 200,
      "response_body_snippet": "{\"items\":[{\"id\":\"abc\",\"qty\":1}]}",
      "time_to_dom_mutation_ms": 38,
      "resulting_dom_mutations": [
        { "type": "added", "selector": "heading \"Cart (1 item)\"" },
        { "type": "removed", "selector": "button \"Add to cart\" [disabled]" }
      ]
    }
  ]
}
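
The join behind that output can be approximated with a time-window match. The sketch below assumes fetch completions and DOM mutations have already been extracted into the hypothetical `FetchDone` and `DomMutation` records; field names in `Correlation` mirror the example output above.

```typescript
interface FetchDone {
  url: string;
  status: number;
  finished_ms: number;
}

interface DomMutation {
  action_id: string;
  mutated_ms: number;
}

interface Correlation {
  action_id: string;
  triggering_request_url: string;
  response_status_code: number;
  time_to_dom_mutation_ms: number;
}

function correlate(
  fetches: readonly FetchDone[],
  mutations: readonly DomMutation[],
  window_ms = 100, // the documented ±100 ms window
): Correlation[] {
  const out: Correlation[] = [];
  for (const mutation of mutations) {
    // Pick the fetch that finished closest in time to the mutation.
    let best: FetchDone | undefined;
    for (const fetch of fetches) {
      const gap = Math.abs(mutation.mutated_ms - fetch.finished_ms);
      if (gap <= window_ms && (!best || gap < Math.abs(mutation.mutated_ms - best.finished_ms))) {
        best = fetch;
      }
    }
    if (best) {
      out.push({
        action_id: mutation.action_id,
        triggering_request_url: best.url,
        response_status_code: best.status,
        time_to_dom_mutation_ms: mutation.mutated_ms - best.finished_ms,
      });
    }
  }
  return out;
}
```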

Performance timeouts — not just missing elements

"The test times out on goto, but there's no JS error. What's blocking the page?"

detect_performance_anomalies inspects screencast-frame gaps and flags Long Tasks. Example output:

{
  "anomalies": [
    {
      "kind": "slow_action",
      "blocked_action_id": "2:Frame.goto",
      "task_duration_ms": 4200,
      "threshold_ms": 500,
      "concurrent_network_load": 9,
      "frame_drop_count": 0,
      "worst_frame_gap_ms": 0,
      "suspected_cause": "network_saturation"
    }
  ],
  "suspected_memory_leak_flag": false,
  "p50_action_duration_ms": 95,
  "p95_action_duration_ms": 780,
  "total_frame_drop_count": 0
}

suspected_cause distinguishes a blocked main thread (main_thread_blocked — frame gaps present), a waterfall of concurrent fetches (network_saturation — ≥5 in-flight), and a navigation/hard timeout (timeout_or_navigation — duration >3 s with no other signals).
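
Those three rules can be written as a small decision function. The thresholds come from the sentence above; the order in which the rules are checked and the "unknown" fallback are assumptions.

```typescript
type SuspectedCause =
  | "main_thread_blocked"
  | "network_saturation"
  | "timeout_or_navigation"
  | "unknown"; // hypothetical fallback, not named in the README

interface ActionSignals {
  task_duration_ms: number;
  concurrent_network_load: number;
  frame_drop_count: number;
}

function classifyCause(signals: ActionSignals): SuspectedCause {
  // Frame gaps present: the main thread was probably busy.
  if (signals.frame_drop_count > 0) return "main_thread_blocked";
  // Five or more requests in flight: a waterfall of concurrent fetches.
  if (signals.concurrent_network_load >= 5) return "network_saturation";
  // Longer than 3 s with no other signal: navigation or hard timeout.
  if (signals.task_duration_ms > 3000) return "timeout_or_navigation";
  return "unknown";
}
```

Applied to the example output above (4200 ms, 9 concurrent requests, 0 frame drops), this returns network_saturation, matching the reported suspected_cause.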

Checking what Playwright version and HAR mode a trace uses

"The trace came from an unfamiliar CI configuration. Is the response body data available?"

extract_trace_metadata_strict inspects the archive before you run any other tool:

{
  "format_version": 6,
  "har_mode": "embed",
  "retry_sessions": [
    { "session_id": "s1", "failed": false },
    { "session_id": "s2", "failed": true }
  ],
  "failed_session_id": "s2"
}

har_mode: "embed" means body snippets are inline. "attach" means they're in separate resource files. "omit" means headers only — correlate_dom_and_network will return empty response_body_snippet in that case.

🏗️ Architecture

trace.zip
  ├── *.trace          → JSONL: before/after action pairs, console events, frame snapshots
  ├── *.network        → JSONL: HAR resource-snapshot entries
  └── resources/
      ├── page@*.jpeg  → screenshots taken during the run
      └── ...          → fonts, stylesheets, other captured resources

The parser streams each file line-by-line (no full-buffer split) and caches results in-process with an LRU (max 50 entries), keyed by path + mtime. Re-reading the same unmodified trace costs zero I/O.
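
A cache with those properties can be sketched using a Map, which iterates in insertion order. `TraceCache` is a hypothetical name, and the key format is an assumption; only the 50-entry cap and the path + mtime key come from the description above.

```typescript
class TraceCache<V> {
  private readonly entries = new Map<string, V>();
  private readonly maxEntries: number;

  constructor(maxEntries = 50) {
    this.maxEntries = maxEntries;
  }

  // A changed mtime yields a new key, so a modified file is never served stale.
  private static key(path: string, mtimeMs: number): string {
    return `${path}@${mtimeMs}`;
  }

  get(path: string, mtimeMs: number): V | undefined {
    const k = TraceCache.key(path, mtimeMs);
    const value = this.entries.get(k);
    if (value === undefined) return undefined;
    // Re-insert to mark this entry as most recently used.
    this.entries.delete(k);
    this.entries.set(k, value);
    return value;
  }

  set(path: string, mtimeMs: number, value: V): void {
    const k = TraceCache.key(path, mtimeMs);
    this.entries.delete(k);
    this.entries.set(k, value);
    if (this.entries.size > this.maxEntries) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }

  get size(): number {
    return this.entries.size;
  }
}
```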

Frame snapshots store the DOM as nested arrays (["TAG", {attrs}, ...children]). The ARIA translator walks this tree and outputs compact YAML, reducing token cost by ~90% vs raw HTML.
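
The tree walk can be illustrated with a heavily simplified sketch. It assumes text children are plain strings and uses a tiny tag-to-role table; `SnapshotNode`, `ROLE_BY_TAG`, and `toAriaYaml` are hypothetical, and real snapshots and ARIA role computation are considerably more involved.

```typescript
// Simplified node: a text string or ["TAG", {attrs}, ...children].
type Attrs = { [name: string]: string | undefined };
type SnapshotNode = string | [string, Attrs, ...SnapshotNode[]];

// Assumed minimal mapping from tags to implicit ARIA roles.
const ROLE_BY_TAG: { [tag: string]: string | undefined } = {
  A: "link",
  BUTTON: "button",
  H1: "heading",
  H2: "heading",
  INPUT: "textbox",
  NAV: "navigation",
};

function textOf(node: SnapshotNode): string {
  if (typeof node === "string") return node;
  const [, , ...children] = node;
  return children.map(textOf).join("").trim();
}

function toAriaYaml(node: SnapshotNode, depth = 0): string[] {
  if (typeof node === "string") return [];
  const [tag, attrs, ...children] = node;
  const role = attrs["role"] ?? ROLE_BY_TAG[tag.toUpperCase()];
  // Elements without a role add no line; their children keep the same depth.
  if (role === undefined) return children.flatMap((c) => toAriaYaml(c, depth));
  const name = attrs["aria-label"] ?? textOf(node);
  const line = `${"  ".repeat(depth)}- ${role}${name ? ` "${name}"` : ""}`;
  return [line, ...children.flatMap((c) => toAriaYaml(c, depth + 1))];
}
```

Wrapper elements such as div and span disappear from the output entirely, which is where most of the size reduction in this kind of translation comes from.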

📋 Scripts

npm run build        # compile TypeScript → dist/
npm run lint         # ESLint
npm run format       # Prettier --write
npm run format:check # Prettier check (used in CI)

📄 License

MIT
