playwright-trace-decoder-mcp
MCP server for unpacking and analyzing Playwright trace.zip archives
An MCP server that unpacks and structures Playwright trace.zip archives so AI agents can perform root-cause analysis on CI failures — without drowning in raw JSON or blowing up the context window.
🤔 The Problem
When a Playwright test fails in CI, you get a trace.zip. It's a binary blob. LLMs can't read it natively, and dumping the raw contents exceeds the context window. Engineers end up copying log snippets into ChatGPT manually like it's 2022.
This MCP server solves that: 16 focused tools that expose exactly the signal an agent needs to diagnose a failure, with pagination and ARIA compression to keep token costs low.
🛠️ Tools
Tools are grouped by how an agent should sequence them when diagnosing a failure.
Inspection — read trace data
| Tool | Arguments | What it returns |
|---|---|---|
| get_test_metadata | trace_path | Browser, platform, viewport, test title, wall-clock start time |
| get_trace_summary | trace_path | Failing action + top-level error + total action count |
| get_action_timeline | trace_path, limit, offset | Paginated list of all actions with API names, locators, and timings |
| get_filtered_network_logs | trace_path, limit, offset | Only 4xx/5xx responses, with static assets (CSS, JS, fonts, images) stripped |
| get_console_errors | trace_path, limit, offset | JS exceptions and warnings from the browser console |
| get_element_state_at_failure | trace_path | Failing locator, error message, and raw before/after metadata |
| extract_trace_metadata_strict | trace_path | Format version, retry session breakdown, HAR payload mode (embed/attach/omit) |
All list-returning tools support limit (1–500, default 50) and offset pagination with a has_more flag.
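The pagination contract above can be sketched in a few lines. This is a minimal illustration, not the server's code: the `Page` envelope and both function names are hypothetical, and clamping an out-of-range `limit` is an assumption (the real server may reject it instead). Only `limit`, `offset`, and `has_more` come from this README.

```typescript
// Hypothetical page envelope; only limit, offset, and has_more are documented.
interface Page<T> {
  items: T[];
  offset: number;
  limit: number;
  has_more: boolean;
}

// Clamp limit to the documented 1 to 500 range (default 50) and slice one page.
// Assumption: out-of-range values are clamped rather than rejected.
function paginate<T>(all: readonly T[], limit = 50, offset = 0): Page<T> {
  const safeLimit = Math.min(500, Math.max(1, Math.trunc(limit)));
  const safeOffset = Math.max(0, Math.trunc(offset));
  const items = all.slice(safeOffset, safeOffset + safeLimit);
  return {
    items,
    offset: safeOffset,
    limit: safeLimit,
    has_more: safeOffset + items.length < all.length,
  };
}

// Client-side loop: keep requesting pages until has_more turns false.
function collectAll<T>(all: readonly T[], limit = 50): T[] {
  const out: T[] = [];
  let offset = 0;
  for (;;) {
    const page = paginate(all, limit, offset);
    out.push(...page.items);
    if (!page.has_more) return out;
    offset += page.items.length;
  }
}
```

An agent would normally stop paging as soon as it has enough signal instead of collecting everything, which is the point of the `has_more` flag.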
trace_path accepts either an absolute local path or an HTTPS URL — the server downloads the file automatically and caches it for the session.
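As a rough sketch of how a `trace_path` value might be classified before any download or unzip happens (the `TraceSource` type and `classifyTracePath` name are hypothetical, and rejecting plain HTTP and relative paths is an assumption based on the sentence above):

```typescript
import { isAbsolute } from "node:path";

// Hypothetical result type: either a local file or a remote archive to download.
type TraceSource =
  | { kind: "local"; path: string }
  | { kind: "remote"; url: string };

function classifyTracePath(tracePath: string): TraceSource {
  if (isAbsolute(tracePath)) {
    return { kind: "local", path: tracePath };
  }
  let url: URL;
  try {
    url = new URL(tracePath); // throws on relative paths and malformed URLs
  } catch {
    throw new Error(`trace_path must be an absolute path or an HTTPS URL: ${tracePath}`);
  }
  if (url.protocol !== "https:") {
    throw new Error(`only HTTPS URLs are accepted, got ${url.protocol}`);
  }
  return { kind: "remote", url: url.href };
}
```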
DOM / UI analysis
| Tool | Arguments | What it returns |
|---|---|---|
| get_aria_accessibility_tree | trace_path, action_index? | ARIA accessibility tree as compact YAML (~90% fewer tokens than raw HTML). Defaults to the snapshot at the failed action. |
| get_dom_mutation_delta | trace_path, action_index | Set-diff of ARIA lines before vs after a specific action: added/removed elements only, not two full DOM dumps |
| get_screenshot_at_failure | trace_path, screenshot_index? | Base64 JPEG screenshot closest to the moment of failure. Use when ARIA tree is empty (captcha, blank page). screenshot_index lets you walk the full visual timeline. |
| analyze_race_conditions | trace_path | Network requests that were in-flight when an interaction or assertion fired |
| correlate_dom_and_network | trace_path | For each action where a fetch completed and the DOM mutated within ±100ms: triggering URL, response status, body snippet, and exact nodes added/removed |
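The set-diff behind get_dom_mutation_delta can be pictured roughly like this. It is a sketch under assumptions: `ariaDelta` and `AriaDelta` are hypothetical names, and trimming each line (which discards nesting depth) is a simplification the real tool may not make.

```typescript
// Hypothetical result shape: ARIA lines that appeared or disappeared.
interface AriaDelta {
  added: string[];
  removed: string[];
}

// Compare two ARIA YAML snapshots line by line as sets.
// Assumption: indentation is ignored, so a node that only moved is not reported.
function ariaDelta(before: string, after: string): AriaDelta {
  const toLines = (yaml: string): string[] =>
    yaml
      .split("\n")
      .map((line) => line.trim())
      .filter((line) => line.length > 0);
  const beforeSet = new Set(toLines(before));
  const afterSet = new Set(toLines(after));
  return {
    added: Array.from(afterSet).filter((line) => !beforeSet.has(line)),
    removed: Array.from(beforeSet).filter((line) => !afterSet.has(line)),
  };
}
```

Because only the changed lines are returned, the token cost scales with the size of the mutation, not the size of the page.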
Root-cause analysis
| Tool | Arguments | What it returns |
|---|---|---|
| get_causal_chain_for_failure | trace_path, lookback_ms? | Chronological chain of preceding actions, network errors, and console errors leading to the failure (default window: 5 s) |
| generate_error_signature | trace_path | Stable 12-char SHA-1 hash of the normalized error; use it to group duplicate failures across parallel CI runs |
| compare_traces | passing_trace_path, failing_trace_path | LCS-aligned action sequence between a passing and failing run: structural divergence, timing anomalies (>500 ms), unmatched actions, network delta |
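The lookback window of get_causal_chain_for_failure amounts to a time filter plus a chronological sort. A minimal sketch, assuming a hypothetical `TraceEvent` shape with millisecond timestamps (the server's real event fields are not documented here):

```typescript
// Hypothetical merged event; the three kinds mirror the table row above.
interface TraceEvent {
  kind: "action" | "network_error" | "console_error";
  timestamp_ms: number;
  summary: string;
}

// Keep events inside [failure - lookback, failure], oldest first.
// The 5000 ms default matches the documented 5 s window.
function causalChain(
  events: readonly TraceEvent[],
  failureTimeMs: number,
  lookbackMs = 5000,
): TraceEvent[] {
  const windowStart = failureTimeMs - lookbackMs;
  return events
    .filter((e) => e.timestamp_ms >= windowStart && e.timestamp_ms <= failureTimeMs)
    .sort((a, b) => a.timestamp_ms - b.timestamp_ms);
}
```

Widening `lookback_ms` helps when the root cause is an early failed request whose effect only surfaces several actions later.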
Performance analysis
| Tool | Arguments | What it returns |
|---|---|---|
| detect_performance_anomalies | trace_path, slow_action_threshold_ms?, frame_drop_threshold_ms? | Ranked list of slow actions and frame drops with suspected_cause (main thread blocked / network saturation / navigation timeout). Also reports p50/p95 action duration and a memory leak flag. |
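For intuition, p50/p95 and the ranked slow-action list could be derived along these lines. This is a sketch only: the nearest-rank percentile method is an assumption (the server may interpolate), and `TimedAction`, `percentile`, and `slowActions` are hypothetical names.

```typescript
// Hypothetical input: one entry per action with its measured duration.
interface TimedAction {
  action_id: string;
  duration_ms: number;
}

// Nearest-rank percentile: smallest value with at least p% of samples at or below it.
function percentile(values: readonly number[], p: number): number {
  if (values.length === 0) throw new Error("percentile of an empty list is undefined");
  const sorted = values.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  const index = Math.min(sorted.length - 1, Math.max(0, rank - 1));
  return sorted[index];
}

// Actions slower than the threshold, slowest first.
function slowActions(actions: readonly TimedAction[], thresholdMs: number): TimedAction[] {
  return actions
    .filter((a) => a.duration_ms > thresholdMs)
    .sort((a, b) => b.duration_ms - a.duration_ms);
}
```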
💬 Suggested agent workflow
get_trace_summary ← what failed?
get_causal_chain_for_failure ← what led up to it?
get_aria_accessibility_tree ← what did the page look like?
get_screenshot_at_failure ← ARIA empty? get the actual screenshot
get_dom_mutation_delta ← what changed right before the failure?
analyze_race_conditions ← was a network request still pending?
correlate_dom_and_network ← which fetch caused which DOM change?
compare_traces ← flaky? compare to a passing run
detect_performance_anomalies ← timeout but no JS error? check for Long Tasks
🚀 Setup
Build from source
git clone https://github.com/vola-trebla/playwright-trace-decoder-mcp.git
cd playwright-trace-decoder-mcp
npm install
npm run build
Add to your MCP client
Claude Desktop (~/Library/Application Support/Claude/claude_desktop_config.json)
{
  "mcpServers": {
    "playwright-trace-decoder": {
      "command": "node",
      "args": ["/absolute/path/to/playwright-trace-decoder-mcp/dist/index.js"]
    }
  }
}
Cursor (.cursor/mcp.json) or VS Code (.vscode/mcp.json; VS Code expects the top-level key servers instead of mcpServers)
{
  "mcpServers": {
    "playwright-trace-decoder": {
      "command": "node",
      "args": ["/absolute/path/to/playwright-trace-decoder-mcp/dist/index.js"]
    }
  }
}
Claude Code
claude mcp add playwright-trace-decoder \
node /absolute/path/to/playwright-trace-decoder-mcp/dist/index.js
Docker
docker build -t playwright-trace-decoder-mcp .
{
  "mcpServers": {
    "playwright-trace-decoder": {
      "command": "docker",
      "args": ["run", "--rm", "-i", "-v", "/path/to/traces:/traces", "playwright-trace-decoder-mcp"]
    }
  }
}
💬 Example usage
Basic failure analysis
Ask your agent:
"The CI run failed. Here's the trace: /tmp/trace.zip. What went wrong and why?"
The agent calls get_trace_summary → get_causal_chain_for_failure → get_aria_accessibility_tree, drilling deeper as needed — without you copy-pasting anything.
When the page was blank or redirected
"The ARIA tree is empty. Can you show me what was actually on screen when it failed?"
The agent calls get_screenshot_at_failure and gets the JPEG taken closest to the moment of failure — useful for catching captchas, error pages, or unexpected redirects.
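Picking the frame "closest to the moment of failure" is a nearest-timestamp search. A minimal sketch, assuming each screenshot carries a millisecond timestamp (the `Screenshot` shape, file names, and `closestScreenshot` are hypothetical):

```typescript
// Hypothetical screenshot record; real traces store frames under resources/.
interface Screenshot {
  file: string;
  timestamp_ms: number;
}

// Return the screenshot whose timestamp is nearest to the failure time,
// or undefined when the trace contains no screenshots at all.
function closestScreenshot(
  shots: readonly Screenshot[],
  failureTimeMs: number,
): Screenshot | undefined {
  let best: Screenshot | undefined;
  for (const shot of shots) {
    const gap = Math.abs(shot.timestamp_ms - failureTimeMs);
    if (best === undefined || gap < Math.abs(best.timestamp_ms - failureTimeMs)) {
      best = shot;
    }
  }
  return best;
}
```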
Flakiness diagnosis
"This test passes locally but fails in CI. Compare these two traces and tell me what was different."
The agent calls compare_traces, which LCS-aligns both action sequences and surfaces the first structural divergence, timing anomalies, and network requests that only appeared in the failing run.
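To make "LCS-aligns both action sequences" concrete, here is a textbook longest-common-subsequence alignment over action labels. It is a sketch, not the server's implementation: representing each action as a plain string and the `Alignment` shape are assumptions.

```typescript
// Hypothetical alignment result: index pairs that match, plus leftovers on each side.
interface Alignment {
  matched: Array<[number, number]>;
  onlyInPassing: number[];
  onlyInFailing: number[];
}

function alignActions(passing: readonly string[], failing: readonly string[]): Alignment {
  const n = passing.length;
  const m = failing.length;
  // lcs[i][j] = length of the LCS of passing[i..] and failing[j..]
  const lcs: number[][] = Array.from({ length: n + 1 }, () => new Array<number>(m + 1).fill(0));
  for (let i = n - 1; i >= 0; i--) {
    for (let j = m - 1; j >= 0; j--) {
      lcs[i][j] =
        passing[i] === failing[j]
          ? lcs[i + 1][j + 1] + 1
          : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }
  // Walk the table from the start, emitting matches and unmatched actions.
  const out: Alignment = { matched: [], onlyInPassing: [], onlyInFailing: [] };
  let i = 0;
  let j = 0;
  while (i < n && j < m) {
    if (passing[i] === failing[j]) {
      out.matched.push([i, j]);
      i++;
      j++;
    } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
      out.onlyInPassing.push(i++);
    } else {
      out.onlyInFailing.push(j++);
    }
  }
  while (i < n) out.onlyInPassing.push(i++);
  while (j < m) out.onlyInFailing.push(j++);
  return out;
}
```

The first entry in `onlyInPassing` or `onlyInFailing` marks the first structural divergence; timing anomalies would then be computed over the `matched` pairs.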
Grouping duplicate failures across parallel CI runs
"We have 12 failed traces from this pipeline. Are they all the same failure?"
Call generate_error_signature on each — identical signatures mean identical root cause, no need to read every trace.
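A signature of this kind can be sketched with Node's built-in `crypto` module. The normalization rules below (strip ANSI color codes, replace digit runs, collapse whitespace) are assumptions chosen for illustration; the README only states that the error is normalized and hashed to a 12-character SHA-1 prefix.

```typescript
import { createHash } from "node:crypto";

// Assumed normalization: remove details that vary between otherwise identical failures.
function normalizeError(message: string): string {
  return message
    .replace(/\u001b\[[0-9;]*m/g, "") // ANSI color codes
    .replace(/\d+/g, "<n>") // timeouts, line numbers, ports
    .replace(/\s+/g, " ") // line breaks and repeated spaces
    .trim();
}

// First 12 hex characters of the SHA-1 digest of the normalized message.
function errorSignature(message: string): string {
  return createHash("sha1").update(normalizeError(message)).digest("hex").slice(0, 12);
}
```

Grouping 12 failed traces then reduces to bucketing them by signature and reading one trace per bucket.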
Diagnosing which API call caused a DOM change
"The modal appeared but I don't know which fetch triggered it."
correlate_dom_and_network joins the HAR log and DOM snapshots automatically. Example output:
{
  "total_correlations": 1,
  "correlations": [
    {
      "action_id": "4:Locator.click",
      "triggering_request_url": "https://api.example.com/cart/items",
      "response_status_code": 200,
      "response_body_snippet": "{\"items\":[{\"id\":\"abc\",\"qty\":1}]}",
      "time_to_dom_mutation_ms": 38,
      "resulting_dom_mutations": [
        { "type": "added", "selector": "heading \"Cart (1 item)\"" },
        { "type": "removed", "selector": "button \"Add to cart\" [disabled]" }
      ]
    }
  ]
}
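The join behind this output can be approximated as a time-window match between completed fetches and DOM mutations. A sketch under assumptions: the `FetchDone` and `DomMutation` input shapes are hypothetical, and picking the nearest fetch inside the ±100 ms window is one plausible tie-break, not necessarily the server's.

```typescript
// Hypothetical inputs: when a fetch finished and when an action's DOM mutation was seen.
interface FetchDone {
  url: string;
  status: number;
  finished_ms: number;
}
interface DomMutation {
  action_id: string;
  mutated_ms: number;
}
// Subset of the documented output fields.
interface Correlation {
  action_id: string;
  triggering_request_url: string;
  response_status_code: number;
  time_to_dom_mutation_ms: number;
}

function correlate(
  fetches: readonly FetchDone[],
  mutations: readonly DomMutation[],
  windowMs = 100,
): Correlation[] {
  const out: Correlation[] = [];
  for (const mutation of mutations) {
    let best: FetchDone | undefined;
    for (const fetch of fetches) {
      const gap = Math.abs(mutation.mutated_ms - fetch.finished_ms);
      const bestGap = best === undefined ? Infinity : Math.abs(mutation.mutated_ms - best.finished_ms);
      if (gap <= windowMs && gap < bestGap) best = fetch;
    }
    if (best !== undefined) {
      out.push({
        action_id: mutation.action_id,
        triggering_request_url: best.url,
        response_status_code: best.status,
        time_to_dom_mutation_ms: mutation.mutated_ms - best.finished_ms,
      });
    }
  }
  return out;
}
```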
Performance timeouts: not just missing elements
"The test times out on goto, but there's no JS error. What's blocking the page?"
detect_performance_anomalies inspects screencast-frame gaps and flags Long Tasks. Example output:
{
  "anomalies": [
    {
      "kind": "slow_action",
      "blocked_action_id": "2:Frame.goto",
      "task_duration_ms": 4200,
      "threshold_ms": 500,
      "concurrent_network_load": 9,
      "frame_drop_count": 0,
      "worst_frame_gap_ms": 0,
      "suspected_cause": "network_saturation"
    }
  ],
  "suspected_memory_leak_flag": false,
  "p50_action_duration_ms": 95,
  "p95_action_duration_ms": 780,
  "total_frame_drop_count": 0
}
suspected_cause distinguishes a blocked main thread (main_thread_blocked — frame gaps present), a waterfall of concurrent fetches (network_saturation — ≥5 in-flight), and a navigation/hard timeout (timeout_or_navigation — duration >3 s with no other signals).
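Read as a decision rule, the three cases could look like the sketch below. The order in which the checks run and the `"unknown"` fallback are assumptions; the field names are taken from the example output above.

```typescript
// "unknown" is a hypothetical fallback; the README lists only the first three values.
type SuspectedCause =
  | "main_thread_blocked"
  | "network_saturation"
  | "timeout_or_navigation"
  | "unknown";

interface ActionSignals {
  task_duration_ms: number;
  concurrent_network_load: number;
  frame_drop_count: number;
}

// Assumed precedence: frame gaps first, then in-flight requests, then raw duration.
function suspectedCause(signals: ActionSignals): SuspectedCause {
  if (signals.frame_drop_count > 0) return "main_thread_blocked";
  if (signals.concurrent_network_load >= 5) return "network_saturation";
  if (signals.task_duration_ms > 3000) return "timeout_or_navigation";
  return "unknown";
}
```

Applied to the example anomaly (4200 ms, 9 concurrent requests, no frame drops), this rule yields `network_saturation`, matching the output shown.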
Checking what Playwright version and HAR mode a trace uses
"The trace came from an unfamiliar CI configuration. Is the response body data available?"
extract_trace_metadata_strict inspects the archive before you run any other tool:
{
  "format_version": 6,
  "har_mode": "embed",
  "retry_sessions": [
    { "session_id": "s1", "failed": false },
    { "session_id": "s2", "failed": true }
  ],
  "failed_session_id": "s2"
}
har_mode: "embed" means body snippets are inline, "attach" means they are stored in separate resource files, and "omit" means headers only, in which case correlate_dom_and_network returns an empty response_body_snippet.
🏗️ Architecture
trace.zip
├── *.trace → JSONL: before/after action pairs, console events, frame snapshots
├── *.network → JSONL: HAR resource-snapshot entries
└── resources/
    ├── page@*.jpeg → screenshots taken during the run
    └── ... → fonts, stylesheets, other captured resources
The parser streams each file line-by-line (no full-buffer split) and caches results in-process with an LRU (max 50 entries), keyed by path + mtime. Re-reading the same unmodified trace skips extraction and parsing entirely.
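An LRU keyed by path + mtime can be built on a plain `Map`, which iterates in insertion order. The sketch below is illustrative only: `TraceCache` and its methods are hypothetical names, and the project may well use a different structure or a library.

```typescript
// Minimal LRU: the Map's first key is always the least recently used entry.
class TraceCache<V> {
  private readonly entries = new Map<string, V>();
  private readonly maxEntries: number;

  constructor(maxEntries = 50) {
    this.maxEntries = maxEntries;
  }

  // A changed mtime produces a new key, so a rewritten trace never hits stale data.
  private static key(path: string, mtimeMs: number): string {
    return `${path}:${mtimeMs}`;
  }

  get(path: string, mtimeMs: number): V | undefined {
    const key = TraceCache.key(path, mtimeMs);
    const hit = this.entries.get(key);
    if (hit === undefined) return undefined;
    // Re-insert to mark the entry as most recently used.
    this.entries.delete(key);
    this.entries.set(key, hit);
    return hit;
  }

  set(path: string, mtimeMs: number, value: V): void {
    const key = TraceCache.key(path, mtimeMs);
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.maxEntries) {
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }

  get size(): number {
    return this.entries.size;
  }
}
```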
Frame snapshots store the DOM as nested arrays (["TAG", {attrs}, ...children]). The ARIA translator walks this tree and outputs compact YAML, reducing token cost by ~90% vs raw HTML.
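The tree walk can be sketched as a small recursive function. Everything beyond the `["TAG", {attrs}, ...children]` shape is an assumption here: the tag-to-role table is a tiny illustrative subset, the naming rule is simplified, and real snapshots may contain node shapes this sketch ignores.

```typescript
type Attrs = { readonly [name: string]: string | undefined };
// A node is either a text string or ["TAG", {attrs}, ...children].
type SnapshotNode = string | [string, Attrs, ...SnapshotNode[]];

// Illustrative subset of implicit roles; not the project's actual mapping.
const ROLE_BY_TAG: { readonly [tag: string]: string | undefined } = {
  BUTTON: "button",
  A: "link",
  H1: "heading",
  H2: "heading",
  H3: "heading",
};
// Roles whose accessible name is taken from their text content in this sketch.
const NAME_FROM_CONTENT = new Set(["button", "link", "heading"]);

function textOf(node: SnapshotNode): string {
  if (typeof node === "string") return node;
  const [, , ...children] = node;
  return children.map(textOf).join(" ").replace(/\s+/g, " ").trim();
}

// Emit one "- role "name"" line per node that has a role; role-less wrappers
// such as DIV or SPAN are skipped and their children keep the current depth.
function toAriaYaml(node: SnapshotNode, depth = 0, out: string[] = []): string[] {
  if (typeof node === "string") return out;
  const [tag, attrs, ...children] = node;
  const role = attrs["role"] ?? ROLE_BY_TAG[tag];
  let childDepth = depth;
  if (role !== undefined) {
    const name = attrs["aria-label"] ?? (NAME_FROM_CONTENT.has(role) ? textOf(node) : "");
    out.push(`${"  ".repeat(depth)}- ${role}${name ? ` "${name}"` : ""}`);
    childDepth = depth + 1;
  }
  for (const child of children) toAriaYaml(child, childDepth, out);
  return out;
}
```

Dropping role-less wrappers, attributes, and styling is where most of the token savings would come from.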
🏗️ Stack
- @modelcontextprotocol/sdk: MCP server runtime
- adm-zip: zip extraction
- zod v4: input schema validation
- TypeScript, ESLint, Prettier, Husky, GitHub Actions CI
📋 Scripts
npm run build # compile TypeScript → dist/
npm run lint # ESLint
npm run format # Prettier --write
npm run format:check # Prettier check (used in CI)
📄 License
MIT