Chromewright MCP Server

Browser automation through the Chrome DevTools Protocol


chromewright


Chromewright is a local-first browser automation MCP server built on Chrome DevTools Protocol (CDP). It can attach to an existing Chrome or Chromium session or launch its own browser, then expose a bounded, agent-oriented tool surface for navigation, reading, tab management, and interaction without a Node.js runtime.

It is built for the moment when an MCP client needs a real browser, but not an unbounded automation stack. Under the hood, Chromewright combines CDP session control, revision-scoped DOM extraction, cursor-based targeting, and consistent tool-result metadata. It is not a general-purpose end-to-end test runner. It is a browser control layer for AI agents.

What To Use Chromewright For

Use Chromewright when you need browser-aware automation with a stable high-level surface instead of handwritten CDP calls. Typical uses include:

  • attaching to a running Chrome or Chromium session or launching a disposable browser
  • exposing a real browser to MCP clients over streamable HTTP or stdio
  • reading pages through snapshot, inspect_node, get_markdown, extract, and read_links
  • driving bounded interactions through navigate, click, input, select, hover, press_key, scroll, wait, and the tab tools
  • targeting follow-up actions with revision-scoped cursor or node_ref handles instead of relying only on fragile selectors

Installation

Cargo

Install the binary from crates.io:

cargo install chromewright

Building from source or installing with Cargo requires Rust 1.88 or newer.

Homebrew

brew install bnomei/chromewright/chromewright

GitHub Releases

Download a prebuilt archive or source package from GitHub Releases, extract it, and place chromewright on your PATH.

From source

The published package, binary, and repository are named chromewright.

git clone https://github.com/bnomei/chromewright.git
cd chromewright
cargo install --path .

If you only want a local release build:

cargo build --release

Quickstart

1) Prepare Chrome, Chromium, or Obscura

The default attach mode expects a browser exposing DevTools on http://127.0.0.1:9222.

Recommended macOS launch command for a dedicated visible Chrome profile:

open -na "Google Chrome" --args \
  --remote-debugging-port=9222 \
  --user-data-dir="$HOME/.chromewright-agent-profile"

Use a dedicated profile when you do not want agent automation attached to your personal browsing session. If you prefer Chromewright to launch its own browser instead, skip this and pass any launch-mode flag in the next step. Launch mode is headed by default; add --headless only when you want a hidden browser.

You can also use Obscura instead of Chrome.
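Before starting Chromewright in attach mode, it can help to confirm that the browser is actually exposing DevTools. The sketch below is not part of Chromewright; it queries the /json/version discovery endpoint that Chrome and Chromium serve on the remote debugging port. The helper names are illustrative, and whether another browser answers on the same path is an assumption to verify.

```python
import json
import urllib.request


def devtools_version_url(host="127.0.0.1", port=9222):
    """Build the URL of the DevTools HTTP discovery endpoint."""
    return f"http://{host}:{port}/json/version"


def summarize_version(payload):
    """Condense the discovery payload into one line for logging."""
    browser = payload.get("Browser", "unknown browser")
    ws_url = payload.get("webSocketDebuggerUrl", "no websocket URL reported")
    return f"{browser} at {ws_url}"


def fetch_version(host="127.0.0.1", port=9222, timeout=2.0):
    """Fetch the discovery payload; raises URLError if nothing is listening."""
    url = devtools_version_url(host, port)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling summarize_version(fetch_version()) after launching the browser should print the browser build and its WebSocket debugger URL; a connection error usually means the --remote-debugging-port flag did not take effect.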

2) Start Chromewright

cargo run --bin chromewright

This default mode attaches to Chrome on 127.0.0.1:9222 and serves MCP over stdio.

Other common startup modes:

# release build, same defaults
cargo run --release --bin chromewright

# serve streamable HTTP on 127.0.0.1:3000/mcp
cargo run --bin chromewright -- serve

# connect to a different DevTools endpoint
cargo run --bin chromewright -- \
  --ws-endpoint http://127.0.0.1:9333

# launch a new visible browser instead of attaching to an existing one
cargo run --bin chromewright -- \
  --user-data-dir /tmp/chromewright-profile

# launch a new visible browser and serve streamable HTTP
cargo run --bin chromewright -- \
  --user-data-dir /tmp/chromewright-profile serve

# launch headless instead of headed
cargo run --bin chromewright -- \
  --headless --user-data-dir /tmp/chromewright-profile
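For readers wiring up the default stdio mode by hand, the following sketch shows the generic MCP handshake messages a client writes to the server's standard input. It reflects the MCP stdio transport (newline-delimited JSON-RPC 2.0), not anything Chromewright-specific; the helper names and the protocol version string are assumptions chosen for illustration.

```python
import json


def request(request_id, method, params=None):
    """Build a JSON-RPC 2.0 request; requests carry an id and expect a reply."""
    message = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        message["params"] = params
    return message


def notification(method):
    """Build a JSON-RPC 2.0 notification; notifications carry no id."""
    return {"jsonrpc": "2.0", "method": method}


def frame(message):
    """Serialize one message as a single newline-terminated UTF-8 line."""
    return (json.dumps(message, separators=(",", ":")) + "\n").encode("utf-8")


def handshake(client_name, client_version, protocol_version="2025-03-26"):
    """Return the messages in send order: initialize, initialized, tools/list.

    A real client waits for the initialize response before sending the rest.
    """
    initialize = request(1, "initialize", {
        "protocolVersion": protocol_version,
        "capabilities": {},
        "clientInfo": {"name": client_name, "version": client_version},
    })
    return [initialize, notification("notifications/initialized"), request(2, "tools/list")]
```

In practice most MCP clients do this for you; the sketch is mainly useful when debugging a stdio session with a subprocess pipe.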

3) Add Chromewright to your MCP client

Codex

Recommended stdio configuration:

[mcp_servers.chromewright]
command = "/absolute/path/to/chromewright"
enabled = true

If you want a long-lived loopback HTTP service instead, start chromewright serve separately and point Codex at the running endpoint:

[mcp_servers.chromewright]
url = "http://127.0.0.1:3000/mcp"
enabled = true

If you need a non-default attach target, add --ws-endpoint explicitly. If you want Chromewright to launch its own browser from the client command, add a launch-mode flag such as --user-data-dir /tmp/chromewright-profile, and add --headless only when you do not want a visible browser.

Other JSON-configured clients

{
  "mcpServers": {
    "chromewright": {
      "transport": "streamable_http",
      "url": "http://127.0.0.1:3000/mcp"
    }
  }
}

The exact file name and field names vary by client. The important part is that the client connects to a running Chromewright service at that URL.

Local Browser Smoke Checks

Browser launch and attach smoke checks are intended to be run manually on a maintainer workstation, not as a required Linux CI gate. The CI workflow covers Rust formatting, clippy, MSRV, cargo check, tests, and packaging without requiring a Linux browser launch or DevTools attach target.

To run the focused browser smoke suite locally from the repository root:

scripts/browser-smoke.sh

The script runs:

cargo test --test browser_smoke -- --nocapture

For attach-mode experiments, start Chrome or Chromium with DevTools enabled as shown in the Quickstart section, then run Chromewright against that endpoint. macOS is the primary local target for visible browser smoke checks; Linux browser launch or attach is optional local validation rather than a pull-request requirement.

How Chromewright Uses Your Browser

  • attach mode connects to an existing Chrome or Chromium session and can see the tabs, cookies, and authenticated state already present in that profile
  • launch mode starts a dedicated browser session and tracks the tabs created under that session
  • in attach mode, close defaults to session-managed cleanup and close_tab requires confirm_destructive = true before closing an unmanaged active tab
  • most high-level tools read and interact through CDP only; screenshot is the bounded exception and stores a managed PNG artifact for the caller
  • screenshot is part of the default surface and uses mode, optional tab_id, optional target, and region instead of caller-chosen path or confirm_unsafe

Use Cases

Standard MCP browser automation

Once Chromewright is running, the normal workflow is:

  1. Use new_tab or tab_list to establish an active tab. On a fresh session with no active tab, do not call snapshot first.
  2. Use snapshot to get document metadata plus actionable nodes. mode = "viewport" is the default local reread, mode = "delta" reuses the prior session base when available, and mode = "full" keeps the exhaustive escape hatch. Inline [index=...] markers only appear for nodes that still expose a public follow-up handle in that returned scope.
  3. Use inspect_node for targeted bounded reads, including selector-based inspection of non-actionable nodes such as headings, images, and overlays. Prefer cursor when one is available; stale cursors may selector-rebound, and a successful inspection may still legitimately return cursor = null with a selector-only target.
  4. Use screenshot when you need a managed PNG artifact. mode = "viewport" is the default, mode = "full_page" captures the whole page, mode = "element" requires target, and mode = "region" requires region. scale = "device" preserves raw device pixels by default, while scale = "css" normalizes output dimensions to CSS pixels. Pass tab_id when the capture should target a specific tab without activating it first.
  5. Use set_viewport when you need responsive breakpoint simulation on the active tab or a specific tab_id; successful calls return canonical viewport_metrics_after, and later snapshot calls surface the live metrics again at scope.viewport.
  6. Use click, input, select, hover, press_key, scroll, wait, or the tab tools with cursor preferred for follow-up targeting inside a page and stable tab_id preferred for multi-tab flows.
  7. Refresh snapshot after revision-changing actions. cursor and node_ref are revision-scoped, so rereads are the normal recovery path.
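The workflow above can be sketched as a sequence of MCP tools/call requests. This is a minimal illustration, assuming the standard MCP request shape; the helper names are hypothetical, and the cursor is treated as an opaque value copied from a snapshot result, because its structure is not described here.

```python
def tool_call(request_id, name, arguments=None):
    """Build an MCP tools/call request for one Chromewright tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }


def opening_calls(first_id=1):
    """Establish the active tab first, then read the page with a viewport snapshot."""
    return [
        tool_call(first_id, "tab_list"),
        tool_call(first_id + 1, "snapshot", {"mode": "viewport"}),
    ]


def follow_up_calls(cursor, first_id=3):
    """Click using a cursor taken from the latest snapshot, then reread.

    The cursor is revision-scoped, so the second snapshot refreshes handles
    after the click may have changed the document.
    """
    target = {"kind": "cursor", "cursor": cursor}
    return [
        tool_call(first_id, "click", {"target": target}),
        tool_call(first_id + 1, "snapshot", {"mode": "viewport"}),
    ]
```

The two functions are split on purpose: the click cannot be planned until the first snapshot has returned a cursor.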

Workflow Conventions

  • Fresh sessions: use new_tab or tab_list before snapshot if you do not already have an active tab.
  • Revision-scoped targets: cursor and node_ref belong to a specific document revision. After navigation or DOM-changing actions, rerun snapshot; stale cursor replay may selector-rebound, but treat rebound as a signal to reread before more precise chained work.
  • Snapshot modes: default viewport is the fast local reread, delta reports the changed local surface when a compatible prior base exists and falls back to viewport when it does not, and full keeps the exhaustive page-wide tree for deep inspection or regression work.
  • Viewport locality: viewport and delta demote unchanged sticky/fixed header or footer chrome when stronger local anchors are present. If persistent chrome still wins because nothing stronger exists, scope.locality_fallback_reason explains that fallback.
  • Snapshot inline handles: rendered [index=...] markers follow the same revision scope as the exposed actionable nodes and only advertise follow-up-capable nodes in that returned scope; use them as reread-local hints, not as durable cross-revision IDs.
  • target_status = same: the tool still proved the same target, even if the post-action handle downgraded to selector-only because actionability disappeared.
  • target_status = rebound: the tool recovered after a revision change; target_after may downgrade to selector-only when the same element still exists but no longer has a verified actionable handle, so reread with snapshot before more precise chained work.
  • target_status = detached: the old target no longer exists, often after navigation; reacquire state from the new page before continuing.
  • target_status = unknown: post-action identity stayed ambiguous, usually because multiple matches remained or the selector could not prove the same element.
  • Attach-mode recovery: if a connected session returns code = attach_page_target_lost, use tab_list to confirm inventory, switch_tab to reacquire an active page target, and reconnect the session if DOM-backed tools still fail.
  • Attach-mode safety: use a disposable browser profile for debugging and treat destructive tab tools as explicit actions, especially on connected sessions.
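One way an agent might act on the target_status conventions above is a small dispatch table. The sketch below is an interpretation, not Chromewright code: the NextStep names are hypothetical, and falling back to a reread for unrecognized values is a conservative assumption.

```python
from enum import Enum


class NextStep(Enum):
    CONTINUE = "continue"                 # same target proven; keep going
    REREAD_SNAPSHOT = "reread_snapshot"   # refresh handles before precise work
    REACQUIRE_PAGE = "reacquire_page"     # old target is gone; read the new page
    DISAMBIGUATE = "disambiguate"         # identity ambiguous; reread and narrow the target


_STEPS = {
    "same": NextStep.CONTINUE,
    "rebound": NextStep.REREAD_SNAPSHOT,
    "detached": NextStep.REACQUIRE_PAGE,
    "unknown": NextStep.DISAMBIGUATE,
}


def next_step(target_status):
    """Map a reported target_status to a follow-up action.

    Values not listed in the conventions fall back to a snapshot reread.
    """
    return _STEPS.get(target_status, NextStep.REREAD_SNAPSHOT)
```

Even with target_status = same, the post-action handle may have downgraded to selector-only, so an agent that needs a cursor should still check what it got back.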

Chromewright also carries a few small but important contract details:

  • DOM-targeted tools take one public target object: { "kind": "selector", "selector": "..." } or { "kind": "cursor", "cursor": ... }.
  • Canonical target examples:
    • inspect_node: { "target": { "kind": "selector", "selector": "h1" } }
    • click: { "target": { "kind": "cursor", "cursor": <snapshot cursor> } }
  • screenshot is part of the default surface and uses mode plus optional scale instead of legacy full_page = true; successful calls return managed artifact metadata including artifact_uri, artifact_path, mime_type, byte_count, image dimensions, CSS dimensions, DPR metadata, revealed_from_offscreen, and optional clip. Screenshot CSS limits apply to css_width/css_height and area before output scaling; scale = "device" keeps physical HiDPI pixels using the current device pixel ratio, while scale = "css" normalizes output dimensions back to CSS pixels. The PNG byte cap is a separate final artifact-size guard.
  • set_viewport is part of the default surface and uses CDP emulation instead of resizing the OS window; width and height must be positive bounded CSS pixels with a practical default cap of 10,000 CSS pixels per dimension, device_scale_factor must be greater than zero, orientation uses snake_case values such as portrait_primary and landscape_primary, and reset = true only accepts tab_id. For unusual large-canvas or regression-capture workloads, pass allow_large_viewport = true intentionally to raise the viewport cap to 10,000,000 CSS pixels per dimension; screenshot CSS dimensions, physical HiDPI output size, and PNG byte limits still apply independently. Read the live scoped metrics back from viewport_metrics_after or snapshot.scope.viewport; viewport_after remains a compatibility alias.
  • switch_tab accepts stable tab_id only on the public MCP surface.
  • Structured tool-local failures use one top-level family: code, error, optional document, optional target, optional recovery, and optional details.
  • extract uses code = element_not_found for selector misses and reserves code = invalid_extract_payload for malformed extraction results.
  • read_links returns both the raw href attribute and an absolute resolved_url.

Default Tool Surface

The default Chromewright MCP server exposes 23 high-level and operator tools:

  • navigation: navigate, go_back, go_forward, wait
  • interaction and viewport: click, input, select, hover, press_key, scroll, set_viewport
  • tabs and lifecycle: new_tab, tab_list, switch_tab, close_tab, close
  • reading and inspection: snapshot, inspect_node, get_markdown, extract, read_links
  • managed artifacts: screenshot
  • operator diagnostics: evaluate

The raw-JavaScript operator tool evaluate is part of the normal production MCP surface because it is useful for diagnostics and escape-hatch inspection when bounded tools such as inspect_node, extract, or get_markdown cannot answer a page-specific question. Chromewright does not use a separate global enable flag for operator tools; instead, higher-risk tools keep their guardrails in their own input contracts.

High-level action tools return compact follow-up metadata by default. Use snapshot when you need the scoped YAML snapshot plus actionable-node list, with viewport as the default, delta for session-local changes, and full for exhaustive rereads. For targeted reads, use snapshot to choose a node and reuse its cursor, then call inspect_node; when you need to inspect a non-actionable DOM node such as a heading or image, inspect_node also accepts selector-based reads with an optional cursor, and stale cursor replay may selector-rebound before the final target settles. After revision-changing actions, rerun snapshot before more precise target reuse. Public DOM follow-up calls should use target.kind = "cursor" whenever a fresh cursor is available and fall back to target.kind = "selector" when only selector continuity remains.

Use set_viewport before snapshot when you want the DOM reread scoped to a simulated breakpoint. Successful set_viewport responses include viewport_metrics_after, and snapshot rereads expose the same live metrics under scope.viewport without widening unrelated tool outputs. scroll reports canonical scroll position under scroll_after; legacy viewport_after aliases remain for compatibility.
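A client can mirror the documented set_viewport bounds before sending a request. The sketch below uses the parameter names given in this README; the helper names are hypothetical, treating the caps as inclusive is an assumption, and the server remains the authority on validation.

```python
DEFAULT_CAP = 10_000      # documented default cap, CSS pixels per dimension
LARGE_CAP = 10_000_000    # documented cap when allow_large_viewport = true


def set_viewport_args(width, height, device_scale_factor=None, orientation=None,
                      tab_id=None, allow_large_viewport=False):
    """Build set_viewport arguments after checking the documented bounds."""
    cap = LARGE_CAP if allow_large_viewport else DEFAULT_CAP
    for name, value in (("width", width), ("height", height)):
        if not 0 < value <= cap:
            raise ValueError(f"{name} must be positive and at most {cap} CSS pixels")
    if device_scale_factor is not None and device_scale_factor <= 0:
        raise ValueError("device_scale_factor must be greater than zero")
    args = {"width": width, "height": height}
    if device_scale_factor is not None:
        args["device_scale_factor"] = device_scale_factor
    if orientation is not None:
        args["orientation"] = orientation  # snake_case, e.g. "portrait_primary"
    if tab_id is not None:
        args["tab_id"] = tab_id
    if allow_large_viewport:
        args["allow_large_viewport"] = True
    return args


def reset_viewport_args(tab_id=None):
    """Build a reset request; reset = true only accepts tab_id alongside it."""
    args = {"reset": True}
    if tab_id is not None:
        args["tab_id"] = tab_id
    return args
```

For example, set_viewport_args(390, 844, device_scale_factor=3, orientation="portrait_primary") describes a phone-sized breakpoint to apply before the next snapshot.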

Use screenshot when you need a bounded visual artifact from the browser. The public contract is mode plus optional scale, tab_id, target, and region; callers do not provide path, full_page, or confirm_unsafe. Successful results include artifact_uri, artifact_path, mime_type, byte_count, width, height, css_width, css_height, device_pixel_ratio, pixel_scale, revealed_from_offscreen, and optional clip.
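The mode rules for screenshot can be checked on the client side as well. This is a minimal sketch using only the parameters named above; the function name is hypothetical, and target and region are passed through as opaque values because their exact shapes are not spelled out here.

```python
MODES = {"viewport", "full_page", "element", "region"}
SCALES = {"device", "css"}


def screenshot_args(mode="viewport", scale=None, tab_id=None, target=None, region=None):
    """Build screenshot arguments, enforcing the documented mode requirements."""
    if mode not in MODES:
        raise ValueError(f"unsupported mode: {mode!r}")
    if mode == "element" and target is None:
        raise ValueError('mode = "element" requires target')
    if mode == "region" and region is None:
        raise ValueError('mode = "region" requires region')
    if scale is not None and scale not in SCALES:
        raise ValueError(f"unsupported scale: {scale!r}")
    args = {"mode": mode}
    # Only include optional fields the caller actually set.
    for key, value in (("scale", scale), ("tab_id", tab_id),
                       ("target", target), ("region", region)):
        if value is not None:
            args[key] = value
    return args
```

Note that the builder deliberately has no path, full_page, or confirm_unsafe parameters, matching the public contract.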

Read-oriented tools are intentionally distinct: get_markdown is the broad reading tool, extract is for targeted text or HTML, and read_links is for link inventory and planning. For multi-tab work, prefer stable tab_id handles from tab_list, new_tab, switch_tab, and close_tab instead of relying only on tab indices.

Operation Metrics

Finished tool results include operation_metrics metadata when a tool records non-zero metrics. Agents must treat operation_metrics.output_bytes as optional: it is present only when a tool path measures the exact serialized output size, and it is omitted when exact sizing was not measured. Measured hot paths add the relevant counters below:

  • browser evaluation count
  • poll iterations
  • DOM extraction count and extraction time
  • snapshot render time
  • handoff rebuild count and time
  • optional serialized output size (output_bytes)

The lightweight validation surface for these metrics is in the normal test suite:

cargo test --locked --all-features operation_metrics
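Because operation_metrics.output_bytes is optional, agent code should read it defensively. The sketch below assumes a tool result has been parsed into a dictionary with operation_metrics as a top-level key; that placement and the function names are assumptions for illustration.

```python
def measured_output_bytes(result):
    """Return output_bytes when the tool measured it, otherwise None."""
    metrics = result.get("operation_metrics") or {}
    value = metrics.get("output_bytes")
    return value if isinstance(value, int) else None


def summarize_output(results):
    """Return (total measured bytes, number of results without a measurement)."""
    sizes = [measured_output_bytes(result) for result in results]
    measured = [size for size in sizes if size is not None]
    return sum(measured), len(sizes) - len(measured)
```

Reporting the unmeasured count alongside the total avoids treating a missing measurement as zero bytes.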

Safety And Boundaries

  • Chromewright drives a real Chrome or Chromium instance through CDP. In attach mode, it sees the tabs, cookies, and authenticated state of the browser profile you give it.
  • Use a dedicated browser profile for agent work when you do not want automation attached to your personal session.
  • The normal tool surface includes the raw-JavaScript operator tool evaluate; callers must pass confirm_unsafe = true for each invocation because it executes arbitrary JavaScript in the active page. screenshot remains part of the bounded default surface and returns managed artifact metadata.
  • screenshot does not accept caller-chosen output paths or confirm_unsafe; use mode = "full_page" instead of a legacy full_page = true flag, and use scale = "css" only when you want CSS-pixel-normalized output instead of raw device pixels.
  • navigate and new_tab reject unsafe schemes such as data: and file: unless the caller passes allow_unsafe = true on that specific request. go_back and go_forward apply the same gate: if a history move lands on a destination outside the default-safe http:, https:, or about: schemes, the move is reverted and the call is rejected unless that request also sets allow_unsafe = true.
  • Destructive tab lifecycle operations use per-tool confirmation fields: close_tab requires confirm_destructive = true before closing an unmanaged active tab in a connected session, and close requires the same confirmation before expanding connected-session cleanup from managed tabs to all tabs.
  • cursor and node_ref targets are revision-scoped. After a DOM-changing action, stale cursor replay may selector-rebound, but precise follow-up work should still be refreshed from a new snapshot.
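The scheme gate described above can be approximated on the client side so that unsafe navigations are an explicit decision. This sketch is not Chromewright's implementation; the function name and the url argument name are assumptions, and the server still applies its own checks.

```python
from urllib.parse import urlsplit

# Default-safe schemes named in this README.
SAFE_SCHEMES = {"http", "https", "about"}


def navigate_args(url, allow_unsafe=False):
    """Build navigate arguments, refusing non-default-safe schemes unless opted in."""
    scheme = urlsplit(url).scheme.lower()
    if scheme not in SAFE_SCHEMES and not allow_unsafe:
        raise ValueError(f"scheme {scheme!r} requires allow_unsafe = true")
    args = {"url": url}  # "url" is an assumed parameter name
    if allow_unsafe:
        args["allow_unsafe"] = True
    return args
```

A URL without a scheme, such as example.com, is also rejected by this pre-check, which nudges callers toward fully qualified URLs.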

License

MIT