browser-harness
CDPによるブラウザの直接制御。ユーザーがウェブページの自動化、スクレイピング、テスト、操作を行いたい場合に使用します。ユーザーが既に起動しているChromeに接続します。
npx skills add https://github.com/browser-use/browser-harness --skill browser-harnessbrowser-harness
Direct browser control via CDP. You drive the user's real browser with Python helpers run through the browser-harness command.
Prerequisite (one-time — NOT part of the AI workflow)
This skill is instructions only. It assumes the browser-harness command is already on $PATH. If command -v browser-harness fails, do the one-time install in references/install.md first, then continue. Installation and browser-connection setup are a prerequisite; once browser-harness <<'PY' … PY prints page info, never run install/connection steps again as part of normal work.
Usage
browser-harness <<'PY'
new_tab("https://docs.browser-use.com")
wait_for_load()
print(page_info())
PY
- Invoke as
browser-harness— it's on$PATHafter install. Nocd, nouv run. - Use the heredoc form for every multi-line command. It prevents shell quote mangling inside Python strings and JavaScript snippets.
- First navigation is
new_tab(url), notgoto_url(url)— goto runs in the user's active tab and clobbers their work. - Helpers are pre-imported and the daemon auto-starts; you never start/stop it manually unless you want to.
What actually works
- Screenshots first.
capture_screenshot()to understand the page, find visible targets, and decide whether you need a click, a selector, or more navigation. - Clicking.
capture_screenshot()→ read the pixel off the image →click_at_xy(x, y)→capture_screenshot()to verify. Suppress the Playwright-habit reflex of "locate first, then click" — nogetBoundingClientRect, no selector hunt. Drop to DOM only when the target has no visible geometry. Hit-testing happens in Chrome's browser process, so clicks pass through iframes / shadow DOM / cross-origin without extra work. - Bulk HTTP.
http_get(url)+ThreadPoolExecutor. No browser needed for static pages. - After goto:
wait_for_load(). - Wrong/stale tab:
ensure_real_tab(). - Verification:
print(page_info())is the simplest "is this alive?" check; screenshots are the default way to verify whether a visible action worked. - DOM reads: use
js(...)for inspection/extraction when a screenshot shows coordinates are the wrong tool. - Auth wall: redirected to login → stop and ask the user. Don't type credentials from screenshots.
- Raw CDP for anything helpers don't cover:
cdp("Domain.method", params).
After every meaningful action, re-screenshot before assuming it worked.
Remote / cloud browsers
Use remote for parallel sub-agents (each gets an isolated browser via a distinct BU_NAME) or on a headless server. BROWSER_USE_API_KEY must be set.
browser-harness <<'PY'
start_remote_daemon("work") # clean cloud browser; profileName=/profileId= to reuse a logged-in profile
PY
BU_NAME=work browser-harness <<'PY'
new_tab("https://example.com")
print(page_info())
PY
start_remote_daemon prints a liveUrl so the user can watch. Running remote daemons bill until timeout.
Interaction skills (progressive disclosure)
If you struggle with a specific UI mechanic, read the matching file under ${CLAUDE_PLUGIN_ROOT}/interaction-skills/ before inventing an approach. Available: browser-wall, connection, cookies, cross-origin-iframes, dialogs, downloads, drag-and-drop, dropdowns, iframes, network-requests, print-as-pdf, profile-sync, screenshots, scrolling, shadow-dom, tabs, uploads, viewport.
Task-specific edits
For task-specific helper additions, edit ${CLAUDE_PLUGIN_ROOT}/agent-workspace/agent_helpers.py. Keep core helpers short.
Domain skills (opt-in)
Community per-site playbooks live in ${CLAUDE_PLUGIN_ROOT}/agent-workspace/domain-skills/<host>/ and are off by default. Set BH_DOMAIN_SKILLS=1 to enable them; when enabled and the task is site-specific, read every file in the matching <site>/ directory before inventing an approach.
Design constraints
- Coordinate clicks default.
Input.dispatchMouseEventgoes through iframes/shadow/cross-origin at the compositor level. - Connect to the user's running Chrome. Don't launch your own browser.
- Prefer compositor-level actions (screenshots, coordinate clicks, raw key input) over framework/DOM hacks. Reach for
interaction-skills/only when those are the wrong tool.