A browser screenshot tool to capture scrolling screenshots of webpages using Playwright, with support for intelligent section identification and multiple output formats.
Brosh is an advanced browser screenshot tool designed for developers, QA testers, AI engineers, and content creators. It excels at capturing comprehensive, scrolling screenshots of webpages using Playwright's asynchronous API.
What it does: Brosh automates the process of taking full-page or partial screenshots, including those requiring scrolling. It can capture single images, a series of images, or even animated PNGs (APNGs) of the scrolling process. Beyond pixels, Brosh intelligently extracts visible text content (as Markdown) and optionally, the underlying HTML structure of captured sections.
Who it's for: developers, QA testers, AI engineers, and content creators who need reliable captures of full web pages and their content.
Why it's useful: it combines scrolling screenshots, animated captures, and text/HTML extraction in a single tool, with output that works for both humans and AI agents.
Brosh streamlines the process of capturing web content through several key steps:
1. Initialization (__init__.py, cli.py, api.py): The capture request arrives via the CLI or the Python API, and a CaptureConfig object is created (defined in models.py).
2. Browser setup (browser.py, tool.py): The BrowserManager determines the target browser (Chrome, Edge, Safari) based on user input or auto-detection. A page object is configured with the specified viewport dimensions and zoom.
3. Navigation (capture.py, tool.py): The page navigates to the target URL. If from_selector is provided, the page scrolls to that element.
4. Capture (capture.py): The CaptureManager calculates scroll positions based on viewport height and scroll_step (a sketch of this calculation follows the list). For each position, the DOMProcessor (from texthtml.py) is invoked to extract the active_selector for the visible content, the visible_text (converted to Markdown), and the visible_html (minified). Each frame's data is stored in a CaptureFrame object.
5. Image processing (image.py, tool.py): The ImageProcessor takes the captured raw image bytes (PNGs from Playwright). If scale is specified, images are downsampled. PNG (default): images are optimized using pyoxipng. JPG: images are converted from PNG to JPG, handling transparency. APNG: all captured frames are compiled into an animated PNG.
6. Output (tool.py): The BrowserScreenshotTool orchestrates the saving of processed images to disk, along with per-frame metadata (such as the active_selector or headers).
7. MCP mode (mcp.py): When run as an MCP server (via brosh mcp or brosh-mcp), a FastMCP server starts. It exposes a see_webpage tool that AI agents can call, which invokes the capture_webpage_async API function. Results are returned as an MCPToolResult model, potentially including base64-encoded images and/or text/HTML, optimized for AI consumption (e.g., default smaller image scale, text trimming).

This modular design allows for flexibility and robust error handling at each stage.
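For intuition, here is a minimal, illustrative sketch (not brosh's actual code; variable names and rounding are assumptions) of how scroll positions follow from the viewport height and the scroll_step percentage in step 4:

```python
# Illustrative only: scroll_step is a percentage of the viewport height,
# so values below 100 produce overlapping frames.
viewport_height = 720   # px, from the configured viewport
scroll_step_pct = 80    # --scroll_step 80 => advance 80% of the viewport per frame
page_height = 3000      # px, e.g. document.documentElement.scrollHeight

step_px = max(1, viewport_height * scroll_step_pct // 100)
last = max(page_height - viewport_height, 0)
positions = list(range(0, last, step_px))
if not positions or positions[-1] != last:
    positions.append(last)  # make sure the bottom of the page is captured
print(positions)  # [0, 576, 1152, 1728, 2280]
```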
uv is a fast Python package manager.
# Install uv (if not already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh
# Run brosh directly with uvx (no installation needed)
uvx brosh shot "https://example.com"
# Or install globally as a command-line tool
uv tool install brosh
# Install with all optional dependencies (for development, testing, docs)
uv tool install "brosh[all]"
# Basic installation
python -m pip install brosh
# With all optional dependencies
python -m pip install "brosh[all]"
pipx installs Python applications in isolated environments.
# Install pipx (if not already installed)
python -m pip install --user pipx
python -m pipx ensurepath
# Install brosh
pipx install brosh
For development or to get the latest changes:
git clone https://github.com/twardoch/brosh.git
cd brosh
python -m pip install -e ".[all]" # Editable install with all extras
After installing the brosh
package, you need to install the browser drivers required by Playwright:
playwright install
# To install specific browsers, e.g., only Chromium:
# playwright install chromium
This command downloads the browser binaries (Chromium, Firefox, WebKit) that Playwright will use. Brosh primarily targets Chrome, Edge (Chromium-based), and Safari (WebKit-based).
# Capture a single webpage (e.g., example.com)
brosh shot "https://example.com"
# Capture a local HTML file
brosh shot "file:///path/to/your/local/file.html"
# For potentially better performance with multiple captures,
# start the browser in debug mode first (recommended for Chrome/Edge)
brosh --app chrome run
# Then, in the same or different terminal:
brosh --app chrome shot "https://example.com"
# When finished:
brosh --app chrome quit
# Create an animated PNG showing the scroll
brosh shot "https://example.com" --output_format apng
# Capture with a custom viewport size (e.g., common desktop)
brosh --width 1920 --height 1080 shot "https://example.com"
# Extract HTML content along with screenshots and output as JSON
brosh shot "https://example.com" --fetch_html --json > page_content.json
Brosh uses a fire
-based CLI. Global options are set before the command.
Pattern: brosh [GLOBAL_OPTIONS] COMMAND [COMMAND_OPTIONS]
Example:
brosh --width 1280 --height 720 shot "https://example.com" --scroll_step 80
See Command Reference for all options.
Brosh offers both synchronous and asynchronous APIs.
import asyncio
from brosh import (
    capture_webpage,
    capture_webpage_async,
    capture_full_page,
    capture_visible_area,
    capture_animation,
)
from brosh.models import ImageFormat  # For specifying image formats

# --- Synchronous API ---
# Best for simple scripts or CLI usage.
# It internally manages an asyncio event loop if needed.
def capture_sync_example():
    print("Running synchronous capture...")
    result = capture_webpage(
        url="https://example.com",
        width=1280,
        height=720,
        scroll_step=100,
        output_format=ImageFormat.JPG,  # Use the enum
        scale=75,  # Scale images to 75%
        fetch_text=True,
    )
    print(f"Captured {len(result)} screenshots synchronously.")
    for path, metadata in result.items():
        print(f"  - Saved to: {path}")
        if metadata.get("text"):
            print(f"    Text preview: {metadata['text'][:80]}...")
    return result

# --- Asynchronous API ---
# Ideal for integration into async applications (e.g., web servers, MCP).
async def capture_async_example():
    print("\nRunning asynchronous capture...")
    result = await capture_webpage_async(
        url="https://docs.python.org/3/",
        fetch_html=True,  # Get HTML content
        max_frames=3,  # Limit to 3 frames
        from_selector="#getting-started",  # Start capturing from this element
        output_format=ImageFormat.PNG,
    )
    print(f"Captured {len(result)} screenshots asynchronously.")
    for path, metadata in result.items():
        print(f"  - Saved to: {path}")
        print(f"    Selector: {metadata.get('selector', 'N/A')}")
        if metadata.get("html"):
            print(f"    HTML preview: {metadata['html'][:100]}...")
    return result

# --- Convenience Functions (use sync API by default) ---
def convenience_functions_example():
    print("\nRunning convenience functions...")

    # Capture entire page in one go (sets height=-1, max_frames=1)
    # Note: This may not work well for infinitely scrolling pages.
    full_page_result = capture_full_page(
        "https://www.python.org/psf/",
        output_format=ImageFormat.PNG,
        scale=50,
    )
    print(f"Full page capture result: {list(full_page_result.keys())}")

    # Capture only the initial visible viewport (sets max_frames=1)
    visible_area_result = capture_visible_area(
        "https://www.djangoproject.com/",
        width=800,
        height=600,
    )
    print(f"Visible area capture result: {list(visible_area_result.keys())}")

    # Create an animated PNG of the scrolling process
    animation_result = capture_animation(
        "https://playwright.dev/",
        anim_spf=0.8,  # 0.8 seconds per frame
        max_frames=5,  # Limit animation to 5 frames
    )
    print(f"Animation capture result: {list(animation_result.keys())}")
# --- Running the examples ---
if __name__ == "__main__":
    sync_results = capture_sync_example()

    # asyncio.run() creates and manages its own event loop, so it can be
    # called here even though the synchronous example above ran first.
    async_results = asyncio.run(capture_async_example())

    convenience_functions_example()
    print("\nAll examples complete.")
Brosh can run as an MCP (Model Context Protocol) server, allowing AI tools like Claude to request web captures.
# Start the MCP server using the dedicated command:
brosh-mcp
# Or, using the main brosh command:
brosh mcp
This will start a server that listens for requests from MCP clients.
Configuring with Claude Desktop:
Locate your Claude Desktop configuration file:
- macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
- Windows: %APPDATA%\Claude\claude_desktop_config.json
- Linux: ~/.config/Claude/claude_desktop_config.json
Add or modify the mcpServers
section. Using uvx
is recommended if uv
is installed:
{
"mcpServers": {
"brosh": {
"command": "uvx",
"args": ["brosh-mcp"],
"env": {
"FASTMCP_LOG_LEVEL": "INFO" // Optional: control logging
}
}
}
}
Alternative if uvx
is not used or preferred:
You can use the direct path to brosh-mcp
or invoke it via python -m brosh mcp
.
To find the path to brosh-mcp
:
which brosh-mcp # Unix-like systems
# or
python -c "import shutil; print(shutil.which('brosh-mcp'))"
Then update claude_desktop_config.json
:
{
"mcpServers": {
"brosh": {
"command": "/full/path/to/your/python/bin/brosh-mcp", // Replace with actual path
"args": []
}
}
}
Or using python -m
:
{
"mcpServers": {
"brosh": {
"command": "python", // or /path/to/specific/python
"args": ["-m", "brosh", "mcp"]
}
}
}
After configuration, (re)start Claude Desktop. You can then ask Claude to use Brosh, e.g., "Using the brosh tool, capture a screenshot of example.com and show me the text."
MCP Tool Parameters:
The see_webpage tool exposed via MCP accepts parameters similar to capture_webpage_async, including url, zoom, width, height, scroll_step, scale (defaults to 50 for MCP), fetch_html, fetch_text, fetch_image, fetch_image_path, etc.
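For reference, a hedged sketch of calling the same capture path from Python with MCP-style settings. Parameter names are taken from the documentation above; whether every flag behaves identically outside MCP mode is an assumption.

```python
import asyncio

from brosh import capture_webpage_async


async def main():
    # Mirrors the documented MCP defaults: smaller scale, text included,
    # and file paths rather than inline base64 image data.
    result = await capture_webpage_async(
        url="https://example.com",
        scale=50,               # MCP default per the documentation
        fetch_text=True,
        fetch_image=False,      # skip base64 image payloads
        fetch_image_path=True,  # return saved file paths instead
    )
    for path, meta in result.items():
        print(path, meta.get("selector"))


asyncio.run(main())
```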
Global Options: These options are specified before the command (e.g., brosh --width 800 shot ...).
Option | Type | Default | Description |
---|---|---|---|
--app | str | (auto-detect) | Browser: chrome, edge, safari. Auto-detects installed browser if empty. |
--width | int | (screen width) | Viewport width in pixels. 0 for default screen width. |
--height | int | (screen height) | Viewport height in pixels. 0 for default screen height, -1 for full page. |
--zoom | int | 100 | Browser zoom level in % (range: 10-500). |
--output_dir | str | (user pictures) | Directory to save screenshots (defaults to ~/Pictures/brosh). |
--subdirs | bool | False | Create domain-based subdirectories within output_dir. |
--verbose | bool | False | Enable verbose debug logging. |
--json | bool | False | Output results from shot command as JSON to stdout (CLI only). |
run
Starts the specified browser in remote debug mode. This can improve performance for multiple shot
commands as the browser doesn't need to be re-initialized each time.
Usage: brosh [GLOBAL_OPTIONS] run [--force_run]
Options:
Option | Type | Default | Description |
---|---|---|---|
--force_run | bool | False | Force restart browser even if one seems active. |
quit
Quits the browser previously started with run
.
Usage: brosh [GLOBAL_OPTIONS] quit
shot
Captures screenshots of the given URL.
Usage: brosh [GLOBAL_OPTIONS] shot URL [SHOT_OPTIONS]
Required Argument:
URL (str): The URL of the webpage to capture. Can be http://, https://, or file:///.

Shot Options:
Option | Type | Default | Description |
---|---|---|---|
--scroll_step | int | 100 | Scroll step as % of viewport height (range: 10-200). |
--scale | int | 100 | Scale output images by % (range: 10-200). 50 means 50% of original size. |
--output_format | str | png | Output format: png, jpg, apng. |
--anim_spf | float | 0.5 | Seconds per frame for APNG animations (range: 0.1-10.0). |
--fetch_html | bool | False | Include minified HTML of visible elements in the output metadata. |
--fetch_image | bool | False | (MCP context) Include base64 image data in MCP output. |
--fetch_image_path | bool | True | (MCP context) Include image file path in MCP output. |
--fetch_text | bool | True | Include extracted Markdown text from visible elements in output metadata. |
--trim_text | bool | True | Trim extracted text to ~200 characters in output metadata. |
--max_frames | int | 0 | Maximum number of frames/screenshots to capture. 0 for unlimited (full scroll). |
--from_selector | str | "" | CSS selector of an element to scroll to before starting capture. |
mcp
Runs Brosh as an MCP (Model Context Protocol) server.
Usage: brosh [GLOBAL_OPTIONS] mcp
This command is also available as a dedicated script brosh-mcp
.
Screenshots are saved with a descriptive filename pattern:
{domain}-{timestamp}-{scroll_percentage}-{section_id}.{format}
Example: github_com-230715-103000-00500-readme_md.png
- github_com: Domain name (underscores replace dots).
- 230715-103000: Timestamp (YYMMDD-HHMMSS, UTC).
- 00500: Scroll position as percentage of total page height, times 100 (e.g., 00500 means 50.00%). Padded to 5 digits.
- readme_md: A semantic identifier for the currently visible section, often derived from the most prominent header or element ID in view.
- .png: File extension based on the chosen output_format.
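If you need to work with these filenames programmatically, here is a small illustrative parser. The regex is an assumption derived from the pattern above, not something brosh ships.

```python
import re

# Pattern: {domain}-{timestamp}-{scroll_percentage}-{section_id}.{format}
FILENAME_RE = re.compile(
    r"^(?P<domain>.+)-(?P<timestamp>\d{6}-\d{6})-"
    r"(?P<scroll>\d{5})-(?P<section>.+)\.(?P<ext>png|jpg|apng)$"
)

m = FILENAME_RE.match("github_com-230715-103000-00500-readme_md.png")
if m:
    print(m.group("domain"))     # github_com
    print(m.group("timestamp"))  # 230715-103000 (YYMMDD-HHMMSS, UTC)
    print(m.group("scroll"))     # 00500 (per the docs, 50.00% of page height)
    print(m.group("section"))    # readme_md
```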
Output formats:
- png (default): Lossless compression, excellent quality. Optimized with pyoxipng.
- jpg: Lossy compression, smaller file sizes. Good for photographic content or when space is critical.
- apng: Animated Portable Network Graphics. Creates an animation from all captured frames, showing the scrolling sequence.

When using the --json flag with the shot command (or when using the Python API), Brosh returns a dictionary. The keys are the absolute file paths of the saved screenshots, and the values are dictionaries containing metadata for each screenshot.
Structure:
{
"/path/to/output/domain-ts-scroll-section.png": {
"selector": "css_selector_for_main_content_block",
"text": "Extracted Markdown text from the visible part of the page...",
"html": "<!DOCTYPE html><html>...</html>" // Only if --fetch_html is true
},
// ... more entries for other screenshots
}
Metadata Fields:
- selector (str): A CSS selector identifying the most relevant content block visible in that frame (e.g., main, article#content, div.product-details). Defaults to body if no more specific selector is found.
- text (str): Markdown representation of the text content visible in the frame. Extracted using html2text. Included if fetch_text is true (default). Can be trimmed if trim_text is true (default).
- html (str, optional): Minified HTML of the elements fully visible in the frame. Included only if fetch_html is true.

For the apng format, the JSON output will typically contain a single entry for the animation file, with metadata like {"selector": "animated", "text": "Animation with N frames", "frames": N}.
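As a usage sketch, the mapping can be consumed like this (assuming the page_content.json produced by the earlier CLI example with --fetch_html --json):

```python
import json
from pathlib import Path

# Produced by: brosh shot "https://example.com" --fetch_html --json > page_content.json
data = json.loads(Path("page_content.json").read_text())

for screenshot_path, meta in data.items():
    print(screenshot_path)
    print("  selector:", meta.get("selector"))
    print("  text preview:", meta.get("text", "")[:80])
    if "html" in meta:
        print("  html length:", len(meta["html"]))
```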
Brosh is built as a modular Python package. Understanding its architecture can help with advanced usage, customization, and contributions.
The primary logic flows through these key modules in src/brosh/:

- __main__.py & cli.py: __main__.py is the entry point for the brosh CLI command. cli.py defines BrowserScreenshotCLI using python-fire. It parses CLI arguments, initializes common settings, and maps commands (run, quit, shot, mcp) to their respective methods. These methods often delegate to api.py.
- api.py: Provides capture_webpage (synchronous wrapper) and capture_webpage_async (core asynchronous logic). It builds a CaptureConfig object (from models.py), then instantiates and uses BrowserScreenshotTool. Convenience functions such as capture_full_page are also defined here.
- tool.py: Defines BrowserScreenshotTool, the main orchestrator. Its capture method (async) manages the entire screenshot process: it uses BrowserManager to get a browser page, uses CaptureManager to get CaptureFrame objects, and handles post-processing (ImageProcessor, saving files).
- browser.py: BrowserManager handles browser interactions: detecting, launching, and connecting to browsers via connect_over_cdp (for Chromium-based browsers) or launch.
- capture.py: CaptureManager is responsible for the actual screenshotting logic on a given Playwright page: honoring from_selector to scroll to a starting point, computing scroll positions from scroll_step and viewport height, scrolling (page.evaluate("window.scrollTo(...)")), taking screenshots (page.screenshot()), calling DOMProcessor to get visible HTML, text, and the active selector, and packaging each result into a CaptureFrame dataclass instance.
- texthtml.py: DOMProcessor extracts content from the browser's DOM: extract_visible_content() executes JavaScript in the page to get fully visible elements' HTML, converts it to Markdown text using html2text, and also determines an active_selector; get_section_id() executes JavaScript to find a semantic ID for the current view (e.g., from a nearby header); compress_html() minifies HTML by removing comments, excessive whitespace, large data URIs, etc.
- image.py: ImageProcessor performs image manipulations in memory using Pillow and pyoxipng: optimize_png_bytes() optimizes PNGs, downsample_png_bytes() resizes images, convert_png_to_jpg_bytes() converts PNG to JPG (handling transparency), and create_apng_bytes() assembles multiple PNG frames into an animated PNG.
- models.py: Defines ImageFormat (Enum: PNG, JPG, APNG) with properties for MIME type and extension; CaptureConfig (dataclass), which holds all settings for a capture job, including validation logic; CaptureFrame (dataclass), which represents a single captured frame's data (image bytes, scroll position, text, HTML, etc.); and MCPTextContent, MCPImageContent, and MCPToolResult (Pydantic models), which define the structure for data exchange in MCP mode.
- mcp.py: Implements the MCP server on top of FastMCP. It exposes a see_webpage tool that mirrors api.capture_webpage_async's signature, packages results into an MCPToolResult (defined in models.py), and handles the fetch_image, fetch_image_path, fetch_text, and fetch_html flags to tailor the output for AI agents. The brosh-mcp script entry point is also here.
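As a quick orientation aid, here is the rough call flow implied by the module descriptions above (a comment-only map, not executable logic):

```python
# cli.py / __main__.py            -> parse the CLI (python-fire) or accept API kwargs
#   api.capture_webpage(_async)   -> build a models.CaptureConfig
#     tool.BrowserScreenshotTool  -> orchestrate the whole capture
#       browser.BrowserManager    -> launch/connect the browser, set viewport & zoom
#       capture.CaptureManager    -> scroll + screenshot -> CaptureFrame objects
#         texthtml.DOMProcessor   -> visible text/HTML, section ids, active selector
#       image.ImageProcessor      -> optimize/scale/convert, assemble APNG
# mcp.py                          -> exposes the same path as the see_webpage MCP tool
```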
As outlined in CLAUDE.md, Brosh's functionality can be grouped into three core domains:

1. Capture Engine (capture.py, texthtml.py)
- Section identification: DOMProcessor.get_section_id() analyzes the DOM for prominent headers (<h1>-<h6>) or elements with IDs near the top of the viewport. This generates a human-readable identifier used in filenames (e.g., introduction, installation-steps); a sketch of the idea follows this list.
- Visible content extraction: DOMProcessor.extract_visible_content() identifies the most encompassing, fully visible elements to determine the active_selector and extract their HTML and text. This helps in associating screenshots with specific content blocks.
- Scrolling capture: The CaptureManager progressively scrolls the page. The scroll_step (percentage of viewport height) allows for overlapping captures if less than 100%, ensuring no content is missed. The total page height is read from document.documentElement.scrollHeight, and short waits (SCROLL_AND_CONTENT_WAIT_SECONDS) allow for dynamically loading or expanding content to render before capture.
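A hedged sketch of the idea behind section identification. This is not brosh's implementation; the heuristics, names, and JavaScript below are assumptions for illustration only.

```python
import asyncio

from playwright.async_api import async_playwright

# Find the heading or id-bearing element nearest the top of the viewport
# and slugify it, roughly as described for get_section_id() above.
SECTION_ID_JS = """
() => {
  const candidates = document.querySelectorAll('h1,h2,h3,h4,h5,h6,[id]');
  let best = null, bestTop = Infinity;
  for (const el of candidates) {
    const top = el.getBoundingClientRect().top;
    if (top >= 0 && top < bestTop) { best = el; bestTop = top; }
  }
  const raw = best ? (best.id || best.textContent || 'section') : 'section';
  return raw.trim().toLowerCase().replace(/[^a-z0-9]+/g, '_').slice(0, 40);
}
"""

async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()
        await page.goto("https://example.com")
        print(await page.evaluate(SECTION_ID_JS))
        await browser.close()

asyncio.run(main())
```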
2. Browser Management (browser.py)
- BrowserManager.get_browser_name() uses a priority system (Chrome > Edge > Safari on macOS) for auto-detection.
- BrowserManager.is_browser_available() and get_browser_paths() check for browser installations in common locations across OSes.
- BrowserManager.get_screen_dimensions() attempts to get logical screen resolution, accounting for Retina displays on macOS (by checking physical resolution from system_profiler). It falls back to defaults if detection fails.
- Chromium-based browsers are started in remote debug mode (--remote-debugging-port) on specific ports (Chrome: 9222, Edge: 9223). This allows connection to the user's actual browser profile. WebKit (Safari) uses a different launch mechanism.
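The same debug port can also be attached to directly from Playwright. A hedged sketch (port 9222 as documented for Chrome; it assumes a browser is already running with remote debugging enabled, e.g. via brosh --app chrome run):

```python
import asyncio

from playwright.async_api import async_playwright


async def main():
    async with async_playwright() as p:
        # Attach to an already-running Chrome exposing CDP on port 9222
        # (Edge would use 9223 per the docs above).
        browser = await p.chromium.connect_over_cdp("http://localhost:9222")
        context = browser.contexts[0] if browser.contexts else await browser.new_context()
        page = await context.new_page()
        await page.goto("https://example.com")
        print(await page.title())
        await browser.close()


asyncio.run(main())
```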
3. AI Integration (mcp.py)
- The see_webpage function acts as the tool interface for AI systems (like Claude) via FastMCP.
- Output is tailored for AI consumption: base64 image data (fetch_image=True), image file paths (fetch_image_path=True), extracted Markdown text (fetch_text=True), and minified HTML (fetch_html=True).
- The selector field in metadata links screenshots and extracted text/HTML to a specific part of the DOM structure.
- Screenshots can be organized into domain-based subdirectories (--subdirs), and scrolling granularity is controlled by scroll_step (10-200% of viewport height); the scroll_step calculation is based on viewport percentage.

Data models (models.py): The use of Pydantic models and dataclasses ensures type safety, validation, and clear data structures throughout the application. CaptureConfig centralizes all job parameters, CaptureFrame standardizes per-frame data, and the MCP models ensure compliant communication with AI tools.
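To give a feel for the shape of these models, here is an illustrative approximation. The field names below are guesses drawn from the prose above, not brosh's actual definitions; consult src/brosh/models.py for the real ones.

```python
# Illustrative sketch only; field names are assumptions, not brosh internals.
from dataclasses import dataclass


@dataclass
class CaptureFrameSketch:
    image_bytes: bytes           # raw PNG bytes from Playwright
    scroll_position: int         # pixel offset at capture time
    selector: str = "body"       # most relevant visible content block
    text: str = ""               # Markdown extracted from visible elements
    html: str | None = None      # minified HTML, only if requested


@dataclass
class CaptureConfigSketch:
    url: str
    width: int = 0               # 0 => default screen width
    height: int = 0              # 0 => screen height, -1 => full page
    scroll_step: int = 100       # percent of viewport height
    output_format: str = "png"   # png | jpg | apng
    fetch_text: bool = True
    fetch_html: bool = False
    max_frames: int = 0          # 0 => unlimited
```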
We welcome contributions to Brosh! Please follow these guidelines to ensure a smooth process.
git clone https://github.com/twardoch/brosh.git
cd brosh
Use uv for faster environment setup:
uv venv # Create a virtual environment
source .venv/bin/activate # Or .venv\Scripts\activate on Windows
uv pip install -e ".[all]"
Alternatively, using pip
:
python -m venv .venv
source .venv/bin/activate # Or .venv\Scripts\activate on Windows
python -m pip install -e ".[all]"
# Install the pre-commit hooks
pre-commit install
Brosh uses pytest
for testing.
# Run all tests
pytest
# Run tests with coverage report
pytest --cov=src/brosh --cov-report=term-missing
# Run tests in parallel (if you have pytest-xdist, included in [all])
pytest -n auto
# Run specific test file or test function
pytest tests/test_api.py
pytest tests/test_cli.py::TestBrowserScreenshotCLI::test_shot_basic
Ensure your changes pass all tests and maintain or increase test coverage. Test configuration is in pyproject.toml
under [tool.pytest.ini_options]
.
This project uses Ruff
for linting and formatting, and mypy
for type checking. Pre-commit hooks should handle most of this automatically.
ruff format src tests
ruff check --fix --unsafe-fixes src tests
mypy src
Configuration for these tools is in pyproject.toml
.
Adhere to these principles when developing:
- Keep the project documentation (README.md, CHANGELOG.md, TODO.md) in mind.
- Use modern Python type hints (built-in list, dict, and | for unions).
- Use loguru for logging. Add verbose mode logging and debug-log calls where appropriate.
- The CLI is built with python-fire. Standard shebangs and this_file comments are good practice.

Code quality checks (see AGENT.md): While pre-commit handles much of this, be aware of the typical full check sequence:
# This is a conceptual guide; pre-commit and hatch scripts automate parts of this.
# uzpy run . # (If used, uzpy seems to be a project-specific runner)
fd -e py -x autoflake --remove-all-unused-imports -ir . # Remove unused imports
fd -e py -x pyupgrade --py310-plus . # Upgrade syntax
fd -e py -x ruff check --fix --unsafe-fixes . # Ruff lint and fix
fd -e py -x ruff format . # Ruff format
python -m pytest # Run tests
(Note: fd is a command-line tool. If not installed, adapt the Ruff/Autoflake commands to scan src and tests.)
Hatch environments (hatch run lint:all, hatch run test:test-cov) simplify running these checks.

Submitting changes:
- Use descriptive branch names (e.g., feature/new-output-option or fix/timeout-issue).
- Open pull requests against the main branch of twardoch/brosh.

Key project documents:
- AGENT.md: Contains detailed working principles, Python guidelines, and tool usage instructions relevant to AI-assisted development for this project.
- CLAUDE.md: Provides an overview of Brosh's architecture, commands, and development notes, particularly useful for understanding the system's design.
- pyproject.toml: Defines dependencies, build system, and tool configurations (Ruff, Mypy, Pytest, Hatch).
Error: "Could not find chrome installation" or similar. Solution:
brosh --app edge shot "https://example.com"
playwright install
to ensure Playwright's browser binaries are correctly installed.Error: "Failed to connect to browser", "TimeoutError", or browser doesn't launch as expected. Solution:
brosh --app chrome run
. Then, in another terminal, run your shot
command.--force_run
option with run
can help.sudo apt-get install -y libgbm-dev libnss3 libxss1 libasound2 libatk1.0-0 libatk-bridge2.0-0 libcups2 libdbus-1-3 libdrm2 libexpat1 libgbm1 libgcc1 libglib2.0-0 libgtk-3-0 libnspr4 libpango-1.0-0 libx11-6 libx11-xcb1 libxcb1 libxcomposite1 libxcursor1 libxdamage1 libxext6 libxfixes3 libxi6 libxrandr2 libxrender1 libxtst6
.Error: "Screenshot timeout for position X" or page content doesn't load. Solution:
brosh shot "..." --output_format jpg
.Error: "Permission denied" when saving screenshots. Solution:
output_dir
.brosh --output_dir /tmp/screenshots shot "..."
Enable verbose logging for detailed troubleshooting:
brosh --verbose shot "https://example.com"
This will print debug messages from loguru
to stderr.
uvx
or pipx
in environments like Git Bash or WSL, ensure paths are correctly resolved.This project is licensed under the MIT License - see the LICENSE file for details.
pyproject.toml