firecrawl-scrape

by firecrawl

Extracts clean Markdown from any URL, including single-page applications rendered with JavaScript. Handles both static pages and JS-rendered SPAs, with a configurable wait time for rendering. Supports concurrent scraping of multiple URLs, with output format options including Markdown, HTML, links, and screenshots. Includes content-filtering options such as a main-content-only mode that strips navigation and footers, plus tag inclusion and exclusion. Optional built-in question answering via the --query flag for targeted...

npx skills add https://github.com/firecrawl/cli --skill firecrawl-scrape



firecrawl scrape

Scrape one or more URLs. Returns clean, LLM-optimized markdown. Multiple URLs are scraped concurrently.

When to use

You have a specific URL and want its content
The page is static or JS-rendered (SPA)
Step 2 in the workflow escalation pattern: search → scrape → map → crawl → interact

Quick start

# Basic markdown extraction
firecrawl scrape "<url>" -o .firecrawl/page.md

# Main content only, no nav/footer
firecrawl scrape "<url>" --only-main-content -o .firecrawl/page.md

# Wait for JS to render, then scrape
firecrawl scrape "<url>" --wait-for 3000 -o .firecrawl/page.md

# Multiple URLs (each saved to .firecrawl/)
firecrawl scrape "https://example.com" "https://example.com/blog" "https://example.com/docs"

# Get markdown and links together
firecrawl scrape "<url>" --format markdown,links -o .firecrawl/page.json

# Ask a question about the page
firecrawl scrape "https://example.com/pricing" --query "What is the enterprise plan price?"
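Building on the basic extraction command above, the following is a minimal sketch of saving a page once and then searching the saved markdown locally. The helper name `scrape_and_grep` is hypothetical and not part of the CLI; it relies only on the `firecrawl scrape "<url>" -o <path>` form shown above and assumes `firecrawl` is on your PATH.

```shell
# Hypothetical helper (not part of the firecrawl CLI):
# scrape a URL to a file once, then search the saved markdown locally.
# The body runs in a subshell, so its variables do not leak into the caller.
scrape_and_grep() (
  url=$1
  out=$2
  pattern=$3

  # Make sure the output directory (e.g. .firecrawl/) exists.
  mkdir -p "$(dirname "$out")"

  # Reuse a non-empty saved file instead of scraping again.
  if [ ! -s "$out" ]; then
    firecrawl scrape "$url" -o "$out" || exit 1
  fi

  # Case-insensitive search, printing line numbers for each match.
  grep -n -i -- "$pattern" "$out"
)
```

For example, `scrape_and_grep "https://example.com/pricing" .firecrawl/page.md "enterprise"` would scrape only if `.firecrawl/page.md` is missing or empty, then print the matching lines.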

Options

Option	Description
`-f, --format <formats>`	Output formats: markdown, html, rawHtml, links, screenshot, json
`-Q, --query <prompt>`	Ask a question about the page content (5 credits)
`-H`	Include HTTP headers in output
`--only-main-content`	Strip nav, footer, sidebar — main content only
`--wait-for <ms>`	Wait for JS rendering before scraping
`--include-tags <tags>`	Only include these HTML tags
`--exclude-tags <tags>`	Exclude these HTML tags
`--redact-pii`	Redact personally identifiable information from output
`-o, --output <path>`	Output file path

Tips

Prefer plain scrape over --query. Scrape to a file, then use grep, head, or read the markdown directly — you can search and reason over the full content yourself. Use --query only when you want a single targeted answer without saving the page (costs 5 extra credits).
Try scrape before interact. Scrape handles static pages and JS-rendered SPAs. Only escalate to interact when you need interaction (clicks, form fills, pagination).
Multiple URLs are scraped concurrently — check firecrawl --status for your concurrency limit.
Single format outputs raw content. Multiple formats (e.g., --format markdown,links) output JSON.
Always quote URLs: the shell treats ? and & as special characters.
Naming convention: .firecrawl/{site}-{path}.md
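One possible way to apply the naming convention above is a small shell function that derives the output path from the URL. This is a sketch under stated assumptions: the name `firecrawl_out_path` is hypothetical, and the slug rules (drop the scheme, query string, and fragment; turn `/` into `-`) are a guess at what `{site}-{path}` means, since the convention does not spell them out.

```shell
# Hypothetical helper: map a URL to .firecrawl/{site}-{path}.md.
# Assumed slug rules: drop scheme, query, and fragment; "/" becomes "-".
# The body runs in a subshell, so its variables do not leak into the caller.
firecrawl_out_path() (
  url=$1
  rest=${url#*://}        # drop the scheme, if any
  rest=${rest%%\#*}       # drop any fragment
  rest=${rest%%\?*}       # drop any query string
  site=${rest%%/*}        # host part, up to the first slash

  # Everything after the first slash is the path (empty if there is none).
  path=""
  case "$rest" in
    */*) path=${rest#*/} ;;
  esac
  path=${path%/}                                # drop one trailing slash
  path=$(printf '%s' "$path" | tr '/' '-')      # remaining slashes become dashes

  if [ -n "$path" ]; then
    printf '.firecrawl/%s-%s.md\n' "$site" "$path"
  else
    printf '.firecrawl/%s.md\n' "$site"
  fi
)
```

Usage: `firecrawl scrape "$url" -o "$(firecrawl_out_path "$url")"`. With these assumed rules, `https://example.com/docs/api?v=2` maps to `.firecrawl/example.com-docs-api.md`.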