tavily-extract

Extract clean markdown or text from up to 20 URLs, with support for JavaScript rendering and query-focused chunking. Handles JavaScript-rendered pages with configurable extraction depth (basic for simple pages, advanced for dynamic SPAs and tables). Supports query-focused extraction to return only the relevant content chunks instead of full pages. Returns LLM-optimized markdown by default, with options for plain-text format and structured JSON output. Processes up to 20 URLs in a single call;...

tavily extract

Extract clean markdown or text content from one or more URLs.

Before running any command

If tvly is not found on PATH, install it first:

curl -fsSL https://cli.tavily.com/install.sh | bash && tvly login

Do not skip this step or fall back to other tools.

See tavily-cli for alternative install methods and auth options.
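The check above can be scripted. This is a minimal sketch: the function name `ensure_tvly` is hypothetical, and the install command is the one shown in this section. It assumes the installer leaves `tvly` reachable on PATH in the current shell, which may not hold on every system.

```shell
# Hypothetical helper: make sure tvly is available before any extract call.
ensure_tvly() {
  # command -v succeeds when tvly resolves on PATH (or as a shell function).
  if command -v tvly >/dev/null 2>&1; then
    return 0
  fi
  # Install command copied from this section; it also starts the login flow.
  curl -fsSL https://cli.tavily.com/install.sh | bash && tvly login
}
```

Call `ensure_tvly` once at the top of a script, before the first `tvly extract`.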

When to use

You have a specific URL and want its content
You need text from JavaScript-rendered pages
Step 2 in the workflow: search → extract → map → crawl → research

Quick start

# Single URL
tvly extract "https://example.com/article" --json

# Multiple URLs
tvly extract "https://example.com/page1" "https://example.com/page2" --json

# Query-focused extraction (returns relevant chunks only)
tvly extract "https://example.com/docs" --query "authentication API" --chunks-per-source 3 --json

# JS-heavy pages
tvly extract "https://app.example.com" --extract-depth advanced --json

# Save to file
tvly extract "https://example.com/article" -o article.md

Options

Option	Description
`--query`	Rerank chunks by relevance to this query
`--chunks-per-source`	Chunks per URL (1-5, requires `--query`)
`--extract-depth`	`basic` (default) or `advanced` (for JS pages)
`--format`	`markdown` (default) or `text`
`--include-images`	Include image URLs
`--timeout`	Max wait time (1-60 seconds)
`-o, --output`	Save output to file
`--json`	Structured JSON output

Extract depth

Depth	When to use
`basic`	Simple pages, fast — try this first
`advanced`	JS-rendered SPAs, dynamic content, tables
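The basic-first strategy can be sketched as a small wrapper. The function name `extract_with_fallback` is hypothetical, and "content is missing" is approximated here by a failed command or an empty output file. That heuristic is an assumption, as is the idea that `tvly` exits non-zero on failure; neither is documented behavior.

```shell
# Hypothetical helper: try basic depth first, retry with advanced if nothing came back.
# $1 = URL, $2 = output file
extract_with_fallback() {
  # Assumption: a failed command or an empty output file means the content is missing.
  if tvly extract "$1" --extract-depth basic -o "$2" && [ -s "$2" ]; then
    return 0
  fi
  # Slower path intended for JS-rendered SPAs, dynamic content, and tables.
  tvly extract "$1" --extract-depth advanced -o "$2"
}
```

For example, `extract_with_fallback "https://app.example.com" app.md` makes one basic call and only issues the advanced call when the first attempt leaves `app.md` empty.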

Tips

Max 20 URLs per request — batch larger lists into multiple calls.
Use --query + --chunks-per-source to get only relevant content instead of full pages.
Try basic first and fall back to advanced only if content is missing.
Set --timeout for slow pages (up to 60s).
If search results already contain the content you need (via --include-raw-content), skip the extract step.
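The 20-URL limit from the first tip can be handled with `xargs`, which splits a long list into groups. This is a sketch under stated assumptions: the function name `extract_in_batches` and the input file `urls.txt` are hypothetical, URLs arrive one per line with no quotes or whitespace inside them, and the input is non-empty.

```shell
# Hypothetical helper: read URLs from stdin (one per line) and run
# one tvly extract call per group of at most 20 URLs.
extract_in_batches() {
  # xargs -n 20 passes at most 20 URLs to each invocation of the inner sh;
  # "$@" expands to that group, and "tvly-batch" only fills the $0 slot.
  xargs -n 20 sh -c 'tvly extract "$@" --json' tvly-batch
}
```

Usage: `extract_in_batches < urls.txt`. Each batch prints its own JSON document, so the combined output is a sequence of JSON documents rather than a single one.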