tavily-extract
Extracts clean markdown or text from up to 20 URLs, with JavaScript rendering and query-focused chunking. Handles JavaScript-rendered pages with a configurable extract depth (basic for simple pages, advanced for dynamic SPAs and tables). Supports query-focused extraction, which returns only the relevant content chunks instead of full pages. Returns LLM-optimized markdown by default, with plain-text format and structured JSON output as options. Processes up to 20 URLs in a single call;...
npx skills add https://github.com/tavily-ai/skills --skill tavily-extract
tavily extract
Extract clean markdown or text content from one or more URLs.
Before running any command
If tvly is not found on PATH, install it first:
curl -fsSL https://cli.tavily.com/install.sh | bash && tvly login
Do not skip this step or fall back to other tools.
See tavily-cli for alternative install methods and auth options.
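The PATH check described above can be scripted before any extract call. This is a minimal sketch; `have_cmd` is a hypothetical helper name, and the check only reports a missing binary rather than installing anything.

```shell
# Hypothetical helper: succeeds when the named command is on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Warn early instead of failing halfway through a workflow.
if ! have_cmd tvly; then
  echo "tvly not found on PATH; run the install command first" >&2
fi
```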
When to use
- You have a specific URL and want its content
- You need text from JavaScript-rendered pages
- Step 2 in the workflow: search → extract → map → crawl → research
Quick start
# Single URL
tvly extract "https://example.com/article" --json
# Multiple URLs
tvly extract "https://example.com/page1" "https://example.com/page2" --json
# Query-focused extraction (returns relevant chunks only)
tvly extract "https://example.com/docs" --query "authentication API" --chunks-per-source 3 --json
# JS-heavy pages
tvly extract "https://app.example.com" --extract-depth advanced --json
# Save to file
tvly extract "https://example.com/article" -o article.md
Options
| Option | Description |
|---|---|
| --query | Rerank chunks by relevance to this query |
| --chunks-per-source | Chunks per URL (1-5, requires --query) |
| --extract-depth | basic (default) or advanced (for JS pages) |
| --format | markdown (default) or text |
| --include-images | Include image URLs |
| --timeout | Max wait time (1-60 seconds) |
| -o, --output | Save output to file |
| --json | Structured JSON output |
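The numeric ranges in the table can be checked in a wrapper script before the CLI is called. The sketch below is one possible pre-flight check; `in_range`, `valid_timeout`, and `valid_chunks` are hypothetical names, and how tvly itself reports out-of-range values is not covered here.

```shell
# Hypothetical helper: succeeds if $1 is a non-negative integer in [$2, $3].
in_range() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric input
  esac
  [ "$1" -ge "$2" ] && [ "$1" -le "$3" ]
}

# Ranges taken from the options table: --timeout 1-60, --chunks-per-source 1-5.
valid_timeout() { in_range "$1" 1 60; }
valid_chunks()  { in_range "$1" 1 5; }
```

A caller might run `valid_timeout "$t" || exit 1` before passing `$t` to `--timeout`.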
Extract depth
| Depth | When to use |
|---|---|
| basic | Simple pages, fast; try this first |
| advanced | JS-rendered SPAs, dynamic content, tables |
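Trying basic first and moving to advanced only when needed could be wrapped in a small helper. This sketch assumes that missing content shows up as an empty or absent output file, which the source does not state; `extract_with_fallback` and the `TVLY_BIN` override are hypothetical names.

```shell
# Hypothetical helper: try basic depth first, retry with advanced if the
# output file is empty. TVLY_BIN is an assumed override, handy for dry runs.
extract_with_fallback() {
  fb_url=$1
  fb_out=$2
  # A failed basic run is tolerated so the advanced attempt still happens.
  "${TVLY_BIN:-tvly}" extract "$fb_url" --extract-depth basic -o "$fb_out" || :
  # Assumption: missing content means an empty or absent output file.
  if [ ! -s "$fb_out" ]; then
    "${TVLY_BIN:-tvly}" extract "$fb_url" --extract-depth advanced -o "$fb_out"
  fi
}

# Usage:
#   extract_with_fallback "https://app.example.com" page.md
```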
Tips
- Max 20 URLs per request — batch larger lists into multiple calls.
- Use --query + --chunks-per-source to get only relevant content instead of full pages.
- Try basic first, fall back to advanced if content is missing.
- Set --timeout for slow pages (up to 60s).
- If search results already contain the content you need (via --include-raw-content), skip the extract step.
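For lists longer than the 20-URL limit, the batching tip can be sketched with `xargs`. `extract_in_batches` and the `TVLY_BIN` override are hypothetical names, and `urls.txt` is an assumed file with one URL per line. The call shape mirrors the multi-URL quick-start command.

```shell
# Hypothetical helper: read URLs from stdin (one per line) and run one
# `tvly extract` call per group of up to 20 URLs.
# TVLY_BIN is an assumed override; set it to `echo` for a dry run.
extract_in_batches() {
  TVLY_BIN="${TVLY_BIN:-tvly}" \
    xargs -n 20 sh -c '"$TVLY_BIN" extract "$@" --json' _
}

# Usage:
#   extract_in_batches < urls.txt
```

Note that `xargs` gives quotes and backslashes in its input special meaning, so URLs containing those characters would need different handling.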
See also
- tavily-search — find pages when you don't have a URL
- tavily-crawl — extract content from many pages on a site