tavily-extract

By tavily-ai

Extract clean markdown or text from up to 20 URLs, with JavaScript rendering support and query-focused chunking. Handles JavaScript-rendered pages with an adjustable extraction depth (basic for simple pages, advanced for SPAs and dynamic tables). Supports query-focused extraction to return only the relevant content instead of the whole page. Returns LLM-optimized markdown by default, with options for plain-text format and structured JSON output. Processes up to 20 URLs in a single call...

npx skills add https://github.com/tavily-ai/skills --skill tavily-extract

tavily extract

Extract clean markdown or text content from one or more URLs.

Before running any command

If tvly is not found on PATH, install it first:

curl -fsSL https://cli.tavily.com/install.sh | bash && tvly login

Do not skip this step or fall back to other tools.

See tavily-cli for alternative install methods and auth options.

When to use

You have a specific URL and want its content
You need text from JavaScript-rendered pages
Step 2 in the workflow: search → extract → map → crawl → research

Quick start

# Single URL
tvly extract "https://example.com/article" --json

# Multiple URLs
tvly extract "https://example.com/page1" "https://example.com/page2" --json

# Query-focused extraction (returns relevant chunks only)
tvly extract "https://example.com/docs" --query "authentication API" --chunks-per-source 3 --json

# JS-heavy pages
tvly extract "https://app.example.com" --extract-depth advanced --json

# Save to file
tvly extract "https://example.com/article" -o article.md

Options

Option	Description
`--query`	Rerank chunks by relevance to this query
`--chunks-per-source`	Chunks per URL (1-5, requires `--query`)
`--extract-depth`	`basic` (default) or `advanced` (for JS pages)
`--format`	`markdown` (default) or `text`
`--include-images`	Include image URLs
`--timeout`	Max wait time (1-60 seconds)
`-o, --output`	Save output to file
`--json`	Structured JSON output
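
The limits in the table can be checked before a call is made. The following is a minimal sketch, not part of tvly: `focused_extract`, the `TVLY_CMD` override, and the default of 3 chunks are all hypothetical names and values chosen for illustration. Only the flags themselves come from the table above.

```shell
# Hypothetical wrapper (not part of tvly): checks the documented limits
# before running a query-focused extraction.
# Usage: focused_extract URL QUERY [CHUNKS]
focused_extract() {
  url=$1
  query=$2
  chunks=${3:-3}   # assumed default for this sketch, not a tvly default

  # --chunks-per-source requires --query, so refuse an empty query.
  if [ -z "$query" ]; then
    echo "focused_extract: a query is required" >&2
    return 2
  fi

  # Accept only the integers 1 through 5.
  case $chunks in
    [1-5]) ;;
    *)
      echo "focused_extract: chunks must be 1-5, got '$chunks'" >&2
      return 2
      ;;
  esac

  # TVLY_CMD is a hypothetical override for dry runs, e.g. TVLY_CMD=echo.
  ${TVLY_CMD:-tvly extract} "$url" --query "$query" \
    --chunks-per-source "$chunks" --json
}
```

Setting `TVLY_CMD=echo` prints the command arguments instead of running them, which is a cheap way to confirm what would be sent.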

Extract depth

Depth	When to use
`basic`	Simple pages, fast — try this first
`advanced`	JS-rendered SPAs, dynamic content, tables
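
One way to apply the two depths is to try `basic` and retry with `advanced` only when the result looks empty. The sketch below assumes that "missing content" can be approximated by very short output. The `extract_with_fallback` helper, the `TVLY_CMD` override, and the `MIN_CHARS` threshold are hypothetical and not part of tvly.

```shell
# Hypothetical helper (not part of tvly): basic first, advanced as a fallback.
# Usage: extract_with_fallback URL
extract_with_fallback() {
  url=$1

  # TVLY_CMD is a hypothetical override for testing; it defaults to the real CLI.
  content=$(${TVLY_CMD:-tvly extract} "$url" --extract-depth basic)

  # Assumption: output shorter than MIN_CHARS characters means the page
  # needed JavaScript rendering. 200 is an arbitrary threshold.
  if [ "${#content}" -lt "${MIN_CHARS:-200}" ]; then
    content=$(${TVLY_CMD:-tvly extract} "$url" --extract-depth advanced)
  fi

  printf '%s\n' "$content"
}
```

A length check is a crude signal. A page can return plenty of boilerplate and still lack the part you need, so adjust the threshold or the check to the pages you are extracting.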

Tips

Max 20 URLs per request — batch larger lists into multiple calls.
Use --query + --chunks-per-source to get only relevant content instead of full pages.
Try basic first, fall back to advanced if content is missing.
Set --timeout for slow pages (up to 60s).
If search results already contain the content you need (via --include-raw-content), skip the extract step.
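
The 20-URL limit can be handled with a small batching loop. This is a sketch under stated assumptions: `extract_in_batches` and the `TVLY_CMD` override are hypothetical names, and the URL list is assumed to arrive on stdin, one URL per line. The call shape (URLs first, then `--json`) follows the multiple-URL example in the quick start.

```shell
# Hypothetical helper (not part of tvly): reads URLs from stdin, one per line,
# and runs one extract call per batch of at most 20.
# Usage: extract_in_batches < urls.txt
extract_in_batches() {
  set --   # reuse the positional parameters as the current batch
  while IFS= read -r url || [ -n "$url" ]; do
    [ -n "$url" ] || continue   # skip blank lines
    set -- "$@" "$url"
    if [ "$#" -eq 20 ]; then
      # </dev/null keeps the command from consuming the URL list on stdin.
      ${TVLY_CMD:-tvly extract} "$@" --json </dev/null
      set --
    fi
  done
  # Flush the final, partially filled batch.
  if [ "$#" -gt 0 ]; then
    ${TVLY_CMD:-tvly extract} "$@" --json </dev/null
  fi
}
```

Each batch is a separate call, so the combined output is a sequence of JSON documents and not a single one. Setting `TVLY_CMD=echo` shows how the URLs would be grouped without making any requests.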