tavily-dynamic-search

npx skills add https://github.com/tavily-ai/skills --skill tavily-dynamic-search

Tavily Dynamic Search

Search the web, filter results, and extract content so that raw search data never enters your context window. Only your curated print() output comes back.

Why this matters

A typical tvly search --include-raw-content returns 8 results × 30-50K chars each, roughly 240-400K (call it ~300K) characters of raw page content. If this enters your context window, you burn tokens reading navigation bars, cookie banners, and boilerplate, and your reasoning quality degrades under the noise. By processing results inside a Python script, only your print() output enters context: typically 1-3K characters of pure signal. That's a 100-300x reduction.
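
The arithmetic behind that estimate can be checked in a few lines. This is only a back-of-the-envelope sketch built from the figures quoted above; the variable names are illustrative and nothing here calls tvly.

```python
# Back-of-the-envelope check of the figures quoted above (illustrative names).
results = 8
chars_per_page = (30_000, 50_000)   # low and high estimate per raw page
printed_chars = (1_000, 3_000)      # typical size of curated print() output

raw_low = results * chars_per_page[0]    # 240,000 chars
raw_high = results * chars_per_page[1]   # 400,000 chars
raw_typical = 300_000                    # the ~300K midpoint used in the text

# Reduction factor for the typical case, at both ends of the print() range
best = raw_typical // printed_chars[0]   # when you print ~1K chars
worst = raw_typical // printed_chars[1]  # when you print ~3K chars

print(f'raw content: {raw_low:,} to {raw_high:,} chars')
print(f'reduction: {worst}x to {best}x')
```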

Background: Programmatic Tool Calling (PTC)

This skill replicates the architecture of Anthropic's Programmatic Tool Calling (PTC) for web search. PTC lets the model write code that orchestrates tool calls inside a sandbox — intermediate results stay in the sandbox, and only the final print() output reaches the model's context window.

This skill applies the same principle using local Python execution. The Python process is the sandbox. Variables in memory hold the raw data. Only what you print() crosses into your context window. You write the filtering logic — you decide what matters for each query.

Before running any command

If tvly is not found on PATH, install it first:

curl -fsSL https://cli.tavily.com/install.sh | bash && tvly login

Core Rule

NEVER run tvly as a bare command. Always process output through Python so you control what enters your context.

# WRONG — raw results flood your context
tvly search "quantum computing 2025" --json

# RIGHT — only your print() output enters context
tvly search "quantum computing 2025" --json 2>/dev/null | python3 -c "
import json, sys
data = json.load(sys.stdin)
for r in data['results']:
    print(f'[{r[\"score\"]:.2f}] {r[\"title\"]}')
    print(f'  {r[\"url\"]}')
"

JSON Schemas

You need these to write correct filtering code.

tvly search --json

{
  "query": "string",
  "answer": "string | null",
  "results": [
    {
      "url": "string",
      "title": "string",
      "content": "string (snippet, ~500-1500 chars)",
      "score": 0.0-1.0,
      "raw_content": "string | null (full page, only with --include-raw-content)"
    }
  ],
  "response_time": 0.0
}

tvly extract --json

{
  "results": [
    {
      "url": "string",
      "title": "string",
      "raw_content": "string (full page markdown)",
      "images": []
    }
  ],
  "failed_results": [],
  "response_time": 0.0
}
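
Because raw_content can be null or missing, filtering code tends to be more robust when it reads fields defensively. The sketch below is one possible pair of helpers written against the two schemas above; page_text and summarize are hypothetical names, the sample payload is hand-written rather than real tvly output, and the shape of each failed_results entry is an assumption (it is printed as-is).

```python
import json

def page_text(result):
    """Return the best available text for one result, never None."""
    # raw_content may be absent (no --include-raw-content) or null
    return result.get('raw_content') or result.get('content') or ''

def summarize(payload):
    """Yield one short line per result from a search or extract payload."""
    for r in payload.get('results', []):
        score = r.get('score')               # only search results carry a score
        tag = f'{score:.2f}' if score is not None else ' -- '
        yield f'[{tag}] {r.get("title") or r["url"]} ({len(page_text(r))} chars)'
    for failed in payload.get('failed_results', []):   # extract payloads only
        yield f'[FAIL] {failed}'

# Hand-written sample shaped like the schemas above (not real tvly output)
sample = json.loads('''{
  "results": [
    {"url": "https://example.com/a", "title": "A", "content": "snippet",
     "score": 0.91, "raw_content": null},
    {"url": "https://example.com/b", "title": "B", "raw_content": "full page text"}
  ],
  "failed_results": []
}''')
for line in summarize(sample):
    print(line)
```

Falling back from raw_content to content means the same filter can run whether or not the search was made with --include-raw-content.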

How to search

You have two building blocks and three ways to run them. Compose these however the query demands; there are no fixed patterns. You decide the approach based on what you need.

Building blocks

tvly search — returns titles, URLs, snippets, scores. Optionally includes full page content with --include-raw-content markdown.

tvly extract — fetches full page content for specific URLs. Use when you found a URL from search and need more detail.

Execution modes

Pipe mode — for simple filters (3-5 lines). Pipe tvly output into python3 -c:

tvly search "query" --json 2>/dev/null | python3 -c "
import json, sys
data = json.load(sys.stdin)
# your filtering code here
"

Heredoc mode — for anything more complex. Single Bash call, clean multi-line Python, no escaping, no temp files:

python3 << 'PYEOF'
import json, subprocess
raw = subprocess.check_output(
    ['tvly', 'search', 'query', '--json'],
    stderr=subprocess.DEVNULL
)
data = json.loads(raw)
for r in data['results']:
    print(f"[{r['score']:.2f}] {r['title']}")
    print(f"  {r['url']}")
PYEOF

Single-quoted heredocs (<< 'PYEOF') reach Python verbatim: the shell expands no variables, backticks, or backslashes in the body, so no escaping is needed. This is the default for most tasks.

Script mode — only when you will reuse the same script across multiple turns. Do NOT write one-shot scripts to /tmp/. If you run it once, use a heredoc.

Important: save DATA to /tmp/, not CODE. Writing /tmp/tavily_results.json (data for later turns) = good. Writing /tmp/my_filter.py (one-shot code) = wasteful — use a heredoc instead.

Multi-turn iteration

For complex queries, you often need to explore before you extract — just like PTC, where the model searches, sees titles, decides which results to drill into, then extracts.

The key: save raw results to a file, then process them in separate steps. The file is your persistent state between turns.

Turn 1: Search and explore

Search and print only titles + scores. Save raw results to disk for later turns:

python3 << 'PYEOF'
import json, subprocess

raw = subprocess.check_output(
    ['tvly', 'search', 'solid-state battery commercialization 2025',
     '--include-raw-content', 'markdown', '--max-results', '8', '--json'],
    stderr=subprocess.DEVNULL
)
data = json.loads(raw)

# Save raw results — this stays on disk, never enters context
with open('/tmp/tavily_results.json', 'w') as f:
    json.dump(data, f)

# Print only what you need to decide next steps
print(f'{len(data["results"])} results saved to /tmp/tavily_results.json\n')
for i, r in enumerate(data['results']):
    print(f'[{i}] [{r["score"]:.2f}] {r["title"][:90]}')
    print(f'    {r["url"]}')
    print(f'    {r["content"][:150]}')
    print()
PYEOF

Context receives: ~800 tokens of titles + snippets. The 300K of raw page content is in /tmp/tavily_results.json, untouched.

Turn 2: Extract based on what you saw

Now you know what's in the results. Write targeted extraction — you decide which results to drill into and what to filter for:

python3 << 'PYEOF'
import json

with open('/tmp/tavily_results.json') as f:
    data = json.load(f)

# You chose these indices based on the titles you saw in turn 1
for i in [0, 2, 5]:
    r = data['results'][i]
    raw = r.get('raw_content', '') or ''
    if not raw:
        continue

    print(f'## {r["title"]}')
    print(f'URL: {r["url"]}\n')

    # You write the filtering logic based on the query
    # This example extracts paragraphs about specific companies
    for para in raw.split('\n\n'):
        para = para.strip()
        if len(para) > 80 and any(kw in para.lower() for kw in
                ['toyota', 'quantumscape', 'samsung', 'commercializ', 'production']):
            print(para)
            print()

    print('---\n')
PYEOF

Context receives: ~600 tokens of targeted content. You made the decision about what to keep.

Turn 3 (optional): Fetch more detail

If you need more from a specific source:

python3 << 'PYEOF'
import json, subprocess

# Fetch a specific URL you identified
raw = subprocess.check_output(
    ['tvly', 'extract', 'https://example.com/article', '--json'],
    stderr=subprocess.DEVNULL
)
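data = json.loads(raw)
if not data.get('results'):
    raise SystemExit('extract returned no results (check failed_results)')
page = data['results'][0]
content = page.get('raw_content') or ''  # raw_content can be null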

# Save for potential further processing
with open('/tmp/page_detail.txt', 'w') as f:
    f.write(content)

# Print only the section you care about
for line in content.split('\n'):
    if any(kw in line.lower() for kw in ['timeline', '2025', '2026', 'mass production']):
        print(line.strip())
PYEOF

When to use multi-turn vs single-turn

Single turn (pipe mode or one script): when you know upfront what you're looking for. Specific factual queries, known keywords.

Multi-turn (save + explore + extract): when you need to see what's available before deciding what to extract. Open-ended research, complex topics, queries where you don't know the right keywords yet.

Examples

Simple factual lookup (single turn, pipe mode)

tvly search "Python 3.13 release date" --max-results 5 --json 2>/dev/null | python3 -c "
import json, sys
data = json.load(sys.stdin)
for r in data['results'][:3]:
    print(f'{r[\"title\"]}')
    print(f'{r[\"content\"][:300]}')
    print()
"

Financial data extraction (single turn, heredoc)

python3 << 'PYEOF'
import json, subprocess

raw = subprocess.check_output(
    ['tvly', 'search', 'NVIDIA Q4 2025 earnings revenue',
     '--include-raw-content', 'markdown', '--max-results', '5',
     '--json'],
    stderr=subprocess.DEVNULL
)
data = json.loads(raw)

for r in data['results']:
    raw_content = r.get('raw_content', '') or ''
    # For financial queries, look for lines with numbers
    financial_lines = [
        line.strip() for line in raw_content.split('\n')
        if any(kw in line.lower() for kw in
               ['revenue', 'eps', 'earnings', 'margin', 'guidance', 'billion'])
        and any(c.isdigit() for c in line)
        and len(line.strip()) > 30
    ]
    if financial_lines:
        print(f'## {r["title"]}')
        print(f'URL: {r["url"]}')
        for line in financial_lines[:15]:
            print(f'  {line}')
        print()
PYEOF

Multi-source research (multi-turn)

Turn 1 — broad search + triage:

python3 << 'PYEOF'
import json, subprocess

# Search from multiple angles
queries = [
    ('broad', 'EU AI Act implementation timeline 2025'),
    ('specific', 'EU AI Act high-risk AI systems obligations'),
]

all_results = []
for label, query in queries:
    raw = subprocess.check_output(
        ['tvly', 'search', query, '--max-results', '8', '--json'],
        stderr=subprocess.DEVNULL
    )
    data = json.loads(raw)
    for r in data['results']:
        r['_query'] = label
    all_results.extend(data['results'])

# Deduplicate by URL
seen = set()
unique = []
for r in all_results:
    if r['url'] not in seen:
        seen.add(r['url'])
        unique.append(r)

# Sort before saving so the indices printed below match the saved file
unique.sort(key=lambda r: r['score'], reverse=True)

# Save all results
with open('/tmp/eu_ai_results.json', 'w') as f:
    json.dump(unique, f)

# Print triage
print(f'{len(unique)} unique results from {len(queries)} queries\n')
for i, r in enumerate(unique[:10]):
    print(f'[{i}] [{r["score"]:.2f}] ({r["_query"]}) {r["title"][:80]}')
    print(f'    {r["url"]}')
    print(f'    {r["content"][:120]}')
    print()
PYEOF

Turn 2 — you see the triage, pick the best sources, and extract:

python3 << 'PYEOF'
import json, subprocess

with open('/tmp/eu_ai_results.json') as f:
    results = json.load(f)

# Fetch full content for three sources (you chose these indices from the turn 1 triage)
for r in [results[0], results[2], results[4]]:
    try:
        raw = subprocess.check_output(
            ['tvly', 'extract', r['url'], '--json'],
            stderr=subprocess.DEVNULL, timeout=30
        )
        page = json.loads(raw)
        if not page.get('results'):
            continue
        content = page['results'][0].get('raw_content') or ''  # raw_content can be null

        # Your filtering logic — tailored to this query
        print(f'## {r["title"]}')
        print(f'URL: {r["url"]}\n')

        for para in content.split('\n\n'):
            para = para.strip()
            if len(para) > 100 and any(kw in para.lower() for kw in
                    ['high-risk', 'prohibited', 'deadline', 'obligation',
                     'compliance', 'penalty', 'fine', 'article']):
                print(para)
                print()

        print('---\n')
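    except Exception as e:
        # One short line per failure keeps you informed without flooding context
        print(f'[skipped] {r["url"]}: {type(e).__name__}\n')
        continue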
PYEOF

Following leads across turns

Sometimes turn 2 reveals new URLs or topics to chase. You can keep iterating:

python3 << 'PYEOF'
import json, subprocess

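# Read the page you saved earlier and print only the lines that mention the lead
with open('/tmp/page_detail.txt') as f:
    content = f.read()
for line in content.split('\n'):
    if 'annex iii' in line.lower():
        print(f'lead: {line.strip()[:200]}')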

# You noticed a reference to a specific regulation document
# Search for it specifically
raw = subprocess.check_output(
    ['tvly', 'search', 'EU AI Act Annex III high-risk list',
     '--include-domains', 'eur-lex.europa.eu',
     '--max-results', '3', '--json'],
    stderr=subprocess.DEVNULL
)
data = json.loads(raw)

for r in data['results']:
    print(f'## {r["title"]}')
    print(f'URL: {r["url"]}')
    print(r['content'])
    print()
PYEOF

Each turn, you save data to /tmp/, decide what to explore next, and write new filtering code as heredocs. The raw data accumulates on disk; your context stays lean.

Writing your filtering code

The Python you write IS the filtering logic. There are no fixed templates — you write code that makes sense for the specific query. Here are principles, not rules:

Triage first. Inspect titles and scores before fetching full pages. Don't extract everything blindly.

Be specific. A financial query should filter for numbers and financial terms. A technical query should look for code blocks and specifications. A news query should look for dates and quotes. Match your filtering to the query.
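
One way to picture "match your filtering to the query" is a small table of per-query-type line matchers. The sketch below is illustrative only: MATCHERS, keep_lines, and the regular expressions are hypothetical and deliberately crude, and the sample page is made up rather than taken from real search output.

```python
import re

# Hypothetical matchers, one per query type. Tune the patterns to your query.
NUMBER = re.compile(r'[$€£]\s?\d|\d+(?:\.\d+)?\s?(?:%|billion|million)', re.I)
YEAR = re.compile(r'\b(?:19|20)\d{2}\b')
QUOTE = re.compile(r'"[^"]{20,}"')

MATCHERS = {
    'financial': lambda ln: bool(NUMBER.search(ln)),
    'news': lambda ln: bool(YEAR.search(ln) or QUOTE.search(ln)),
    # Indented lines and shell prompts are a crude stand-in for code blocks
    'technical': lambda ln: ln.startswith(('    ', '\t', '$ ', '>>> ')),
}

def keep_lines(text, query_type, limit=10):
    """Return up to `limit` stripped lines that match the query type."""
    match = MATCHERS[query_type]
    return [ln.strip() for ln in text.split('\n') if match(ln)][:limit]

# Made-up sample page, not real search output
sample = '\n'.join([
    'Home | About | Contact',
    'Revenue rose 12% to $3.2 billion in the quarter.',
    'In 2024 the minister said "the rules will apply to every provider in scope".',
    '    pip install example-package',
])
for kind in MATCHERS:
    print(kind, '->', keep_lines(sample, kind))
```

Each query type keeps a different line from the same page, and the navigation line is dropped by all three.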

Structural filtering helps. Skip lines shorter than ~50-80 chars (usually nav elements). Skip common boilerplate phrases. Keep headings and their following paragraphs. But these are starting points — adapt based on what you see.
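
The structural heuristics above could be sketched roughly as follows. This assumes the page arrives as markdown with blank-line-separated blocks and '#' headings; structural_filter, the BOILERPLATE phrase list, and the 60-character threshold are all illustrative choices, not part of tvly.

```python
# Illustrative phrase list; substring matching is crude and will have false positives
BOILERPLATE = ('cookie', 'subscribe', 'sign in', 'all rights reserved', 'privacy policy')

def structural_filter(markdown, min_len=60):
    """Keep headings plus the substantive paragraphs that follow them."""
    kept, pending_heading = [], None
    for block in markdown.split('\n\n'):
        block = block.strip()
        if block.startswith('#'):
            pending_heading = block          # hold until a real paragraph follows
            continue
        if len(block) < min_len:             # nav links, buttons, bylines
            continue
        if any(phrase in block.lower() for phrase in BOILERPLATE):
            continue
        if pending_heading:
            kept.append(pending_heading)     # emit the heading with its first paragraph
            pending_heading = None
        kept.append(block)
    return kept

# Made-up page used only to exercise the filter
page = '\n\n'.join([
    'Home | Products | Blog',
    '# Solid-state batteries',
    'We use cookies to improve your experience. Accept all cookies to continue.',
    'This paragraph stands in for real article text and is long enough to pass the length check.',
    '## Related',
    'Share',
])
print('\n\n'.join(structural_filter(page)))
```

Holding each heading until a paragraph survives the filter means empty sections such as "Related" never reach your context.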

Print structured output. Format your output so it's easy to reason over:

print(f'## {title}')
print(f'URL: {url}')
print(relevant_content)
print()

Handle errors. Pages fail, URLs 404, extractions time out. Use try/except and skip failures:

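import subprocess

for url in urls:  # urls: the pages you chose during triage
    try:
        raw = subprocess.check_output(['tvly', 'extract', url, '--json'],
                                      stderr=subprocess.DEVNULL, timeout=30)
    except Exception:
        continue  # skip this source and move on to the next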

Token budget awareness. Your print() output is what enters your context. Target 150-600 tokens per source. If you're printing 5000+ chars from a single page, you're probably not filtering enough. But if a source has a critical data table, it's fine to keep more.
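
A per-source budget can be enforced mechanically instead of by eye. The helper below is a minimal sketch: fit_budget is a hypothetical name, and the 4-characters-per-token ratio is a rough rule of thumb assumed here, not a figure from tvly or any tokenizer.

```python
def fit_budget(paragraphs, max_tokens=600, chars_per_token=4):
    """Return the leading paragraphs that fit an approximate token budget."""
    budget = max_tokens * chars_per_token     # assumption: ~4 chars per token
    kept, used = [], 0
    for para in paragraphs:
        if used + len(para) > budget:
            break                             # stop at the first paragraph that overflows
        kept.append(para)
        used += len(para)
    return kept, used

# Synthetic paragraphs, 1000 chars each
paras = ['a' * 1000, 'b' * 1000, 'c' * 1000]
kept, used = fit_budget(paras)
print(f'kept {len(kept)} of {len(paras)} paragraphs, {used} chars (~{used // 4} tokens)')
```

For a source with a critical data table, passing a larger max_tokens for that one call is probably simpler than loosening the default.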

Options

All standard tvly search options work:

Option	Description
`--max-results`	Number of results (default: 5, max: 20)
`--depth`	`ultra-fast`, `fast`, `basic` (default), `advanced`
`--time-range`	`day`, `week`, `month`, `year`
`--include-domains`	Comma-separated whitelist
`--exclude-domains`	Comma-separated blacklist
`--include-raw-content`	Full page content (`markdown` or `text`)
`--country`	Boost results from country
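
When several of these options vary between turns, it may be convenient to assemble the argument list in one place. The function below is a sketch that uses only the flags listed in the table; build_search_cmd is a hypothetical helper, and the values are passed through unvalidated (the expected format of the --country value in particular is not specified here).

```python
def build_search_cmd(query, max_results=None, depth=None, time_range=None,
                     include_domains=None, exclude_domains=None,
                     raw_content=None, country=None):
    """Build an argv list for subprocess from the options in the table above."""
    cmd = ['tvly', 'search', query]
    flags = [
        ('--max-results', max_results),
        ('--depth', depth),
        ('--time-range', time_range),
        ('--include-domains', include_domains),
        ('--exclude-domains', exclude_domains),
        ('--include-raw-content', raw_content),   # 'markdown' or 'text'
        ('--country', country),
    ]
    for flag, value in flags:
        if value is None:
            continue                              # leave the CLI default in place
        if isinstance(value, (list, tuple)):
            value = ','.join(value)               # domain lists are comma-separated
        cmd += [flag, str(value)]
    cmd.append('--json')
    return cmd

print(build_search_cmd('EU AI Act Annex III high-risk list',
                       max_results=3, include_domains=['eur-lex.europa.eu']))
```

The returned list can be handed directly to subprocess.check_output, which avoids shell quoting problems when a query contains spaces or quotes.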

Fallback: jq

When python3 is unavailable, use jq for basic filtering:

tvly search "query" --json 2>/dev/null | jq '[.results[] | select(.score > 0.5) | {title, url, content}]'

jq can't do multi-step search-then-extract or complex filtering. Use it only for simple lookups.
