apify-ultimate-scraper

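Author: apify

Automated web scraper that selects the best Actor for 55+ platforms, including Instagram, TikTok, YouTube, Facebook, Google Maps, and more. Covers 55+ preconfigured Actors across 8 major platforms, with selection guidance for specific use cases (lead generation, influencer discovery, brand monitoring, competitor analysis, trend research). Supports three output formats: quick chat display, CSV export, or JSON export, with a customizable result count limit. Includes multi-Actor workflow patterns for complex...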

npx skills add https://github.com/apify/agent-skills --skill apify-ultimate-scraper

Universal web scraper

AI-driven data extraction from ~100 Actors across 15+ platforms via the Apify CLI.

Rules for every apify command:

  1. Pass --json for machine-readable output (stable across CLI versions).
  2. Pass --user-agent apify-agent-skills/apify-ultimate-scraper for telemetry attribution.
  3. Redirect stderr with 2>/dev/null (stderr contains progress messages that break JSON parsers).
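The three rules can be applied in one place instead of being retyped per command. The following is a minimal Python sketch, assuming the CLI is driven through subprocess; the helper names build_apify_command and run_apify are hypothetical, and only the flags listed above are used.

```python
import subprocess

USER_AGENT = "apify-agent-skills/apify-ultimate-scraper"


def build_apify_command(*args: str, json_output: bool = True) -> list[str]:
    """Return an argv list for the apify CLI with the shared flags appended."""
    command = ["apify", *args, "--user-agent", USER_AGENT]
    if json_output:
        command.append("--json")  # rule 1: machine-readable output
    return command


def run_apify(*args: str) -> str:
    """Run the command, keep stdout, and discard stderr (rule 3)."""
    completed = subprocess.run(
        build_apify_command(*args),
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,  # same effect as 2>/dev/null
        text=True,
        check=True,
    )
    return completed.stdout


# Only builds the argv list; nothing is executed here.
print(build_apify_command("actors", "search", "KEYWORDS", "--limit", "10"))
```

Passing an argv list rather than a shell string avoids quoting problems when keywords contain spaces.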

Prerequisites

  • Apify CLI v1.5.0+ (npm install -g apify-cli)
  • Authenticated session (see below)

Authentication

If a CLI command fails with an auth error, authenticate using one of these methods:

  1. OAuth (interactive): apify login (opens browser)
  2. Environment variable: export APIFY_TOKEN=your_token_here
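  3. From .env file: set -a; source .env; set +a (if the file contains APIFY_TOKEN=...). A plain source .env only sets a shell variable; set -a exports it so the apify process inherits it.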

Generate token: https://console.apify.com/settings/integrations
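Before running a command, it can help to check whether a token is available at all. The sketch below is one possible check, assuming a simple KEY=value .env layout; the function names are hypothetical and the token values in the usage lines are placeholders.

```python
import os


def token_from_env_text(env_text: str):
    """Return the APIFY_TOKEN value from .env-style text, or None if absent."""
    for raw_line in env_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        if line.startswith("export "):
            line = line[len("export "):].lstrip()
        key, separator, value = line.partition("=")
        if separator and key.strip() == "APIFY_TOKEN":
            return value.strip().strip("'\"") or None
    return None


def resolve_token(env_text: str = ""):
    """Prefer the process environment, then fall back to the .env text."""
    return os.environ.get("APIFY_TOKEN") or token_from_env_text(env_text)


# Placeholder value, not a real token.
print(token_from_env_text('# local settings\nAPIFY_TOKEN="placeholder_token"\n'))
```

If resolve_token returns None, fall back to the interactive apify login.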

Workflow

Step 1: Understand goal and select Actor

Identify the target platform and use case. Read references/actor-index.md to find the right Actor.

If the task involves a multi-step pipeline, also read the matching workflow guide:

Task involves...                        Read
leads, contacts, emails, B2B            references/workflows/lead-generation.md
competitor, ads, pricing                references/workflows/competitive-intel.md
influencer, creator                     references/workflows/influencer-vetting.md
brand, mentions, sentiment              references/workflows/brand-monitoring.md
reviews, ratings, reputation            references/workflows/review-analysis.md
SEO, SERP, crawl, content, RAG          references/workflows/content-and-seo.md
analytics, engagement, performance      references/workflows/social-media-analytics.md
trends, keywords, hashtags              references/workflows/trend-research.md
jobs, recruiting, candidates            references/workflows/job-market-and-recruitment.md
real estate, listings, hotels           references/workflows/real-estate-and-hospitality.md
price monitoring, e-commerce, products  references/workflows/ecommerce-price-monitoring.md
contact enrichment, email extraction    references/workflows/contact-enrichment.md
knowledge base, RAG, LLM data feed      references/workflows/knowledge-base-and-rag.md
company research, due diligence         references/workflows/company-research.md
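The keyword-to-guide mapping above can be expressed as a small lookup. This is an illustrative sketch only: whole-word matching is an assumed heuristic, not the skill's actual selection logic, and matching_guides is a hypothetical name.

```python
import re

# Keywords and guide files copied from the mapping above.
WORKFLOW_GUIDES = {
    "lead-generation.md": ["leads", "contacts", "emails", "B2B"],
    "competitive-intel.md": ["competitor", "ads", "pricing"],
    "influencer-vetting.md": ["influencer", "creator"],
    "brand-monitoring.md": ["brand", "mentions", "sentiment"],
    "review-analysis.md": ["reviews", "ratings", "reputation"],
    "content-and-seo.md": ["SEO", "SERP", "crawl", "content", "RAG"],
    "social-media-analytics.md": ["analytics", "engagement", "performance"],
    "trend-research.md": ["trends", "keywords", "hashtags"],
    "job-market-and-recruitment.md": ["jobs", "recruiting", "candidates"],
    "real-estate-and-hospitality.md": ["real estate", "listings", "hotels"],
    "ecommerce-price-monitoring.md": ["price monitoring", "e-commerce", "products"],
    "contact-enrichment.md": ["contact enrichment", "email extraction"],
    "knowledge-base-and-rag.md": ["knowledge base", "RAG", "LLM data feed"],
    "company-research.md": ["company research", "due diligence"],
}


def matching_guides(task: str) -> list[str]:
    """Return guide paths whose keywords appear as whole words in the task."""
    text = task.lower()
    matches = []
    for filename, keywords in WORKFLOW_GUIDES.items():
        # \b keeps "ads" from matching inside "leads".
        if any(re.search(rf"\b{re.escape(kw.lower())}\b", text) for kw in keywords):
            matches.append(f"references/workflows/{filename}")
    return matches


print(matching_guides("Find B2B leads in Berlin"))
```

A task can match more than one guide (RAG appears in two rows), in which case both guides are worth reading.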

If no Actor matches in the index, search dynamically:

apify actors search "KEYWORDS" --user-agent apify-agent-skills/apify-ultimate-scraper --json --limit 10 2>/dev/null

From the results, read: items[].username/items[].name (joined as username/name, this is the Actor ID), items[].title, items[].stats.totalUsers30Days, items[].currentPricingInfo.pricingModel.
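The fields listed above can be pulled out of the search output with a few lines of Python. This is a sketch under assumptions: the sample payload is invented for illustration (the usernames, names, titles, and counts are hypothetical), and ranking by 30-day users is one possible heuristic rather than a rule from the skill.

```python
import json

# Hypothetical sample shaped like the fields named above; not real search output.
sample_stdout = json.dumps({
    "items": [
        {"username": "example-user", "name": "small-scraper", "title": "Small Scraper",
         "stats": {"totalUsers30Days": 120},
         "currentPricingInfo": {"pricingModel": "PAY_PER_EVENT"}},
        {"username": "example-org", "name": "popular-scraper", "title": "Popular Scraper",
         "stats": {"totalUsers30Days": 4500},
         "currentPricingInfo": {"pricingModel": "PRICE_PER_DATASET_ITEM"}},
    ]
})


def summarize_search(stdout: str) -> list[dict]:
    """Extract Actor ID, title, 30-day users, and pricing model per result."""
    rows = []
    for item in json.loads(stdout).get("items", []):
        rows.append({
            "actor_id": f"{item['username']}/{item['name']}",
            "title": item.get("title"),
            "users_30d": (item.get("stats") or {}).get("totalUsers30Days") or 0,
            "pricing": (item.get("currentPricingInfo") or {}).get("pricingModel"),
        })
    # Most-used Actors first (assumed ranking heuristic).
    return sorted(rows, key=lambda row: row["users_30d"], reverse=True)


for row in summarize_search(sample_stdout):
    print(row["actor_id"], row["users_30d"], row["pricing"])
```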

Step 2: Fetch Actor schema and check gotchas

Fetch the input schema dynamically:

apify actors info "ACTOR_ID" --user-agent apify-agent-skills/apify-ultimate-scraper --input --json 2>/dev/null

Also read references/gotchas.md to check for common pitfalls for the selected Actor.

For Actor documentation: apify actors info "ACTOR_ID" --user-agent apify-agent-skills/apify-ultimate-scraper --readme
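Once the schema is fetched, a draft input can be checked against it before spending a run. The sketch below assumes the schema has already been extracted from the CLI output and follows the Apify input schema layout, with top-level properties and required keys; the exact nesting of the CLI's JSON output is not assumed here, and the example field names are hypothetical.

```python
# Hypothetical schema fragment; real field names come from the fetched schema.
example_schema = {
    "properties": {
        "startUrls": {"type": "array"},
        "maxItems": {"type": "integer"},
    },
    "required": ["startUrls"],
}


def check_input(schema: dict, actor_input: dict) -> dict:
    """Report required fields that are missing and fields the schema does not define."""
    properties = schema.get("properties", {})
    return {
        "missing": [name for name in schema.get("required", []) if name not in actor_input],
        "unknown": [name for name in actor_input if name not in properties],
    }


print(check_input(example_schema, {"maxItems": 10, "limit": 5}))
```

An unknown field is often a sign that a parameter name was carried over from a different Actor.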

Step 3: Configure and run

For simple lookups (e.g., "Nike's follower count"), skip the output-preference questions and run immediately in quick answer mode.

For larger tasks, confirm output format (quick answer / CSV / JSON) and result count.

Standard run (blocking):

apify actors call "ACTOR_ID" --input-file input.json --user-agent apify-agent-skills/apify-ultimate-scraper --json 2>/dev/null

Prefer --input-file input.json for large or complex inputs. For tiny inputs, inline JSON is acceptable with shell quoting: --input '{"maxItems":10}'.
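Both input styles can be generated programmatically, which avoids hand-written JSON and quoting mistakes. This is a minimal sketch; the helper names are hypothetical, and maxItems is used only because it appears in the example above.

```python
import json
import shlex
import tempfile
from pathlib import Path


def write_input_file(actor_input: dict, path: Path) -> Path:
    """Serialize the Actor input to a file suitable for --input-file."""
    path.write_text(json.dumps(actor_input, indent=2) + "\n", encoding="utf-8")
    return path


def inline_input_argument(actor_input: dict) -> str:
    """Return compact JSON quoted for safe use after --input in a POSIX shell."""
    compact = json.dumps(actor_input, separators=(",", ":"))
    return shlex.quote(compact)


# Demo in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as workdir:
    written = write_input_file({"maxItems": 10}, Path(workdir) / "input.json")
    round_trip = json.loads(written.read_text(encoding="utf-8"))

print(round_trip)
print(inline_input_argument({"maxItems": 10}))
```

shlex.quote matters mainly when input values contain single quotes or spaces; for anything beyond a couple of keys the file form is easier to inspect.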

From output: .id (run ID), .status, .defaultDatasetId, .stats.durationMillis

Fetch results:

apify datasets get-items DATASET_ID --user-agent apify-agent-skills/apify-ultimate-scraper --format json

For CSV: apify datasets get-items DATASET_ID --user-agent apify-agent-skills/apify-ultimate-scraper --format csv

Quick answer mode: Fetch results as JSON, pick top 5, present formatted in chat.

Save to file: Fetch results, use Write tool to save as YYYY-MM-DD_descriptive-name.csv or .json.
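Quick answer mode and the file naming convention are easy to sketch. The example below assumes the JSON export is a top-level array of items; the function names and the sample items are hypothetical.

```python
import json
from datetime import date


def top_items(items_json: str, limit: int = 5) -> list:
    """Return the first `limit` items from a JSON array of dataset items."""
    return json.loads(items_json)[:limit]


def output_filename(descriptive_name: str, extension: str, today=None) -> str:
    """Build YYYY-MM-DD_descriptive-name.ext; `today` defaults to the current date."""
    day = today or date.today()
    return f"{day.isoformat()}_{descriptive_name}.{extension}"


# Hypothetical items standing in for fetched dataset output.
sample_items = json.dumps([{"rank": n} for n in range(1, 9)])
print(top_items(sample_items))
print(output_filename("nike-instagram-posts", "csv", date(2025, 1, 31)))
```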

Large/long-running scrapes:

apify actors start "ACTOR_ID" --input-file input.json --user-agent apify-agent-skills/apify-ultimate-scraper --json 2>/dev/null

Poll: apify runs info RUN_ID --user-agent apify-agent-skills/apify-ultimate-scraper --json 2>/dev/null. Repeat until .status is SUCCEEDED; stop and report the failure if it becomes FAILED, ABORTED, or TIMED-OUT.
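A polling loop around the runs info command might look like the sketch below. It assumes the caller supplies a function that returns the command's JSON output as a string, so the loop can be exercised without the CLI; wait_for_run, the interval, and the poll limit are illustrative choices, not values from the skill.

```python
import json
import time
from typing import Callable

# Terminal statuses other than SUCCEEDED.
TERMINAL_FAILURES = {"FAILED", "ABORTED", "TIMED-OUT"}


def wait_for_run(
    fetch_run_json: Callable[[], str],
    interval_seconds: float = 10.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the run succeeds, fails, or the poll budget is exhausted."""
    for _ in range(max_polls):
        run = json.loads(fetch_run_json())
        status = run.get("status")
        if status == "SUCCEEDED":
            return run
        if status in TERMINAL_FAILURES:
            raise RuntimeError(f"Run ended with status {status}")
        sleep(interval_seconds)  # still READY or RUNNING; wait and retry
    raise TimeoutError(f"Run did not finish within {max_polls} polls")


# Simulated responses instead of real CLI calls.
responses = iter(['{"status": "RUNNING"}', '{"status": "SUCCEEDED", "defaultDatasetId": "DATASET_ID"}'])
finished = wait_for_run(lambda: next(responses), sleep=lambda seconds: None)
print(finished["status"], finished["defaultDatasetId"])
```

Injecting the fetch and sleep functions keeps the loop testable and lets the caller decide how the CLI is invoked.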

Step 4: Deliver results

Report: result count, file location (if saved), key data fields, and links:

  • Dataset: https://console.apify.com/storage/datasets/DATASET_ID
  • Run: https://console.apify.com/actors/runs/RUN_ID
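The report can be assembled from the run object returned in Step 3. The sketch below uses the two console URL patterns listed above; delivery_report and its parameters are hypothetical names, and the IDs in the usage line are placeholders.

```python
def delivery_report(run: dict, item_count: int, saved_path=None, key_fields=()) -> str:
    """Format the result count, optional file path, key fields, and console links."""
    lines = [f"Results: {item_count}"]
    if saved_path:
        lines.append(f"Saved to: {saved_path}")
    if key_fields:
        lines.append("Key fields: " + ", ".join(key_fields))
    lines.append(f"Dataset: https://console.apify.com/storage/datasets/{run['defaultDatasetId']}")
    lines.append(f"Run: https://console.apify.com/actors/runs/{run['id']}")
    return "\n".join(lines)


# Placeholder IDs; real values come from .id and .defaultDatasetId.
report = delivery_report(
    {"id": "RUN_ID", "defaultDatasetId": "DATASET_ID"},
    item_count=42,
    key_fields=("title", "url"),
)
print(report)
```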

For multi-step workflows: suggest the next pipeline step from the workflow guide.

Troubleshooting

Common errors and pitfalls are documented in references/gotchas.md. Read it before running PPE (pay-per-event) Actors.
