apify-ultimate-scraper

Author: apify

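Automated web scraping tool that selects the optimal Actor for 55+ platforms, including Instagram, TikTok, YouTube, Facebook, and Google Maps. Covers 55+ preconfigured Actors across 8 major platforms, with selection guidance for specific use cases (lead generation, influencer discovery, brand monitoring, competitor analysis, trend research). Supports three output formats: quick chat display, CSV export, or JSON export, with a customizable result limit. Includes multi-Actor workflow patterns for complex...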

npx skills add https://github.com/apify/agent-skills --skill apify-ultimate-scraper

Universal web scraper

AI-driven data extraction from ~100 Actors across 15+ platforms via the Apify CLI.

Rules for every apify command:

  1. Pass --json for machine-readable output (stable across CLI versions).
  2. Pass --user-agent apify-agent-skills/apify-ultimate-scraper for telemetry attribution.
  3. Redirect stderr with 2>/dev/null (stderr contains progress messages that break JSON parsers).
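
The three rules above can be applied in one place when the CLI is driven from a script. The following is a minimal sketch, not part of the skill itself: the helper names build_apify_command and run_apify are hypothetical, and it assumes the apify binary is on PATH.

```python
import json
import subprocess

USER_AGENT = "apify-agent-skills/apify-ultimate-scraper"


def build_apify_command(*args, json_output=True):
    """Return the argv list for an apify call with the standard flags appended."""
    cmd = ["apify", *args, "--user-agent", USER_AGENT]
    if json_output:
        cmd.append("--json")  # rule 1: machine-readable output
    return cmd


def run_apify(*args):
    """Run the command, discard stderr, and parse stdout as JSON."""
    result = subprocess.run(
        build_apify_command(*args),
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,  # rule 3: same effect as 2>/dev/null
        text=True,
        check=True,  # raises CalledProcessError on a non-zero exit code
    )
    return json.loads(result.stdout)
```

For example, run_apify("actors", "search", "KEYWORDS", "--limit", "10") would issue the search command shown in Step 1. Because stderr is discarded, a failure surfaces only as a CalledProcessError; rerunning without the redirect is one way to see the underlying message.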

Prerequisites

  • Apify CLI v1.5.0+ (npm install -g apify-cli)
  • Authenticated session (see below)

Authentication

If a CLI command fails with an auth error, authenticate using one of these methods:

  1. OAuth (interactive): apify login (opens browser)
  2. Environment variable: export APIFY_TOKEN=your_token_here
  3. From .env file: source .env (if the file contains APIFY_TOKEN=...)

Generate token: https://console.apify.com/settings/integrations
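
A script can check whether a token is available before calling the CLI. The sketch below covers only methods 2 and 3 (environment variable and .env file); it does not detect an OAuth session created by apify login. The function names are hypothetical, and the .env parsing is deliberately simplistic.

```python
import os


def parse_env_text(text):
    """Parse KEY=VALUE lines from .env-style text, skipping blanks and comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")  # drop surrounding quotes
    return values


def resolve_token(env=None, env_file_text=""):
    """Prefer APIFY_TOKEN from the environment, then fall back to .env contents."""
    env = os.environ if env is None else env
    return env.get("APIFY_TOKEN") or parse_env_text(env_file_text).get("APIFY_TOKEN")
```

If resolve_token returns None, neither source provides a token, and one of the three methods above is needed unless an OAuth session already exists.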

Workflow

Step 1: Understand goal and select Actor

Identify the target platform and use case. Read references/actor-index.md to find the right Actor.

If the task involves a multi-step pipeline, also read the matching workflow guide:

| Task involves... | Read |
| --- | --- |
| leads, contacts, emails, B2B | references/workflows/lead-generation.md |
| competitor, ads, pricing | references/workflows/competitive-intel.md |
| influencer, creator | references/workflows/influencer-vetting.md |
| brand, mentions, sentiment | references/workflows/brand-monitoring.md |
| reviews, ratings, reputation | references/workflows/review-analysis.md |
| SEO, SERP, crawl, content, RAG | references/workflows/content-and-seo.md |
| analytics, engagement, performance | references/workflows/social-media-analytics.md |
| trends, keywords, hashtags | references/workflows/trend-research.md |
| jobs, recruiting, candidates | references/workflows/job-market-and-recruitment.md |
| real estate, listings, hotels | references/workflows/real-estate-and-hospitality.md |
| price monitoring, e-commerce, products | references/workflows/ecommerce-price-monitoring.md |
| contact enrichment, email extraction | references/workflows/contact-enrichment.md |
| knowledge base, RAG, LLM data feed | references/workflows/knowledge-base-and-rag.md |
| company research, due diligence | references/workflows/company-research.md |
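
The keyword-to-guide lookup above can be expressed as a small matcher. This is an illustrative sketch only: the skill does not prescribe a matching algorithm, and match_guides is a hypothetical helper that uses whole-word, case-insensitive matching (so "ads" does not fire on "leads", and plural variants are not normalised).

```python
import re

GUIDE_DIR = "references/workflows/"

# Keywords and file names are copied from the table above.
WORKFLOW_KEYWORDS = {
    "lead-generation.md": ["leads", "contacts", "emails", "B2B"],
    "competitive-intel.md": ["competitor", "ads", "pricing"],
    "influencer-vetting.md": ["influencer", "creator"],
    "brand-monitoring.md": ["brand", "mentions", "sentiment"],
    "review-analysis.md": ["reviews", "ratings", "reputation"],
    "content-and-seo.md": ["SEO", "SERP", "crawl", "content", "RAG"],
    "social-media-analytics.md": ["analytics", "engagement", "performance"],
    "trend-research.md": ["trends", "keywords", "hashtags"],
    "job-market-and-recruitment.md": ["jobs", "recruiting", "candidates"],
    "real-estate-and-hospitality.md": ["real estate", "listings", "hotels"],
    "ecommerce-price-monitoring.md": ["price monitoring", "e-commerce", "products"],
    "contact-enrichment.md": ["contact enrichment", "email extraction"],
    "knowledge-base-and-rag.md": ["knowledge base", "RAG", "LLM data feed"],
    "company-research.md": ["company research", "due diligence"],
}


def match_guides(task):
    """Return every workflow guide whose keywords appear as whole words in the task."""
    text = task.lower()
    matches = []
    for guide, keywords in WORKFLOW_KEYWORDS.items():
        if any(re.search(r"\b" + re.escape(k.lower()) + r"\b", text) for k in keywords):
            matches.append(GUIDE_DIR + guide)
    return matches
```

A task can match more than one guide (RAG appears in two rows), in which case both guides are worth reading.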

If no Actor matches in the index, search dynamically:

apify actors search "KEYWORDS" --user-agent apify-agent-skills/apify-ultimate-scraper --json --limit 10 2>/dev/null

From results: items[].username/items[].name (Actor ID), items[].title, items[].stats.totalUsers30Days, items[].currentPricingInfo.pricingModel.
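
The fields listed above can be pulled out of the parsed search output as follows. This is a sketch that assumes the JSON has the items[] shape described in the previous line; summarize_search is a hypothetical helper, and ranking by 30-day users is one possible heuristic rather than a rule from the skill.

```python
def summarize_search(payload):
    """Reduce search output to Actor ID, title, 30-day users, and pricing model."""
    rows = []
    for item in payload.get("items", []):
        rows.append({
            "actor_id": f"{item['username']}/{item['name']}",
            "title": item.get("title"),
            "users_30d": (item.get("stats") or {}).get("totalUsers30Days") or 0,
            "pricing": (item.get("currentPricingInfo") or {}).get("pricingModel"),
        })
    # Most-used Actors first (an assumed ranking, not mandated by the skill).
    rows.sort(key=lambda row: row["users_30d"], reverse=True)
    return rows


# Made-up sample data for illustration only.
sample = {
    "items": [
        {"username": "example-user", "name": "small-scraper", "title": "Small",
         "stats": {"totalUsers30Days": 12}, "currentPricingInfo": {"pricingModel": "FREE"}},
        {"username": "example-user", "name": "big-scraper", "title": "Big",
         "stats": {"totalUsers30Days": 900}, "currentPricingInfo": None},
    ]
}
ranked = summarize_search(sample)
```

The actor_id value is what later commands expect in place of ACTOR_ID.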

Step 2: Fetch Actor schema and check gotchas

Fetch the input schema dynamically:

apify actors info "ACTOR_ID" --user-agent apify-agent-skills/apify-ultimate-scraper --input --json 2>/dev/null

Also read references/gotchas.md to check for common pitfalls for the selected Actor.

For Actor documentation: apify actors info "ACTOR_ID" --user-agent apify-agent-skills/apify-ultimate-scraper --readme
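
Once the schema is fetched, a draft input can be checked against it before spending a run. The sketch below assumes the parsed schema is an object with top-level properties and required keys, as Apify input schemas are; the exact shape returned by the CLI should be verified first. Both helper names and the sample field names are hypothetical.

```python
def missing_required_fields(schema, actor_input):
    """Return required field names that the draft input does not provide."""
    return [name for name in schema.get("required", []) if name not in actor_input]


def unknown_fields(schema, actor_input):
    """Return input keys that the schema does not declare (often typos)."""
    known = schema.get("properties", {})
    return [name for name in actor_input if name not in known]


# Made-up schema fragment for illustration only.
sample_schema = {
    "properties": {"startUrls": {"type": "array"}, "maxItems": {"type": "integer"}},
    "required": ["startUrls"],
}
```

An empty list from both helpers means the input at least names the right fields; it does not validate types or values.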

Step 3: Configure and run

Skip user preferences for simple lookups (e.g., "Nike's follower count"). Go straight to running with quick answer mode.

For larger tasks, confirm output format (quick answer / CSV / JSON) and result count.

Standard run (blocking):

apify actors call "ACTOR_ID" --input-file input.json --user-agent apify-agent-skills/apify-ultimate-scraper --json 2>/dev/null

Prefer --input-file input.json for large or complex inputs. For tiny inputs, inline JSON is acceptable with shell quoting: --input '{"maxItems":10}'.
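
The choice between inline JSON and an input file can be automated. The following sketch builds the argv list for apify actors call; build_call_command is a hypothetical helper, and the 200-character INLINE_LIMIT is an arbitrary assumption, not a value from the skill. Passing an argv list (rather than a shell string) sidesteps shell quoting entirely.

```python
import json
from pathlib import Path

USER_AGENT = "apify-agent-skills/apify-ultimate-scraper"
INLINE_LIMIT = 200  # assumed cut-off for "tiny" inputs, in characters


def build_call_command(actor_id, actor_input, input_path="input.json"):
    """Build the call command, inlining small inputs and writing large ones to a file."""
    compact = json.dumps(actor_input, separators=(",", ":"))
    cmd = ["apify", "actors", "call", actor_id]
    if len(compact) <= INLINE_LIMIT:
        cmd += ["--input", compact]
    else:
        # Side effect: writes the input file that the command refers to.
        Path(input_path).write_text(json.dumps(actor_input, indent=2), encoding="utf-8")
        cmd += ["--input-file", str(input_path)]
    return cmd + ["--user-agent", USER_AGENT, "--json"]
```

The resulting list can be handed to subprocess.run with stderr discarded, as the rules at the top require.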

From output: .id (run ID), .status, .defaultDatasetId, .stats.durationMillis
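
A short sketch of reading those four fields from the parsed run object, assuming the JSON has the shape listed above. summarize_run is a hypothetical helper, and the sample values are placeholders.

```python
def summarize_run(run):
    """Extract run ID, status, dataset ID, and duration in seconds from run JSON."""
    millis = (run.get("stats") or {}).get("durationMillis")
    return {
        "run_id": run.get("id"),
        "status": run.get("status"),
        "dataset_id": run.get("defaultDatasetId"),
        "duration_seconds": None if millis is None else millis / 1000,
    }


# Placeholder values for illustration only.
summary = summarize_run({
    "id": "RUN_ID",
    "status": "SUCCEEDED",
    "defaultDatasetId": "DATASET_ID",
    "stats": {"durationMillis": 42000},
})
```

The dataset_id is what the fetch commands below take in place of DATASET_ID.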

Fetch results:

apify datasets get-items DATASET_ID --user-agent apify-agent-skills/apify-ultimate-scraper --format json

For CSV: apify datasets get-items DATASET_ID --user-agent apify-agent-skills/apify-ultimate-scraper --format csv

Quick answer mode: Fetch results as JSON, pick top 5, present formatted in chat.
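
Quick answer mode can be sketched as a small formatter over the fetched items. The skill does not say what "top" means, so this sketch keeps dataset order unless a sort field is given; format_quick_answer and the sample field names are hypothetical.

```python
def format_quick_answer(items, fields, limit=5, sort_key=None):
    """Format the first `limit` items as numbered lines showing the chosen fields."""
    if sort_key is not None:
        # Highest value first; missing values are treated as 0.
        items = sorted(items, key=lambda item: item.get(sort_key) or 0, reverse=True)
    lines = []
    for index, item in enumerate(items[:limit], start=1):
        parts = [f"{field}: {item.get(field, 'n/a')}" for field in fields]
        lines.append(f"{index}. " + " | ".join(parts))
    return "\n".join(lines)


# Made-up items for illustration only.
sample_items = [{"name": f"p{i}", "likes": i} for i in range(7)]
answer = format_quick_answer(sample_items, ["name", "likes"], sort_key="likes")
```

Which fields to show depends on the Actor's output, so they are best chosen after inspecting one item.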

Save to file: Fetch results, use Write tool to save as YYYY-MM-DD_descriptive-name.csv or .json.
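
The file-naming pattern can be generated consistently with a helper like the one below. It is a sketch: build_filename is hypothetical, and lowercasing the description into a hyphenated slug is an assumed interpretation of "descriptive-name".

```python
import re
from datetime import date


def build_filename(description, extension, today=None):
    """Return YYYY-MM-DD_descriptive-name.<ext> for a csv or json export."""
    if extension not in ("csv", "json"):
        raise ValueError("extension must be 'csv' or 'json'")
    today = today or date.today()
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    return f"{today.isoformat()}_{slug}.{extension}"


example_name = build_filename("Nike Instagram posts", "csv", today=date(2025, 1, 31))
```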

Large/long-running scrapes:

apify actors start "ACTOR_ID" --input-file input.json --user-agent apify-agent-skills/apify-ultimate-scraper --json 2>/dev/null

Poll: apify runs info RUN_ID --user-agent apify-agent-skills/apify-ultimate-scraper --json 2>/dev/null (repeat until .status is SUCCEEDED; stop if it becomes FAILED, ABORTED, or TIMED-OUT).
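
A polling loop around that command might look like the sketch below. It takes a fetch_run callable (hypothetical; in practice it would run the apify runs info command and parse its JSON) so the loop itself stays testable. The interval and poll limit are assumptions, and the terminal statuses listed are the Apify platform's run states as far as I know them.

```python
import time

TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "ABORTED", "TIMED-OUT"}


def wait_for_run(fetch_run, interval_seconds=10.0, max_polls=60, sleep=time.sleep):
    """Call fetch_run until the run reaches a terminal status, then return it."""
    for _ in range(max_polls):
        run = fetch_run()
        if run.get("status") in TERMINAL_STATUSES:
            return run
        sleep(interval_seconds)
    raise TimeoutError(f"run still not finished after {max_polls} polls")


# Simulated status sequence for illustration; no CLI call is made here.
statuses = iter(["READY", "RUNNING", "SUCCEEDED"])
finished = wait_for_run(lambda: {"status": next(statuses)}, sleep=lambda seconds: None)
```

Callers should still check that the returned status is SUCCEEDED before fetching the dataset, since the loop also returns on failure states.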

Step 4: Deliver results

Report: result count, file location (if saved), key data fields, and links:

  • Dataset: https://console.apify.com/storage/datasets/DATASET_ID
  • Run: https://console.apify.com/actors/runs/RUN_ID
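
The report items above can be assembled into a short text block. This is a sketch with a hypothetical build_report helper; only the two console URL patterns come from the list above, and the wording of the other lines is an assumption.

```python
def build_report(result_count, fields, dataset_id, run_id, saved_path=None):
    """Assemble the delivery summary: count, optional file, fields, and console links."""
    lines = [f"Results: {result_count}"]
    if saved_path:
        lines.append(f"File: {saved_path}")
    lines.append("Key fields: " + ", ".join(fields))
    lines.append(f"Dataset: https://console.apify.com/storage/datasets/{dataset_id}")
    lines.append(f"Run: https://console.apify.com/actors/runs/{run_id}")
    return "\n".join(lines)


# Placeholder IDs for illustration only.
report = build_report(25, ["name", "url"], "DATASET_ID", "RUN_ID")
```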

For multi-step workflows: suggest the next pipeline step from the workflow guide.

Troubleshooting

Common errors and pitfalls are documented in references/gotchas.md. Read it before running PPE (pay-per-event) Actors.
