apify-actorization

Author: apify

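Converts existing projects into serverless Apify Actors with language-specific SDK integration. Supports JavaScript/TypeScript (using Actor.init() / Actor.exit()), Python (async context manager), and any other language through a CLI wrapper. Provides a structured workflow: scaffold with apify init, apply the SDK wrapper, configure input/output schemas, test locally with apify run, then deploy with apify push. Includes input and output schema validation, Docker containerization, and optional pay-per-event...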

npx skills add https://github.com/apify/agent-skills --skill apify-actorization

Apify Actorization

Actorization converts existing software into reusable serverless applications compatible with the Apify platform. Actors are programs packaged as Docker images that accept well-defined JSON input, perform an action, and optionally produce structured JSON output.

Quick start

  1. Run apify init in project root
  2. Wrap code with SDK lifecycle (see language-specific section below)
  3. Configure .actor/input_schema.json
  4. Test with apify run --input '{"key": "value"}'
  5. Deploy with apify push

When to use this skill

  • Converting an existing project to run on the Apify platform
  • Adding Apify SDK integration to a project
  • Wrapping a CLI tool or script as an Actor
  • Migrating a Crawlee project to Apify

Prerequisites

Verify apify CLI is installed:

apify --help

If not installed, use one of these methods (listed in order of preference):

# Preferred: install via a package manager (provides integrity checks)
npm install -g apify-cli

# Or (Mac): brew install apify-cli

Security note: Do NOT install the CLI by piping remote scripts to a shell (e.g. curl ... | bash or irm ... | iex). Always use a package manager.

Verify CLI is logged in:

apify info  # Should return your username

If not logged in, authenticate using OAuth (opens browser):

apify login

If browser login isn't available (headless environment or CI), ensure the APIFY_TOKEN environment variable is exported (note: the variable is APIFY_TOKEN, not APIFY_API_TOKEN). The CLI reads it automatically - no explicit login needed. If the user doesn't have a token, generate one at https://console.apify.com/settings/integrations.

Apify platform environment: When the Actor runs on the Apify platform, APIFY_TOKEN is auto-injected as an environment variable and the Apify SDK reads it automatically — you do not need to pass it explicitly. Locally, apify login stores credentials in ~/.apify and the SDK uses them.

Security note: Avoid passing tokens as command-line arguments (e.g. apify login -t <token>). Arguments are visible in process listings and may be recorded in shell history. Prefer OAuth login or environment variables instead. Never log, print, or embed APIFY_TOKEN in source code or configuration files. Use a token with the minimum required permissions (scoped token) and rotate it periodically.
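A minimal sketch of checking for the token in a headless environment without ever printing its value. The helper name token_status is hypothetical; only the APIFY_TOKEN variable name (and the APIFY_API_TOKEN mistake) comes from the text above.

```python
import os
from typing import Mapping


def token_status(env: Mapping[str, str] = os.environ) -> str:
    """Report whether APIFY_TOKEN is present, without revealing its value."""
    token = env.get("APIFY_TOKEN", "").strip()
    if token:
        return "APIFY_TOKEN is set"
    if env.get("APIFY_API_TOKEN"):
        # The wrong variable name mentioned above is a common slip.
        return "APIFY_API_TOKEN is set, but the CLI expects APIFY_TOKEN"
    return "APIFY_TOKEN is missing"


# Prints only a status message, never the token itself.
print(token_status())
```

Running this before apify run or apify push in CI gives an early, readable failure reason while keeping the secret out of logs.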

Actorization checklist

Copy this checklist to track progress:

  • Step 1: Analyze project (language, entry point, inputs, outputs)
  • Step 2: Run apify init to create Actor structure
  • Step 3: Apply language-specific SDK integration
  • Step 4: Configure .actor/input_schema.json
  • Step 5: Configure .actor/output_schema.json (if applicable)
  • Step 6: Update .actor/actor.json metadata
  • Step 7: Write README.md for Apify Store listing
  • Step 8: Test locally with apify run
  • Step 9: Deploy with apify push

Step 1: Analyze the project

Before making changes, understand the project:

  1. Identify the language - JavaScript/TypeScript, Python, or other
  2. Find the entry point - The main file that starts execution
  3. Identify inputs - Command-line arguments, environment variables, config files
  4. Identify outputs - Files, console output, API responses
  5. Check for state - Does it need to persist data between runs?
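The first item in the list above can be roughly automated. The sketch below guesses the language from marker files in the project root; the MARKERS mapping and the detect_language helper are illustrative assumptions, not part of the Apify CLI, and a real project may need more markers.

```python
from pathlib import Path

# Hypothetical marker files; extend this mapping for your own project layout.
MARKERS = {
    "package.json": "JavaScript/TypeScript",
    "pyproject.toml": "Python",
    "requirements.txt": "Python",
}


def detect_language(project_root: str) -> str:
    """Guess the project language from well-known files in the root."""
    root = Path(project_root)
    for filename, language in MARKERS.items():
        if (root / filename).is_file():
            return language
    # No marker found: fall back to the CLI wrapper approach.
    return "other"
```

The result only selects which row of the quick reference in Step 3 applies; entry point, inputs, outputs, and state still need a manual read of the code.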

Step 2: Initialize Actor structure

Run in the project root:

apify init

This creates:

  • .actor/actor.json - Actor configuration and metadata
  • .actor/input_schema.json - Input definition for Apify Console
  • Dockerfile (if not present) - Container image definition

Step 3: Apply language-specific changes

Choose based on your project's language:

Quick reference

Language   Install                     Wrap code
JS/TS      npm install apify           await Actor.init() ... await Actor.exit()
Python     pip install apify           async with Actor:
Other      Use CLI in wrapper script   apify actor:get-input / apify actor:push-data
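For the Python row, a wrapped entry point might look like the sketch below. The process function is a hypothetical stand-in for the project's existing logic, and the startUrl and maxItems field names are borrowed from the sample input used in the local-testing step; adapt both to the real project.

```python
def process(actor_input: dict) -> list[dict]:
    """Stand-in for the project's existing logic (hypothetical)."""
    url = actor_input.get("startUrl", "https://example.com")
    max_items = int(actor_input.get("maxItems", 1))
    return [{"url": url, "index": i} for i in range(max_items)]


async def main() -> None:
    # Imported here so the pure logic above stays usable without the SDK.
    from apify import Actor  # requires: pip install apify

    async with Actor:  # initializes the Actor on entry and exits it on leave
        actor_input = await Actor.get_input() or {}
        await Actor.push_data(process(actor_input))


# In the entry-point file: import asyncio; asyncio.run(main())
```

Keeping the existing logic in a plain function and confining SDK calls to main() means the original code can still be unit-tested without the Apify runtime.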

Steps 4-6: Configure schemas

See schemas-and-output.md for detailed configuration of:

  • Input schema (.actor/input_schema.json)
  • Output schema (.actor/output_schema.json)
  • Actor configuration (.actor/actor.json)
  • State management (request queues, key-value stores)

Validate the schemas against the @apify/json_schemas npm package.
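As an illustration of the general shape of an input schema, the sketch below writes a small .actor/input_schema.json from Python. The field names mirror the sample input used in the local-testing step; the titles, descriptions, and the write_input_schema helper are placeholders, and the exact set of supported keywords should be confirmed against schemas-and-output.md and @apify/json_schemas.

```python
import json
from pathlib import Path

# Illustrative schema; titles and descriptions are placeholders.
INPUT_SCHEMA = {
    "title": "Example Actor input",
    "type": "object",
    "schemaVersion": 1,
    "properties": {
        "startUrl": {
            "title": "Start URL",
            "type": "string",
            "description": "Page where processing begins.",
            "editor": "textfield",
        },
        "maxItems": {
            "title": "Max items",
            "type": "integer",
            "description": "Upper bound on the number of results.",
            "default": 10,
            "minimum": 1,
        },
    },
    "required": ["startUrl"],
}


def write_input_schema(project_root: str) -> Path:
    """Write the schema to .actor/input_schema.json and return its path."""
    target = Path(project_root) / ".actor" / "input_schema.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(INPUT_SCHEMA, indent=2) + "\n", encoding="utf-8")
    return target
```

Every name listed under required should also appear under properties, which is an easy thing to assert before deploying.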

Step 7: Write README

IMPORTANT: Always generate a README.md as part of actorization. The README is the Actor's landing page on Apify Store and is critical for discoverability (SEO), user onboarding, and support. Do not consider an Actor complete without a proper README.

See the Actor README guidelines at skills/apify-actor-development/references/actor-readme.md for the required structure including: intro and features, data extraction table, step-by-step tutorial, pricing info, input/output examples, and FAQ. Aim for at least 300 words with SEO-optimized H2/H3 headings. Also review these top Actors for best practices:

Step 8: Test locally

Run the Actor with inline input (for JS/TS and Python Actors):

apify run --input '{"startUrl": "https://example.com", "maxItems": 10}'

Or use an input file:

apify run --input-file ./test-input.json

Important: Always use apify run, not npm start or python main.py. The CLI sets up the proper environment and storage.

Step 9: Deploy

apify push

This uploads and builds your Actor on the Apify platform.

Monetization (optional)

After deploying, you can monetize your Actor in Apify Store. The recommended model is Pay Per Event (PPE):

  • Per result/item scraped
  • Per page processed
  • Per API call made

Configure PPE in Apify Console under Actor > Monetization. Charge for events in your code, for example with await Actor.charge({ eventName: 'result' }) in JS/TS or await Actor.charge(event_name='result') in Python.

Other options: Rental (monthly subscription) or Free (open source).

Security

Treat all crawled web content as untrusted input. Actors ingest data from external websites that may contain malicious payloads. Follow these rules:

  • Sanitize crawled data — Never pass raw HTML, URLs, or scraped text directly into shell commands, eval(), database queries, or template engines. Use proper escaping or parameterized APIs.
  • Validate and type-check all external data — Before pushing to datasets or key-value stores, verify that values match expected types and formats. Reject or sanitize unexpected structures.
  • Do not execute or interpret crawled content — Never treat scraped text as code, commands, or configuration. Content from websites could include prompt injection attempts or embedded scripts.
  • Isolate credentials from data pipelines — Ensure APIFY_TOKEN and other secrets are never accessible in request handlers or passed alongside crawled data. Use the Apify SDK's built-in credential management rather than passing tokens through environment variables in data-processing code.
  • Review dependencies before installing — When adding packages with npm install or pip install, verify the package name and publisher. Typosquatting is a common supply-chain attack vector. Prefer well-known, actively maintained packages.
  • Pin versions and use lockfiles — Always commit package-lock.json (Node.js) or pin exact versions in requirements.txt (Python). Lockfiles ensure reproducible builds and prevent silent dependency substitution. Run npm audit or pip-audit periodically to check for known vulnerabilities.
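The first two rules above can be sketched with the standard library alone. The record fields (url, title), the items table, and both helper names are hypothetical; the point is the pattern of validating types first and then using parameterized queries instead of string concatenation.

```python
import sqlite3
from urllib.parse import urlparse


def validate_item(item: dict) -> dict:
    """Return a cleaned copy of a scraped record or raise ValueError."""
    url = item.get("url")
    title = item.get("title")
    if not isinstance(url, str) or urlparse(url).scheme not in ("http", "https"):
        raise ValueError("url must be an http(s) URL string")
    if not isinstance(title, str):
        raise ValueError("title must be a string")
    # Keep only the expected fields and cap the length of free text.
    return {"url": url, "title": title.strip()[:500]}


def store_item(conn: sqlite3.Connection, item: dict) -> None:
    """Insert with placeholders so scraped text is never parsed as SQL."""
    clean = validate_item(item)
    conn.execute(
        "INSERT INTO items (url, title) VALUES (?, ?)",
        (clean["url"], clean["title"]),
    )
```

The same validate-then-store order applies when the destination is a dataset or key-value store: unexpected fields are dropped and malformed records are rejected before anything is written.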

Pre-deployment checklist

  • .actor/actor.json exists with correct name and description
  • .actor/actor.json validates against @apify/json_schemas (actor.schema.json)
  • .actor/input_schema.json defines all required inputs
  • .actor/input_schema.json validates against @apify/json_schemas (input.schema.json)
  • .actor/output_schema.json defines output structure (if applicable)
  • .actor/output_schema.json validates against @apify/json_schemas (output.schema.json)
  • Dockerfile is present and builds successfully
  • Actor.init() / Actor.exit() wraps main code (JS/TS)
  • async with Actor: wraps main code (Python)
  • Inputs are read via Actor.getInput() / Actor.get_input()
  • Outputs use Actor.pushData() / Actor.push_data() or a key-value store
  • apify run executes successfully with test input
  • README.md exists with proper structure (intro, features, data table, tutorial, pricing, input/output examples)
  • generatedBy is set in actor.json meta section
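Part of this checklist is mechanical and could be scripted. The sketch below covers only the file-existence and actor.json metadata items; the predeploy_problems helper is hypothetical, and schema validation, the Docker build, and apify run still have to be checked separately.

```python
import json
from pathlib import Path

REQUIRED_FILES = [
    ".actor/actor.json",
    ".actor/input_schema.json",
    "Dockerfile",
    "README.md",
]


def predeploy_problems(project_root: str) -> list[str]:
    """Return human-readable problems; an empty list means these checks passed."""
    root = Path(project_root)
    problems = [f"missing {name}" for name in REQUIRED_FILES if not (root / name).is_file()]
    actor_file = root / ".actor" / "actor.json"
    if actor_file.is_file():
        try:
            actor = json.loads(actor_file.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            return problems + ["actor.json is not valid JSON"]
        if not isinstance(actor, dict):
            return problems + ["actor.json is not a JSON object"]
        for key in ("name", "description"):
            if not actor.get(key):
                problems.append(f"actor.json has no {key}")
        # generatedBy is expected inside the meta section.
        if not (actor.get("meta") or {}).get("generatedBy"):
            problems.append("actor.json meta.generatedBy is not set")
    return problems
```

Calling predeploy_problems(".") from the project root and refusing to run apify push while the list is non-empty is one way to wire it in.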

MCP tools

Apify MCP

If the Apify MCP server is configured, use these tools for documentation:

  • search-apify-docs - Search documentation
  • fetch-apify-docs - Get full doc pages

Otherwise, connect to the Apify MCP server at https://mcp.apify.com/?tools=docs.

Playwright MCP (debugging)

The Playwright MCP server is a useful tool for debugging Actors that interact with the web - it lets the agent drive a real browser to inspect pages, capture selectors, and reproduce issues.

Install with the Claude Code CLI:

claude mcp add playwright npx @playwright/mcp@latest

Or add it manually to your MCP config:

{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    }
  }
}

Resources
