apify-generate-output-schema

Author: apify

Generates output schemas (dataset_schema.json, output_schema.json, key_value_store_schema.json) by analyzing an Apify Actor's source code. Use it in the following cases…

npx skills add https://github.com/apify/agent-skills --skill apify-generate-output-schema

Generate Actor output schema

You are generating output schema files for an Apify Actor. The output schema tells Apify Console how to display run results. You will analyze the Actor's source code, create dataset_schema.json, output_schema.json, and key_value_store_schema.json (if the Actor uses the key-value store), and update actor.json.

Core principles

  • Analyze code first: Read the Actor's source to understand what data it actually pushes to the dataset — never guess
  • Every field is nullable: APIs and websites are unpredictable — always set "nullable": true
  • Anonymize examples: Never use real user IDs, usernames, or personal data in examples
  • Verify against code: If TypeScript types exist, cross-check the schema against both the type definition AND the code that produces the values
  • Reuse existing patterns: Before generating schemas, check if other Actors in the same repository already have output schemas — match their structure, naming conventions, description style, and formatting
  • Don't reinvent the wheel: Reuse existing type definitions, interfaces, and utilities from the codebase instead of creating duplicate definitions

Phase 1: Discover Actor structure

Goal: Locate the Actor and understand its output

Initial request: $ARGUMENTS

Actions:

  1. Create a todo list with all phases
  2. Find the .actor/ directory containing actor.json
  3. Read actor.json to understand the Actor's configuration
  4. Check if dataset_schema.json, output_schema.json, and key_value_store_schema.json already exist
  5. Search for existing schemas in the repository: Look for other .actor/ directories or schema files (e.g., **/dataset_schema.json, **/output_schema.json, **/key_value_store_schema.json) to learn the repo's conventions — match their description style, field naming, example formatting, and overall structure
  6. Find all places where data is pushed to the dataset:
    • JavaScript/TypeScript: Search for Actor.pushData(, dataset.pushData(, Dataset.pushData(
    • Python: Search for Actor.push_data(, dataset.push_data(, Dataset.push_data(
  7. Find all places where data is stored in the key-value store:
    • JavaScript/TypeScript: Search for Actor.setValue(, keyValueStore.setValue(, KeyValueStore.setValue(
    • Python: Search for Actor.set_value(, key_value_store.set_value(, KeyValueStore.set_value(
  8. Find output type definitions — reuse them directly instead of recreating from scratch:
    • TypeScript: Look for output type interfaces/types (e.g., in src/types/, src/types/output.ts). If an interface or type already defines the output shape, derive the schema fields from it — do not create a parallel definition
    • Python: Look for TypedDict, dataclass, or Pydantic model definitions. Use the existing field names, types, and docstrings as the source of truth
  9. Check for existing shared schema utilities or helper functions in the codebase that handle schema generation or validation — reuse them rather than creating new logic
  10. If inline storages.dataset or storages.keyValueStore config exists in actor.json, note it for migration

Present findings to user: list all discovered dataset output fields, key-value store keys, their types, and where they come from.
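The searches in steps 6 and 7 can be sketched as a small script. This is a minimal sketch, not part of the skill itself: the function name find_storage_calls, the set of file suffixes, and the node_modules exclusion are assumptions chosen for illustration, and the patterns only cover the call names listed above.

```python
import re
from pathlib import Path

# Call patterns taken from the search lists in steps 6 and 7.
DATASET_CALLS = re.compile(r"\b(?:Actor|dataset|Dataset)\.(?:pushData|push_data)\(")
KV_CALLS = re.compile(
    r"\b(?:Actor|keyValueStore|key_value_store|KeyValueStore)\.(?:setValue|set_value)\("
)
# Assumption: adjust the suffixes to the languages used in the repository.
SOURCE_SUFFIXES = {".js", ".mjs", ".ts", ".py"}


def find_storage_calls(root):
    """Return (kind, path, line_number, line_text) for every matching call."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in SOURCE_SUFFIXES:
            continue
        if "node_modules" in path.parts:
            continue  # skip vendored dependencies
        text = path.read_text(encoding="utf-8", errors="ignore")
        for number, line in enumerate(text.splitlines(), start=1):
            if DATASET_CALLS.search(line):
                hits.append(("dataset", str(path), number, line.strip()))
            if KV_CALLS.search(line):
                hits.append(("key-value store", str(path), number, line.strip()))
    return hits
```

A line-based search like this only locates call sites. The shape of the pushed object still has to be read from the surrounding code or from the type definitions found in step 8.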


Phase 2: Generate dataset_schema.json

Goal: Create a complete dataset schema with field definitions and display views

File structure

{
    "actorSpecification": 1,
    "fields": {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "type": "object",
        "properties": {
            // ALL output fields here — every field the Actor can produce,
            // not just the ones shown in the overview view
        },
        "required": [],
        "additionalProperties": true
    },
    "views": {
        "overview": {
            "title": "Overview",
            "description": "Most important fields at a glance",
            "transformation": {
                "fields": [
                    // 8-12 most important field names
                ]
            },
            "display": {
                "component": "table",
                "properties": {
                    // Display config for each overview field
                }
            }
        }
    }
}

Consistency with existing schemas

If existing output schemas were found in the repository during Phase 1 (step 5), follow their conventions:

  • Match the description writing style (sentence case vs. lowercase, period vs. no period, etc.)
  • Match the field naming convention (camelCase vs. snake_case) — this must also match the actual keys produced by the Actor code
  • Match the example value style (e.g., date formats, URL patterns, placeholder names)
  • Match the view structure (number of fields in overview, display format choices)
  • Match the JSON formatting (indentation, property ordering, spacing) — all schemas in the same repository must use identical formatting, including standalone Actors

When the Actor code already has well-defined TypeScript interfaces or Python type classes, derive fields directly from those types rather than re-analyzing pushData/push_data calls from scratch. The type definition is the canonical source.
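As an illustration of deriving fields from an existing type, the following sketch maps a Python TypedDict to fields.properties entries. The VideoItem type and the helper names are hypothetical, and the mapping only handles plain scalars, Optional[X], and List[X]. Descriptions and examples still have to be written by hand.

```python
from typing import List, Optional, TypedDict, Union, get_args, get_origin, get_type_hints

# "number" for int mirrors the viewCount example in this document.
JSON_TYPES = {str: "string", int: "number", float: "number", bool: "boolean", dict: "object"}


# Hypothetical output type, standing in for one discovered in Phase 1.
class VideoItem(TypedDict, total=False):
    title: str
    viewCount: int
    isVerified: Optional[bool]
    hashtags: List[str]


def field_schema(annotation):
    """Map one annotation to a schema fragment; every field is nullable."""
    if get_origin(annotation) is Union:
        # Optional[X] is Union[X, None]; null is expressed through "nullable".
        members = [arg for arg in get_args(annotation) if arg is not type(None)]
        if len(members) != 1:
            raise NotImplementedError("this sketch handles Optional[X] only")
        annotation = members[0]
    if get_origin(annotation) is list:
        (item_type,) = get_args(annotation)
        return {"type": "array", "items": {"type": JSON_TYPES[item_type]}, "nullable": True}
    return {"type": JSON_TYPES[annotation], "nullable": True}


def properties_from_typed_dict(typed_dict):
    """Build a fields.properties skeleton from a TypedDict's annotations."""
    return {name: field_schema(hint) for name, hint in get_type_hints(typed_dict).items()}
```

Because the field names come straight from the type, the schema keys cannot drift from the keys the Actor code actually produces.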

Hard rules (no exceptions)

Rule | Detail
All fields in properties | The fields.properties object must contain every field the Actor can output, not just the fields shown in the overview view. The views section selects a subset for display; the properties section must be the complete superset
"nullable": true | On every field, because APIs are unpredictable
"additionalProperties": true | On the top-level fields object AND on every nested object within properties. This is the most commonly missed rule; it must appear at both levels
"required": [] | Always an empty array, on the top-level fields object AND on every nested object within properties
Anonymized examples | No real user IDs, usernames, or content
"type" required with "nullable" | AJV rejects nullable without a type on the same field

Warning — most common mistakes:

  1. Only including fields that appear in the overview view. The fields.properties must list ALL output fields, even if they are not in the views section.
  2. Only adding "required": [] and "additionalProperties": true on nested object-type properties but forgetting them on the top-level fields object. Both levels need them.

Note: nullable is not part of JSON Schema draft-07. It is an OpenAPI-style keyword that AJV supports and that Apify schemas use. Its presence here is intentional and correct.
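The structural rules above can be checked mechanically. The following is a minimal sketch of such a check over a parsed dataset_schema.json. The function name check_hard_rules and the message wording are assumptions, and the sketch does not replace validation by the Apify tooling.

```python
def check_hard_rules(schema):
    """Return a list of violations found in a parsed dataset_schema.json dict."""
    problems = []

    def check_object(node, path):
        # Both keys are required on the top-level fields object and on nested objects.
        if node.get("required") != []:
            problems.append(f'{path}: "required" must be []')
        if node.get("additionalProperties") is not True:
            problems.append(f'{path}: "additionalProperties" must be true')
        for name, field in node.get("properties", {}).items():
            field_path = f"{path}.properties.{name}"
            if "type" not in field:
                problems.append(f'{field_path}: "type" is missing')
            if field.get("nullable") is not True:
                problems.append(f'{field_path}: "nullable" must be true')
            if field.get("type") == "object":
                check_object(field, field_path)

    check_object(schema.get("fields", {}), "fields")
    return problems
```

An empty result means the structural rules hold. Whether the examples are anonymized and the field list is complete still needs a manual review against the source code.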

Field type patterns

String field:

"title": {
    "type": "string",
    "description": "Title of the scraped item",
    "nullable": true,
    "example": "Example Item Title"
}

Number field:

"viewCount": {
    "type": "number",
    "description": "Number of views",
    "nullable": true,
    "example": 15000
}

Boolean field:

"isVerified": {
    "type": "boolean",
    "description": "Whether the account is verified",
    "nullable": true,
    "example": true
}

Array field:

"hashtags": {
    "type": "array",
    "description": "Hashtags associated with the item",
    "items": { "type": "string" },
    "nullable": true,
    "example": ["#example", "#demo"]
}

Nested object field:

"authorInfo": {
    "type": "object",
    "description": "Information about the author",
    "properties": {
        "name": { "type": "string", "nullable": true },
        "url": { "type": "string", "nullable": true }
    },
    "required": [],
    "additionalProperties": true,
    "nullable": true,
    "example": { "name": "Example Author", "url": "https://example.com/author" }
}

Enum field:

"contentType": {
    "type": "string",
    "description": "Type of content",
    "enum": ["article", "video", "image"],
    "nullable": true,
    "example": "article"
}

Union type (e.g., TypeScript ObjectType | string):

"metadata": {
    "type": ["object", "string"],
    "description": "Structured metadata object, or error string if unavailable",
    "nullable": true,
    "example": { "key": "value" }
}

Anonymized example values

Use realistic but generic values. Follow platform ID format conventions:

Field type | Example approach
IDs | Match platform format and length (e.g., 11 chars for YouTube video IDs)
Usernames | "exampleuser", "sampleuser123"
Display names | "Example Channel", "Sample Author"
URLs | Use platform's standard URL format with fake IDs
Dates | "2025-01-15T12:00:00.000Z" (ISO 8601)
Text content | Generic descriptive text, e.g., "This is an example description."

Views section

  • transformation.fields: List 8–12 most important field names (order = column order in UI)
  • display.properties: One entry per overview field with label and format
  • Available formats: "text", "number", "date", "link", "boolean", "image", "array", "object"

Pick fields that give users the most useful at-a-glance summary of the data.
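To make the relationship between transformation.fields and display.properties concrete, here is a small sketch that builds the overview view from an ordered column list. The helper name build_overview_view and the tuple layout are assumptions. The format list is the one given above.

```python
AVAILABLE_FORMATS = {"text", "number", "date", "link", "boolean", "image", "array", "object"}


def build_overview_view(columns):
    """Build the "views" object from ordered (field_name, label, format) tuples."""
    for name, _label, fmt in columns:
        if fmt not in AVAILABLE_FORMATS:
            raise ValueError(f"unknown display format {fmt!r} for field {name!r}")
    return {
        "overview": {
            "title": "Overview",
            "description": "Most important fields at a glance",
            # List order determines the column order in the UI.
            "transformation": {"fields": [name for name, _label, _fmt in columns]},
            "display": {
                "component": "table",
                "properties": {
                    name: {"label": label, "format": fmt} for name, label, fmt in columns
                },
            },
        }
    }
```

Building both parts from one list keeps them in sync, so every field in transformation.fields has exactly one display entry.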


Phase 3: Generate key_value_store_schema.json (if applicable)

Goal: Define key-value store collections if the Actor stores data in the key-value store

Skip this phase if no Actor.setValue() / Actor.set_value() calls were found in Phase 1 (beyond the default INPUT key).

File structure

{
    "actorKeyValueStoreSchemaVersion": 1,
    "title": "<Descriptive title — what the key-value store contains>",
    "description": "<One sentence describing the stored data>",
    "collections": {
        "<collectionName>": {
            "title": "<Human-readable title>",
            "description": "<What this collection contains>",
            "keyPrefix": "<prefix->"
        }
    }
}

How to identify collections

Group the discovered setValue / set_value calls by key pattern:

  1. Fixed keys (e.g., "RESULTS", "summary") — use "key" (exact match)
  2. Dynamic keys with a prefix (e.g., "screenshot-${id}", f"image-{name}") — use "keyPrefix"

Each group becomes a collection.
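The grouping can be sketched as follows. This is an illustrative helper, not part of any Apify library: collections_from_keys is a hypothetical name, the input is the list of key expressions copied from the source code, and the generated names and titles are placeholders to be rewritten by hand.

```python
import re

# Matches the start of a JS template placeholder (${...}) or a Python f-string one ({...}).
PLACEHOLDER = re.compile(r"\$?\{")


def collections_from_keys(key_patterns):
    """Group key expressions into collections using "key" or "keyPrefix"."""
    collections = {}
    for pattern in key_patterns:
        match = PLACEHOLDER.search(pattern)
        if match is None:
            # Fixed key: exact match.
            collections[pattern.lower()] = {"title": pattern.title(), "key": pattern}
            continue
        prefix = pattern[: match.start()]
        if not prefix:
            raise ValueError(f"{pattern!r} has no static prefix; group it manually")
        name = prefix.rstrip("-_").lower()
        collections[name] = {"title": name.title(), "keyPrefix": prefix}
    return collections
```

For example, the keys "RESULTS", "screenshot-${id}", and f"image-{name}" yield one exact-key collection and two prefix collections with the prefixes "screenshot-" and "image-".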

Collection properties

Property | Required | Description
title | Yes | Shown in UI tabs
description | No | Shown in UI tooltips
key | Conditional | Exact key for single-key collections (use key OR keyPrefix, not both)
keyPrefix | Conditional | Prefix for multi-key collections (use key OR keyPrefix, not both)
contentTypes | No | Restrict allowed MIME types (e.g., ["image/jpeg"], ["application/json"])
jsonSchema | No | JSON Schema draft-07 for validating application/json content

Examples

Single file output (e.g., a report):

{
    "actorKeyValueStoreSchemaVersion": 1,
    "title": "Analysis Results",
    "description": "Key-value store containing analysis output",
    "collections": {
        "report": {
            "title": "Report",
            "description": "Final analysis report",
            "key": "REPORT",
            "contentTypes": ["application/json"]
        }
    }
}

Multiple files with prefix (e.g., screenshots):

{
    "actorKeyValueStoreSchemaVersion": 1,
    "title": "Scraped Files",
    "description": "Key-value store containing downloaded files and screenshots",
    "collections": {
        "screenshots": {
            "title": "Screenshots",
            "description": "Page screenshots captured during scraping",
            "keyPrefix": "screenshot-",
            "contentTypes": ["image/png", "image/jpeg"]
        },
        "documents": {
            "title": "Documents",
            "description": "Downloaded document files",
            "keyPrefix": "doc-",
            "contentTypes": ["application/pdf", "text/html"]
        }
    }
}

Phase 4: Generate output_schema.json

Goal: Create the output schema that tells Apify Console where to find results

For most Actors that push data to a dataset, this is a minimal file:

{
    "actorOutputSchemaVersion": 1,
    "title": "<Descriptive title — what the Actor returns>",
    "description": "<One sentence describing the output data>",
    "properties": {
        "dataset": {
            "type": "string",
            "title": "Results",
            "description": "Dataset containing all scraped data",
            "template": "{{links.apiDefaultDatasetUrl}}/items"
        }
    }
}

Critical: Each property entry must include "type": "string" — this is an Apify-specific convention. The Apify meta-validator rejects properties without it (and rejects "type": "object" — only "string" is valid here).

If key_value_store_schema.json was generated in Phase 3, add a second property:

"files": {
    "type": "string",
    "title": "Files",
    "description": "Key-value store containing downloaded files",
    "template": "{{links.apiDefaultKeyValueStoreUrl}}/keys"
}
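Putting the two snippets together, the file can be assembled as in the sketch below. The helper name build_output_schema is hypothetical. The property names, titles, and templates are the ones shown above.

```python
def build_output_schema(title, description, include_key_value_store=False):
    """Assemble the content of output_schema.json as a dict."""
    properties = {
        "dataset": {
            "type": "string",  # every property must be "string"
            "title": "Results",
            "description": "Dataset containing all scraped data",
            "template": "{{links.apiDefaultDatasetUrl}}/items",
        }
    }
    if include_key_value_store:
        # Only added when key_value_store_schema.json was generated in Phase 3.
        properties["files"] = {
            "type": "string",
            "title": "Files",
            "description": "Key-value store containing downloaded files",
            "template": "{{links.apiDefaultKeyValueStoreUrl}}/keys",
        }
    return {
        "actorOutputSchemaVersion": 1,
        "title": title,
        "description": description,
        "properties": properties,
    }
```

Serializing the result with json.dumps(schema, indent=4) matches the four-space indentation used in the examples here. If the repository's existing schemas use a different style, follow that instead.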

Available template variables

  • {{links.apiDefaultDatasetUrl}} — API URL of default dataset
  • {{links.apiDefaultKeyValueStoreUrl}} — API URL of default key-value store
  • {{links.publicRunUrl}} — Public run URL
  • {{links.consoleRunUrl}} — Console run URL
  • {{links.apiRunUrl}} — API run URL
  • {{links.containerRunUrl}} — URL of webserver running inside the run
  • {{run.defaultDatasetId}} — ID of the default dataset
  • {{run.defaultKeyValueStoreId}} — ID of the default key-value store
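The platform resolves these variables itself, so nothing needs to be implemented in the Actor. Purely as an illustration of what a resolved template looks like, the sketch below substitutes variables from a made-up context. The render_template helper and the example URL are assumptions, not Apify code or real values.

```python
import re

# Placeholder values for illustration only; real values are supplied by the platform.
EXAMPLE_CONTEXT = {
    "links": {"apiDefaultDatasetUrl": "https://api.example.com/v2/datasets/EXAMPLE_DATASET_ID"},
    "run": {"defaultDatasetId": "EXAMPLE_DATASET_ID"},
}


def render_template(template, context):
    """Replace each {{dotted.path}} with the value found at that path in context."""

    def lookup(match):
        value = context
        for part in match.group(1).split("."):
            value = value[part]
        return str(value)

    return re.sub(r"\{\{\s*([\w.]+)\s*\}\}", lookup, template)


resolved = render_template("{{links.apiDefaultDatasetUrl}}/items", EXAMPLE_CONTEXT)
```

With the placeholder context, the dataset template resolves to the example dataset URL followed by /items.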

Phase 5: Update actor.json

Goal: Wire the schema files into the Actor configuration

Actions:

  1. Read the current actor.json
  2. Add or update the storages.dataset reference:
    "storages": {
        "dataset": "./dataset_schema.json"
    }
    
  3. If key_value_store_schema.json was generated, add the reference:
    "storages": {
        "dataset": "./dataset_schema.json",
        "keyValueStore": "./key_value_store_schema.json"
    }
    
  4. Add or update the output reference:
    "output": "./output_schema.json"
    
  5. If actor.json had inline storages.dataset or storages.keyValueStore objects (not string paths), migrate their content into the respective schema files and replace the inline objects with file path strings
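The actions above can be sketched as a pure function over the parsed actor.json. The name wire_schemas and the shape of its return value are assumptions. Writing the migrated inline objects into the schema files is left to the caller.

```python
import copy


def wire_schemas(actor_config, has_key_value_store=False):
    """Return (updated_config, migrated) without modifying the input dict.

    migrated maps a schema file name to the inline object that was replaced,
    so the caller can merge that content into the new schema file.
    """
    config = copy.deepcopy(actor_config)
    storages = config.get("storages")
    if not isinstance(storages, dict):
        storages = {}

    targets = [("dataset", "dataset_schema.json")]
    if has_key_value_store:
        targets.append(("keyValueStore", "key_value_store_schema.json"))

    migrated = {}
    for key, file_name in targets:
        existing = storages.get(key)
        if isinstance(existing, dict):
            migrated[file_name] = existing  # inline object, not a path string
        storages[key] = f"./{file_name}"

    config["storages"] = storages
    config["output"] = "./output_schema.json"
    return config, migrated
```

Returning the inline objects separately makes the migration explicit: nothing from the old configuration is dropped silently, and the original dict stays untouched for comparison.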

Phase 6: Review and validate

Goal: Ensure correctness and completeness

Checklist:

  • Every output field from the source code is in dataset_schema.json fields.properties — not just the overview view fields but ALL fields the Actor can produce
  • Every field has "nullable": true
  • The top-level fields object has both "additionalProperties": true and "required": []
  • Every nested object within properties also has "additionalProperties": true and "required": []
  • Every field has a "description" and an "example"
  • All example values are anonymized
  • "type" is present on every field that has "nullable"
  • Views list 8–12 most useful fields with correct display formats
  • output_schema.json has "type": "string" on every property
  • If the key-value store is used: key_value_store_schema.json has collections matching all setValue/set_value calls
  • If the key-value store is used: each collection uses either key or keyPrefix (not both)
  • actor.json references all generated schema files
  • Schema field names match the actual keys in the code (camelCase/snake_case consistency)
  • If existing schemas were found in the repo, the new schema follows their conventions (description style, example format, view structure)
  • Schema fields are derived from existing type definitions (interfaces, TypedDicts, dataclasses) where available — no duplicated or divergent field definitions

Present the generated schemas to the user for review before writing them.


Phase 7: Summary

Goal: Document what was created

Report:

  • Files created or updated
  • Number of fields in the dataset schema
  • Number of collections in the key-value store schema (if generated)
  • Fields selected for the overview view
  • Any fields that need user clarification (ambiguous types, unclear nullability)
  • Suggested next steps (test locally with apify run, verify output tab in Console)
