apify-generate-output-schema

# Generate Actor Output Schema


You are generating output schema files for an Apify Actor. The output schema tells Apify Console how to display run results. You will analyze the Actor's source code, create `dataset_schema.json`, `output_schema.json`, and `key_value_store_schema.json` (if the Actor uses the key-value store), and update `actor.json`.
## Core Principles


- **Analyze code first**: Read the Actor's source to understand what data it actually pushes to the dataset — never guess
- **Every field is nullable**: APIs and websites are unpredictable — always set `"nullable": true`
- **Anonymize examples**: Never use real user IDs, usernames, or personal data in examples
- **Verify against code**: If TypeScript types exist, cross-check the schema against both the type definition AND the code that produces the values
- **Reuse existing patterns**: Before generating schemas, check if other Actors in the same repository already have output schemas — match their structure, naming conventions, description style, and formatting
- **Don't reinvent the wheel**: Reuse existing type definitions, interfaces, and utilities from the codebase instead of creating duplicate definitions


## Phase 1: Discover Actor Structure


**Goal**: Locate the Actor and understand its output

Initial request: $ARGUMENTS

Actions:

1. Create a todo list with all phases
2. Find the `.actor/` directory containing `actor.json`
3. Read `actor.json` to understand the Actor's configuration
4. Check if `dataset_schema.json`, `output_schema.json`, and `key_value_store_schema.json` already exist
5. Search for existing schemas in the repository: look for other `.actor/` directories or schema files (e.g., `**/dataset_schema.json`, `**/output_schema.json`, `**/key_value_store_schema.json`) to learn the repo's conventions — match their description style, field naming, example formatting, and overall structure
6. Find all places where data is pushed to the dataset:
   - JavaScript/TypeScript: search for `Actor.pushData(`, `dataset.pushData(`, `Dataset.pushData(`
   - Python: search for `Actor.push_data(`, `dataset.push_data(`, `Dataset.push_data(`
7. Find all places where data is stored in the key-value store:
   - JavaScript/TypeScript: search for `Actor.setValue(`, `keyValueStore.setValue(`, `KeyValueStore.setValue(`
   - Python: search for `Actor.set_value(`, `key_value_store.set_value(`, `KeyValueStore.set_value(`
8. Find output type definitions — reuse them directly instead of recreating from scratch:
   - TypeScript: look for output type interfaces/types (e.g., in `src/types/` or `src/types/output.ts`). If an interface or type already defines the output shape, derive the schema fields from it — do not create a parallel definition
   - Python: look for TypedDict, dataclass, or Pydantic model definitions. Use the existing field names, types, and docstrings as the source of truth
9. Check for existing shared schema utilities or helper functions in the codebase that handle schema generation or validation — reuse them rather than creating new logic
10. If inline `storages.dataset` or `storages.keyValueStore` config exists in `actor.json`, note it for migration

Present findings to the user: list all discovered dataset output fields, key-value store keys, their types, and where they come from.
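The discovery in steps 6–7 is just a text search. A minimal sketch in Python, using the exact search patterns listed above — the directory layout and file extensions are assumptions, not part of any Apify API:

```python
import re
from pathlib import Path

# Search patterns from steps 6-7: dataset pushes and key-value store writes,
# covering both the JS/TS and Python SDK method names.
PUSH_RE = re.compile(r"\b(?:Actor|dataset|Dataset)\.(?:pushData|push_data)\(")
KV_RE = re.compile(
    r"\b(?:Actor|keyValueStore|KeyValueStore|key_value_store)\.(?:setValue|set_value)\("
)

def find_output_calls(root: str) -> dict[str, list[str]]:
    """Return source file paths grouped by the kind of output call they contain."""
    hits: dict[str, list[str]] = {"dataset": [], "keyValueStore": []}
    for path in sorted(Path(root).rglob("*")):
        if path.suffix not in {".js", ".ts", ".py"} or not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        if PUSH_RE.search(text):
            hits["dataset"].append(str(path))
        if KV_RE.search(text):
            hits["keyValueStore"].append(str(path))
    return hits
```

The returned map tells you which files to read in detail for steps 6–8; it does not replace reading the code around each call.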


## Phase 2: Generate `dataset_schema.json`


**Goal**: Create a complete dataset schema with field definitions and display views

### File structure


```json
{
    "actorSpecification": 1,
    "fields": {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "type": "object",
        "properties": {
            // ALL output fields here — every field the Actor can produce,
            // not just the ones shown in the overview view
        },
        "required": [],
        "additionalProperties": true
    },
    "views": {
        "overview": {
            "title": "Overview",
            "description": "Most important fields at a glance",
            "transformation": {
                "fields": [
                    // 8-12 most important field names
                ]
            },
            "display": {
                "component": "table",
                "properties": {
                    // Display config for each overview field
                }
            }
        }
    }
}
```

### Consistency with existing schemas


If existing output schemas were found in the repository during Phase 1 (step 5), follow their conventions:

- Match the description writing style (sentence case vs. lowercase, period vs. no period, etc.)
- Match the field naming convention (camelCase vs. snake_case) — this must also match the actual keys produced by the Actor code
- Match the example value style (e.g., date formats, URL patterns, placeholder names)
- Match the view structure (number of fields in overview, display format choices)
- Match the JSON formatting (indentation, property ordering, spacing) — all schemas in the same repository must use identical formatting, including standalone Actors

When the Actor code already has well-defined TypeScript interfaces or Python type classes, derive fields directly from those types rather than re-analyzing `pushData`/`push_data` calls from scratch. The type definition is the canonical source.

### Hard rules (no exceptions)


| Rule | Detail |
| --- | --- |
| All fields in `properties` | The `fields.properties` object must contain every field the Actor can output, not just the fields shown in the overview view. The views section selects a subset for display — the `properties` section must be the complete superset |
| `"nullable": true` | On every field — APIs are unpredictable |
| `"additionalProperties": true` | On the top-level `fields` object AND on every nested object within `properties`. This is the most commonly missed rule — it must appear at both levels |
| `"required": []` | Always an empty array — on the top-level `fields` object AND on every nested object within `properties` |
| Anonymized examples | No real user IDs, usernames, or content |
| `"type"` required with `"nullable"` | AJV rejects `nullable` without a `type` on the same field |

Warning — most common mistakes:

1. Only including fields that appear in the overview view. `fields.properties` must list ALL output fields, even if they are not in the `views` section.
2. Only adding `"required": []` and `"additionalProperties": true` on nested object-type properties but forgetting them on the top-level `fields` object. Both levels need them.

Note: `nullable` is an Apify-specific extension to JSON Schema draft-07. It is intentional and correct.
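Because the two-level rule is the most commonly missed one, it is worth checking mechanically. A minimal sketch, assuming the `fields` object has already been loaded from `dataset_schema.json` as a Python dict — this is an illustrative helper, not part of any Apify tooling:

```python
def check_hard_rules(obj: dict, path: str = "fields") -> list[str]:
    """Verify the hard rules on a fields object and, recursively, on every
    nested object-type property. Returns a list of human-readable problems."""
    problems: list[str] = []
    if obj.get("required") != []:
        problems.append(f'{path}: "required" must be an empty array')
    if obj.get("additionalProperties") is not True:
        problems.append(f'{path}: "additionalProperties" must be true')
    for name, prop in obj.get("properties", {}).items():
        where = f"{path}.{name}"
        if prop.get("nullable") is not True:
            problems.append(f'{where}: missing "nullable": true')
        elif "type" not in prop:
            problems.append(f'{where}: "nullable" without "type" (AJV rejects this)')
        if prop.get("type") == "object":
            # Nested objects need "required": [] and "additionalProperties": true too.
            problems.extend(check_hard_rules(prop, where))
    return problems
```

An empty return value means the schema satisfies the structural rules above; it does not check descriptions, examples, or field completeness against the source code.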

### Field type patterns


String field:

```json
"title": {
    "type": "string",
    "description": "Title of the scraped item",
    "nullable": true,
    "example": "Example Item Title"
}
```

Number field:

```json
"viewCount": {
    "type": "number",
    "description": "Number of views",
    "nullable": true,
    "example": 15000
}
```

Boolean field:

```json
"isVerified": {
    "type": "boolean",
    "description": "Whether the account is verified",
    "nullable": true,
    "example": true
}
```

Array field:

```json
"hashtags": {
    "type": "array",
    "description": "Hashtags associated with the item",
    "items": { "type": "string" },
    "nullable": true,
    "example": ["#example", "#demo"]
}
```

Nested object field:

```json
"authorInfo": {
    "type": "object",
    "description": "Information about the author",
    "properties": {
        "name": { "type": "string", "nullable": true },
        "url": { "type": "string", "nullable": true }
    },
    "required": [],
    "additionalProperties": true,
    "nullable": true,
    "example": { "name": "Example Author", "url": "https://example.com/author" }
}
```

Enum field:

```json
"contentType": {
    "type": "string",
    "description": "Type of content",
    "enum": ["article", "video", "image"],
    "nullable": true,
    "example": "article"
}
```

Union type (e.g., TypeScript `ObjectType | string`):

```json
"metadata": {
    "type": ["object", "string"],
    "description": "Structured metadata object, or error string if unavailable",
    "nullable": true,
    "example": { "key": "value" }
}
```

### Anonymized example values


Use realistic but generic values. Follow platform ID format conventions:

| Field type | Example approach |
| --- | --- |
| IDs | Match platform format and length (e.g., 11 chars for YouTube video IDs) |
| Usernames | `"exampleuser"`, `"sampleuser123"` |
| Display names | `"Example Channel"`, `"Sample Author"` |
| URLs | Use the platform's standard URL format with fake IDs |
| Dates | `"2025-01-15T12:00:00.000Z"` (ISO 8601) |
| Text content | Generic descriptive text, e.g., `"This is an example description."` |

### Views section


- `transformation.fields`: list the 8–12 most important field names (order = column order in the UI)
- `display.properties`: one entry per overview field, with `label` and `format`
- Available formats: `"text"`, `"number"`, `"date"`, `"link"`, `"boolean"`, `"image"`, `"array"`, `"object"`

Pick fields that give users the most useful at-a-glance summary of the data.
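Combining these rules with the field patterns above, a hypothetical filled-in `views` section might look like this (the field names and labels are illustrative, reusing the earlier examples, not taken from any real Actor):

```json
"views": {
    "overview": {
        "title": "Overview",
        "description": "Most important fields at a glance",
        "transformation": {
            "fields": ["title", "authorInfo", "viewCount", "isVerified", "contentType"]
        },
        "display": {
            "component": "table",
            "properties": {
                "title": { "label": "Title", "format": "text" },
                "authorInfo": { "label": "Author", "format": "object" },
                "viewCount": { "label": "Views", "format": "number" },
                "isVerified": { "label": "Verified", "format": "boolean" },
                "contentType": { "label": "Type", "format": "text" }
            }
        }
    }
}
```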


## Phase 3: Generate `key_value_store_schema.json` (if applicable)


**Goal**: Define key-value store collections if the Actor stores data in the key-value store

Skip this phase if no `Actor.setValue()`/`Actor.set_value()` calls were found in Phase 1 (beyond the default `INPUT` key).

### File structure


```json
{
    "actorKeyValueStoreSchemaVersion": 1,
    "title": "<Descriptive title — what the key-value store contains>",
    "description": "<One sentence describing the stored data>",
    "collections": {
        "<collectionName>": {
            "title": "<Human-readable title>",
            "description": "<What this collection contains>",
            "keyPrefix": "<prefix->"
        }
    }
}
```

### How to identify collections


Group the discovered `setValue`/`set_value` calls by key pattern:

1. Fixed keys (e.g., `"RESULTS"`, `"summary"`) — use `"key"` (exact match)
2. Dynamic keys with a prefix (e.g., `"screenshot-${id}"`, `f"image-{name}"`) — use `"keyPrefix"`

Each group becomes a collection.
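The fixed-vs-prefixed split can be sketched as a small classifier. This is an illustrative helper, assuming the key expressions were collected as raw string literals during Phase 1 — it treats a JS `${...}` or Python f-string `{...}` marker as the start of the dynamic part:

```python
import re

def classify_key(key_expr: str) -> tuple[str, str]:
    """Classify a setValue/set_value key expression as ("key", exact_key)
    or ("keyPrefix", prefix), mirroring rules 1-2 above."""
    # Match the literal text before a "${" (JS template) or "{" (Python f-string).
    m = re.match(r"^([^${}]*?)(?:\$\{|\{)", key_expr)
    if m and m.group(1):
        return ("keyPrefix", m.group(1))
    return ("key", key_expr)
```

So `"screenshot-${id}"` classifies as a `keyPrefix` collection with prefix `screenshot-`, while `"RESULTS"` becomes an exact-`key` collection.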

### Collection properties


| Property | Required | Description |
| --- | --- | --- |
| `title` | Yes | Shown in UI tabs |
| `description` | No | Shown in UI tooltips |
| `key` | Conditional | Exact key for single-key collections (use `key` OR `keyPrefix`, not both) |
| `keyPrefix` | Conditional | Prefix for multi-key collections (use `key` OR `keyPrefix`, not both) |
| `contentTypes` | No | Restrict allowed MIME types (e.g., `["image/jpeg"]`, `["application/json"]`) |
| `jsonSchema` | No | JSON Schema draft-07 for validating `application/json` content |

### Examples


Single file output (e.g., a report):

```json
{
    "actorKeyValueStoreSchemaVersion": 1,
    "title": "Analysis Results",
    "description": "Key-value store containing analysis output",
    "collections": {
        "report": {
            "title": "Report",
            "description": "Final analysis report",
            "key": "REPORT",
            "contentTypes": ["application/json"]
        }
    }
}
```

Multiple files with prefix (e.g., screenshots):

```json
{
    "actorKeyValueStoreSchemaVersion": 1,
    "title": "Scraped Files",
    "description": "Key-value store containing downloaded files and screenshots",
    "collections": {
        "screenshots": {
            "title": "Screenshots",
            "description": "Page screenshots captured during scraping",
            "keyPrefix": "screenshot-",
            "contentTypes": ["image/png", "image/jpeg"]
        },
        "documents": {
            "title": "Documents",
            "description": "Downloaded document files",
            "keyPrefix": "doc-",
            "contentTypes": ["application/pdf", "text/html"]
        }
    }
}
```


## Phase 4: Generate `output_schema.json`


**Goal**: Create the output schema that tells Apify Console where to find results

For most Actors that push data to a dataset, this is a minimal file:

```json
{
    "actorOutputSchemaVersion": 1,
    "title": "<Descriptive title — what the Actor returns>",
    "description": "<One sentence describing the output data>",
    "properties": {
        "dataset": {
            "type": "string",
            "title": "Results",
            "description": "Dataset containing all scraped data",
            "template": "{{links.apiDefaultDatasetUrl}}/items"
        }
    }
}
```

Critical: each property entry must include `"type": "string"` — this is an Apify-specific convention. The Apify meta-validator rejects properties without it (and rejects `"type": "object"` — only `"string"` is valid here).

If `key_value_store_schema.json` was generated in Phase 3, add a second property:

```json
"files": {
    "type": "string",
    "title": "Files",
    "description": "Key-value store containing downloaded files",
    "template": "{{links.apiDefaultKeyValueStoreUrl}}/keys"
}
```

### Available template variables


- `{{links.apiDefaultDatasetUrl}}` — API URL of the default dataset
- `{{links.apiDefaultKeyValueStoreUrl}}` — API URL of the default key-value store
- `{{links.publicRunUrl}}` — public run URL
- `{{links.consoleRunUrl}}` — Console run URL
- `{{links.apiRunUrl}}` — API run URL
- `{{links.containerRunUrl}}` — URL of the web server running inside the run
- `{{run.defaultDatasetId}}` — ID of the default dataset
- `{{run.defaultKeyValueStoreId}}` — ID of the default key-value store


## Phase 5: Update `actor.json`


**Goal**: Wire the schema files into the Actor configuration

Actions:

1. Read the current `actor.json`
2. Add or update the `storages.dataset` reference:

    ```json
    "storages": {
        "dataset": "./dataset_schema.json"
    }
    ```

3. If `key_value_store_schema.json` was generated, add the reference:

    ```json
    "storages": {
        "dataset": "./dataset_schema.json",
        "keyValueStore": "./key_value_store_schema.json"
    }
    ```

4. Add or update the `output` reference:

    ```json
    "output": "./output_schema.json"
    ```

5. If `actor.json` had inline `storages.dataset` or `storages.keyValueStore` objects (not string paths), migrate their content into the respective schema files and replace the inline objects with file path strings


## Phase 6: Review and Validate


**Goal**: Ensure correctness and completeness

Checklist:

- Every output field from the source code is in `dataset_schema.json` `fields.properties` — not just the overview view fields but ALL fields the Actor can produce
- Every field has `"nullable": true`
- The top-level `fields` object has both `"additionalProperties": true` and `"required": []`
- Every nested object within `properties` also has `"additionalProperties": true` and `"required": []`
- Every field has a `"description"` and an `"example"`
- All example values are anonymized
- `"type"` is present on every field that has `"nullable"`
- Views list the 8–12 most useful fields with correct display formats
- `output_schema.json` has `"type": "string"` on every property
- If the key-value store is used: `key_value_store_schema.json` has collections matching all `setValue`/`set_value` calls
- If the key-value store is used: each collection uses either `key` or `keyPrefix` (not both)
- `actor.json` references all generated schema files
- Schema field names match the actual keys in the code (camelCase/snake_case consistency)
- If existing schemas were found in the repo, the new schema follows their conventions (description style, example format, view structure)
- Schema fields are derived from existing type definitions (interfaces, TypedDicts, dataclasses) where available — no duplicated or divergent field definitions

Present the generated schemas to the user for review before writing them.


## Phase 7: Summary


**Goal**: Document what was created

Report:

- Files created or updated
- Number of fields in the dataset schema
- Number of collections in the key-value store schema (if generated)
- Fields selected for the overview view
- Any fields that need user clarification (ambiguous types, unclear nullability)
- Suggested next steps (test locally with `apify run`, verify the Output tab in Console)