apify-sdk-integration

Apify SDK Integration

Add Apify Actor execution to an existing application. This skill covers the apify-client package for JS/TS and Python, plus the REST API for other languages.

When to Use This Skill

  • Adding web scraping or automation to an existing app
  • Calling Apify Actors programmatically from application code
  • Building a product that uses Apify as a backend service
  • Integrating Actor results into a data pipeline

Critical: Package Naming

apify-client is the API client for calling Actors from your app. apify is the SDK for building Actors (wrong package for this use case). Always install apify-client. Never install apify for integration work.

Prerequisites

The user needs an APIFY_TOKEN. Direct them to Console > Settings > Integrations at https://console.apify.com/settings/integrations to create one. If they don't have an account: https://console.apify.com/sign-up (free, no credit card).
Store the token securely in an environment variable or a secrets manager; never hardcode it.
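One way to enforce the "never hardcoded" rule is to read the token in a single place and fail early when it is missing. The following is a minimal sketch; load_apify_token is a hypothetical helper name, not part of apify-client.

```python
import os


def load_apify_token(env=None):
    """Return the Apify API token from the environment, or raise if it is missing.

    load_apify_token is an illustrative helper, not an apify-client function.
    """
    env = os.environ if env is None else env
    token = env.get("APIFY_TOKEN", "").strip()
    if not token:
        # Fail early with an actionable message instead of a 401 later on.
        raise RuntimeError(
            "APIFY_TOKEN is not set. Create a token at "
            "https://console.apify.com/settings/integrations and export it."
        )
    return token
```

The returned value can then be passed as the token argument when constructing the client.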

Finding the Right Actor

Before writing integration code, find the Actor that fits the user's needs. Use the MCP tools if available:
  • search-actors: search the Apify Store by keyword
  • fetch-actor-details: get the Actor's input schema, output format, and pricing
Alternatively, browse https://apify.com/store. Append .md to any Actor's Store URL to get its docs in markdown.

JavaScript / TypeScript

Install

bash
npm install apify-client

Synchronous Execution (wait for results)

typescript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

const run = await client.actor('apify/web-scraper').call({
    startUrls: [{ url: 'https://example.com' }],
    maxPagesPerCrawl: 10,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
.call() blocks until the Actor finishes. Use it for short-running Actors (under a few minutes).

Asynchronous Execution (start and poll/retrieve later)

typescript
const run = await client.actor('apify/web-scraper').start({
    startUrls: [{ url: 'https://example.com' }],
});

// Poll for completion
const finishedRun = await client.run(run.id).waitForFinish();

// Retrieve results
const { items } = await client.dataset(finishedRun.defaultDatasetId).listItems();
Use .start() + .waitForFinish() for long-running Actors or when you need the run ID immediately.

Retrieving Results

typescript
// Dataset items (structured data from pushData)
const { items } = await client.dataset(run.defaultDatasetId).listItems({
    limit: 100,
    offset: 0,
});

// Key-value store (files, screenshots, etc.)
const record = await client.keyValueStore(run.defaultKeyValueStoreId).getRecord('OUTPUT');

Error Handling

typescript
try {
    const run = await client.actor('apify/web-scraper').call(input);

    if (run.status !== 'SUCCEEDED') {
        const log = await client.log(run.id).get();
        throw new Error(`Actor failed with status ${run.status}: ${log}`);
    }

    const { items } = await client.dataset(run.defaultDatasetId).listItems();
} catch (error) {
    if (error.message?.includes('not found')) {
        // Actor ID is wrong or Actor was deleted
    } else if (error.statusCode === 401) {
        // Invalid or missing APIFY_TOKEN
    }
    throw error;
}

Python

Install

bash
pip install apify-client

Synchronous Execution

python
from apify_client import ApifyClient
import os

client = ApifyClient(token=os.environ['APIFY_TOKEN'])

run = client.actor('apify/web-scraper').call(run_input={
    'startUrls': [{'url': 'https://example.com'}],
    'maxPagesPerCrawl': 10,
})

items = client.dataset(run['defaultDatasetId']).list_items().items

Asynchronous Execution

python
run = client.actor('apify/web-scraper').start(run_input={
    'startUrls': [{'url': 'https://example.com'}],
})

# Poll for completion
finished_run = client.run(run['id']).wait_for_finish()
items = client.dataset(finished_run['defaultDatasetId']).list_items().items
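The JavaScript error-handling example checks the run status before reading results; a comparable check can be sketched for Python. This assumes the run is a dict with 'id' and 'status' keys, as in the examples above. require_succeeded is a hypothetical helper name, not part of apify-client.

```python
def require_succeeded(run):
    """Return the run unchanged if it succeeded; otherwise raise with its status.

    require_succeeded is an illustrative helper, not an apify-client function.
    It assumes `run` is a dict with 'id' and 'status' keys, or None.
    """
    if run is None:
        # Assumption: a missing run is reported as None rather than an exception.
        raise RuntimeError("Run not found")
    status = run.get("status")
    if status != "SUCCEEDED":
        raise RuntimeError(f"Actor run {run.get('id')} ended with status {status}")
    return run
```

It could wrap the polling call, for example finished_run = require_succeeded(client.run(run['id']).wait_for_finish()), so that dataset items are only read from a successful run.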

Async Client (asyncio)

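python
import asyncio
import os

from apify_client import ApifyClientAsync


async def main():
    client = ApifyClientAsync(token=os.environ['APIFY_TOKEN'])

    run = await client.actor('apify/web-scraper').call(run_input={
        'startUrls': [{'url': 'https://example.com'}],
    })

    # list_items() is a coroutine on the async client, so await it before reading .items
    return (await client.dataset(run['defaultDatasetId']).list_items()).items


items = asyncio.run(main())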

REST API (Any Language)

For languages without an official client, use the REST API directly.

Start a Run

POST https://api.apify.com/v2/acts/{actorId}/runs
Authorization: Bearer <APIFY_TOKEN>
Content-Type: application/json

{ "startUrls": [{ "url": "https://example.com" }] }

Get Run Status

GET https://api.apify.com/v2/acts/{actorId}/runs/{runId}
Authorization: Bearer <APIFY_TOKEN>

Get Dataset Items

GET https://api.apify.com/v2/datasets/{datasetId}/items?format=json
Authorization: Bearer <APIFY_TOKEN>
Full API reference: https://docs.apify.com/api/v2
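To show how the "Start a Run" request above maps onto code, here is a minimal sketch that builds the request with Python's standard library, without sending it. build_start_run_request and API_BASE are hypothetical names. The sketch assumes the REST API expects Actor IDs in the form username~actor-name rather than username/actor-name.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_start_run_request(actor_id, token, run_input):
    """Build (but do not send) the POST request that starts an Actor run.

    Illustrative helper only; the names here are not part of any Apify package.
    """
    # Assumption: REST paths address Actors as "username~actor-name".
    url = f"{API_BASE}/acts/{actor_id.replace('/', '~')}/runs"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),  # Actor input is the JSON body
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the result to urllib.request.urlopen() would send it. Building the request separately keeps the URL and header logic testable without network access.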

Best Practices

  • Set timeouts: Pass timeoutSecs in the Actor input or use waitSecs on .call() to avoid indefinite waits.
  • Paginate large datasets: Use limit and offset when retrieving dataset items. Default limit is 250K items.
  • Reuse clients: Create one ApifyClient instance and reuse it across calls.
  • Handle Actor-specific input: Every Actor has its own input schema. Use the fetch-actor-details MCP tool or append .md to the Actor's Store URL to get the schema before constructing input.
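The pagination advice can be sketched as a generator that walks a dataset page by page with limit and offset. iter_dataset_items and its default page_size are hypothetical choices. The sketch assumes a dataset client whose list_items(offset=..., limit=...) returns an object with an items list, as in the Python examples above.

```python
def iter_dataset_items(dataset_client, page_size=1000):
    """Yield every item of a dataset, fetching at most page_size items per request.

    Illustrative helper; page_size=1000 is an arbitrary default, not an Apify limit.
    """
    offset = 0
    while True:
        page = dataset_client.list_items(offset=offset, limit=page_size)
        if not page.items:
            # An empty page means the offset has moved past the last item.
            break
        yield from page.items
        offset += len(page.items)
```

A caller might loop over iter_dataset_items(client.dataset(run['defaultDatasetId'])) to process results incrementally instead of holding the whole dataset in memory.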

Documentation

If the Apify MCP server is available, use the search-apify-docs and fetch-apify-docs tools for contextual documentation lookups during development.