Affinda — AI Document Processing Platform

Affinda extracts structured data from documents (invoices, resumes, receipts, contracts, and any custom document type) using machine learning. The API turns uploaded files into clean JSON. Affinda has processed over 250 million documents for 500+ organisations in 40 countries.

Full documentation: https://docs.affinda.com
OpenAPI spec: https://api.affinda.com/static/v3/api_spec.yaml
Support: support@affinda.com

Core Concepts

Concept	Description
Organization	Top-level account. Contains users, billing, document types, and workspaces.
Workspace	Logical container for documents. Scopes permissions, webhooks, and processing settings.
Document Type	A model configuration defining how a specific kind of document is parsed (invoice, resume, custom).
Document	An uploaded file (PDF, image, DOCX, etc.) plus its extracted data and metadata.

The workflow is: Upload -> Pre-process -> Split -> Classify -> Extract -> Validate -> Export.

API Basics

Base URLs

Region	API Base URL	App URL
Australia (Global)	`https://api.affinda.com`	`https://app.affinda.com`
United States	`https://api.us1.affinda.com`	`https://app.us1.affinda.com`
European Union	`https://api.eu1.affinda.com`	`https://app.eu1.affinda.com`

Use the base URL matching the region where the user's account was created.
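When one codebase serves accounts in several regions, a single lookup avoids hard-coding a host. This is a minimal sketch. The region keys (`"au"`, `"us"`, `"eu"`) and the function name are illustrative choices, not part of any Affinda SDK. The URLs are the ones in the table.

```python
# Hypothetical region keys; (API base URL, app URL) pairs copied from the table above.
REGION_URLS = {
    "au": ("https://api.affinda.com", "https://app.affinda.com"),  # Australia (Global)
    "us": ("https://api.us1.affinda.com", "https://app.us1.affinda.com"),
    "eu": ("https://api.eu1.affinda.com", "https://app.eu1.affinda.com"),
}


def region_urls(region: str) -> tuple:
    """Return (api_base_url, app_url) for the region where the account was created."""
    try:
        return REGION_URLS[region.strip().lower()]
    except KeyError:
        raise ValueError(f"unknown Affinda region: {region!r}") from None
```

Store the region with each account at sign-up, and resolve the URLs from it when building a client.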

Authentication

All requests require a Bearer token:

```
Authorization: Bearer <API_KEY>
```

API keys are per-user and are managed at Settings -> API Keys in the Affinda dashboard. Each user can have up to 3 keys, and each key can have a custom name and an expiry date. A key is shown only once, at creation, so store it securely.
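Every call carries the same header, so it is convenient to build it in one place. This is a minimal standard-library sketch, assuming the key is exported as `AFFINDA_API_KEY` (the variable the model generators on this page also read). `authed_request` is a hypothetical helper. `/v3/workspaces` is the list endpoint from the Key API Methods table.

```python
import os
import urllib.request
from typing import Optional


def authed_request(base_url: str, path: str, api_key: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a GET request carrying the Affinda Bearer token."""
    key = api_key or os.environ.get("AFFINDA_API_KEY")
    if not key:
        raise RuntimeError("Set AFFINDA_API_KEY or pass api_key explicitly")
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        headers={"Authorization": f"Bearer {key}", "Accept": "application/json"},
        method="GET",
    )
```

Passing the result to `urllib.request.urlopen` sends it. Reading the key from the environment keeps it out of source control, which matters because it cannot be viewed again after creation.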

Rate Limits and File Constraints

High-priority queue: 30 documents/minute (exceeding this returns `429`)
Low-priority queue: No submission limit (set `lowPriority: true`)
Max file size: 20 MB (5 MB for resumes)
Default page limit: 20 pages per document (can be increased on request)
Supported formats: PDF, DOC, DOCX, XLSX, ODT, RTF, TXT, HTML, PNG, JPG, TIFF, JPEG
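Checking these constraints on the client avoids spending a request on a file that will be rejected. Backing off after a `429` keeps bursts within the high-priority limit. This is a minimal sketch with the limits copied from the list above, under several assumptions:

- 1 MB is treated as 1,024 x 1,024 bytes.
- The page limit is not checked.
- The extension check is a simplification, since the server has the final say.
- `preflight_check` and `backoff_delays` are hypothetical helpers, not SDK functions.

```python
from pathlib import Path

# Limits from the list above; 1 MB = 1,024 * 1,024 bytes is an assumption.
MAX_BYTES = 20 * 1024 * 1024
MAX_RESUME_BYTES = 5 * 1024 * 1024
SUPPORTED = {".pdf", ".doc", ".docx", ".xlsx", ".odt", ".rtf", ".txt",
             ".html", ".png", ".jpg", ".jpeg", ".tiff"}


def preflight_check(path: Path, size_bytes: int, is_resume: bool = False) -> list:
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    if path.suffix.lower() not in SUPPORTED:
        problems.append(f"unsupported format: {path.suffix or '(none)'}")
    limit = MAX_RESUME_BYTES if is_resume else MAX_BYTES
    if size_bytes > limit:
        problems.append(f"file is {size_bytes} bytes; limit is {limit}")
    return problems


def backoff_delays(max_retries: int = 5, base: float = 2.0, cap: float = 60.0) -> list:
    """Exponential delays (seconds) to wait between retries after a 429 response."""
    return [min(cap, base * 2 ** attempt) for attempt in range(max_retries)]
```

For large batches, submitting with `lowPriority: true` sidesteps the per-minute limit entirely, so the backoff path is mainly for interactive, high-priority traffic.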

Client Libraries

Python (recommended)

```bash
pip install affinda
```

```python
from pathlib import Path
from affinda import AffindaAPI, TokenCredential

credential = TokenCredential(token="YOUR_API_KEY")
client = AffindaAPI(credential=credential)

with Path("invoice.pdf").open("rb") as f:
    doc = client.create_document(file=f, workspace="YOUR_WORKSPACE_ID")

print(doc.data)  # Extracted JSON
```

GitHub: https://github.com/affinda/affinda-python
PyPI: https://pypi.org/project/affinda/

TypeScript / JavaScript (recommended)

```bash
npm install @affinda/affinda
```

```typescript
import { AffindaAPI, AffindaCredential } from "@affinda/affinda";
import * as fs from "fs";

const credential = new AffindaCredential("YOUR_API_KEY");
const client = new AffindaAPI(credential);

const doc = await client.createDocument({
  file: fs.createReadStream("invoice.pdf"),
  workspace: "YOUR_WORKSPACE_ID",
});

console.log(doc.data); // Extracted JSON
```

GitHub: https://github.com/affinda/affinda-typescript
npm: https://www.npmjs.com/package/@affinda/affinda

Other Libraries

.NET: `dotnet add package Affinda.API` -- GitHub
Java: Maven repository -- GitHub

Note: The .NET and Java libraries may lag behind the Python and TypeScript libraries in feature parity.

Direct HTTP (cURL)

```bash
curl -X POST https://api.affinda.com/v3/documents \
  -H "Authorization: Bearer $AFFINDA_API_KEY" \
  -F "file=@invoice.pdf" \
  -F "workspace=YOUR_WORKSPACE_ID"
```
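Without an SDK, the same request can be assembled with the Python standard library. This is a minimal sketch that mirrors the cURL call field for field, with `file` and `workspace` sent as multipart parts. `build_upload_request` is a hypothetical helper. The request is only constructed here; `urllib.request.urlopen(req)` would send it. The file name is inserted without escaping, so the sketch assumes a plain name without quotes.

```python
import uuid
import urllib.request


def build_upload_request(base_url: str, api_key: str, file_name: str,
                         file_bytes: bytes, workspace: str) -> urllib.request.Request:
    """Build a multipart/form-data POST equivalent to the cURL example (not sent)."""
    boundary = uuid.uuid4().hex
    delimiter = b"--" + boundary.encode()
    parts = [
        # Text part: the workspace identifier
        delimiter,
        b'Content-Disposition: form-data; name="workspace"',
        b"",
        workspace.encode(),
        # File part: the document bytes
        delimiter,
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"'.encode(),
        b"Content-Type: application/octet-stream",
        b"",
        file_bytes,
        delimiter + b"--",  # closing delimiter
        b"",
    ]
    return urllib.request.Request(
        base_url.rstrip("/") + "/v3/documents",
        data=b"\r\n".join(parts),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )
```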

Structured Outputs (Type-Safe Responses)

This is the recommended approach for building robust integrations. Affinda can generate typed models from your document type configuration, giving you auto-completion, validation, and type safety.

Python -- Pydantic Models

Generate Pydantic v2 models that match your document type's field schema:

```bash
# Set your API key (or export AFFINDA_API_KEY)
python -m affinda generate_models --workspace-id=YOUR_WORKSPACE_ID
```

This creates a `./affinda_models/` directory with one `.py` file per document type. Each file contains Pydantic `BaseModel` classes with all your configured fields as typed, optional attributes.

**Use the generated models when calling the API:**

```python
from pathlib import Path
from affinda import AffindaAPI, TokenCredential
from affinda_models.invoice import Invoice  # Generated model

credential = TokenCredential(token="YOUR_API_KEY")
client = AffindaAPI(credential=credential)

with Path("invoice.pdf").open("rb") as f:
    doc = client.create_document(
        file=f,
        workspace="YOUR_WORKSPACE_ID",
        data_model=Invoice,  # Enables Pydantic validation
    )

# doc.parsed is a typed Invoice instance
print(doc.parsed.invoice_number)
print(doc.parsed.total_amount)

# doc.data is still available as raw JSON
print(doc.data)
```

**Handling validation errors gracefully:**

```python
with Path("invoice.pdf").open("rb") as f:
    doc = client.create_document(
        file=f,
        workspace="YOUR_WORKSPACE_ID",
        data_model=Invoice,
        ignore_validation_errors=True,  # Don't raise on schema mismatch
    )

if doc.parsed:
    print(doc.parsed.invoice_number)  # Type-safe access
else:
    print("Validation failed, falling back to raw data")
    print(doc.data)
```

CLI options:

```bash
python -m affinda generate_models --workspace-id=ID        # All types in a workspace
python -m affinda generate_models --document-type-id=ID    # Single document type
python -m affinda generate_models --organization-id=ID     # All types in an org
python -m affinda generate_models --output-dir=./my_models # Custom output path
python -m affinda generate_models --help                   # All options
```

TypeScript -- Generated Interfaces

Generate TypeScript interfaces that match your document type's field schema:

```bash
# Set your API key (or export AFFINDA_API_KEY)
npm exec affinda-generate-interfaces -- --workspace-id=YOUR_WORKSPACE_ID
```

This creates an `./affinda-interfaces/` directory with one `.ts` file per document type. Each file contains TypeScript interfaces with all your configured fields.

**Use the generated interfaces for type-safe access:**

```typescript
import { AffindaAPI, AffindaCredential } from "@affinda/affinda";
import * as fs from "fs";
import { Invoice } from "./affinda-interfaces/Invoice";

const credential = new AffindaCredential("YOUR_API_KEY");
const client = new AffindaAPI(credential);

const doc = await client.createDocument({
  file: fs.createReadStream("invoice.pdf"),
  workspace: "YOUR_WORKSPACE_ID",
});

const parsed = doc.data as Invoice;
console.log(parsed.invoiceNumber);  // Type-safe access
console.log(parsed.totalAmount);
```

CLI options:

```bash
npm exec affinda-generate-interfaces -- --workspace-id=ID       # All types in workspace
npm exec affinda-generate-interfaces -- --document-type-id=ID   # Single document type
npm exec affinda-generate-interfaces -- --output-dir=./types    # Custom output path
npm exec affinda-generate-interfaces -- --help                  # All options
```

Why Use Structured Outputs?

Type safety: Catch field name typos and type mismatches at compile/lint time
Auto-completion: IDE support for all extracted fields
Validation: Pydantic automatically validates the API response structure
Schema-driven: Models stay in sync with your document type configuration -- regenerate after schema changes
Documentation as code: The generated models serve as living documentation of your extraction schema

Document Upload Options

There are three patterns for submitting documents and retrieving results:

1. Synchronous (simplest)

Upload and block until parsing completes. The response contains the extracted data.

```python
doc = client.create_document(file=f, workspace="WORKSPACE_ID")
# wait defaults to True -- blocks until ready
print(doc.data)
```

**Best for**: Interactive apps, low volume, quick prototyping.
**Limitation**: Can time out on large or complex documents.

2. Asynchronous with Polling

Upload with `wait=false`, receive a document ID, then poll `GET /v3/documents/{id}` until `ready` is `true`.

```python
import time
doc = client.create_document(file=f, workspace="WORKSPACE_ID", wait=False)
# doc.data is empty -- poll until ready (or until parsing has failed)
while not (doc.meta.ready or doc.meta.failed):
    time.sleep(5)  # polling interval in seconds; tune for your volume
    doc = client.get_document(doc.meta.identifier)
```

**Best for**: Batch processing, large documents, high volume.
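Polling at a fixed interval with no upper bound can wait forever on a stuck document. This slightly more defensive sketch adds a timeout and doubles the interval between polls. `wait_until_ready` is a hypothetical helper. `fetch` stands for any zero-argument callable that returns a fresh document, for example `lambda: client.get_document(identifier)`. The sketch assumes the returned object exposes `meta.ready`, `meta.failed`, and `meta.identifier`, mirroring the document-level metadata. `sleep` and `clock` are injectable so the loop can be tested without waiting.

```python
import time
from typing import Any, Callable


def wait_until_ready(fetch: Callable[[], Any], timeout: float = 300.0,
                     initial_interval: float = 2.0, max_interval: float = 30.0,
                     sleep: Callable[[float], None] = time.sleep,
                     clock: Callable[[], float] = time.monotonic) -> Any:
    """Poll fetch() until the document is ready; raise on failure or timeout."""
    deadline = clock() + timeout
    interval = initial_interval
    while True:
        doc = fetch()
        if doc.meta.failed:  # checked first: a failed document is finished but unusable
            raise RuntimeError(f"parsing failed for {doc.meta.identifier}")
        if doc.meta.ready:
            return doc
        if clock() + interval > deadline:
            raise TimeoutError(f"document not ready after {timeout} seconds")
        sleep(interval)
        interval = min(max_interval, interval * 2)  # back off between polls
```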

3. Asynchronous with Webhooks (recommended for production)

Upload the document, then receive a webhook notification when processing completes. This is the most efficient pattern for production systems.

```python
# 1. Upload
doc = client.create_document(file=f, workspace="WORKSPACE_ID", wait=False)

# 2. Receive webhook at your endpoint when ready

# 3. Fetch full data
doc = client.get_document(identifier_from_webhook)
```

**Best for**: Real-time workflows, event-driven architectures, production systems.

See the [Webhooks section](#webhooks) below for setup details.

Upload Parameters

Parameter	Type	Description
`file`	binary	The document file. Mutually exclusive with `url`.
`url`	string	URL to download and process. Mutually exclusive with `file`.
`workspace`	string	Workspace identifier (required).
`documentType`	string	Document type identifier (optional; setting it skips classification).
`wait`	boolean	`true` (default): block until done. `false`: return immediately.
`customIdentifier`	string	Your internal ID for the document.
`expiryTime`	ISO-8601	Auto-delete the document at this time.
`rejectDuplicates`	boolean	Reject the upload if it duplicates an existing document.
`lowPriority`	boolean	Route to the low-priority queue (no rate limit).
`compact`	boolean	Return a compact response (only with `wait=true`).
`deleteAfterParse`	boolean	Delete data after parsing (requires `wait=true`).
`enableValidationTool`	boolean	Make the document viewable in the validation UI. Set `false` for speed.
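A few of these parameters constrain each other. This minimal sketch assembles the non-file form fields and enforces the rules stated in the table:

- `workspace` is required.
- Exactly one of `file` and `url` must be supplied.
- `deleteAfterParse` requires `wait=true`.

`upload_fields` and its keyword names are illustrative. Serializing booleans as the strings `"true"`/`"false"` is an assumption about how the multipart endpoint reads them. `expiryTime`, `rejectDuplicates`, and `enableValidationTool` are left out for brevity.

```python
from typing import Optional


def upload_fields(workspace: str, *, has_file: bool = False, url: Optional[str] = None,
                  wait: bool = True, compact: bool = False,
                  delete_after_parse: bool = False, low_priority: bool = False,
                  document_type: Optional[str] = None,
                  custom_identifier: Optional[str] = None) -> dict:
    """Build the non-file form fields for POST /v3/documents, rejecting invalid combinations."""
    if not workspace:
        raise ValueError("workspace is required")
    if has_file == (url is not None):
        raise ValueError("provide exactly one of file or url")
    if delete_after_parse and not wait:
        raise ValueError("deleteAfterParse requires wait=true")

    fields = {"workspace": workspace, "wait": "true" if wait else "false"}
    if url is not None:
        fields["url"] = url
    if document_type:
        fields["documentType"] = document_type  # known type: classification is skipped
    if custom_identifier:
        fields["customIdentifier"] = custom_identifier
    if compact:
        fields["compact"] = "true"  # only has an effect when wait=true
    if delete_after_parse:
        fields["deleteAfterParse"] = "true"
    if low_priority:
        fields["lowPriority"] = "true"  # low-priority queue, no rate limit
    return fields
```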

Response Structure

Each extracted field in the response includes metadata:

Field	Description
`raw`	Raw extracted text before processing
`parsed`	Processed value after formatting and mapping
`confidence`	Overall confidence score (0-1)
`classificationConfidence`	Confidence the field was correctly classified
`textExtractionConfidence`	Confidence text was correctly extracted
`isVerified`	Whether the value has been validated (by any means)
`isClientVerified`	Whether validated by a human
`isAutoVerified`	Whether auto-validated by rules
`rectangle`	Bounding box coordinates on the page
`pageIndex`	Which page the data appears on

Document-level metadata includes `ready`, `failed`, `language`, `pages`, `isOcrd`, `ocrConfidence`, `reviewUrl`, `isConfirmed`, `isRejected`, `isArchived`, `errorCode`, and `errorDetail`.

Full metadata reference: https://docs.affinda.com/reference/metadata
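Field-level `confidence` and the verification flags are natural inputs for deciding what needs a human look. This minimal sketch flags low-confidence, unverified fields in a raw `data` dictionary. It assumes each top-level field is either a metadata object with the keys listed above or a list of such objects; anything else is skipped. The 0.8 threshold is an arbitrary example, not an Affinda default.

```python
from typing import Any


def fields_needing_review(data: dict, threshold: float = 0.8) -> list:
    """Return names of fields with an unverified entry whose confidence is below threshold."""
    flagged = []
    for name, value in data.items():
        entries = value if isinstance(value, list) else [value]
        for entry in entries:
            if not isinstance(entry, dict) or "confidence" not in entry:
                continue  # plain values or structures without field metadata
            if entry.get("isVerified"):
                continue  # already validated, by a human or by rules
            confidence: Any = entry.get("confidence")
            if confidence is None or confidence < threshold:
                flagged.append(name)
                break  # one low-confidence entry is enough to flag the field
    return flagged
```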

Webhooks

Affinda uses RESTHooks -- webhook subscriptions managed via REST API. Webhooks can be scoped to an organization or workspace.

Available Events

Event	Description
`document.parse.completed`	Parsing finished (succeeded or failed)
`document.parse.succeeded`	Parsing succeeded
`document.parse.failed`	Parsing failed
`document.validate.completed`	Document confirmed (manually or auto)
`document.classify.completed`	Classification finished
`document.classify.succeeded`	Classification succeeded
`document.classify.failed`	Classification failed
`document.rejected`	Document rejected

Setup Flow

1. Subscribe -- Call `POST /v3/resthook_subscriptions` with `targetUrl`, `event`, and `organization` (or `workspace`).
2. Confirm -- Affinda sends a `POST` to your `targetUrl` with an `X-Hook-Secret` header. Respond with `200`, then call `POST /v3/resthook_subscriptions/activate` with that secret.
3. Receive -- Affinda sends webhook payloads to your endpoint. Respond with `200` to acknowledge.
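The confirmation step is easy to miss. This minimal, framework-independent sketch covers both ends of it:

- `subscription_body` builds the JSON for the subscribe call. Requiring exactly one of `workspace` or `organization` is this sketch's reading of "(or `workspace`)".
- `handle_incoming` classifies a request arriving at `targetUrl` from its headers and raw body.

Both function names are hypothetical. The caller still has to make the activation call with the returned secret. Whether the API expects that secret as an `X-Hook-Secret` request header or in the body is an assumption to confirm in the API reference.

```python
import json
from typing import Optional


def subscription_body(target_url: str, event: str, workspace: Optional[str] = None,
                      organization: Optional[str] = None) -> dict:
    """JSON body for POST /v3/resthook_subscriptions, scoped to a workspace or an organization."""
    if (workspace is None) == (organization is None):
        raise ValueError("scope the subscription to exactly one of workspace or organization")
    body = {"targetUrl": target_url, "event": event}
    if workspace is not None:
        body["workspace"] = workspace
    else:
        body["organization"] = organization
    return body


def handle_incoming(headers: dict, raw_body: bytes) -> tuple:
    """Return (status_code, hook_secret, payload) for one request to targetUrl.

    A confirmation request carries X-Hook-Secret: answer 200, then activate the
    subscription with that secret. Anything else is treated as an event delivery.
    """
    lowered = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    secret = lowered.get("x-hook-secret")
    if secret:
        return 200, secret, None
    return 200, None, json.loads(raw_body)
```

If payload signing is enabled, deliveries should also pass the signature check described under Signature Verification before the payload is trusted.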

Signature Verification

Enable payload signing via Organization Settings -> Webhook Signature Key. Incoming webhooks then include an `X-Hook-Signature` header in the format `<timestamp>.<signature>`. Verify it using HMAC-SHA256:

```python
import hashlib
import hmac
import json
import time


def verify_webhook(request, sig_key: bytes) -> bool:
    # Header format: "<timestamp>.<signature>"
    sig_header = request.headers["X-Hook-Signature"]
    timestamp, sig_received = sig_header.split(".", 1)
    # HMAC-SHA256 of the raw request body (bytes), hex-encoded
    sig_calculated = hmac.new(sig_key, msg=request.body, digestmod=hashlib.sha256).hexdigest()

    sig_ok = hmac.compare_digest(sig_received, sig_calculated)  # constant-time comparison
    body = json.loads(request.body)
    time_ok = (time.time() - body["timestamp"]) < 600  # 10 min window
    return sig_ok and time_ok
```
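To unit-test this check without a web framework, the same logic can take the header value and raw body directly and accept an injectable "now". This is a minimal sketch:

- `verify_signature` mirrors the function above. It also returns `False` on a malformed header instead of raising.
- `sign_for_test` is a hypothetical helper for local tests. It builds a header in the stated `<timestamp>.<signature>` format and makes no further claim about how Affinda constructs the header.

```python
import hashlib
import hmac
import json
import time
from typing import Optional


def verify_signature(header_value: str, raw_body: bytes, sig_key: bytes,
                     max_age_seconds: int = 600, now: Optional[float] = None) -> bool:
    """Check '<timestamp>.<signature>' against an HMAC-SHA256 of the raw body."""
    try:
        _timestamp, sig_received = header_value.split(".", 1)
    except ValueError:
        return False  # header not in the expected format
    sig_calculated = hmac.new(sig_key, msg=raw_body, digestmod=hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig_received.encode(), sig_calculated.encode()):
        return False
    sent_at = json.loads(raw_body)["timestamp"]  # timestamp field of the payload body
    current = time.time() if now is None else now
    return (current - sent_at) < max_age_seconds


def sign_for_test(raw_body: bytes, sig_key: bytes, timestamp: int) -> str:
    """Build a header value in the same format, for local tests only."""
    return f"{timestamp}." + hmac.new(sig_key, msg=raw_body, digestmod=hashlib.sha256).hexdigest()
```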

Webhook Payload

The payload contains document metadata, not the full parsed data. Use the `identifier` to fetch the full results:

```json
{
  "id": "e3bd1942-...",
  "event": "document.parse.completed",
  "timestamp": 1665637107,
  "payload": {
    "identifier": "abcdXYZ",
    "ready": true,
    "failed": false,
    "fileName": "invoice.pdf",
    "workspace": { "identifier": "...", "name": "..." }
  }
}
```
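Because the payload carries only metadata, the handler mostly needs to pull out `payload.identifier` and decide whether a fetch is worthwhile. This is a minimal sketch over the example payload's shape. `identifier_to_fetch` is a hypothetical helper. Skipping failed documents is a design choice you may not want, since a failed document's `errorCode` and `errorDetail` are still worth fetching.

```python
import json
from typing import Optional


def identifier_to_fetch(raw_body: bytes) -> Optional[str]:
    """Return payload.identifier for a ready, non-failed document; otherwise None."""
    event = json.loads(raw_body)
    payload = event.get("payload") or {}
    if payload.get("ready") and not payload.get("failed"):
        return payload.get("identifier")
    return None
```

With the Python client, the returned identifier would then be passed to `client.get_document(...)` to retrieve the extracted data.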

Retry Behavior

`200` -- Success, delivery confirmed
`410` -- Subscription auto-deleted (endpoint "gone")
Other 4xx/5xx -- Retried with exponential backoff for ~1 day

Full webhook docs: https://docs.affinda.com/reference/webhooks
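Failed deliveries are retried for about a day, so the same event can reach your endpoint more than once. For example, your handler may finish its work but time out before answering `200`. This minimal sketch deduplicates on the top-level event `id` from the payload example. An id is recorded only after processing succeeds, so a failed attempt is retried rather than skipped. The in-memory set is for illustration only; a production system would use a durable store. The class and function names are hypothetical.

```python
import json
from typing import Any, Callable


class DeliveryDeduplicator:
    """Remembers event ids that were processed successfully (in memory, for illustration)."""

    def __init__(self) -> None:
        self._processed: set = set()

    def is_duplicate(self, event_id: str) -> bool:
        return event_id in self._processed

    def mark_processed(self, event_id: str) -> None:
        self._processed.add(event_id)


def handle_delivery(raw_body: bytes, dedup: DeliveryDeduplicator,
                    process: Callable[[dict], Any]) -> int:
    """Process each event id at most once and return the status code to send back."""
    event = json.loads(raw_body)
    event_id = event["id"]
    if not dedup.is_duplicate(event_id):
        # If process() raises, nothing is recorded; the caller should then answer
        # with an error status so that the delivery is retried.
        process(event)
        dedup.mark_processed(event_id)
    return 200  # duplicates are acknowledged without being reprocessed
```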

Embedded Validation UI

Affinda provides a human-in-the-loop validation interface that can be embedded in your application via iframe. Each document response includes a `reviewUrl`, a signed URL valid for 60 minutes.

Implementation pattern:

Store only the Affinda document `identifier` in your system
When a user needs to review, fetch a fresh `reviewUrl` via `GET /v3/documents/{id}`
Embed the URL in an iframe
Do not persist the URL -- treat it as ephemeral

The UI supports custom theming (colors, fonts, border radius) in embedded mode. Contact Affinda to configure.

Full embedded docs: https://docs.affinda.com/reference/embedded
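The implementation pattern above can be sketched as a render function that fetches a fresh URL every time. This is a minimal sketch. `review_iframe` is a hypothetical helper. `get_review_url` stands for any callable that fetches the document and returns its `reviewUrl`. With the Python client that might be `lambda i: client.get_document(i).meta.review_url`; the `review_url` attribute name is an assumption. The iframe sizing and title are arbitrary.

```python
import html
from typing import Callable


def review_iframe(identifier: str, get_review_url: Callable[[str], str],
                  width: str = "100%", height: str = "800") -> str:
    """Render an iframe for the validation UI from a freshly fetched reviewUrl.

    Only `identifier` is stored by the application; the signed URL (valid for
    60 minutes) is fetched on every render and never persisted.
    """
    url = get_review_url(identifier)
    return (
        f'<iframe src="{html.escape(url, quote=True)}" '
        f'width="{html.escape(width)}" height="{html.escape(height)}" '
        'title="Affinda document review"></iframe>'
    )
```

Escaping the URL matters because signed URLs usually carry query parameters whose `&` separators must be encoded inside an HTML attribute.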

Key API Methods

Documents

Method	Endpoint	Description
POST	`/v3/documents`	Upload and parse a document
GET	`/v3/documents/{id}`	Retrieve a document and its data
PATCH	`/v3/documents/{id}`	Update document fields/status
DELETE	`/v3/documents/{id}`	Delete a document
GET	`/v3/documents`	List documents (with filtering)
GET	`/v3/documents/{id}/redacted`	Download redacted PDF

Workspaces

Method	Endpoint	Description
GET	`/v3/workspaces`	List workspaces
POST	`/v3/workspaces`	Create a workspace
GET	`/v3/workspaces/{id}`	Get workspace details
PATCH	`/v3/workspaces/{id}`	Update workspace
DELETE	`/v3/workspaces/{id}`	Delete workspace

Annotations

Method	Endpoint	Description
GET	`/v3/annotations`	List annotations for a document
POST	`/v3/annotations`	Create an annotation
PATCH	`/v3/annotations/{id}`	Update an annotation
POST	`/v3/annotations/batch_create`	Batch create annotations
POST	`/v3/annotations/batch_update`	Batch update annotations
POST	`/v3/annotations/batch_delete`	Batch delete annotations

Webhooks

Method	Endpoint	Description
POST	`/v3/resthook_subscriptions`	Create subscription
POST	`/v3/resthook_subscriptions/activate`	Activate with X-Hook-Secret
GET	`/v3/resthook_subscriptions`	List subscriptions
PATCH	`/v3/resthook_subscriptions/{id}`	Update subscription
DELETE	`/v3/resthook_subscriptions/{id}`	Delete subscription

Full API reference: https://docs.affinda.com/reference/getting-started
OpenAPI spec: https://api.affinda.com/static/v3/api_spec.yaml

Common Integration Patterns

Affinda supports six integration workflow patterns depending on where validation logic lives and where exceptions are handled:

Pattern	Description	Webhook Event
W1 -- No validation	Upload -> get JSON. No rules, no human review.	`document.parse.completed`
W2 -- Client-side validation	Same as W1; your system applies rules after export.	`document.parse.completed`
W3 -- Affinda validation logic	Affinda validates automatically; no human review.	`document.validate.completed`
W4 -- Review all in Affinda	Humans review every document in Affinda UI.	`document.validate.completed`
W5 -- Client rules + Affinda review	Your rules, pushed back as warnings; flagged docs reviewed in Affinda.	`document.parse.completed` then `document.validate.completed`
W6 -- Full Affinda validation	Affinda validates; exceptions reviewed in Affinda UI.	`document.validate.completed`

For most new integrations, W1 or W2 is the simplest starting point. W6 provides the most automation with human-in-the-loop for exceptions.

Full solution design guide: https://docs.affinda.com/academy/solution-design

Common Errors

Error Code	Meaning	Resolution
`duplicate_document_error`	Document rejected as duplicate	Disable "Reject duplicates" or upload unique files
`no_text_found`	No extractable text	Check file is not a photo of an object; try OCR
`file_corrupted`	File is corrupted	Re-upload a valid file
`file_too_large`	Exceeds 20 MB limit	Reduce file size
`invalid_file_type`	Unsupported format	Use PDF, DOC, DOCX, XLSX, ODT, RTF, TXT, HTML, PNG, JPG, TIFF, JPEG
`no_parsing_credits`	Out of credits	Purchase more credits and reparse
`password_protected`	File is password-protected	Remove password and re-upload
`document_classification_failed`	No matching document type	Check document type configuration or disable "Reject Documents"
`capacity_exceeded`	System capacity exceeded	Wait and retry
`parse_terminated`	Exceeded timeout	Contact Affinda for custom limits

Full error reference: https://docs.affinda.com/error-glossary
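The resolutions fall into two groups. `capacity_exceeded` is transient and worth retrying later. The other codes need a change to the file, the configuration, or the account before a retry can succeed. This minimal sketch triages a document's `errorCode` using the codes from the table. The action strings paraphrase the table. Treating unknown codes as not retryable is our own conservative choice.

```python
RETRYABLE = {"capacity_exceeded"}  # "Wait and retry"

ACTIONS = {
    "duplicate_document_error": "disable 'Reject duplicates' or upload a unique file",
    "no_text_found": "check the file is a document, not a photo of an object; try OCR",
    "file_corrupted": "re-upload a valid file",
    "file_too_large": "reduce the file size",
    "invalid_file_type": "convert to a supported format",
    "no_parsing_credits": "purchase credits, then reparse",
    "password_protected": "remove the password and re-upload",
    "document_classification_failed": "check document type configuration",
    "capacity_exceeded": "wait and retry",
    "parse_terminated": "contact Affinda about custom limits",
}


def triage(error_code: str) -> tuple:
    """Return (retry_later, suggested_action) for a document's errorCode."""
    action = ACTIONS.get(error_code, "see https://docs.affinda.com/error-glossary")
    return error_code in RETRYABLE, action
```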

Documentation Map

Use this index to find detailed information on specific topics. Each link goes to the full documentation page.

Affinda Academy (Tutorials)

Getting Started -- Core concepts: organizations, workspaces, document types, statuses, and the processing workflow.
Creating a New Model -- Step-by-step guide to creating extraction models from scratch.
Improving Accuracy -- Strategies for 99%+ accuracy via model memory, field prompts, and OCR settings.
User Validation of Extracted Data -- How to validate and correct extractions in the Affinda UI.
Table Editor -- Grid and freeform modes for validating table extractions.
Reviewing Splitting & Classification -- How to correct document splitting and classification.
Schema Design Best Practices -- Field configuration trade-offs, advanced options, and schema design guidance.
Straight-Through Processing -- Data mapping, validation rules, and auto-confirmation for full automation.
Integration Workflows -- Six workflow patterns (W1-W6) for different integration scenarios.
Integration Agent -- No-code integrations using AI agent and Pipedream.

Configuration Guide

Overview & Workflow:

Workflow -- End-to-end document processing pipeline stages.
Glossary -- Platform terminology definitions.
Document Status -- For Review, Confirmed, Archived, Rejected states.

Ingestion & Pre-Processing:

Ingestion -- Upload methods: manual, email, API.
Email Upload -- Email-to-workspace document ingestion.
Pre-Processing -- Automated cleaning before extraction.
OCR -- OCR modes: Skip, Auto-detect, Partial, Full.
Duplicates -- Duplicate detection and rejection.

Splitting, Classification & Extraction:

Splitting -- Auto-separate multi-document files.
Classification -- Auto-categorize documents by type.
Field Configuration -- Field names, types, and settings.
Standard Fields -- Text, numbers, dates, location, phone, URL types.
Groups & Tables -- Repeating structures and line items.
Picklists & Data Sources -- Controlled vocabularies and master data matching.
Checkboxes -- Label and true/false checkbox extraction.
Image Fields -- Signature, headshot, and seal extraction.
Model Memory -- RAG-based learning from validated documents.

Validation & Export:

Machine Validation -- Automated validation overview.
Validation Rules -- Natural-language business rule creation.
Confidence -- Confidence scoring and thresholds.
User Validation -- Human review interface.
Data Export -- JSON, XML, CSV export options.
Redaction -- PDF redaction of sensitive data.
User Management -- Roles and permissions.

API Reference

Quick Start -- First API call walkthrough with code examples.
Authentication -- API key management and rotation.
Upload Options -- Sync, async polling, and webhook patterns.
Metadata -- Field-level and document-level metadata reference.
Limits -- Rate limits, file size limits, page limits.
Webhooks -- Webhook setup, events, signature verification.
Embedded Mode -- Embedding validation UI via iframe.
Client Libraries -- Python, JavaScript, .NET, Java SDKs.
Structured Outputs (Pydantic) -- Generate Python Pydantic models from document types.
TypeScript Interfaces -- Generate TypeScript interfaces from document types.

Resume Parsing Guide

Getting Started -- Resume parsing product overview and workspace setup.
Integration -- Resume parser API integration with code examples.
Credits -- Per-document credit system for resume parsing.
Data Extracted -- All fields extracted from resumes with sample JSON.
Taxonomies -- Skills, job titles, and occupation standardization.
Resume Redactor -- Automated PII redaction for unbiased hiring.
Resume Summary -- AI-generated candidate summaries.
Job Description Parser -- Structured extraction from job descriptions.
Search & Match -- Candidate/job matching with scoring and search UI.

Additional Resources

Error Glossary -- Error codes and resolutions.
FAQs -- Common questions on capabilities, configuration, and troubleshooting.
Billing -- Credits, pricing, and payment.
Data Retention -- Document deletion and expiry policies.
Deployment & Data Residency -- Regional servers and enterprise options.
Product Updates -- Changelog and release notes.
Status -- Service availability dashboard.
