| Concept | Description |
|---|---|
| Organization | Top-level account. Contains users, billing, document types, and workspaces. |
| Workspace | Logical container for documents. Scopes permissions, webhooks, and processing settings. |
| Document Type | A model configuration defining how a specific kind of document is parsed (invoice, resume, custom). |
| Document | An uploaded file (PDF, image, DOCX, etc.) plus its extracted data and metadata. |
| Region | API Base URL | App URL |
|---|---|---|
| Australia (Global) | | |
| United States | | |
| European Union | | |
Authenticate every request with a bearer token:

```
Authorization: Bearer <API_KEY>
```

Requests that exceed the rate limit are rejected with HTTP `429`. Documents submitted with `lowPriority: true` are routed to the low-priority queue, which is not rate limited.

```bash
pip install affinda
```

```python
from pathlib import Path

from affinda import AffindaAPI, TokenCredential

credential = TokenCredential(token="YOUR_API_KEY")
client = AffindaAPI(credential=credential)

with Path("invoice.pdf").open("rb") as f:
    doc = client.create_document(file=f, workspace="YOUR_WORKSPACE_ID")

print(doc.data)  # Extracted JSON
```
```bash
npm install @affinda/affinda
```

```typescript
import { AffindaAPI, AffindaCredential } from "@affinda/affinda";
import * as fs from "fs";

const credential = new AffindaCredential("YOUR_API_KEY");
const client = new AffindaAPI(credential);

const doc = await client.createDocument({
  file: fs.createReadStream("invoice.pdf"),
  workspace: "YOUR_WORKSPACE_ID",
});
console.log(doc.data); // Extracted JSON
```

```bash
dotnet add package Affinda.API
```

Note: The .NET and Java libraries may lag behind the Python and TypeScript libraries in feature parity.
```bash
curl -X POST https://api.affinda.com/v3/documents \
  -H "Authorization: Bearer $AFFINDA_API_KEY" \
  -F "file=@invoice.pdf" \
  -F "workspace=YOUR_WORKSPACE_ID"
```
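Throttled requests come back as `429`. Clients that do not use `lowPriority` therefore usually retry with backoff. The sketch below shows a generic pattern and is not part of the Affinda SDK. `send_with_backoff` and its parameters are hypothetical. `send` stands in for whatever raw HTTP call performs the upload (for example, the `curl` request above rewritten in Python) and returns a status code and body.

```python
import random
import time
from typing import Callable, Tuple


def send_with_backoff(
    send: Callable[[], Tuple[int, bytes]],
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, bytes]:
    """Call `send` and retry with exponential backoff while it returns HTTP 429.

    Hypothetical helper: `send` is any zero-argument callable that returns
    (status_code, body), e.g. a thin wrapper around an upload request.
    """
    for attempt in range(max_retries + 1):
        status, body = send()
        if status != 429 or attempt == max_retries:
            return status, body
        # Exponential backoff with full jitter, capped at max_delay seconds.
        delay = min(max_delay, base_delay * (2 ** attempt))
        sleep(random.uniform(0, delay))
    raise AssertionError("unreachable: the loop always returns")


# Usage with a fake sender that is throttled twice before succeeding.
responses = iter([(429, b""), (429, b""), (200, b"ok")])
status, body = send_with_backoff(lambda: next(responses), sleep=lambda s: None)
print(status)  # 200
```

Injecting `sleep` keeps the helper testable without real waiting. Full jitter spreads retries from many clients so they do not hit the limit again at the same moment.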
Running `python -m affinda generate_models --workspace-id=ID` creates a `./affinda_models/` directory with one `.py` file per document type. Each file contains Pydantic `BaseModel` classes with all your configured fields as typed, optional attributes.
**Use the generated models when calling the API:**
```python
from pathlib import Path
from affinda import AffindaAPI, TokenCredential
from affinda_models.invoice import Invoice # Generated model
credential = TokenCredential(token="YOUR_API_KEY")
client = AffindaAPI(credential=credential)
with Path("invoice.pdf").open("rb") as f:
    doc = client.create_document(
        file=f,
        workspace="YOUR_WORKSPACE_ID",
        data_model=Invoice,  # Enables Pydantic validation
    )
```
**Handling validation errors gracefully:**
```python
with Path("invoice.pdf").open("rb") as f:
    doc = client.create_document(
        file=f,
        workspace="YOUR_WORKSPACE_ID",
        data_model=Invoice,
        ignore_validation_errors=True,  # Don't raise on schema mismatch
    )

if doc.parsed:
    print(doc.parsed.invoice_number)  # Type-safe access
else:
    print("Validation failed, falling back to raw data")
    print(doc.data)
```

**Model generation options:**

```bash
python -m affinda generate_models --workspace-id=ID          # All types in a workspace
python -m affinda generate_models --document-type-id=ID      # Single document type
python -m affinda generate_models --organization-id=ID       # All types in an org
python -m affinda generate_models --output-dir=./my_models   # Custom output path
python -m affinda generate_models --help                     # All options
```
Running `npm exec affinda-generate-interfaces -- --workspace-id=ID` creates a `./affinda-interfaces/` directory with one `.ts` file per document type. Each file contains TypeScript interfaces with all your configured fields.
**Use the generated interfaces for type-safe access:**
```typescript
import { AffindaAPI, AffindaCredential } from "@affinda/affinda";
import * as fs from "fs";
import { Invoice } from "./affinda-interfaces/Invoice";
const credential = new AffindaCredential("YOUR_API_KEY");
const client = new AffindaAPI(credential);
const doc = await client.createDocument({
  file: fs.createReadStream("invoice.pdf"),
  workspace: "YOUR_WORKSPACE_ID",
});

const parsed = doc.data as Invoice;
console.log(parsed.invoiceNumber); // Type-safe access
console.log(parsed.totalAmount);
```

**Interface generation options:**

```bash
npm exec affinda-generate-interfaces -- --workspace-id=ID       # All types in workspace
npm exec affinda-generate-interfaces -- --document-type-id=ID   # Single document type
npm exec affinda-generate-interfaces -- --output-dir=./types    # Custom output path
npm exec affinda-generate-interfaces -- --help                  # All options
```
**Synchronous (default):** the call returns once parsing has finished.

```python
doc = client.create_document(file=f, workspace="WORKSPACE_ID")
```
**Best for**: Interactive apps, low volume, quick prototyping.
**Limitation**: May time out on large or complex documents.
**Asynchronous:** pass `wait=false`, then poll `GET /documents/{id}` until `ready` is `true`.

```python
doc = client.create_document(file=f, workspace="WORKSPACE_ID", wait=False)
```
**Best for**: Batch processing, large documents, high volume.
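The polling step can be sketched as a loop over any fetch function. This is an illustration, not SDK code. `wait_until_ready` is hypothetical, and `fetch` stands in for a call such as `GET /documents/{id}` that returns the document as a dict. The sketch also assumes the `ready` and `failed` flags sit under a `meta` key; check that against your actual responses.

```python
import time
from typing import Any, Callable, Dict


def wait_until_ready(
    fetch: Callable[[str], Dict[str, Any]],
    identifier: str,
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Any]:
    """Poll `fetch(identifier)` until the document is ready or has failed.

    Hypothetical helper; assumes responses shaped like
    {"meta": {"ready": bool, "failed": bool, ...}, "data": {...}}.
    """
    deadline = clock() + timeout
    while True:
        doc = fetch(identifier)
        meta = doc.get("meta", {})
        if meta.get("failed"):
            raise RuntimeError(f"Document {identifier} failed to parse")
        if meta.get("ready"):
            return doc
        if clock() >= deadline:
            raise TimeoutError(f"Document {identifier} not ready after {timeout}s")
        sleep(interval)
```

A fixed interval keeps the sketch simple. For large batches, prefer webhooks over tight polling loops.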
**Webhooks:** Affinda notifies your endpoint when a document event occurs, so no polling is needed.
**Best for**: Real-time workflows, event-driven architectures, production systems.
See the [Webhooks section](#webhooks) below for setup details.
| Parameter | Type | Description |
|---|---|---|
| | binary | The document file. Mutually exclusive with |
| | string | URL to download and process. Mutually exclusive with |
| | string | Workspace identifier (required). |
| | string | Document type identifier (optional; when set, classification is skipped). |
| | boolean | |
| | string | Your internal ID for the document. |
| | ISO-8601 | Auto-delete the document at this time. |
| | boolean | Reject the upload if it duplicates an existing document. |
| | boolean | Route to the low-priority queue (no rate limit). |
| | boolean | Return compact response (with |
| | boolean | Delete data after parsing (requires |
| | boolean | Make document viewable in validation UI. Set |
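The auto-delete parameter takes an ISO-8601 timestamp. Below is a minimal way to produce one in Python, assuming you want a UTC time a fixed number of days ahead. The 30-day figure and the `expiry_timestamp` name are illustrative, not Affinda defaults.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def expiry_timestamp(days: int = 30, now: Optional[datetime] = None) -> str:
    """Return an ISO-8601 UTC timestamp `days` days after `now` (default: current time)."""
    base = now if now is not None else datetime.now(timezone.utc)
    return (base + timedelta(days=days)).isoformat(timespec="seconds")


print(expiry_timestamp(30, now=datetime(2024, 1, 1, tzinfo=timezone.utc)))
# 2024-01-31T00:00:00+00:00
```

Using a timezone-aware datetime makes the output carry an explicit `+00:00` offset, so the deletion time does not depend on the server's local clock.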
| Field | Description |
|---|---|
| | Raw extracted text before processing |
| | Processed value after formatting and mapping |
| | Overall confidence score (0-1) |
| | Confidence that the field was correctly classified |
| | Confidence that the text was correctly extracted |
| | Whether the value has been validated (by any means) |
| | Whether it was validated by a human |
| | Whether it was auto-validated by rules |
| | Bounding box coordinates on the page |
| | Page on which the data appears |
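Each field carries a 0-1 confidence score and a validated flag. A common client-side step is therefore to route low-confidence, unvalidated fields to human review. This is a sketch under stated assumptions. The `confidence` and `is_verified` keys, the field layout, and the 0.8 threshold are hypothetical placeholders; map them to the actual keys in your responses.

```python
from typing import Any, Dict, List

# Hypothetical field shape: {"parsed": ..., "confidence": float, "is_verified": bool}
Field = Dict[str, Any]


def fields_needing_review(fields: Dict[str, Field], threshold: float = 0.8) -> List[str]:
    """Return names of unverified fields whose confidence is below `threshold`.

    Fields with no confidence score are treated as needing review.
    """
    flagged = []
    for name, field in fields.items():
        if field.get("is_verified"):
            continue  # already validated, by a human or by rules
        confidence = field.get("confidence")
        if confidence is None or confidence < threshold:
            flagged.append(name)
    return sorted(flagged)


example = {
    "invoice_number": {"parsed": "INV-001", "confidence": 0.97, "is_verified": False},
    "total_amount": {"parsed": 120.5, "confidence": 0.42, "is_verified": False},
    "due_date": {"parsed": "2024-02-01", "confidence": 0.30, "is_verified": True},
}
print(fields_needing_review(example))  # ['total_amount']
```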
Document-level metadata includes `ready`, `failed`, `language`, `pages`, `isOcrd`, `ocrConfidence`, `reviewUrl`, `isConfirmed`, `isRejected`, `isArchived`, `errorCode`, and `errorDetail`.
| Event | Description |
|---|---|
| | Parsing finished (succeeded or failed) |
| | Parsing succeeded |
| | Parsing failed |
| | Document confirmed (manually or automatically) |
| | Classification finished |
| | Classification succeeded |
| | Classification failed |
| | Document rejected |
To subscribe to events:

1. Send `POST /v3/resthook_subscriptions` specifying the `targetUrl`, `event`, `organization`, and `workspace`.
2. Affinda sends a `POST` request to your `targetUrl` with an `X-Hook-Secret` header. Respond with `200`.
3. Send `POST /v3/resthook_subscriptions/activate` with that `X-Hook-Secret` header to activate the subscription. A `200` response confirms activation.

Each delivery carries an `X-Hook-Signature` header of the form `<timestamp>.<signature>`. Verify it before trusting the payload:

```python
import hashlib
import hmac
import json
import time


def verify_webhook(request, sig_key: bytes) -> bool:
    """Check the HMAC-SHA256 signature of the raw body and reject stale deliveries.

    `request` is your web framework's request object, exposing `.headers` and
    the raw `.body` bytes.
    """
    sig_header = request.headers["X-Hook-Signature"]
    # Freshness is checked against the body's timestamp below.
    _timestamp, sig_received = sig_header.split(".", 1)
    sig_calculated = hmac.new(sig_key, msg=request.body, digestmod=hashlib.sha256).hexdigest()
    sig_ok = hmac.compare_digest(sig_received, sig_calculated)
    body = json.loads(request.body)
    time_ok = (time.time() - body["timestamp"]) < 600  # 10 min window
    return sig_ok and time_ok
```
A delivery payload identifies the document by its `identifier`:

```json
{
  "id": "e3bd1942-...",
  "event": "document.parse.completed",
  "timestamp": 1665637107,
  "payload": {
    "identifier": "abcdXYZ",
    "ready": true,
    "failed": false,
    "fileName": "invoice.pdf",
    "workspace": { "identifier": "...", "name": "..." }
  }
}
```
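Putting the handshake and verification together, a framework-agnostic handler might look like the sketch below. It is an illustration under assumptions, not Affinda's reference code. `handle_delivery`, `HookState`, and the `400`/`401` status choices are hypothetical, and headers are taken as a plain dict with exact-case keys. The signature check mirrors the HMAC-SHA256 scheme described in this section. The handler routes only on the `event` name and the payload `identifier`.

```python
import hashlib
import hmac
import json
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class HookState:
    """Hypothetical store for the handshake secret and completed document IDs."""
    hook_secret: Optional[str] = None
    completed: List[str] = field(default_factory=list)


def handle_delivery(headers: Dict[str, str], body: bytes, sig_key: bytes,
                    state: HookState, max_age: float = 600) -> int:
    """Return the HTTP status code to send back for one incoming request."""
    # Handshake: keep the secret so it can be sent to the activate endpoint.
    if "X-Hook-Secret" in headers:
        state.hook_secret = headers["X-Hook-Secret"]
        return 200
    sig_header = headers.get("X-Hook-Signature", "")
    if "." not in sig_header:
        return 400  # missing or malformed signature header
    _timestamp, sig_received = sig_header.split(".", 1)
    expected = hmac.new(sig_key, msg=body, digestmod=hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig_received, expected):
        return 401
    event = json.loads(body)
    if time.time() - event["timestamp"] >= max_age:
        return 401  # stale delivery: treat like a bad signature
    if event["event"] == "document.parse.completed":
        state.completed.append(event["payload"]["identifier"])
    return 200
```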
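Respond with `200` to acknowledge a delivery. Under the REST Hooks convention, a `410` response tells the sender to remove the subscription.

To open a document in the validation UI, use its `reviewUrl`. Given a document `identifier`, call `GET /documents/{id}` to retrieve its `reviewUrl`.

| Method | Endpoint | Description |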
|---|---|---|
| POST | | Upload and parse a document |
| GET | | Retrieve a document and its data |
| PATCH | | Update document fields/status |
| DELETE | | Delete a document |
| GET | | List documents (with filtering) |
| GET | | Download redacted PDF |
| Method | Endpoint | Description |
|---|---|---|
| GET | | List workspaces |
| POST | | Create a workspace |
| GET | | Get workspace details |
| PATCH | | Update workspace |
| DELETE | | Delete workspace |
| Method | Endpoint | Description |
|---|---|---|
| GET | | List annotations for a document |
| POST | | Create an annotation |
| PATCH | | Update an annotation |
| POST | | Batch create annotations |
| POST | | Batch update annotations |
| POST | | Batch delete annotations |
| Method | Endpoint | Description |
|---|---|---|
| POST | | Create subscription |
| POST | | Activate with X-Hook-Secret |
| GET | | List subscriptions |
| PATCH | | Update subscription |
| DELETE | | Delete subscription |
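The create and activate calls can be built with the standard library alone. This sketch only constructs the requests; pass them to `urllib.request.urlopen` to send. The JSON body layout, the example event name, and the helper names are assumptions. The base URL comes from the `curl` example earlier in this guide, so adjust it for your region.

```python
import json
import urllib.request

BASE_URL = "https://api.affinda.com/v3"  # adjust for your region


def create_subscription_request(api_key: str, target_url: str, event: str,
                                organization: str) -> urllib.request.Request:
    """Build POST /v3/resthook_subscriptions (JSON body layout is an assumption)."""
    body = json.dumps({"targetUrl": target_url, "event": event,
                       "organization": organization}).encode()
    return urllib.request.Request(
        f"{BASE_URL}/resthook_subscriptions",
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )


def activate_subscription_request(api_key: str, hook_secret: str) -> urllib.request.Request:
    """Build POST /v3/resthook_subscriptions/activate carrying the X-Hook-Secret header."""
    return urllib.request.Request(
        f"{BASE_URL}/resthook_subscriptions/activate",
        data=b"",
        method="POST",
        headers={"Authorization": f"Bearer {api_key}", "X-Hook-Secret": hook_secret},
    )


req = activate_subscription_request("YOUR_API_KEY", "secret-from-handshake")
print(req.get_method(), req.full_url)
# POST https://api.affinda.com/v3/resthook_subscriptions/activate
```

`urllib.request.Request` normalizes header names with `str.capitalize()`, so read them back as `req.get_header("X-hook-secret")`.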
| Pattern | Description | Webhook Event |
|---|---|---|
| W1 -- No validation | Upload -> get JSON. No rules, no human review. | |
| W2 -- Client-side validation | Same as W1; your system applies rules after export. | |
| W3 -- Affinda validation logic | Affinda validates automatically; no human review. | |
| W4 -- Review all in Affinda | Humans review every document in Affinda UI. | |
| W5 -- Client rules + Affinda review | Your rules, pushed back as warnings; flagged docs reviewed in Affinda. | |
| W6 -- Full Affinda validation | Affinda validates; exceptions reviewed in Affinda UI. | |
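In W2 and W5, your own system applies the business rules. Below is a minimal sketch of such a rule pass, assuming the extracted data has been exported as a flat dict. The field names (`invoice_number`, `line_totals`, `total_amount`) and the rounding tolerance are hypothetical. In W5, the returned messages are what you would push back to Affinda as warnings.

```python
from typing import Any, Callable, Dict, List, Optional

Rule = Callable[[Dict[str, Any]], Optional[str]]


def require(field_name: str) -> Rule:
    """Rule factory: warn when a field is missing or empty."""
    def rule(data: Dict[str, Any]) -> Optional[str]:
        if data.get(field_name) in (None, ""):
            return f"{field_name} is missing"
        return None
    return rule


def totals_match(data: Dict[str, Any], tolerance: float = 0.01) -> Optional[str]:
    """Warn when line totals do not sum to the invoice total (hypothetical fields)."""
    lines = data.get("line_totals") or []
    total = data.get("total_amount")
    if total is None or not lines:
        return None  # nothing to compare
    if abs(sum(lines) - total) > tolerance:
        return f"line totals {sum(lines):.2f} != total {total:.2f}"
    return None


def apply_rules(data: Dict[str, Any], rules: List[Rule]) -> List[str]:
    """Run every rule and collect the warning messages."""
    return [msg for rule in rules if (msg := rule(data)) is not None]


rules = [require("invoice_number"), totals_match]
print(apply_rules({"invoice_number": "INV-1", "line_totals": [10.0, 5.0], "total_amount": 20.0}, rules))
# ['line totals 15.00 != total 20.00']
```

The tolerance avoids false warnings from floating-point rounding, e.g. `0.1 + 0.2` not being exactly `0.3`.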
| Error Code | Meaning | Resolution |
|---|---|---|
| | Document rejected as duplicate | Disable "Reject duplicates" or upload unique files |
| | No extractable text | Check that the file is not a photo of an object; try OCR |
| | File is corrupted | Re-upload a valid file |
| | Exceeds the 20 MB limit | Reduce the file size |
| | Unsupported format | Use PDF, DOC, DOCX, XLSX, ODT, RTF, TXT, HTML, PNG, JPG, TIFF, JPEG |
| | Out of credits | Purchase more credits and reparse |
| | File is password-protected | Remove the password and re-upload |
| | No matching document type | Check the document type configuration or disable "Reject Documents" |
| | System capacity exceeded | Wait and retry |
| | Exceeded timeout | Contact Affinda about custom limits |