nutrient-document-processing

Nutrient Document Processing

Process, convert, extract, redact, sign, and manipulate documents using the Nutrient DWS Processor API.

Setup

You need a Nutrient DWS API key. Get one free at https://dashboard.nutrient.io/sign_up/?product=processor.

Option 1: MCP Server (Recommended)

If your agent supports MCP (Model Context Protocol), use the Nutrient DWS MCP Server. It provides all operations as native tools.

Configure your MCP client (e.g., claude_desktop_config.json or .mcp.json):

json

{
  "mcpServers": {
    "nutrient-dws": {
      "command": "npx",
      "args": ["-y", "@nutrient-sdk/dws-mcp-server"],
      "env": {
        "NUTRIENT_DWS_API_KEY": "YOUR_API_KEY",
        "SANDBOX_PATH": "/path/to/working/directory"
      }
    }
  }
}

Then use the MCP tools directly (e.g., convert_to_pdf, extract_text, redact, etc.).

Option 2: Direct API (curl)

For agents without MCP support, call the API directly:

bash

export NUTRIENT_API_KEY="your_api_key_here"

All requests go to https://api.nutrient.io/build as multipart POST with an instructions JSON field.
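Because every operation uses the same endpoint, header, and multipart layout, the repeated parts can be wrapped in a small shell function. The following is a minimal sketch rather than part of the Nutrient tooling: the function name nutrient_build and its argument order are hypothetical, and the --fail flag is added here only so that HTTP errors produce a non-zero exit code.

```shell
# nutrient_build INSTRUCTIONS OUTPUT FILE...
# Hypothetical helper (not part of the Nutrient API): uploads each FILE under
# its basename and writes the response body to OUTPUT.
nutrient_build() {
  nb_instructions=$1
  nb_output=$2
  shift 2
  # Replace each remaining argument with a pair: -F "name=@path".
  for nb_file in "$@"; do
    set -- "$@" -F "$(basename "$nb_file")=@$nb_file"
    shift
  done
  curl --fail -X POST https://api.nutrient.io/build \
    -H "Authorization: Bearer $NUTRIENT_API_KEY" \
    "$@" \
    -F "instructions=$nb_instructions" \
    -o "$nb_output"
}
```

With this in place, a DOCX to PDF conversion could be written as nutrient_build '{"parts":[{"file":"document.docx"}]}' output.pdf document.docx.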

Operations

1. Convert Documents

Convert between PDF, DOCX, XLSX, PPTX, HTML, and image formats.

HTML to PDF:

bash

curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "index.html=@index.html" \
  -F 'instructions={"parts":[{"html":"index.html"}]}' \
  -o output.pdf

DOCX to PDF:

bash

curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.docx=@document.docx" \
  -F 'instructions={"parts":[{"file":"document.docx"}]}' \
  -o output.pdf

PDF to DOCX/XLSX/PPTX:

bash

curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf"}],"output":{"type":"docx"}}' \
  -o output.docx
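The example above only shows "docx" as the output type. A small helper can make the target format a parameter. This is a sketch under assumptions: the function name office_instructions is hypothetical, "xlsx" appears elsewhere in this document as an output type, and "pptx" is assumed by analogy with the heading rather than confirmed.

```shell
# office_instructions FILE TYPE
# Hypothetical helper: prints the instructions JSON for converting FILE
# to an Office format. "pptx" is an assumed value, not shown in the examples.
office_instructions() {
  case "$2" in
    docx|xlsx|pptx) ;;
    *) echo "unsupported output type: $2" >&2; return 1 ;;
  esac
  printf '{"parts":[{"file":"%s"}],"output":{"type":"%s"}}\n' "$1" "$2"
}
```

The printed JSON can be passed to curl as -F "instructions=$(office_instructions document.pdf xlsx)", with -o pointing at a file that has the matching extension.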

Image to PDF:

bash

curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "image.jpg=@image.jpg" \
  -F 'instructions={"parts":[{"file":"image.jpg"}]}' \
  -o output.pdf

2. Extract Text and Data

Extract plain text:

bash

curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf"}],"output":{"type":"text"}}' \
  -o output.txt

Extract tables (as JSON, CSV, or Excel):

bash

curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf"}],"output":{"type":"xlsx"}}' \
  -o tables.xlsx

Extract key-value pairs:

bash

curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf"}],"actions":[{"type":"extraction","strategy":"key-values"}]}' \
  -o result.json

3. OCR Scanned Documents

Apply OCR to scanned PDFs or images, producing searchable PDFs with selectable text.

bash

curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "scanned.pdf=@scanned.pdf" \
  -F 'instructions={"parts":[{"file":"scanned.pdf"}],"actions":[{"type":"ocr","language":"english"}]}' \
  -o searchable.pdf

Supported languages: english, german, french, spanish, italian, portuguese, dutch, swedish, danish, norwegian, finnish, polish, czech, turkish, japanese, korean, chinese-simplified, chinese-traditional, arabic, hebrew, thai, hindi, russian, and more.

4. Redact Sensitive Information

Pattern-based redaction (preset patterns):

bash

curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf"}],"actions":[{"type":"redaction","strategy":"preset","preset":"social-security-number"}]}' \
  -o redacted.pdf

Available presets: social-security-number, credit-card-number, email-address, north-american-phone-number, international-phone-number, date, url, ipv4, ipv6, mac-address, us-zip-code, vin, time.

Regex-based redaction:

bash

curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf"}],"actions":[{"type":"redaction","strategy":"regex","regex":"\\b[A-Z]{2}\\d{6}\\b"}]}' \
  -o redacted.pdf

AI-powered PII redaction:

bash

curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf"}],"actions":[{"type":"ai_redaction","criteria":"All personally identifiable information"}]}' \
  -o redacted.pdf

The criteria field accepts natural language (e.g., "Names and phone numbers", "Protected health information", "Financial account numbers").
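Since the criteria is free text, it may contain characters that need escaping before they are placed inside the instructions JSON. The sketch below shows one way to do that. The function names are hypothetical, the file is assumed to be a bare name in the current directory, and the 180-second timeout is an assumption chosen to sit above the 60–120 seconds that the Troubleshooting table quotes for AI analysis. curl's --form-string is used so that the JSON is sent literally.

```shell
# ai_redaction_instructions FILE CRITERIA
# Hypothetical helper: escapes backslashes and double quotes in CRITERIA so
# the resulting instructions string stays valid JSON.
ai_redaction_instructions() {
  ar_criteria=$(printf '%s' "$2" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"parts":[{"file":"%s"}],"actions":[{"type":"ai_redaction","criteria":"%s"}]}\n' \
    "$1" "$ar_criteria"
}

# redact_pii FILE CRITERIA
# Sends the request; --max-time 180 is an assumed upper bound, not a documented one.
redact_pii() {
  curl -X POST https://api.nutrient.io/build \
    -H "Authorization: Bearer $NUTRIENT_API_KEY" \
    -F "$1=@$1" \
    --form-string "instructions=$(ai_redaction_instructions "$1" "$2")" \
    --max-time 180 \
    -o redacted.pdf
}
```

For example, redact_pii document.pdf 'Names and phone numbers' would send the same action as the AI redaction example above, with a different criteria string.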

5. Add Watermarks

Text watermark:

bash

curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf"}],"actions":[{"type":"watermark","text":"CONFIDENTIAL","fontSize":48,"fontColor":"#FF0000","opacity":0.5,"rotation":45,"width":"50%","height":"50%"}]}' \
  -o watermarked.pdf

Image watermark:

bash

curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F "logo.png=@logo.png" \
  -F 'instructions={"parts":[{"file":"document.pdf"}],"actions":[{"type":"watermark","imagePath":"logo.png","width":"30%","height":"30%","opacity":0.3}]}' \
  -o watermarked.pdf

6. Digital Signatures

Sign a PDF with CMS signature:

bash

curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf"}],"actions":[{"type":"sign","signatureType":"cms","signerName":"John Doe","reason":"Approval","location":"New York"}]}' \
  -o signed.pdf

Sign with CAdES-B-LT (long-term validation):

bash

curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf"}],"actions":[{"type":"sign","signatureType":"cades","cadesLevel":"b-lt","signerName":"Jane Smith"}]}' \
  -o signed.pdf

7. Form Filling (Instant JSON)

Fill PDF form fields using Instant JSON format:

bash

curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "form.pdf=@form.pdf" \
  -F 'instructions={"parts":[{"file":"form.pdf"}],"actions":[{"type":"fillForm","fields":[{"name":"firstName","value":"John"},{"name":"lastName","value":"Doe"},{"name":"email","value":"john@example.com"}]}]}' \
  -o filled.pdf

8. Merge and Split PDFs

Merge multiple PDFs:

bash

curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "doc1.pdf=@doc1.pdf" \
  -F "doc2.pdf=@doc2.pdf" \
  -F 'instructions={"parts":[{"file":"doc1.pdf"},{"file":"doc2.pdf"}]}' \
  -o merged.pdf

Extract specific pages:

bash

curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf","pages":{"start":0,"end":4}}]}' \
  -o pages1-5.pdf
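The example above uses "start":0 and "end":4 and saves the result as pages1-5.pdf, which suggests that page indexes are zero-based and the end index is inclusive. Assuming that reading is correct, a helper can translate human page numbers into the pages object. The function name page_range_instructions is hypothetical.

```shell
# page_range_instructions FILE FIRST LAST
# Hypothetical helper: FIRST and LAST are 1-based page numbers (inclusive).
# Assumes the API uses zero-based, inclusive indexes, as the example suggests.
page_range_instructions() {
  if [ "$2" -lt 1 ] || [ "$3" -lt "$2" ]; then
    echo "invalid page range: $2-$3" >&2
    return 1
  fi
  printf '{"parts":[{"file":"%s","pages":{"start":%d,"end":%d}}]}\n' \
    "$1" "$(( $2 - 1 ))" "$(( $3 - 1 ))"
}
```

Calling page_range_instructions document.pdf 1 5 prints the same instructions string as the example above.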

9. Render PDF Pages as Images

bash

curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf","pages":{"start":0,"end":0}}],"output":{"type":"png","dpi":300}}' \
  -o page1.png

10. Check Credits

bash

curl -X GET https://api.nutrient.io/credits \
  -H "Authorization: Bearer $NUTRIENT_API_KEY"

Best Practices

Use the MCP server when your agent supports it — it handles file I/O, error handling, and sandboxing automatically.
Set SANDBOX_PATH to restrict file access to a specific directory.
Check credit balance before batch operations to avoid interruptions.
Use AI redaction for complex PII detection; use preset/regex redaction for known patterns (faster, cheaper).
Chain operations — the API supports multiple actions in a single call (e.g., OCR then redact).
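The chaining advice can be sketched by placing the OCR action and the preset redaction action from the earlier sections into one actions array. This is an illustration under assumptions: the function names are hypothetical, the file is assumed to be a bare name in the current directory, and whether the API accepts this exact pair of actions without further parameters has not been verified here.

```shell
# chain_instructions FILE
# Hypothetical helper: combines the OCR and preset-redaction action objects
# shown earlier in this document into a single instructions string.
chain_instructions() {
  printf '{"parts":[{"file":"%s"}],"actions":[%s,%s]}\n' "$1" \
    '{"type":"ocr","language":"english"}' \
    '{"type":"redaction","strategy":"preset","preset":"social-security-number"}'
}

# ocr_then_redact FILE
# Sends one request that runs both actions in order.
ocr_then_redact() {
  curl -X POST https://api.nutrient.io/build \
    -H "Authorization: Bearer $NUTRIENT_API_KEY" \
    -F "$1=@$1" \
    -F "instructions=$(chain_instructions "$1")" \
    -o redacted.pdf
}
```

Running ocr_then_redact scanned.pdf would then replace two separate calls with one.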

Troubleshooting

Issue	Solution
401 Unauthorized	Check your API key is valid and has credits
413 Payload Too Large	Files must be under 100 MB
Slow AI redaction	AI analysis takes 60–120 seconds; this is normal
OCR quality poor	Try a different language parameter or improve scan quality
Missing text in extraction	Run OCR first on scanned documents
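In a script, the HTTP status codes in the table can be captured with curl's -w '%{http_code}' option and mapped to the matching hint. The sketch below is illustrative only: the function names are hypothetical, and treating any 2xx status as success is an assumption, since this document does not describe the success response.

```shell
# explain_status CODE
# Hypothetical helper: maps a status code to the hint from the table above.
explain_status() {
  case "$1" in
    2??) echo "OK" ;;  # assumed: any 2xx status means success
    401) echo "Check your API key is valid and has credits" ;;
    413) echo "Files must be under 100 MB" ;;
    *)   echo "Unexpected status $1" ;;
  esac
}

# check_credits_status
# Calls the credits endpoint, discards the body, and explains the status code.
check_credits_status() {
  cs_status=$(curl -s -o /dev/null -w '%{http_code}' \
    -H "Authorization: Bearer $NUTRIENT_API_KEY" \
    https://api.nutrient.io/credits)
  explain_status "$cs_status"
}
```

Running check_credits_status before a batch job gives a quick check that the key is accepted.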
