nutrient-document-processing

Nutrient Document Processing


Process, convert, extract, redact, sign, and manipulate documents using the Nutrient DWS Processor API.

Setup


You need a Nutrient DWS API key. Get one free at https://dashboard.nutrient.io/sign_up/?product=processor.

Option 1: MCP Server (Recommended)


If your agent supports MCP (Model Context Protocol), use the Nutrient DWS MCP Server. It provides all operations as native tools.
Configure your MCP client (e.g., claude_desktop_config.json or .mcp.json):
json
{
  "mcpServers": {
    "nutrient-dws": {
      "command": "npx",
      "args": ["-y", "@nutrient-sdk/dws-mcp-server"],
      "env": {
        "NUTRIENT_DWS_API_KEY": "YOUR_API_KEY",
        "SANDBOX_PATH": "/path/to/working/directory"
      }
    }
  }
}
Then use the MCP tools directly (e.g., convert_to_pdf, extract_text, redact, etc.).

Option 2: Direct API (curl)


For agents without MCP support, call the API directly:
bash
export NUTRIENT_API_KEY="your_api_key_here"
All processing requests go to https://api.nutrient.io/build as a multipart POST that carries the uploaded files plus an instructions JSON field.
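The request shape described above can be wrapped in a small shell function so that each call only supplies the file and the instructions. This is a minimal sketch, not part of any Nutrient tooling: the names dws_file_instructions and dws_build are hypothetical, and it assumes the file name contains no characters that need JSON escaping. It uses curl's --form-string for the instructions field so that characters such as ";" are sent literally.

```bash
# Hypothetical helpers; the names are illustrative, not part of any Nutrient tool.

# Print the simplest instructions JSON for one uploaded file.
# Assumes the name has no quotes or backslashes (no JSON escaping is done).
dws_file_instructions() {
  printf '{"parts":[{"file":"%s"}]}' "$1"
}

# POST one file to /build and write the response body to an output path.
# Usage: dws_build input.docx output.pdf ['<instructions JSON>']
dws_build() {
  input=$1
  output=$2
  name=$(basename "$input")
  instructions=${3:-$(dws_file_instructions "$name")}
  status=$(curl -sS -X POST https://api.nutrient.io/build \
    -H "Authorization: Bearer $NUTRIENT_API_KEY" \
    -F "$name=@$input" \
    --form-string "instructions=$instructions" \
    -o "$output" -w '%{http_code}') || return 1
  # On a non-2xx status the output file holds the error body, not a document.
  case $status in
    2??) return 0 ;;
    *) echo "Request failed with HTTP $status; see $output" >&2; return 1 ;;
  esac
}
```

Checking the HTTP status matters because curl -o writes whatever the server returns, so a failed request would otherwise leave an error message saved under a .pdf name.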

Operations


1. Convert Documents


Convert between PDF, DOCX, XLSX, PPTX, HTML, and image formats.
HTML to PDF:
bash
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "index.html=@index.html" \
  -F 'instructions={"parts":[{"html":"index.html"}]}' \
  -o output.pdf
DOCX to PDF:
bash
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.docx=@document.docx" \
  -F 'instructions={"parts":[{"file":"document.docx"}]}' \
  -o output.pdf
PDF to DOCX/XLSX/PPTX:
bash
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf"}],"output":{"type":"docx"}}' \
  -o output.docx
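The example above shows only the docx output type. As a hedged sketch, the instructions for the other Office targets could be generated by swapping the type value. The helper name dws_convert_instructions is hypothetical; "xlsx" appears as an output type elsewhere in this document, while "pptx" is assumed from the heading and should be checked against the API reference.

```bash
# Hypothetical helper: build instructions that convert one file to an Office format.
# Assumes the file name needs no JSON escaping.
dws_convert_instructions() {
  file=$1
  type=$2
  case $type in
    docx|xlsx|pptx) ;;
    *) echo "unsupported output type: $type" >&2; return 1 ;;
  esac
  printf '{"parts":[{"file":"%s"}],"output":{"type":"%s"}}' "$file" "$type"
}

# Usage (requires NUTRIENT_API_KEY and a local document.pdf):
#   for type in docx xlsx pptx; do
#     curl -X POST https://api.nutrient.io/build \
#       -H "Authorization: Bearer $NUTRIENT_API_KEY" \
#       -F "document.pdf=@document.pdf" \
#       -F "instructions=$(dws_convert_instructions document.pdf "$type")" \
#       -o "output.$type"
#   done
```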
Image to PDF:
bash
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "image.jpg=@image.jpg" \
  -F 'instructions={"parts":[{"file":"image.jpg"}]}' \
  -o output.pdf

2. Extract Text and Data


Extract plain text:
bash
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf"}],"output":{"type":"text"}}' \
  -o output.txt
Extract tables (as JSON, CSV, or Excel):
bash
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf"}],"output":{"type":"xlsx"}}' \
  -o tables.xlsx
Extract key-value pairs:
bash
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf"}],"actions":[{"type":"extraction","strategy":"key-values"}]}' \
  -o result.json

3. OCR Scanned Documents


Apply OCR to scanned PDFs or images, producing searchable PDFs with selectable text.
bash
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "scanned.pdf=@scanned.pdf" \
  -F 'instructions={"parts":[{"file":"scanned.pdf"}],"actions":[{"type":"ocr","language":"english"}]}' \
  -o searchable.pdf
Supported languages: english, german, french, spanish, italian, portuguese, dutch, swedish, danish, norwegian, finnish, polish, czech, turkish, japanese, korean, chinese-simplified, chinese-traditional, arabic, hebrew, thai, hindi, russian, and more.

4. Redact Sensitive Information


Pattern-based redaction (preset patterns):
bash
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf"}],"actions":[{"type":"redaction","strategy":"preset","preset":"social-security-number"}]}' \
  -o redacted.pdf
Available presets: social-security-number, credit-card-number, email-address, north-american-phone-number, international-phone-number, date, url, ipv4, ipv6, mac-address, us-zip-code, vin, time.
Regex-based redaction:
bash
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf"}],"actions":[{"type":"redaction","strategy":"regex","regex":"\\b[A-Z]{2}\\d{6}\\b"}]}' \
  -o redacted.pdf
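Note that the regex in the example above is written with doubled backslashes because it sits inside a JSON string. A minimal sketch of doing that escaping in the shell, so the pattern can be written once in its natural form, is shown below. The function names json_escape and regex_redaction_instructions are hypothetical, and the escaping handles only backslashes and double quotes, not control characters.

```bash
# Hypothetical helpers; names are illustrative.

# Escape backslashes and double quotes so a string can sit inside JSON quotes.
# Backslashes are handled first so the ones added for quotes are not doubled.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build regex redaction instructions from a file name and an unescaped pattern.
regex_redaction_instructions() {
  printf '{"parts":[{"file":"%s"}],"actions":[{"type":"redaction","strategy":"regex","regex":"%s"}]}' \
    "$1" "$(json_escape "$2")"
}

# Usage:
#   regex_redaction_instructions document.pdf '\b[A-Z]{2}\d{6}\b'
# prints the same instructions JSON used in the curl example above.
```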
AI-powered PII redaction:
bash
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf"}],"actions":[{"type":"ai_redaction","criteria":"All personally identifiable information"}]}' \
  -o redacted.pdf
The criteria field accepts natural language (e.g., "Names and phone numbers", "Protected health information", "Financial account numbers").

5. Add Watermarks


Text watermark:
bash
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf"}],"actions":[{"type":"watermark","text":"CONFIDENTIAL","fontSize":48,"fontColor":"#FF0000","opacity":0.5,"rotation":45,"width":"50%","height":"50%"}]}' \
  -o watermarked.pdf
Image watermark:
bash
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F "logo.png=@logo.png" \
  -F 'instructions={"parts":[{"file":"document.pdf"}],"actions":[{"type":"watermark","imagePath":"logo.png","width":"30%","height":"30%","opacity":0.3}]}' \
  -o watermarked.pdf

6. Digital Signatures


Sign a PDF with CMS signature:
bash
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf"}],"actions":[{"type":"sign","signatureType":"cms","signerName":"John Doe","reason":"Approval","location":"New York"}]}' \
  -o signed.pdf
Sign with CAdES-B-LT (long-term validation):
bash
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf"}],"actions":[{"type":"sign","signatureType":"cades","cadesLevel":"b-lt","signerName":"Jane Smith"}]}' \
  -o signed.pdf

7. Form Filling (Instant JSON)


Fill PDF form fields using Instant JSON format:
bash
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "form.pdf=@form.pdf" \
  -F 'instructions={"parts":[{"file":"form.pdf"}],"actions":[{"type":"fillForm","fields":[{"name":"firstName","value":"John"},{"name":"lastName","value":"Doe"},{"name":"email","value":"john@example.com"}]}]}' \
  -o filled.pdf

8. Merge and Split PDFs


Merge multiple PDFs:
bash
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "doc1.pdf=@doc1.pdf" \
  -F "doc2.pdf=@doc2.pdf" \
  -F 'instructions={"parts":[{"file":"doc1.pdf"},{"file":"doc2.pdf"}]}' \
  -o merged.pdf
Extract specific pages:
bash
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf","pages":{"start":0,"end":4}}]}' \
  -o pages1-5.pdf
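The example above requests start 0 and end 4 and saves the result as pages1-5.pdf, which suggests that page indexes are zero-based and inclusive. Under that assumption, a small hypothetical helper (page_range_json is not an API or tool name) can convert the 1-based page numbers people usually think in:

```bash
# Hypothetical helper: convert a 1-based inclusive page range into the
# zero-based start/end object used in the example above.
# Assumes indexes are zero-based and inclusive, as that example implies.
page_range_json() {
  first=$1
  last=$2
  if [ "$first" -lt 1 ] || [ "$last" -lt "$first" ]; then
    echo "invalid page range: $first-$last" >&2
    return 1
  fi
  printf '{"start":%d,"end":%d}' "$((first - 1))" "$((last - 1))"
}

# Usage: build the instructions for pages 1 through 5 of document.pdf.
#   printf '{"parts":[{"file":"document.pdf","pages":%s}]}' "$(page_range_json 1 5)"
```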

9. Render PDF Pages as Images


bash
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf","pages":{"start":0,"end":0}}],"output":{"type":"png","dpi":300}}' \
  -o page1.png

10. Check Credits


bash
curl -X GET https://api.nutrient.io/credits \
  -H "Authorization: Bearer $NUTRIENT_API_KEY"

Best Practices

最佳实践

  1. Use the MCP server when your agent supports it; it handles file I/O, error handling, and sandboxing automatically.
  2. Set SANDBOX_PATH to restrict file access to a specific directory.
  3. Check credit balance before batch operations to avoid interruptions.
  4. Use AI redaction for complex PII detection; use preset/regex redaction for known patterns (faster, cheaper).
  5. Chain operations: the API supports multiple actions in a single call (e.g., OCR then redact).
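To illustrate the chaining practice, the sketch below builds a single instructions payload that combines the OCR action and the preset redaction action from the earlier examples. The function name ocr_then_redact_instructions is hypothetical, and it is an assumption that actions are applied in the order they appear in the array.

```bash
# Hypothetical helper: one instructions payload that runs OCR, then redacts.
# The two action objects are copied from the OCR and preset redaction examples;
# that they run in array order is an assumption.
ocr_then_redact_instructions() {
  file=$1
  language=${2:-english}
  preset=${3:-social-security-number}
  printf '{"parts":[{"file":"%s"}],"actions":[{"type":"ocr","language":"%s"},{"type":"redaction","strategy":"preset","preset":"%s"}]}' \
    "$file" "$language" "$preset"
}

# Usage (requires NUTRIENT_API_KEY and a local scanned.pdf):
#   curl -X POST https://api.nutrient.io/build \
#     -H "Authorization: Bearer $NUTRIENT_API_KEY" \
#     -F "scanned.pdf=@scanned.pdf" \
#     -F "instructions=$(ocr_then_redact_instructions scanned.pdf)" \
#     -o redacted.pdf
```

Sending both actions in one request avoids uploading the intermediate searchable PDF a second time.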

Troubleshooting


| Issue | Solution |
| --- | --- |
| 401 Unauthorized | Check your API key is valid and has credits |
| 413 Payload Too Large | Files must be under 100 MB |
| Slow AI redaction | AI analysis takes 60–120 seconds; this is normal |
| OCR quality poor | Try a different language parameter or improve scan quality |
| Missing text in extraction | Run OCR first on scanned documents |
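A 413 response can be avoided by checking the file size locally before uploading. The sketch below is one hedged way to do that; under_size_limit is a hypothetical name, and it assumes "100 MB" means 100 * 1024 * 1024 bytes, which may not match the exact threshold the API enforces.

```bash
# Hypothetical pre-flight check for the 100 MB upload limit.
# Assumes 100 MB = 100 * 1024 * 1024 bytes; pass a second argument to override.
under_size_limit() {
  file=$1
  limit=${2:-104857600}
  [ -f "$file" ] || { echo "not a file: $file" >&2; return 1; }
  # Arithmetic expansion strips the padding some wc implementations print.
  size=$(($(wc -c < "$file")))
  [ "$size" -lt "$limit" ]
}

# Usage:
#   under_size_limit document.pdf || echo "document.pdf is too large to upload" >&2
```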

More Information


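  • Full API reference: detailed endpoints, parameters, and error codes
  • API playground: interactive API testing
  • API documentation: official guides
  • MCP Server repository: source code and issue tracker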