nutrient-document-processing

Nutrient Document Processing

Process documents with the Nutrient DWS Processor API. Convert formats, extract text and tables, OCR scanned documents, redact PII, add watermarks, digitally sign, and fill PDF forms.

Setup

bash
export NUTRIENT_API_KEY="pdf_live_..."

All requests go to `https://api.nutrient.io/build` as multipart POST with an `instructions` JSON field.
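
Since every operation uses the same endpoint, header, and `instructions` field, the request can be wrapped in one shell function. The following is a minimal sketch, not part of the Nutrient tooling: the function name `nutrient_build` and its argument order are hypothetical, and it assumes a single input file with a plain file name (no commas, semicolons, or quotes).

```shell
# Hypothetical helper: POST one local file to the build endpoint.
# Usage: nutrient_build <input-file> <instructions-json> <output-file>
nutrient_build() {
  input="$1"
  instructions="$2"
  output="$3"

  # Refuse to send a request without credentials.
  if [ -z "${NUTRIENT_API_KEY:-}" ]; then
    echo "NUTRIENT_API_KEY is not set" >&2
    return 1
  fi

  # The form field name must match the "file" value used in the instructions.
  name="$(basename "$input")"

  # --fail makes curl exit non-zero on HTTP error responses instead of
  # saving the error body as if it were the result.
  # --form-string sends the JSON literally, so characters such as ';' or a
  # leading '@' in the value are not interpreted by curl.
  curl --fail --silent --show-error -X POST https://api.nutrient.io/build \
    -H "Authorization: Bearer $NUTRIENT_API_KEY" \
    -F "${name}=@${input}" \
    --form-string "instructions=${instructions}" \
    -o "$output"
}
```

With this in place, a DOCX to PDF conversion would be called as `nutrient_build document.docx '{"parts":[{"file":"document.docx"}]}' output.pdf`, and the exit status can be checked before the output file is used.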

Operations

Convert Documents

DOCX to PDF

bash
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.docx=@document.docx" \
  -F 'instructions={"parts":[{"file":"document.docx"}]}' \
  -o output.pdf

PDF to DOCX

bash
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf"}],"output":{"type":"docx"}}' \
  -o output.docx

HTML to PDF

bash
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "index.html=@index.html" \
  -F 'instructions={"parts":[{"html":"index.html"}]}' \
  -o output.pdf

Supported inputs: PDF, DOCX, XLSX, PPTX, DOC, XLS, PPT, PPS, PPSX, ODT, RTF, HTML, JPG, PNG, TIFF, HEIC, GIF, WebP, SVG, TGA, EPS.

Extract Text and Data

Extract plain text

bash
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf"}],"output":{"type":"text"}}' \
  -o output.txt

Extract tables as Excel

bash
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf"}],"output":{"type":"xlsx"}}' \
  -o tables.xlsx

OCR Scanned Documents

OCR to searchable PDF (supports 100+ languages)

bash
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "scanned.pdf=@scanned.pdf" \
  -F 'instructions={"parts":[{"file":"scanned.pdf"}],"actions":[{"type":"ocr","language":"english"}]}' \
  -o searchable.pdf

Languages: Supports 100+ languages via ISO 639-2 codes (e.g., `eng`, `deu`, `fra`, `spa`, `jpn`, `kor`, `chi_sim`, `chi_tra`, `ara`, `hin`, `rus`). Full language names like `english` or `german` also work. See the [complete OCR language table](https://www.nutrient.io/guides/document-engine/ocr/language-support/) for all supported codes.
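
When the file name or OCR language varies per request, the `instructions` JSON can be generated rather than typed by hand. This is a small sketch under stated assumptions: the function name `ocr_instructions` is hypothetical, and both arguments are assumed to contain no double quotes or backslashes, since no JSON escaping is performed.

```shell
# Hypothetical helper: print the instructions JSON for OCR on one file.
# Usage: ocr_instructions <file-name> <language>
ocr_instructions() {
  # printf substitutes the file name and language into a fixed JSON template.
  printf '{"parts":[{"file":"%s"}],"actions":[{"type":"ocr","language":"%s"}]}' "$1" "$2"
}
```

The result can be passed directly to curl, for example `-F "instructions=$(ocr_instructions scanned.pdf deu)"` for a German scan, with the other arguments unchanged from the request above.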

Redact Sensitive Information

Pattern-based (SSN, email)

bash
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf"}],"actions":[{"type":"redaction","strategy":"preset","strategyOptions":{"preset":"social-security-number"}},{"type":"redaction","strategy":"preset","strategyOptions":{"preset":"email-address"}}]}' \
  -o redacted.pdf

Regex-based

bash
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf"}],"actions":[{"type":"redaction","strategy":"regex","strategyOptions":{"regex":"\\b[A-Z]{2}\\d{6}\\b"}}]}' \
  -o redacted.pdf

Presets: `social-security-number`, `email-address`, `credit-card-number`, `international-phone-number`, `north-american-phone-number`, `date`, `time`, `url`, `ipv4`, `ipv6`, `mac-address`, `us-zip-code`, `vin`.
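
Regex patterns are easy to get wrong here because they pass through two layers: the shell and JSON. In a JSON string a backslash must itself be escaped, so the regex `\b[A-Z]{2}\d{6}\b` has to be written with doubled backslashes. One way to keep this readable, sketched below, is to put the instructions in a quoted heredoc so the shell leaves every character untouched. The variable name `instructions` is only illustrative.

```shell
# Build the instructions JSON in a quoted heredoc ('JSON'), which disables
# shell expansion, so the text is stored exactly as written.
instructions="$(cat <<'JSON'
{"parts":[{"file":"document.pdf"}],"actions":[{"type":"redaction","strategy":"regex","strategyOptions":{"regex":"\\b[A-Z]{2}\\d{6}\\b"}}]}
JSON
)"

# Show what will be sent; each regex backslash appears doubled, as JSON requires.
printf '%s\n' "$instructions"
```

The variable can then replace the inline JSON in the request, for example with `--form-string "instructions=$instructions"`, which tells curl to send the value literally.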

Add Watermarks

bash
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf"}],"actions":[{"type":"watermark","text":"CONFIDENTIAL","fontSize":72,"opacity":0.3,"rotation":-45}]}' \
  -o watermarked.pdf

Digital Signatures

Self-signed CMS signature

bash
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf"}],"actions":[{"type":"sign","signatureType":"cms"}]}' \
  -o signed.pdf

Fill PDF Forms

bash
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "form.pdf=@form.pdf" \
  -F 'instructions={"parts":[{"file":"form.pdf"}],"actions":[{"type":"fillForm","formFields":{"name":"Jane Smith","email":"jane@example.com","date":"2026-02-06"}}]}' \
  -o filled.pdf

MCP Server (Alternative)

For native tool integration, use the MCP server instead of curl:
json
{
  "mcpServers": {
    "nutrient-dws": {
      "command": "npx",
      "args": ["-y", "@nutrient-sdk/dws-mcp-server"],
      "env": {
        "NUTRIENT_DWS_API_KEY": "YOUR_API_KEY",
        "SANDBOX_PATH": "/path/to/working/directory"
      }
    }
  }
}

When to Use

  • Converting documents between formats (PDF, DOCX, XLSX, PPTX, HTML, images)
  • Extracting text, tables, or key-value pairs from PDFs
  • OCR on scanned documents or images
  • Redacting PII before sharing documents
  • Adding watermarks to drafts or confidential documents
  • Digitally signing contracts or agreements
  • Filling PDF forms programmatically

Links
