nutrient-document-processing
Nutrient Document Processing
Process documents with the Nutrient DWS Processor API. Convert formats, extract text and tables, OCR scanned documents, redact PII, add watermarks, digitally sign, and fill PDF forms.
Setup
Get a free API key at https://dashboard.nutrient.io/sign_up/?product=processor
```bash
export NUTRIENT_API_KEY="pdf_live_..."
```

All requests go to `https://api.nutrient.io/build` as a multipart POST with an `instructions` JSON field.
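Every curl example below uses the same request shape: one multipart part per input file, plus an `instructions` field. The sketch below wraps that shape in a shell function. The name `nutrient_build` is hypothetical and is not part of the Nutrient API or any CLI. The sketch assumes, as in the examples here, that the multipart field name matches the `file` value inside `instructions`. It also adds curl's `--fail` flag, so an HTTP error returns a non-zero status instead of writing the error body to the output file.

```bash
# Hypothetical helper, not part of the Nutrient API: send one input file and an
# instructions JSON string to the build endpoint and save the response body.
# Usage: nutrient_build INPUT_FILE INSTRUCTIONS_JSON OUTPUT_FILE
nutrient_build() {
  local input="$1" instructions="$2" output="$3" name
  if [ -z "${NUTRIENT_API_KEY:-}" ]; then
    echo "nutrient_build: NUTRIENT_API_KEY is not set" >&2
    return 1
  fi
  # Name the multipart field after the file's base name (docs/a.pdf -> a.pdf);
  # the instructions must reference that same name in "parts".
  name=$(basename "$input")
  # --fail: exit non-zero on HTTP 4xx/5xx instead of saving the error response.
  curl --fail --silent --show-error -X POST https://api.nutrient.io/build \
    -H "Authorization: Bearer $NUTRIENT_API_KEY" \
    -F "$name=@$input" \
    -F "instructions=$instructions" \
    -o "$output"
}
```

Usage: `nutrient_build document.docx '{"parts":[{"file":"document.docx"}]}' output.pdf`.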
Operations
Convert Documents
DOCX to PDF
```bash
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.docx=@document.docx" \
  -F 'instructions={"parts":[{"file":"document.docx"}]}' \
  -o output.pdf
```
PDF to DOCX
```bash
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf"}],"output":{"type":"docx"}}' \
  -o output.docx
```
HTML to PDF
```bash
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "index.html=@index.html" \
  -F 'instructions={"parts":[{"html":"index.html"}]}' \
  -o output.pdf
```

Supported inputs: PDF, DOCX, XLSX, PPTX, DOC, XLS, PPT, PPS, PPSX, ODT, RTF, HTML, JPG, PNG, TIFF, HEIC, GIF, WebP, SVG, TGA, EPS.
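To convert a whole folder, you can repeat the DOCX-to-PDF request once per file. The sketch below does that sequentially. It assumes one request per file is acceptable for your volume. It also assumes file names contain no double quotes or backslashes, because names are spliced into the JSON unescaped. `convert_all_docx` is a hypothetical name.

```bash
# Hypothetical helper: convert every .docx in a directory (default: current
# directory) to PDF, one request per file, writing a.docx -> a.pdf beside it.
convert_all_docx() {
  local dir="${1:-.}" f name
  for f in "$dir"/*.docx; do
    [ -e "$f" ] || continue   # no matches: the glob stays literal, so skip it
    name=$(basename "$f")
    # --fail: treat HTTP errors as failures instead of saving the error body.
    curl --fail --silent --show-error -X POST https://api.nutrient.io/build \
      -H "Authorization: Bearer $NUTRIENT_API_KEY" \
      -F "$name=@$f" \
      -F "instructions={\"parts\":[{\"file\":\"$name\"}]}" \
      -o "${f%.docx}.pdf" || echo "convert_all_docx: failed: $f" >&2
  done
}
```

A failed file is reported on stderr and the loop continues with the next one.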
Extract Text and Data
Extract plain text
```bash
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf"}],"output":{"type":"text"}}' \
  -o output.txt
```
Extract tables as Excel
```bash
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf"}],"output":{"type":"xlsx"}}' \
  -o tables.xlsx
```
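The PDF-to-DOCX, plain-text, and Excel requests above differ only in their `output` object. A small builder keeps that difference in one place. `instructions_for` is a hypothetical helper, and it only knows the output types used in this guide (`docx`, `xlsx`, `text`). It treats `pdf` as "omit `output`", mirroring the DOCX-to-PDF example. The file name is inserted without JSON escaping, so the sketch assumes names contain no quotes or backslashes.

```bash
# Hypothetical helper: print instructions JSON for one file and an output type.
# "pdf" (the default) omits the output block, like the DOCX-to-PDF example;
# docx, xlsx and text are the types used in this guide; anything else fails.
instructions_for() {
  local file="$1" type="${2:-pdf}"
  case "$type" in
    pdf) printf '{"parts":[{"file":"%s"}]}' "$file" ;;
    docx|xlsx|text)
      printf '{"parts":[{"file":"%s"}],"output":{"type":"%s"}}' "$file" "$type" ;;
    *) echo "instructions_for: unsupported type '$type'" >&2; return 1 ;;
  esac
}
```

For example, `-F "instructions=$(instructions_for document.pdf text)"` reproduces the plain-text request.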
OCR Scanned Documents
OCR to searchable PDF (supports 100+ languages)
```bash
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "scanned.pdf=@scanned.pdf" \
  -F 'instructions={"parts":[{"file":"scanned.pdf"}],"actions":[{"type":"ocr","language":"english"}]}' \
  -o searchable.pdf
```

Languages: Supports 100+ languages via ISO 639-2 codes (e.g., `eng`, `deu`, `fra`, `spa`, `jpn`, `kor`, `chi_sim`, `chi_tra`, `ara`, `hin`, `rus`). Full language names like `english` or `german` also work. See the [complete OCR language table](https://www.nutrient.io/guides/document-engine/ocr/language-support/) for all supported codes.
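The sketch below makes the OCR language a parameter, so the same command works for `deu`, `jpn`, or any other listed code. `ocr_to_searchable` is a hypothetical name. The `eng` default and the `-searchable.pdf` output naming are choices made for this example, not API behavior. File names are spliced into the JSON unescaped.

```bash
# Hypothetical helper: OCR one scanned file into a searchable PDF.
# Usage: ocr_to_searchable FILE [LANGUAGE]   (LANGUAGE defaults to eng)
# Output naming (scan.pdf -> scan-searchable.pdf) is this sketch's choice.
ocr_to_searchable() {
  local input="$1" lang="${2:-eng}" name
  name=$(basename "$input")
  curl --fail --silent --show-error -X POST https://api.nutrient.io/build \
    -H "Authorization: Bearer $NUTRIENT_API_KEY" \
    -F "$name=@$input" \
    -F "instructions={\"parts\":[{\"file\":\"$name\"}],\"actions\":[{\"type\":\"ocr\",\"language\":\"$lang\"}]}" \
    -o "${input%.*}-searchable.pdf"
}
```

Usage: `ocr_to_searchable scans/invoice.pdf deu`.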
Redact Sensitive Information
Pattern-based (SSN, email)
```bash
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf"}],"actions":[{"type":"redaction","strategy":"preset","strategyOptions":{"preset":"social-security-number"}},{"type":"redaction","strategy":"preset","strategyOptions":{"preset":"email-address"}}]}' \
  -o redacted.pdf
```
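In the SSN-plus-email request, each preset is its own `redaction` action. The sketch below generates that `actions` array from any number of preset names, such as those in the presets list in this section. `redaction_actions` is a hypothetical helper, and it does not validate names against the API.

```bash
# Hypothetical helper: print a JSON "actions" array with one preset redaction
# per argument, e.g. redaction_actions credit-card-number ipv4
redaction_actions() {
  local out="" preset
  for preset in "$@"; do
    # Separate entries with commas after the first one.
    if [ -n "$out" ]; then out="$out,"; fi
    out="${out}{\"type\":\"redaction\",\"strategy\":\"preset\",\"strategyOptions\":{\"preset\":\"$preset\"}}"
  done
  printf '[%s]' "$out"
}
```

Usage: `-F "instructions={\"parts\":[{\"file\":\"document.pdf\"}],\"actions\":$(redaction_actions credit-card-number ipv4)}"`.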
Regex-based
```bash
# The regex sits inside a JSON string, so its backslashes are doubled (\\b, \\d).
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf"}],"actions":[{"type":"redaction","strategy":"regex","strategyOptions":{"regex":"\\b[A-Z]{2}\\d{6}\\b"}}]}' \
  -o redacted.pdf
```

Presets: `social-security-number`, `email-address`, `credit-card-number`, `international-phone-number`, `north-american-phone-number`, `date`, `time`, `url`, `ipv4`, `ipv6`, `mac-address`, `us-zip-code`, `vin`.
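For the regex strategy, the pattern travels inside a JSON string. Every backslash in the regex must therefore be doubled, and every double quote escaped, before it goes into `instructions`. Otherwise a JSON parser reads `\b` as a backspace character and rejects `\d` as an invalid escape. The sketch below does this with POSIX `sed`. `json_escape` is a hypothetical name, and it does not handle control characters such as newlines.

```bash
# Hypothetical helper: escape a string for use inside a JSON string literal.
# Handles backslashes and double quotes only; control characters are not handled.
json_escape() {
  # Double backslashes first, then escape double quotes.
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Example: build the regex-redaction instructions for the pattern above.
re=$(json_escape '\b[A-Z]{2}\d{6}\b')
instructions="{\"parts\":[{\"file\":\"document.pdf\"}],\"actions\":[{\"type\":\"redaction\",\"strategy\":\"regex\",\"strategyOptions\":{\"regex\":\"$re\"}}]}"
printf '%s\n' "$instructions"
```

Pass the result to curl as `-F "instructions=$instructions"`.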
Add Watermarks
```bash
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf"}],"actions":[{"type":"watermark","text":"CONFIDENTIAL","fontSize":72,"opacity":0.3,"rotation":-45}]}' \
  -o watermarked.pdf
```
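To reuse the watermark settings with different text, you can build the action from parameters. `watermark_action` is a hypothetical helper. It emits only the fields shown in the example above (`text`, `fontSize`, `opacity`, `rotation`), with that example's values as defaults. It assumes the text contains no double quotes or backslashes.

```bash
# Hypothetical helper: print one watermark action object.
# Usage: watermark_action TEXT [FONT_SIZE] [OPACITY] [ROTATION]
# Defaults copy the CONFIDENTIAL example: fontSize 72, opacity 0.3, rotation -45.
watermark_action() {
  local text="$1" size="${2:-72}" opacity="${3:-0.3}" rotation="${4:--45}"
  printf '{"type":"watermark","text":"%s","fontSize":%s,"opacity":%s,"rotation":%s}' \
    "$text" "$size" "$opacity" "$rotation"
}
```

Usage: `-F "instructions={\"parts\":[{\"file\":\"draft.pdf\"}],\"actions\":[$(watermark_action DRAFT 60)]}"`.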
Digital Signatures
Self-signed CMS signature
```bash
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf"}],"actions":[{"type":"sign","signatureType":"cms"}]}' \
  -o signed.pdf
```
Fill PDF Forms
```bash
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "form.pdf=@form.pdf" \
  -F 'instructions={"parts":[{"file":"form.pdf"}],"actions":[{"type":"fillForm","formFields":{"name":"Jane Smith","email":"jane@example.com","date":"2026-02-06"}}]}' \
  -o filled.pdf
```
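Field values often come from a script or a data file rather than a hand-written literal. The sketch below turns `NAME=VALUE` arguments into the `formFields` object. `form_fields` is a hypothetical helper. It escapes backslashes and double quotes, and it splits each argument at the first `=`. The names must be the form's own field names, as `name`, `email`, and `date` are in the example above.

```bash
# Hypothetical helper: build a JSON object from NAME=VALUE arguments,
# e.g. form_fields name="Jane Smith" email=jane@example.com
form_fields() {
  local out="" pair key value
  for pair in "$@"; do
    key=${pair%%=*}     # text before the first "="
    value=${pair#*=}    # text after the first "="
    # Escape backslashes and double quotes so the result stays valid JSON.
    key=$(printf '%s' "$key" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
    value=$(printf '%s' "$value" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
    if [ -n "$out" ]; then out="$out,"; fi
    out="${out}\"$key\":\"$value\""
  done
  printf '{%s}' "$out"
}
```

Usage: `-F "instructions={\"parts\":[{\"file\":\"form.pdf\"}],\"actions\":[{\"type\":\"fillForm\",\"formFields\":$(form_fields name='Jane Smith' email=jane@example.com)}]}"`.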
MCP Server (Alternative)
For native tool integration, use the Nutrient DWS MCP (Model Context Protocol) server instead of curl by adding it to your MCP client's configuration:
```json
{
  "mcpServers": {
    "nutrient-dws": {
      "command": "npx",
      "args": ["-y", "@nutrient-sdk/dws-mcp-server"],
      "env": {
        "NUTRIENT_DWS_API_KEY": "YOUR_API_KEY",
        "SANDBOX_PATH": "/path/to/working/directory"
      }
    }
  }
}
```
When to Use
- Converting documents between formats (PDF, DOCX, XLSX, PPTX, HTML, images)
- Extracting text, tables, or key-value pairs from PDFs
- OCR on scanned documents or images
- Redacting PII before sharing documents
- Adding watermarks to drafts or confidential documents
- Digitally signing contracts or agreements
- Filling PDF forms programmatically