nutrient-document-processing
Nutrient Document Processing
Process, convert, extract, redact, sign, and manipulate documents using the Nutrient DWS Processor API.
Setup
You need a Nutrient DWS API key. Get one free at https://dashboard.nutrient.io/sign_up/?product=processor.
Option 1: MCP Server (Recommended)
If your agent supports MCP (Model Context Protocol), use the Nutrient DWS MCP Server. It provides all operations as native tools.
Configure your MCP client (e.g., `claude_desktop_config.json` or `.mcp.json`):

```json
{
  "mcpServers": {
    "nutrient-dws": {
      "command": "npx",
      "args": ["-y", "@nutrient-sdk/dws-mcp-server"],
      "env": {
        "NUTRIENT_DWS_API_KEY": "YOUR_API_KEY",
        "SANDBOX_PATH": "/path/to/working/directory"
      }
    }
  }
}
```

Then use the MCP tools directly (e.g., `convert_to_pdf`, `extract_text`, `redact`).
Option 2: Direct API (curl)
For agents without MCP support, call the API directly:

```bash
export NUTRIENT_API_KEY="your_api_key_here"
```

All requests go to `https://api.nutrient.io/build` as multipart POST with an `instructions` JSON field.
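Because every call shares this shape, a small wrapper can cut down on repetition. The following is a minimal sketch, not part of the Nutrient tooling: the function names `make_instructions` and `nutrient_build` are hypothetical, and it assumes the upload field name has to match the file name referenced in `instructions`, as it does in the examples in this document.

```bash
#!/usr/bin/env bash

# Hypothetical helper: print an instructions JSON for a single input file.
# The optional second argument sets output.type (e.g. docx, text, xlsx).
# Assumes the file name contains no characters that need JSON escaping.
make_instructions() {
  local file="$1" out_type="${2:-}"
  if [ -n "$out_type" ]; then
    printf '{"parts":[{"file":"%s"}],"output":{"type":"%s"}}' "$file" "$out_type"
  else
    printf '{"parts":[{"file":"%s"}]}' "$file"
  fi
}

# Hypothetical wrapper: POST one file plus instructions to the build endpoint.
# --fail makes curl exit non-zero on HTTP errors instead of saving the error
# body as if it were the output document; --form-string sends the JSON
# literally, without curl interpreting characters such as ';' or '@'.
nutrient_build() {
  local input="$1" instructions="$2" output="$3"
  local name
  name="$(basename "$input")"
  curl --fail --silent --show-error -X POST https://api.nutrient.io/build \
    -H "Authorization: Bearer $NUTRIENT_API_KEY" \
    -F "$name=@$input" \
    --form-string "instructions=$instructions" \
    -o "$output"
}

# Usage (not run here):
#   nutrient_build document.pdf "$(make_instructions document.pdf docx)" output.docx
```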
Operations
1. Convert Documents
Convert between PDF, DOCX, XLSX, PPTX, HTML, and image formats.

HTML to PDF:

```bash
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "index.html=@index.html" \
  -F 'instructions={"parts":[{"html":"index.html"}]}' \
  -o output.pdf
```

DOCX to PDF:

```bash
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.docx=@document.docx" \
  -F 'instructions={"parts":[{"file":"document.docx"}]}' \
  -o output.pdf
```

PDF to DOCX/XLSX/PPTX:

```bash
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf"}],"output":{"type":"docx"}}' \
  -o output.docx
```

Image to PDF:

```bash
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "image.jpg=@image.jpg" \
  -F 'instructions={"parts":[{"file":"image.jpg"}]}' \
  -o output.pdf
```
2. Extract Text and Data
Extract plain text:

```bash
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf"}],"output":{"type":"text"}}' \
  -o output.txt
```

Extract tables (as JSON, CSV, or Excel):

```bash
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf"}],"output":{"type":"xlsx"}}' \
  -o tables.xlsx
```

Extract key-value pairs:

```bash
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf"}],"actions":[{"type":"extraction","strategy":"key-values"}]}' \
  -o result.json
```
3. OCR Scanned Documents
Apply OCR to scanned PDFs or images, producing searchable PDFs with selectable text.

```bash
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "scanned.pdf=@scanned.pdf" \
  -F 'instructions={"parts":[{"file":"scanned.pdf"}],"actions":[{"type":"ocr","language":"english"}]}' \
  -o searchable.pdf
```

Supported languages: `english`, `german`, `french`, `spanish`, `italian`, `portuguese`, `dutch`, `swedish`, `danish`, `norwegian`, `finnish`, `polish`, `czech`, `turkish`, `japanese`, `korean`, `chinese-simplified`, `chinese-traditional`, `arabic`, `hebrew`, `thai`, `hindi`, `russian`, and more.
4. Redact Sensitive Information
Pattern-based redaction (preset patterns):

```bash
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf"}],"actions":[{"type":"redaction","strategy":"preset","preset":"social-security-number"}]}' \
  -o redacted.pdf
```

Available presets: `social-security-number`, `credit-card-number`, `email-address`, `north-american-phone-number`, `international-phone-number`, `date`, `url`, `ipv4`, `ipv6`, `mac-address`, `us-zip-code`, `vin`, `time`.

Regex-based redaction:

```bash
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf"}],"actions":[{"type":"redaction","strategy":"regex","regex":"\\b[A-Z]{2}\\d{6}\\b"}]}' \
  -o redacted.pdf
```

AI-powered PII redaction:

```bash
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf"}],"actions":[{"type":"ai_redaction","criteria":"All personally identifiable information"}]}' \
  -o redacted.pdf
```

The `criteria` field accepts natural language (e.g., "Names and phone numbers", "Protected health information", "Financial account numbers").
5. Add Watermarks
Text watermark:

```bash
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf"}],"actions":[{"type":"watermark","text":"CONFIDENTIAL","fontSize":48,"fontColor":"#FF0000","opacity":0.5,"rotation":45,"width":"50%","height":"50%"}]}' \
  -o watermarked.pdf
```

Image watermark:

```bash
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F "logo.png=@logo.png" \
  -F 'instructions={"parts":[{"file":"document.pdf"}],"actions":[{"type":"watermark","imagePath":"logo.png","width":"30%","height":"30%","opacity":0.3}]}' \
  -o watermarked.pdf
```
6. Digital Signatures
Sign a PDF with a CMS signature:

```bash
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf"}],"actions":[{"type":"sign","signatureType":"cms","signerName":"John Doe","reason":"Approval","location":"New York"}]}' \
  -o signed.pdf
```

Sign with CAdES-B-LT (long-term validation):

```bash
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf"}],"actions":[{"type":"sign","signatureType":"cades","cadesLevel":"b-lt","signerName":"Jane Smith"}]}' \
  -o signed.pdf
```
7. Form Filling (Instant JSON)
Fill PDF form fields using Instant JSON format:

```bash
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "form.pdf=@form.pdf" \
  -F 'instructions={"parts":[{"file":"form.pdf"}],"actions":[{"type":"fillForm","fields":[{"name":"firstName","value":"John"},{"name":"lastName","value":"Doe"},{"name":"email","value":"john@example.com"}]}]}' \
  -o filled.pdf
```
8. Merge and Split PDFs
Merge multiple PDFs:

```bash
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "doc1.pdf=@doc1.pdf" \
  -F "doc2.pdf=@doc2.pdf" \
  -F 'instructions={"parts":[{"file":"doc1.pdf"},{"file":"doc2.pdf"}]}' \
  -o merged.pdf
```

Extract specific pages:

```bash
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf","pages":{"start":0,"end":4}}]}' \
  -o pages1-5.pdf
```
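The page example above requests `"start":0,"end":4` and saves the result as `pages1-5.pdf`, which suggests zero-based, inclusive page indexes. Under that reading (an inference from the example, not a statement taken from the API reference), a hypothetical helper such as `page_range` could convert 1-based page numbers into the `pages` object:

```bash
# Hypothetical helper: convert 1-based, inclusive page numbers into the
# zero-based "pages" object used in the example above.
page_range() {
  local first="$1" last="$2"
  if [ "$first" -lt 1 ] || [ "$last" -lt "$first" ]; then
    echo "page_range: expected 1 <= first <= last" >&2
    return 1
  fi
  printf '{"start":%d,"end":%d}' "$((first - 1))" "$((last - 1))"
}

# Build the instructions for pages 1 through 5 of document.pdf.
INSTRUCTIONS="$(printf '{"parts":[{"file":"document.pdf","pages":%s}]}' "$(page_range 1 5)")"
echo "$INSTRUCTIONS"
```

The resulting string can be passed as the `instructions` field of the same curl command.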
9. Render PDF Pages as Images
```bash
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf","pages":{"start":0,"end":0}}],"output":{"type":"png","dpi":300}}' \
  -o page1.png
```
10. Check Credits
```bash
curl -X GET https://api.nutrient.io/credits \
  -H "Authorization: Bearer $NUTRIENT_API_KEY"
```
Best Practices
- Use the MCP server when your agent supports it; it handles file I/O, error handling, and sandboxing automatically.
- Set `SANDBOX_PATH` to restrict file access to a specific directory.
- Check your credit balance before batch operations to avoid interruptions.
- Use AI redaction for complex PII detection; use preset or regex redaction for known patterns (faster, cheaper).
- Chain operations: the API supports multiple actions in a single call (e.g., OCR then redact).
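
As an illustration of chaining, the sketch below combines the `ocr` and `redaction` actions shown earlier into one `actions` array. It assumes actions run in array order, which "OCR then redact" implies but this document does not state outright; the function name `ocr_then_redact` and the output file name are hypothetical.

```bash
# Instructions with two actions: OCR first, then preset redaction.
# Assumption: the API applies actions in the order they are listed.
CHAINED_INSTRUCTIONS='{"parts":[{"file":"scanned.pdf"}],"actions":[{"type":"ocr","language":"english"},{"type":"redaction","strategy":"preset","preset":"email-address"}]}'

# Hypothetical wrapper; call it only after exporting NUTRIENT_API_KEY.
# --fail makes curl exit non-zero on an HTTP error response.
ocr_then_redact() {
  curl --fail -X POST https://api.nutrient.io/build \
    -H "Authorization: Bearer $NUTRIENT_API_KEY" \
    -F "scanned.pdf=@scanned.pdf" \
    --form-string "instructions=$CHAINED_INSTRUCTIONS" \
    -o searchable-redacted.pdf
}
```

A single chained call uploads the file once instead of sending the intermediate PDF back for a second request.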
Troubleshooting
| Issue | Solution |
|---|---|
| 401 Unauthorized | Check your API key is valid and has credits |
| 413 Payload Too Large | Files must be under 100 MB |
| Slow AI redaction | AI analysis takes 60–120 seconds; this is normal |
| OCR quality poor | Try a different language parameter or improve scan quality |
| Missing text in extraction | Run OCR first on scanned documents |
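
Two of the rows above can be guarded against before a request is sent. This is a minimal sketch: `check_upload_size` is a hypothetical helper, the limit is taken as 100 × 1024 × 1024 bytes (the table only says "100 MB", so the exact byte count is an assumption), and the 180-second timeout mentioned in the usage note is an arbitrary margin above the 60–120 seconds quoted for AI redaction.

```bash
# Assumed byte value for the 100 MB limit in the table above.
MAX_UPLOAD_BYTES=$((100 * 1024 * 1024))

# Hypothetical pre-flight check: succeed only if the file exists and is
# smaller than the limit (an optional second argument overrides the limit).
check_upload_size() {
  local file="$1" limit="${2:-$MAX_UPLOAD_BYTES}"
  [ -f "$file" ] || { echo "missing file: $file" >&2; return 1; }
  local size
  size=$(( $(wc -c < "$file") ))
  if [ "$size" -ge "$limit" ]; then
    echo "$file is $size bytes; limit is $limit" >&2
    return 1
  fi
}

# Usage (not run here): run the check first, and only then send the request.
#   check_upload_size document.pdf && echo "ok to upload"
```

For AI redaction, adding curl's `--max-time 180` to the request keeps a slow call from hanging indefinitely while still allowing the normal processing time.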
More Information
- Full API reference — Detailed endpoints, parameters, and error codes
- API Playground — Interactive API testing
- API Documentation — Official guides
- MCP Server repo — Source code and issues