azure-ai-document-intelligence-dotnet
Compare original and translation side by side
🇺🇸
Original
English🇨🇳
Translation
ChineseAzure.AI.DocumentIntelligence (.NET)
Azure.AI.DocumentIntelligence(.NET)
Extract text, tables, and structured data from documents using prebuilt and custom models.
使用预构建和自定义模型从文档中提取文本、表格和结构化数据。
Installation
Installation
bash
dotnet add package Azure.AI.DocumentIntelligence
dotnet add package Azure.IdentityCurrent Version: v1.0.0 (GA)
bash
dotnet add package Azure.AI.DocumentIntelligence
dotnet add package Azure.Identity当前版本:v1.0.0(正式版)
Environment Variables
Environment Variables
bash
DOCUMENT_INTELLIGENCE_ENDPOINT=https://<resource-name>.cognitiveservices.azure.com/
DOCUMENT_INTELLIGENCE_API_KEY=<your-api-key>
BLOB_CONTAINER_SAS_URL=https://<storage>.blob.core.windows.net/<container>?<sas-token>bash
DOCUMENT_INTELLIGENCE_ENDPOINT=https://<resource-name>.cognitiveservices.azure.com/
DOCUMENT_INTELLIGENCE_API_KEY=<your-api-key>
BLOB_CONTAINER_SAS_URL=https://<storage>.blob.core.windows.net/<container>?<sas-token>Authentication
Authentication
Microsoft Entra ID (Recommended)
Microsoft Entra ID(推荐)
csharp
using Azure.Identity;
using Azure.AI.DocumentIntelligence;
string endpoint = Environment.GetEnvironmentVariable("DOCUMENT_INTELLIGENCE_ENDPOINT");
var credential = new DefaultAzureCredential();
var client = new DocumentIntelligenceClient(new Uri(endpoint), credential);Note: Entra ID requires a custom subdomain (e.g.,), not a regional endpoint.https://<resource-name>.cognitiveservices.azure.com/
csharp
using Azure.Identity;
using Azure.AI.DocumentIntelligence;
string endpoint = Environment.GetEnvironmentVariable("DOCUMENT_INTELLIGENCE_ENDPOINT");
var credential = new DefaultAzureCredential();
var client = new DocumentIntelligenceClient(new Uri(endpoint), credential);注意:Entra ID需要自定义子域名(例如:),而非区域端点。https://<resource-name>.cognitiveservices.azure.com/
API Key
API Key
csharp
string endpoint = Environment.GetEnvironmentVariable("DOCUMENT_INTELLIGENCE_ENDPOINT");
string apiKey = Environment.GetEnvironmentVariable("DOCUMENT_INTELLIGENCE_API_KEY");
var client = new DocumentIntelligenceClient(new Uri(endpoint), new AzureKeyCredential(apiKey));csharp
string endpoint = Environment.GetEnvironmentVariable("DOCUMENT_INTELLIGENCE_ENDPOINT");
string apiKey = Environment.GetEnvironmentVariable("DOCUMENT_INTELLIGENCE_API_KEY");
var client = new DocumentIntelligenceClient(new Uri(endpoint), new AzureKeyCredential(apiKey));Client Types
Client Types
| Client | Purpose |
|---|---|
| Analyze documents, classify documents |
| Build/manage custom models and classifiers |
| 客户端 | 用途 |
|---|---|
| 分析文档、分类文档 |
| 构建/管理自定义模型和分类器 |
Prebuilt Models
Prebuilt Models
| Model ID | Description |
|---|---|
| Extract text, languages, handwriting |
| Extract text, tables, selection marks, structure |
| Extract invoice fields (vendor, items, totals) |
| Extract receipt fields (merchant, items, total) |
| Extract ID document fields (name, DOB, address) |
| Extract business card fields |
| Extract W-2 tax form fields |
| Extract health insurance card fields |
| 模型ID | 描述 |
|---|---|
| 提取文本、语言、手写内容 |
| 提取文本、表格、选择标记、结构 |
| 提取发票字段(供应商、项目、总计) |
| 提取收据字段(商家、项目、总计) |
| 提取身份证件字段(姓名、出生日期、地址) |
| 提取名片字段 |
| 提取W-2税表字段 |
| 提取美国医保卡字段 |
Core Workflows
Core Workflows
1. Analyze Invoice
1. 分析发票
csharp
using Azure.AI.DocumentIntelligence;
Uri invoiceUri = new Uri("https://example.com/invoice.pdf");
Operation<AnalyzeResult> operation = await client.AnalyzeDocumentAsync(
WaitUntil.Completed,
"prebuilt-invoice",
invoiceUri);
AnalyzeResult result = operation.Value;
foreach (AnalyzedDocument document in result.Documents)
{
if (document.Fields.TryGetValue("VendorName", out DocumentField vendorNameField)
&& vendorNameField.FieldType == DocumentFieldType.String)
{
string vendorName = vendorNameField.ValueString;
Console.WriteLine($"Vendor Name: '{vendorName}', confidence: {vendorNameField.Confidence}");
}
if (document.Fields.TryGetValue("InvoiceTotal", out DocumentField invoiceTotalField)
&& invoiceTotalField.FieldType == DocumentFieldType.Currency)
{
CurrencyValue invoiceTotal = invoiceTotalField.ValueCurrency;
Console.WriteLine($"Invoice Total: '{invoiceTotal.CurrencySymbol}{invoiceTotal.Amount}'");
}
// Extract line items
if (document.Fields.TryGetValue("Items", out DocumentField itemsField)
&& itemsField.FieldType == DocumentFieldType.List)
{
foreach (DocumentField item in itemsField.ValueList)
{
var itemFields = item.ValueDictionary;
if (itemFields.TryGetValue("Description", out DocumentField descField))
Console.WriteLine($" Item: {descField.ValueString}");
}
}
}csharp
using Azure.AI.DocumentIntelligence;
Uri invoiceUri = new Uri("https://example.com/invoice.pdf");
Operation<AnalyzeResult> operation = await client.AnalyzeDocumentAsync(
WaitUntil.Completed,
"prebuilt-invoice",
invoiceUri);
AnalyzeResult result = operation.Value;
foreach (AnalyzedDocument document in result.Documents)
{
if (document.Fields.TryGetValue("VendorName", out DocumentField vendorNameField)
&& vendorNameField.FieldType == DocumentFieldType.String)
{
string vendorName = vendorNameField.ValueString;
Console.WriteLine($"Vendor Name: '{vendorName}', confidence: {vendorNameField.Confidence}");
}
if (document.Fields.TryGetValue("InvoiceTotal", out DocumentField invoiceTotalField)
&& invoiceTotalField.FieldType == DocumentFieldType.Currency)
{
CurrencyValue invoiceTotal = invoiceTotalField.ValueCurrency;
Console.WriteLine($"Invoice Total: '{invoiceTotal.CurrencySymbol}{invoiceTotal.Amount}'");
}
// Extract line items
if (document.Fields.TryGetValue("Items", out DocumentField itemsField)
&& itemsField.FieldType == DocumentFieldType.List)
{
foreach (DocumentField item in itemsField.ValueList)
{
var itemFields = item.ValueDictionary;
if (itemFields.TryGetValue("Description", out DocumentField descField))
Console.WriteLine($" Item: {descField.ValueString}");
}
}
}2. Extract Layout (Text, Tables, Structure)
2. 提取布局(文本、表格、结构)
csharp
Uri fileUri = new Uri("https://example.com/document.pdf");
Operation<AnalyzeResult> operation = await client.AnalyzeDocumentAsync(
WaitUntil.Completed,
"prebuilt-layout",
fileUri);
AnalyzeResult result = operation.Value;
// Extract text by page
foreach (DocumentPage page in result.Pages)
{
Console.WriteLine($"Page {page.PageNumber}: {page.Lines.Count} lines, {page.Words.Count} words");
foreach (DocumentLine line in page.Lines)
{
Console.WriteLine($" Line: '{line.Content}'");
}
}
// Extract tables
foreach (DocumentTable table in result.Tables)
{
Console.WriteLine($"Table: {table.RowCount} rows x {table.ColumnCount} columns");
foreach (DocumentTableCell cell in table.Cells)
{
Console.WriteLine($" Cell ({cell.RowIndex}, {cell.ColumnIndex}): {cell.Content}");
}
}csharp
Uri fileUri = new Uri("https://example.com/document.pdf");
Operation<AnalyzeResult> operation = await client.AnalyzeDocumentAsync(
WaitUntil.Completed,
"prebuilt-layout",
fileUri);
AnalyzeResult result = operation.Value;
// Extract text by page
foreach (DocumentPage page in result.Pages)
{
Console.WriteLine($"Page {page.PageNumber}: {page.Lines.Count} lines, {page.Words.Count} words");
foreach (DocumentLine line in page.Lines)
{
Console.WriteLine($" Line: '{line.Content}'");
}
}
// Extract tables
foreach (DocumentTable table in result.Tables)
{
Console.WriteLine($"Table: {table.RowCount} rows x {table.ColumnCount} columns");
foreach (DocumentTableCell cell in table.Cells)
{
Console.WriteLine($" Cell ({cell.RowIndex}, {cell.ColumnIndex}): {cell.Content}");
}
}3. Analyze Receipt
3. 分析收据
csharp
Operation<AnalyzeResult> operation = await client.AnalyzeDocumentAsync(
WaitUntil.Completed,
"prebuilt-receipt",
receiptUri);
AnalyzeResult result = operation.Value;
foreach (AnalyzedDocument document in result.Documents)
{
if (document.Fields.TryGetValue("MerchantName", out DocumentField merchantField))
Console.WriteLine($"Merchant: {merchantField.ValueString}");
if (document.Fields.TryGetValue("Total", out DocumentField totalField))
Console.WriteLine($"Total: {totalField.ValueCurrency.Amount}");
if (document.Fields.TryGetValue("TransactionDate", out DocumentField dateField))
Console.WriteLine($"Date: {dateField.ValueDate}");
}csharp
Operation<AnalyzeResult> operation = await client.AnalyzeDocumentAsync(
WaitUntil.Completed,
"prebuilt-receipt",
receiptUri);
AnalyzeResult result = operation.Value;
foreach (AnalyzedDocument document in result.Documents)
{
if (document.Fields.TryGetValue("MerchantName", out DocumentField merchantField))
Console.WriteLine($"Merchant: {merchantField.ValueString}");
if (document.Fields.TryGetValue("Total", out DocumentField totalField))
Console.WriteLine($"Total: {totalField.ValueCurrency.Amount}");
if (document.Fields.TryGetValue("TransactionDate", out DocumentField dateField))
Console.WriteLine($"Date: {dateField.ValueDate}");
}4. Build Custom Model
4. 构建自定义模型
csharp
var adminClient = new DocumentIntelligenceAdministrationClient(
new Uri(endpoint),
new AzureKeyCredential(apiKey));
string modelId = "my-custom-model";
Uri blobContainerUri = new Uri("<blob-container-sas-url>");
var blobSource = new BlobContentSource(blobContainerUri);
var options = new BuildDocumentModelOptions(modelId, DocumentBuildMode.Template, blobSource);
Operation<DocumentModelDetails> operation = await adminClient.BuildDocumentModelAsync(
WaitUntil.Completed,
options);
DocumentModelDetails model = operation.Value;
Console.WriteLine($"Model ID: {model.ModelId}");
Console.WriteLine($"Created: {model.CreatedOn}");
foreach (var docType in model.DocumentTypes)
{
Console.WriteLine($"Document type: {docType.Key}");
foreach (var field in docType.Value.FieldSchema)
{
Console.WriteLine($" Field: {field.Key}, Confidence: {docType.Value.FieldConfidence[field.Key]}");
}
}csharp
var adminClient = new DocumentIntelligenceAdministrationClient(
new Uri(endpoint),
new AzureKeyCredential(apiKey));
string modelId = "my-custom-model";
Uri blobContainerUri = new Uri("<blob-container-sas-url>");
var blobSource = new BlobContentSource(blobContainerUri);
var options = new BuildDocumentModelOptions(modelId, DocumentBuildMode.Template, blobSource);
Operation<DocumentModelDetails> operation = await adminClient.BuildDocumentModelAsync(
WaitUntil.Completed,
options);
DocumentModelDetails model = operation.Value;
Console.WriteLine($"Model ID: {model.ModelId}");
Console.WriteLine($"Created: {model.CreatedOn}");
foreach (var docType in model.DocumentTypes)
{
Console.WriteLine($"Document type: {docType.Key}");
foreach (var field in docType.Value.FieldSchema)
{
Console.WriteLine($" Field: {field.Key}, Confidence: {docType.Value.FieldConfidence[field.Key]}");
}
}5. Build Document Classifier
5. 构建文档分类器
csharp
string classifierId = "my-classifier";
Uri blobContainerUri = new Uri("<blob-container-sas-url>");
var sourceA = new BlobContentSource(blobContainerUri) { Prefix = "TypeA/train" };
var sourceB = new BlobContentSource(blobContainerUri) { Prefix = "TypeB/train" };
var docTypes = new Dictionary<string, ClassifierDocumentTypeDetails>()
{
{ "TypeA", new ClassifierDocumentTypeDetails(sourceA) },
{ "TypeB", new ClassifierDocumentTypeDetails(sourceB) }
};
var options = new BuildClassifierOptions(classifierId, docTypes);
Operation<DocumentClassifierDetails> operation = await adminClient.BuildClassifierAsync(
WaitUntil.Completed,
options);
DocumentClassifierDetails classifier = operation.Value;
Console.WriteLine($"Classifier ID: {classifier.ClassifierId}");csharp
string classifierId = "my-classifier";
Uri blobContainerUri = new Uri("<blob-container-sas-url>");
var sourceA = new BlobContentSource(blobContainerUri) { Prefix = "TypeA/train" };
var sourceB = new BlobContentSource(blobContainerUri) { Prefix = "TypeB/train" };
var docTypes = new Dictionary<string, ClassifierDocumentTypeDetails>()
{
{ "TypeA", new ClassifierDocumentTypeDetails(sourceA) },
{ "TypeB", new ClassifierDocumentTypeDetails(sourceB) }
};
var options = new BuildClassifierOptions(classifierId, docTypes);
Operation<DocumentClassifierDetails> operation = await adminClient.BuildClassifierAsync(
WaitUntil.Completed,
options);
DocumentClassifierDetails classifier = operation.Value;
Console.WriteLine($"Classifier ID: {classifier.ClassifierId}");6. Classify Document
6. 分类文档
csharp
string classifierId = "my-classifier";
Uri documentUri = new Uri("https://example.com/document.pdf");
var options = new ClassifyDocumentOptions(classifierId, documentUri);
Operation<AnalyzeResult> operation = await client.ClassifyDocumentAsync(
WaitUntil.Completed,
options);
AnalyzeResult result = operation.Value;
foreach (AnalyzedDocument document in result.Documents)
{
Console.WriteLine($"Document type: {document.DocumentType}, confidence: {document.Confidence}");
}csharp
string classifierId = "my-classifier";
Uri documentUri = new Uri("https://example.com/document.pdf");
var options = new ClassifyDocumentOptions(classifierId, documentUri);
Operation<AnalyzeResult> operation = await client.ClassifyDocumentAsync(
WaitUntil.Completed,
options);
AnalyzeResult result = operation.Value;
foreach (AnalyzedDocument document in result.Documents)
{
Console.WriteLine($"Document type: {document.DocumentType}, confidence: {document.Confidence}");
}7. Manage Models
7. 管理模型
csharp
// Get resource details
DocumentIntelligenceResourceDetails resourceDetails = await adminClient.GetResourceDetailsAsync();
Console.WriteLine($"Custom models: {resourceDetails.CustomDocumentModels.Count}/{resourceDetails.CustomDocumentModels.Limit}");
// Get specific model
DocumentModelDetails model = await adminClient.GetModelAsync("my-model-id");
Console.WriteLine($"Model: {model.ModelId}, Created: {model.CreatedOn}");
// List models
await foreach (DocumentModelDetails modelItem in adminClient.GetModelsAsync())
{
Console.WriteLine($"Model: {modelItem.ModelId}");
}
// Delete model
await adminClient.DeleteModelAsync("my-model-id");csharp
// Get resource details
DocumentIntelligenceResourceDetails resourceDetails = await adminClient.GetResourceDetailsAsync();
Console.WriteLine($"Custom models: {resourceDetails.CustomDocumentModels.Count}/{resourceDetails.CustomDocumentModels.Limit}");
// Get specific model
DocumentModelDetails model = await adminClient.GetModelAsync("my-model-id");
Console.WriteLine($"Model: {model.ModelId}, Created: {model.CreatedOn}");
// List models
await foreach (DocumentModelDetails modelItem in adminClient.GetModelsAsync())
{
Console.WriteLine($"Model: {modelItem.ModelId}");
}
// Delete model
await adminClient.DeleteModelAsync("my-model-id");Key Types Reference
Key Types Reference
| Type | Description |
|---|---|
| Main client for analysis |
| Model management |
| Result of document analysis |
| Single document within result |
| Extracted field with value and confidence |
| String, Date, Number, Currency, etc. |
| Page info (lines, words, selection marks) |
| Extracted table with cells |
| Custom model metadata |
| Training data source |
| 类型 | 描述 |
|---|---|
| 主要分析客户端 |
| 模型管理客户端 |
| 文档分析结果 |
| 结果中的单个文档 |
| 提取的字段,包含值和置信度 |
| 字符串、日期、数字、货币等类型 |
| 页面信息(行、词、选择标记) |
| 提取的表格及单元格 |
| 自定义模型元数据 |
| 训练数据源 |
Build Modes
Build Modes
| Mode | Use Case |
|---|---|
| Fixed layout documents (forms) |
| Variable layout documents |
| 模式 | 适用场景 |
|---|---|
| 固定布局文档(表单) |
| 可变布局文档 |
Best Practices
Best Practices
- Use DefaultAzureCredential for production
- Reuse client instances — clients are thread-safe
- Handle long-running operations — Use for simplicity
WaitUntil.Completed - Check field confidence — Always verify property
Confidence - Use appropriate model — Prebuilt for common docs, custom for specialized
- Use custom subdomain — Required for Entra ID authentication
- 生产环境使用DefaultAzureCredential
- 复用客户端实例——客户端支持线程安全
- 处理长时间运行的操作——使用简化流程
WaitUntil.Completed - 检查字段置信度——始终验证属性
Confidence - 使用合适的模型——常见文档用预构建模型,特殊场景用自定义模型
- 使用自定义子域名——Entra ID认证必需
Error Handling
Error Handling
csharp
using Azure;
try
{
var operation = await client.AnalyzeDocumentAsync(
WaitUntil.Completed,
"prebuilt-invoice",
documentUri);
}
catch (RequestFailedException ex)
{
Console.WriteLine($"Error: {ex.Status} - {ex.Message}");
}csharp
using Azure;
try
{
var operation = await client.AnalyzeDocumentAsync(
WaitUntil.Completed,
"prebuilt-invoice",
documentUri);
}
catch (RequestFailedException ex)
{
Console.WriteLine($"Error: {ex.Status} - {ex.Message}");
}Related SDKs
Related SDKs
| SDK | Purpose | Install |
|---|---|---|
| Document analysis (this SDK) | |
| Legacy SDK (deprecated) | Use DocumentIntelligence instead |
| SDK | 用途 | 安装命令 |
|---|---|---|
| 文档分析(本SDK) | |
| 旧版SDK(已弃用) | 请使用DocumentIntelligence替代 |
Reference Links
Reference Links
| Resource | URL |
|---|---|
| NuGet Package | https://www.nuget.org/packages/Azure.AI.DocumentIntelligence |
| API Reference | https://learn.microsoft.com/dotnet/api/azure.ai.documentintelligence |
| GitHub Samples | https://github.com/Azure/azure-sdk-for-net/tree/main/sdk/documentintelligence/Azure.AI.DocumentIntelligence/samples |
| Document Intelligence Studio | https://documentintelligence.ai.azure.com/ |
| Prebuilt Models | https://aka.ms/azsdk/formrecognizer/models |