azure-ai-translation-document-py

Azure AI Document Translation SDK for Python

Client library for Azure AI Translator document translation service for batch document translation with format preservation.

Installation

bash
pip install azure-ai-translation-document

Environment Variables

bash
AZURE_DOCUMENT_TRANSLATION_ENDPOINT=https://<resource>.cognitiveservices.azure.com
AZURE_DOCUMENT_TRANSLATION_KEY=<your-api-key>  # If using API key

# Storage for source and target documents
AZURE_SOURCE_CONTAINER_URL=https://<storage>.blob.core.windows.net/<container>?<sas>
AZURE_TARGET_CONTAINER_URL=https://<storage>.blob.core.windows.net/<container>?<sas>
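
A minimal sketch of reading these settings up front, so that a missing variable fails early with a clear message rather than midway through a job. The helper name `load_settings` is hypothetical and not part of the SDK; the API key is treated as optional because it is only needed when Entra ID is not used.

```python
import os

# Variables that every example in this document relies on.
REQUIRED_VARS = (
    "AZURE_DOCUMENT_TRANSLATION_ENDPOINT",
    "AZURE_SOURCE_CONTAINER_URL",
    "AZURE_TARGET_CONTAINER_URL",
)


def load_settings(env=None):
    """Return the required settings, raising if any are missing or empty.

    `load_settings` is an illustrative helper, not an SDK function.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    settings = {name: env[name] for name in REQUIRED_VARS}
    # The API key is optional: it is only needed when not using Entra ID.
    settings["AZURE_DOCUMENT_TRANSLATION_KEY"] = env.get("AZURE_DOCUMENT_TRANSLATION_KEY")
    return settings
```

Passing a plain dictionary as `env` makes the helper easy to exercise in tests without touching the real environment.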

Authentication

API Key

python
import os
from azure.ai.translation.document import DocumentTranslationClient
from azure.core.credentials import AzureKeyCredential

endpoint = os.environ["AZURE_DOCUMENT_TRANSLATION_ENDPOINT"]
key = os.environ["AZURE_DOCUMENT_TRANSLATION_KEY"]

client = DocumentTranslationClient(endpoint, AzureKeyCredential(key))

Entra ID (Recommended)

python
import os

from azure.ai.translation.document import DocumentTranslationClient
from azure.identity import DefaultAzureCredential

client = DocumentTranslationClient(
    endpoint=os.environ["AZURE_DOCUMENT_TRANSLATION_ENDPOINT"],
    credential=DefaultAzureCredential()
)

Basic Document Translation

python
from azure.ai.translation.document import DocumentTranslationInput, TranslationTarget

source_url = os.environ["AZURE_SOURCE_CONTAINER_URL"]
target_url = os.environ["AZURE_TARGET_CONTAINER_URL"]

# Start translation job
poller = client.begin_translation(
    inputs=[
        DocumentTranslationInput(
            source_url=source_url,
            targets=[
                TranslationTarget(
                    target_url=target_url,
                    language="es"  # Translate to Spanish
                )
            ]
        )
    ]
)

# Wait for completion
result = poller.result()

print(f"Status: {poller.status()}")
print(f"Documents translated: {poller.details.documents_succeeded_count}")
print(f"Documents failed: {poller.details.documents_failed_count}")

Multiple Target Languages

python
poller = client.begin_translation(
    inputs=[
        DocumentTranslationInput(
            source_url=source_url,
            targets=[
                TranslationTarget(target_url=target_url_es, language="es"),
                TranslationTarget(target_url=target_url_fr, language="fr"),
                TranslationTarget(target_url=target_url_de, language="de")
            ]
        )
    ]
)
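
The example above uses `target_url_es`, `target_url_fr`, and `target_url_de` without showing where they come from. One possible way to supply them, sketched here under the assumption that each language has its own container URL stored in an environment variable, is a small lookup helper. The variable prefix `AZURE_TARGET_CONTAINER_URL_` and the function name `target_urls_by_language` are hypothetical choices for illustration only.

```python
def target_urls_by_language(env, languages, prefix="AZURE_TARGET_CONTAINER_URL_"):
    """Map each language code to its own target container URL.

    Looks up e.g. AZURE_TARGET_CONTAINER_URL_ES for "es". The naming scheme
    is an assumption made for this sketch, not an SDK convention.
    """
    urls = {}
    for lang in languages:
        name = prefix + lang.upper().replace("-", "_")
        if name not in env:
            raise KeyError(f"No target container configured for {lang!r} ({name})")
        urls[lang] = env[name]
    return urls
```

With `urls = target_urls_by_language(os.environ, ["es", "fr", "de"])`, the targets list could then be built as `[TranslationTarget(target_url=url, language=lang) for lang, url in urls.items()]`.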

Translate Single Document

python
from azure.ai.translation.document import SingleDocumentTranslationClient

single_client = SingleDocumentTranslationClient(endpoint, AzureKeyCredential(key))

with open("document.docx", "rb") as f:
    document_content = f.read()

result = single_client.translate(
    body=document_content,
    target_language="es",
    content_type="application/vnd.openxmlformats-officedocument.wordprocessingml.document"
)

# Save translated document
with open("document_es.docx", "wb") as f:
    f.write(result)
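
The `content_type` above is hard-coded for DOCX. For other file types, a small lookup keyed on the file extension may be convenient. The sketch below is an illustration only: `content_type_for` is a hypothetical helper, and the table lists just a few standard MIME types rather than everything the service accepts.

```python
from pathlib import Path

# Standard MIME types for a few common formats; extend as needed.
CONTENT_TYPES = {
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ".pptx": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
    ".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    ".pdf": "application/pdf",
    ".html": "text/html",
    ".txt": "text/plain",
}


def content_type_for(path):
    """Return the MIME type for a file path, based on its extension."""
    suffix = Path(path).suffix.lower()
    try:
        return CONTENT_TYPES[suffix]
    except KeyError:
        raise ValueError(f"No content type known for {suffix!r}") from None
```

The call in the example above could then pass `content_type=content_type_for("document.docx")`.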

Check Translation Status

python
# Get all translation operations
operations = client.list_translation_statuses()

for op in operations:
    print(f"Operation ID: {op.id}")
    print(f"Status: {op.status}")
    print(f"Created: {op.created_on}")
    print(f"Total documents: {op.documents_total_count}")
    print(f"Succeeded: {op.documents_succeeded_count}")
    print(f"Failed: {op.documents_failed_count}")

List Document Statuses

python
# Get status of individual documents in a job
operation_id = poller.id
document_statuses = client.list_document_statuses(operation_id)

for doc in document_statuses:
    print(f"Document: {doc.source_document_url}")
    print(f"  Status: {doc.status}")
    print(f"  Translated to: {doc.translated_to}")
    if doc.error:
        print(f"  Error: {doc.error.message}")
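
For larger jobs, printing every document can be noisy. The following sketch groups documents by status and collects only the failures. It relies solely on the attributes used above (`status`, `error`, `source_document_url`); the function name `summarize_documents` is hypothetical, and the `SimpleNamespace` objects are stand-ins for real results so the example runs without a service call.

```python
from types import SimpleNamespace


def summarize_documents(document_statuses):
    """Count documents per status and collect (url, message) for failures."""
    counts = {}
    failures = []
    for doc in document_statuses:
        counts[doc.status] = counts.get(doc.status, 0) + 1
        if doc.error:
            failures.append((doc.source_document_url, doc.error.message))
    return counts, failures


# Stand-in objects mimicking the attributes read above (not real SDK results).
docs = [
    SimpleNamespace(
        source_document_url="https://example.invalid/src/a.docx",
        status="Succeeded",
        error=None,
    ),
    SimpleNamespace(
        source_document_url="https://example.invalid/src/b.pdf",
        status="Failed",
        error=SimpleNamespace(message="Unsupported format"),
    ),
]

counts, failures = summarize_documents(docs)
print(counts)
print(failures)
```

In real use, the result of `client.list_document_statuses(operation_id)` would be passed in place of `docs`.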

Cancel Translation

python
# Cancel a running translation
client.cancel_translation(operation_id)
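
Cancellation is often paired with a time limit. A possible pattern, sketched below, waits on the poller for a bounded time and cancels the job only if it has not finished. The helper `wait_or_cancel` is hypothetical; it assumes the poller offers `wait(timeout)` and `done()` as the core long-running-operation pollers do, plus the `id` attribute used earlier.

```python
def wait_or_cancel(client, poller, timeout_seconds):
    """Wait up to `timeout_seconds`; cancel the job if it is still running.

    Returns True if the job finished in time, False if it was cancelled.
    Illustrative helper, not part of the SDK.
    """
    poller.wait(timeout_seconds)
    if poller.done():
        return True
    # Still running after the timeout: request cancellation by operation ID.
    client.cancel_translation(poller.id)
    return False
```

A cancellation request does not undo documents that were already translated, so checking document statuses afterwards is still worthwhile.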

Using Glossary

python
from azure.ai.translation.document import TranslationGlossary

poller = client.begin_translation(
    inputs=[
        DocumentTranslationInput(
            source_url=source_url,
            targets=[
                TranslationTarget(
                    target_url=target_url,
                    language="es",
                    glossaries=[
                        TranslationGlossary(
                            glossary_url="https://<storage>.blob.core.windows.net/glossary/terms.csv?<sas>",
                            file_format="csv"
                        )
                    ]
                )
            ]
        )
    ]
)
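
The glossary file itself has to exist in blob storage before the job starts. As a rough sketch, assuming the CSV glossary uses a simple two-column layout of source term followed by target term, the file content could be generated with the standard `csv` module. The function name `build_glossary_csv` and the sample terms are illustrative.

```python
import csv
import io


def build_glossary_csv(terms):
    """Render (source, target) term pairs as two-column CSV text.

    The two-column layout is an assumption of this sketch; check the
    service documentation for the exact glossary format.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for source_term, target_term in terms:
        writer.writerow([source_term, target_term])
    return buffer.getvalue()


glossary_text = build_glossary_csv([
    ("cloud", "nube"),
    ("blob storage", "Blob Storage"),  # keep the product name untranslated
])
print(glossary_text)
```

The resulting text would then be uploaded to the location referenced by `glossary_url`.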

Supported Document Formats

python
# Get supported formats
formats = client.get_supported_document_formats()

for fmt in formats:
    print(f"Format: {fmt.format}")
    print(f"  Extensions: {fmt.file_extensions}")
    print(f"  Content types: {fmt.content_types}")
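
The format list can also be used to screen files before uploading them. The sketch below splits filenames into supported and unsupported groups using only the `file_extensions` attribute shown above. It assumes the extensions are reported with a leading dot (for example `.docx`); `split_by_support` is a hypothetical helper and the `SimpleNamespace` objects stand in for real format entries.

```python
from pathlib import Path
from types import SimpleNamespace


def split_by_support(filenames, formats):
    """Split filenames into (supported, unsupported) by file extension."""
    supported_exts = {
        ext.lower() for fmt in formats for ext in fmt.file_extensions
    }
    supported, unsupported = [], []
    for name in filenames:
        if Path(name).suffix.lower() in supported_exts:
            supported.append(name)
        else:
            unsupported.append(name)
    return supported, unsupported


# Stand-in format entries (assumed shape, not real service output).
sample_formats = [
    SimpleNamespace(file_extensions=[".docx"]),
    SimpleNamespace(file_extensions=[".html", ".htm"]),
]

ok, rejected = split_by_support(["a.docx", "b.HTM", "c.zip"], sample_formats)
print(ok)
print(rejected)
```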

Supported Languages

python
# Get supported languages
languages = client.get_supported_languages()

for lang in languages:
    print(f"Language: {lang.name} ({lang.code})")

Async Client

python
from azure.ai.translation.document.aio import DocumentTranslationClient
from azure.identity.aio import DefaultAzureCredential

async def translate_documents():
    async with DocumentTranslationClient(
        endpoint=endpoint,
        credential=DefaultAzureCredential()
    ) as client:
        poller = await client.begin_translation(inputs=[...])
        result = await poller.result()
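
The main benefit of the async client is running several jobs at once. The pattern can be sketched with `asyncio.gather`. To keep the example runnable without Azure credentials, `run_job` below is a hypothetical stand-in that only simulates a job; in real code its body would start a translation and await the poller as shown above.

```python
import asyncio


async def run_job(language):
    """Hypothetical stand-in for one translation job.

    A real implementation would call begin_translation and await the poller.
    """
    await asyncio.sleep(0)  # yield control, as a real network call would
    return f"{language}: Succeeded"


async def run_all(languages):
    """Run one job per language concurrently and map language to outcome."""
    results = await asyncio.gather(
        *(run_job(lang) for lang in languages),
        return_exceptions=True,  # one failed job should not hide the others
    )
    return dict(zip(languages, results))


if __name__ == "__main__":
    print(asyncio.run(run_all(["es", "fr", "de"])))
```

With `return_exceptions=True`, a failed job shows up as an exception object in the result dictionary instead of aborting the whole batch.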

Supported Formats

| Category | Formats |
|----------|---------|
| Documents | DOCX, PDF, PPTX, XLSX, HTML, TXT, RTF |
| Structured | CSV, TSV, JSON, XML |
| Localization | XLIFF, XLF, MHTML |

Storage Requirements

  • Source and target containers must be Azure Blob Storage
  • Use SAS tokens with appropriate permissions:
    • Source: Read, List
    • Target: Write, List
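
A quick local check of these permissions is possible because a SAS token carries its granted permissions in the `sp` query parameter (`r` for read, `w` for write, `l` for list). The sketch below reports which required permissions are absent from a container URL. The function name `missing_sas_permissions` and the sample URLs are illustrative; the check does not validate the signature or expiry.

```python
from urllib.parse import parse_qs, urlsplit


def missing_sas_permissions(container_url, required):
    """Return the required permission letters not granted by the SAS token."""
    query = parse_qs(urlsplit(container_url).query)
    granted = set(query.get("sp", [""])[0])
    return sorted(set(required) - granted)


# Sample URLs with made-up signatures, for illustration only.
source = "https://example.blob.core.windows.net/src?sp=rl&sv=2022-11-02&sig=abc"
target = "https://example.blob.core.windows.net/dst?sp=r&sv=2022-11-02&sig=abc"

print(missing_sas_permissions(source, "rl"))  # → []
print(missing_sas_permissions(target, "wl"))  # → ['l', 'w']
```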

Best Practices

  1. Use SAS tokens with minimal required permissions
  2. Monitor long-running operations with poller.status()
  3. Handle document-level errors by iterating document statuses
  4. Use glossaries for domain-specific terminology
  5. Separate target containers for each language
  6. Use async client for multiple concurrent jobs
  7. Check supported formats before submitting documents