natural-language

NaturalLanguage + Translation

Analyze natural language text for tokenization, part-of-speech tagging, named entity recognition, sentiment analysis, language identification, and word/sentence embeddings. Translate text between languages with the Translation framework. Targets Swift 6.2 / iOS 26+.
This skill covers two related frameworks: NaturalLanguage (NLTokenizer, NLTagger, NLEmbedding) for on-device text analysis, and Translation (TranslationSession, LanguageAvailability) for language translation.

Setup

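Import NaturalLanguage for text analysis and Translation for language translation. No special entitlements or capabilities are required for NaturalLanguage. Within the Translation framework, the system overlay (.translationPresentation()) requires iOS 17.4+ / macOS 14.4+, while TranslationSession, .translationTask(), and LanguageAvailability require iOS 18+ / macOS 15+.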
swift
import NaturalLanguage
import Translation
NaturalLanguage classes (NLTokenizer, NLTagger) are not thread-safe. Use each instance from one thread or dispatch queue at a time.

Tokenization

Segment text into words, sentences, or paragraphs with NLTokenizer.
swift
import NaturalLanguage

func tokenizeWords(in text: String) -> [String] {
    let tokenizer = NLTokenizer(unit: .word)
    tokenizer.string = text

    let range = text.startIndex..<text.endIndex
    return tokenizer.tokens(for: range).map { String(text[$0]) }
}

Token Units

| Unit | Description |
| --- | --- |
| .word | Individual words |
| .sentence | Sentences |
| .paragraph | Paragraphs |
| .document | Entire document |
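
The coarser units in the table work with the same tokenizer calls. As a minimal sketch, assuming a hypothetical helper named `splitSentences(in:)` (not part of the framework), the following swaps `.word` for `.sentence` and trims the whitespace that a sentence range may carry at its end:

```swift
import Foundation
import NaturalLanguage

/// Hypothetical helper: splits `text` into sentences with NLTokenizer.
func splitSentences(in text: String) -> [String] {
    let tokenizer = NLTokenizer(unit: .sentence)
    tokenizer.string = text

    return tokenizer.tokens(for: text.startIndex..<text.endIndex).map { range in
        // Sentence ranges may include trailing whitespace, so trim each one.
        text[range].trimmingCharacters(in: .whitespacesAndNewlines)
    }
}
```

Trimming is a presentation choice here; keep the raw ranges instead if you need to map sentences back to positions in the original string.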

Enumerating with Attributes

Use enumerateTokens(in:using:) to detect numeric or emoji tokens.
swift
let tokenizer = NLTokenizer(unit: .word)
tokenizer.string = text

tokenizer.enumerateTokens(in: text.startIndex..<text.endIndex) { range, attributes in
    if attributes.contains(.numeric) {
        print("Number: \(text[range])")
    }
    return true // continue enumeration
}

Language Identification

Detect the dominant language of a string with NLLanguageRecognizer.
swift
func detectLanguage(for text: String) -> NLLanguage? {
    NLLanguageRecognizer.dominantLanguage(for: text)
}

// Multiple hypotheses with confidence scores
func languageHypotheses(for text: String, max: Int = 5) -> [NLLanguage: Double] {
    let recognizer = NLLanguageRecognizer()
    recognizer.processString(text)
    return recognizer.languageHypotheses(withMaximum: max)
}
Constrain the recognizer to expected languages for better accuracy on short text.
swift
let recognizer = NLLanguageRecognizer()
recognizer.languageConstraints = [.english, .french, .spanish]
recognizer.processString(text)
let detected = recognizer.dominantLanguage

Part-of-Speech Tagging

Identify nouns, verbs, adjectives, and other lexical classes with NLTagger.
swift
func tagPartsOfSpeech(in text: String) -> [(String, NLTag)] {
    let tagger = NLTagger(tagSchemes: [.lexicalClass])
    tagger.string = text

    var results: [(String, NLTag)] = []
    let range = text.startIndex..<text.endIndex
    let options: NLTagger.Options = [.omitPunctuation, .omitWhitespace]

    tagger.enumerateTags(in: range, unit: .word, scheme: .lexicalClass, options: options) { tag, tokenRange in
        if let tag {
            results.append((String(text[tokenRange]), tag))
        }
        return true
    }
    return results
}

Common Tag Schemes

| Scheme | Output |
| --- | --- |
| .lexicalClass | Part of speech (noun, verb, adjective) |
| .nameType | Named entity type (person, place, organization) |
| .nameTypeOrLexicalClass | Combined NER + POS |
| .lemma | Base form of a word |
| .language | Per-token language |
| .sentimentScore | Sentiment polarity score |
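
The other schemes in the table follow the same enumeration pattern as `.lexicalClass`. As a minimal sketch, assuming a hypothetical helper named `lemmas(in:)`, the `.lemma` scheme can be read like this; the lemma is optional because the tagger may not return one for every token or language:

```swift
import NaturalLanguage

/// Hypothetical helper: pairs each word with its lemma, when the tagger provides one.
func lemmas(in text: String) -> [(word: String, lemma: String?)] {
    let tagger = NLTagger(tagSchemes: [.lemma])
    tagger.string = text

    var results: [(word: String, lemma: String?)] = []
    let options: NLTagger.Options = [.omitPunctuation, .omitWhitespace]

    tagger.enumerateTags(in: text.startIndex..<text.endIndex,
                         unit: .word,
                         scheme: .lemma,
                         options: options) { tag, tokenRange in
        // For the .lemma scheme, the tag's rawValue holds the base form itself.
        results.append((word: String(text[tokenRange]), lemma: tag?.rawValue))
        return true // continue enumeration
    }
    return results
}
```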

Named Entity Recognition

Extract people, places, and organizations.
swift
func extractEntities(from text: String) -> [(String, NLTag)] {
    let tagger = NLTagger(tagSchemes: [.nameType])
    tagger.string = text

    var entities: [(String, NLTag)] = []
    let options: NLTagger.Options = [.omitPunctuation, .omitWhitespace, .joinNames]

    tagger.enumerateTags(
        in: text.startIndex..<text.endIndex,
        unit: .word,
        scheme: .nameType,
        options: options
    ) { tag, tokenRange in
        if let tag, tag != .other {
            entities.append((String(text[tokenRange]), tag))
        }
        return true
    }
    return entities
}
// NLTag values: .personalName, .placeName, .organizationName

Sentiment Analysis

Score text sentiment from -1.0 (negative) to +1.0 (positive).
swift
func sentimentScore(for text: String) -> Double? {
    let tagger = NLTagger(tagSchemes: [.sentimentScore])
    tagger.string = text

    let (tag, _) = tagger.tag(
        at: text.startIndex,
        unit: .paragraph,
        scheme: .sentimentScore
    )
    return tag.flatMap { Double($0.rawValue) }
}

Text Embeddings

Measure semantic similarity between words or sentences with NLEmbedding. The API reports a distance rather than a similarity score, so smaller values mean closer meanings.
swift
func wordSimilarity(_ word1: String, _ word2: String) -> Double? {
    guard let embedding = NLEmbedding.wordEmbedding(for: .english) else { return nil }
    return embedding.distance(between: word1, and: word2, distanceType: .cosine)
}

func findSimilarWords(to word: String, count: Int = 5) -> [(String, Double)] {
    guard let embedding = NLEmbedding.wordEmbedding(for: .english) else { return [] }
    return embedding.neighbors(for: word, maximumCount: count, distanceType: .cosine)
}
Sentence embeddings compare entire sentences.
swift
func sentenceSimilarity(_ s1: String, _ s2: String) -> Double? {
    guard let embedding = NLEmbedding.sentenceEmbedding(for: .english) else { return nil }
    return embedding.distance(between: s1, and: s2, distanceType: .cosine)
}
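
A word that is missing from the embedding's vocabulary has no vector, so a distance computed for it is not meaningful. The sketch below, built around a hypothetical helper named `knownWordDistance`, checks `contains(_:)` for both words first and returns nil otherwise. How `distance(between:and:distanceType:)` behaves for unknown words is not covered above, so treat the guard as a defensive assumption:

```swift
import NaturalLanguage

/// Hypothetical helper: returns a cosine distance only when both words are in the vocabulary.
func knownWordDistance(_ word1: String, _ word2: String,
                       language: NLLanguage = .english) -> Double? {
    guard let embedding = NLEmbedding.wordEmbedding(for: language),
          embedding.contains(word1),
          embedding.contains(word2) else {
        // No embedding for this language, or at least one word is out of vocabulary.
        return nil
    }
    return embedding.distance(between: word1, and: word2, distanceType: .cosine)
}
```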

Translation

System Translation Overlay

Show the built-in translation UI with .translationPresentation().
swift
import SwiftUI
import Translation

struct TranslatableView: View {
    @State private var showTranslation = false
    let text = "Hello, how are you?"

    var body: some View {
        Text(text)
            .onTapGesture { showTranslation = true }
            .translationPresentation(
                isPresented: $showTranslation,
                text: text
            )
    }
}

Programmatic Translation

Use .translationTask() for programmatic translations within a view context.
swift
struct TranslatingView: View {
    @State private var translatedText = ""
    @State private var configuration: TranslationSession.Configuration?

    var body: some View {
        VStack {
            Text(translatedText)
            Button("Translate") {
                configuration = .init(source: Locale.Language(identifier: "en"),
                                      target: Locale.Language(identifier: "es"))
            }
        }
        .translationTask(configuration) { session in
            // The action closure is non-throwing, so handle errors here.
            do {
                let response = try await session.translate("Hello, world!")
                translatedText = response.targetText
            } catch {
                translatedText = "Translation failed: \(error.localizedDescription)"
            }
        }
    }
}

Batch Translation

Translate multiple strings in a single session.
swift
.translationTask(configuration) { session in
    let requests = texts.enumerated().map { index, text in
        TranslationSession.Request(sourceText: text,
                                    clientIdentifier: "\(index)")
    }
    // The action closure is non-throwing, so handle errors here.
    do {
        let responses = try await session.translations(from: requests)
        for response in responses {
            print("\(response.sourceText) -> \(response.targetText)")
        }
    } catch {
        print("Batch translation failed: \(error)")
    }
}
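
Because each request carries its index as `clientIdentifier`, responses can be put back in input order regardless of how they arrive. The sketch below isolates that matching step in a hypothetical pure helper named `orderedTexts(from:count:)`, which takes plain (identifier, text) pairs so it can be exercised without a live session:

```swift
/// Hypothetical helper: places each translated string at the index encoded in its identifier.
/// Slots with no matching response stay nil.
func orderedTexts(from pairs: [(identifier: String?, text: String)],
                  count: Int) -> [String?] {
    var ordered = [String?](repeating: nil, count: count)
    for pair in pairs {
        // Skip pairs whose identifier is missing, non-numeric, or out of range.
        guard let identifier = pair.identifier,
              let index = Int(identifier),
              ordered.indices.contains(index) else { continue }
        ordered[index] = pair.text
    }
    return ordered
}
```

Inside the translation closure, the pairs could be built with `responses.map { (identifier: $0.clientIdentifier, text: $0.targetText) }` and passed in together with `texts.count`.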

Checking Language Availability

swift
let availability = LanguageAvailability()
let status = await availability.status(
    from: Locale.Language(identifier: "en"),
    to: Locale.Language(identifier: "ja")
)
switch status {
case .installed: break    // Ready to translate offline
case .supported: break    // Needs download
case .unsupported: break  // Language pair not available
}

Common Mistakes

DON'T: Share NLTagger/NLTokenizer across threads

These classes are not thread-safe; sharing one instance across threads can produce incorrect results or crash.
swift
// WRONG
let sharedTagger = NLTagger(tagSchemes: [.lexicalClass])
DispatchQueue.concurrentPerform(iterations: 10) { _ in
    sharedTagger.string = someText  // Data race
}

// CORRECT
await withTaskGroup(of: Void.self) { group in
    for _ in 0..<10 {
        group.addTask {
            let tagger = NLTagger(tagSchemes: [.lexicalClass])
            tagger.string = someText
            // process...
        }
    }
}

DON'T: Confuse NaturalLanguage with Core ML

NaturalLanguage provides built-in linguistic analysis. Use Core ML for custom trained models. They complement each other via NLModel.
swift
// WRONG: Trying to do NER with raw Core ML
let coreMLModel = try MLModel(contentsOf: modelURL)

// CORRECT: Use NLTagger for built-in NER
let tagger = NLTagger(tagSchemes: [.nameType])

// Or load a custom Core ML model via NLModel
let nlModel = try NLModel(mlModel: coreMLModel)
tagger.setModels([nlModel], forTagScheme: .nameType)

DON'T: Assume embeddings exist for all languages

Not all languages have word or sentence embeddings available on device.
swift
// WRONG: Force unwrap
let embedding = NLEmbedding.wordEmbedding(for: .japanese)!

// CORRECT: Handle nil
guard let embedding = NLEmbedding.wordEmbedding(for: .japanese) else {
    // Embedding not available for this language
    return
}

DON'T: Create a new tagger per token

Creating and configuring a tagger is expensive. Reuse it for the same text.
swift
// WRONG: New tagger per word
for word in words {
    let tagger = NLTagger(tagSchemes: [.lexicalClass])
    tagger.string = word
}

// CORRECT: Set string once, enumerate
let tagger = NLTagger(tagSchemes: [.lexicalClass])
tagger.string = fullText
tagger.enumerateTags(in: fullText.startIndex..<fullText.endIndex,
                     unit: .word, scheme: .lexicalClass, options: []) { tag, range in
    return true
}

DON'T: Ignore language hints for short text

Language detection on short strings (under ~20 characters) is unreliable. Set constraints or hints to improve accuracy.
swift
// WRONG: Detect language of a single word
let lang = NLLanguageRecognizer.dominantLanguage(for: "chat")  // French or English?

// CORRECT: Provide context
let recognizer = NLLanguageRecognizer()
recognizer.languageHints = [.english: 0.8, .french: 0.2]
recognizer.processString("chat")
let hinted = recognizer.dominantLanguage  // Read the result after processing

Review Checklist

  • NLTokenizer and NLTagger instances used from a single thread
  • Tagger created once per text, not per token
  • Language detection uses constraints/hints for short text
  • NLEmbedding availability checked before use (returns nil if unavailable)
  • Translation LanguageAvailability checked before attempting translation
  • .translationTask() used within a SwiftUI view hierarchy
  • Batch translation uses clientIdentifier to match responses to requests
  • Sentiment scores handled as optional (may return nil for unsupported languages)
  • .joinNames option used with NER to keep multi-word names together
  • Custom ML models loaded via NLModel, not raw Core ML
