natural-language
NaturalLanguage + Translation
Analyze natural language text for tokenization, part-of-speech tagging, named
entity recognition, sentiment analysis, language identification, and word/sentence
embeddings. Translate text between languages with the Translation framework.
Targets Swift 6.2 / iOS 26+.
This skill covers two related frameworks: NaturalLanguage (`NLTokenizer`, `NLTagger`, `NLEmbedding`) for on-device text analysis, and Translation (`TranslationSession`, `LanguageAvailability`) for language translation.
Setup
Import `NaturalLanguage` for text analysis and `Translation` for language
translation. No special entitlements or capabilities are required for
NaturalLanguage. Translation requires iOS 17.4+ / macOS 14.4+.
```swift
import NaturalLanguage
import Translation
```
NaturalLanguage classes (`NLTokenizer`, `NLTagger`) are not thread-safe.
Use each instance from one thread or dispatch queue at a time.

Tokenization
Segment text into words, sentences, or paragraphs with `NLTokenizer`.
```swift
import NaturalLanguage

func tokenizeWords(in text: String) -> [String] {
    let tokenizer = NLTokenizer(unit: .word)
    tokenizer.string = text
    let range = text.startIndex..<text.endIndex
    return tokenizer.tokens(for: range).map { String(text[$0]) }
}
```

Token Units
| Unit | Description |
|---|---|
| `.word` | Individual words |
| `.sentence` | Sentences |
| `.paragraph` | Paragraphs |
| `.document` | Entire document |
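To illustrate switching units, the sketch below reuses the word-tokenizer pattern with `.sentence`. The helper name `tokenizeSentences` and the sample string are hypothetical, not part of the original text. The trimming step assumes that sentence ranges may carry trailing whitespace.

```swift
import Foundation
import NaturalLanguage

// Hypothetical helper: splits text into sentences using the .sentence unit.
func tokenizeSentences(in text: String) -> [String] {
    let tokenizer = NLTokenizer(unit: .sentence)
    tokenizer.string = text
    let range = text.startIndex..<text.endIndex
    return tokenizer.tokens(for: range).map {
        // Assumption: a sentence range can include trailing whitespace, so trim it.
        text[$0].trimmingCharacters(in: .whitespacesAndNewlines)
    }
}

// Sample input chosen for illustration only.
let sentences = tokenizeSentences(in: "Hello world. How are you?")
print(sentences)
```

Swapping `.sentence` for `.paragraph` or `.document` should follow the same shape; only the unit passed to the initializer changes.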
Enumerating with Attributes
Use `enumerateTokens(in:using:)` to detect numeric or emoji tokens.
```swift
// `text` is the String being analyzed.
let tokenizer = NLTokenizer(unit: .word)
tokenizer.string = text
tokenizer.enumerateTokens(in: text.startIndex..<text.endIndex) { range, attributes in
    // .emoji can be checked the same way as .numeric.
    if attributes.contains(.numeric) {
        print("Number: \(text[range])")
    }
    return true // continue enumeration
}
```

Language Identification
Detect the dominant language of a string with `NLLanguageRecognizer`.
```swift
func detectLanguage(for text: String) -> NLLanguage? {
    NLLanguageRecognizer.dominantLanguage(for: text)
}

// Multiple hypotheses with confidence scores
func languageHypotheses(for text: String, max: Int = 5) -> [NLLanguage: Double] {
    let recognizer = NLLanguageRecognizer()
    recognizer.processString(text)
    return recognizer.languageHypotheses(withMaximum: max)
}
```
Constrain the recognizer to expected languages for better accuracy on short text.
```swift
let recognizer = NLLanguageRecognizer()
recognizer.languageConstraints = [.english, .french, .spanish]
recognizer.processString(text)
let detected = recognizer.dominantLanguage
```

Part-of-Speech Tagging
Identify nouns, verbs, adjectives, and other lexical classes with `NLTagger`.
```swift
func tagPartsOfSpeech(in text: String) -> [(String, NLTag)] {
    let tagger = NLTagger(tagSchemes: [.lexicalClass])
    tagger.string = text
    var results: [(String, NLTag)] = []
    let range = text.startIndex..<text.endIndex
    let options: NLTagger.Options = [.omitPunctuation, .omitWhitespace]
    tagger.enumerateTags(in: range, unit: .word, scheme: .lexicalClass, options: options) { tag, tokenRange in
        if let tag {
            results.append((String(text[tokenRange]), tag))
        }
        return true
    }
    return results
}
```

Common Tag Schemes
| Scheme | Output |
|---|---|
| `.lexicalClass` | Part of speech (noun, verb, adjective) |
| `.nameType` | Named entity type (person, place, organization) |
| `.nameTypeOrLexicalClass` | Combined NER + POS |
| `.lemma` | Base form of a word |
| `.language` | Per-token language |
| `.sentimentScore` | Sentiment polarity score |
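The schemes in this table all plug into the same enumeration call. As a rough sketch, the `.lemma` scheme might be used as follows. The helper name `lemmatize` and its tuple labels are hypothetical, and the lemma is kept optional on the assumption that the tagger may not return one for every token.

```swift
import NaturalLanguage

// Hypothetical helper: pairs each word with its lemma, when one is available.
func lemmatize(_ text: String) -> [(word: String, lemma: String?)] {
    let tagger = NLTagger(tagSchemes: [.lemma])
    tagger.string = text
    var results: [(word: String, lemma: String?)] = []
    let options: NLTagger.Options = [.omitPunctuation, .omitWhitespace]
    tagger.enumerateTags(
        in: text.startIndex..<text.endIndex,
        unit: .word,
        scheme: .lemma,
        options: options
    ) { tag, tokenRange in
        // For the .lemma scheme, the tag's raw value carries the base form.
        results.append((word: String(text[tokenRange]), lemma: tag?.rawValue))
        return true
    }
    return results
}

// Sample input chosen for illustration only.
for entry in lemmatize("The cats were running") {
    print(entry.word, entry.lemma ?? "(no lemma)")
}
```

Only the scheme passed to the initializer and to `enumerateTags` changes between schemes; how to interpret the tag's raw value depends on the scheme.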
Named Entity Recognition
Extract people, places, and organizations.
```swift
func extractEntities(from text: String) -> [(String, NLTag)] {
    let tagger = NLTagger(tagSchemes: [.nameType])
    tagger.string = text
    var entities: [(String, NLTag)] = []
    // Keep only the name tags of interest; everything else is skipped.
    let nameTags: [NLTag] = [.personalName, .placeName, .organizationName]
    let options: NLTagger.Options = [.omitPunctuation, .omitWhitespace, .joinNames]
    tagger.enumerateTags(
        in: text.startIndex..<text.endIndex,
        unit: .word,
        scheme: .nameType,
        options: options
    ) { tag, tokenRange in
        if let tag, nameTags.contains(tag) {
            entities.append((String(text[tokenRange]), tag))
        }
        return true
    }
    return entities
}
```

Sentiment Analysis
Score text sentiment from -1.0 (negative) to +1.0 (positive).
```swift
func sentimentScore(for text: String) -> Double? {
    let tagger = NLTagger(tagSchemes: [.sentimentScore])
    tagger.string = text
    let (tag, _) = tagger.tag(
        at: text.startIndex,
        unit: .paragraph,
        scheme: .sentimentScore
    )
    return tag.flatMap { Double($0.rawValue) }
}
```

Text Embeddings
Measure semantic similarity between words or sentences with `NLEmbedding`.
```swift
// Returns the cosine distance: smaller values mean the words are more similar.
func wordSimilarity(_ word1: String, _ word2: String) -> Double? {
    guard let embedding = NLEmbedding.wordEmbedding(for: .english) else { return nil }
    return embedding.distance(between: word1, and: word2, distanceType: .cosine)
}

func findSimilarWords(to word: String, count: Int = 5) -> [(String, Double)] {
    guard let embedding = NLEmbedding.wordEmbedding(for: .english) else { return [] }
    return embedding.neighbors(for: word, maximumCount: count, distanceType: .cosine)
}
```
Sentence embeddings compare entire sentences.
```swift
func sentenceSimilarity(_ s1: String, _ s2: String) -> Double? {
    guard let embedding = NLEmbedding.sentenceEmbedding(for: .english) else { return nil }
    return embedding.distance(between: s1, and: s2, distanceType: .cosine)
}
```

Translation
System Translation Overlay
Show the built-in translation UI with `.translationPresentation()`.
```swift
import SwiftUI
import Translation

struct TranslatableView: View {
    @State private var showTranslation = false
    let text = "Hello, how are you?"

    var body: some View {
        Text(text)
            .onTapGesture { showTranslation = true }
            .translationPresentation(
                isPresented: $showTranslation,
                text: text
            )
    }
}
```

Programmatic Translation
Use `.translationTask()` for programmatic translations within a view context.
```swift
import SwiftUI
import Translation

struct TranslatingView: View {
    @State private var translatedText = ""
    @State private var configuration: TranslationSession.Configuration?

    var body: some View {
        VStack {
            Text(translatedText)
            Button("Translate") {
                configuration = .init(source: Locale.Language(identifier: "en"),
                                      target: Locale.Language(identifier: "es"))
            }
        }
        .translationTask(configuration) { session in
            // The action closure does not throw, so errors are handled here.
            do {
                let response = try await session.translate("Hello, world!")
                translatedText = response.targetText
            } catch {
                print("Translation failed: \(error)")
            }
        }
    }
}
```

Batch Translation
Translate multiple strings in a single session.
```swift
// `texts` is the [String] to translate.
.translationTask(configuration) { session in
    let requests = texts.enumerated().map { index, text in
        TranslationSession.Request(sourceText: text,
                                   clientIdentifier: "\(index)")
    }
    // The action closure does not throw, so errors are handled here.
    do {
        let responses = try await session.translations(from: requests)
        for response in responses {
            print("\(response.sourceText) -> \(response.targetText)")
        }
    } catch {
        print("Batch translation failed: \(error)")
    }
}
```

Checking Language Availability
```swift
let availability = LanguageAvailability()
let status = await availability.status(
    from: Locale.Language(identifier: "en"),
    to: Locale.Language(identifier: "ja")
)
switch status {
case .installed: break   // Ready to translate offline
case .supported: break   // Needs download
case .unsupported: break // Language pair not available
@unknown default: break  // Cases added in future OS releases
}
```

Common Mistakes
DON'T: Share NLTagger/NLTokenizer across threads
These classes are not thread-safe and will produce incorrect results or crash.
```swift
// WRONG
let sharedTagger = NLTagger(tagSchemes: [.lexicalClass])
DispatchQueue.concurrentPerform(iterations: 10) { _ in
    sharedTagger.string = someText // Data race
}

// CORRECT
await withTaskGroup(of: Void.self) { group in
    for _ in 0..<10 {
        group.addTask {
            let tagger = NLTagger(tagSchemes: [.lexicalClass])
            tagger.string = someText
            // process...
        }
    }
}
```

DON'T: Confuse NaturalLanguage with Core ML
NaturalLanguage provides built-in linguistic analysis. Use Core ML for custom
trained models. They complement each other via `NLModel`.
```swift
// WRONG: Trying to do NER with raw Core ML
let coreMLModel = try MLModel(contentsOf: modelURL)

// CORRECT: Use NLTagger for built-in NER
let tagger = NLTagger(tagSchemes: [.nameType])

// Or load a custom Core ML model via NLModel
let nlModel = try NLModel(mlModel: coreMLModel)
tagger.setModels([nlModel], forTagScheme: .nameType)
```

DON'T: Assume embeddings exist for all languages
Not all languages have word or sentence embeddings available on device.
```swift
// WRONG: Force unwrap
let embedding = NLEmbedding.wordEmbedding(for: .japanese)!

// CORRECT: Handle nil
guard let embedding = NLEmbedding.wordEmbedding(for: .japanese) else {
    // Embedding not available for this language
    return
}
```

DON'T: Create a new tagger per token
Creating and configuring a tagger is expensive. Reuse it for the same text.
```swift
// WRONG: New tagger per word
for word in words {
    let tagger = NLTagger(tagSchemes: [.lexicalClass])
    tagger.string = word
}

// CORRECT: Set string once, enumerate
let tagger = NLTagger(tagSchemes: [.lexicalClass])
tagger.string = fullText
tagger.enumerateTags(in: fullText.startIndex..<fullText.endIndex,
                     unit: .word, scheme: .lexicalClass, options: []) { tag, range in
    return true
}
```

DON'T: Ignore language hints for short text
Language detection on short strings (under ~20 characters) is unreliable.
Set constraints or hints to improve accuracy.
```swift
// WRONG: Detect language of a single word
let lang = NLLanguageRecognizer.dominantLanguage(for: "chat") // French or English?

// CORRECT: Provide context
let recognizer = NLLanguageRecognizer()
recognizer.languageHints = [.english: 0.8, .french: 0.2]
recognizer.processString("chat")
let hinted = recognizer.dominantLanguage // result now weighted by the hints
```

Review Checklist
- `NLTokenizer` and `NLTagger` instances used from a single thread
- Tagger created once per text, not per token
- Language detection uses constraints/hints for short text
- `NLEmbedding` availability checked before use (returns nil if unavailable)
- Translation `LanguageAvailability` checked before attempting translation
- `.translationTask()` used within a SwiftUI view hierarchy
- Batch translation uses `clientIdentifier` to match responses to requests
- Sentiment scores handled as optional (may return nil for unsupported languages)
- `.joinNames` option used with NER to keep multi-word names together
- Custom ML models loaded via `NLModel`, not raw Core ML
References
- Extended patterns (custom models, contextual embeddings, gazetteers): references/translation-patterns.md
- Natural Language framework
- NLTokenizer
- NLTagger
- NLEmbedding
- NLLanguageRecognizer
- Translation framework
- TranslationSession
- LanguageAvailability