axiom-vision-ref
Vision Framework API Reference
Comprehensive reference for Vision framework computer vision: subject segmentation, hand/body pose detection, person detection, face analysis, text recognition (OCR), barcode detection, and document scanning.
When to Use This Reference
- Implementing subject lifting using VisionKit or Vision
- Detecting hand/body poses for gesture recognition or fitness apps
- Segmenting people from backgrounds or separating multiple individuals
- Face detection and landmarks for AR effects or authentication
- Combining Vision APIs to solve complex computer vision problems
- Looking up specific API signatures and parameter meanings
- Recognizing text in images (OCR) with VNRecognizeTextRequest
- Detecting barcodes and QR codes with VNDetectBarcodesRequest
- Building live scanners with DataScannerViewController
- Scanning documents with VNDocumentCameraViewController
- Extracting structured document data with RecognizeDocumentsRequest (iOS 26+)
Related skills: See axiom-vision for decision trees and patterns, axiom-vision-diag for troubleshooting
Vision Framework Overview
Vision provides computer vision algorithms for still images and video:
Core workflow:
- Create request (e.g., VNDetectHumanHandPoseRequest())
- Create handler with image (VNImageRequestHandler(cgImage: image))
- Perform request (try handler.perform([request]))
- Access observations from request.results
Coordinate system: Lower-left origin, normalized (0.0-1.0) coordinates
Performance: Run on background queue - resource intensive, blocks UI if on main thread
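The workflow and threading advice above can be combined into a minimal sketch; the function name and the assumption that you already hold a CGImage are illustrative, not part of the API:

```swift
import Vision

// Minimal end-to-end sketch: perform a Vision request off the main thread.
// `image` is assumed to be a CGImage you already have.
func detectHandPose(in image: CGImage) {
    DispatchQueue.global(qos: .userInitiated).async {
        let request = VNDetectHumanHandPoseRequest()
        let handler = VNImageRequestHandler(cgImage: image)
        do {
            try handler.perform([request])
            let observations = request.results ?? []
            // Point locations in observations are normalized, lower-left origin
            print("Detected \(observations.count) hand(s)")
        } catch {
            print("Vision request failed: \(error)")
        }
    }
}
```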
Subject Segmentation APIs
VNGenerateForegroundInstanceMaskRequest
Availability: iOS 17+, macOS 14+, tvOS 17+, visionOS 1+
Generates class-agnostic instance mask of foreground objects (people, pets, buildings, food, shoes, etc.)
Basic Usage
swift
let request = VNGenerateForegroundInstanceMaskRequest()
let handler = VNImageRequestHandler(cgImage: image)
try handler.perform([request])
guard let observation = request.results?.first as? VNInstanceMaskObservation else {
return
}
InstanceMaskObservation
allInstances: IndexSet containing all foreground instance indices (excludes background 0)
instanceMask: CVPixelBuffer with UInt8 labels (0 = background, 1+ = instance indices)
instanceAtPoint(_:): Returns instance index at normalized point
swift
let point = CGPoint(x: 0.5, y: 0.5) // Center of image
let instance = observation.instanceAtPoint(point)
if instance == 0 {
print("Background tapped")
} else {
print("Instance \(instance) tapped")
}
Generating Masks
createScaledMask(for:croppedToInstancesContent:)
Parameters:
- for: IndexSet of instances to include
- croppedToInstancesContent:
  - false = Output matches input resolution (for compositing)
  - true = Tight crop around selected instances
Returns: Single-channel floating-point CVPixelBuffer (soft segmentation mask)
swift
// All instances, full resolution
let mask = try observation.createScaledMask(
for: observation.allInstances,
croppedToInstancesContent: false
)
// Single instance, cropped
let instances = IndexSet(integer: 1)
let croppedMask = try observation.createScaledMask(
for: instances,
croppedToInstancesContent: true
)
Instance Mask Hit Testing
Access raw pixel buffer to map tap coordinates to instance labels:
swift
let instanceMask = observation.instanceMask
CVPixelBufferLockBaseAddress(instanceMask, .readOnly)
defer { CVPixelBufferUnlockBaseAddress(instanceMask, .readOnly) }
let baseAddress = CVPixelBufferGetBaseAddress(instanceMask)
let width = CVPixelBufferGetWidth(instanceMask)
let bytesPerRow = CVPixelBufferGetBytesPerRow(instanceMask)
// Convert normalized tap to pixel coordinates
let pixelPoint = VNImagePointForNormalizedPoint(
CGPoint(x: normalizedX, y: normalizedY),
width: imageWidth,
height: imageHeight
)
// Calculate byte offset
let offset = Int(pixelPoint.y) * bytesPerRow + Int(pixelPoint.x)
// Read instance label
let label = UnsafeRawPointer(baseAddress!).load(
fromByteOffset: offset,
as: UInt8.self
)
let instances = label == 0 ? observation.allInstances : IndexSet(integer: Int(label))
VisionKit Subject Lifting
ImageAnalysisInteraction (iOS)
Availability: iOS 16+, iPadOS 16+
Adds system-like subject lifting UI to views:
swift
let interaction = ImageAnalysisInteraction()
interaction.preferredInteractionTypes = .imageSubject // Or .automatic
imageView.addInteraction(interaction)
Interaction types:
- .automatic: Subject lifting + Live Text + data detectors
- .imageSubject: Subject lifting only (no interactive text)
ImageAnalysisOverlayView (macOS)
Availability: macOS 13+
swift
let overlayView = ImageAnalysisOverlayView()
overlayView.preferredInteractionTypes = .imageSubject
nsView.addSubview(overlayView)
Programmatic Access
ImageAnalyzer
swift
let analyzer = ImageAnalyzer()
let configuration = ImageAnalyzer.Configuration([.text, .visualLookUp])
let analysis = try await analyzer.analyze(image, configuration: configuration)
ImageAnalysis
subjects: [Subject] - All subjects in image
highlightedSubjects: Set<Subject> - Currently highlighted (user long-pressed)
subject(at:): Async lookup of subject at normalized point (returns nil if none)
swift
// Get all subjects
let subjects = analysis.subjects
// Look up subject at tap
if let subject = try await analysis.subject(at: tapPoint) {
// Process subject
}
// Change highlight state
analysis.highlightedSubjects = Set([subjects[0], subjects[1]])
Subject Struct
image: UIImage / NSImage - Extracted subject with transparency
bounds: CGRect - Subject boundaries in image coordinates
swift
// Single subject image
let subjectImage = subject.image
// Composite multiple subjects
let compositeImage = try await analysis.image(for: [subject1, subject2])
Out-of-process: VisionKit analysis happens out-of-process (performance benefit; image size is limited)
Person Segmentation APIs
VNGeneratePersonSegmentationRequest
Availability: iOS 15+, macOS 12+
Returns single mask containing all people in image:
swift
let request = VNGeneratePersonSegmentationRequest()
// Configure quality level if needed
try handler.perform([request])
guard let observation = request.results?.first as? VNPixelBufferObservation else {
return
}
let personMask = observation.pixelBuffer // CVPixelBuffer
VNGeneratePersonInstanceMaskRequest
Availability: iOS 17+, macOS 14+
Returns separate masks for up to 4 people:
swift
let request = VNGeneratePersonInstanceMaskRequest()
try handler.perform([request])
guard let observation = request.results?.first as? VNInstanceMaskObservation else {
return
}
// Same InstanceMaskObservation API as foreground instance masks
let allPeople = observation.allInstances // Up to 4 people (1-4)
// Get mask for person 1
let person1Mask = try observation.createScaledMask(
for: IndexSet(integer: 1),
croppedToInstancesContent: false
)
Limitations:
- Segments up to 4 people
- With >4 people: may miss people or combine them (typically background people)
- Use VNDetectFaceRectanglesRequest to count faces if you need to handle crowded scenes
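The face-counting advice above can be sketched as follows; the function name is hypothetical and the input CGImage is assumed:

```swift
import Vision

// Sketch: count faces first to decide whether the 4-person instance-mask
// limit is a problem for this image. `image` is an assumed CGImage.
func canUsePersonInstanceMasks(for image: CGImage) throws -> Bool {
    let faceRequest = VNDetectFaceRectanglesRequest()
    try VNImageRequestHandler(cgImage: image).perform([faceRequest])
    let faceCount = faceRequest.results?.count ?? 0
    // With more than 4 faces, instance masks may miss or merge people
    return faceCount <= 4
}
```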
Hand Pose Detection
VNDetectHumanHandPoseRequest
Availability: iOS 14+, macOS 11+
Detects 21 hand landmarks per hand:
swift
let request = VNDetectHumanHandPoseRequest()
request.maximumHandCount = 2 // Default: 2, increase if needed
let handler = VNImageRequestHandler(cgImage: image)
try handler.perform([request])
for observation in request.results as? [VNHumanHandPoseObservation] ?? [] {
// Process each hand
}
Performance note: maximumHandCount affects latency. Pose is computed only for hands up to the maximum. Set it to the lowest acceptable value.
Hand Landmarks (21 points)
Wrist: 1 landmark
Thumb (4 landmarks):
- .thumbTip
- .thumbIP (interphalangeal joint)
- .thumbMP (metacarpophalangeal joint)
- .thumbCMC (carpometacarpal joint)
Fingers (4 landmarks each):
- Tip (.indexTip, .middleTip, .ringTip, .littleTip)
- DIP (distal interphalangeal joint)
- PIP (proximal interphalangeal joint)
- MCP (metacarpophalangeal joint)
Group Keys
Access landmark groups:
| Group Key | Points |
|---|---|
| .all | All 21 landmarks |
| .thumb | 4 thumb joints |
| .indexFinger | 4 index finger joints |
| .middleFinger | 4 middle finger joints |
| .ringFinger | 4 ring finger joints |
| .littleFinger | 4 little finger joints |
swift
// Get all points
let allPoints = try observation.recognizedPoints(.all)
// Get index finger points only
let indexPoints = try observation.recognizedPoints(.indexFinger)
// Get specific point
let thumbTip = try observation.recognizedPoint(.thumbTip)
let indexTip = try observation.recognizedPoint(.indexTip)
// Check confidence
guard thumbTip.confidence > 0.5 else { return }
// Access location (normalized coordinates, lower-left origin)
let location = thumbTip.location // CGPoint
Gesture Recognition Example (Pinch)
swift
let thumbTip = try observation.recognizedPoint(.thumbTip)
let indexTip = try observation.recognizedPoint(.indexTip)
guard thumbTip.confidence > 0.5, indexTip.confidence > 0.5 else {
return
}
let distance = hypot(
thumbTip.location.x - indexTip.location.x,
thumbTip.location.y - indexTip.location.y
)
let isPinching = distance < 0.05 // Normalized threshold
Chirality (Handedness)
swift
let chirality = observation.chirality // .left, .right, or .unknown
Body Pose Detection
VNDetectHumanBodyPoseRequest (2D)
Availability: iOS 14+, macOS 11+
Detects 18 body landmarks (2D normalized coordinates):
swift
let request = VNDetectHumanBodyPoseRequest()
try handler.perform([request])
for observation in request.results as? [VNHumanBodyPoseObservation] ?? [] {
// Process each person
}
Body Landmarks (18 points)
Face (5 landmarks):
- .nose, .leftEye, .rightEye, .leftEar, .rightEar
Arms (6 landmarks):
- Left: .leftShoulder, .leftElbow, .leftWrist
- Right: .rightShoulder, .rightElbow, .rightWrist
Torso (7 landmarks):
- .neck (between shoulders)
- .leftShoulder, .rightShoulder (also in arm groups)
- .leftHip, .rightHip
- .root (between hips)
Legs (6 landmarks):
- Left: .leftHip, .leftKnee, .leftAnkle
- Right: .rightHip, .rightKnee, .rightAnkle
Note: Shoulders and hips appear in multiple groups
Group Keys (Body)
| Group Key | Points |
|---|---|
| .all | All 18 landmarks |
| .face | 5 face landmarks |
| .leftArm | shoulder, elbow, wrist |
| .rightArm | shoulder, elbow, wrist |
| .torso | neck, shoulders, hips, root |
| .leftLeg | hip, knee, ankle |
| .rightLeg | hip, knee, ankle |
swift
// Get all body points
let allPoints = try observation.recognizedPoints(.all)
// Get left arm only
let leftArmPoints = try observation.recognizedPoints(.leftArm)
// Get specific joint
let leftWrist = try observation.recognizedPoint(.leftWrist)
VNDetectHumanBodyPose3DRequest (3D)
Availability: iOS 17+, macOS 14+
Returns 3D skeleton with 17 joints in meters (real-world coordinates):
swift
let request = VNDetectHumanBodyPose3DRequest()
try handler.perform([request])
guard let observation = request.results?.first as? VNHumanBodyPose3DObservation else {
return
}
// Get 3D joint position
let leftWrist = try observation.recognizedPoint(.leftWrist)
let position = leftWrist.position // simd_float4x4 matrix
let localPosition = leftWrist.localPosition // Relative to parent joint
3D Body Landmarks (17 points): Same as 2D except no ears (15 vs 18 2D landmarks)
3D Observation Properties
bodyHeight: Estimated height in meters
- With depth data: Measured height
- Without depth data: Reference height (1.8m)
heightEstimation: .measured or .reference
cameraOriginMatrix: simd_float4x4 camera position/orientation relative to subject
pointInImage(_:): Project 3D joint back to 2D image coordinates
swift
let wrist2D = try observation.pointInImage(leftWrist)
3D Point Classes
VNPoint3D: Base class with simd_float4x4 position matrix
VNRecognizedPoint3D: Adds identifier (joint name)
VNHumanBodyRecognizedPoint3D: Adds localPosition and parentJoint
swift
// Position relative to skeleton root (center of hip)
let modelPosition = leftWrist.position
// Position relative to parent joint (left elbow)
let relativePosition = leftWrist.localPosition
Depth Input
Vision accepts depth data alongside images:
swift
// From AVDepthData
let handler = VNImageRequestHandler(
cvPixelBuffer: imageBuffer,
depthData: depthData,
orientation: orientation
)
// From file (automatic depth extraction)
let handler = VNImageRequestHandler(url: imageURL) // Depth auto-fetched
Depth formats: Disparity or Depth (interchangeable via AVFoundation)
LiDAR: Use in live capture sessions for accurate scale/measurement
Face Detection & Landmarks
VNDetectFaceRectanglesRequest
Availability: iOS 11+
Detects face bounding boxes:
swift
let request = VNDetectFaceRectanglesRequest()
try handler.perform([request])
for observation in request.results as? [VNFaceObservation] ?? [] {
let faceBounds = observation.boundingBox // Normalized rect
}
VNDetectFaceLandmarksRequest
Availability: iOS 11+
Detects face with detailed landmarks:
swift
let request = VNDetectFaceLandmarksRequest()
try handler.perform([request])
for observation in request.results as? [VNFaceObservation] ?? [] {
if let landmarks = observation.landmarks {
let leftEye = landmarks.leftEye
let nose = landmarks.nose
let leftPupil = landmarks.leftPupil // Revision 3+
}
}
Revisions:
- Revision 1: Basic landmarks
- Revision 2: Detects upside-down faces
- Revision 3+: Pupil locations
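If your app depends on a revision-specific feature such as pupil landmarks, you can pin the revision explicitly; a minimal sketch:

```swift
import Vision

// Sketch: pin a specific revision when you rely on its behavior
// (per the list above, pupil locations require revision 3+).
let request = VNDetectFaceLandmarksRequest()
if VNDetectFaceLandmarksRequest.supportedRevisions.contains(VNDetectFaceLandmarksRequestRevision3) {
    request.revision = VNDetectFaceLandmarksRequestRevision3
}
```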
Person Detection
VNDetectHumanRectanglesRequest
Availability: iOS 13+
Detects human bounding boxes (torso detection):
swift
let request = VNDetectHumanRectanglesRequest()
try handler.perform([request])
for observation in request.results as? [VNHumanObservation] ?? [] {
let humanBounds = observation.boundingBox // Normalized rect
}
Use case: Faster than pose detection when you only need location
CoreImage Integration
CIBlendWithMask Filter
Composite subject on new background using Vision mask:
swift
// 1. Get mask from Vision
let observation = request.results?.first as? VNInstanceMaskObservation
let visionMask = try observation.createScaledMask(
for: observation.allInstances,
croppedToInstancesContent: false
)
// 2. Convert to CIImage
let maskImage = CIImage(cvPixelBuffer: visionMask)
// 3. Apply filter
let filter = CIFilter(name: "CIBlendWithMask")!
filter.setValue(sourceImage, forKey: kCIInputImageKey)
filter.setValue(maskImage, forKey: kCIInputMaskImageKey)
filter.setValue(newBackground, forKey: kCIInputBackgroundImageKey)
let output = filter.outputImage // Composited result
Parameters:
- Input image: Original image to mask
- Mask image: Vision's soft segmentation mask
- Background image: New background (or empty image for transparency)
HDR preservation: CoreImage preserves high dynamic range from input (Vision/VisionKit output is SDR)
Text Recognition APIs
VNRecognizeTextRequest
Availability: iOS 13+, macOS 10.15+
Recognizes text in images with configurable accuracy/speed trade-off.
Basic Usage
swift
let request = VNRecognizeTextRequest()
request.recognitionLevel = .accurate // Or .fast
request.recognitionLanguages = ["en-US", "de-DE"] // Order matters
request.usesLanguageCorrection = true
let handler = VNImageRequestHandler(cgImage: image)
try handler.perform([request])
for observation in request.results as? [VNRecognizedTextObservation] ?? [] {
// Get top candidates
let candidates = observation.topCandidates(3)
let bestText = candidates.first?.string ?? ""
}
Recognition Levels
| Level | Performance | Accuracy | Best For |
|---|---|---|---|
| .fast | Real-time | Good | Camera feed, large text, signs |
| .accurate | Slower | Excellent | Documents, receipts, handwriting |
Fast path: Character-by-character recognition (Neural Network → Character Detection)
Accurate path: Full-line ML recognition (Neural Network → Line/Word Recognition)
Properties
| Property | Type | Description |
|---|---|---|
| recognitionLevel | VNRequestTextRecognitionLevel | .fast or .accurate |
| recognitionLanguages | [String] | BCP 47 language codes, order = priority |
| usesLanguageCorrection | Bool | Use language model for correction |
| customWords | [String] | Domain-specific vocabulary |
| automaticallyDetectsLanguage | Bool | Auto-detect language (iOS 16+) |
| minimumTextHeight | Float | Min text height as fraction of image (0-1) |
| revision | Int | API version (affects supported languages) |
Language Support
swift
// Check supported languages for current settings
let languages = try VNRecognizeTextRequest.supportedRecognitionLanguages(
for: .accurate,
revision: VNRecognizeTextRequestRevision3
)
Language correction: Improves accuracy but takes processing time. Disable for codes/serial numbers.
Custom words: Add domain-specific vocabulary for better recognition (medical terms, product codes).
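The two tuning notes above pull in opposite directions, so a sketch of each configuration may help; the vocabulary strings are hypothetical examples:

```swift
import Vision

// Sketch 1: serial numbers / codes - turn correction off so the language
// model doesn't "fix" valid part numbers into dictionary words.
let serialRequest = VNRecognizeTextRequest()
serialRequest.recognitionLevel = .accurate
serialRequest.usesLanguageCorrection = false

// Sketch 2: domain documents - keep correction on and supplement the
// vocabulary (customWords is consulted during language correction).
let invoiceRequest = VNRecognizeTextRequest()
invoiceRequest.recognitionLevel = .accurate
invoiceRequest.usesLanguageCorrection = true
invoiceRequest.customWords = ["AXIOM", "SKU"] // hypothetical domain terms
```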
VNRecognizedTextObservation
boundingBox: Normalized rect containing recognized text
topCandidates(_:): Returns [VNRecognizedText] ordered by confidence
VNRecognizedText
| Property | Type | Description |
|---|---|---|
| string | String | Recognized text |
| confidence | VNConfidence | 0.0-1.0 |
| boundingBox(for:) | VNRectangleObservation? | Box for substring range |
swift
// Get bounding box for substring
let text = candidate.string
if let range = text.range(of: "invoice") {
let box = try candidate.boundingBox(for: range)
}
Barcode Detection APIs
VNDetectBarcodesRequest
Availability: iOS 11+, macOS 10.13+
Detects and decodes barcodes and QR codes.
Basic Usage
swift
let request = VNDetectBarcodesRequest()
request.symbologies = [.qr, .ean13, .code128] // Specific codes
let handler = VNImageRequestHandler(cgImage: image)
try handler.perform([request])
for barcode in request.results as? [VNBarcodeObservation] ?? [] {
let payload = barcode.payloadStringValue
let type = barcode.symbology
let bounds = barcode.boundingBox
}
Symbologies
1D Barcodes:
- .codabar (iOS 15+)
- .code39, .code39Checksum, .code39FullASCII, .code39FullASCIIChecksum
- .code93, .code93i
- .code128
- .ean8, .ean13
- .gs1DataBar, .gs1DataBarExpanded, .gs1DataBarLimited (iOS 15+)
- .i2of5, .i2of5Checksum
- .itf14
- .upce
2D Codes:
- .aztec
- .dataMatrix
- .microPDF417 (iOS 15+)
- .microQR (iOS 15+)
- .pdf417
- .qr
Performance: Specifying fewer symbologies = faster detection
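The performance note above can be sketched as follows; the printed listing is only a diagnostic aid:

```swift
import Vision

// Sketch: narrow the symbology set for speed. supportedSymbologies()
// (iOS 15+) reports what the current request revision can decode.
let request = VNDetectBarcodesRequest()
if let supported = try? request.supportedSymbologies() {
    print("This revision can decode: \(supported)")
}
// Restricting to only the codes you expect speeds up detection
request.symbologies = [.qr]
```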
Revisions
| Revision | iOS | Features |
|---|---|---|
| 1 | 11+ | Basic detection, one code at a time |
| 2 | 15+ | Codabar, GS1, MicroPDF, MicroQR, better ROI |
| 3 | 16+ | ML-based, multiple codes, better bounding boxes |
VNBarcodeObservation
| Property | Type | Description |
|---|---|---|
| payloadStringValue | String? | Decoded content |
| symbology | VNBarcodeSymbology | Barcode type |
| boundingBox | CGRect | Normalized bounds |
| topLeft, topRight, bottomLeft, bottomRight | CGPoint | Corner points |
VisionKit Scanner APIs
DataScannerViewController
Availability: iOS 16+
Camera-based live scanner with built-in UI for text and barcodes.
Check Availability
swift
// Hardware support
DataScannerViewController.isSupported
// Runtime availability (camera access, parental controls)
DataScannerViewController.isAvailable
Configuration
swift
import VisionKit
let dataTypes: Set<DataScannerViewController.RecognizedDataType> = [
.barcode(symbologies: [.qr, .ean13]),
.text(textContentType: .URL), // Or nil for all text
// .text(languages: ["ja"]) // Filter by language
]
let scanner = DataScannerViewController(
recognizedDataTypes: dataTypes,
qualityLevel: .balanced, // .fast, .balanced, .accurate
recognizesMultipleItems: true,
isHighFrameRateTrackingEnabled: true,
isPinchToZoomEnabled: true,
isGuidanceEnabled: true,
isHighlightingEnabled: true
)
scanner.delegate = self
present(scanner, animated: true) {
try? scanner.startScanning()
}
RecognizedDataType
| Type | Description |
|---|---|
| `.barcode(symbologies:)` | Specific barcode types |
| `.text()` | All text |
| `.text(languages:)` | Text filtered by language |
| `.text(textContentType:)` | Text filtered by type (URL, phone, email) |
Delegate Protocol
```swift
protocol DataScannerViewControllerDelegate {
    func dataScanner(_ dataScanner: DataScannerViewController,
                     didTapOn item: RecognizedItem)
    func dataScanner(_ dataScanner: DataScannerViewController,
                     didAdd addedItems: [RecognizedItem],
                     allItems: [RecognizedItem])
    func dataScanner(_ dataScanner: DataScannerViewController,
                     didUpdate updatedItems: [RecognizedItem],
                     allItems: [RecognizedItem])
    func dataScanner(_ dataScanner: DataScannerViewController,
                     didRemove removedItems: [RecognizedItem],
                     allItems: [RecognizedItem])
    func dataScanner(_ dataScanner: DataScannerViewController,
                     becameUnavailableWithError error: DataScannerViewController.ScanningUnavailable)
}
```
RecognizedItem
```swift
enum RecognizedItem {
    case text(RecognizedItem.Text)
    case barcode(RecognizedItem.Barcode)

    var id: UUID { get }
    var bounds: RecognizedItem.Bounds { get }
}

// Text item
struct Text {
    let transcript: String
}

// Barcode item
struct Barcode {
    let payloadStringValue: String?
    let observation: VNBarcodeObservation
}
```

Async Stream
```swift
// Alternative to the delegate
for await items in scanner.recognizedItems {
    // Current recognized items
}
```

Custom Highlights
```swift
// Add custom views over recognized items
scanner.overlayContainerView.addSubview(customHighlight)

// Capture a still photo
let photo = try await scanner.capturePhoto()
```
VNDocumentCameraViewController
Availability: iOS 13+
Document scanning with automatic edge detection, perspective correction, and lighting adjustment.
Basic Usage
```swift
import VisionKit

let camera = VNDocumentCameraViewController()
camera.delegate = self
present(camera, animated: true)
```

Delegate Protocol
```swift
protocol VNDocumentCameraViewControllerDelegate {
    func documentCameraViewController(_ controller: VNDocumentCameraViewController,
                                      didFinishWith scan: VNDocumentCameraScan)
    func documentCameraViewControllerDidCancel(_ controller: VNDocumentCameraViewController)
    func documentCameraViewController(_ controller: VNDocumentCameraViewController,
                                      didFailWithError error: Error)
}
```
VNDocumentCameraScan
| Property | Type | Description |
|---|---|---|
| `pageCount` | `Int` | Number of scanned pages |
| `imageOfPage(at:)` | `UIImage` | Get page image at index |
| `title` | `String` | User-editable title |
```swift
func documentCameraViewController(_ controller: VNDocumentCameraViewController,
                                  didFinishWith scan: VNDocumentCameraScan) {
    controller.dismiss(animated: true)
    for i in 0..<scan.pageCount {
        let pageImage = scan.imageOfPage(at: i)
        // Process with VNRecognizeTextRequest
    }
}
```

Document Analysis APIs
VNDetectDocumentSegmentationRequest
Availability: iOS 15+, macOS 12+
Detects document boundaries for custom camera UIs or post-processing.
```swift
let request = VNDetectDocumentSegmentationRequest()
let handler = VNImageRequestHandler(ciImage: image)
try handler.perform([request])

guard let observation = request.results?.first as? VNRectangleObservation else {
    return  // No document found
}

// Corner points (normalized coordinates)
let corners = [
    observation.topLeft,
    observation.topRight,
    observation.bottomLeft,
    observation.bottomRight
]
```

vs VNDetectRectanglesRequest:
- Document: ML-based, trained specifically on documents
- Rectangle: Edge-based, finds any quadrilateral
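A quick sanity check on the returned corners is to compute the quad's area in normalized units and discard tiny detections. A sketch under our own naming (not a Vision API); note it expects corners in perimeter order, whereas the snippet above lists `bottomLeft` before `bottomRight`:

```swift
import Foundation

// Shoelace area of a quadrilateral in normalized units. A very small
// area usually indicates a spurious detection. Corners must be in
// perimeter order, e.g. topLeft, topRight, bottomRight, bottomLeft.
func quadArea(_ corners: [CGPoint]) -> CGFloat {
    guard corners.count == 4 else { return 0 }
    var sum: CGFloat = 0
    for i in 0..<4 {
        let a = corners[i]
        let b = corners[(i + 1) % 4]
        sum += a.x * b.y - b.x * a.y
    }
    return abs(sum) / 2
}
```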
RecognizeDocumentsRequest (iOS 26+)
Availability: iOS 26+, macOS 26+
Structured document understanding with semantic parsing.
Basic Usage
```swift
let request = RecognizeDocumentsRequest()
let observations = try await request.perform(on: imageData)

guard let document = observations.first?.document else {
    return
}
```

DocumentObservation Hierarchy
```
DocumentObservation
└── document: DocumentObservation.Document
    ├── text: TextObservation
    ├── tables: [Container.Table]
    ├── lists: [Container.List]
    └── barcodes: [Container.Barcode]
```

Table Extraction
```swift
for table in document.tables {
    for row in table.rows {
        for cell in row {
            let text = cell.content.text.transcript
            let detectedData = cell.content.text.detectedData
        }
    }
}
```
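Once cell transcripts are collected into a `[[String]]`, exporting the table is straightforward. A CSV sketch (the helper name is ours, not part of the API):

```swift
import Foundation

// Join rows of cell transcripts into CSV, quoting any field that
// contains commas, double quotes, or newlines.
func csv(from rows: [[String]]) -> String {
    rows.map { row in
        row.map { field -> String in
            if field.contains(",") || field.contains("\"") || field.contains("\n") {
                return "\"" + field.replacingOccurrences(of: "\"", with: "\"\"") + "\""
            }
            return field
        }
        .joined(separator: ",")
    }
    .joined(separator: "\n")
}
```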
for table in document.tables {
for row in table.rows {
for cell in row {
let text = cell.content.text.transcript
let detectedData = cell.content.text.detectedData
}
}
}Detected Data Types
检测到的数据类型
```swift
for data in document.text.detectedData {
    switch data.match.details {
    case .emailAddress(let email):
        let address = email.emailAddress
    case .phoneNumber(let phone):
        let number = phone.phoneNumber
    case .link(let url):
        let link = url
    case .address(let address):
        let components = address
    case .date(let date):
        let dateValue = date
    default:
        break
    }
}
```

TextObservation Hierarchy
```
TextObservation
├── transcript: String
├── lines: [TextObservation.Line]
├── paragraphs: [TextObservation.Paragraph]
├── words: [TextObservation.Word]
└── detectedData: [DetectedDataObservation]
```

API Quick Reference
Subject Segmentation
| API | Platform | Purpose |
|---|---|---|
| `VNGenerateForegroundInstanceMaskRequest` | iOS 17+ | Class-agnostic subject instances |
| `VNGeneratePersonInstanceMaskRequest` | iOS 17+ | Up to 4 people separately |
| `VNGeneratePersonSegmentationRequest` | iOS 15+ | All people (single mask) |
| `ImageAnalysisInteraction` (VisionKit) | iOS 16+ | UI for subject lifting |
Pose Detection
| API | Platform | Landmarks | Coordinates |
|---|---|---|---|
| `VNDetectHumanHandPoseRequest` | iOS 14+ | 21 per hand | 2D normalized |
| `VNDetectHumanBodyPoseRequest` | iOS 14+ | 18 body joints | 2D normalized |
| `VNDetectHumanBodyPose3DRequest` | iOS 17+ | 17 body joints | 3D meters |
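Pose observations report a per-landmark confidence, and filtering low-confidence joints before driving UI or gesture logic is the usual first step. Sketched here with a plain dictionary so it applies to either 2D request; the function name and 0.3 threshold are our assumptions:

```swift
import Foundation

// Keep only landmarks whose confidence clears the threshold.
// Vision's recognized points carry a confidence in 0...1.
func confidentPoints(
    _ points: [String: (location: CGPoint, confidence: Float)],
    threshold: Float = 0.3
) -> [String: CGPoint] {
    points.compactMapValues { $0.confidence >= threshold ? $0.location : nil }
}
```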
Face & Person Detection
| API | Platform | Purpose |
|---|---|---|
| `VNDetectFaceRectanglesRequest` | iOS 11+ | Face bounding boxes |
| `VNDetectFaceLandmarksRequest` | iOS 11+ | Face with detailed landmarks |
| `VNDetectHumanRectanglesRequest` | iOS 13+ | Human torso bounding boxes |
Text & Barcode
| API | Platform | Purpose |
|---|---|---|
| `VNRecognizeTextRequest` | iOS 13+ | Text recognition (OCR) |
| `VNDetectBarcodesRequest` | iOS 11+ | Barcode/QR detection |
| `DataScannerViewController` | iOS 16+ | Live camera scanner (text + barcodes) |
| `VNDocumentCameraViewController` | iOS 13+ | Document scanning with perspective correction |
| `VNDetectDocumentSegmentationRequest` | iOS 15+ | Programmatic document edge detection |
| `RecognizeDocumentsRequest` | iOS 26+ | Structured document extraction |
Observation Types
| Observation | Returned By |
|---|---|
| `VNInstanceMaskObservation` | Foreground/person instance masks |
| `VNPixelBufferObservation` | Person segmentation (single mask) |
| `VNHumanHandPoseObservation` | Hand pose |
| `VNHumanBodyPoseObservation` | Body pose (2D) |
| `VNHumanBodyPose3DObservation` | Body pose (3D) |
| `VNFaceObservation` | Face detection/landmarks |
| `VNHumanObservation` | Human rectangles |
| `VNRecognizedTextObservation` | Text recognition |
| `VNBarcodeObservation` | Barcode detection |
| `VNRectangleObservation` | Document segmentation |
| `DocumentObservation` | Structured document (iOS 26+) |
Resources
WWDC: 2019-234, 2021-10041, 2022-10024, 2022-10025, 2025-272, 2023-10176, 2023-111241, 2023-10048, 2020-10653, 2020-10043, 2020-10099
Docs: /vision, /visionkit, /vision/vnrecognizetextrequest, /vision/vndetectbarcodesrequest
Skills: axiom-vision, axiom-vision-diag