axiom-ios-vision
iOS Computer Vision Router
You MUST use this skill for ANY computer vision work using the Vision framework.
When to Use
Use this router when:
- Analyzing images or video
- Detecting objects, faces, or people
- Tracking hand or body pose
- Segmenting people or subjects
- Lifting subjects from backgrounds
- Recognizing text in images (OCR)
- Detecting barcodes or QR codes
- Scanning documents
- Using VisionKit or DataScannerViewController
Routing Logic
Vision Work
Implementation patterns →
/skill axiom-vision
- Subject segmentation (VisionKit)
- Hand pose detection (21 landmarks)
- Body pose detection (2D/3D)
- Person segmentation
- Face detection
- Isolating objects while excluding hands
- Text recognition (VNRecognizeTextRequest)
- Barcode/QR detection (VNDetectBarcodesRequest)
- Document scanning (VNDocumentCameraViewController)
- Live scanning (DataScannerViewController)
- Structured document extraction (RecognizeDocumentsRequest, iOS 26+)
API reference →
/skill axiom-vision-ref
- Complete Vision framework API
- VNDetectHumanHandPoseRequest
- VNDetectHumanBodyPoseRequest
- VNGenerateForegroundInstanceMaskRequest
- VNRecognizeTextRequest (fast/accurate modes)
- VNDetectBarcodesRequest (symbologies)
- DataScannerViewController delegates
- RecognizeDocumentsRequest (iOS 26+)
- Coordinate conversion patterns
Diagnostics →
/skill axiom-vision-diag
- Subject not detected
- Hand pose missing landmarks
- Low confidence observations
- Performance issues
- Coordinate conversion bugs
- Text not recognized or wrong characters
- Barcodes not detected
- DataScanner showing blank or no items
- Document edges not detected
Decision Tree
- Implementing (pose, segmentation, OCR, barcodes, documents, live scanning)? → vision
- Need API reference / code examples? → vision-ref
- Debugging issues (detection failures, confidence, coordinates)? → vision-diag
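The decision tree above can be sketched as a small routing function. This is only an illustration: the `VisionSkill` enum, the keyword lists, and the `routeVisionRequest` name are assumptions made for this example and are not part of any Axiom tooling. Real routing would rely on understanding the request rather than on keyword matching.

```swift
import Foundation

// Hypothetical router sketch. The skill names come from this document;
// everything else (type names, keyword lists) is an illustrative assumption.
enum VisionSkill: String {
    case implementation = "axiom-vision"
    case reference = "axiom-vision-ref"
    case diagnostics = "axiom-vision-diag"
}

func routeVisionRequest(_ request: String) -> VisionSkill {
    let text = request.lowercased()

    // Debugging symptoms are checked first: a failing feature needs diagnosis.
    let diagnosticHints = ["isn't working", "not working", "not detected",
                           "not being detected", "wrong", "low confidence", "blank"]
    if diagnosticHints.contains(where: { text.contains($0) }) {
        return .diagnostics
    }

    // Requests for API details or code examples go to the reference skill.
    let referenceHints = ["api reference", "examples", "what symbologies"]
    if referenceHints.contains(where: { text.contains($0) }) {
        return .reference
    }

    // Everything else is treated as implementation work.
    return .implementation
}
```

For instance, `routeVisionRequest("Barcode not being detected").rawValue` yields `axiom-vision-diag`, while a plain how-to question falls through to `axiom-vision`.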
Anti-Rationalization
| Thought | Reality |
|---|---|
| "Vision framework is just a request/handler pattern" | Vision has coordinate conversion, confidence thresholds, and performance gotchas. vision covers them. |
| "I'll handle text recognition without the skill" | VNRecognizeTextRequest has fast/accurate modes and language-specific settings. vision has the patterns. |
| "Subject segmentation is straightforward" | Instance masks have HDR compositing and hand-exclusion patterns. vision covers complex scenarios. |
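As an illustration of the coordinate conversion issue named in the first row: Vision reports bounding boxes in normalized coordinates (0 to 1) with the origin at the lower-left corner, while UIKit views use a top-left origin. The sketch below shows the y-axis flip with a framework-free `Rect` type. Both `Rect` and `topLeftPixelRect` are hypothetical names for this example; in an app the values would be `CGRect`s taken from an observation.

```swift
// Minimal stand-in for CGRect so the sketch has no framework dependencies.
// (Hypothetical type for illustration only.)
struct Rect: Equatable {
    var x: Double
    var y: Double
    var width: Double
    var height: Double
}

/// Converts a normalized rectangle with a lower-left origin into pixel
/// coordinates with a top-left origin.
func topLeftPixelRect(fromNormalized box: Rect,
                      imageWidth: Double,
                      imageHeight: Double) -> Rect {
    let width = box.width * imageWidth
    let height = box.height * imageHeight
    let x = box.x * imageWidth
    // Flip the y-axis: in a lower-left system the top edge sits at y + height,
    // so its distance from the top of the image is 1 - (y + height).
    let y = (1.0 - box.y - box.height) * imageHeight
    return Rect(x: x, y: y, width: width, height: height)
}
```

Forgetting the flip is a typical source of overlays that appear vertically mirrored relative to the detected subject.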
Critical Patterns
vision:
- Subject segmentation with VisionKit
- Hand pose detection (21 landmarks)
- Body pose detection (2D/3D, up to 4 people)
- Isolating objects while excluding hands
- CoreImage HDR compositing
- Text recognition (fast vs accurate modes)
- Barcode detection (symbology selection)
- Document scanning with perspective correction
- Live scanning with DataScannerViewController
- Structured document extraction (iOS 26+)
vision-diag:
- Subject detection failures
- Landmark tracking issues
- Performance optimization
- Observation confidence thresholds
- Text recognition failures (language, contrast)
- Barcode detection issues (symbology, distance)
- DataScanner troubleshooting
- Document edge detection problems
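The confidence threshold item in the list above can be illustrated with a small filter. Vision observations and recognized points carry a confidence value between 0 and 1. The `Detection` type, the `reliableDetections` function, and the default threshold of 0.3 are assumptions chosen for this sketch, not values recommended by the framework or by the skills.

```swift
// Hypothetical stand-in for a Vision observation or recognized point.
struct Detection {
    let label: String
    let confidence: Float  // 0.0 (no confidence) to 1.0 (full confidence)
}

/// Keeps detections at or above the threshold, most confident first.
/// The 0.3 default is an illustrative assumption; tune it per use case.
func reliableDetections(_ detections: [Detection],
                        minimumConfidence: Float = 0.3) -> [Detection] {
    detections
        .filter { $0.confidence >= minimumConfidence }
        .sorted { $0.confidence > $1.confidence }
}
```

Applying a filter like this before drawing landmarks or acting on results can avoid jitter from low-confidence points, at the cost of occasionally dropping valid ones.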
Example Invocations
User: "How do I detect hand pose in an image?"
→ Invoke:
/skill axiom-vision
User: "Isolate a subject but exclude the user's hands"
→ Invoke:
/skill axiom-vision
User: "How do I read text from an image?"
→ Invoke:
/skill axiom-vision
User: "Scan QR codes with the camera"
→ Invoke:
/skill axiom-vision
User: "How do I implement document scanning?"
→ Invoke:
/skill axiom-vision
User: "Use DataScannerViewController for live text"
→ Invoke:
/skill axiom-vision
User: "Subject detection isn't working"
→ Invoke:
/skill axiom-vision-diag
User: "Text recognition returns wrong characters"
→ Invoke:
/skill axiom-vision-diag
User: "Barcode not being detected"
→ Invoke:
/skill axiom-vision-diag
User: "Show me VNDetectHumanBodyPoseRequest examples"
→ Invoke:
/skill axiom-vision-ref
User: "What symbologies does VNDetectBarcodesRequest support?"
→ Invoke:
/skill axiom-vision-ref
User: "RecognizeDocumentsRequest API reference"
→ Invoke:
/skill axiom-vision-ref