# Senior Computer Vision Engineer

Production computer vision engineering skill for object detection, image segmentation, and visual AI system deployment.
## Quick Start

```bash
# Generate a training configuration for YOLO or Faster R-CNN
python scripts/vision_model_trainer.py models/ --task detection --arch yolov8

# Analyze a model for optimization opportunities (quantization, pruning)
python scripts/inference_optimizer.py model.pt --target onnx --benchmark

# Build a dataset pipeline with augmentations
python scripts/dataset_pipeline_builder.py images/ --format coco --augment
```

## Core Expertise
This skill provides guidance on:
- Object Detection: YOLO family (v5-v11), Faster R-CNN, DETR, RT-DETR
- Instance Segmentation: Mask R-CNN, YOLACT, SOLOv2
- Semantic Segmentation: DeepLabV3+, SegFormer, SAM (Segment Anything)
- Image Classification: ResNet, EfficientNet, Vision Transformers (ViT, DeiT)
- Video Analysis: Object tracking (ByteTrack, SORT), action recognition
- 3D Vision: Depth estimation, point cloud processing, NeRF
- Production Deployment: ONNX, TensorRT, OpenVINO, CoreML
## Tech Stack
| Category | Technologies |
|---|---|
| Frameworks | PyTorch, torchvision, timm |
| Detection | Ultralytics (YOLO), Detectron2, MMDetection |
| Segmentation | segment-anything, mmsegmentation |
| Optimization | ONNX, TensorRT, OpenVINO, torch.compile |
| Image Processing | OpenCV, Pillow, albumentations |
| Annotation | CVAT, Label Studio, Roboflow |
| Experiment Tracking | MLflow, Weights & Biases |
| Serving | Triton Inference Server, TorchServe |
## Workflow 1: Object Detection Pipeline

Use this workflow when building an object detection system from scratch.
### Step 1: Define Detection Requirements

Analyze the detection task requirements:

```
Detection Requirements Analysis:
- Target objects: [list specific classes to detect]
- Real-time requirement: [yes/no, target FPS]
- Accuracy priority: [speed vs. accuracy trade-off]
- Deployment target: [cloud GPU, edge device, mobile]
- Dataset size: [number of images, annotations per class]
```
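The checklist above can also be captured as a small config object so requirements drive later decisions programmatically. A minimal sketch with hypothetical field names (not part of this skill's scripts):

```python
from dataclasses import dataclass, field

@dataclass
class DetectionRequirements:
    """Hypothetical container for the requirements checklist above."""
    target_objects: list = field(default_factory=list)
    realtime: bool = False
    target_fps: int = 0
    deployment_target: str = "cloud-gpu"  # or "edge", "mobile"
    dataset_images: int = 0

    def latency_budget_ms(self) -> float:
        # A real-time target of N FPS implies a per-frame budget of 1000/N ms.
        if self.realtime and self.target_fps:
            return 1000.0 / self.target_fps
        return float("inf")

reqs = DetectionRequirements(target_objects=["car", "person"],
                             realtime=True, target_fps=30)
print(reqs.latency_budget_ms())  # ≈33.3 ms per frame
```

A 30 FPS requirement immediately yields the ~33 ms latency budget used in the evaluation targets later in this document.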
### Step 2: Select Detection Architecture
Choose architecture based on requirements:
| Requirement | Recommended Architecture | Why |
|---|---|---|
| Real-time (>30 FPS) | YOLOv8/v11, RT-DETR | Single-stage, optimized for speed |
| High accuracy | Faster R-CNN, DINO | Two-stage, better localization |
| Small objects | YOLO + SAHI, Faster R-CNN + FPN | Multi-scale detection |
| Edge deployment | YOLOv8n, MobileNetV3-SSD | Lightweight architectures |
| Transformer-based | DETR, DINO, RT-DETR | End-to-end, no NMS required |
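The "no NMS required" note for DETR-family models refers to the post-processing step that CNN detectors typically need: greedy Non-Maximum Suppression to drop duplicate boxes. A minimal pure-Python sketch of that step:

```python
def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: keep the highest-scoring box, drop others that overlap it."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]: box 1 overlaps box 0 above the threshold
```

Production frameworks use vectorized variants (and alternatives like Soft-NMS or DIoU-NMS, covered in the optimization reference), but the logic is this.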
### Step 3: Prepare Dataset

Convert annotations to the required format:

```bash
# COCO format (recommended)
python scripts/dataset_pipeline_builder.py data/images/ \
    --annotations data/labels/ \
    --format coco \
    --split 0.8 0.1 0.1 \
    --output data/coco/

# Verify the dataset
python -c "from pycocotools.coco import COCO; coco = COCO('data/coco/train.json'); print(f'Images: {len(coco.imgs)}, Categories: {len(coco.cats)}')"
```
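The `train.json` that the verification one-liner loads follows the COCO detection schema: three parallel lists keyed by integer IDs. A minimal sketch of the structure (fields trimmed to the essentials pycocotools needs):

```python
import json

# Minimal COCO-style detection annotation file.
coco = {
    "images": [{"id": 1, "file_name": "img_001.jpg", "width": 640, "height": 480}],
    "categories": [{"id": 1, "name": "car"}, {"id": 2, "name": "person"}],
    "annotations": [
        # bbox is [x, y, width, height] in pixels; area feeds COCO's
        # small/medium/large size-based metrics
        {"id": 1, "image_id": 1, "category_id": 1,
         "bbox": [48, 240, 130, 85], "area": 130 * 85, "iscrowd": 0},
    ],
}

# Round-trip through JSON, as a dataset file would be.
loaded = json.loads(json.dumps(coco))
print(len(loaded["images"]), len(loaded["categories"]))  # 1 2
```

The verification command's `len(coco.imgs)` and `len(coco.cats)` count exactly these `images` and `categories` entries.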
### Step 4: Configure Training
Generate a training configuration:

```bash
# For Ultralytics YOLO
python scripts/vision_model_trainer.py data/coco/ \
    --task detection \
    --arch yolov8m \
    --epochs 100 \
    --batch 16 \
    --imgsz 640 \
    --output configs/

# For Detectron2
python scripts/vision_model_trainer.py data/coco/ \
    --task detection \
    --arch faster_rcnn_R_50_FPN \
    --framework detectron2 \
    --output configs/
```
### Step 5: Train and Validate
```bash
# Ultralytics training
yolo detect train data=data.yaml model=yolov8m.pt epochs=100 imgsz=640

# Detectron2 training
python train_net.py --config-file configs/faster_rcnn.yaml --num-gpus 1

# Validate on the test set
yolo detect val model=runs/detect/train/weights/best.pt data=data.yaml
```
### Step 6: Evaluate Results
Key metrics to analyze:
| Metric | Target | Description |
|---|---|---|
| mAP@50 | >0.7 | Mean Average Precision at IoU 0.5 |
| mAP@50:95 | >0.5 | COCO primary metric |
| Precision | >0.8 | Low false positives |
| Recall | >0.8 | Low missed detections |
| Inference time | <33ms | For 30 FPS real-time |
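The precision and recall targets above come from TP/FP/FN counts after IoU matching: a detection is a true positive only if it overlaps a ground-truth box at or above the IoU threshold. A minimal sketch (taking the per-detection best-match IoUs as given):

```python
def precision_recall(num_gt, matched_ious, iou_thresh=0.5):
    """Precision/recall given each detection's IoU with its best-matched ground truth."""
    tp = sum(1 for v in matched_ious if v >= iou_thresh)
    fp = len(matched_ious) - tp          # detections that matched nothing well enough
    fn = num_gt - tp                     # ground-truth objects that were missed
    precision = tp / (tp + fp) if matched_ious else 0.0
    recall = tp / (tp + fn) if num_gt else 0.0
    return precision, recall

# 10 ground-truth objects, 9 detections, 8 of which reach IoU >= 0.5
p, r = precision_recall(num_gt=10,
                        matched_ious=[0.9, 0.8, 0.75, 0.7, 0.65, 0.6, 0.55, 0.5, 0.3])
print(round(p, 3), round(r, 3))  # 0.889 0.8
```

mAP@50 goes one step further and averages precision over recall levels at IoU 0.5; mAP@50:95 additionally averages over IoU thresholds from 0.5 to 0.95.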
## Workflow 2: Model Optimization and Deployment

Use this workflow when preparing a trained model for production deployment.
### Step 1: Benchmark Baseline Performance

```bash
# Measure current model performance
python scripts/inference_optimizer.py model.pt \
    --benchmark \
    --input-size 640 640 \
    --batch-sizes 1 4 8 16 \
    --warmup 10 \
    --iterations 100
```

Expected output:

```
Baseline Performance (PyTorch FP32):
- Batch 1: 45.2ms (22.1 FPS)
- Batch 4: 89.4ms (44.7 FPS)
- Batch 8: 165.3ms (48.4 FPS)
- Memory: 2.1 GB
- Parameters: 25.9M
```
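The FPS figures above follow directly from the latency samples (e.g. 45.2 ms per batch-1 inference → 1000/45.2 ≈ 22.1 FPS). A minimal sketch of the summary statistics a benchmark like this reports:

```python
def summarize_latency(samples_ms, batch_size=1):
    """Mean latency, p99 latency, and throughput (images/s) from per-batch timings."""
    s = sorted(samples_ms)
    mean = sum(s) / len(s)
    p99 = s[min(len(s) - 1, int(0.99 * len(s)))]  # simple empirical percentile
    fps = batch_size * 1000.0 / mean
    return mean, p99, fps

mean, p99, fps = summarize_latency([45.2] * 100)
print(round(fps, 1))  # 22.1 — matches the batch-1 row above
```

Note that p99 (used in the Performance Targets table later) can be far above the mean when latency has a long tail, which is why both are tracked.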
### Step 2: Select Optimization Strategy
| Deployment Target | Optimization Path |
|---|---|
| NVIDIA GPU (cloud) | PyTorch → ONNX → TensorRT FP16 |
| NVIDIA GPU (edge) | PyTorch → TensorRT INT8 |
| Intel CPU | PyTorch → ONNX → OpenVINO |
| Apple Silicon | PyTorch → CoreML |
| Generic CPU | PyTorch → ONNX Runtime |
| Mobile | PyTorch → TFLite or ONNX Mobile |
### Step 3: Export to ONNX

```bash
# Export with dynamic batch size
python scripts/inference_optimizer.py model.pt \
    --export onnx \
    --input-size 640 640 \
    --dynamic-batch \
    --simplify \
    --output model.onnx

# Verify the ONNX model
python -c "import onnx; model = onnx.load('model.onnx'); onnx.checker.check_model(model); print('ONNX model valid')"
```
### Step 4: Apply Quantization (Optional)
For INT8 quantization with calibration:

```bash
# Quantize using a calibration dataset
python scripts/inference_optimizer.py model.onnx \
    --quantize int8 \
    --calibration-data data/calibration/ \
    --calibration-samples 500 \
    --output model_int8.onnx
```

Quantization impact analysis:

| Precision | Size | Speed | Accuracy Drop |
|-----------|------|-------|---------------|
| FP32 | 100% | 1x | 0% |
| FP16 | 50% | 1.5-2x | <0.5% |
| INT8 | 25% | 2-4x | 1-3% |
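The size column follows directly from bytes per weight: halving the bit width halves the weight storage. A quick sanity check using the 25.9M-parameter baseline from Step 1:

```python
def model_size_mb(num_params, bits_per_param):
    """Approximate weight storage, ignoring quantization scales and graph metadata."""
    return num_params * bits_per_param / 8 / 1e6

params = 25.9e6  # parameter count from the baseline benchmark above
for name, bits in [("FP32", 32), ("FP16", 16), ("INT8", 8)]:
    print(name, round(model_size_mb(params, bits), 1))
# FP32 103.6, FP16 51.8, INT8 25.9 — the 100% / 50% / 25% column above
```

Real exported files deviate slightly because of per-tensor scale factors and serialization overhead, but this estimate is usually within a few percent.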
### Step 5: Convert to Target Runtime
```bash
# TensorRT (NVIDIA GPU)
trtexec --onnx=model.onnx --saveEngine=model.engine --fp16

# OpenVINO (Intel)
mo --input_model model.onnx --output_dir openvino/

# CoreML (Apple)
python -c "import coremltools as ct; model = ct.convert('model.onnx'); model.save('model.mlpackage')"
```
### Step 6: Benchmark Optimized Model
```bash
python scripts/inference_optimizer.py model.engine \
    --benchmark \
    --runtime tensorrt \
    --compare model.pt
```

Expected speedup:

```
Optimization Results:
- Original (PyTorch FP32): 45.2ms
- Optimized (TensorRT FP16): 12.8ms
- Speedup: 3.5x
- Accuracy change: -0.3% mAP
```

## Workflow 3: Custom Dataset Preparation

Use this workflow when preparing a computer vision dataset for training.
### Step 1: Audit Raw Data

```bash
# Analyze the image dataset
python scripts/dataset_pipeline_builder.py data/raw/ \
    --analyze \
    --output analysis/
```

The analysis report includes:

```
Dataset Analysis:
- Total images: 5,234
- Image sizes: 640x480 to 4096x3072 (variable)
- Formats: JPEG (4,891), PNG (343)
- Corrupted: 12 files
- Duplicates: 45 pairs

Annotation Analysis:
- Format detected: Pascal VOC XML
- Total annotations: 28,456
- Classes: 5 (car, person, bicycle, dog, cat)
- Distribution: car (12,340), person (8,234), bicycle (3,456), dog (2,890), cat (1,536)
- Empty images: 234
```
### Step 2: Clean and Validate
```bash
# Remove corrupted and duplicate images
python scripts/dataset_pipeline_builder.py data/raw/ \
    --clean \
    --remove-corrupted \
    --remove-duplicates \
    --output data/cleaned/
```
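Exact duplicates (the "45 pairs" the audit reports) can be found by hashing file contents; near-duplicates require perceptual hashing instead. A minimal sketch of the exact-match case, with a hypothetical in-memory file list:

```python
import hashlib

def find_exact_duplicates(files):
    """Group byte-identical files by SHA-256; returns groups of duplicate paths."""
    by_hash = {}
    for path, data in files:  # (path, raw bytes) pairs; in practice, read from disk
        digest = hashlib.sha256(data).hexdigest()
        by_hash.setdefault(digest, []).append(path)
    return [paths for paths in by_hash.values() if len(paths) > 1]

files = [("a.jpg", b"\xff\xd8AAA"), ("b.jpg", b"\xff\xd8BBB"), ("c.jpg", b"\xff\xd8AAA")]
print(find_exact_duplicates(files))  # [['a.jpg', 'c.jpg']]
```

When duplicates span a train/test split they silently inflate evaluation metrics, which is why this check runs before splitting.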
### Step 3: Convert Annotation Format
```bash
# Convert VOC to COCO format
python scripts/dataset_pipeline_builder.py data/cleaned/ \
    --annotations data/annotations/ \
    --input-format voc \
    --output-format coco \
    --output data/coco/
```
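The heart of these conversions is the box-convention change: VOC stores corner coordinates (xmin, ymin, xmax, ymax), COCO stores top-left plus size in pixels, and YOLO stores a normalized center plus size. A minimal sketch:

```python
def voc_to_coco_bbox(xmin, ymin, xmax, ymax):
    """VOC corner box -> COCO [x, y, width, height] (pixels)."""
    return [xmin, ymin, xmax - xmin, ymax - ymin]

def coco_to_yolo_bbox(x, y, w, h, img_w, img_h):
    """COCO pixel box -> YOLO normalized [cx, cy, w, h]."""
    return [(x + w / 2) / img_w, (y + h / 2) / img_h, w / img_w, h / img_h]

print(voc_to_coco_bbox(48, 240, 178, 325))           # [48, 240, 130, 85]
print(coco_to_yolo_bbox(0, 0, 320, 240, 640, 480))   # [0.25, 0.25, 0.5, 0.5]
```

Getting this convention wrong is the most common dataset-conversion bug: boxes render shifted or scaled, and training quietly degrades.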
Supported format conversions:

| From | To |
|------|-----|
| Pascal VOC XML | COCO JSON |
| YOLO TXT | COCO JSON |
| COCO JSON | YOLO TXT |
| LabelMe JSON | COCO JSON |
| CVAT XML | COCO JSON |

### Step 4: Apply Augmentations
```bash
# Generate an augmentation config
python scripts/dataset_pipeline_builder.py data/coco/ \
    --augment \
    --aug-config configs/augmentation.yaml \
    --output data/augmented/
```

Recommended augmentations for detection:

```yaml
# configs/augmentation.yaml
augmentations:
  geometric:
    - horizontal_flip: { p: 0.5 }
    - vertical_flip: { p: 0.1 }   # Only if orientation invariant
    - rotate: { limit: 15, p: 0.3 }
    - scale: { scale_limit: 0.2, p: 0.5 }
  color:
    - brightness_contrast: { brightness_limit: 0.2, contrast_limit: 0.2, p: 0.5 }
    - hue_saturation: { hue_shift_limit: 20, sat_shift_limit: 30, p: 0.3 }
    - blur: { blur_limit: 3, p: 0.1 }
  advanced:
    - mosaic: { p: 0.5 }   # YOLO-style mosaic
    - mixup: { p: 0.1 }    # Image mixing
    - cutout: { num_holes: 8, max_h_size: 32, max_w_size: 32, p: 0.3 }
```
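For detection, geometric augmentations must transform the bounding boxes along with the pixels (libraries like albumentations do this when given bbox parameters). A minimal sketch of the box side of a horizontal flip, for a COCO-format box:

```python
def hflip_bbox(bbox, img_w):
    """Horizontally flip a COCO [x, y, w, h] box inside an image of width img_w."""
    x, y, w, h = bbox
    return [img_w - x - w, y, w, h]

# A box near the left edge moves to the right edge after flipping.
print(hflip_bbox([10, 20, 100, 50], img_w=640))  # [530, 20, 100, 50]
```

Color augmentations (brightness, hue, blur) leave boxes untouched, which is why they are safe to apply more aggressively.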
### Step 5: Create Train/Val/Test Splits
```bash
python scripts/dataset_pipeline_builder.py data/augmented/ \
    --split 0.8 0.1 0.1 \
    --stratify \
    --seed 42 \
    --output data/final/
```

Split strategy guidelines:

| Dataset Size | Train | Val | Test |
|---|---|---|---|
| <1,000 images | 70% | 15% | 15% |
| 1,000-10,000 | 80% | 10% | 10% |
| >10,000 | 90% | 5% | 5% |
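The `--stratify` flag above keeps the class distribution consistent across splits by splitting each class separately. A minimal sketch of that idea (a simplification: real detection datasets have multiple labels per image, so stratification is usually done on a per-image dominant or rarest class):

```python
import random

def stratified_split(items, ratios=(0.8, 0.1, 0.1), seed=42):
    """Split (id, label) pairs per class so each split keeps the label distribution."""
    rng = random.Random(seed)
    by_label = {}
    for item_id, label in items:
        by_label.setdefault(label, []).append(item_id)
    train, val, test = [], [], []
    for ids in by_label.values():
        rng.shuffle(ids)
        n_train = int(ratios[0] * len(ids))
        n_val = int(ratios[1] * len(ids))
        train += ids[:n_train]
        val += ids[n_train:n_train + n_val]
        test += ids[n_train + n_val:]
    return train, val, test

items = [(i, "car" if i % 2 else "person") for i in range(100)]
train, val, test = stratified_split(items)
print(len(train), len(val), len(test))  # 80 10 10
```

The fixed `seed` mirrors the `--seed 42` flag: splits must be reproducible so experiments remain comparable.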
### Step 6: Generate Dataset Configuration

```bash
# For Ultralytics YOLO
python scripts/dataset_pipeline_builder.py data/final/ \
    --generate-config yolo \
    --output data.yaml

# For Detectron2
python scripts/dataset_pipeline_builder.py data/final/ \
    --generate-config detectron2 \
    --output detectron2_config.py
```

## Architecture Selection Guide
### Object Detection Architectures
| Architecture | Speed | Accuracy | Best For |
|---|---|---|---|
| YOLOv8n | 1.2ms | 37.3 mAP | Edge, mobile, real-time |
| YOLOv8s | 2.1ms | 44.9 mAP | Balanced speed/accuracy |
| YOLOv8m | 4.2ms | 50.2 mAP | General purpose |
| YOLOv8l | 6.8ms | 52.9 mAP | High accuracy |
| YOLOv8x | 10.1ms | 53.9 mAP | Maximum accuracy |
| RT-DETR-L | 5.3ms | 53.0 mAP | Transformer, no NMS |
| Faster R-CNN R50 | 46ms | 40.2 mAP | Two-stage, high quality |
| DINO-4scale | 85ms | 49.0 mAP | SOTA transformer |
### Segmentation Architectures
| Architecture | Type | Speed | Best For |
|---|---|---|---|
| YOLOv8-seg | Instance | 4.5ms | Real-time instance seg |
| Mask R-CNN | Instance | 67ms | High-quality masks |
| SAM | Promptable | 50ms | Zero-shot segmentation |
| DeepLabV3+ | Semantic | 25ms | Scene parsing |
| SegFormer | Semantic | 15ms | Efficient semantic seg |
### CNN vs Vision Transformer Trade-offs
| Aspect | CNN (YOLO, R-CNN) | ViT (DETR, DINO) |
|---|---|---|
| Training data needed | 1K-10K images | 10K-100K+ images |
| Training time | Fast | Slow (needs more epochs) |
| Inference speed | Faster | Slower |
| Small objects | Good with FPN | Needs multi-scale |
| Global context | Limited | Excellent |
| Positional encoding | Implicit | Explicit |
## Reference Documentation

### 1. Computer Vision Architectures

See `references/computer_vision_architectures.md` for:
- CNN backbone architectures (ResNet, EfficientNet, ConvNeXt)
- Vision Transformer variants (ViT, DeiT, Swin)
- Detection heads (anchor-based vs. anchor-free)
- Feature Pyramid Networks (FPN, BiFPN, PANet)
- Neck architectures for multi-scale detection

### 2. Object Detection Optimization

See `references/object_detection_optimization.md` for:
- Non-Maximum Suppression variants (NMS, Soft-NMS, DIoU-NMS)
- Anchor optimization and anchor-free alternatives
- Loss function design (focal loss, GIoU, CIoU, DIoU)
- Training strategies (warmup, cosine annealing, EMA)
- Data augmentation for detection (mosaic, mixup, copy-paste)

### 3. Production Vision Systems

See `references/production_vision_systems.md` for:
- ONNX export and optimization
- TensorRT deployment pipeline
- Batch inference optimization
- Edge device deployment (Jetson, Intel NCS)
- Model serving with Triton
- Video processing pipelines
## Common Commands

### Ultralytics YOLO

```bash
# Training
yolo detect train data=coco.yaml model=yolov8m.pt epochs=100 imgsz=640

# Validation
yolo detect val model=best.pt data=coco.yaml

# Inference
yolo detect predict model=best.pt source=images/ save=True

# Export
yolo export model=best.pt format=onnx simplify=True dynamic=True
```
### Detectron2
```bash
# Training
python train_net.py --config-file configs/COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml \
    --num-gpus 1 OUTPUT_DIR ./output

# Evaluation
python train_net.py --config-file configs/faster_rcnn.yaml --eval-only \
    MODEL.WEIGHTS output/model_final.pth

# Inference
python demo.py --config-file configs/faster_rcnn.yaml \
    --input images/*.jpg --output results/ \
    --opts MODEL.WEIGHTS output/model_final.pth
```
### MMDetection
```bash
# Training
python tools/train.py configs/faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py

# Testing
python tools/test.py configs/faster_rcnn.py checkpoints/latest.pth --eval bbox

# Inference
python demo/image_demo.py demo.jpg configs/faster_rcnn.py checkpoints/latest.pth
```
### Model Optimization
```bash
# ONNX export and simplification
python -c "import torch; model = torch.load('model.pt'); torch.onnx.export(model, torch.randn(1,3,640,640), 'model.onnx', opset_version=17)"
python -m onnxsim model.onnx model_sim.onnx

# TensorRT conversion
trtexec --onnx=model.onnx --saveEngine=model.engine --fp16 --workspace=4096

# Benchmark
trtexec --loadEngine=model.engine --batch=1 --iterations=1000 --avgRuns=100
```
## Performance Targets
| Metric | Real-time | High Accuracy | Edge |
|---|---|---|---|
| FPS | >30 | >10 | >15 |
| mAP@50 | >0.6 | >0.8 | >0.5 |
| Latency P99 | <50ms | <150ms | <100ms |
| GPU Memory | <4GB | <8GB | <2GB |
| Model Size | <50MB | <200MB | <20MB |
## Resources

- Architecture Guide: `references/computer_vision_architectures.md`
- Optimization Guide: `references/object_detection_optimization.md`
- Deployment Guide: `references/production_vision_systems.md`
- Scripts: automation tools in the `scripts/` directory