# Senior Computer Vision Engineer

Production computer vision engineering skill for object detection, image segmentation, and visual AI system deployment.
## Quick Start

```bash
# Generate a training configuration for YOLO or Faster R-CNN
python scripts/vision_model_trainer.py models/ --task detection --arch yolov8

# Analyze a model for optimization opportunities (quantization, pruning)
python scripts/inference_optimizer.py model.pt --target onnx --benchmark

# Build a dataset pipeline with augmentations
python scripts/dataset_pipeline_builder.py images/ --format coco --augment
```

## Core Expertise
This skill provides guidance on:
- Object Detection: YOLO family (v5-v11), Faster R-CNN, DETR, RT-DETR
- Instance Segmentation: Mask R-CNN, YOLACT, SOLOv2
- Semantic Segmentation: DeepLabV3+, SegFormer, SAM (Segment Anything)
- Image Classification: ResNet, EfficientNet, Vision Transformers (ViT, DeiT)
- Video Analysis: Object tracking (ByteTrack, SORT), action recognition
- 3D Vision: Depth estimation, point cloud processing, NeRF
- Production Deployment: ONNX, TensorRT, OpenVINO, CoreML
## Tech Stack
| Category | Technologies |
|---|---|
| Frameworks | PyTorch, torchvision, timm |
| Detection | Ultralytics (YOLO), Detectron2, MMDetection |
| Segmentation | segment-anything, mmsegmentation |
| Optimization | ONNX, TensorRT, OpenVINO, torch.compile |
| Image Processing | OpenCV, Pillow, albumentations |
| Annotation | CVAT, Label Studio, Roboflow |
| Experiment Tracking | MLflow, Weights & Biases |
| Serving | Triton Inference Server, TorchServe |
## Workflow 1: Object Detection Pipeline

Use this workflow when building an object detection system from scratch.
### Step 1: Define Detection Requirements

Analyze the detection task requirements:

```
Detection Requirements Analysis:
- Target objects: [list specific classes to detect]
- Real-time requirement: [yes/no, target FPS]
- Accuracy priority: [speed vs. accuracy trade-off]
- Deployment target: [cloud GPU, edge device, mobile]
- Dataset size: [number of images, annotations per class]
```
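The checklist above can also be captured as a small config object so requirements drive later decisions programmatically. A minimal sketch with hypothetical field names (not part of this skill's scripts):

```python
from dataclasses import dataclass, field

@dataclass
class DetectionRequirements:
    """Hypothetical container for the requirements checklist above."""
    target_objects: list = field(default_factory=list)
    realtime: bool = False
    target_fps: int = 0
    deployment_target: str = "cloud-gpu"  # or "edge", "mobile"
    dataset_images: int = 0

    def latency_budget_ms(self) -> float:
        # A real-time target of N FPS implies a per-frame budget of 1000/N ms.
        if self.realtime and self.target_fps:
            return 1000.0 / self.target_fps
        return float("inf")

reqs = DetectionRequirements(target_objects=["car", "person"],
                             realtime=True, target_fps=30)
print(reqs.latency_budget_ms())  # ≈33.3 ms per frame
```

A 30 FPS requirement immediately yields the ~33 ms latency budget used in the evaluation targets later in this document.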
### Step 2: Select Detection Architecture
Choose architecture based on requirements:
| Requirement | Recommended Architecture | Why |
|---|---|---|
| Real-time (>30 FPS) | YOLOv8/v11, RT-DETR | Single-stage, optimized for speed |
| High accuracy | Faster R-CNN, DINO | Two-stage, better localization |
| Small objects | YOLO + SAHI, Faster R-CNN + FPN | Multi-scale detection |
| Edge deployment | YOLOv8n, MobileNetV3-SSD | Lightweight architectures |
| Transformer-based | DETR, DINO, RT-DETR | End-to-end, no NMS required |
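The "no NMS required" note for DETR-family models refers to the post-processing step that CNN detectors typically need: greedy Non-Maximum Suppression to drop duplicate boxes. A minimal pure-Python sketch of that step:

```python
def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: keep the highest-scoring box, drop others that overlap it."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]: box 1 overlaps box 0 above the threshold
```

Production frameworks use vectorized variants (and alternatives like Soft-NMS or DIoU-NMS, covered in the optimization reference), but the logic is this.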
### Step 3: Prepare Dataset

Convert annotations to the required format:

```bash
# COCO format (recommended)
python scripts/dataset_pipeline_builder.py data/images/ \
    --annotations data/labels/ \
    --format coco \
    --split 0.8 0.1 0.1 \
    --output data/coco/

# Verify the dataset
python -c "from pycocotools.coco import COCO; coco = COCO('data/coco/train.json'); print(f'Images: {len(coco.imgs)}, Categories: {len(coco.cats)}')"
```
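The `train.json` that the verification one-liner loads follows the COCO detection schema: three parallel lists keyed by integer IDs. A minimal sketch of the structure (fields trimmed to the essentials pycocotools needs):

```python
import json

# Minimal COCO-style detection annotation file.
coco = {
    "images": [{"id": 1, "file_name": "img_001.jpg", "width": 640, "height": 480}],
    "categories": [{"id": 1, "name": "car"}, {"id": 2, "name": "person"}],
    "annotations": [
        # bbox is [x, y, width, height] in pixels; area feeds COCO's
        # small/medium/large size-based metrics
        {"id": 1, "image_id": 1, "category_id": 1,
         "bbox": [48, 240, 130, 85], "area": 130 * 85, "iscrowd": 0},
    ],
}

# Round-trip through JSON, as a dataset file would be.
loaded = json.loads(json.dumps(coco))
print(len(loaded["images"]), len(loaded["categories"]))  # 1 2
```

The verification command's `len(coco.imgs)` and `len(coco.cats)` count exactly these `images` and `categories` entries.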
### Step 4: Configure Training
Generate a training configuration:

```bash
# For Ultralytics YOLO
python scripts/vision_model_trainer.py data/coco/ \
    --task detection \
    --arch yolov8m \
    --epochs 100 \
    --batch 16 \
    --imgsz 640 \
    --output configs/

# For Detectron2
python scripts/vision_model_trainer.py data/coco/ \
    --task detection \
    --arch faster_rcnn_R_50_FPN \
    --framework detectron2 \
    --output configs/
```
### Step 5: Train and Validate
```bash
# Ultralytics training
yolo detect train data=data.yaml model=yolov8m.pt epochs=100 imgsz=640

# Detectron2 training
python train_net.py --config-file configs/faster_rcnn.yaml --num-gpus 1

# Validate on the test set
yolo detect val model=runs/detect/train/weights/best.pt data=data.yaml
```
### Step 6: Evaluate Results
Key metrics to analyze:
| Metric | Target | Description |
|---|---|---|
| mAP@50 | >0.7 | Mean Average Precision at IoU 0.5 |
| mAP@50:95 | >0.5 | COCO primary metric |
| Precision | >0.8 | Low false positives |
| Recall | >0.8 | Low missed detections |
| Inference time | <33ms | For 30 FPS real-time |
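The precision and recall targets above come from TP/FP/FN counts after IoU matching: a detection is a true positive only if it overlaps a ground-truth box at or above the IoU threshold. A minimal sketch (taking the per-detection best-match IoUs as given):

```python
def precision_recall(num_gt, matched_ious, iou_thresh=0.5):
    """Precision/recall given each detection's IoU with its best-matched ground truth."""
    tp = sum(1 for v in matched_ious if v >= iou_thresh)
    fp = len(matched_ious) - tp          # detections that matched nothing well enough
    fn = num_gt - tp                     # ground-truth objects that were missed
    precision = tp / (tp + fp) if matched_ious else 0.0
    recall = tp / (tp + fn) if num_gt else 0.0
    return precision, recall

# 10 ground-truth objects, 9 detections, 8 of which reach IoU >= 0.5
p, r = precision_recall(num_gt=10,
                        matched_ious=[0.9, 0.8, 0.75, 0.7, 0.65, 0.6, 0.55, 0.5, 0.3])
print(round(p, 3), round(r, 3))  # 0.889 0.8
```

mAP@50 goes one step further and averages precision over recall levels at IoU 0.5; mAP@50:95 additionally averages over IoU thresholds from 0.5 to 0.95.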
## Workflow 2: Model Optimization and Deployment

Use this workflow when preparing a trained model for production deployment.
### Step 1: Benchmark Baseline Performance

```bash
# Measure current model performance
python scripts/inference_optimizer.py model.pt \
    --benchmark \
    --input-size 640 640 \
    --batch-sizes 1 4 8 16 \
    --warmup 10 \
    --iterations 100
```

Expected output:

```
Baseline Performance (PyTorch FP32):
- Batch 1: 45.2ms (22.1 FPS)
- Batch 4: 89.4ms (44.7 FPS)
- Batch 8: 165.3ms (48.4 FPS)
- Memory: 2.1 GB
- Parameters: 25.9M
```
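The FPS figures above follow directly from the latency samples (e.g. 45.2 ms per batch-1 inference → 1000/45.2 ≈ 22.1 FPS). A minimal sketch of the summary statistics a benchmark like this reports:

```python
def summarize_latency(samples_ms, batch_size=1):
    """Mean latency, p99 latency, and throughput (images/s) from per-batch timings."""
    s = sorted(samples_ms)
    mean = sum(s) / len(s)
    p99 = s[min(len(s) - 1, int(0.99 * len(s)))]  # simple empirical percentile
    fps = batch_size * 1000.0 / mean
    return mean, p99, fps

mean, p99, fps = summarize_latency([45.2] * 100)
print(round(fps, 1))  # 22.1 — matches the batch-1 row above
```

Note that p99 (used in the Performance Targets table later) can be far above the mean when latency has a long tail, which is why both are tracked.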
### Step 2: Select Optimization Strategy
| Deployment Target | Optimization Path |
|---|---|
| NVIDIA GPU (cloud) | PyTorch → ONNX → TensorRT FP16 |
| NVIDIA GPU (edge) | PyTorch → TensorRT INT8 |
| Intel CPU | PyTorch → ONNX → OpenVINO |
| Apple Silicon | PyTorch → CoreML |
| Generic CPU | PyTorch → ONNX Runtime |
| Mobile | PyTorch → TFLite or ONNX Mobile |
### Step 3: Export to ONNX

```bash
# Export with dynamic batch size
python scripts/inference_optimizer.py model.pt \
    --export onnx \
    --input-size 640 640 \
    --dynamic-batch \
    --simplify \
    --output model.onnx

# Verify the ONNX model
python -c "import onnx; model = onnx.load('model.onnx'); onnx.checker.check_model(model); print('ONNX model valid')"
```
### Step 4: Apply Quantization (Optional)
For INT8 quantization with calibration:

```bash
# Quantize using a calibration dataset
python scripts/inference_optimizer.py model.onnx \
    --quantize int8 \
    --calibration-data data/calibration/ \
    --calibration-samples 500 \
    --output model_int8.onnx
```

Quantization impact analysis:

| Precision | Size | Speed | Accuracy Drop |
|-----------|------|-------|---------------|
| FP32 | 100% | 1x | 0% |
| FP16 | 50% | 1.5-2x | <0.5% |
| INT8 | 25% | 2-4x | 1-3% |
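The size column follows directly from bytes per weight: halving the bit width halves the weight storage. A quick sanity check using the 25.9M-parameter baseline from Step 1:

```python
def model_size_mb(num_params, bits_per_param):
    """Approximate weight storage, ignoring quantization scales and graph metadata."""
    return num_params * bits_per_param / 8 / 1e6

params = 25.9e6  # parameter count from the baseline benchmark above
for name, bits in [("FP32", 32), ("FP16", 16), ("INT8", 8)]:
    print(name, round(model_size_mb(params, bits), 1))
# FP32 103.6, FP16 51.8, INT8 25.9 — the 100% / 50% / 25% column above
```

Real exported files deviate slightly because of per-tensor scale factors and serialization overhead, but this estimate is usually within a few percent.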
### Step 5: Convert to Target Runtime
```bash
# TensorRT (NVIDIA GPU)
trtexec --onnx=model.onnx --saveEngine=model.engine --fp16

# OpenVINO (Intel)
mo --input_model model.onnx --output_dir openvino/

# CoreML (Apple)
python -c "import coremltools as ct; model = ct.convert('model.onnx'); model.save('model.mlpackage')"
```
### Step 6: Benchmark Optimized Model
```bash
python scripts/inference_optimizer.py model.engine \
    --benchmark \
    --runtime tensorrt \
    --compare model.pt
```

Expected speedup:

```
Optimization Results:
- Original (PyTorch FP32): 45.2ms
- Optimized (TensorRT FP16): 12.8ms
- Speedup: 3.5x
- Accuracy change: -0.3% mAP
```

## Workflow 3: Custom Dataset Preparation

Use this workflow when preparing a computer vision dataset for training.
### Step 1: Audit Raw Data

```bash
# Analyze the image dataset
python scripts/dataset_pipeline_builder.py data/raw/ \
    --analyze \
    --output analysis/
```

The analysis report includes:

```
Dataset Analysis:
- Total images: 5,234
- Image sizes: 640x480 to 4096x3072 (variable)
- Formats: JPEG (4,891), PNG (343)
- Corrupted: 12 files
- Duplicates: 45 pairs

Annotation Analysis:
- Format detected: Pascal VOC XML
- Total annotations: 28,456
- Classes: 5 (car, person, bicycle, dog, cat)
- Distribution: car (12,340), person (8,234), bicycle (3,456), dog (2,890), cat (1,536)
- Empty images: 234
```
### Step 2: Clean and Validate
```bash
# Remove corrupted and duplicate images
python scripts/dataset_pipeline_builder.py data/raw/ \
    --clean \
    --remove-corrupted \
    --remove-duplicates \
    --output data/cleaned/
```
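Exact duplicates (the "45 pairs" the audit reports) can be found by hashing file contents; near-duplicates require perceptual hashing instead. A minimal sketch of the exact-match case, with a hypothetical in-memory file list:

```python
import hashlib

def find_exact_duplicates(files):
    """Group byte-identical files by SHA-256; returns groups of duplicate paths."""
    by_hash = {}
    for path, data in files:  # (path, raw bytes) pairs; in practice, read from disk
        digest = hashlib.sha256(data).hexdigest()
        by_hash.setdefault(digest, []).append(path)
    return [paths for paths in by_hash.values() if len(paths) > 1]

files = [("a.jpg", b"\xff\xd8AAA"), ("b.jpg", b"\xff\xd8BBB"), ("c.jpg", b"\xff\xd8AAA")]
print(find_exact_duplicates(files))  # [['a.jpg', 'c.jpg']]
```

When duplicates span a train/test split they silently inflate evaluation metrics, which is why this check runs before splitting.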
### Step 3: Convert Annotation Format
```bash
# Convert VOC to COCO format
python scripts/dataset_pipeline_builder.py data/cleaned/ \
    --annotations data/annotations/ \
    --input-format voc \
    --output-format coco \
    --output data/coco/
```
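The heart of these conversions is the box-convention change: VOC stores corner coordinates (xmin, ymin, xmax, ymax), COCO stores top-left plus size in pixels, and YOLO stores a normalized center plus size. A minimal sketch:

```python
def voc_to_coco_bbox(xmin, ymin, xmax, ymax):
    """VOC corner box -> COCO [x, y, width, height] (pixels)."""
    return [xmin, ymin, xmax - xmin, ymax - ymin]

def coco_to_yolo_bbox(x, y, w, h, img_w, img_h):
    """COCO pixel box -> YOLO normalized [cx, cy, w, h]."""
    return [(x + w / 2) / img_w, (y + h / 2) / img_h, w / img_w, h / img_h]

print(voc_to_coco_bbox(48, 240, 178, 325))           # [48, 240, 130, 85]
print(coco_to_yolo_bbox(0, 0, 320, 240, 640, 480))   # [0.25, 0.25, 0.5, 0.5]
```

Getting this convention wrong is the most common dataset-conversion bug: boxes render shifted or scaled, and training quietly degrades.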
Supported format conversions:

| From | To |
|------|-----|
| Pascal VOC XML | COCO JSON |
| YOLO TXT | COCO JSON |
| COCO JSON | YOLO TXT |
| LabelMe JSON | COCO JSON |
| CVAT XML | COCO JSON |

### Step 4: Apply Augmentations
```bash
# Generate an augmentation config
python scripts/dataset_pipeline_builder.py data/coco/ \
    --augment \
    --aug-config configs/augmentation.yaml \
    --output data/augmented/
```

Recommended augmentations for detection:

```yaml
# configs/augmentation.yaml
augmentations:
  geometric:
    - horizontal_flip: { p: 0.5 }
    - vertical_flip: { p: 0.1 }   # Only if orientation invariant
    - rotate: { limit: 15, p: 0.3 }
    - scale: { scale_limit: 0.2, p: 0.5 }
  color:
    - brightness_contrast: { brightness_limit: 0.2, contrast_limit: 0.2, p: 0.5 }
    - hue_saturation: { hue_shift_limit: 20, sat_shift_limit: 30, p: 0.3 }
    - blur: { blur_limit: 3, p: 0.1 }
  advanced:
    - mosaic: { p: 0.5 }   # YOLO-style mosaic
    - mixup: { p: 0.1 }    # Image mixing
    - cutout: { num_holes: 8, max_h_size: 32, max_w_size: 32, p: 0.3 }
```
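For detection, geometric augmentations must transform the bounding boxes along with the pixels (libraries like albumentations do this when given bbox parameters). A minimal sketch of the box side of a horizontal flip, for a COCO-format box:

```python
def hflip_bbox(bbox, img_w):
    """Horizontally flip a COCO [x, y, w, h] box inside an image of width img_w."""
    x, y, w, h = bbox
    return [img_w - x - w, y, w, h]

# A box near the left edge moves to the right edge after flipping.
print(hflip_bbox([10, 20, 100, 50], img_w=640))  # [530, 20, 100, 50]
```

Color augmentations (brightness, hue, blur) leave boxes untouched, which is why they are safe to apply more aggressively.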
### Step 5: Create Train/Val/Test Splits
```bash
python scripts/dataset_pipeline_builder.py data/augmented/ \
    --split 0.8 0.1 0.1 \
    --stratify \
    --seed 42 \
    --output data/final/
```

Split strategy guidelines:

| Dataset Size | Train | Val | Test |
|---|---|---|---|
| <1,000 images | 70% | 15% | 15% |
| 1,000-10,000 | 80% | 10% | 10% |
| >10,000 | 90% | 5% | 5% |
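The `--stratify` flag above keeps the class distribution consistent across splits by splitting each class separately. A minimal sketch of that idea (a simplification: real detection datasets have multiple labels per image, so stratification is usually done on a per-image dominant or rarest class):

```python
import random

def stratified_split(items, ratios=(0.8, 0.1, 0.1), seed=42):
    """Split (id, label) pairs per class so each split keeps the label distribution."""
    rng = random.Random(seed)
    by_label = {}
    for item_id, label in items:
        by_label.setdefault(label, []).append(item_id)
    train, val, test = [], [], []
    for ids in by_label.values():
        rng.shuffle(ids)
        n_train = int(ratios[0] * len(ids))
        n_val = int(ratios[1] * len(ids))
        train += ids[:n_train]
        val += ids[n_train:n_train + n_val]
        test += ids[n_train + n_val:]
    return train, val, test

items = [(i, "car" if i % 2 else "person") for i in range(100)]
train, val, test = stratified_split(items)
print(len(train), len(val), len(test))  # 80 10 10
```

The fixed `seed` mirrors the `--seed 42` flag: splits must be reproducible so experiments remain comparable.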
### Step 6: Generate Dataset Configuration

```bash
# For Ultralytics YOLO
python scripts/dataset_pipeline_builder.py data/final/ \
    --generate-config yolo \
    --output data.yaml

# For Detectron2
python scripts/dataset_pipeline_builder.py data/final/ \
    --generate-config detectron2 \
    --output detectron2_config.py
```

## Architecture Selection Guide
### Object Detection Architectures
| Architecture | Speed | Accuracy | Best For |
|---|---|---|---|
| YOLOv8n | 1.2ms | 37.3 mAP | Edge, mobile, real-time |
| YOLOv8s | 2.1ms | 44.9 mAP | Balanced speed/accuracy |
| YOLOv8m | 4.2ms | 50.2 mAP | General purpose |
| YOLOv8l | 6.8ms | 52.9 mAP | High accuracy |
| YOLOv8x | 10.1ms | 53.9 mAP | Maximum accuracy |
| RT-DETR-L | 5.3ms | 53.0 mAP | Transformer, no NMS |
| Faster R-CNN R50 | 46ms | 40.2 mAP | Two-stage, high quality |
| DINO-4scale | 85ms | 49.0 mAP | SOTA transformer |
### Segmentation Architectures
| Architecture | Type | Speed | Best For |
|---|---|---|---|
| YOLOv8-seg | Instance | 4.5ms | Real-time instance seg |
| Mask R-CNN | Instance | 67ms | High-quality masks |
| SAM | Promptable | 50ms | Zero-shot segmentation |
| DeepLabV3+ | Semantic | 25ms | Scene parsing |
| SegFormer | Semantic | 15ms | Efficient semantic seg |
### CNN vs Vision Transformer Trade-offs
| Aspect | CNN (YOLO, R-CNN) | ViT (DETR, DINO) |
|---|---|---|
| Training data needed | 1K-10K images | 10K-100K+ images |
| Training time | Fast | Slow (needs more epochs) |
| Inference speed | Faster | Slower |
| Small objects | Good with FPN | Needs multi-scale |
| Global context | Limited | Excellent |
| Positional encoding | Implicit | Explicit |
## Reference Documentation

### 1. Computer Vision Architectures

See `references/computer_vision_architectures.md` for:
- CNN backbone architectures (ResNet, EfficientNet, ConvNeXt)
- Vision Transformer variants (ViT, DeiT, Swin)
- Detection heads (anchor-based vs. anchor-free)
- Feature Pyramid Networks (FPN, BiFPN, PANet)
- Neck architectures for multi-scale detection

### 2. Object Detection Optimization

See `references/object_detection_optimization.md` for:
- Non-Maximum Suppression variants (NMS, Soft-NMS, DIoU-NMS)
- Anchor optimization and anchor-free alternatives
- Loss function design (focal loss, GIoU, CIoU, DIoU)
- Training strategies (warmup, cosine annealing, EMA)
- Data augmentation for detection (mosaic, mixup, copy-paste)

### 3. Production Vision Systems

See `references/production_vision_systems.md` for:
- ONNX export and optimization
- TensorRT deployment pipeline
- Batch inference optimization
- Edge device deployment (Jetson, Intel NCS)
- Model serving with Triton
- Video processing pipelines
## Common Commands

### Ultralytics YOLO

```bash
# Training
yolo detect train data=coco.yaml model=yolov8m.pt epochs=100 imgsz=640

# Validation
yolo detect val model=best.pt data=coco.yaml

# Inference
yolo detect predict model=best.pt source=images/ save=True

# Export
yolo export model=best.pt format=onnx simplify=True dynamic=True
```
### Detectron2
```bash
# Training
python train_net.py --config-file configs/COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml \
    --num-gpus 1 OUTPUT_DIR ./output

# Evaluation
python train_net.py --config-file configs/faster_rcnn.yaml --eval-only \
    MODEL.WEIGHTS output/model_final.pth

# Inference
python demo.py --config-file configs/faster_rcnn.yaml \
    --input images/*.jpg --output results/ \
    --opts MODEL.WEIGHTS output/model_final.pth
```
### MMDetection
```bash
# Training
python tools/train.py configs/faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py

# Testing
python tools/test.py configs/faster_rcnn.py checkpoints/latest.pth --eval bbox

# Inference
python demo/image_demo.py demo.jpg configs/faster_rcnn.py checkpoints/latest.pth
```
### Model Optimization
```bash
# ONNX export and simplification
python -c "import torch; model = torch.load('model.pt'); torch.onnx.export(model, torch.randn(1,3,640,640), 'model.onnx', opset_version=17)"
python -m onnxsim model.onnx model_sim.onnx

# TensorRT conversion
trtexec --onnx=model.onnx --saveEngine=model.engine --fp16 --workspace=4096

# Benchmark
trtexec --loadEngine=model.engine --batch=1 --iterations=1000 --avgRuns=100
```
## Performance Targets
| Metric | Real-time | High Accuracy | Edge |
|---|---|---|---|
| FPS | >30 | >10 | >15 |
| mAP@50 | >0.6 | >0.8 | >0.5 |
| Latency P99 | <50ms | <150ms | <100ms |
| GPU Memory | <4GB | <8GB | <2GB |
| Model Size | <50MB | <200MB | <20MB |
## Resources

- Architecture Guide: `references/computer_vision_architectures.md`
- Optimization Guide: `references/object_detection_optimization.md`
- Deployment Guide: `references/production_vision_systems.md`
- Scripts: automation tools in the `scripts/` directory