drone-cv-expert

Drone CV Expert

Expert in robotics, drone systems, and computer vision for autonomous aerial platforms.

Decision Tree: When to Use This Skill

User mentions drones or UAVs?
├─ YES → Is it about inspection/detection of specific things (fire, roof damage, thermal)?
│        ├─ YES → Use drone-inspection-specialist
│        └─ NO → Is it about flight control, navigation, or general CV?
│                ├─ YES → Use THIS SKILL (drone-cv-expert)
│                └─ NO → Is it about GPU rendering/shaders?
│                        ├─ YES → Use metal-shader-expert
│                        └─ NO → Use THIS SKILL as default drone skill
└─ NO → Is it general object detection without drone context?
        ├─ YES → Use clip-aware-embeddings or other CV skill
        └─ NO → Probably not a drone question

Core Competencies

Flight Control & Navigation

  • PID Tuning: Position, velocity, attitude control loops
  • SLAM: ORB-SLAM, LSD-SLAM, visual-inertial odometry (VIO)
  • Path Planning: A*, RRT, RRT*, Dijkstra, potential fields
  • Sensor Fusion: EKF, UKF, complementary filters
  • GPS-Denied Navigation: AprilTags, visual odometry, LiDAR SLAM
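
The cascaded position/velocity/attitude loops above all reduce to the same primitive. A minimal sketch, assuming illustrative gains and a toy 1D double-integrator plant (not a tuned flight controller):

```python
# Minimal PID sketch: illustrative gains and a toy 1D double-integrator
# plant (command = vertical acceleration). Not a tuned flight controller.
class PID:
    def __init__(self, kp, ki, kd, out_limit=10.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.out_limit = out_limit
        self.integral = 0.0
        self.prev_error = None

    def update(self, setpoint, measurement, dt):
        error = setpoint - measurement
        self.integral += error * dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        out = self.kp * error + self.ki * self.integral + self.kd * derivative
        return max(-self.out_limit, min(self.out_limit, out))  # clamp actuator authority

pid = PID(kp=2.0, ki=0.1, kd=1.5)
pos, vel, dt = 0.0, 0.0, 0.02         # 50 Hz control loop
for _ in range(1500):                 # 30 s of simulated flight
    accel = pid.update(5.0, pos, dt)  # hold 5 m altitude
    vel += accel * dt
    pos += vel * dt
```

In a real stack the position loop's output becomes the velocity loop's setpoint, and so on down to attitude rate, with each inner loop running faster than the one above it.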

Computer Vision

  • Object Detection: YOLO (v5/v8/v10), EfficientDet, SSD
  • Tracking: ByteTrack, DeepSORT, SORT, optical flow
  • Edge Deployment: TensorRT, ONNX, OpenVINO optimization
  • 3D Vision: Stereo depth, point clouds, structure-from-motion

Hardware Integration

  • Flight Controllers: Pixhawk, Ardupilot, PX4, DJI
  • Protocols: MAVLink, DroneKit, MAVSDK
  • Edge Compute: Jetson (Nano/Xavier/Orin), Coral TPU
  • Sensors: IMU, GPS, barometer, LiDAR, depth cameras

Anti-Patterns to Avoid

1. "Simulation-Only Syndrome"

Wrong: Testing only in Gazebo/AirSim, then deploying directly to a real drone.
Right: Simulation → Bench test → Tethered flight → Controlled environment → Field.

2. "EKF Overkill"

Wrong: Using an Extended Kalman Filter when a complementary filter suffices.
Right: Match filter complexity to requirements:
  • Complementary filter: Basic stabilization, attitude only
  • EKF: Multi-sensor fusion, GPS+IMU+baro
  • UKF: Highly nonlinear systems, aggressive maneuvers
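
For the attitude-only case, a complementary filter is only a few lines. The sketch below uses synthetic data with a biased gyro to show why the accelerometer term matters; `alpha = 0.98` is an assumed, typical tuning value:

```python
# Complementary filter: high-pass the gyro (good short-term), low-pass the
# accelerometer tilt (good long-term). alpha = 0.98 is an assumed tuning value.
def complementary_filter(gyro_rates, accel_angles, dt, alpha=0.98):
    angle = accel_angles[0]                      # initialize from accelerometer
    for gyro, accel in zip(gyro_rates, accel_angles):
        angle = alpha * (angle + gyro * dt) + (1 - alpha) * accel
    return angle

# Synthetic data: true roll fixed at 10 deg, gyro has a +0.5 deg/s bias,
# accelerometer is noiseless here for clarity.
dt = 0.01
gyro = [0.5] * 2000    # biased rate measurements (deg/s) over 20 s
accel = [10.0] * 2000  # tilt angle recovered from the gravity vector (deg)
est = complementary_filter(gyro, accel, dt)
```

The estimate stays within a fraction of a degree of the true 10°, even though pure gyro integration would have drifted by 0.5°/s without bound over the same 20 s.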

3. "Max Resolution Assumption"

Wrong: Processing 4K frames at 30 fps and expecting real-time performance.
Right: Resolution trade-offs by altitude/speed:

| Altitude | Speed  | Resolution | FPS | Rationale      |
|----------|--------|------------|-----|----------------|
| <30m     | Slow   | 1920x1080  | 30  | Detail needed  |
| 30-100m  | Medium | 1280x720   | 30  | Balance        |
| >100m    | Fast   | 640x480    | 60  | Speed priority |
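
The trade-off table can be encoded as a simple capture-profile lookup. The function name and the speed labels are illustrative; the thresholds and profiles come straight from the table:

```python
# Encode the altitude/speed table as a capture-profile lookup.
# Thresholds mirror the table; names and speed labels are illustrative.
def select_capture_profile(altitude_m, speed):
    """Return (width, height, fps) for the given flight regime."""
    if altitude_m < 30 and speed == "slow":
        return (1920, 1080, 30)  # low and slow: detail needed
    if altitude_m <= 100 and speed == "medium":
        return (1280, 720, 30)   # mid-range: balance
    return (640, 480, 60)        # high or fast: prioritize loop rate
```

Anything outside the first two regimes falls through to the low-resolution, high-FPS profile, erring on the side of loop rate.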

4. "Single-Thread Processing"

Wrong: Sequential detect → track → control in one loop.
Right: Pipeline parallelism:

Thread 1: Camera capture (async)
Thread 2: Object detection (GPU)
Thread 3: Tracking + state estimation
Thread 4: Control commands
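
A minimal sketch of that pipeline using Python threads and bounded queues, with trivial stand-ins for the detector and tracker (YOLO and ByteTrack would slot into `detect` and `track`):

```python
import queue
import threading

# Pipeline-parallel sketch: each stage runs in its own thread and hands
# frames downstream through bounded queues, so a slow detector never
# blocks camera capture for long. Stage bodies are trivial stand-ins.
def detect(frame):
    return {"frame": frame, "boxes": [(0, 0, 10, 10)]}  # stand-in for YOLO

def track(det):
    return {**det, "track_id": 1}                       # stand-in for ByteTrack

def run_stage(fn, inbox, outbox):
    while True:
        item = inbox.get()
        if item is None:             # poison pill: shut down and propagate
            outbox.put(None)
            return
        outbox.put(fn(item))

frames = queue.Queue(maxsize=2)      # capture -> detection (bounded: backpressure)
detections = queue.Queue(maxsize=2)  # detection -> tracking
results = queue.Queue()

threads = [
    threading.Thread(target=run_stage, args=(detect, frames, detections)),
    threading.Thread(target=run_stage, args=(track, detections, results)),
]
for t in threads:
    t.start()
for i in range(5):                   # the "camera" produces 5 frames
    frames.put(i)
frames.put(None)
for t in threads:
    t.join()
out = [results.get() for _ in range(5)]
```

The bounded queues (`maxsize=2`) provide backpressure: if detection falls behind, capture blocks briefly instead of buffering stale frames without limit; a real system would usually drop old frames instead.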

5. "GPS Trust"

Wrong: Assuming GPS is always accurate and available.
Right: Multi-source position estimation:
  • GPS: 2-5 m accuracy outdoors, unavailable indoors
  • Visual odometry: 0.1-1% drift, lighting dependent
  • AprilTags: cm-level accuracy where deployed
  • IMU: Short-term only, drift accumulates
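
The fusion idea can be illustrated with a variance-weighted average of two sources, which is exactly the scalar Kalman update (a full EKF generalizes it to state vectors and a motion model). The sigmas below are example values within the ranges listed above:

```python
# Variance-weighted fusion of two independent estimates of the same
# quantity -- the scalar Kalman update. A full EKF generalizes this to
# state vectors plus a motion model.
def fuse(est_a, var_a, est_b, var_b):
    w = var_b / (var_a + var_b)  # weight the lower-variance source more
    fused = w * est_a + (1 - w) * est_b
    fused_var = (var_a * var_b) / (var_a + var_b)
    return fused, fused_var

# Example: GPS says 100.0 m north (sigma ~3 m); visual odometry says
# 98.5 m (sigma ~0.5 m). The fused estimate sits close to VO.
pos, var = fuse(100.0, 3.0**2, 98.5, 0.5**2)
```

Note the fused variance is always smaller than either input variance: even a noisy second source helps.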

6. "One Model Fits All"

Wrong: Using the same YOLO model for every scenario.
Right: Model selection by constraint:

| Constraint       | Model              | Notes                  |
|------------------|--------------------|------------------------|
| Latency critical | YOLOv8n            | 6ms inference          |
| Balanced         | YOLOv8s            | 15ms, better accuracy  |
| Accuracy first   | YOLOv8x            | 50ms, highest mAP      |
| Edge device      | YOLOv8n + TensorRT | 3ms on Jetson          |
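
One way to encode the table is a latency-budget lookup that returns the most accurate model that still fits, assuming (per the table) that larger variants are more accurate. The latencies are the table's indicative figures; benchmark on your own hardware:

```python
# Encode the constraint table as a latency-budget lookup. Latencies are
# the table's indicative numbers; benchmark on your own hardware.
MODEL_LATENCY_MS = {"yolov8n": 6, "yolov8s": 15, "yolov8x": 50}

def pick_model(latency_budget_ms):
    """Most accurate model that fits the budget (larger variants are
    assumed more accurate, per the table)."""
    fitting = [(ms, name) for name, ms in MODEL_LATENCY_MS.items()
               if ms <= latency_budget_ms]
    if not fitting:
        raise ValueError("no model fits; lower resolution or use TensorRT")
    return max(fitting)[1]  # slowest model that fits = most accurate
```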

Problem-Solving Framework

1. Constraint Analysis

  • Compute: What hardware? (Jetson Nano = ~5 TOPS, Xavier = 32 TOPS)
  • Power: Battery capacity? Flight time impact?
  • Latency: Control loop rate? Detection response time?
  • Weight: Payload capacity? Center of gravity?
  • Environment: Indoor/outdoor? GPS available? Lighting conditions?

2. Algorithm Selection Matrix

| Problem          | Classical Approach | Deep Learning | When to Use Each |
|------------------|--------------------|---------------|------------------|
| Feature tracking | KLT optical flow   | FlowNet       | Classical: real-time, limited compute. DL: robust, more compute |
| Object detection | HOG+SVM            | YOLO/SSD      | Classical: simple objects, no GPU. DL: complex, GPU available |
| SLAM             | ORB-SLAM           | DROID-SLAM    | Classical: mature, debuggable. DL: better in challenging scenes |
| Path planning    | A*, RRT            | RL-based      | Classical: known environments. DL: complex, dynamic |
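
As a concrete instance of the classical column, here is A* on a 4-connected occupancy grid — the "known environments" case from the path-planning row:

```python
import heapq

# Classical A* on a 4-connected occupancy grid. grid[r][c] == 1 is an
# obstacle; moves cost 1, so the Manhattan heuristic is admissible and
# the first goal pop yields a shortest path.
def astar(grid, start, goal):
    rows, cols = len(grid), len(grid[0])
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])  # Manhattan
    open_set = [(h(start), 0, start, [start])]
    best_g = {start: 0}
    while open_set:
        _, g, node, path = heapq.heappop(open_set)
        if node == goal:
            return path
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = node[0] + dr, node[1] + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                ng = g + 1
                if ng < best_g.get((nr, nc), float("inf")):
                    best_g[(nr, nc)] = ng
                    heapq.heappush(
                        open_set,
                        (ng + h((nr, nc)), ng, (nr, nc), path + [(nr, nc)]))
    return None  # goal unreachable

grid = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]
path = astar(grid, (0, 0), (2, 0))  # must route around the wall via column 2
```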

3. Safety Checklist

  • Kill switch tested and accessible
  • Geofence configured
  • Return-to-home altitude set
  • Low battery action defined
  • Signal loss action defined
  • Propeller guards (if applicable)
  • Pre-flight sensor calibration
  • Weather conditions checked
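
The software-checkable items on this list can be enforced as an arming gate. The check names below are illustrative, not PX4/ArduPilot parameter names:

```python
# Illustrative preflight gate: every safety item must be explicitly
# checked off before arming is allowed. Names are ours, not autopilot
# parameter names.
PREFLIGHT = {
    "kill_switch_tested": True,
    "geofence_configured": True,
    "rth_altitude_set": True,
    "low_battery_action_defined": True,
    "signal_loss_action_defined": True,
    "sensors_calibrated": True,
    "weather_checked": True,
}

def arm_allowed(checks):
    """Return (ok, missing): ok only if every check passed."""
    missing = [name for name, ok in checks.items() if not ok]
    return (len(missing) == 0, missing)
```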

Quick Reference Tables

MAVLink Message Types

| Message             | Purpose          | Frequency |
|---------------------|------------------|-----------|
| HEARTBEAT           | Connection alive | 1 Hz      |
| ATTITUDE            | Roll/pitch/yaw   | 10-100 Hz |
| LOCAL_POSITION_NED  | Position         | 10-50 Hz  |
| GPS_RAW_INT         | Raw GPS          | 1-10 Hz   |
| SET_POSITION_TARGET | Commands         | As needed |
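
Rates like these are typically requested with `MAV_CMD_SET_MESSAGE_INTERVAL`, which takes the interval between messages in microseconds. The helper below converts the table's rates; the specific Hz values are example picks within the table's ranges, and the pymavlink call is left as a comment so the sketch runs offline:

```python
# Example target rates, chosen from within the table's ranges, expressed
# for MAV_CMD_SET_MESSAGE_INTERVAL (which takes microseconds between
# messages).
STREAM_RATES_HZ = {
    "ATTITUDE": 50,
    "LOCAL_POSITION_NED": 30,
    "GPS_RAW_INT": 5,
}

def interval_us(rate_hz):
    """Desired message rate -> microsecond interval for
    MAV_CMD_SET_MESSAGE_INTERVAL."""
    return int(1_000_000 / rate_hz)

# With pymavlink, each rate would be requested roughly like:
#   master.mav.command_long_send(
#       master.target_system, master.target_component,
#       mavutil.mavlink.MAV_CMD_SET_MESSAGE_INTERVAL, 0,
#       mavutil.mavlink.MAVLINK_MSG_ID_ATTITUDE, interval_us(50),
#       0, 0, 0, 0, 0)
```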

Kalman Filter Tuning

| Matrix                 | High Values             | Low Values              |
|------------------------|-------------------------|-------------------------|
| Q (process noise)      | Trust measurements more | Trust model more        |
| R (measurement noise)  | Trust model more        | Trust measurements more |
| P (initial covariance) | Uncertain initial state | Confident initial state |
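
A 1D Kalman filter makes the Q/R table concrete: the same measurements, filtered twice with opposite trust settings:

```python
# 1D constant-value Kalman filter showing how Q and R steer trust.
def kalman_1d(measurements, q, r, x0=0.0, p0=1.0):
    x, p = x0, p0
    for z in measurements:
        p = p + q              # predict: process noise grows uncertainty
        k = p / (p + r)        # gain: high R -> small K -> trust the model
        x = x + k * (z - x)    # update toward the measurement
        p = (1 - k) * p
    return x

zs = [10.0] * 50                          # steady readings of 10 m
smooth = kalman_1d(zs, q=1e-4, r=1.0)     # low Q / high R: leans on the model
fast = kalman_1d(zs, q=1.0, r=1e-2)       # high Q / low R: leans on measurements
```

With low Q / high R the estimate converges toward the measurements slowly; with high Q / low R it snaps to them almost immediately.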

Common Coordinate Frames

| Frame  | Origin        | Axes               | Use          |
|--------|---------------|--------------------|--------------|
| NED    | Takeoff point | North-East-Down    | Navigation   |
| ENU    | Takeoff point | East-North-Up      | ROS standard |
| Body   | Drone CG      | Forward-Right-Down | Control      |
| Camera | Lens center   | Right-Down-Forward | Vision       |
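
Two conversions come up constantly: NED↔ENU (swap north/east, negate the vertical) and body-to-NED. The sketch below handles only the yaw (level-flight) case for body-to-NED; full attitude needs the complete rotation matrix or a quaternion:

```python
import math

# NED <-> ENU: swap the horizontal axes and negate the vertical.
def ned_to_enu(ned):
    n, e, d = ned
    return (e, n, -d)  # the same swap also converts ENU -> NED

# Body (Forward-Right-Down) -> NED, yaw only: a level-flight
# simplification; full attitude requires the complete rotation
# matrix or a quaternion.
def body_to_ned(forward, right, down, yaw_rad):
    n = forward * math.cos(yaw_rad) - right * math.sin(yaw_rad)
    e = forward * math.sin(yaw_rad) + right * math.cos(yaw_rad)
    return (n, e, down)
```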

Reference Files

Detailed implementations in references/:
  • navigation-algorithms.md - SLAM, path planning, localization
  • sensor-fusion-ekf.md - Kalman filters, multi-sensor fusion
  • object-detection-tracking.md - YOLO, ByteTrack, optical flow

Simulation Tools

| Tool            | Strengths                  | Weaknesses          | Best For          |
|-----------------|----------------------------|---------------------|-------------------|
| Gazebo          | ROS integration, physics   | Graphics quality    | ROS development   |
| AirSim          | Photorealistic, CV-focused | Windows-centric     | Vision algorithms |
| Webots          | Multi-robot, accessible    | Less drone-specific | Swarm simulations |
| MATLAB/Simulink | Control design             | Not real-time       | Controller tuning |

Emerging Technologies (2024-2025)

  • Event cameras: 1μs temporal resolution, no motion blur
  • Neuromorphic computing: Loihi 2 for ultra-low-power inference
  • 4D Radar: Velocity + 3D position, works in all weather
  • Swarm autonomy: Decentralized coordination, emergent behavior
  • Foundation models: SAM, CLIP for zero-shot detection

Integration Points

  • drone-inspection-specialist: Domain-specific detection (fire, damage, thermal)
  • metal-shader-expert: GPU-accelerated vision processing, custom shaders
  • collage-layout-expert: Report generation, visual composition

Key Principle: In drone systems, reliability trumps performance. A 95% accurate system that never crashes is better than a 99% accurate one that fails unpredictably. Always have fallbacks.