drone-cv-expert

Drone CV Expert

Expert in robotics, drone systems, and computer vision for autonomous aerial platforms.

Decision Tree: When to Use This Skill

User mentions drones or UAVs?
├─ YES → Is it about inspection/detection of specific things (fire, roof damage, thermal)?
│        ├─ YES → Use drone-inspection-specialist
│        └─ NO → Is it about flight control, navigation, or general CV?
│                ├─ YES → Use THIS SKILL (drone-cv-expert)
│                └─ NO → Is it about GPU rendering/shaders?
│                        ├─ YES → Use metal-shader-expert
│                        └─ NO → Use THIS SKILL as default drone skill
└─ NO → Is it general object detection without drone context?
        ├─ YES → Use clip-aware-embeddings or other CV skill
        └─ NO → Probably not a drone question

Core Competencies

Flight Control & Navigation

  • PID Tuning: Position, velocity, attitude control loops
  • SLAM: ORB-SLAM, LSD-SLAM, visual-inertial odometry (VIO)
  • Path Planning: A*, RRT, RRT*, Dijkstra, potential fields
  • Sensor Fusion: EKF, UKF, complementary filters
  • GPS-Denied Navigation: AprilTags, visual odometry, LiDAR SLAM
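
The cascaded position/velocity/attitude loops above all reduce to the same primitive. A minimal sketch, assuming illustrative gains and a toy 1D double-integrator plant (not a tuned flight controller):

```python
# Minimal PID sketch: illustrative gains and a toy 1D double-integrator
# plant (command = vertical acceleration). Not a tuned flight controller.
class PID:
    def __init__(self, kp, ki, kd, out_limit=10.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.out_limit = out_limit
        self.integral = 0.0
        self.prev_error = None

    def update(self, setpoint, measurement, dt):
        error = setpoint - measurement
        self.integral += error * dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        out = self.kp * error + self.ki * self.integral + self.kd * derivative
        return max(-self.out_limit, min(self.out_limit, out))  # clamp actuator authority

pid = PID(kp=2.0, ki=0.1, kd=1.5)
pos, vel, dt = 0.0, 0.0, 0.02         # 50 Hz control loop
for _ in range(1500):                 # 30 s of simulated flight
    accel = pid.update(5.0, pos, dt)  # hold 5 m altitude
    vel += accel * dt
    pos += vel * dt
```

In a real stack the position loop's output becomes the velocity loop's setpoint, and so on down to attitude rate, with each inner loop running faster than the one above it.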

Computer Vision

  • Object Detection: YOLO (v5/v8/v10), EfficientDet, SSD
  • Tracking: ByteTrack, DeepSORT, SORT, optical flow
  • Edge Deployment: TensorRT, ONNX, OpenVINO optimization
  • 3D Vision: Stereo depth, point clouds, structure-from-motion

Hardware Integration

  • Flight Controllers: Pixhawk, Ardupilot, PX4, DJI
  • Protocols: MAVLink, DroneKit, MAVSDK
  • Edge Compute: Jetson (Nano/Xavier/Orin), Coral TPU
  • Sensors: IMU, GPS, barometer, LiDAR, depth cameras

Anti-Patterns to Avoid

1. "Simulation-Only Syndrome"

Wrong: Testing only in Gazebo/AirSim, then deploying directly to a real drone.
Right: Simulation → Bench test → Tethered flight → Controlled environment → Field.

2. "EKF Overkill"

Wrong: Using an Extended Kalman Filter when a complementary filter suffices.
Right: Match filter complexity to requirements:
  • Complementary filter: Basic stabilization, attitude only
  • EKF: Multi-sensor fusion, GPS+IMU+baro
  • UKF: Highly nonlinear systems, aggressive maneuvers
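
For the attitude-only case, a complementary filter is only a few lines. The sketch below uses synthetic data with a biased gyro to show why the accelerometer term matters; `alpha = 0.98` is an assumed, typical tuning value:

```python
# Complementary filter: high-pass the gyro (good short-term), low-pass the
# accelerometer tilt (good long-term). alpha = 0.98 is an assumed tuning value.
def complementary_filter(gyro_rates, accel_angles, dt, alpha=0.98):
    angle = accel_angles[0]                      # initialize from accelerometer
    for gyro, accel in zip(gyro_rates, accel_angles):
        angle = alpha * (angle + gyro * dt) + (1 - alpha) * accel
    return angle

# Synthetic data: true roll fixed at 10 deg, gyro has a +0.5 deg/s bias,
# accelerometer is noiseless here for clarity.
dt = 0.01
gyro = [0.5] * 2000    # biased rate measurements (deg/s) over 20 s
accel = [10.0] * 2000  # tilt angle recovered from the gravity vector (deg)
est = complementary_filter(gyro, accel, dt)
```

The estimate stays within a fraction of a degree of the true 10°, even though pure gyro integration would have drifted by 0.5°/s without bound over the same 20 s.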

3. "Max Resolution Assumption"

Wrong: Processing 4K frames at 30 fps and expecting real-time performance.
Right: Resolution trade-offs by altitude/speed:

| Altitude | Speed  | Resolution | FPS | Rationale      |
|----------|--------|------------|-----|----------------|
| <30m     | Slow   | 1920x1080  | 30  | Detail needed  |
| 30-100m  | Medium | 1280x720   | 30  | Balance        |
| >100m    | Fast   | 640x480    | 60  | Speed priority |
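
The trade-off table can be encoded as a simple capture-profile lookup. The function name and the speed labels are illustrative; the thresholds and profiles come straight from the table:

```python
# Encode the altitude/speed table as a capture-profile lookup.
# Thresholds mirror the table; names and speed labels are illustrative.
def select_capture_profile(altitude_m, speed):
    """Return (width, height, fps) for the given flight regime."""
    if altitude_m < 30 and speed == "slow":
        return (1920, 1080, 30)  # low and slow: detail needed
    if altitude_m <= 100 and speed == "medium":
        return (1280, 720, 30)   # mid-range: balance
    return (640, 480, 60)        # high or fast: prioritize loop rate
```

Anything outside the first two regimes falls through to the low-resolution, high-FPS profile, erring on the side of loop rate.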

4. "Single-Thread Processing"

Wrong: Sequential detect → track → control in one loop.
Right: Pipeline parallelism:

Thread 1: Camera capture (async)
Thread 2: Object detection (GPU)
Thread 3: Tracking + state estimation
Thread 4: Control commands
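
A minimal sketch of that pipeline using Python threads and bounded queues, with trivial stand-ins for the detector and tracker (YOLO and ByteTrack would slot into `detect` and `track`):

```python
import queue
import threading

# Pipeline-parallel sketch: each stage runs in its own thread and hands
# frames downstream through bounded queues, so a slow detector never
# blocks camera capture for long. Stage bodies are trivial stand-ins.
def detect(frame):
    return {"frame": frame, "boxes": [(0, 0, 10, 10)]}  # stand-in for YOLO

def track(det):
    return {**det, "track_id": 1}                       # stand-in for ByteTrack

def run_stage(fn, inbox, outbox):
    while True:
        item = inbox.get()
        if item is None:             # poison pill: shut down and propagate
            outbox.put(None)
            return
        outbox.put(fn(item))

frames = queue.Queue(maxsize=2)      # capture -> detection (bounded: backpressure)
detections = queue.Queue(maxsize=2)  # detection -> tracking
results = queue.Queue()

threads = [
    threading.Thread(target=run_stage, args=(detect, frames, detections)),
    threading.Thread(target=run_stage, args=(track, detections, results)),
]
for t in threads:
    t.start()
for i in range(5):                   # the "camera" produces 5 frames
    frames.put(i)
frames.put(None)
for t in threads:
    t.join()
out = [results.get() for _ in range(5)]
```

The bounded queues (`maxsize=2`) provide backpressure: if detection falls behind, capture blocks briefly instead of buffering stale frames without limit; a real system would usually drop old frames instead.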

5. "GPS Trust"

Wrong: Assuming GPS is always accurate and available.
Right: Multi-source position estimation:
  • GPS: 2-5 m accuracy outdoors, unavailable indoors
  • Visual odometry: 0.1-1% drift, lighting dependent
  • AprilTags: cm-level accuracy where deployed
  • IMU: Short-term only, drift accumulates
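
The fusion idea can be illustrated with a variance-weighted average of two sources, which is exactly the scalar Kalman update (a full EKF generalizes it to state vectors and a motion model). The sigmas below are example values within the ranges listed above:

```python
# Variance-weighted fusion of two independent estimates of the same
# quantity -- the scalar Kalman update. A full EKF generalizes this to
# state vectors plus a motion model.
def fuse(est_a, var_a, est_b, var_b):
    w = var_b / (var_a + var_b)  # weight the lower-variance source more
    fused = w * est_a + (1 - w) * est_b
    fused_var = (var_a * var_b) / (var_a + var_b)
    return fused, fused_var

# Example: GPS says 100.0 m north (sigma ~3 m); visual odometry says
# 98.5 m (sigma ~0.5 m). The fused estimate sits close to VO.
pos, var = fuse(100.0, 3.0**2, 98.5, 0.5**2)
```

Note the fused variance is always smaller than either input variance: even a noisy second source helps.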

6. "One Model Fits All"

Wrong: Using the same YOLO model for every scenario.
Right: Model selection by constraint:

| Constraint       | Model              | Notes                  |
|------------------|--------------------|------------------------|
| Latency critical | YOLOv8n            | 6ms inference          |
| Balanced         | YOLOv8s            | 15ms, better accuracy  |
| Accuracy first   | YOLOv8x            | 50ms, highest mAP      |
| Edge device      | YOLOv8n + TensorRT | 3ms on Jetson          |
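
One way to encode the table is a latency-budget lookup that returns the most accurate model that still fits, assuming (per the table) that larger variants are more accurate. The latencies are the table's indicative figures; benchmark on your own hardware:

```python
# Encode the constraint table as a latency-budget lookup. Latencies are
# the table's indicative numbers; benchmark on your own hardware.
MODEL_LATENCY_MS = {"yolov8n": 6, "yolov8s": 15, "yolov8x": 50}

def pick_model(latency_budget_ms):
    """Most accurate model that fits the budget (larger variants are
    assumed more accurate, per the table)."""
    fitting = [(ms, name) for name, ms in MODEL_LATENCY_MS.items()
               if ms <= latency_budget_ms]
    if not fitting:
        raise ValueError("no model fits; lower resolution or use TensorRT")
    return max(fitting)[1]  # slowest model that fits = most accurate
```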

Problem-Solving Framework

1. Constraint Analysis

  • Compute: What hardware? (Jetson Nano = ~5 TOPS, Xavier = 32 TOPS)
  • Power: Battery capacity? Flight time impact?
  • Latency: Control loop rate? Detection response time?
  • Weight: Payload capacity? Center of gravity?
  • Environment: Indoor/outdoor? GPS available? Lighting conditions?

2. Algorithm Selection Matrix

| Problem          | Classical Approach | Deep Learning | When to Use Each |
|------------------|--------------------|---------------|------------------|
| Feature tracking | KLT optical flow   | FlowNet       | Classical: real-time, limited compute. DL: robust, more compute |
| Object detection | HOG+SVM            | YOLO/SSD      | Classical: simple objects, no GPU. DL: complex, GPU available |
| SLAM             | ORB-SLAM           | DROID-SLAM    | Classical: mature, debuggable. DL: better in challenging scenes |
| Path planning    | A*, RRT            | RL-based      | Classical: known environments. DL: complex, dynamic |
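
As a concrete instance of the classical column, here is A* on a 4-connected occupancy grid — the "known environments" case from the path-planning row:

```python
import heapq

# Classical A* on a 4-connected occupancy grid. grid[r][c] == 1 is an
# obstacle; moves cost 1, so the Manhattan heuristic is admissible and
# the first goal pop yields a shortest path.
def astar(grid, start, goal):
    rows, cols = len(grid), len(grid[0])
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])  # Manhattan
    open_set = [(h(start), 0, start, [start])]
    best_g = {start: 0}
    while open_set:
        _, g, node, path = heapq.heappop(open_set)
        if node == goal:
            return path
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = node[0] + dr, node[1] + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                ng = g + 1
                if ng < best_g.get((nr, nc), float("inf")):
                    best_g[(nr, nc)] = ng
                    heapq.heappush(
                        open_set,
                        (ng + h((nr, nc)), ng, (nr, nc), path + [(nr, nc)]))
    return None  # goal unreachable

grid = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]
path = astar(grid, (0, 0), (2, 0))  # must route around the wall via column 2
```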

3. Safety Checklist

  • Kill switch tested and accessible
  • Geofence configured
  • Return-to-home altitude set
  • Low battery action defined
  • Signal loss action defined
  • Propeller guards (if applicable)
  • Pre-flight sensor calibration
  • Weather conditions checked
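
The software-checkable items on this list can be enforced as an arming gate. The check names below are illustrative, not PX4/ArduPilot parameter names:

```python
# Illustrative preflight gate: every safety item must be explicitly
# checked off before arming is allowed. Names are ours, not autopilot
# parameter names.
PREFLIGHT = {
    "kill_switch_tested": True,
    "geofence_configured": True,
    "rth_altitude_set": True,
    "low_battery_action_defined": True,
    "signal_loss_action_defined": True,
    "sensors_calibrated": True,
    "weather_checked": True,
}

def arm_allowed(checks):
    """Return (ok, missing): ok only if every check passed."""
    missing = [name for name, ok in checks.items() if not ok]
    return (len(missing) == 0, missing)
```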

Quick Reference Tables

MAVLink Message Types

| Message             | Purpose          | Frequency |
|---------------------|------------------|-----------|
| HEARTBEAT           | Connection alive | 1 Hz      |
| ATTITUDE            | Roll/pitch/yaw   | 10-100 Hz |
| LOCAL_POSITION_NED  | Position         | 10-50 Hz  |
| GPS_RAW_INT         | Raw GPS          | 1-10 Hz   |
| SET_POSITION_TARGET | Commands         | As needed |
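
Rates like these are typically requested with `MAV_CMD_SET_MESSAGE_INTERVAL`, which takes the interval between messages in microseconds. The helper below converts the table's rates; the specific Hz values are example picks within the table's ranges, and the pymavlink call is left as a comment so the sketch runs offline:

```python
# Example target rates, chosen from within the table's ranges, expressed
# for MAV_CMD_SET_MESSAGE_INTERVAL (which takes microseconds between
# messages).
STREAM_RATES_HZ = {
    "ATTITUDE": 50,
    "LOCAL_POSITION_NED": 30,
    "GPS_RAW_INT": 5,
}

def interval_us(rate_hz):
    """Desired message rate -> microsecond interval for
    MAV_CMD_SET_MESSAGE_INTERVAL."""
    return int(1_000_000 / rate_hz)

# With pymavlink, each rate would be requested roughly like:
#   master.mav.command_long_send(
#       master.target_system, master.target_component,
#       mavutil.mavlink.MAV_CMD_SET_MESSAGE_INTERVAL, 0,
#       mavutil.mavlink.MAVLINK_MSG_ID_ATTITUDE, interval_us(50),
#       0, 0, 0, 0, 0)
```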

Kalman Filter Tuning

| Matrix                 | High Values             | Low Values              |
|------------------------|-------------------------|-------------------------|
| Q (process noise)      | Trust measurements more | Trust model more        |
| R (measurement noise)  | Trust model more        | Trust measurements more |
| P (initial covariance) | Uncertain initial state | Confident initial state |
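
A 1D Kalman filter makes the Q/R table concrete: the same measurements, filtered twice with opposite trust settings:

```python
# 1D constant-value Kalman filter showing how Q and R steer trust.
def kalman_1d(measurements, q, r, x0=0.0, p0=1.0):
    x, p = x0, p0
    for z in measurements:
        p = p + q              # predict: process noise grows uncertainty
        k = p / (p + r)        # gain: high R -> small K -> trust the model
        x = x + k * (z - x)    # update toward the measurement
        p = (1 - k) * p
    return x

zs = [10.0] * 50                          # steady readings of 10 m
smooth = kalman_1d(zs, q=1e-4, r=1.0)     # low Q / high R: leans on the model
fast = kalman_1d(zs, q=1.0, r=1e-2)       # high Q / low R: leans on measurements
```

With low Q / high R the estimate converges toward the measurements slowly; with high Q / low R it snaps to them almost immediately.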

Common Coordinate Frames

| Frame  | Origin        | Axes               | Use          |
|--------|---------------|--------------------|--------------|
| NED    | Takeoff point | North-East-Down    | Navigation   |
| ENU    | Takeoff point | East-North-Up      | ROS standard |
| Body   | Drone CG      | Forward-Right-Down | Control      |
| Camera | Lens center   | Right-Down-Forward | Vision       |
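
Two conversions come up constantly: NED↔ENU (swap north/east, negate the vertical) and body-to-NED. The sketch below handles only the yaw (level-flight) case for body-to-NED; full attitude needs the complete rotation matrix or a quaternion:

```python
import math

# NED <-> ENU: swap the horizontal axes and negate the vertical.
def ned_to_enu(ned):
    n, e, d = ned
    return (e, n, -d)  # the same swap also converts ENU -> NED

# Body (Forward-Right-Down) -> NED, yaw only: a level-flight
# simplification; full attitude requires the complete rotation
# matrix or a quaternion.
def body_to_ned(forward, right, down, yaw_rad):
    n = forward * math.cos(yaw_rad) - right * math.sin(yaw_rad)
    e = forward * math.sin(yaw_rad) + right * math.cos(yaw_rad)
    return (n, e, down)
```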

Reference Files

Detailed implementations in references/:
  • navigation-algorithms.md - SLAM, path planning, localization
  • sensor-fusion-ekf.md - Kalman filters, multi-sensor fusion
  • object-detection-tracking.md - YOLO, ByteTrack, optical flow

Simulation Tools

| Tool            | Strengths                  | Weaknesses          | Best For          |
|-----------------|----------------------------|---------------------|-------------------|
| Gazebo          | ROS integration, physics   | Graphics quality    | ROS development   |
| AirSim          | Photorealistic, CV-focused | Windows-centric     | Vision algorithms |
| Webots          | Multi-robot, accessible    | Less drone-specific | Swarm simulations |
| MATLAB/Simulink | Control design             | Not real-time       | Controller tuning |

Emerging Technologies (2024-2025)

  • Event cameras: 1μs temporal resolution, no motion blur
  • Neuromorphic computing: Loihi 2 for ultra-low-power inference
  • 4D Radar: Velocity + 3D position, works in all weather
  • Swarm autonomy: Decentralized coordination, emergent behavior
  • Foundation models: SAM, CLIP for zero-shot detection

Integration Points

  • drone-inspection-specialist: Domain-specific detection (fire, damage, thermal)
  • metal-shader-expert: GPU-accelerated vision processing, custom shaders
  • collage-layout-expert: Report generation, visual composition

Key Principle: In drone systems, reliability trumps performance. A 95% accurate system that never crashes is better than a 99% accurate one that fails unpredictably. Always have fallbacks.