opencv

OpenCV - Computer Vision and Image Processing

OpenCV (Open Source Computer Vision Library) is the de facto standard library for computer vision tasks. It provides 2500+ optimized algorithms for real-time image and video processing, from basic operations like reading images to advanced tasks like face recognition and 3D reconstruction.

When to Use

  • Reading, writing, and displaying images and videos from files or cameras.
  • Image preprocessing (resizing, cropping, rotating, color conversion).
  • Edge detection (Canny, Sobel) and contour finding.
  • Feature detection and matching (SIFT, ORB, AKAZE).
  • Object detection (Haar Cascades, HOG, DNN module for YOLO/SSD).
  • Face detection and recognition.
  • Image segmentation (thresholding, watershed, GrabCut).
  • Video analysis (motion detection, object tracking, optical flow).
  • Camera calibration and 3D reconstruction.
  • Image stitching and panorama creation.
  • Real-time applications requiring fast performance.

Reference Documentation

Official docs: https://docs.opencv.org/4.x/
GitHub: https://github.com/opencv/opencv
Tutorials: https://docs.opencv.org/4.x/d9/df8/tutorial_root.html
Search patterns: cv2.imread, cv2.cvtColor, cv2.Canny, cv2.findContours, cv2.VideoCapture

Core Principles

Image as NumPy Array

OpenCV represents images as NumPy arrays. A color image has shape (height, width, channels); a grayscale image has shape (height, width) with no channel axis. This allows seamless integration with NumPy operations and other scientific Python libraries.
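
As a minimal sketch of the array layout, the snippet below builds a synthetic image with NumPy instead of reading a file. The array sizes and variable names are illustrative assumptions, not values from OpenCV itself.

```python
import numpy as np

# Synthetic 4x6 color image: 4 rows (height), 6 columns (width), 3 channels.
img = np.zeros((4, 6, 3), dtype=np.uint8)

height, width, channels = img.shape

# Indexing is [row, column], i.e. [y, x], not [x, y].
img[1, 2] = (255, 0, 0)      # pixel at x=2, y=1
region = img[0:2, 1:4]       # rows 0-1, columns 1-3

# A single-channel (grayscale) image has no channel axis at all.
gray = np.zeros((4, 6), dtype=np.uint8)

print(img.shape, region.shape, gray.shape)
```

Because the first axis is the row index, `img.shape[:2]` yields (height, width), which is the reverse of the (width, height) order that functions such as `cv2.resize` take for sizes.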

BGR Color Space (Not RGB!)

OpenCV stores color channels in BGR (Blue-Green-Red) order by default, not RGB. This is critical to remember when displaying images with other tools or passing them to libraries that expect RGB.

In-Place vs Copy Operations

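Some OpenCV operations modify the image you pass in: drawing functions such as cv2.rectangle() and cv2.drawContours() write directly into their input, and NumPy slices are views onto the same memory. Most filtering and conversion functions return a new array instead. Understanding when copies are made is essential for efficient and correct code.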

C++ Performance in Python

OpenCV is written in optimized C++, so its functions run fast even when called from Python. Avoid Python loops over pixels when a vectorized OpenCV or NumPy operation exists.

Quick Reference

Installation

bash
# Install only ONE of these packages; they all provide the same cv2 module.

# Basic OpenCV
pip install opencv-python

# With contrib modules (extra and experimental algorithms)
pip install opencv-contrib-python

# Headless (no GUI, for servers)
pip install opencv-python-headless

Standard Imports

python
import cv2
import numpy as np
import matplotlib.pyplot as plt

Basic Pattern - Read, Process, Display

python
import cv2

# 1. Read image (imread returns None on failure instead of raising)
img = cv2.imread('image.jpg')
if img is None:
    raise FileNotFoundError("Could not read 'image.jpg'")

# 2. Process (convert to grayscale)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# 3. Display
cv2.imshow('Grayscale', gray)
cv2.waitKey(0)  # Wait for key press
cv2.destroyAllWindows()

Basic Pattern - Video Processing

python
import cv2

# 1. Open video capture
cap = cv2.VideoCapture(0)  # 0 = default camera, or 'video.mp4'

while True:
    # 2. Read frame
    ret, frame = cap.read()
    if not ret:
        break

    # 3. Process frame
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # 4. Display
    cv2.imshow('Video', gray)

    # 5. Exit on 'q' key
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# 6. Cleanup
cap.release()
cv2.destroyAllWindows()

Critical Rules

✅ DO

  • Check Image Loaded - Always verify img is not None after cv2.imread() to catch file errors.
  • Use cv2.cvtColor() for Color Conversion - Don't manually rearrange channels; use the provided conversion codes.
  • Release Resources - Always call cap.release() and cv2.destroyAllWindows() when done with video/windows.
  • Copy Before Modifying - Use img.copy() if you need to preserve the original image.
  • Use Appropriate Data Types - Keep images as uint8 (0-255) for display, convert to float32 (0-1) for mathematical operations.
  • Validate VideoCapture - Check cap.isOpened() before reading frames.
  • Use BGR2RGB for Matplotlib - Convert BGR to RGB when displaying with matplotlib.
  • Vectorize Operations - Use OpenCV's built-in functions instead of Python loops over pixels.

❌ DON'T

  • Don't Assume RGB - OpenCV uses BGR by default; convert to RGB for matplotlib or PIL.
  • Don't Forget waitKey() - Without cv2.waitKey(), window events are never processed, so cv2.imshow() windows stay blank or unresponsive.
  • Don't Mix PIL and OpenCV Directly - Convert between them explicitly (OpenCV uses BGR, PIL uses RGB).
  • Don't Process Video in Memory - Process frame-by-frame instead of loading every frame into a list, to avoid memory issues with large videos.
  • Don't Use Python Loops for Pixels - This is typically orders of magnitude slower than vectorized operations.
  • Don't Hardcode Paths - Use os.path.join() or pathlib for cross-platform compatibility.

Anti-Patterns (NEVER)

python
import cv2
import numpy as np
import matplotlib.pyplot as plt

# ❌ BAD: Not checking if image loaded
img = cv2.imread('image.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # Raises cv2.error if the file could not be read!

# ✅ GOOD: Always validate
img = cv2.imread('image.jpg')
if img is None:
    raise FileNotFoundError("Image not found")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# ❌ BAD: Using Python loops for pixel manipulation
for i in range(img.shape[0]):
    for j in range(img.shape[1]):
        img[i, j] = img[i, j] * 0.5  # Extremely slow!

# ✅ GOOD: Vectorized NumPy operations
img = (img * 0.5).astype(np.uint8)

# ❌ BAD: Displaying BGR image with matplotlib
plt.imshow(img)  # Colors will be wrong!

# ✅ GOOD: Convert to RGB first
img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
plt.imshow(img_rgb)

# ❌ BAD: Not releasing video capture
cap = cv2.VideoCapture('video.mp4')
while cap.read()[0]:
    pass
# Resource leak! The file or camera stays locked!

# ✅ GOOD: Always release
cap = cv2.VideoCapture('video.mp4')
try:
    while cap.read()[0]:
        pass
finally:
    cap.release()

Image I/O and Display

Reading and Writing Images

python
import cv2

# Read image (returns None if failed)
img = cv2.imread('image.jpg')

# Read as grayscale
gray = cv2.imread('image.jpg', cv2.IMREAD_GRAYSCALE)

# Read with alpha channel
img_alpha = cv2.imread('image.png', cv2.IMREAD_UNCHANGED)

# Write image
cv2.imwrite('output.jpg', img)

# Write with quality (JPEG: 0-100, PNG: 0-9 compression)
cv2.imwrite('output.jpg', img, [cv2.IMWRITE_JPEG_QUALITY, 95])
cv2.imwrite('output.png', img, [cv2.IMWRITE_PNG_COMPRESSION, 9])

# Check if image loaded
if img is None:
    print("Error: Could not load image")
else:
    print(f"Image shape: {img.shape}")  # (height, width, channels)

Display Images

python
import cv2
import matplotlib.pyplot as plt

img = cv2.imread('image.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Display image in window
cv2.imshow('Window Name', img)
cv2.waitKey(0)  # Wait indefinitely for key press
cv2.destroyAllWindows()

# Display for specific duration (milliseconds)
cv2.imshow('Image', img)
cv2.waitKey(3000)  # Wait up to 3 seconds (a key press ends the wait early)
cv2.destroyAllWindows()

# Display multiple images
cv2.imshow('Original', img)
cv2.imshow('Gray', gray)
cv2.waitKey(0)
cv2.destroyAllWindows()

# Display with matplotlib (convert BGR to RGB!)
img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
plt.imshow(img_rgb)
plt.axis('off')
plt.show()

Video Capture

python
import cv2

# Open camera (0 = default, 1 = second camera, etc.)
cap = cv2.VideoCapture(0)

# Or open a video file instead
cap = cv2.VideoCapture('video.mp4')

# Check if opened successfully
if not cap.isOpened():
    print("Error: Could not open video")
    exit()

# Get video properties
fps = cap.get(cv2.CAP_PROP_FPS)
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))

print(f"Video: {width}x{height} @ {fps} fps, {total_frames} frames")

# Read and process frames
while True:
    ret, frame = cap.read()

    if not ret:
        print("End of video or error")
        break

    # Process frame here
    cv2.imshow('Frame', frame)

    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()

Writing Videos

python
import cv2

cap = cv2.VideoCapture('input.mp4')

# Get video properties (keep fps as a float: int() would turn 29.97 into 29)
fps = cap.get(cv2.CAP_PROP_FPS)
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))

# Create VideoWriter
fourcc = cv2.VideoWriter_fourcc(*'mp4v')  # or 'XVID', 'MJPG'
out = cv2.VideoWriter('output.mp4', fourcc, fps, (width, height))

while True:
    ret, frame = cap.read()
    if not ret:
        break

    # Process frame
    processed = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    processed = cv2.cvtColor(processed, cv2.COLOR_GRAY2BGR)  # Convert back to 3-channel

    # Write frame (size must match the (width, height) given to VideoWriter)
    out.write(processed)

cap.release()
out.release()
cv2.destroyAllWindows()

Image Transformations

Resizing and Cropping

python
import cv2

img = cv2.imread('image.jpg')

# Resize to specific dimensions
resized = cv2.resize(img, (800, 600))  # (width, height)

# Resize by scale factor
scaled = cv2.resize(img, None, fx=0.5, fy=0.5)  # 50% of original

# Resize with interpolation methods
resized_linear = cv2.resize(img, (800, 600), interpolation=cv2.INTER_LINEAR)  # Default
resized_cubic = cv2.resize(img, (800, 600), interpolation=cv2.INTER_CUBIC)    # Better quality
resized_area = cv2.resize(img, (400, 300), interpolation=cv2.INTER_AREA)      # Best for shrinking

# Crop (using NumPy slicing)
height, width = img.shape[:2]
cropped = img[100:400, 200:600]  # [y1:y2, x1:x2]

# Center crop
crop_size = 300
center_x, center_y = width // 2, height // 2
x1 = center_x - crop_size // 2
y1 = center_y - crop_size // 2
center_cropped = img[y1:y1+crop_size, x1:x1+crop_size]

Rotation and Flipping

python
import cv2

img = cv2.imread('image.jpg')

# Flip horizontally
flipped_h = cv2.flip(img, 1)

# Flip vertically
flipped_v = cv2.flip(img, 0)

# Flip both
flipped_both = cv2.flip(img, -1)

# Rotate 90 degrees clockwise
rotated_90 = cv2.rotate(img, cv2.ROTATE_90_CLOCKWISE)

# Rotate 180 degrees
rotated_180 = cv2.rotate(img, cv2.ROTATE_180)

# Rotate 90 degrees counter-clockwise
rotated_90_ccw = cv2.rotate(img, cv2.ROTATE_90_COUNTERCLOCKWISE)

# Rotate by arbitrary angle (around center)
height, width = img.shape[:2]
center = (width // 2, height // 2)
angle = 45  # degrees, positive = counter-clockwise

# Get rotation matrix
M = cv2.getRotationMatrix2D(center, angle, scale=1.0)

# Apply rotation (output keeps the original size, so corners are cut off)
rotated = cv2.warpAffine(img, M, (width, height))

# Rotate and scale
M_scaled = cv2.getRotationMatrix2D(center, 30, scale=0.8)
rotated_scaled = cv2.warpAffine(img, M_scaled, (width, height))

Color Space Conversions

python
import cv2

img = cv2.imread('image.jpg')

# BGR to RGB (for matplotlib)
rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

# BGR to Grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# BGR to HSV (useful for color-based segmentation)
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

# BGR to LAB
lab = cv2.cvtColor(img, cv2.COLOR_BGR2LAB)

# Grayscale to BGR (add color channels)
gray_bgr = cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR)

# Extract individual channels
b, g, r = cv2.split(img)

# Merge channels
merged = cv2.merge([b, g, r])

Image Filtering and Enhancement

Blurring and Smoothing

python
import cv2

img = cv2.imread('image.jpg')

# Gaussian blur (reduce noise)
blurred = cv2.GaussianBlur(img, (5, 5), 0)  # ksize=(5, 5); sigmaX=0 derives sigma from the kernel size

# Median blur (good for salt-and-pepper noise)
median = cv2.medianBlur(img, 5)  # kernel_size must be odd

# Bilateral filter (edge-preserving smoothing)
bilateral = cv2.bilateralFilter(img, 9, 75, 75)  # (d, sigmaColor, sigmaSpace)

# Average blur
avg_blur = cv2.blur(img, (5, 5))

# Box filter (ddepth=-1 keeps the input depth; normalized by default, like cv2.blur)
box = cv2.boxFilter(img, -1, (5, 5))

Edge Detection

python
import cv2

img = cv2.imread('image.jpg')

# Convert to grayscale first
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Canny edge detection
edges = cv2.Canny(gray, threshold1=50, threshold2=150)

# Sobel edge detection (gradient in x and y)
sobelx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)  # X gradient
sobely = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)  # Y gradient
sobel = cv2.magnitude(sobelx, sobely)

# Laplacian edge detection
laplacian = cv2.Laplacian(gray, cv2.CV_64F)

# Scharr (more accurate than Sobel for small kernels)
scharrx = cv2.Scharr(gray, cv2.CV_64F, 1, 0)
scharry = cv2.Scharr(gray, cv2.CV_64F, 0, 1)

Morphological Operations

python
import cv2
import numpy as np

img = cv2.imread('image.jpg', cv2.IMREAD_GRAYSCALE)

# Define kernel
kernel = np.ones((5, 5), np.uint8)

# Erosion (shrink white regions)
eroded = cv2.erode(img, kernel, iterations=1)

# Dilation (expand white regions)
dilated = cv2.dilate(img, kernel, iterations=1)

# Opening (erosion followed by dilation) - removes noise
opening = cv2.morphologyEx(img, cv2.MORPH_OPEN, kernel)

# Closing (dilation followed by erosion) - closes gaps
closing = cv2.morphologyEx(img, cv2.MORPH_CLOSE, kernel)

# Gradient (difference between dilation and erosion) - outlines
gradient = cv2.morphologyEx(img, cv2.MORPH_GRADIENT, kernel)

# Top hat (difference between input and opening) - bright spots
tophat = cv2.morphologyEx(img, cv2.MORPH_TOPHAT, kernel)

# Black hat (difference between closing and input) - dark spots
blackhat = cv2.morphologyEx(img, cv2.MORPH_BLACKHAT, kernel)

Thresholding

python
import cv2

img = cv2.imread('image.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Simple threshold
ret, thresh = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)

# Binary inverse
ret, thresh_inv = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY_INV)

# Truncate
ret, thresh_trunc = cv2.threshold(gray, 127, 255, cv2.THRESH_TRUNC)

# To zero
ret, thresh_tozero = cv2.threshold(gray, 127, 255, cv2.THRESH_TOZERO)

# Otsu's thresholding (automatic threshold calculation)
ret, thresh_otsu = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Adaptive thresholding (different threshold for different regions)
# Last two arguments: blockSize=11 (odd neighborhood size), C=2 (subtracted from the local mean)
adaptive_mean = cv2.adaptiveThreshold(
    gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 11, 2
)

adaptive_gaussian = cv2.adaptiveThreshold(
    gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 11, 2
)

Contours and Shape Detection

Finding and Drawing Contours

python
import cv2

img = cv2.imread('image.jpg')

# Convert to grayscale and threshold
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
ret, thresh = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)

# Find contours
contours, hierarchy = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

# Draw all contours
img_contours = img.copy()
cv2.drawContours(img_contours, contours, -1, (0, 255, 0), 2)  # -1 = all contours

# Draw specific contour
cv2.drawContours(img_contours, contours, 0, (255, 0, 0), 3)  # First contour

# Iterate through contours
for i, contour in enumerate(contours):
    # Calculate area
    area = cv2.contourArea(contour)

    # Calculate perimeter
    perimeter = cv2.arcLength(contour, True)

    # Filter by area
    if area > 1000:
        cv2.drawContours(img_contours, [contour], -1, (0, 0, 255), 2)

        # Get bounding rectangle
        x, y, w, h = cv2.boundingRect(contour)
        cv2.rectangle(img_contours, (x, y), (x+w, y+h), (255, 0, 0), 2)

Shape Approximation

python
import cv2

for contour in contours:
    # Approximate contour to polygon
    epsilon = 0.02 * cv2.arcLength(contour, True)
    approx = cv2.approxPolyDP(contour, epsilon, True)
    
    # Number of vertices
    n_vertices = len(approx)
    
    # Classify shape
    if n_vertices == 3:
        shape = "Triangle"
    elif n_vertices == 4:
        # Check if rectangle or square
        x, y, w, h = cv2.boundingRect(approx)
        aspect_ratio = float(w) / h
        shape = "Square" if 0.95 <= aspect_ratio <= 1.05 else "Rectangle"
    elif n_vertices > 4:
        shape = "Circle" if n_vertices > 10 else "Polygon"
    else:
        shape = "Unknown"  # Fewer than 3 vertices (degenerate contour)
    
    # Draw and label
    cv2.drawContours(img, [approx], -1, (0, 255, 0), 2)
    x, y = approx[0][0]
    cv2.putText(img, shape, (x, y), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 255, 255), 2)

Contour Features

python
import cv2
import numpy as np

for contour in contours:
    # Moments (for center of mass)
    M = cv2.moments(contour)
    if M['m00'] != 0:
        cx = int(M['m10'] / M['m00'])
        cy = int(M['m01'] / M['m00'])
        cv2.circle(img, (cx, cy), 5, (255, 0, 0), -1)
    
    # Minimum enclosing circle
    (x, y), radius = cv2.minEnclosingCircle(contour)
    center = (int(x), int(y))
    radius = int(radius)
    cv2.circle(img, center, radius, (0, 255, 0), 2)
    
    # Fit ellipse (requires at least 5 points)
    if len(contour) >= 5:
        ellipse = cv2.fitEllipse(contour)
        cv2.ellipse(img, ellipse, (255, 0, 255), 2)
    
    # Convex hull
    hull = cv2.convexHull(contour)
    cv2.drawContours(img, [hull], -1, (0, 255, 255), 2)
    
    # Solidity (contour area / convex hull area)
    hull_area = cv2.contourArea(hull)
    contour_area = cv2.contourArea(contour)
    solidity = contour_area / hull_area if hull_area > 0 else 0

Feature Detection and Matching

ORB (Oriented FAST and Rotated BRIEF)

python
import cv2

img1 = cv2.imread('image1.jpg', cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread('image2.jpg', cv2.IMREAD_GRAYSCALE)

# Create ORB detector
orb = cv2.ORB_create(nfeatures=1000)

# Detect keypoints and compute descriptors
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Draw keypoints
img1_kp = cv2.drawKeypoints(img1, kp1, None, color=(0, 255, 0))

# Match descriptors using BFMatcher (Hamming distance suits ORB's binary descriptors)
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = bf.match(des1, des2)

# Sort matches by distance (best first)
matches = sorted(matches, key=lambda x: x.distance)

# Draw top matches
img_matches = cv2.drawMatches(
    img1, kp1, img2, kp2, matches[:50], None,
    flags=cv2.DrawMatchesFlags_NOT_DRAW_SINGLE_POINTS
)

cv2.imshow('Matches', img_matches)
cv2.waitKey(0)
cv2.destroyAllWindows()

SIFT (Scale-Invariant Feature Transform)

python
import cv2

# Note: SIFT is part of the main opencv-python package since OpenCV 4.4.0.
# Older versions only ship it in opencv-contrib-python.

img = cv2.imread('image.jpg', cv2.IMREAD_GRAYSCALE)

# Create SIFT detector
sift = cv2.SIFT_create()

# Detect keypoints and compute descriptors
keypoints, descriptors = sift.detectAndCompute(img, None)

# Draw keypoints
img_kp = cv2.drawKeypoints(
    img, keypoints, None,
    flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS
)

print(f"Number of keypoints: {len(keypoints)}")

Feature Matching with FLANN

使用FLANN进行特征匹配

python
import cv2
import numpy as np
python
import cv2
import numpy as np

Detect features

检测特征

sift = cv2.SIFT_create() kp1, des1 = sift.detectAndCompute(img1, None) kp2, des2 = sift.detectAndCompute(img2, None)
sift = cv2.SIFT_create() kp1, des1 = sift.detectAndCompute(img1, None) kp2, des2 = sift.detectAndCompute(img2, None)

FLANN parameters

FLANN参数

FLANN_INDEX_KDTREE = 1 index_params = dict(algorithm=FLANN_INDEX_KDTREE, trees=5) search_params = dict(checks=50)
flann = cv2.FlannBasedMatcher(index_params, search_params) matches = flann.knnMatch(des1, des2, k=2)
FLANN_INDEX_KDTREE = 1 index_params = dict(algorithm=FLANN_INDEX_KDTREE, trees=5) search_params = dict(checks=50)
flann = cv2.FlannBasedMatcher(index_params, search_params) matches = flann.knnMatch(des1, des2, k=2)

Lowe's ratio test

Lowe比率测试

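good_matches = []
for pair in matches:
    # knnMatch can return fewer than two neighbours for a query descriptor
    if len(pair) < 2:
        continue
    m, n = pair
    if m.distance < 0.7 * n.distance:
        good_matches.append(m)
print(f"Good matches: {len(good_matches)}")
good_matches = []
for pair in matches:
    # knnMatch可能为某个查询描述符返回少于两个近邻
    if len(pair) < 2:
        continue
    m, n = pair
    if m.distance < 0.7 * n.distance:
        good_matches.append(m)
print(f"有效匹配数量:{len(good_matches)}")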

Draw good matches

绘制有效匹配结果

img_matches = cv2.drawMatches(
    img1, kp1, img2, kp2, good_matches, None,
    flags=cv2.DrawMatchesFlags_NOT_DRAW_SINGLE_POINTS
)
img_matches = cv2.drawMatches(
    img1, kp1, img2, kp2, good_matches, None,
    flags=cv2.DrawMatchesFlags_NOT_DRAW_SINGLE_POINTS
)
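The ratio test above keeps a match only when the nearest descriptor is clearly closer than the second nearest. A minimal NumPy sketch of that logic follows; it uses made-up 2-D descriptors rather than real SIFT output, and the `ratio_test` helper and sample arrays are illustrative assumptions, not part of OpenCV.

```python
import numpy as np

def ratio_test(des1, des2, ratio=0.7):
    """Return (query_idx, train_idx) pairs that pass Lowe's ratio test."""
    # Pairwise Euclidean distances, shape (len(des1), len(des2))
    dists = np.linalg.norm(des1[:, None, :] - des2[None, :, :], axis=2)
    good = []
    for i, row in enumerate(dists):
        nearest, second = np.argsort(row)[:2]
        if row[nearest] < ratio * row[second]:
            good.append((i, int(nearest)))
    return good

# Hypothetical descriptors: query 0 has one clear match, query 1 is ambiguous
des1 = np.array([[0.0, 0.0], [5.0, 5.0]])
des2 = np.array([[0.1, 0.0], [4.0, 5.0], [6.0, 5.0]])
good = ratio_test(des1, des2)
```

Query 1 sits equally close to two candidates, so it is rejected; this is the kind of ambiguous match the test is meant to filter out.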

Object Detection

目标检测

Haar Cascade (Face Detection)

Haar级联(人脸检测)

python
import cv2
python
import cv2

Load pre-trained Haar Cascade

加载预训练的Haar级联分类器

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + 'haarcascade_frontalface_default.xml'
)
eye_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + 'haarcascade_eye.xml'
)
img = cv2.imread('people.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + 'haarcascade_frontalface_default.xml'
)
eye_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + 'haarcascade_eye.xml'
)
img = cv2.imread('people.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

Detect faces

检测人脸

faces = face_cascade.detectMultiScale( gray, scaleFactor=1.1, minNeighbors=5, minSize=(30, 30) )
faces = face_cascade.detectMultiScale( gray, scaleFactor=1.1, minNeighbors=5, minSize=(30, 30) )

Draw rectangles around faces

在人脸周围绘制矩形

for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x+w, y+h), (255, 0, 0), 2)

    # Detect eyes in face region
    roi_gray = gray[y:y+h, x:x+w]
    roi_color = img[y:y+h, x:x+w]

    eyes = eye_cascade.detectMultiScale(roi_gray)
    for (ex, ey, ew, eh) in eyes:
        cv2.rectangle(roi_color, (ex, ey), (ex+ew, ey+eh), (0, 255, 0), 2)

cv2.imshow('Faces', img)
cv2.waitKey(0)
for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x+w, y+h), (255, 0, 0), 2)

    # 在人脸区域内检测眼睛
    roi_gray = gray[y:y+h, x:x+w]
    roi_color = img[y:y+h, x:x+w]

    eyes = eye_cascade.detectMultiScale(roi_gray)
    for (ex, ey, ew, eh) in eyes:
        cv2.rectangle(roi_color, (ex, ey), (ex+ew, ey+eh), (0, 255, 0), 2)

cv2.imshow('人脸检测结果', img)
cv2.waitKey(0)

Template Matching

模板匹配

python
import cv2
import numpy as np

img = cv2.imread('image.jpg')
template = cv2.imread('template.jpg')
python
import cv2
import numpy as np

img = cv2.imread('image.jpg')
template = cv2.imread('template.jpg')

Convert to grayscale

转换为灰度图

img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
template_gray = cv2.cvtColor(template, cv2.COLOR_BGR2GRAY)
h, w = template_gray.shape
img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
template_gray = cv2.cvtColor(template, cv2.COLOR_BGR2GRAY)
h, w = template_gray.shape

Template matching

模板匹配

result = cv2.matchTemplate(img_gray, template_gray, cv2.TM_CCOEFF_NORMED)
result = cv2.matchTemplate(img_gray, template_gray, cv2.TM_CCOEFF_NORMED)

Find locations above threshold

查找高于阈值的匹配位置

threshold = 0.8
locations = np.where(result >= threshold)
threshold = 0.8
locations = np.where(result >= threshold)

Draw rectangles

绘制匹配矩形

for pt in zip(*locations[::-1]):
    cv2.rectangle(img, pt, (pt[0] + w, pt[1] + h), (0, 255, 0), 2)
cv2.imshow('Matches', img)
cv2.waitKey(0)
for pt in zip(*locations[::-1]):
    cv2.rectangle(img, pt, (pt[0] + w, pt[1] + h), (0, 255, 0), 2)
cv2.imshow('匹配结果', img)
cv2.waitKey(0)
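The `zip(*locations[::-1])` idiom is easy to misread: `np.where` returns indices as (rows, columns), that is (y, x), while OpenCV drawing calls expect (x, y). The sketch below illustrates the conversion on a hand-made score map that stands in for the output of `cv2.matchTemplate`; the array values are invented for illustration.

```python
import numpy as np

# Hypothetical 3x4 score map standing in for the output of cv2.matchTemplate
result = np.array([
    [0.10, 0.20, 0.90, 0.30],
    [0.40, 0.50, 0.60, 0.70],
    [0.85, 0.10, 0.20, 0.30],
])

rows, cols = np.where(result >= 0.8)   # row (y) indices, column (x) indices
# Swap to (x, y), the order that cv2.rectangle and similar calls expect
points = [(int(x), int(y)) for x, y in zip(cols, rows)]
```

On real images, neighbouring pixels often exceed the threshold together, so expect clusters of overlapping rectangles around each true match.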

Practical Workflows

实用工作流

1. Document Scanner (Perspective Transform)

1. 文档扫描(透视变换)

python
import cv2
import numpy as np

def order_points(pts):
    """Order points: top-left, top-right, bottom-right, bottom-left."""
    rect = np.zeros((4, 2), dtype="float32")
    
    s = pts.sum(axis=1)
    rect[0] = pts[np.argmin(s)]  # Top-left
    rect[2] = pts[np.argmax(s)]  # Bottom-right
    
    diff = np.diff(pts, axis=1)
    rect[1] = pts[np.argmin(diff)]  # Top-right
    rect[3] = pts[np.argmax(diff)]  # Bottom-left
    
    return rect

def four_point_transform(image, pts):
    """Apply perspective transform to get bird's eye view."""
    rect = order_points(pts)
    (tl, tr, br, bl) = rect
    
    # Compute width and height
    widthA = np.linalg.norm(br - bl)
    widthB = np.linalg.norm(tr - tl)
    maxWidth = max(int(widthA), int(widthB))
    
    heightA = np.linalg.norm(tr - br)
    heightB = np.linalg.norm(tl - bl)
    maxHeight = max(int(heightA), int(heightB))
    
    # Destination points
    dst = np.array([
        [0, 0],
        [maxWidth - 1, 0],
        [maxWidth - 1, maxHeight - 1],
        [0, maxHeight - 1]
    ], dtype="float32")
    
    # Perspective transform
    M = cv2.getPerspectiveTransform(rect, dst)
    warped = cv2.warpPerspective(image, M, (maxWidth, maxHeight))
    
    return warped
python
import cv2
import numpy as np

def order_points(pts):
    """排序点:左上、右上、右下、左下。"""
    rect = np.zeros((4, 2), dtype="float32")
    
    s = pts.sum(axis=1)
    rect[0] = pts[np.argmin(s)]  # 左上
    rect[2] = pts[np.argmax(s)]  # 右下
    
    diff = np.diff(pts, axis=1)
    rect[1] = pts[np.argmin(diff)]  # 右上
    rect[3] = pts[np.argmax(diff)]  # 左下
    
    return rect

def four_point_transform(image, pts):
    """应用透视变换获取鸟瞰图。"""
    rect = order_points(pts)
    (tl, tr, br, bl) = rect
    
    # 计算宽度和高度
    widthA = np.linalg.norm(br - bl)
    widthB = np.linalg.norm(tr - tl)
    maxWidth = max(int(widthA), int(widthB))
    
    heightA = np.linalg.norm(tr - br)
    heightB = np.linalg.norm(tl - bl)
    maxHeight = max(int(heightA), int(heightB))
    
    # 目标点
    dst = np.array([
        [0, 0],
        [maxWidth - 1, 0],
        [maxWidth - 1, maxHeight - 1],
        [0, maxHeight - 1]
    ], dtype="float32")
    
    # 透视变换矩阵
    M = cv2.getPerspectiveTransform(rect, dst)
    warped = cv2.warpPerspective(image, M, (maxWidth, maxHeight))
    
    return warped

Usage

使用示例

img = cv2.imread('document.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 50, 150)
img = cv2.imread('document.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 50, 150)

Find contours

查找轮廓

contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
contours = sorted(contours, key=cv2.contourArea, reverse=True)
contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
contours = sorted(contours, key=cv2.contourArea, reverse=True)

Find document contour (assume largest quadrilateral)

查找文档轮廓(假设最大的四边形为文档)

for contour in contours:
    peri = cv2.arcLength(contour, True)
    approx = cv2.approxPolyDP(contour, 0.02 * peri, True)

    if len(approx) == 4:
        pts = approx.reshape(4, 2)
        scanned = four_point_transform(img, pts)
        cv2.imshow('Scanned', scanned)
        cv2.waitKey(0)
        break
for contour in contours:
    peri = cv2.arcLength(contour, True)
    approx = cv2.approxPolyDP(contour, 0.02 * peri, True)

    if len(approx) == 4:
        pts = approx.reshape(4, 2)
        scanned = four_point_transform(img, pts)
        cv2.imshow('扫描结果', scanned)
        cv2.waitKey(0)
        break
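The corner ordering in `order_points` relies on two simple quantities: x + y is smallest at the top-left corner and largest at the bottom-right, while y - x is smallest at the top-right and largest at the bottom-left. A small worked example with invented corner coordinates may make this easier to verify by hand:

```python
import numpy as np

# Hypothetical document corners in arbitrary order, as (x, y)
pts = np.array([[95, 100], [10, 12], [100, 10], [12, 90]], dtype="float32")

s = pts.sum(axis=1)               # x + y for each corner
d = np.diff(pts, axis=1).ravel()  # y - x for each corner

ordered = np.array([
    pts[np.argmin(s)],   # top-left
    pts[np.argmin(d)],   # top-right
    pts[np.argmax(s)],   # bottom-right
    pts[np.argmax(d)],   # bottom-left
])
```

This heuristic assumes the document is roughly upright in the image; for a quadrilateral rotated close to 45 degrees, two corners can tie on the sum or the difference and the ordering may come out wrong.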

2. Motion Detection

2. 运动检测

python
import cv2

def detect_motion(video_path):
    """Detect motion in video using frame differencing."""
    cap = cv2.VideoCapture(video_path)
    
    ret, frame1 = cap.read()
    ret, frame2 = cap.read()
    
    while cap.isOpened():
        # Compute difference between frames
        diff = cv2.absdiff(frame1, frame2)
        gray = cv2.cvtColor(diff, cv2.COLOR_BGR2GRAY)
        blur = cv2.GaussianBlur(gray, (5, 5), 0)
        _, thresh = cv2.threshold(blur, 20, 255, cv2.THRESH_BINARY)
        
        # Dilate to fill gaps
        dilated = cv2.dilate(thresh, None, iterations=3)
        
        # Find contours
        contours, _ = cv2.findContours(dilated, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        
        # Draw bounding boxes
        for contour in contours:
            if cv2.contourArea(contour) < 500:
                continue
            
            x, y, w, h = cv2.boundingRect(contour)
            cv2.rectangle(frame1, (x, y), (x+w, y+h), (0, 255, 0), 2)
            cv2.putText(frame1, "Motion", (x, y-10),
                       cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
        
        cv2.imshow('Motion Detection', frame1)
        
        # Update frames
        frame1 = frame2
        ret, frame2 = cap.read()
        
        if not ret or cv2.waitKey(1) & 0xFF == ord('q'):
            break
    
    cap.release()
    cv2.destroyAllWindows()
python
import cv2

def detect_motion(video_path):
    """使用帧差法检测视频中的运动。"""
    cap = cv2.VideoCapture(video_path)
    
    ret, frame1 = cap.read()
    ret, frame2 = cap.read()
    
    while cap.isOpened():
        # 计算帧之间的差异
        diff = cv2.absdiff(frame1, frame2)
        gray = cv2.cvtColor(diff, cv2.COLOR_BGR2GRAY)
        blur = cv2.GaussianBlur(gray, (5, 5), 0)
        _, thresh = cv2.threshold(blur, 20, 255, cv2.THRESH_BINARY)
        
        # 膨胀操作填补缝隙
        dilated = cv2.dilate(thresh, None, iterations=3)
        
        # 查找轮廓
        contours, _ = cv2.findContours(dilated, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        
        # 绘制边界框
        for contour in contours:
            if cv2.contourArea(contour) < 500:
                continue
            
            x, y, w, h = cv2.boundingRect(contour)
            cv2.rectangle(frame1, (x, y), (x+w, y+h), (0, 255, 0), 2)
            cv2.putText(frame1, "Motion", (x, y-10),  # Hershey字体无法渲染中文,此处保留英文
                       cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
        
        cv2.imshow('运动检测结果', frame1)
        
        # 更新帧
        frame1 = frame2
        ret, frame2 = cap.read()
        
        if not ret or cv2.waitKey(1) & 0xFF == ord('q'):
            break
    
    cap.release()
    cv2.destroyAllWindows()

Usage

使用示例

detect_motion('video.mp4')

detect_motion('video.mp4')

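The core of frame differencing can be sketched without a video file. The example below uses two invented 6x6 grayscale frames and plain NumPy in place of `cv2.absdiff`, `cv2.threshold`, and `cv2.boundingRect`; it is an illustration of the idea under those assumptions, not a replacement for the OpenCV calls.

```python
import numpy as np

# Two hypothetical 6x6 grayscale frames; a bright 2x2 block appears in the second
frame1 = np.zeros((6, 6), dtype=np.uint8)
frame2 = frame1.copy()
frame2[2:4, 3:5] = 200

# Cast to a signed type before subtracting so uint8 arithmetic cannot wrap around
diff = np.abs(frame1.astype(np.int16) - frame2.astype(np.int16)).astype(np.uint8)

# Same rule as THRESH_BINARY with a threshold of 20
mask = np.where(diff > 20, 255, 0).astype(np.uint8)

# Bounding box of all changed pixels as (x, y, w, h)
ys, xs = np.nonzero(mask)
bbox = (int(xs.min()), int(ys.min()),
        int(xs.max() - xs.min() + 1), int(ys.max() - ys.min() + 1))
```

The workflow above additionally blurs and dilates the difference image, which suppresses sensor noise and merges fragmented regions before contours are extracted.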

3. Color-Based Object Tracking

3. 基于颜色的目标跟踪

python
import cv2
import numpy as np

def track_colored_object(video_path, lower_color, upper_color):
    """Track object by color in HSV space."""
    cap = cv2.VideoCapture(video_path)
    
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        
        # Convert to HSV
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        
        # Create mask for color
        mask = cv2.inRange(hsv, lower_color, upper_color)
        
        # Remove noise
        mask = cv2.erode(mask, None, iterations=2)
        mask = cv2.dilate(mask, None, iterations=2)
        
        # Find contours
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        
        if contours:
            # Find largest contour
            largest = max(contours, key=cv2.contourArea)
            
            # Get center and radius
            ((x, y), radius) = cv2.minEnclosingCircle(largest)
            
            if radius > 10:
                # Draw circle and center
                cv2.circle(frame, (int(x), int(y)), int(radius), (0, 255, 0), 2)
                cv2.circle(frame, (int(x), int(y)), 5, (0, 0, 255), -1)
        
        cv2.imshow('Tracking', frame)
        cv2.imshow('Mask', mask)
        
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
    
    cap.release()
    cv2.destroyAllWindows()
python
import cv2
import numpy as np

def track_colored_object(video_path, lower_color, upper_color):
    """在HSV空间中基于颜色跟踪目标。"""
    cap = cv2.VideoCapture(video_path)
    
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        
        # 转换为HSV颜色空间
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        
        # 创建颜色掩码
        mask = cv2.inRange(hsv, lower_color, upper_color)
        
        # 去除噪声
        mask = cv2.erode(mask, None, iterations=2)
        mask = cv2.dilate(mask, None, iterations=2)
        
        # 查找轮廓
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        
        if contours:
            # 找到最大的轮廓
            largest = max(contours, key=cv2.contourArea)
            
            # 获取中心和半径
            ((x, y), radius) = cv2.minEnclosingCircle(largest)
            
            if radius > 10:
                # 绘制圆形和中心
                cv2.circle(frame, (int(x), int(y)), int(radius), (0, 255, 0), 2)
                cv2.circle(frame, (int(x), int(y)), 5, (0, 0, 255), -1)
        
        cv2.imshow('跟踪结果', frame)
        cv2.imshow('掩码', mask)
        
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
    
    cap.release()
    cv2.destroyAllWindows()

Usage: Track red object

使用示例:跟踪红色目标

lower_red = np.array([0, 100, 100])

lower_red = np.array([0, 100, 100])

upper_red = np.array([10, 255, 255])

upper_red = np.array([10, 255, 255])

track_colored_object(0, lower_red, upper_red)

track_colored_object(0, lower_red, upper_red)

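One caveat with the red range above: for 8-bit images OpenCV stores hue in the range 0 to 179, and red sits at both ends of that scale, so a single 0 to 10 band misses reds with hue near 179. A common workaround is to build two masks and combine them. The sketch below shows the idea with a NumPy stand-in for `cv2.inRange`; the `in_range` helper and the upper band of 170 to 179 are assumptions chosen for illustration.

```python
import numpy as np

def in_range(hsv, lower, upper):
    """NumPy stand-in for cv2.inRange: 255 where every channel lies in [lower, upper]."""
    inside = np.all((hsv >= lower) & (hsv <= upper), axis=-1)
    return inside.astype(np.uint8) * 255

# Hypothetical 1x3 HSV image: red (H=5), red (H=175), green (H=60)
hsv = np.array([[[5, 200, 200], [175, 200, 200], [60, 200, 200]]], dtype=np.uint8)

low_band = in_range(hsv, np.array([0, 100, 100]), np.array([10, 255, 255]))
high_band = in_range(hsv, np.array([170, 100, 100]), np.array([179, 255, 255]))
red_mask = low_band | high_band   # cv2.bitwise_or(low_band, high_band) in OpenCV terms
```

Tracking both bands would require `track_colored_object` to accept two ranges; the function as written takes only one.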

4. QR Code Detection

4. QR码检测

python
import cv2

def detect_qr_code(image_path):
    """Detect and decode QR codes."""
    img = cv2.imread(image_path)
    
    # Initialize QR code detector
    detector = cv2.QRCodeDetector()
    
    # Detect and decode
    data, bbox, straight_qrcode = detector.detectAndDecode(img)
    
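    if bbox is not None:
        # Draw bounding box; reshape handles both (1, 4, 2) and (4, 1, 2) corner layouts
        corners = bbox.reshape(-1, 2).astype(int)
        n_lines = len(corners)
        for i in range(n_lines):
            point1 = tuple(int(v) for v in corners[i])
            point2 = tuple(int(v) for v in corners[(i+1) % n_lines])
            cv2.line(img, point1, point2, (0, 255, 0), 3)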
        
        # Display decoded data
        if data:
            print(f"QR Code data: {data}")
            cv2.putText(img, data, (50, 50),
                       cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
    
    cv2.imshow('QR Code', img)
    cv2.waitKey(0)
    cv2.destroyAllWindows()
python
import cv2

def detect_qr_code(image_path):
    """检测并解码QR码。"""
    img = cv2.imread(image_path)
    
    # 初始化QR码检测器
    detector = cv2.QRCodeDetector()
    
    # 检测并解码
    data, bbox, straight_qrcode = detector.detectAndDecode(img)
    
    if bbox is not None:
        # 绘制边界框;reshape可同时兼容(1, 4, 2)和(4, 1, 2)两种角点布局
        corners = bbox.reshape(-1, 2).astype(int)
        n_lines = len(corners)
        for i in range(n_lines):
            point1 = tuple(int(v) for v in corners[i])
            point2 = tuple(int(v) for v in corners[(i+1) % n_lines])
            cv2.line(img, point1, point2, (0, 255, 0), 3)
        
        # 显示解码数据
        if data:
            print(f"QR码内容:{data}")
            cv2.putText(img, data, (50, 50),
                       cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
    
    cv2.imshow('QR码检测结果', img)
    cv2.waitKey(0)
    cv2.destroyAllWindows()

Usage

使用示例

detect_qr_code('qrcode.jpg')

detect_qr_code('qrcode.jpg')


5. Image Stitching (Panorama)

5. 图像拼接(全景图)

python
import cv2

def create_panorama(images):
    """Stitch multiple images into panorama."""
    # Create stitcher
    stitcher = cv2.Stitcher_create()
    
    # Stitch images
    status, pano = stitcher.stitch(images)
    
    if status == cv2.Stitcher_OK:
        print("Panorama created successfully")
        return pano
    else:
        print(f"Error: {status}")
        return None
python
import cv2

def create_panorama(images):
    """将多张图像拼接为全景图。"""
    # 创建拼接器
    stitcher = cv2.Stitcher_create()
    
    # 拼接图像
    status, pano = stitcher.stitch(images)
    
    if status == cv2.Stitcher_OK:
        print("全景图创建成功")
        return pano
    else:
        print(f"错误代码:{status}")
        return None

Usage

使用示例

img1 = cv2.imread('image1.jpg')
img2 = cv2.imread('image2.jpg')
img3 = cv2.imread('image3.jpg')
panorama = create_panorama([img1, img2, img3])
if panorama is not None:
    cv2.imshow('Panorama', panorama)
    cv2.waitKey(0)
img1 = cv2.imread('image1.jpg')
img2 = cv2.imread('image2.jpg')
img3 = cv2.imread('image3.jpg')
panorama = create_panorama([img1, img2, img3])
if panorama is not None:
    cv2.imshow('全景图', panorama)
    cv2.waitKey(0)

Performance Optimization

性能优化

Use GPU Acceleration

使用GPU加速

python
import cv2
python
import cv2

Check CUDA availability

检查CUDA可用性

print(f"CUDA devices: {cv2.cuda.getCudaEnabledDeviceCount()}")
print(f"CUDA设备数量:{cv2.cuda.getCudaEnabledDeviceCount()}")

Upload to GPU

将图像上传到GPU

gpu_img = cv2.cuda_GpuMat()
gpu_img.upload(img)
gpu_img = cv2.cuda_GpuMat()
gpu_img.upload(img)

GPU operations (must use cv2.cuda module)

GPU操作(必须使用cv2.cuda模块)

gpu_gray = cv2.cuda.cvtColor(gpu_img, cv2.COLOR_BGR2GRAY)
gpu_gray = cv2.cuda.cvtColor(gpu_img, cv2.COLOR_BGR2GRAY)

Download from GPU

从GPU下载结果

result = gpu_gray.download()
result = gpu_gray.download()

Vectorize Operations

向量化操作

python
python

❌ SLOW: Python loops

❌ 缓慢:Python循环

for i in range(height):
    for j in range(width):
        img[i, j] = img[i, j] * 0.5
for i in range(height):
    for j in range(width):
        img[i, j] = img[i, j] * 0.5

✅ FAST: NumPy vectorization

✅ 快速:NumPy向量化

img = (img * 0.5).astype(np.uint8)
img = (img * 0.5).astype(np.uint8)

✅ FAST: OpenCV built-in functions

✅ 快速:OpenCV内置函数

img = cv2.convertScaleAbs(img, alpha=0.5, beta=0)
img = cv2.convertScaleAbs(img, alpha=0.5, beta=0)
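The two fast variants above agree when scaling by 0.5, but they can diverge once values leave the 0 to 255 range: NumPy uint8 arithmetic wraps modulo 256, whereas OpenCV arithmetic such as `cv2.add` and `cv2.convertScaleAbs` saturates. The sketch below shows the difference using NumPy only, with `np.clip` standing in for OpenCV's saturation; the pixel values are invented.

```python
import numpy as np

img = np.array([[100, 200, 250]], dtype=np.uint8)

# uint8 arithmetic wraps modulo 256: 200 + 100 becomes 44
wrapped = img + np.uint8(100)

# Widen, add, then clamp to 0..255, which mirrors the saturating behaviour of cv2.add
saturated = np.clip(img.astype(np.int16) + 100, 0, 255).astype(np.uint8)
```

When brightening or blending images with plain NumPy, widening the dtype before the operation and clipping afterwards avoids the wraparound.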

Multi-threading for Video

视频多线程处理

python
import cv2
from threading import Thread
from queue import Queue

class VideoCapture:
    """Threaded video capture for better performance."""
    
    def __init__(self, src):
        self.cap = cv2.VideoCapture(src)
        self.q = Queue(maxsize=128)
        self.stopped = False
        
    def start(self):
        Thread(target=self._reader, daemon=True).start()
        return self
        
    def _reader(self):
        while not self.stopped:
            ret, frame = self.cap.read()
            if not ret:
                self.stop()
                self.q.put(None)  # Sentinel so read() does not block forever at end of stream
                break
            self.q.put(frame)
                
    def read(self):
        return self.q.get()
    
    def stop(self):
        self.stopped = True
        self.cap.release()
python
import cv2
from threading import Thread
from queue import Queue

class VideoCapture:
    """多线程视频捕获以提升性能。"""
    
    def __init__(self, src):
        self.cap = cv2.VideoCapture(src)
        self.q = Queue(maxsize=128)
        self.stopped = False
        
    def start(self):
        Thread(target=self._reader, daemon=True).start()
        return self
        
    def _reader(self):
        while not self.stopped:
            ret, frame = self.cap.read()
            if not ret:
                self.stop()
                self.q.put(None)  # 哨兵值:避免read()在视频结束后永久阻塞
                break
            self.q.put(frame)
                
    def read(self):
        return self.q.get()
    
    def stop(self):
        self.stopped = True
        self.cap.release()

Usage

使用示例

cap = VideoCapture(0).start()
while True:
    frame = cap.read()
    if frame is None:  # Stop if no frame was delivered
        break
    # Process frame...
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.stop()
cap = VideoCapture(0).start()
while True:
    frame = cap.read()
    if frame is None:  # 未获取到帧时退出
        break
    # 处理帧...
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.stop()
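A threaded reader needs a way to tell the consumer that the stream has ended, otherwise a blocking `Queue.get()` waits forever on an empty queue. One common pattern is a sentinel value. The sketch below shows it with the standard library only; the `reader` function and the string "frames" are hypothetical stand-ins for a capture thread and decoded images.

```python
from queue import Queue
from threading import Thread

def reader(source, q):
    """Producer: push every item, then a None sentinel to mark end of stream."""
    for frame in source:
        q.put(frame)
    q.put(None)

frames = ["frame0", "frame1", "frame2"]   # hypothetical stand-ins for decoded frames
q = Queue(maxsize=2)                      # bounded, so the producer cannot run far ahead
Thread(target=reader, args=(frames, q), daemon=True).start()

received = []
while True:
    item = q.get()
    if item is None:      # sentinel reached: stop instead of blocking on an empty queue
        break
    received.append(item)
```

The bounded queue also applies back-pressure: when processing is slower than decoding, the producer blocks in `put` instead of filling memory with frames.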

Common Pitfalls and Solutions

常见陷阱与解决方案

The "BGR vs RGB" Color Confusion

“BGR vs RGB”颜色混淆

OpenCV stores color images in BGR channel order, whereas most other libraries (such as matplotlib and PIL) expect RGB.
python
OpenCV使用BGR通道顺序存储彩色图像,而大多数其他库(如matplotlib、PIL)使用RGB。
python

❌ Problem: Colors look wrong in matplotlib

❌ 问题:matplotlib中颜色显示错误

import matplotlib.pyplot as plt
img = cv2.imread('image.jpg')
plt.imshow(img)  # Blue and red are swapped!
import matplotlib.pyplot as plt
img = cv2.imread('image.jpg')
plt.imshow(img)  # 蓝色和红色颠倒!

✅ Solution: Convert to RGB

✅ 解决方案:转换为RGB

img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
plt.imshow(img_rgb)
img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
plt.imshow(img_rgb)

✅ Alternative: Use OpenCV's imshow

✅ 替代方案:使用OpenCV的imshow

cv2.imshow('Correct Colors', img)
cv2.waitKey(0)
cv2.imshow('颜色正确显示', img)
cv2.waitKey(0)
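Because an image is a NumPy array, the BGR to RGB conversion for a 3-channel image amounts to reversing the last axis. The sketch below demonstrates this on an invented two-pixel image; it should give the same pixel values as `cv2.cvtColor(img, cv2.COLOR_BGR2RGB)`, though it is shown here with NumPy only.

```python
import numpy as np

# Hypothetical 1x2 BGR image: one pure-blue pixel, one pure-red pixel
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

rgb = bgr[..., ::-1]               # reverse the channel axis: BGR -> RGB (a view)
rgb = np.ascontiguousarray(rgb)    # make a contiguous copy for consumers that need one
```

The slice alone is a view with a negative stride; the contiguous copy is worth making before handing the array back to OpenCV functions, which generally expect contiguous memory.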

The "Window Won't Close" Problem

“窗口无法关闭”问题

Without a waitKey call, OpenCV never processes window events, so the window appears blank or frozen.
python
如果不调用waitKey,OpenCV不会处理窗口事件,窗口会显示为空白或无响应。
python

❌ Problem: Window frozen

❌ 问题:窗口冻结

cv2.imshow('Image', img)
cv2.imshow('图像', img)

Program hangs!

程序挂起!

✅ Solution: Always use waitKey

✅ 解决方案:始终使用waitKey

cv2.imshow('Image', img)
cv2.waitKey(0)  # Wait for key press
cv2.destroyAllWindows()
cv2.imshow('图像', img)
cv2.waitKey(0)  # 等待按键输入
cv2.destroyAllWindows()

The "Video Capture Not Released" Problem

“视频捕获未释放”问题

The camera stays locked if the capture is not released properly.
python
如果未正确释放,摄像头会保持锁定状态。
python

❌ Problem: Camera locked after crash

❌ 问题:程序崩溃后摄像头仍被锁定

cap = cv2.VideoCapture(0)
cap = cv2.VideoCapture(0)

... code crashes ...

... 程序崩溃 ...

Camera still locked!

摄像头仍被占用!

✅ Solution: Use try-finally

✅ 解决方案:使用try-finally

cap = cv2.VideoCapture(0)
try:
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        # ... process ...
finally:
    cap.release()
    cv2.destroyAllWindows()
cap = cv2.VideoCapture(0)
try:
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        # ... 处理 ...
finally:
    cap.release()
    cv2.destroyAllWindows()

The "Image Modification" Confusion

“图像修改”混淆

Some operations modify in-place, others return new images.
python
有些操作会原地修改图像,有些则返回新图像。
python

In-place modification

原地修改

cv2.rectangle(img, (10, 10), (100, 100), (0, 255, 0), 2) # Modifies img
cv2.rectangle(img, (10, 10), (100, 100), (0, 255, 0), 2) # 修改原img

Returns new image

返回新图像

blurred = cv2.GaussianBlur(img, (5, 5), 0) # img unchanged
blurred = cv2.GaussianBlur(img, (5, 5), 0) # img保持不变

✅ Always use .copy() if you need original

✅ 若需要保留原图像,始终使用.copy()

img_copy = img.copy()
cv2.rectangle(img_copy, (10, 10), (100, 100), (0, 255, 0), 2)
img_copy = img.copy()
cv2.rectangle(img_copy, (10, 10), (100, 100), (0, 255, 0), 2)
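The same in-place behaviour applies to regions of interest: slicing a NumPy array returns a view, so drawing into a slice also changes the parent image. The Haar cascade example relies on this when it draws eye rectangles into `roi_color`. A small sketch with an invented 4x4 array:

```python
import numpy as np

img = np.zeros((4, 4), dtype=np.uint8)

roi_view = img[1:3, 1:3]          # slicing returns a view into img
roi_view[:] = 255                 # so this write also changes img

roi_copy = img[0:2, 0:2].copy()   # an independent copy of a region
roi_copy[:] = 7                   # img is unaffected by this write

shares = np.shares_memory(img, roi_view)
```

`np.shares_memory` is a quick way to check whether two arrays alias the same buffer when debugging unexpected modifications.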

The "Contour Hierarchy" Misunderstanding

“轮廓层级”误解

findContours returns different structures based on retrieval mode.
python
findContours会根据检索模式返回不同的结构。
python

External contours only (most common)

仅检索外部轮廓(最常用)

contours, hierarchy = cv2.findContours( thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE )
contours, hierarchy = cv2.findContours( thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE )

All contours with full hierarchy

检索所有轮廓及完整层级

contours, hierarchy = cv2.findContours( thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE )
contours, hierarchy = cv2.findContours( thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE )

⚠️ hierarchy structure: [Next, Previous, First_Child, Parent]

⚠️ 层级结构:[下一个轮廓, 上一个轮廓, 第一个子轮廓, 父轮廓]

Most use cases only need RETR_EXTERNAL

大多数场景只需使用RETR_EXTERNAL


OpenCV is the Swiss Army knife of computer vision. Its large library of optimized algorithms, combined with Python's ease of use, makes it well suited to everything from simple image processing to complex real-time vision systems. Master these fundamentals and you will have a solid foundation for most computer vision work.

OpenCV是计算机视觉领域的瑞士军刀。其庞大的优化算法库,结合Python的易用性,使其非常适合从简单图像处理到复杂实时视觉系统的各类开发。掌握这些基础知识,你将为大多数计算机视觉工作打下坚实的基础。