Matplotlib - Data Visualization
The most widely used Python library for 2D (and basic 3D) plotting. It gives you full control over every element of a figure, from line styles to axis spines. Typical uses:
- Creating publication-quality 2D plots (Line, Scatter, Bar, Hist)
- Visualizing scientific data (Heatmaps, Contours, Vector fields)
- Generating complex multi-panel figures
- Fine-tuning plots for papers/reports (LaTeX support)
- Building custom visualization tools and dashboards
- Plotting data directly from NumPy arrays or Pandas DataFrames
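For the last point, pandas' `DataFrame.plot` draws through Matplotlib and accepts an `ax=` argument, so DataFrame columns and plain NumPy arrays can share one OO Axes. A minimal sketch, assuming pandas is installed; the column names and data are made up for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Illustrative data: two signals indexed by time
t = np.linspace(0, 10, 100)
df = pd.DataFrame({"sin": np.sin(t), "cos": np.cos(t)}, index=t)

fig, ax = plt.subplots(figsize=(8, 5))
df.plot(ax=ax)  # one line per column, drawn on our Axes
ax.plot(t, np.sin(t) * np.cos(t), label="product")  # NumPy arrays on the same Axes
ax.set_xlabel("Time (s)")
ax.set_ylabel("Amplitude")
ax.legend()
plt.close(fig)
```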
Reference Documentation
Two Interfaces: Choose Wisely
| Interface | Method | Use Case |
|---|---|---|
| Object-Oriented (OO) | `fig, ax = plt.subplots()`, then `ax.plot(...)` | Recommended. Best for complex, reproducible plots. |
| Pyplot (State-based) | `plt.plot(...)` | Quick interactive checks. Avoid in scripts/modules. |
Use Matplotlib For
- Fine-grained control over figure layout.
- Precise styling for publication.
- Embedding plots in GUI applications.
Do NOT Use For
- Interactive web dashboards (use Plotly or Bokeh).
- Rapid statistical exploration (use Seaborn — it's built on Matplotlib but simpler for stats).
- Very large datasets (>1M points) in real-time (use Datashader or VisPy).
```bash
pip install matplotlib
```
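A quick way to confirm the installation works, including on a machine without a display, is to render a tiny figure to an in-memory PNG. A minimal sketch; the printed version string depends on what pip installed:

```python
import io

import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt

print(matplotlib.__version__)  # several features in this guide depend on the version

fig, ax = plt.subplots()
ax.plot([0, 1], [0, 1])
buf = io.BytesIO()
fig.savefig(buf, format="png")  # write to memory instead of a file
plt.close(fig)
png_bytes = buf.getvalue()
```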
```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib import gridspec
```
Basic Pattern - The OO Interface (The "Proper" Way)
```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
y = np.sin(x)

# 1. Create Figure and Axis objects
fig, ax = plt.subplots(figsize=(8, 5))

# 2. Plot the data on the Axes
ax.plot(x, y, label='Sine Wave', color='tab:blue', linewidth=2)

# 3. Label and style the Axes
ax.set_xlabel('Time (s)')
ax.set_ylabel('Amplitude')
ax.set_title('Oscillation Example')
ax.legend()
ax.grid(True, linestyle='--')

# 4. Save; bbox_inches='tight' keeps labels from being cut off
# (PDF is vector output, so dpi only affects rasterized elements)
fig.savefig('plot.pdf', dpi=300, bbox_inches='tight')
```
Best Practices
- Use the OO interface (`fig, ax = plt.subplots()`) - It prevents errors in multi-plot scripts.
- Use `bbox_inches='tight'` when saving - It ensures labels aren't cut off.
- Set `dpi` - Use 300+ for print, 72-100 for web.
- Close figures - Call `plt.close(fig)` in loops to avoid memory leaks.
- Label everything - Every axis must have a label and units.
- Vector formats - Save as `.pdf` or `.svg` for academic papers (lossless scaling).
- Colorblind-friendly colormaps - Use perceptually uniform maps such as `viridis`, `plasma`, or `inferno`.
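The "close figures" and "vector formats" points combine naturally in a batch loop. A minimal sketch, assuming the figures go to a throwaway temporary directory (the file names are hypothetical):

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # batch output, no window needed
import matplotlib.pyplot as plt
import numpy as np

out_dir = tempfile.mkdtemp()
x = np.linspace(0, 10, 100)
saved = []
for k in range(1, 4):
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.plot(x, np.sin(k * x), label=f"sin({k}x)")
    ax.set_xlabel("Time (s)")
    ax.set_ylabel("Amplitude (a.u.)")
    ax.legend()
    path = os.path.join(out_dir, f"sine_{k}.pdf")
    fig.savefig(path, bbox_inches="tight")  # vector output scales without loss
    plt.close(fig)  # free the figure before the next iteration
    saved.append(path)

open_figures = plt.get_fignums()  # figure numbers still held by pyplot
```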
Avoid
- Mixing `plt.*` and `ax.*` calls - It leads to "hidden state" bugs.
- Calling `plt.show()` in loops - It blocks execution; use `plt.pause()` for live updates, or `fig.savefig()` plus `plt.close(fig)` for batch output, instead.
- Manual legend placement - Let `ax.legend()` (default `loc='best'`) try first.
- Hardcoding font sizes - Use `plt.rcParams.update({'font.size': 12})` for consistency.
- Using "Rainbow" (`jet`) - It creates false gradients; use perceptually uniform maps like `viridis` or `plasma`.
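As a scoped alternative to a global `plt.rcParams.update(...)`, `plt.rc_context` applies settings only inside a `with` block and restores the previous values afterwards. A minimal sketch; the sizes are illustrative:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt

default_size = plt.rcParams["font.size"]

# Settings apply only inside the with-block
with plt.rc_context({"font.size": 12, "axes.labelsize": 12}):
    size_inside = plt.rcParams["font.size"]
    fig, ax = plt.subplots()  # Axes created here pick up the temporary sizes
    ax.plot([0, 1], [0, 1], label="demo")
    ax.set_xlabel("Distance (m)")
    ax.legend()  # no loc given: placement falls back to the 'best' default
    label_size = ax.xaxis.label.get_fontsize()
    plt.close(fig)

size_after = plt.rcParams["font.size"]  # restored to the previous value
```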
Anti-Patterns (NEVER)
```python
# ❌ BAD: Mixing interfaces (State-based + OO)
plt.figure()
ax = plt.gca()
plt.plot(x, y)  # Confusing state: targets whatever the "current" Axes is
ax.set_title('Test')

# ✅ GOOD: Consistent OO interface
fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_title('Test')
```
```python
# ❌ BAD: Overlapping subplots
fig, axs = plt.subplots(2, 2)
# Plots look squashed and titles overlap

# ✅ GOOD: Use constrained_layout or tight_layout
fig, axs = plt.subplots(2, 2, constrained_layout=True)
# Newer Matplotlib versions also accept layout='constrained';
# alternatively call fig.tight_layout() after adding all content
```
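One way to keep the OO style in reusable code is to have helper functions take the target `Axes` as a parameter instead of relying on `plt.gca()`. A minimal sketch; `plot_signal` is a hypothetical helper name:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np


def plot_signal(ax, t, y, **line_kwargs):
    """Hypothetical helper: draws on the Axes it is given, never on global state."""
    (line,) = ax.plot(t, y, **line_kwargs)
    ax.set_xlabel("Time (s)")
    ax.set_ylabel("Amplitude")
    return line


t = np.linspace(0, 1, 200)
fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(8, 3), constrained_layout=True)
plot_signal(ax_left, t, np.sin(2 * np.pi * 5 * t), color="tab:blue")
plot_signal(ax_right, t, np.cos(2 * np.pi * 5 * t), color="tab:orange")
ax_left.set_title("sin")
ax_right.set_title("cos")
plt.close(fig)
```

Because the helper never touches pyplot's "current" Axes, the same function works for any panel of any figure.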
Labels, Ticks, and Styles
```python
# Reuses x, y from the basic example above
fig, ax = plt.subplots()
ax.plot(x, y, 'o-', color='red', markersize=4, alpha=0.7)

# Explicitly setting limits
ax.set_xlim(0, 10)
ax.set_ylim(-1.5, 1.5)

# Custom tick positions and labels
ax.set_xticks([0, 2.5, 5, 7.5, 10])
ax.set_xticklabels(['Start', '1/4', 'Mid', '3/4', 'End'])

# Spines (box around the plot)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)

# Adding text and arrows (sin(x) peaks at x = pi/2)
ax.annotate('Local Max', xy=(np.pi / 2, 1), xytext=(3, 1.2),
            arrowprops=dict(facecolor='black', shrink=0.05))
```
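Hardcoded `set_xticks`/`set_xticklabels` pairs go stale when the limits change. A locator/formatter pair from `matplotlib.ticker` recomputes ticks automatically. A minimal sketch; the 2.5-unit spacing and the " s" suffix are illustrative choices:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import FuncFormatter, MultipleLocator

x = np.linspace(0, 10, 200)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x))
ax.set_xlim(0, 10)

ax.xaxis.set_major_locator(MultipleLocator(2.5))  # major tick every 2.5 units
ax.xaxis.set_minor_locator(MultipleLocator(0.5))  # unlabeled minor ticks
# Format each major tick value with a unit suffix; pos is the tick index (unused)
ax.xaxis.set_major_formatter(FuncFormatter(lambda val, pos: f"{val:g} s"))
plt.close(fig)
```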
Subplots and GridSpec
```python
fig, axs = plt.subplots(2, 2, figsize=(10, 10))
axs[0, 0].plot(x, y)     # Top left
axs[1, 1].scatter(x, y)  # Bottom right

# Complex grid (uneven sizes)
fig = plt.figure(figsize=(10, 6))
gs = gridspec.GridSpec(2, 2, width_ratios=[2, 1], height_ratios=[1, 2])
ax1 = fig.add_subplot(gs[0, 0])  # Top left (wider column)
ax2 = fig.add_subplot(gs[0, 1])  # Top right
ax3 = fig.add_subplot(gs[1, :])  # Bottom row, spanning all columns
```
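The same uneven layout can also be described by name with `plt.subplot_mosaic` (available since Matplotlib 3.3), which returns a dict of Axes keyed by the labels in the layout. A minimal sketch; the panel names are hypothetical:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
# Each inner list is a row; repeating a name makes that Axes span several cells
fig, axd = plt.subplot_mosaic(
    [["top_left", "top_right"],
     ["bottom", "bottom"]],
    figsize=(10, 6),
    gridspec_kw={"width_ratios": [2, 1], "height_ratios": [1, 2]},
)
axd["top_left"].plot(x, np.sin(x))
axd["top_right"].scatter(x[::10], np.cos(x[::10]))
axd["bottom"].plot(x, np.sin(x) * np.exp(-x / 5))
plt.close(fig)
```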
Scientific Plot Types
Heatmaps and Colorbars
```python
data = np.random.rand(10, 10)
fig, ax = plt.subplots()
im = ax.imshow(data, cmap='viridis', interpolation='nearest')
cbar = fig.colorbar(im, ax=ax, label='Intensity [a.u.]')

# Proper alignment of colorbar: a colorbar axes exactly as tall as the image
# (use this instead of fig.colorbar(im, ax=ax), not in addition to it)
from mpl_toolkits.axes_grid1 import make_axes_locatable
fig, ax = plt.subplots()
im = ax.imshow(data, cmap='viridis', interpolation='nearest')
divider = make_axes_locatable(ax)
cax = divider.append_axes("right", size="5%", pad=0.05)
fig.colorbar(im, cax=cax, label='Intensity [a.u.]')
```
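By default `imshow` places element `[0, 0]` at the top left and labels the axes with array indices. For gridded physical data, `origin='lower'` and `extent` map the array onto real coordinates. A minimal sketch; the 0-10 m by 0-5 m domain is illustrative:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

# Illustrative field sampled on a 50 x 100 grid (rows = y, columns = x)
xs = np.linspace(0, 10, 100)
ys = np.linspace(0, 5, 50)
X, Y = np.meshgrid(xs, ys)
field = np.sin(X) * np.cos(Y)

fig, ax = plt.subplots()
im = ax.imshow(field, origin="lower", extent=(0, 10, 0, 5),  # (left, right, bottom, top)
               cmap="viridis", aspect="auto")
ax.set_xlabel("x [m]")
ax.set_ylabel("y [m]")
fig.colorbar(im, ax=ax, label="Amplitude [a.u.]")
plt.close(fig)
```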
Histograms and Error Bars
```python
data = np.random.normal(0, 1, 1000)
fig, ax = plt.subplots()
ax.hist(data, bins=30, density=True, alpha=0.6, color='g', edgecolor='black')

x = np.arange(10)
y = x**2
yerr = np.sqrt(y)
fig, ax = plt.subplots()
ax.errorbar(x, y, yerr=yerr, fmt='o', capsize=5, label='Data with noise')
ax.legend()
```
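`ax.hist` also returns the bin heights and edges, which makes it easy to check what `density=True` means: the bar areas sum to 1, so the histogram can be overlaid on a probability density. A minimal sketch with a seeded generator; the normal-PDF overlay is an illustrative addition:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # seeded so the figure is reproducible
data = rng.normal(0, 1, 1000)

fig, ax = plt.subplots()
counts, edges, patches = ax.hist(data, bins=30, density=True, alpha=0.6,
                                 color="g", edgecolor="black", label="Samples")
# With density=True, height * bin width sums to 1 (area under the histogram)
area = np.sum(counts * np.diff(edges))

# Overlay the standard normal PDF for comparison
grid = np.linspace(edges[0], edges[-1], 200)
ax.plot(grid, np.exp(-grid**2 / 2) / np.sqrt(2 * np.pi), "k--", label="N(0, 1) PDF")
ax.legend()
plt.close(fig)
```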
3D Surface Plots
```python
# Registers the '3d' projection; only required on Matplotlib < 3.2
from mpl_toolkits.mplot3d import Axes3D
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
X = np.arange(-5, 5, 0.25)
Y = np.arange(-5, 5, 0.25)
X, Y = np.meshgrid(X, Y)
R = np.sqrt(X**2 + Y**2)
Z = np.sin(R)
surf = ax.plot_surface(X, Y, Z, cmap='coolwarm', linewidth=0, antialiased=False)
fig.colorbar(surf, shrink=0.5, aspect=5)
```
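The camera angle and the z-axis label are set on the 3D Axes itself. A minimal sketch on the same surface, drawn as a wireframe; the elevation/azimuth values are arbitrary examples:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

X, Y = np.meshgrid(np.arange(-5, 5, 0.25), np.arange(-5, 5, 0.25))
Z = np.sin(np.sqrt(X**2 + Y**2))

fig = plt.figure()
ax = fig.add_subplot(111, projection="3d")
ax.plot_wireframe(X, Y, Z, rstride=4, cstride=4, linewidth=0.5)  # grid lines only
ax.set_xlabel("X")
ax.set_ylabel("Y")
ax.set_zlabel("Z")
ax.view_init(elev=25, azim=45)  # camera elevation and azimuth in degrees
plt.close(fig)
```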
Formatting for Publication
Using LaTeX and RcParams
```python
# or 'ggplot', 'bmh'; on Matplotlib < 3.6 this style is named 'seaborn-paper'
plt.style.use('seaborn-v0_8-paper')
# text.usetex=True requires a working LaTeX installation on the system
plt.rcParams.update({
    "text.usetex": True,
    "font.family": "serif",
    "font.serif": ["Computer Modern Roman"],
    "axes.labelsize": 14,
})
fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_xlabel(r'$\alpha_{i} + \beta \sin(\omega t)$')  # LaTeX string
```
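If no LaTeX installation is available, Matplotlib's built-in mathtext parser renders the same `$...$` strings, and setting `mathtext.fontset` to `'cm'` gives a Computer Modern look. A minimal sketch; the 3.5 x 2.5 in figure size is an assumed single-column size, so check the target journal's guidelines:

```python
import io

import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
label = r"$\alpha_{i} + \beta \sin(\omega t)$"

# No LaTeX needed: usetex stays False and mathtext renders the math
with plt.rc_context({"text.usetex": False, "mathtext.fontset": "cm",
                     "font.family": "serif", "axes.labelsize": 10}):
    fig, ax = plt.subplots(figsize=(3.5, 2.5))  # assumed single-column size in inches
    ax.plot(x, np.sin(x))
    ax.set_xlabel(label)
    buf = io.BytesIO()
    fig.savefig(buf, format="pdf", bbox_inches="tight")  # drawing parses the math text
    plt.close(fig)
pdf_bytes = buf.getvalue()
```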
1. Multi-dataset Comparison Workflow
```python
def plot_comparison(datasets, labels):
    """Plot each dataset's mean curve with a shaded +/- std band.

    Each dataset is a mapping with NumPy-array entries 'x', 'y', and 'std'.
    """
    fig, ax = plt.subplots(figsize=(10, 6))
    colors = plt.cm.viridis(np.linspace(0, 1, len(datasets)))  # one color per dataset
    for data, label, color in zip(datasets, labels, colors):
        ax.plot(data['x'], data['y'], label=label, color=color, lw=1.5)
        ax.fill_between(data['x'], data['y'] - data['std'], data['y'] + data['std'],
                        alpha=0.2, color=color)
    ax.set_title('Experiment Results Comparison')
    ax.legend(frameon=False)
    return fig, ax
```
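`plot_comparison` expects each dataset as a mapping with `'x'`, `'y'`, and `'std'` arrays. One way to build that input from repeated measurements is sketched below; `summarize_trials` is a hypothetical helper, it uses the sample standard deviation (`ddof=1`), and the synthetic trials are made up for illustration:

```python
import numpy as np


def summarize_trials(x, trials):
    """Hypothetical helper: collapse repeated runs (n_trials x n_points) into mean and std."""
    trials = np.asarray(trials, dtype=float)
    return {"x": np.asarray(x), "y": trials.mean(axis=0), "std": trials.std(axis=0, ddof=1)}


rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
# Two synthetic conditions, five noisy runs each
runs_a = np.sin(x) + rng.normal(0, 0.1, size=(5, x.size))
runs_b = np.cos(x) + rng.normal(0, 0.1, size=(5, x.size))
datasets = [summarize_trials(x, runs_a), summarize_trials(x, runs_b)]
labels = ["Condition A", "Condition B"]
# fig, ax = plot_comparison(datasets, labels)
```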
2. Monitoring Real-time Data (Interactive)
Use this with an interactive GUI backend: in a script, or in Jupyter with an interactive backend (the default inline backend does not update a figure in place).
```python
plt.ion()  # Interactive mode on
fig, ax = plt.subplots()
line, = ax.plot([], [])
for i in range(100):
    new_data = np.random.rand(10)
    line.set_data(np.arange(len(new_data)), new_data)
    ax.relim()           # recompute data limits from the updated line
    ax.autoscale_view()  # rescale the axes to the new limits
    fig.canvas.draw()
    fig.canvas.flush_events()
    plt.pause(0.1)       # also lets the GUI event loop run
```
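For a fixed number of updates, or to record the output, `matplotlib.animation.FuncAnimation` is an alternative to a manual `plt.pause` loop, and it also works headless. A minimal sketch that writes a GIF with the Pillow writer (Pillow is a Matplotlib dependency); the output path is a throwaway temporary file:

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # no display needed when saving to a file
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.animation import FuncAnimation, PillowWriter

rng = np.random.default_rng(0)
fig, ax = plt.subplots()
(line,) = ax.plot([], [])
ax.set_xlim(0, 9)  # fixed limits: no relim/autoscale needed per frame
ax.set_ylim(0, 1)
frames_seen = []


def update(frame):
    """Called once per frame with the frame number; returns the artists it changed."""
    new_data = rng.random(10)
    line.set_data(np.arange(len(new_data)), new_data)
    frames_seen.append(frame)
    return (line,)


anim = FuncAnimation(fig, update, frames=20, interval=100, blit=True)
out_path = os.path.join(tempfile.mkdtemp(), "monitor.gif")
anim.save(out_path, writer=PillowWriter(fps=10))
plt.close(fig)
```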
3. Creating a Correlation Matrix Heatmap
```python
import pandas as pd
df = pd.DataFrame(np.random.rand(10, 4), columns=['A', 'B', 'C', 'D'])
corr = df.corr()
fig, ax = plt.subplots()
im = ax.imshow(corr, cmap='RdBu_r', vmin=-1, vmax=1)  # diverging map centred on 0
# The labels= keyword of set_xticks/set_yticks needs Matplotlib >= 3.5
ax.set_xticks(np.arange(len(corr.columns)), labels=corr.columns)
ax.set_yticks(np.arange(len(corr.index)), labels=corr.index)
# Loop over data dimensions and create text annotations
for i in range(len(corr.index)):
    for j in range(len(corr.columns)):
        ax.text(j, i, f"{corr.iloc[i, j]:.2f}",
                ha="center", va="center", color="black")
```
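Black annotation text becomes hard to read on the darkest cells of a diverging map. A minimal sketch that picks the text color from the magnitude of each value and adds a colorbar; the 0.5 threshold is an arbitrary choice:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((10, 4)), columns=["A", "B", "C", "D"])
corr = df.corr()

fig, ax = plt.subplots()
im = ax.imshow(corr.to_numpy(), cmap="RdBu_r", vmin=-1, vmax=1)
ax.set_xticks(np.arange(len(corr.columns)))
ax.set_xticklabels(corr.columns)
ax.set_yticks(np.arange(len(corr.index)))
ax.set_yticklabels(corr.index)
fig.colorbar(im, ax=ax, label="Pearson r")

for i in range(corr.shape[0]):
    for j in range(corr.shape[1]):
        value = corr.iloc[i, j]
        # Strong correlations sit on dark cells at both ends of RdBu_r
        color = "white" if abs(value) > 0.5 else "black"
        ax.text(j, i, f"{value:.2f}", ha="center", va="center", color=color)
plt.close(fig)
```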
Performance Optimization
Plotting Large Data
```python
# x, y: large 1-D arrays; ax: an existing Axes

# 1. Use the 'Agg' backend for non-interactive rendering
#    (call matplotlib.use() before importing pyplot)
import matplotlib
matplotlib.use('Agg')

# 2. Scatter plots with many points: ax.scatter builds a PathCollection with
#    per-point sizes/colors, which is slow for ~1M points
ax.scatter(x, y, s=1)  # slow for 1M points
ax.plot(x, y, ',')     # faster when all markers share size and color: pixel markers, no line

# 3. Draw lines only, with no markers, for speed
ax.plot(x, y, marker='')

# 4. Decimate data before plotting
ax.plot(x[::10], y[::10])  # Plot every 10th point
```
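Two further options when a dense line must stay in the figure: the path-simplification rcParams (`path.simplify`, `path.simplify_threshold`) and `agg.path.chunksize` reduce rendering work, and `rasterized=True` embeds a dense artist as a bitmap inside otherwise vector PDF/SVG output, keeping file size manageable. A minimal sketch; the threshold and chunk size are illustrative values, not tuned recommendations:

```python
import io

import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x = np.arange(200_000)
y = np.cumsum(rng.normal(size=x.size))  # random walk: a dense, noisy line

with plt.rc_context({
    "path.simplify": True,           # remove vertices that barely change the drawn line
    "path.simplify_threshold": 1.0,  # similarity threshold in [0, 1]; higher removes more
    "agg.path.chunksize": 10_000,    # Agg renders long paths in chunks (0 disables)
}):
    fig, ax = plt.subplots()
    (line,) = ax.plot(x, y, linewidth=0.5, rasterized=True)  # bitmap inside the PDF
    buf = io.BytesIO()
    fig.savefig(buf, format="pdf", dpi=200)  # dpi sets the resolution of rasterized parts
    plt.close(fig)
pdf_bytes = buf.getvalue()
```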
Common Pitfalls and Solutions
Date/Time Axis issues
❌ Problem: Date tick labels overlap into an unreadable black blob
✅ Solution: Set a date locator and formatter from `matplotlib.dates` (e.g. `MonthLocator` with `DateFormatter`, or `AutoDateLocator` with `AutoDateFormatter`)
```python
import matplotlib.dates as mdates
# Illustrative data: one value per day for a year
dates = pd.date_range('2023-01-01', periods=365, freq='D')
values = np.random.randn(365).cumsum()
fig, ax = plt.subplots()
ax.plot(dates, values)
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m'))
fig.autofmt_xdate()  # Rotates labels
```
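On Matplotlib 3.1 and later, `AutoDateLocator` paired with `ConciseDateFormatter` chooses the tick spacing automatically and writes compact labels (the shared year or month appears once as an offset), often without needing rotated labels. A minimal sketch with illustrative daily data:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
dates = pd.date_range("2023-01-01", periods=365, freq="D")
values = rng.normal(size=dates.size).cumsum()

fig, ax = plt.subplots(figsize=(8, 3))
ax.plot(dates, values)
locator = mdates.AutoDateLocator(minticks=4, maxticks=10)
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(mdates.ConciseDateFormatter(locator))
fig.canvas.draw()  # tick labels are computed at draw time
labels = [t.get_text() for t in ax.get_xticklabels()]
plt.close(fig)
```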
Multiple Legends on one plot
❌ Problem: Calling `ax.legend()` twice replaces the first legend
✅ Solution: Add the first legend back as an artist before creating the second
```python
fig, ax = plt.subplots()
line1, = ax.plot([1, 2], [1, 2], label='Line 1')
line2, = ax.plot([1, 2], [2, 1], label='Line 2')
first_legend = ax.legend(handles=[line1], loc='upper left')
ax.add_artist(first_legend)  # Keep the first legend on the Axes
ax.legend(handles=[line2], loc='lower right')
```
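A related case is a legend for categories that are not drawn as separate artists, e.g. one legend for colors (samples) and one for line styles (model vs. measurement). Proxy artists, here empty `Line2D` objects, supply the legend handles. A minimal sketch with illustrative data:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.lines import Line2D

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
colors = {"Sample 1": "tab:blue", "Sample 2": "tab:orange"}

fig, ax = plt.subplots()
for k, (name, color) in enumerate(colors.items(), start=1):
    model = np.sin(k * x)
    ax.plot(x, model, color=color, linestyle="-")  # model
    ax.plot(x, model + rng.normal(0, 0.1, x.size), color=color, linestyle="--")  # measurement

# Proxy artists: empty lines that exist only to provide legend handles
color_handles = [Line2D([], [], color=c, label=n) for n, c in colors.items()]
style_handles = [Line2D([], [], color="gray", linestyle="-", label="Model"),
                 Line2D([], [], color="gray", linestyle="--", label="Measured")]

first = ax.legend(handles=color_handles, loc="upper left")
ax.add_artist(first)  # keep the first legend when the second is created
second = ax.legend(handles=style_handles, loc="lower right")
plt.close(fig)
```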
Image Saving Quality (Clipping)
❌ Problem: Legend or axis title is cut off in the .png file
✅ Solution: Save with `bbox_inches='tight'`
```python
fig.savefig('output.png', bbox_inches='tight')
```
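The tight bounding box matters most when an artist deliberately sits outside the Axes, such as a legend placed to the right with `bbox_to_anchor`. A minimal sketch; the anchor coordinates and labels are illustrative:

```python
import io

import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
fig, ax = plt.subplots(figsize=(6, 4), dpi=100)
ax.plot(x, np.sin(x), label="Measured signal from sensor A")
ax.plot(x, np.cos(x), label="Reference signal from sensor B")
# Anchor the legend's upper-left corner just right of the Axes (axes coordinates)
leg = ax.legend(loc="upper left", bbox_to_anchor=(1.02, 1.0), borderaxespad=0)

fig.canvas.draw()
legend_right_px = leg.get_window_extent().x1  # right edge of the legend, display pixels
figure_width_px = fig.bbox.width              # 600 px at this size and dpi

buf = io.BytesIO()
# Without bbox_inches='tight' the part of the legend past the figure edge is cut off
fig.savefig(buf, format="png", bbox_inches="tight")
plt.close(fig)
png_bytes = buf.getvalue()
```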
Best Practices Summary
- Always use the OO interface (`fig, ax = plt.subplots()`) for scripts and modules
- Save figures in appropriate formats - Use PDF/SVG for publications, PNG for web
- Set DPI appropriately - 300+ for print, 72-100 for screen
- Use `bbox_inches='tight'` when saving to prevent clipping
- Close figures (`plt.close(fig)`) in loops to prevent memory leaks
- Use colorblind-friendly colormaps - Avoid 'jet', prefer 'viridis', 'plasma', 'inferno'
- Label all axes with descriptive names and units
- Use `constrained_layout=True` (or `fig.tight_layout()`) for subplots to prevent overlap
- Configure global styles with `plt.rcParams` (or `plt.style.use`) for consistency
- Test plots at target resolution before finalizing