data-visualization
Data Visualization Skill
Chart selection guidance, Python visualization code patterns, design principles, and accessibility considerations for creating effective data visualizations.
Chart Selection Guide
Choose by Data Relationship
| What You're Showing | Best Chart | Alternatives |
|---|---|---|
| Trend over time | Line chart | Area chart (if showing cumulative or composition) |
| Comparison across categories | Vertical bar chart | Horizontal bar (many categories), lollipop chart |
| Ranking | Horizontal bar chart | Dot plot, slope chart (comparing two periods) |
| Part-to-whole composition | Stacked bar chart | Treemap (hierarchical), waffle chart |
| Composition over time | Stacked area chart | 100% stacked bar (for proportion focus) |
| Distribution | Histogram | Box plot (comparing groups), violin plot, strip plot |
| Correlation (2 variables) | Scatter plot | Bubble chart (add 3rd variable as size) |
| Correlation (many variables) | Heatmap (correlation matrix) | Pair plot |
| Geographic patterns | Choropleth map | Bubble map, hex map |
| Flow / process | Sankey diagram | Funnel chart (sequential stages) |
| Relationship network | Network graph | Chord diagram |
| Performance vs. target | Bullet chart | Gauge (single KPI only) |
| Multiple KPIs at once | Small multiples | Dashboard with separate charts |
When NOT to Use Certain Charts
- Pie charts: Avoid unless there are fewer than 6 categories and a rough comparison matters more than exact proportions. Humans are bad at comparing angles. Use bar charts instead.
- 3D charts: Never. They distort perception and add no information.
- Dual-axis charts: Use cautiously. They can mislead by implying correlation. Clearly label both axes if used.
- Stacked bar (many categories): Hard to compare middle segments. Use small multiples or grouped bars instead.
- Donut charts: Slightly better than pie charts but same fundamental issues. Use for single KPI display at most.
Python Visualization Code Patterns
Setup and Style
```python
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
import seaborn as sns
import pandas as pd
import numpy as np

# Professional style setup
plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams.update({
    'figure.figsize': (10, 6),
    'figure.dpi': 150,
    'font.size': 11,
    'axes.titlesize': 14,
    'axes.titleweight': 'bold',
    'axes.labelsize': 11,
    'xtick.labelsize': 10,
    'ytick.labelsize': 10,
    'legend.fontsize': 10,
    'figure.titlesize': 16,
})

# Colorblind-friendly palettes
# Note: these categorical hex values match seaborn's default 'deep' palette,
# which includes a red/green pair. Combine them with line styles or labels,
# or use sns.color_palette('colorblind') when color must carry the distinction.
PALETTE_CATEGORICAL = ['#4C72B0', '#DD8452', '#55A868', '#C44E52', '#8172B3', '#937860']
PALETTE_SEQUENTIAL = 'YlOrRd'
PALETTE_DIVERGING = 'RdBu_r'
```
Line Chart (Time Series)
```python
# Assumes df is a pandas DataFrame with 'date', 'category', and 'value' columns
fig, ax = plt.subplots(figsize=(10, 6))

for label, group in df.groupby('category'):
    ax.plot(group['date'], group['value'], label=label, linewidth=2)

ax.set_title('Metric Trend by Category', fontweight='bold')
ax.set_xlabel('Date')
ax.set_ylabel('Value')
ax.legend(loc='upper left', frameon=True)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)

# Format dates on x-axis
fig.autofmt_xdate()

plt.tight_layout()
plt.savefig('trend_chart.png', dpi=150, bbox_inches='tight')
```
Bar Chart (Comparison)
```python
# Assumes df is a pandas DataFrame with 'category' and 'metric' columns
fig, ax = plt.subplots(figsize=(10, 6))

# Sort by value for easy reading
df_sorted = df.sort_values('metric', ascending=True)
bars = ax.barh(df_sorted['category'], df_sorted['metric'], color=PALETTE_CATEGORICAL[0])

# Add value labels (the 0.5 offset is in data units; scale it to the data range)
for bar in bars:
    width = bar.get_width()
    ax.text(width + 0.5, bar.get_y() + bar.get_height()/2,
            f'{width:,.0f}', ha='left', va='center', fontsize=10)

ax.set_title('Metric by Category (Ranked)', fontweight='bold')
ax.set_xlabel('Metric Value')
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)

plt.tight_layout()
plt.savefig('bar_chart.png', dpi=150, bbox_inches='tight')
```
Histogram (Distribution)
```python
# Assumes df is a pandas DataFrame with a numeric 'value' column
fig, ax = plt.subplots(figsize=(10, 6))

ax.hist(df['value'], bins=30, color=PALETTE_CATEGORICAL[0], edgecolor='white', alpha=0.8)

# Add mean and median lines
mean_val = df['value'].mean()
median_val = df['value'].median()
ax.axvline(mean_val, color='red', linestyle='--', linewidth=1.5, label=f'Mean: {mean_val:,.1f}')
ax.axvline(median_val, color='green', linestyle='--', linewidth=1.5, label=f'Median: {median_val:,.1f}')

ax.set_title('Distribution of Values', fontweight='bold')
ax.set_xlabel('Value')
ax.set_ylabel('Frequency')
ax.legend()
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)

plt.tight_layout()
plt.savefig('histogram.png', dpi=150, bbox_inches='tight')
```
Heatmap
```python
# Assumes df is a pandas DataFrame with 'row_dim', 'col_dim', and 'metric' columns
fig, ax = plt.subplots(figsize=(10, 8))

# Pivot data for heatmap format
pivot = df.pivot_table(index='row_dim', columns='col_dim', values='metric', aggfunc='sum')

sns.heatmap(pivot, annot=True, fmt=',.0f', cmap='YlOrRd',
            linewidths=0.5, ax=ax, cbar_kws={'label': 'Metric Value'})

ax.set_title('Metric by Row Dimension and Column Dimension', fontweight='bold')
ax.set_xlabel('Column Dimension')
ax.set_ylabel('Row Dimension')

plt.tight_layout()
plt.savefig('heatmap.png', dpi=150, bbox_inches='tight')
```
Small Multiples
```python
# Assumes df is a pandas DataFrame with 'date', 'category', and 'value' columns
categories = df['category'].unique()
n_cats = len(categories)
n_cols = min(3, n_cats)
n_rows = (n_cats + n_cols - 1) // n_cols

fig, axes = plt.subplots(n_rows, n_cols, figsize=(5*n_cols, 4*n_rows), sharex=True, sharey=True)
axes = axes.flatten() if n_cats > 1 else [axes]

for i, cat in enumerate(categories):
    ax = axes[i]
    subset = df[df['category'] == cat]
    ax.plot(subset['date'], subset['value'], color=PALETTE_CATEGORICAL[i % len(PALETTE_CATEGORICAL)])
    ax.set_title(cat, fontsize=12)
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)

# Hide empty subplots
for j in range(i+1, len(axes)):
    axes[j].set_visible(False)

fig.suptitle('Trends by Category', fontsize=14, fontweight='bold', y=1.02)
plt.tight_layout()
plt.savefig('small_multiples.png', dpi=150, bbox_inches='tight')
```
Number Formatting Helpers
```python
def format_number(val, format_type='number'):
    """Format numbers for chart labels."""
    if format_type == 'currency':
        if abs(val) >= 1e9:
            return f'${val/1e9:.1f}B'
        elif abs(val) >= 1e6:
            return f'${val/1e6:.1f}M'
        elif abs(val) >= 1e3:
            return f'${val/1e3:.1f}K'
        else:
            return f'${val:,.0f}'
    elif format_type == 'percent':
        return f'{val:.1f}%'
    elif format_type == 'number':
        if abs(val) >= 1e9:
            return f'{val/1e9:.1f}B'
        elif abs(val) >= 1e6:
            return f'{val/1e6:.1f}M'
        elif abs(val) >= 1e3:
            return f'{val/1e3:.1f}K'
        else:
            return f'{val:,.0f}'
    return str(val)

# Usage with axis formatter
ax.yaxis.set_major_formatter(mticker.FuncFormatter(lambda x, p: format_number(x, 'currency')))
```
Interactive Charts with Plotly
```python
import plotly.express as px
import plotly.graph_objects as go

# Simple interactive line chart
fig = px.line(df, x='date', y='value', color='category',
              title='Interactive Metric Trend',
              labels={'value': 'Metric Value', 'date': 'Date'})
fig.update_layout(hovermode='x unified')
fig.write_html('interactive_chart.html')
fig.show()

# Interactive scatter with hover data
fig = px.scatter(df, x='metric_a', y='metric_b', color='category',
                 size='size_metric', hover_data=['name', 'detail_field'],
                 title='Correlation Analysis')
fig.show()
```
Design Principles
Color
- Use color purposefully: Color should encode data, not decorate
- Highlight the story: Use a bright accent color for the key insight; grey everything else
- Sequential data: Use a single-hue gradient (light to dark) for ordered values
- Diverging data: Use a two-hue gradient with neutral midpoint for data with a meaningful center
- Categorical data: Use distinct hues, maximum 6-8 before it gets confusing
- Avoid red/green only: Roughly 8% of men have some form of red-green color vision deficiency. Use blue/orange as the primary pair
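
The "highlight the story" guideline can be sketched as a small helper that gives one accent color to the category of interest and a neutral grey to everything else. The helper name `highlight_colors`, the sample data, and the two hex values are illustrative assumptions, not part of the original skill.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

ACCENT = "#DD8452"  # assumed accent color (orange)
MUTED = "#B0B0B0"   # assumed neutral grey for context bars


def highlight_colors(labels, focus, accent=ACCENT, muted=MUTED):
    """Return one color per label: accent for the focus label, grey otherwise."""
    return [accent if label == focus else muted for label in labels]


# Hypothetical data, already sorted by value
labels = ["North", "South", "East", "West"]
values = [120, 95, 80, 60]

colors = highlight_colors(labels, focus="South")

fig, ax = plt.subplots(figsize=(8, 4))
ax.barh(labels, values, color=colors)
ax.invert_yaxis()  # largest value at the top
ax.set_title("South ranks second among four regions")
ax.set_xlabel("Units sold")
for side in ("top", "right"):
    ax.spines[side].set_visible(False)
fig.tight_layout()
```

Because only one bar carries a saturated color, the reader's eye lands on it first; the grey bars remain available for comparison.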
Typography
- Title states the insight: "Revenue grew 23% YoY" beats "Revenue by Month"
- Subtitle adds context: Date range, filters applied, data source
- Axis labels are readable: Never rotated 90 degrees if avoidable. Shorten or wrap instead
- Data labels add precision: Use on key points, not every single bar
- Annotation highlights: Call out specific points with text annotations
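
A minimal sketch of these typography points in matplotlib, assuming made-up monthly revenue figures: the headline states the insight, a smaller left-aligned subtitle carries the context, and a single annotation calls out the key point. All data values and label texts here are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical monthly revenue figures (in $K)
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
revenue = [100, 104, 109, 113, 118, 123]
x = list(range(len(months)))

growth_pct = (revenue[-1] - revenue[0]) / revenue[0] * 100

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(x, revenue, linewidth=2, marker="o")
ax.set_xticks(x)
ax.set_xticklabels(months)  # short horizontal labels, no rotation needed

# The headline states the insight rather than naming the data
headline = fig.suptitle(
    f"Revenue grew {growth_pct:.0f}% from Jan to Jun",
    x=0.01, ha="left", fontsize=14, fontweight="bold",
)
# The subtitle adds context: unit and data source
ax.set_title(
    "Monthly revenue, $K | Source: hypothetical sample data",
    loc="left", fontsize=10, color="dimgray",
)

# Annotate only the key point instead of labeling every value
peak_idx = revenue.index(max(revenue))
ax.annotate(
    f"Peak: ${revenue[peak_idx]}K",
    xy=(peak_idx, revenue[peak_idx]),
    xytext=(peak_idx - 1.5, revenue[peak_idx] - 3),
    arrowprops={"arrowstyle": "->"},
)
ax.set_ylabel("Revenue ($K)")
```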
Layout
- Reduce chart junk: Remove gridlines, borders, backgrounds that don't carry information
- Sort meaningfully: Categories sorted by value (not alphabetically) unless there's a natural order (months, stages)
- Appropriate aspect ratio: Time series wider than tall (3:1 to 2:1); comparisons can be squarer
- White space is good: Don't cram charts together. Give each visualization room to breathe
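
The layout points above could be applied along these lines. The `declutter` helper and the category totals are hypothetical; the sketch sorts by value, strips non-data ink, and uses a squarer figure for a comparison chart.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt


def declutter(ax):
    """Hide the top/right spines, the gridlines, and the tick marks."""
    for side in ("top", "right"):
        ax.spines[side].set_visible(False)
    ax.grid(False)
    ax.tick_params(length=0)
    return ax


# Hypothetical category totals, deliberately unsorted
data = {"Gamma": 42, "Alpha": 17, "Delta": 63, "Beta": 29}

# Sort by value rather than alphabetically
ordered = sorted(data.items(), key=lambda kv: kv[1], reverse=True)
labels = [name for name, _ in ordered]
values = [value for _, value in ordered]

# Comparisons can be squarer than time series
fig, ax = plt.subplots(figsize=(6, 5))
ax.bar(labels, values, color="#4C72B0")
declutter(ax)
ax.set_ylabel("Count")
ax.set_ylim(bottom=0)
fig.tight_layout()
```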
Accuracy
- Bar charts start at zero: Always. On an axis truncated to 95-100, a difference of about 5% fills the entire plot height
- Line charts can have non-zero baselines: When the range of variation is meaningful
- Consistent scales across panels: When comparing multiple charts, use the same axis range
- Show uncertainty: Error bars, confidence intervals, or ranges when data is uncertain
- Label your axes: Never make the reader guess what the numbers mean
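
As a rough illustration of the zero-baseline and uncertainty points, the sketch below draws a bar chart with error bars. The measurements are invented, and `mean_and_ci` is a hypothetical helper that uses a normal approximation (z = 1.96), which is a simplifying assumption for samples this small.

```python
import math
import statistics

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt


def mean_and_ci(samples, z=1.96):
    """Return (mean, half-width) of a normal-approximation interval."""
    mean = statistics.fmean(samples)
    sem = statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, z * sem


# Hypothetical measurements per group
groups = {
    "Control": [96, 98, 95, 97, 99],
    "Variant": [99, 101, 100, 102, 98],
}

labels = list(groups)
stats = [mean_and_ci(samples) for samples in groups.values()]
means = [mean for mean, _ in stats]
errors = [half_width for _, half_width in stats]

fig, ax = plt.subplots(figsize=(6, 5))
ax.bar(labels, means, yerr=errors, capsize=6, color="#4C72B0")
ax.set_ylim(bottom=0)  # bars start at zero, so heights stay proportional
ax.set_ylabel("Score (points)")
ax.set_title("Variant scores about 3 points higher than Control")
```

With the baseline at zero the two bars look nearly equal, which is the honest picture for a gap of about 3%. If that gap is the story, a dot plot or line chart with a non-zero baseline is the better vehicle.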
Accessibility Considerations
Color Blindness
- Never rely on color alone to distinguish data series
- Add pattern fills, different line styles (solid, dashed, dotted), or direct labels
- Test with a colorblind simulator (e.g., Coblis, Sim Daltonism)
- Use the colorblind-friendly palette: `sns.color_palette("colorblind")`
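
One way to avoid relying on color alone is to give each series its own line style and marker and to label each line directly at its end. The series names and values below are hypothetical, and the colors are left at matplotlib's defaults because the point here is the redundant encoding.

```python
import itertools

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

LINE_STYLES = ["-", "--", ":", "-."]
MARKERS = ["o", "s", "^", "D"]

# Hypothetical series
x = [1, 2, 3, 4, 5]
series = {
    "Plan A": [10, 12, 15, 19, 24],
    "Plan B": [8, 11, 13, 14, 16],
    "Plan C": [5, 6, 8, 9, 11],
}

fig, ax = plt.subplots(figsize=(10, 5))
styles = itertools.cycle(zip(LINE_STYLES, MARKERS))

for (name, ys), (line_style, marker) in zip(series.items(), styles):
    ax.plot(x, ys, linestyle=line_style, marker=marker, linewidth=2, label=name)
    # Direct label at the end of the line, so no legend lookup is needed
    ax.annotate(name, xy=(x[-1], ys[-1]), xytext=(6, 0),
                textcoords="offset points", va="center")

ax.margins(x=0.1)  # leave room on the right for the direct labels
ax.set_xlabel("Month")
ax.set_ylabel("Subscribers (thousands)")
```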
Screen Readers
- Include alt text describing the chart's key finding
- Provide a data table alternative alongside the visualization
- Use semantic titles and labels
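
A sketch of how alt text and a data-table alternative might be generated alongside a chart. Both helpers (`build_alt_text`, `to_markdown_table`) and the quarterly figures are hypothetical; the wording template is one reasonable choice, not a standard.

```python
def build_alt_text(chart_type, title, finding, x_label, y_label):
    """Compose a short alt text that leads with the chart type and key finding."""
    return (
        f"{chart_type} titled '{title}'. {finding} "
        f"The x-axis shows {x_label}; the y-axis shows {y_label}."
    )


def to_markdown_table(headers, rows):
    """Render rows as a pipe-delimited table to publish next to the chart."""
    lines = [
        "| " + " | ".join(headers) + " |",
        "|" + "|".join("---" for _ in headers) + "|",
    ]
    for row in rows:
        lines.append("| " + " | ".join(str(cell) for cell in row) + " |")
    return "\n".join(lines)


# Hypothetical quarterly data
rows = [("Q1", 120), ("Q2", 135), ("Q3", 150)]

alt_text = build_alt_text(
    chart_type="Line chart",
    title="Orders rose every quarter",
    finding="Orders increased from 120 in Q1 to 150 in Q3.",
    x_label="quarter",
    y_label="number of orders",
)
table = to_markdown_table(["Quarter", "Orders"], rows)

print(alt_text)
print(table)
```

The alt text can go into the image's `alt` attribute or caption, and the table can be published directly below the figure.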
General Accessibility
- Sufficient contrast between data elements and background
- Text size minimum 10pt for labels, 12pt for titles
- Avoid conveying information only through spatial position (add labels)
- Consider printing: does the chart work in black and white?
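
"Sufficient contrast" can be made measurable. The text above does not name a threshold, so as an assumption this sketch uses the WCAG 2.x contrast ratio, for which 3:1 is the usual minimum for graphical elements and 4.5:1 for normal-size text. The function names are illustrative.

```python
def _linearize(channel):
    """Convert an sRGB channel in [0, 1] to linear light (WCAG 2.x formula)."""
    if channel <= 0.03928:
        return channel / 12.92
    return ((channel + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color):
    """Relative luminance of a '#RRGGBB' color, from 0 (black) to 1 (white)."""
    hex_color = hex_color.lstrip("#")
    r, g, b = (int(hex_color[i:i + 2], 16) / 255 for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(color_a, color_b):
    """WCAG contrast ratio between two colors, from 1.0 to 21.0."""
    lum_a = relative_luminance(color_a)
    lum_b = relative_luminance(color_b)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


background = "#FFFFFF"
for name, color in [("blue", "#4C72B0"), ("orange", "#DD8452")]:
    ratio = contrast_ratio(color, background)
    print(f"{name}: {ratio:.2f}:1 against white")
```

Relative luminance also gives a rough proxy for the black-and-white question: two series colors with nearly equal luminance will be hard to tell apart in a grayscale print.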
Accessibility Checklist
Before sharing a visualization:
- Chart works without color (patterns, labels, or line styles differentiate series)
- Text is readable at standard zoom level
- Title describes the insight, not just the data
- Axes are labeled with units
- Legend is clear and positioned without obscuring data
- Data source and date range are noted
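
Parts of this checklist can be checked mechanically before a human review. The `audit_axes` function below is a hypothetical sketch for matplotlib that flags a missing title or axis labels, undersized text, and series that differ by color only. It cannot tell whether a title states an insight or whether units are present, so those items still need a person.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt


def audit_axes(ax, min_label_pt=10, min_title_pt=12):
    """Return a list of checklist problems found on a matplotlib Axes."""
    problems = []
    if not ax.get_title():
        problems.append("missing title")
    if not ax.get_xlabel():
        problems.append("missing x-axis label")
    if not ax.get_ylabel():
        problems.append("missing y-axis label")

    # Font-size checks apply only to text that is actually present
    for text, minimum, name in (
        (ax.title, min_title_pt, "title"),
        (ax.xaxis.label, min_label_pt, "x-axis label"),
        (ax.yaxis.label, min_label_pt, "y-axis label"),
    ):
        if text.get_text() and text.get_fontsize() < minimum:
            problems.append(f"{name} smaller than {minimum}pt")

    # Lines sharing both line style and marker rely on color alone
    styles = [(line.get_linestyle(), line.get_marker()) for line in ax.get_lines()]
    if len(styles) != len(set(styles)):
        problems.append("series differ by color only")
    return problems


fig, ax = plt.subplots()
ax.plot([1, 2, 3], [2, 4, 8], label="A")
ax.plot([1, 2, 3], [1, 3, 5], label="B")
before = audit_axes(ax)

# Fix the reported problems, then audit again
ax.get_lines()[1].set_linestyle("--")
ax.set_title("Series A doubles each step", fontsize=14)
ax.set_xlabel("Step (count)", fontsize=11)
ax.set_ylabel("Output (units)", fontsize=11)
after = audit_axes(ax)

print(before)
print(after)
```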