data-visualization

Data Visualization Skill

Chart selection guidance, Python visualization code patterns, design principles, and accessibility considerations for creating effective data visualizations.

Chart Selection Guide

Choose by Data Relationship

| What You're Showing | Best Chart | Alternatives |
|---|---|---|
| Trend over time | Line chart | Area chart (if showing cumulative or composition) |
| Comparison across categories | Vertical bar chart | Horizontal bar (many categories), lollipop chart |
| Ranking | Horizontal bar chart | Dot plot, slope chart (comparing two periods) |
| Part-to-whole composition | Stacked bar chart | Treemap (hierarchical), waffle chart |
| Composition over time | Stacked area chart | 100% stacked bar (for proportion focus) |
| Distribution | Histogram | Box plot (comparing groups), violin plot, strip plot |
| Correlation (2 variables) | Scatter plot | Bubble chart (add 3rd variable as size) |
| Correlation (many variables) | Heatmap (correlation matrix) | Pair plot |
| Geographic patterns | Choropleth map | Bubble map, hex map |
| Flow / process | Sankey diagram | Funnel chart (sequential stages) |
| Relationship network | Network graph | Chord diagram |
| Performance vs. target | Bullet chart | Gauge (single KPI only) |
| Multiple KPIs at once | Small multiples | Dashboard with separate charts |

When NOT to Use Certain Charts

  • Pie charts: Avoid unless there are fewer than 6 categories and a rough part-to-whole impression matters more than exact proportions. People compare angles less accurately than lengths, so a bar chart is usually the better choice.
  • 3D charts: Never. They distort perception and add no information.
  • Dual-axis charts: Use cautiously. They can mislead by implying correlation. Clearly label both axes if used.
  • Stacked bar (many categories): Hard to compare middle segments. Use small multiples or grouped bars instead.
  • Donut charts: Slightly better than pie charts but same fundamental issues. Use for single KPI display at most.
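
As a rough illustration of the pie-chart advice above, the following sketch plots part-to-whole shares as sorted horizontal bars. The helper name `share_bar_chart` and the traffic-source counts are hypothetical and exist only for this example; the `Agg` backend is chosen so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt


def share_bar_chart(counts, path=None):
    """Plot part-to-whole shares as sorted horizontal bars instead of a pie."""
    total = sum(counts.values())
    # Convert raw counts to percentages; ascending order puts the largest bar on top
    shares = sorted(((k, 100 * v / total) for k, v in counts.items()),
                    key=lambda kv: kv[1])
    labels = [k for k, _ in shares]
    values = [v for _, v in shares]

    fig, ax = plt.subplots(figsize=(8, 4))
    bars = ax.barh(labels, values, color="#4C72B0")
    for bar, value in zip(bars, values):
        ax.text(value + 0.5, bar.get_y() + bar.get_height() / 2,
                f"{value:.0f}%", va="center")
    ax.set_xlim(0, max(values) * 1.15)  # bars start at zero; leave room for labels
    ax.set_xlabel("Share of total (%)")
    ax.spines["top"].set_visible(False)
    ax.spines["right"].set_visible(False)
    fig.tight_layout()
    if path:
        fig.savefig(path, dpi=150, bbox_inches="tight")
    return labels, values


# Hypothetical example data
labels, values = share_bar_chart(
    {"Search": 420, "Email": 180, "Social": 250, "Direct": 150}
)
```

Because every bar shares the same zero baseline, close values such as 15% and 18% stay distinguishable, which is where a pie chart tends to struggle.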

Python Visualization Code Patterns

Setup and Style

python
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
import seaborn as sns
import pandas as pd
import numpy as np

# Professional style setup
plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams.update({
    'figure.figsize': (10, 6),
    'figure.dpi': 150,
    'font.size': 11,
    'axes.titlesize': 14,
    'axes.titleweight': 'bold',
    'axes.labelsize': 11,
    'xtick.labelsize': 10,
    'ytick.labelsize': 10,
    'legend.fontsize': 10,
    'figure.titlesize': 16,
})

# Colorblind-friendly palettes
# Note: these categorical hex values match seaborn's default "deep" palette;
# sns.color_palette("colorblind") is the set designed for color-vision deficiency.
PALETTE_CATEGORICAL = ['#4C72B0', '#DD8452', '#55A868', '#C44E52', '#8172B3', '#937860']
PALETTE_SEQUENTIAL = 'YlOrRd'
PALETTE_DIVERGING = 'RdBu_r'

Line Chart (Time Series)

python
fig, ax = plt.subplots(figsize=(10, 6))

for label, group in df.groupby('category'):
    ax.plot(group['date'], group['value'], label=label, linewidth=2)

ax.set_title('Metric Trend by Category', fontweight='bold')
ax.set_xlabel('Date')
ax.set_ylabel('Value')
ax.legend(loc='upper left', frameon=True)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)

# Format dates on x-axis
fig.autofmt_xdate()

plt.tight_layout()
plt.savefig('trend_chart.png', dpi=150, bbox_inches='tight')

Bar Chart (Comparison)

python
fig, ax = plt.subplots(figsize=(10, 6))

# Sort by value for easy reading
df_sorted = df.sort_values('metric', ascending=True)
bars = ax.barh(df_sorted['category'], df_sorted['metric'], color=PALETTE_CATEGORICAL[0])

# Add value labels
for bar in bars:
    width = bar.get_width()
    ax.text(width + 0.5, bar.get_y() + bar.get_height()/2,
            f'{width:,.0f}', ha='left', va='center', fontsize=10)

ax.set_title('Metric by Category (Ranked)', fontweight='bold')
ax.set_xlabel('Metric Value')
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)

plt.tight_layout()
plt.savefig('bar_chart.png', dpi=150, bbox_inches='tight')

Histogram (Distribution)

python
fig, ax = plt.subplots(figsize=(10, 6))

ax.hist(df['value'], bins=30, color=PALETTE_CATEGORICAL[0], edgecolor='white', alpha=0.8)

# Add mean and median lines
mean_val = df['value'].mean()
median_val = df['value'].median()
ax.axvline(mean_val, color='red', linestyle='--', linewidth=1.5, label=f'Mean: {mean_val:,.1f}')
ax.axvline(median_val, color='green', linestyle='--', linewidth=1.5, label=f'Median: {median_val:,.1f}')

ax.set_title('Distribution of Values', fontweight='bold')
ax.set_xlabel('Value')
ax.set_ylabel('Frequency')
ax.legend()
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)

plt.tight_layout()
plt.savefig('histogram.png', dpi=150, bbox_inches='tight')

Heatmap

python
fig, ax = plt.subplots(figsize=(10, 8))

# Pivot data for heatmap format
pivot = df.pivot_table(index='row_dim', columns='col_dim', values='metric', aggfunc='sum')

sns.heatmap(pivot, annot=True, fmt=',.0f', cmap='YlOrRd',
            linewidths=0.5, ax=ax, cbar_kws={'label': 'Metric Value'})

ax.set_title('Metric by Row Dimension and Column Dimension', fontweight='bold')
ax.set_xlabel('Column Dimension')
ax.set_ylabel('Row Dimension')

plt.tight_layout()
plt.savefig('heatmap.png', dpi=150, bbox_inches='tight')

Small Multiples

python
categories = df['category'].unique()
n_cats = len(categories)
n_cols = min(3, n_cats)
n_rows = (n_cats + n_cols - 1) // n_cols

fig, axes = plt.subplots(n_rows, n_cols, figsize=(5*n_cols, 4*n_rows), sharex=True, sharey=True)
axes = axes.flatten() if n_cats > 1 else [axes]

for i, cat in enumerate(categories):
    ax = axes[i]
    subset = df[df['category'] == cat]
    ax.plot(subset['date'], subset['value'], color=PALETTE_CATEGORICAL[i % len(PALETTE_CATEGORICAL)])
    ax.set_title(cat, fontsize=12)
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)

# Hide empty subplots
for j in range(i+1, len(axes)):
    axes[j].set_visible(False)

fig.suptitle('Trends by Category', fontsize=14, fontweight='bold', y=1.02)
plt.tight_layout()
plt.savefig('small_multiples.png', dpi=150, bbox_inches='tight')

Number Formatting Helpers

python
def format_number(val, format_type='number'):
    """Format numbers for chart labels."""
    if format_type == 'currency':
        if abs(val) >= 1e9:
            return f'${val/1e9:.1f}B'
        elif abs(val) >= 1e6:
            return f'${val/1e6:.1f}M'
        elif abs(val) >= 1e3:
            return f'${val/1e3:.1f}K'
        else:
            return f'${val:,.0f}'
    elif format_type == 'percent':
        return f'{val:.1f}%'
    elif format_type == 'number':
        if abs(val) >= 1e9:
            return f'{val/1e9:.1f}B'
        elif abs(val) >= 1e6:
            return f'{val/1e6:.1f}M'
        elif abs(val) >= 1e3:
            return f'{val/1e3:.1f}K'
        else:
            return f'{val:,.0f}'
    return str(val)

# Usage with axis formatter
ax.yaxis.set_major_formatter(mticker.FuncFormatter(lambda x, p: format_number(x, 'currency')))

Interactive Charts with Plotly

python
import plotly.express as px
import plotly.graph_objects as go

# Simple interactive line chart
fig = px.line(df, x='date', y='value', color='category',
              title='Interactive Metric Trend',
              labels={'value': 'Metric Value', 'date': 'Date'})
fig.update_layout(hovermode='x unified')
fig.write_html('interactive_chart.html')
fig.show()

# Interactive scatter with hover data
fig = px.scatter(df, x='metric_a', y='metric_b', color='category',
                 size='size_metric', hover_data=['name', 'detail_field'],
                 title='Correlation Analysis')
fig.show()

Design Principles

Color

  • Use color purposefully: Color should encode data, not decorate
  • Highlight the story: Use a bright accent color for the key insight; grey everything else
  • Sequential data: Use a single-hue gradient (light to dark) for ordered values
  • Diverging data: Use a two-hue gradient with neutral midpoint for data with a meaningful center
  • Categorical data: Use distinct hues, maximum 6-8 before it gets confusing
  • Avoid relying on red vs. green alone: Roughly 8% of men have some form of red-green color vision deficiency. Use blue/orange as the primary pair
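
The "highlight the story" guideline above can be sketched as a small helper that colors one bar with an accent and greys out the rest. The name `highlight_colors`, the two color constants, and the regional figures are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

ACCENT = "#DD8452"  # assumed accent color (orange)
MUTED = "#B0B0B0"   # assumed neutral grey for everything else


def highlight_colors(labels, focus):
    """Return one color per label: accent for the focus label, grey for the rest."""
    return [ACCENT if label == focus else MUTED for label in labels]


# Hypothetical data: the story is that East leads
regions = ["North", "South", "East", "West"]
revenue = [120, 95, 210, 130]
colors = highlight_colors(regions, focus="East")

fig, ax = plt.subplots(figsize=(8, 4))
ax.bar(regions, revenue, color=colors)
ax.set_title("East leads revenue", fontweight="bold")
ax.set_ylabel("Revenue (hypothetical units)")
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
fig.tight_layout()
```

Here color encodes a single fact (which bar matters), so the reader's eye lands on the key category before reading any label.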

Typography

  • Title states the insight: "Revenue grew 23% YoY" beats "Revenue by Month"
  • Subtitle adds context: Date range, filters applied, data source
  • Axis labels are readable: Never rotated 90 degrees if avoidable. Shorten or wrap instead
  • Data labels add precision: Use on key points, not every single bar
  • Annotation highlights: Call out specific points with text annotations
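
One way to apply these typography points in matplotlib is sketched below: the main title states the insight, a smaller left-aligned line carries the context, and a single annotation calls out one point. The helper `insight_title` and all data values are hypothetical. Matplotlib has no dedicated subtitle API, so the sketch uses `fig.suptitle` for the title and `ax.set_title` for the context line.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt


def insight_title(metric, current, previous):
    """Build a title that states the change rather than naming the chart."""
    change = (current - previous) / previous * 100
    verb = "grew" if change >= 0 else "fell"
    return f"{metric} {verb} {abs(change):.0f}% YoY"


months = list(range(1, 13))
revenue = [100 + 2 * m for m in months]  # made-up monthly values
title = insight_title("Revenue", current=123, previous=100)

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(months, revenue, linewidth=2)
# Main title states the insight; the smaller line underneath carries the context
fig.suptitle(title, fontweight="bold", x=0.01, ha="left")
ax.set_title("Jan-Dec, all regions (illustrative data)", loc="left", fontsize=10)
# Call out one specific point instead of labelling every value
ax.annotate("Peak month", xy=(12, revenue[-1]), xytext=(9, revenue[-1] - 1),
            arrowprops={"arrowstyle": "->"})
ax.set_xlabel("Month")
ax.set_ylabel("Revenue")
```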

Layout

  • Reduce chart junk: Remove gridlines, borders, backgrounds that don't carry information
  • Sort meaningfully: Categories sorted by value (not alphabetically) unless there's a natural order (months, stages)
  • Appropriate aspect ratio: Time series wider than tall (3:1 to 2:1); comparisons can be squarer
  • White space is good: Don't cram charts together. Give each visualization room to breathe
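
The sorting rule above (by value unless a natural order exists) could be implemented along these lines with pandas. The function `sort_for_display`, the `MONTHS` list, and the sample frame are illustrative assumptions, not part of any library.

```python
import pandas as pd

MONTHS = ["Jan", "Feb", "Mar", "Apr"]  # assumed natural order for the example


def sort_for_display(df, category_col, value_col, natural_order=None):
    """Sort by a natural category order if given, otherwise by value (descending)."""
    if natural_order is not None:
        # An ordered Categorical makes sort_values follow the supplied order
        ordered = pd.Categorical(df[category_col], categories=natural_order, ordered=True)
        return df.assign(**{category_col: ordered}).sort_values(category_col)
    return df.sort_values(value_col, ascending=False)


# Hypothetical data
sales = pd.DataFrame({"month": ["Mar", "Jan", "Apr", "Feb"], "units": [30, 10, 25, 40]})
by_month = sort_for_display(sales, "month", "units", natural_order=MONTHS)
by_value = sort_for_display(sales, "month", "units")
```

Passing the sorted frame to a bar chart then gives either a calendar-ordered or a ranked view without touching the plotting code.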

Accuracy

  • Bar charts start at zero: Always. Bar length encodes the value, so truncating the axis to the 95-100 range makes a difference of roughly 5% look many times larger than it is
  • Line charts can have non-zero baselines: When the range of variation is meaningful
  • Consistent scales across panels: When comparing multiple charts, use the same axis range
  • Show uncertainty: Error bars, confidence intervals, or ranges when data is uncertain
  • Label your axes: Never make the reader guess what the numbers mean
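
To make the "show uncertainty" point concrete, here is a minimal sketch that computes a mean and an approximate 95% confidence interval per group and draws them as error bars on a zero-based bar chart. The helper `mean_with_ci` and the sample groups are hypothetical, and the normal approximation (z = 1.96) is an assumption that may not suit small or skewed samples.

```python
import math
import statistics

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt


def mean_with_ci(samples, z=1.96):
    """Return (mean, half_width) of an approximate 95% confidence interval.

    Uses the normal approximation: mean +/- z * s / sqrt(n).
    """
    n = len(samples)
    mean = statistics.fmean(samples)
    sem = statistics.stdev(samples) / math.sqrt(n)  # standard error of the mean
    return mean, z * sem


# Hypothetical measurements for two groups
groups = {"A": [10, 12, 11, 13, 14], "B": [20, 19, 22, 21, 18]}
summary = {name: mean_with_ci(vals) for name, vals in groups.items()}

names = list(summary)
means = [summary[n][0] for n in names]
errors = [summary[n][1] for n in names]

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(names, means, yerr=errors, capsize=4, color="#4C72B0")
ax.set_ylim(bottom=0)  # bar charts start at zero
ax.set_xlabel("Group")
ax.set_ylabel("Mean value (error bars: approx. 95% CI)")
fig.tight_layout()
```

Stating in the axis label what the error bars represent avoids the common ambiguity between standard deviation, standard error, and confidence interval.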

Accessibility Considerations

Color Blindness

  • Never rely on color alone to distinguish data series
  • Add pattern fills, different line styles (solid, dashed, dotted), or direct labels
  • Test with a colorblind simulator (e.g., Coblis, Sim Daltonism)
  • Use the colorblind-friendly palette:
    sns.color_palette("colorblind")
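
A possible way to avoid relying on color alone is to give each series its own line style and marker and to label lines directly. The sketch below deliberately draws every line in black to show that the series stay distinguishable; `series_styles`, the style lists, and the plan data are assumptions for illustration.

```python
import itertools

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

LINE_STYLES = ["-", "--", ":", "-."]
MARKERS = ["o", "s", "^", "D"]


def series_styles(names):
    """Assign each series its own line style and marker, cycling if needed."""
    styles = itertools.cycle(zip(LINE_STYLES, MARKERS))
    return {name: {"linestyle": ls, "marker": mk}
            for name, (ls, mk) in zip(names, styles)}


# Hypothetical data
x = [1, 2, 3, 4]
data = {"Plan A": [1, 2, 3, 4], "Plan B": [2, 2, 3, 5], "Plan C": [4, 3, 2, 1]}
styles = series_styles(data)

fig, ax = plt.subplots(figsize=(8, 4))
for name, y in data.items():
    ax.plot(x, y, color="black", **styles[name])
    # Direct label at the line end, so a color legend is not the only key
    ax.text(x[-1] + 0.05, y[-1], name, va="center")
ax.set_xlabel("Period")
ax.set_ylabel("Value")
```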

Screen Readers

  • Include alt text describing the chart's key finding
  • Provide a data table alternative alongside the visualization
  • Use semantic titles and labels
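
The first two points above can be approximated in plain Python: one helper composes alt text that states the key finding, and another renders the plotted values as a text table to publish next to the image. Both function names and the signup figures are hypothetical, and the wording template is only one possible phrasing.

```python
def describe_trend(metric, labels, values):
    """Compose a short alt text stating the chart's key finding."""
    first, last = values[0], values[-1]
    peak = max(range(len(values)), key=values.__getitem__)  # index of the maximum
    direction = "rose" if last > first else "fell" if last < first else "was flat"
    return (f"Line chart of {metric} from {labels[0]} to {labels[-1]}. "
            f"{metric} {direction} from {first:,} to {last:,}, peaking at "
            f"{values[peak]:,} in {labels[peak]}.")


def as_text_table(labels, values, header=("Period", "Value")):
    """Render the plotted data as a plain-text table to accompany the chart."""
    rows = [header] + [(str(label), f"{value:,}") for label, value in zip(labels, values)]
    width = max(len(row[0]) for row in rows)
    return "\n".join(f"{a.ljust(width)}  {b}" for a, b in rows)


# Hypothetical data
quarters = ["Q1", "Q2", "Q3", "Q4"]
signups = [1200, 1500, 2100, 1800]
alt_text = describe_trend("Signups", quarters, signups)
table = as_text_table(quarters, signups)
```

How the alt text is attached depends on the publishing target (for example an HTML `alt` attribute or a document's image description field), which this sketch leaves open.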

General Accessibility

  • Sufficient contrast between data elements and background
  • Text size minimum 10pt for labels, 12pt for titles
  • Avoid conveying information only through spatial position (add labels)
  • Consider printing: does the chart work in black and white?
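
"Sufficient contrast" can be checked numerically. The sketch below uses the contrast-ratio formula from WCAG 2.x, which is an assumption on my part since the text above does not name a standard; WCAG suggests at least 3:1 for graphical elements. The function names are illustrative.

```python
def _linear(channel):
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.x definition)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color):
    """Relative luminance of a '#RRGGBB' color, from 0 (black) to 1 (white)."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(color_a, color_b):
    """Contrast ratio between two colors, from 1 (identical) to 21 (black on white)."""
    la, lb = relative_luminance(color_a), relative_luminance(color_b)
    lighter, darker = max(la, lb), min(la, lb)
    return (lighter + 0.05) / (darker + 0.05)


# Check a data color against a white chart background
ratio = contrast_ratio("#4C72B0", "#FFFFFF")
passes_graphics_threshold = ratio >= 3.0
```

The same function can compare two adjacent data colors, which is a quick proxy for whether they will remain separable in a black-and-white print.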

Accessibility Checklist

Before sharing a visualization:
  • Chart works without color (patterns, labels, or line styles differentiate series)
  • Text is readable at standard zoom level
  • Title describes the insight, not just the data
  • Axes are labeled with units
  • Legend is clear and positioned without obscuring data
  • Data source and date range are noted
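
Parts of this checklist can be automated. The sketch below inspects a matplotlib `Axes` for a title, axis labels, minimum font sizes, and a legend when several lines are present. The function `checklist_issues` and its thresholds are assumptions; it covers only the mechanically checkable items, so whether the title states an insight or the units are correct still needs a human review.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt


def checklist_issues(ax, min_label_pt=10, min_title_pt=12):
    """Return the checklist items an Axes appears to miss (empty list = pass)."""
    issues = []
    if not ax.get_title().strip():
        issues.append("missing title")
    elif ax.title.get_fontsize() < min_title_pt:
        issues.append("title smaller than minimum size")
    for name, label in (("x", ax.xaxis.label), ("y", ax.yaxis.label)):
        if not label.get_text().strip():
            issues.append(f"missing {name}-axis label")
        elif label.get_fontsize() < min_label_pt:
            issues.append(f"{name}-axis label smaller than minimum size")
    if len(ax.get_lines()) > 1 and ax.get_legend() is None:
        issues.append("multiple series but no legend")
    return issues


# Hypothetical chart, checked before and after adding titles and labels
fig, ax = plt.subplots()
ax.plot([1, 2, 3], [2, 4, 8], label="Plan A")
ax.plot([1, 2, 3], [1, 3, 5], linestyle="--", label="Plan B")
before = checklist_issues(ax)

ax.set_title("Plan A doubled each period", fontsize=14)
ax.set_xlabel("Period (months)", fontsize=11)
ax.set_ylabel("Users (thousands)", fontsize=11)
ax.legend()
after = checklist_issues(ax)
```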