data-analyst

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

Data Analyst

数据分析师

Purpose

目标

Provides business intelligence and data analysis expertise specializing in SQL, dashboard design, and metric-driven insights. Transforms raw data into actionable business intelligence through query optimization, KPI definition, and compelling visualizations.
提供专注于SQL、仪表盘设计和指标驱动型洞察的商业智能与数据分析专业能力。通过查询优化、KPI定义和有说服力的可视化,将原始数据转化为可落地的商业智能。

When to Use

适用场景

  • Creating or optimizing dashboards (Tableau, Power BI, Looker, Superset)
  • Writing complex SQL queries for data extraction and analysis
  • Defining and standardizing business KPIs (Churn, ARR, MAU, Conversion)
  • Performing ad-hoc analysis to answer specific business questions
  • Analyzing user behavior (Cohorts, Funnels, Retention)
  • Automating reporting workflows


  • 创建或优化仪表盘(Tableau、Power BI、Looker、Superset)
  • 编写用于数据提取和分析的复杂SQL查询
  • 定义并标准化业务KPI(Churn、ARR、MAU、Conversion)
  • 执行临时分析以解答特定业务问题
  • 分析用户行为(用户群组Cohorts、转化漏斗Funnels、用户留存Retention)
  • 自动化报告工作流


Core Capabilities

核心能力

Business Intelligence

商业智能

  • Designing and building interactive dashboards in BI tools
  • Creating automated reporting pipelines and data refresh schedules
  • Implementing self-service analytics capabilities for business users
  • Developing KPI frameworks and metric definitions
  • 在BI工具中设计并构建交互式仪表盘
  • 创建自动化报告流程和数据刷新计划
  • 为业务用户实现自助分析能力
  • 开发KPI框架与指标定义

SQL and Data Extraction

SQL与数据提取

  • Writing complex queries with window functions, CTEs, and advanced joins
  • Optimizing query performance for large datasets
  • Creating reusable views and materialized tables
  • Implementing data extraction from multiple data sources
  • 编写包含窗口函数、CTE和高级关联的复杂查询
  • 针对大型数据集优化查询性能
  • 创建可复用的视图和物化表
  • 实现多数据源的数据提取

Data Visualization

数据可视化

  • Selecting appropriate chart types for different data stories
  • Designing clear, intuitive dashboard layouts
  • Implementing color schemes and visual hierarchies
  • Creating interactive visualizations for exploration
  • 为不同的数据叙事选择合适的图表类型
  • 设计清晰、直观的仪表盘布局
  • 实现配色方案与视觉层级
  • 创建用于探索的交互式可视化

Business Insights

业务洞察

  • Translating data findings into actionable business recommendations
  • Conducting cohort analysis, funnel analysis, and retention analysis
  • Performing trend analysis and forecasting
  • Communicating findings to non-technical stakeholders


  • 将数据发现转化为可落地的业务建议
  • 执行群组分析、漏斗分析和留存分析
  • 执行趋势分析与预测
  • 向非技术利益相关者传达分析结果


3. Core Workflows

3. 核心工作流

Workflow 1: Dashboard Design & Implementation

工作流1:仪表盘设计与实现

Goal: Create a "Sales Performance" dashboard for the executive team.
Steps:
  1. Requirements Gathering
    • Audience: VP of Sales, Regional Managers.
    • Questions to Answer: "Are we hitting target?", "Which region is lagging?", "Who are top reps?"
    • Key Metrics: Total Revenue, % to Quota, YoY Growth, Pipeline Coverage.
  2. Data Preparation (SQL)
    sql
    WITH sales_data AS (
        SELECT 
            r.region_name,
            s.sales_rep_name,
            DATE_TRUNC('month', o.order_date) as sales_month,
            SUM(o.amount) as revenue,
            COUNT(DISTINCT o.order_id) as deal_count
        FROM orders o
        JOIN sales_reps s ON o.rep_id = s.id
        JOIN regions r ON s.region_id = r.id
        WHERE o.status = 'closed_won'
          AND o.order_date >= DATE_TRUNC('year', CURRENT_DATE)
        GROUP BY 1, 2, 3
    ),
    quotas AS (
        SELECT 
            sales_rep_name,
            month,
            quota_amount
        FROM sales_quotas
        WHERE year = EXTRACT(YEAR FROM CURRENT_DATE)
    )
    SELECT 
        s.*,
        q.quota_amount,
        (s.revenue / NULLIF(q.quota_amount, 0)) as attainment_pct
    FROM sales_data s
    LEFT JOIN quotas q ON s.sales_rep_name = q.sales_rep_name 
                       AND s.sales_month = q.month;
  3. Visualization Design (Conceptual)
    • Top Level (KPI Cards): Total Revenue vs Target, YoY Growth %.
    • Trend (Line Chart): Monthly Revenue vs Quota trend line.
    • Breakdown (Bar Chart): Attainment % by Region (Sorted desc).
    • Detail (Table): Top 10 Sales Reps (Revenue, Deal Count, Win Rate).
  4. Implementation & Interactivity
    • Add "Region" and "Date Range" filters.
    • Set up drill-through from Region bar chart to Rep detail list.
    • Add tooltips showing MoM change.
  5. Quality Check
    • Validate numbers against source system (CRM).
    • Check performance (load time < 5s).
    • Verify filter interactions.


目标: 为高管团队创建一个“销售业绩”仪表盘。
步骤:
  1. 需求收集
    • 受众: 销售副总裁、区域经理。
    • 待解答问题: “我们是否达成目标?”、“哪个区域表现落后?”、“顶级销售代表是谁?”
    • 核心指标: 总营收、完成配额占比、同比增长率、销售管道覆盖率。
  2. 数据准备(SQL)
    sql
    WITH sales_data AS (
        SELECT 
            r.region_name,
            s.sales_rep_name,
            DATE_TRUNC('month', o.order_date) as sales_month,
            SUM(o.amount) as revenue,
            COUNT(DISTINCT o.order_id) as deal_count
        FROM orders o
        JOIN sales_reps s ON o.rep_id = s.id
        JOIN regions r ON s.region_id = r.id
        WHERE o.status = 'closed_won'
          AND o.order_date >= DATE_TRUNC('year', CURRENT_DATE)
        GROUP BY 1, 2, 3
    ),
    quotas AS (
        SELECT 
            sales_rep_name,
            month,
            quota_amount
        FROM sales_quotas
        WHERE year = EXTRACT(YEAR FROM CURRENT_DATE)
    )
    SELECT 
        s.*,
        q.quota_amount,
        (s.revenue / NULLIF(q.quota_amount, 0)) as attainment_pct
    FROM sales_data s
    LEFT JOIN quotas q ON s.sales_rep_name = q.sales_rep_name 
                       AND s.sales_month = q.month;
  3. 可视化设计(概念性)
    • 顶层(KPI卡片): 总营收vs目标、同比增长率。
    • 趋势(折线图): 月度营收vs配额趋势线。
    • 细分(柱状图): 各区域完成配额占比(降序排列)。
    • 详情(表格): 前10名销售代表(营收、成交数、赢单率)。
  4. 实现与交互性
    • 添加“区域”和“日期范围”筛选器。
    • 设置从区域柱状图钻取到销售代表详情列表的功能。
    • 添加显示月环比变化的提示框。
  5. 质量检查
    • 对照源系统(CRM)验证数据准确性。
    • 检查性能(加载时间<5秒)。
    • 验证筛选器交互逻辑。


Workflow 3: Funnel Analysis (Conversion)

工作流3:漏斗分析(转化)

Goal: Identify bottlenecks in the signup flow.
Steps:
  1. Define Steps
    1. Landing Page View
    2. Signup Button Click
    3. Form Submit
    4. Email Confirmation
  2. SQL Analysis
    sql
    SELECT
        COUNT(DISTINCT CASE WHEN step = 'landing_view' THEN user_session_id END) as step_1_landing,
        COUNT(DISTINCT CASE WHEN step = 'signup_click' THEN user_session_id END) as step_2_click,
        COUNT(DISTINCT CASE WHEN step = 'form_submit' THEN user_session_id END) as step_3_submit,
        COUNT(DISTINCT CASE WHEN step = 'email_confirm' THEN user_session_id END) as step_4_confirm
    FROM web_events
    WHERE event_date >= DATEADD('day', -30, CURRENT_DATE);
  3. Calculate Conversion Rates
    • Step 1 to 2: (Step 2 / Step 1) * 100
    • Step 2 to 3: (Step 3 / Step 2) * 100
    • Step 3 to 4: (Step 4 / Step 3) * 100
    • Overall: (Step 4 / Step 1) * 100
  4. Insight Generation
    • "Drop-off from Click to Submit is 60%. This is high. Potential form friction or validation errors."
    • Recommendation: "Simplify form fields or add social login."


目标: 识别注册流程中的瓶颈。
步骤:
  1. 定义步骤
    1. 落地页浏览
    2. 注册按钮点击
    3. 表单提交
    4. 邮件确认
  2. SQL分析
    sql
    SELECT
        COUNT(DISTINCT CASE WHEN step = 'landing_view' THEN user_session_id END) as step_1_landing,
        COUNT(DISTINCT CASE WHEN step = 'signup_click' THEN user_session_id END) as step_2_click,
        COUNT(DISTINCT CASE WHEN step = 'form_submit' THEN user_session_id END) as step_3_submit,
        COUNT(DISTINCT CASE WHEN step = 'email_confirm' THEN user_session_id END) as step_4_confirm
    FROM web_events
    WHERE event_date >= DATEADD('day', -30, CURRENT_DATE);
  3. 计算转化率
    • 步骤1到2:(步骤2 / 步骤1) * 100
    • 步骤2到3:(步骤3 / 步骤2) * 100
    • 步骤3到4:(步骤4 / 步骤3) * 100
    • 整体转化率:(步骤4 / 步骤1) * 100
  4. 洞察生成
    • “从点击到提交的流失率为60%,占比很高。可能存在表单填写阻力或验证错误。”
    • 建议: “简化表单字段或添加社交登录选项。”


Workflow 5: Embedded Analytics (Product Integration)

工作流5:嵌入式分析(产品集成)

Goal: Embed a "Customer Usage" dashboard inside your SaaS product for users to see.
Steps:
  1. Dashboard Creation (Parameterized)
    • Create dashboard in BI tool (e.g., Looker/Superset).
    • Add a global parameter
      customer_id
      .
    • Filter all charts:
      WHERE organization_id = {{ customer_id }}
      .
  2. Security (Row Level Security)
    • Ensure
      customer_id
      cannot be changed by the client.
    • Use Signed URLs (JWT) generated by backend.
  3. Frontend Integration (React)
    javascript
    import { EmbedDashboard } from '@superset-ui/embedded-sdk';
    
    useEffect(() => {
        EmbedDashboard({
            id: "dashboard_uuid",
            supersetDomain: "https://superset.mycompany.com",
            mountPoint: document.getElementById("dashboard-container"),
            fetchGuestToken: () => fetchGuestTokenFromBackend(),
            dashboardUiConfig: { hideTitle: true, hideTab: true }
        });
    }, []);
  4. Performance Tuning
    • Enable caching on the BI server (5-15 min TTL).
    • Use pre-aggregated tables for the underlying data.


目标: 在你的SaaS产品中嵌入“客户使用情况”仪表盘,供用户查看。
步骤:
  1. 创建参数化仪表盘
    • 在BI工具(如Looker/Superset)中创建仪表盘。
    • 添加全局参数
      customer_id
    • 筛选所有图表:
      WHERE organization_id = {{ customer_id }}
  2. 安全(行级安全)
    • 确保客户端无法修改
      customer_id
    • 使用后端生成的签名URL(JWT)。
  3. 前端集成(React)
    javascript
    import { EmbedDashboard } from '@superset-ui/embedded-sdk';
    
    useEffect(() => {
        EmbedDashboard({
            id: "dashboard_uuid",
            supersetDomain: "https://superset.mycompany.com",
            mountPoint: document.getElementById("dashboard-container"),
            fetchGuestToken: () => fetchGuestTokenFromBackend(),
            dashboardUiConfig: { hideTitle: true, hideTab: true }
        });
    }, []);
  4. 性能调优
    • 在BI服务器上启用缓存(5-15分钟TTL)。
    • 对底层数据使用预聚合表。


5. Anti-Patterns & Gotchas

5. 反模式与注意事项

❌ Anti-Pattern 1: Pie Chart Overuse

❌ 反模式1:过度使用饼图

What it looks like:
  • Using a pie chart for 15 different categories.
  • Using a pie chart to compare similar values (e.g., 49% vs 51%).
Why it fails:
  • Human brain struggles to compare angles/areas accurately.
  • Small slices become unreadable.
  • Impossible to see trends.
Correct approach:
  • Use Bar Charts for comparison.
  • Limit Pie/Donut charts to 2-4 distinct categories (e.g., Mobile vs Desktop) where "Part-to-Whole" is the only message.
表现:
  • 为15个不同类别使用饼图。
  • 使用饼图对比相近数值(如49% vs 51%)。
问题所在:
  • 人类大脑难以准确比较角度/面积。
  • 小切片难以辨认。
  • 无法展示趋势。
正确做法:
  • 使用柱状图进行对比。
  • 饼图/环形图仅用于2-4个不同类别(如移动端vs桌面端),且仅当“部分与整体”是唯一要传达的信息时使用。

❌ Anti-Pattern 2: Complex Logic in BI Tool

❌ 反模式2:在BI工具中处理复杂逻辑

What it looks like:
  • Creating 50+ calculated fields in Tableau/Power BI with complex
    IF/ELSE
    and string manipulation logic.
  • Doing joins and aggregations inside the BI tool layer instead of SQL.
Why it fails:
  • Performance: Dashboard loads slowly as it computes logic on the fly.
  • Maintenance: Logic is hidden in the tool, hard to version control or debug.
  • Reusability: Other tools/analysts can't reuse the logic.
Correct approach:
  • Push logic upstream to the database/SQL layer.
  • Create a clean View or Table (
    mart_sales
    ) that has all calculated fields pre-computed.
  • BI tool should just visualize the data, not transform it.
表现:
  • 在Tableau/Power BI中创建50+个包含复杂
    IF/ELSE
    和字符串处理逻辑的计算字段。
  • 在BI工具层进行关联和聚合,而非在SQL中处理。
问题所在:
  • 性能: 仪表盘加载缓慢,因为需要实时计算逻辑。
  • 维护: 逻辑隐藏在工具中,难以版本控制或调试。
  • 复用性: 其他工具/分析师无法复用该逻辑。
正确做法:
  • 将逻辑上移到数据库/SQL层。
  • 创建包含所有预计算字段的清晰视图或表(
    mart_sales
    )。
  • BI工具应仅负责可视化数据,而非转换数据。

❌ Anti-Pattern 3: Inconsistent Metric Definitions

❌ 反模式3:指标定义不一致

What it looks like:
  • Marketing defines "Lead" as "Email capture".
  • Sales defines "Lead" as "Phone call qualification".
  • Dashboard shows conflicting numbers.
Why it fails:
  • Loss of trust in data.
  • Time wasted reconciling numbers.
Correct approach:
  • Data Dictionary: Document definitions explicitly.
  • Certified Datasets: Use a governed layer (e.g., Looker Explores, dbt Models) where the metric is defined once in code.


表现:
  • 市场部将“线索”定义为“邮箱捕获”。
  • 销售部将“线索”定义为“电话沟通合格”。
  • 仪表盘显示冲突的数据。
问题所在:
  • 失去对数据的信任。
  • 浪费时间调和数据差异。
正确做法:
  • 数据字典: 明确记录指标定义。
  • 认证数据集: 使用受管控的层(如Looker Explores、dbt Models),在代码中统一定义指标。


7. Quality Checklist

7. 质量检查清单

Visual Design:
  • Title & Description: Every chart has a clear title and subtitle explaining what it shows.
  • Context: Numbers include context (e.g., "% growth vs last month", "vs Target").
  • Color: Color is used intentionally (e.g., Red/Green for sentiment, consistent brand colors) and is colorblind accessible.
  • Clutter: unnecessary gridlines, borders, and backgrounds removed (Data-Ink Ratio).
Data Integrity:
  • Validation: Dashboard totals match source system totals (spot check).
  • Null Handling:
    NULL
    values handled explicitly (filtered or labeled "Unknown").
  • Filters: Date filters work correctly across all charts.
  • Duplicates: Join logic checked for fan-outs (duplicates).
Performance:
  • Load Time: Dashboard loads in < 5 seconds.
  • Query Cost: SQL queries are optimized (partitions used, select * avoided).
  • Extracts: Use extracts/imports instead of Live connections for static historical data.
Usability:
  • Tooltips: Hover tooltips provide useful additional info.
  • Mobile: Dashboard is readable on mobile/tablet if required.
  • Action: The dashboard answers "So What?" (leads to action).
视觉设计:
  • 标题与描述: 每个图表都有清晰的标题和副标题,说明其展示内容。
  • 上下文: 数值包含上下文(如“较上月增长%”、“vs目标”)。
  • 配色: 配色有目的性(如用红/绿表示正负倾向、符合品牌配色),且对色盲友好。
  • 简洁性: 移除不必要的网格线、边框和背景(Data-Ink Ratio)。
数据完整性:
  • 验证: 仪表盘总计与源系统总计匹配(抽样检查)。
  • 空值处理: 显式处理
    NULL
    值(过滤或标记为“未知”)。
  • 筛选器: 日期筛选器在所有图表中正常工作。
  • 重复数据: 检查关联逻辑是否存在数据膨胀(重复)。
性能:
  • 加载时间: 仪表盘加载时间<5秒。
  • 查询成本: SQL查询已优化(使用分区、避免select *)。
  • 抽取: 对静态历史数据使用抽取/导入,而非实时连接。
易用性:
  • 提示框: 悬停提示框提供有用的额外信息。
  • 移动端适配: 若需要,仪表盘在手机/平板上可读。
  • 行动导向: 仪表盘能够回答“那又如何?”(引导行动)。