funnel-analysis

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

Funnel Analysis

漏斗分析

Quick Start

快速开始

Analyze multi-step user journeys to measure conversion rates, identify drop-off points, compare segments, and optimize funnel performance with actionable insights.
分析多步骤用户旅程,衡量转化率、识别流失节点、对比用户分群,并基于可落地的洞察优化漏斗性能。

Context Requirements

上下文要求

Before analyzing the funnel, I need:
  1. Funnel Steps: The sequence of actions users take
  2. Event Data: User activity showing who completed each step
  3. Time Window: How long users have to complete the funnel
  4. Success Criteria: What counts as completion at each step
  5. Segments (optional): Groups to compare (e.g., by channel, device, cohort)
在分析漏斗之前,我需要以下信息:
  1. 漏斗步骤:用户执行的操作序列
  2. 事件数据:显示哪些用户完成了每个步骤的用户活动数据
  3. 时间窗口:用户完成漏斗的最长时限
  4. 成功标准:每个步骤的完成判定规则
  5. 用户分群(可选):需要对比的用户组(例如按渠道、设备、同期群划分)

Context Gathering

信息收集

For Funnel Steps:

关于漏斗步骤:

"Please define the funnel steps in order. For example:
E-commerce Purchase Funnel:
  1. View Product Page
  2. Add to Cart
  3. Begin Checkout
  4. Enter Payment Info
  5. Complete Purchase
SaaS Onboarding Funnel:
  1. Sign Up
  2. Email Verified
  3. Complete Profile
  4. Invite Team Member
  5. First Project Created
What are your funnel steps?"
"请按顺序定义漏斗步骤。例如:
电商购买漏斗:
  1. 浏览商品页
  2. 添加至购物车
  3. 开始结账
  4. 填写支付信息
  5. 完成购买
SaaS 注册漏斗:
  1. 注册账号
  2. 验证邮箱
  3. 完善个人资料
  4. 邀请团队成员
  5. 创建首个项目
你的漏斗步骤是什么?"

For Event Data:

关于事件数据:

"I need data showing which users completed which steps. Provide:
Option 1 - Event Log:
user_id | event_name        | timestamp
123     | view_product      | 2024-12-15 10:00:00
123     | add_to_cart       | 2024-12-15 10:05:00
123     | begin_checkout    | 2024-12-15 10:10:00
456     | view_product      | 2024-12-15 11:00:00
Option 2 - Pre-aggregated:
user_id | reached_step_1 | reached_step_2 | reached_step_3 |...
123     | TRUE           | TRUE           | TRUE           |...
456     | TRUE           | FALSE          | FALSE          |...
Option 3 - Database Query: Share SQL to fetch relevant events
Which format works for you?"
"我需要显示用户完成各步骤情况的数据,请提供以下格式之一:
选项1 - 事件日志:
user_id | event_name        | timestamp
123     | view_product      | 2024-12-15 10:00:00
123     | add_to_cart       | 2024-12-15 10:05:00
123     | begin_checkout    | 2024-12-15 10:10:00
456     | view_product      | 2024-12-15 11:00:00
选项2 - 预聚合数据:
user_id | reached_step_1 | reached_step_2 | reached_step_3 |...
123     | TRUE           | TRUE           | TRUE           |...
456     | TRUE           | FALSE          | FALSE          |...
选项3 - 数据库查询语句: 获取相关事件的SQL语句
哪种格式更适合你?"

For Time Window:

关于时间窗口:

"How long do users have to complete the funnel?
Common Windows:
  • Session-based: Within single session (30 min)
  • Same-day: Within 24 hours
  • Multi-day: Within 7 days, 30 days
  • Unlimited: Any time eventually
What makes sense for your use case?"
"用户完成漏斗的最长时限是多久?
常见时间窗口:
  • 会话内:单个会话内(30分钟)
  • 当日:24小时内
  • 多日:7天、30天内
  • 无限制:任何时间完成均可
哪种时间窗口符合你的业务场景?"

For Success Criteria:

关于成功标准:

"For each step, what counts as completion?
Examples:
  • Step 1 (View Product): Page view event
  • Step 2 (Add to Cart): Click 'Add to Cart' button
  • Step 3 (Checkout): Land on checkout page
  • Step 4 (Payment): Submit payment form
  • Step 5 (Complete): Order confirmation
Any nuances? (e.g., 'view product for >10 seconds', 'add any item', etc.)"
"每个步骤的完成判定规则是什么?
示例:
  • 步骤1(浏览商品):触发商品页浏览事件
  • 步骤2(添加至购物车):点击“加入购物车”按钮
  • 步骤3(开始结账):进入结账页面
  • 步骤4(填写支付信息):提交支付表单
  • 步骤5(完成购买):生成订单确认
是否有特殊规则?(例如:“浏览商品超过10秒”、“添加任意商品”等)"

For Segments:

关于用户分群:

"Want to compare funnel performance across groups?
Common Segments:
  • Acquisition channel (organic, paid, referral)
  • Device type (mobile, desktop, tablet)
  • User type (new, returning, power user)
  • Geographic region
  • Product/plan tier
  • Time period (weekday vs weekend)
Which segments are most important?"
"是否需要对比不同用户组的漏斗性能?
常见分群维度:
  • 获取渠道(自然流量、付费流量、推荐流量)
  • 设备类型(移动端、桌面端、平板端)
  • 用户类型(新用户、回头客、核心用户)
  • 地理区域
  • 产品/套餐层级
  • 时间段(工作日 vs 周末)
哪些分群维度对你的决策最重要?"

Workflow

分析流程

Step 1: Load and Validate Event Data

步骤1:加载并验证事件数据

python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta
python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta

Load event data

Load event data

events = pd.read_csv('user_events.csv') events['timestamp'] = pd.to_datetime(events['timestamp'])
print(f"📊 Event Data Loaded:") print(f" Total Events: {len(events):,}") print(f" Unique Users: {events['user_id'].nunique():,}") print(f" Date Range: {events['timestamp'].min()} to {events['timestamp'].max()}") print(f" Event Types: {events['event_name'].unique()}")

**Checkpoint**: "Data loaded. Do the event names match your funnel steps?"
events = pd.read_csv('user_events.csv') events['timestamp'] = pd.to_datetime(events['timestamp'])
print(f"📊 Event Data Loaded:") print(f" Total Events: {len(events):,}") print(f" Unique Users: {events['user_id'].nunique():,}") print(f" Date Range: {events['timestamp'].min()} to {events['timestamp'].max()}") print(f" Event Types: {events['event_name'].unique()}")

**检查点**:"数据已加载。事件名称是否与你定义的漏斗步骤匹配?"

Step 2: Define Funnel Configuration

步骤2:定义漏斗配置

python
undefined
python
undefined

Define funnel steps in order

Define funnel steps in order

funnel_steps = [ {'step': 1, 'name': 'View Product', 'event': 'view_product'}, {'step': 2, 'name': 'Add to Cart', 'event': 'add_to_cart'}, {'step': 3, 'name': 'Begin Checkout', 'event': 'begin_checkout'}, {'step': 4, 'name': 'Payment Info', 'event': 'enter_payment'}, {'step': 5, 'name': 'Complete Purchase', 'event': 'purchase_complete'} ]
funnel_steps = [ {'step': 1, 'name': 'View Product', 'event': 'view_product'}, {'step': 2, 'name': 'Add to Cart', 'event': 'add_to_cart'}, {'step': 3, 'name': 'Begin Checkout', 'event': 'begin_checkout'}, {'step': 4, 'name': 'Payment Info', 'event': 'enter_payment'}, {'step': 5, 'name': 'Complete Purchase', 'event': 'purchase_complete'} ]

Time window for funnel completion (in days)

Time window for funnel completion (in days)

TIME_WINDOW_DAYS = 7
print("🎯 Funnel Configuration:") for step in funnel_steps: print(f" Step {step['step']}: {step['name']} ({step['event']})") print(f"\nTime Window: {TIME_WINDOW_DAYS} days")
undefined
TIME_WINDOW_DAYS = 7
print("🎯 Funnel Configuration:") for step in funnel_steps: print(f" Step {step['step']}: {step['name']} ({step['event']})") print(f"\nTime Window: {TIME_WINDOW_DAYS} days")
undefined

Step 3: Build Funnel Data

步骤3:构建漏斗数据

python
def build_funnel_data(events, funnel_steps, time_window_days):
    """
    For each user, determine which funnel steps they reached
    """
    funnel_data = []
    
    # Get users who started the funnel (reached step 1)
    step1_event = funnel_steps[0]['event']
    users_started = events[events['event_name'] == step1_event]['user_id'].unique()
    
    print(f"Building funnel for {len(users_started):,} users...")
    
    for user_id in users_started:
        user_events = events[events['user_id'] == user_id].sort_values('timestamp')
        
        # Find first occurrence of step 1
        step1_events = user_events[user_events['event_name'] == step1_event]
        if len(step1_events) == 0:
            continue
            
        start_time = step1_events.iloc[0]['timestamp']
        end_time = start_time + timedelta(days=time_window_days)
        
        # Check each subsequent step
        user_funnel = {
            'user_id': user_id,
            'start_time': start_time,
            'step_1': True,
            'step_1_time': start_time
        }
        
        for i, step in enumerate(funnel_steps[1:], start=2):
            # Look for this step's event after previous step and within window
            step_events = user_events[
                (user_events['event_name'] == step['event']) &
                (user_events['timestamp'] >= start_time) &
                (user_events['timestamp'] <= end_time)
            ]
            
            if len(step_events) > 0:
                user_funnel[f'step_{i}'] = True
                user_funnel[f'step_{i}_time'] = step_events.iloc[0]['timestamp']
            else:
                user_funnel[f'step_{i}'] = False
                user_funnel[f'step_{i}_time'] = None
                # If they didn't reach this step, they didn't reach later steps
                for j in range(i+1, len(funnel_steps)+1):
                    user_funnel[f'step_{j}'] = False
                    user_funnel[f'step_{j}_time'] = None
                break
        
        funnel_data.append(user_funnel)
    
    return pd.DataFrame(funnel_data)

funnel_df = build_funnel_data(events, funnel_steps, TIME_WINDOW_DAYS)
print(f"✓ Funnel built for {len(funnel_df):,} users")
python
def build_funnel_data(events, funnel_steps, time_window_days):
    """
    For each user, determine which funnel steps they reached
    """
    funnel_data = []
    
    # Get users who started the funnel (reached step 1)
    step1_event = funnel_steps[0]['event']
    users_started = events[events['event_name'] == step1_event]['user_id'].unique()
    
    print(f"Building funnel for {len(users_started):,} users...")
    
    for user_id in users_started:
        user_events = events[events['user_id'] == user_id].sort_values('timestamp')
        
        # Find first occurrence of step 1
        step1_events = user_events[user_events['event_name'] == step1_event]
        if len(step1_events) == 0:
            continue
            
        start_time = step1_events.iloc[0]['timestamp']
        end_time = start_time + timedelta(days=time_window_days)
        
        # Check each subsequent step
        user_funnel = {
            'user_id': user_id,
            'start_time': start_time,
            'step_1': True,
            'step_1_time': start_time
        }
        
        for i, step in enumerate(funnel_steps[1:], start=2):
            # Look for this step's event after previous step and within window
            step_events = user_events[
                (user_events['event_name'] == step['event']) &
                (user_events['timestamp'] >= start_time) &
                (user_events['timestamp'] <= end_time)
            ]
            
            if len(step_events) > 0:
                user_funnel[f'step_{i}'] = True
                user_funnel[f'step_{i}_time'] = step_events.iloc[0]['timestamp']
            else:
                user_funnel[f'step_{i}'] = False
                user_funnel[f'step_{i}_time'] = None
                # If they didn't reach this step, they didn't reach later steps
                for j in range(i+1, len(funnel_steps)+1):
                    user_funnel[f'step_{j}'] = False
                    user_funnel[f'step_{j}_time'] = None
                break
        
        funnel_data.append(user_funnel)
    
    return pd.DataFrame(funnel_data)

funnel_df = build_funnel_data(events, funnel_steps, TIME_WINDOW_DAYS)
print(f"✓ Funnel built for {len(funnel_df):,} users")

Step 4: Calculate Funnel Metrics

步骤4:计算漏斗指标

python
def calculate_funnel_metrics(funnel_df, funnel_steps):
    """Calculate conversion rates and drop-offs"""
    
    metrics = []
    total_users = len(funnel_df)
    
    for i, step in enumerate(funnel_steps, start=1):
        users_reached = funnel_df[f'step_{i}'].sum()
        conversion_from_top = (users_reached / total_users) * 100
        
        if i > 1:
            users_prev_step = funnel_df[f'step_{i-1}'].sum()
            conversion_from_prev = (users_reached / users_prev_step) * 100 if users_prev_step > 0 else 0
            drop_off = users_prev_step - users_reached
            drop_off_rate = ((users_prev_step - users_reached) / users_prev_step) * 100 if users_prev_step > 0 else 0
        else:
            conversion_from_prev = 100.0
            drop_off = 0
            drop_off_rate = 0
        
        metrics.append({
            'step': i,
            'step_name': step['name'],
            'users_reached': int(users_reached),
            'conversion_from_top': conversion_from_top,
            'conversion_from_prev': conversion_from_prev,
            'drop_off': int(drop_off),
            'drop_off_rate': drop_off_rate
        })
    
    return pd.DataFrame(metrics)

funnel_metrics = calculate_funnel_metrics(funnel_df, funnel_steps)

print("\n📊 Funnel Conversion Metrics:\n")
print(funnel_metrics.to_string(index=False))
python
def calculate_funnel_metrics(funnel_df, funnel_steps):
    """Calculate conversion rates and drop-offs"""
    
    metrics = []
    total_users = len(funnel_df)
    
    for i, step in enumerate(funnel_steps, start=1):
        users_reached = funnel_df[f'step_{i}'].sum()
        conversion_from_top = (users_reached / total_users) * 100
        
        if i > 1:
            users_prev_step = funnel_df[f'step_{i-1}'].sum()
            conversion_from_prev = (users_reached / users_prev_step) * 100 if users_prev_step > 0 else 0
            drop_off = users_prev_step - users_reached
            drop_off_rate = ((users_prev_step - users_reached) / users_prev_step) * 100 if users_prev_step > 0 else 0
        else:
            conversion_from_prev = 100.0
            drop_off = 0
            drop_off_rate = 0
        
        metrics.append({
            'step': i,
            'step_name': step['name'],
            'users_reached': int(users_reached),
            'conversion_from_top': conversion_from_top,
            'conversion_from_prev': conversion_from_prev,
            'drop_off': int(drop_off),
            'drop_off_rate': drop_off_rate
        })
    
    return pd.DataFrame(metrics)

funnel_metrics = calculate_funnel_metrics(funnel_df, funnel_steps)

print("\n📊 Funnel Conversion Metrics:\n")
print(funnel_metrics.to_string(index=False))

Step 5: Visualize Funnel

步骤5:可视化漏斗

python
def plot_funnel(metrics):
    """Create funnel visualization"""
    
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6))
    
    # Funnel chart (absolute numbers)
    ax1.barh(metrics['step_name'], metrics['users_reached'], 
             color=plt.cm.Blues(np.linspace(0.4, 0.8, len(metrics))))
    
    # Add value labels
    for i, (name, users) in enumerate(zip(metrics['step_name'], metrics['users_reached'])):
        ax1.text(users, i, f' {int(users):,}', va='center')
    
    ax1.set_xlabel('Users')
    ax1.set_title('Funnel: Absolute Users per Step')
    ax1.invert_yaxis()
    
    # Conversion rate chart
    colors = ['green' if rate >= 80 else 'orange' if rate >= 60 else 'red' 
              for rate in metrics['conversion_from_prev']]
    
    ax2.barh(metrics['step_name'], metrics['conversion_from_prev'], color=colors)
    
    # Add percentage labels
    for i, (name, rate) in enumerate(zip(metrics['step_name'], metrics['conversion_from_prev'])):
        ax2.text(rate, i, f' {rate:.1f}%', va='center')
    
    ax2.set_xlabel('Conversion Rate (%)')
    ax2.set_title('Step-to-Step Conversion Rate')
    ax2.set_xlim(0, 105)
    ax2.invert_yaxis()
    
    plt.tight_layout()
    plt.savefig('funnel_analysis.png', dpi=300, bbox_inches='tight')
    plt.show()

plot_funnel(funnel_metrics)
python
def plot_funnel(metrics):
    """Create funnel visualization"""
    
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6))
    
    # Funnel chart (absolute numbers)
    ax1.barh(metrics['step_name'], metrics['users_reached'], 
             color=plt.cm.Blues(np.linspace(0.4, 0.8, len(metrics))))
    
    # Add value labels
    for i, (name, users) in enumerate(zip(metrics['step_name'], metrics['users_reached'])):
        ax1.text(users, i, f' {int(users):,}', va='center')
    
    ax1.set_xlabel('Users')
    ax1.set_title('Funnel: Absolute Users per Step')
    ax1.invert_yaxis()
    
    # Conversion rate chart
    colors = ['green' if rate >= 80 else 'orange' if rate >= 60 else 'red' 
              for rate in metrics['conversion_from_prev']]
    
    ax2.barh(metrics['step_name'], metrics['conversion_from_prev'], color=colors)
    
    # Add percentage labels
    for i, (name, rate) in enumerate(zip(metrics['step_name'], metrics['conversion_from_prev'])):
        ax2.text(rate, i, f' {rate:.1f}%', va='center')
    
    ax2.set_xlabel('Conversion Rate (%)')
    ax2.set_title('Step-to-Step Conversion Rate')
    ax2.set_xlim(0, 105)
    ax2.invert_yaxis()
    
    plt.tight_layout()
    plt.savefig('funnel_analysis.png', dpi=300, bbox_inches='tight')
    plt.show()

plot_funnel(funnel_metrics)

Step 6: Analyze Drop-Off Points

步骤6:分析流失节点

python
def analyze_drop_offs(metrics):
    """Identify and prioritize drop-off points"""
    
    # Find biggest drop-off by absolute users
    biggest_drop = metrics.loc[metrics['drop_off'].idxmax()]
    
    # Find biggest drop-off by rate
    worst_conversion = metrics.loc[metrics['conversion_from_prev'].idxmin()]
    
    print("\n🔍 Drop-Off Analysis:")
    print(f"\n  Biggest Drop-Off (absolute):")
    print(f"    {biggest_drop['step_name']}")
    print(f"    Lost {biggest_drop['drop_off']:,} users ({biggest_drop['drop_off_rate']:.1f}%)")
    
    print(f"\n  Worst Conversion Rate:")
    print(f"    {worst_conversion['step_name']}")
    print(f"    Only {worst_conversion['conversion_from_prev']:.1f}% converted")
    
    # Categorize steps
    print(f"\n  Step Performance:")
    for _, row in metrics.iterrows():
        if row['step'] == 1:
            continue
        rate = row['conversion_from_prev']
        if rate >= 80:
            status = "✅ GOOD"
        elif rate >= 60:
            status = "⚠️  MODERATE"
        else:
            status = "🔴 POOR"
        print(f"    {status} {row['step_name']}: {rate:.1f}%")

analyze_drop_offs(funnel_metrics)
python
def analyze_drop_offs(metrics):
    """Identify and prioritize drop-off points"""
    
    # Find biggest drop-off by absolute users
    biggest_drop = metrics.loc[metrics['drop_off'].idxmax()]
    
    # Find biggest drop-off by rate
    worst_conversion = metrics.loc[metrics['conversion_from_prev'].idxmin()]
    
    print("\n🔍 Drop-Off Analysis:")
    print(f"\n  Biggest Drop-Off (absolute):")
    print(f"    {biggest_drop['step_name']}")
    print(f"    Lost {biggest_drop['drop_off']:,} users ({biggest_drop['drop_off_rate']:.1f}%)")
    
    print(f"\n  Worst Conversion Rate:")
    print(f"    {worst_conversion['step_name']}")
    print(f"    Only {worst_conversion['conversion_from_prev']:.1f}% converted")
    
    # Categorize steps
    print(f"\n  Step Performance:")
    for _, row in metrics.iterrows():
        if row['step'] == 1:
            continue
        rate = row['conversion_from_prev']
        if rate >= 80:
            status = "✅ GOOD"
        elif rate >= 60:
            status = "⚠️  MODERATE"
        else:
            status = "🔴 POOR"
        print(f"    {status} {row['step_name']}: {rate:.1f}%")

analyze_drop_offs(funnel_metrics)

Step 7: Time-to-Convert Analysis

步骤7:转化时长分析

python
def analyze_time_to_convert(funnel_df, funnel_steps):
    """Analyze how long users take at each step"""
    
    print("\n⏱️  Time to Convert Analysis:")
    
    for i in range(2, len(funnel_steps) + 1):
        # Calculate time between steps
        time_col = f'step_{i}_time'
        prev_time_col = f'step_{i-1}_time'
        
        converted = funnel_df[funnel_df[f'step_{i}'] == True].copy()
        
        if len(converted) == 0:
            continue
        
        converted['time_diff'] = (converted[time_col] - converted[prev_time_col]).dt.total_seconds() / 60
        
        print(f"\n  {funnel_steps[i-2]['name']}{funnel_steps[i-1]['name']}:")
        print(f"    Median: {converted['time_diff'].median():.1f} minutes")
        print(f"    P25: {converted['time_diff'].quantile(0.25):.1f} min")
        print(f"    P75: {converted['time_diff'].quantile(0.75):.1f} min")
        print(f"    P95: {converted['time_diff'].quantile(0.95):.1f} min")

analyze_time_to_convert(funnel_df, funnel_steps)
python
def analyze_time_to_convert(funnel_df, funnel_steps):
    """Analyze how long users take at each step"""
    
    print("\n⏱️  Time to Convert Analysis:")
    
    for i in range(2, len(funnel_steps) + 1):
        # Calculate time between steps
        time_col = f'step_{i}_time'
        prev_time_col = f'step_{i-1}_time'
        
        converted = funnel_df[funnel_df[f'step_{i}'] == True].copy()
        
        if len(converted) == 0:
            continue
        
        converted['time_diff'] = (converted[time_col] - converted[prev_time_col]).dt.total_seconds() / 60
        
        print(f"\n  {funnel_steps[i-2]['name']}{funnel_steps[i-1]['name']}:")
        print(f"    Median: {converted['time_diff'].median():.1f} minutes")
        print(f"    P25: {converted['time_diff'].quantile(0.25):.1f} min")
        print(f"    P75: {converted['time_diff'].quantile(0.75):.1f} min")
        print(f"    P95: {converted['time_diff'].quantile(0.95):.1f} min")

analyze_time_to_convert(funnel_df, funnel_steps)

Step 8: Segment Comparison

步骤8:分群对比分析

python
def compare_segments(events, funnel_df, segment_col='channel'):
    """Compare funnel performance across segments"""
    
    # Add segment info to funnel data
    user_segments = events[['user_id', segment_col]].drop_duplicates('user_id')
    funnel_with_segment = funnel_df.merge(user_segments, on='user_id', how='left')
    
    print(f"\n📊 Funnel by {segment_col.title()}:")
    
    segment_metrics = []
    for segment in funnel_with_segment[segment_col].unique():
        segment_data = funnel_with_segment[funnel_with_segment[segment_col] == segment]
        segment_funnel = calculate_funnel_metrics(segment_data, funnel_steps)
        
        # Overall conversion rate (top to bottom)
        overall_conversion = segment_funnel.iloc[-1]['conversion_from_top']
        
        segment_metrics.append({
            'segment': segment,
            'users': len(segment_data),
            'overall_conversion': overall_conversion
        })
        
        print(f"\n  {segment}:")
        print(f"    Users: {len(segment_data):,}")
        print(f"    End-to-End Conversion: {overall_conversion:.1f}%")
        
        # Show biggest drop-off for this segment
        worst = segment_funnel.loc[segment_funnel['conversion_from_prev'].idxmin()]
        print(f"    Worst Step: {worst['step_name']} ({worst['conversion_from_prev']:.1f}%)")
    
    # Compare segments
    segment_comparison = pd.DataFrame(segment_metrics).sort_values('overall_conversion', ascending=False)
    
    print(f"\n  Segment Ranking:")
    for _, row in segment_comparison.iterrows():
        print(f"    {row['segment']}: {row['overall_conversion']:.1f}%")
python
def compare_segments(events, funnel_df, segment_col='channel'):
    """Compare funnel performance across segments"""
    
    # Add segment info to funnel data
    user_segments = events[['user_id', segment_col]].drop_duplicates('user_id')
    funnel_with_segment = funnel_df.merge(user_segments, on='user_id', how='left')
    
    print(f"\n📊 Funnel by {segment_col.title()}:")
    
    segment_metrics = []
    for segment in funnel_with_segment[segment_col].unique():
        segment_data = funnel_with_segment[funnel_with_segment[segment_col] == segment]
        segment_funnel = calculate_funnel_metrics(segment_data, funnel_steps)
        
        # Overall conversion rate (top to bottom)
        overall_conversion = segment_funnel.iloc[-1]['conversion_from_top']
        
        segment_metrics.append({
            'segment': segment,
            'users': len(segment_data),
            'overall_conversion': overall_conversion
        })
        
        print(f"\n  {segment}:")
        print(f"    Users: {len(segment_data):,}")
        print(f"    End-to-End Conversion: {overall_conversion:.1f}%")
        
        # Show biggest drop-off for this segment
        worst = segment_funnel.loc[segment_funnel['conversion_from_prev'].idxmin()]
        print(f"    Worst Step: {worst['step_name']} ({worst['conversion_from_prev']:.1f}%)")
    
    # Compare segments
    segment_comparison = pd.DataFrame(segment_metrics).sort_values('overall_conversion', ascending=False)
    
    print(f"\n  Segment Ranking:")
    for _, row in segment_comparison.iterrows():
        print(f"    {row['segment']}: {row['overall_conversion']:.1f}%")

Example: Compare by channel

Example: Compare by channel

if 'channel' in events.columns: compare_segments(events, funnel_df, 'channel')
undefined
if 'channel' in events.columns: compare_segments(events, funnel_df, 'channel')
undefined

Context Validation

上下文验证

Before proceeding, verify:
  • Funnel steps are clearly defined and in correct order
  • Event data includes all necessary steps
  • Time window makes sense for the user journey
  • Success criteria for each step is unambiguous
  • Have user IDs to track individuals through funnel
开始分析前,请确认以下内容:
  • 漏斗步骤已清晰定义且顺序正确
  • 事件数据包含所有必要步骤的记录
  • 时间窗口符合用户旅程的实际情况
  • 每个步骤的成功标准无歧义
  • 已具备用户ID用于追踪个体用户的漏斗路径

Output Template

输出报告模板

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
FUNNEL ANALYSIS REPORT
E-commerce Purchase Funnel
Period: Dec 1-31, 2024
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

📊 FUNNEL OVERVIEW
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Total Users Entered: 50,000
Overall Conversion: 12.5% (6,250 purchases)

Step                    Users      Conv%    Drop-Off
──────────────────────────────────────────────────
1. View Product        50,000     100.0%         -
2. Add to Cart         35,000      70.0%    30.0%
3. Begin Checkout      21,000      60.0%    40.0%
4. Payment Info        15,750      75.0%    25.0%
5. Complete Purchase    6,250      39.7%    60.3%

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🔍 KEY FINDINGS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

🔴 CRITICAL DROP-OFF:
   Complete Purchase (Step 5)
   - Only 39.7% complete after entering payment
   - Losing 9,500 users at final step
   - Potential revenue impact: $285,000

⚠️  MODERATE DROP-OFF:
   Begin Checkout (Step 3)
   - 40% abandon cart before checkout
   - Losing 14,000 users

✅ GOOD PERFORMANCE:
   Add to Cart (Step 2): 70% conversion
   Payment Info (Step 4): 75% conversion

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
⏱️  TIME TO CONVERT
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

View → Add to Cart:     Median 2.3 min
Add to Cart → Checkout: Median 8.5 min
Checkout → Payment:     Median 3.1 min
Payment → Complete:     Median 1.2 min

Total Journey: Median 15.1 minutes

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
📱 SEGMENT COMPARISON
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

By Device:
  Desktop:  15.2% conversion (30,000 users)
  Mobile:   10.1% conversion (20,000 users)
  
  Gap: Mobile 33% lower conversion
  Worst Mobile Step: Complete Purchase (28% vs 45% desktop)

By Channel:
  Organic:  14.3% conversion
  Paid:     11.8% conversion  
  Email:    16.7% conversion

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
💡 RECOMMENDATIONS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

PRIORITY 1 (High Impact):
1. Investigate payment completion drop-off
   - Review error messages at payment step
   - Check mobile payment UX
   - Consider guest checkout option
   - Potential gain: +3,000 conversions/month

PRIORITY 2 (Medium Impact):
2. Reduce cart abandonment
   - Add save cart feature
   - Send abandonment emails
   - Show trust signals earlier
   - Potential gain: +2,000 conversions/month

PRIORITY 3 (Mobile Optimization):
3. Improve mobile experience
   - Simplify mobile checkout flow
   - Optimize for smaller screens
   - Potential gain: +1,000 conversions/month

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
📁 FILES GENERATED
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

✓ funnel_analysis.png (visualization)
✓ funnel_metrics.csv (detailed metrics)
✓ user_journeys.csv (individual user paths)
✓ segment_comparison.csv (breakdown by segment)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
漏斗分析报告
电商购买漏斗
分析周期:2024年12月1日-31日
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

📊 漏斗概览
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

进入漏斗总用户数:50,000
整体转化率:12.5%(6,250笔订单)

步骤                用户数      转化率    流失率
──────────────────────────────────────────────────
1. 浏览商品页        50,000     100.0%         -
2. 添加至购物车      35,000      70.0%    30.0%
3. 开始结账          21,000      60.0%    40.0%
4. 填写支付信息      15,750      75.0%    25.0%
5. 完成购买          6,250      39.7%    60.3%

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🔍 关键发现
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

🔴 严重流失节点:
   完成购买(步骤5)
   - 填写支付信息后仅39.7%的用户完成购买
   - 此节点流失9,500名用户
   - 潜在收入影响:$285,000

⚠️  中度流失节点:
   开始结账(步骤3)
   - 40%的用户在结账前放弃购物车
   - 此节点流失14,000名用户

✅ 表现良好节点:
   添加至购物车(步骤2):70%转化率
   填写支付信息(步骤4):75%转化率

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
⏱️  转化时长
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

浏览 → 添加至购物车:     中位数2.3分钟
添加至购物车 → 开始结账: 中位数8.5分钟
开始结账 → 填写支付信息: 中位数3.1分钟
填写支付信息 → 完成购买: 中位数1.2分钟

全程时长:中位数15.1分钟

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
📱 分群对比
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

按设备分群:
  桌面端:15.2%转化率(30,000名用户)
  移动端:10.1%转化率(20,000名用户)
  
  差距:移动端转化率比桌面端低33%
  移动端最差节点:完成购买(28% vs 桌面端45%)

按渠道分群:
  自然流量:14.3%转化率
  付费流量:11.8%转化率  
  邮件流量:16.7%转化率

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
💡 优化建议
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

优先级1(高影响):
1. 调研支付完成环节的流失原因
   - 检查支付环节的错误提示信息
   - 优化移动端支付体验
   - 考虑提供游客结账选项
   - 潜在收益:每月增加3,000笔转化

优先级2(中影响):
2. 降低购物车放弃率
   - 添加购物车保存功能
   - 发送购物车放弃提醒邮件
   - 提前展示信任标识
   - 潜在收益:每月增加2,000笔转化

优先级3(移动端优化):
3. 提升移动端体验
   - 简化移动端结账流程
   - 针对小屏幕优化界面
   - 潜在收益:每月增加1,000笔转化

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
📁 生成文件
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

✓ funnel_analysis.png(可视化图表)
✓ funnel_metrics.csv(详细指标数据)
✓ user_journeys.csv(个体用户路径数据)
✓ segment_comparison.csv(分群对比数据)

Common Scenarios

常见场景

Scenario 1: "Why is our signup funnel performing poorly?"

场景1:“为什么我们的注册漏斗表现不佳?”

→ Build funnel from landing → signup → activation → Identify biggest drop-off step → Compare segments (source, device, etc.) → Analyze time-to-convert at each step → Provide specific recommendations
→ 构建从落地页→注册→激活的漏斗 → 识别最大流失节点 → 对比不同分群(来源、设备等)的表现 → 分析各步骤的转化时长 → 提供具体优化建议

Scenario 2: "Mobile conversion is lower than desktop"

场景2:“移动端转化率低于桌面端”

→ Run funnel analysis separately for each device → Identify which step(s) mobile underperforms → Compare time-to-convert (mobile users slower?) → Highlight specific mobile UX issues
→ 分别为移动端和桌面端运行漏斗分析 → 识别移动端表现不佳的步骤 → 对比转化时长(移动端用户是否操作更慢?) → 突出移动端的具体体验问题

Scenario 3: "Test if new checkout flow improved conversion"

场景3:“测试新结账流程是否提升了转化率”

→ Compare funnel before/after change → Calculate statistical significance of difference → Show which specific steps improved → Measure overall impact
→ 对比流程变更前后的漏斗表现 → 计算转化率差异的统计显著性 → 展示哪些步骤的表现得到了提升 → 衡量整体业务影响

Scenario 4: "Optimize onboarding for different user types"

场景4:“针对不同用户类型优化注册流程”

→ Segment by user type (free, trial, paid) → Build separate funnels for each → Identify where each segment drops off → Create targeted interventions
→ 按用户类型(免费用户、试用用户、付费用户)分群 → 为每个分群构建独立漏斗 → 识别各分群的流失节点 → 制定针对性的干预措施

Scenario 5: "Track funnel performance over time"

场景5:“追踪漏斗性能的时间趋势”

→ Calculate weekly/monthly funnel metrics → Show trend in conversion rates → Flag when performance degrades → Correlate with product changes
→ 计算每周/每月的漏斗指标 → 展示转化率的变化趋势 → 标记性能下降的时间段 → 关联产品变更事件分析原因

Handling Missing Context

缺失上下文的处理方式

User says "analyze our funnel" without defining steps: "I can help! First, what's the user journey you want to analyze? Example: Landing page → Signup → Onboarding → Activation. What are your steps?"
User doesn't know time window: "Let me analyze the data to see typical completion times, then we can decide on an appropriate window. Most users complete within X days."
Event data is messy: "I see multiple event names that might represent the same step. Let me map them:
  • 'view_product', 'product_page' → Step 1?
  • 'add_cart', 'added_to_cart' → Step 2? Does this look right?"
User wants to compare many segments: "I can analyze all segments, but let's prioritize. Which 2-3 segments matter most for decision-making?"
用户仅说“分析我们的漏斗”但未定义步骤: “我可以帮你分析!首先,请告诉我你要分析的用户旅程是什么?例如:落地页→注册→流程引导→激活。你的漏斗步骤是什么?”
用户不确定时间窗口: “我可以先分析数据,看看用户完成漏斗的典型时长,再确定合适的时间窗口。目前多数用户在X天内完成漏斗。”
事件数据混乱: “我发现多个事件名称可能对应同一个步骤,我来做一下映射:
  • 'view_product', 'product_page' → 步骤1?
  • 'add_cart', 'added_to_cart' → 步骤2? 这样的映射是否正确?”
用户想要对比多个分群: “我可以分析所有分群,但我们先优先处理对你决策最重要的2-3个分群维度,可以吗?”

Advanced Options

高级分析选项

After basic funnel analysis, offer:
Cohort-Based Funnels: "Want to see how funnel performance changes over time? I can show conversion rates by signup cohort."
Micro-Conversion Analysis: "I can break down each major step into micro-steps to find exactly where users hesitate."
Drop-Off Prediction: "Using behavior patterns, I can predict which users are likely to drop off and when."
Recovery Analysis: "I can identify users who dropped off but later returned to complete the funnel."
Funnel Optimization Calculator: "I can estimate revenue impact of improving conversion at each step by X%."
A/B Test Power Analysis: "Planning to test funnel changes? I can calculate required sample size for statistical significance."
完成基础漏斗分析后,可提供以下高级分析:
同期群漏斗分析: “想看看漏斗性能随时间的变化吗?我可以按注册同期群展示转化率趋势。”
微转化分析: “我可以将每个主要步骤拆分为微步骤,精准定位用户的犹豫节点。”
流失预测: “基于用户行为模式,我可以预测哪些用户可能会流失以及流失的时间点。”
回流分析: “我可以识别那些中途流失但后续返回完成漏斗的用户。”
漏斗优化收益计算器: “我可以估算每个步骤转化率提升X%后的潜在收入影响。”
A/B测试功效分析: “计划测试漏斗变更?我可以计算达到统计显著性所需的样本量。”