employee-performance-analytics-hr

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

Employee Performance Analytics HR Skill

员工绩效分析HR工具

Skill by ara.so — Data Skills collection.
ara.so开发的工具 — 数据工具合集。

Overview

概述

Employee Performance Analytics is a Python and SQL-based HR analytics tool that transforms employee data into actionable insights. It uses SQLite for KPI aggregation and pandas/matplotlib for visualization, generating departmental performance reports, efficiency metrics, and workload analysis.
The project provides an end-to-end analytics pipeline: data loading → SQL feature engineering → Python analysis → visualization exports.
员工绩效分析是一款基于Python和SQL的HR分析工具,可将员工数据转化为可执行的洞察。它使用SQLite进行KPI聚合,借助pandas/matplotlib实现可视化,生成部门绩效报告、效率指标及工作量分析结果。
该项目提供端到端的分析流程:数据加载 → SQL特征工程 → Python分析 → 可视化导出。

Installation

安装

bash
undefined
bash
undefined

Clone the repository

克隆仓库

Create virtual environment

创建虚拟环境

python -m venv .venv source .venv/bin/activate # On Windows: .venv\Scripts\activate
python -m venv .venv source .venv/bin/activate # Windows系统执行:.venv\Scripts\activate

Install dependencies

安装依赖

pip install -r requirements.txt
undefined
pip install -r requirements.txt
undefined

Dependencies

依赖项

txt
pandas>=2.0.0
matplotlib>=3.7.0
seaborn>=0.12.0
sqlite3  # Built-in with Python
numpy>=1.24.0
txt
pandas>=2.0.0
matplotlib>=3.7.0
seaborn>=0.12.0
sqlite3  # Python内置库
numpy>=1.24.0

Project Structure

项目结构

employee-performance-analytics/
├── data/
│   └── employees.csv          # Raw employee data
├── src/
│   ├── create_db.py          # CSV to SQLite loader
│   ├── queries.sql           # SQL KPI queries
│   ├── analyze_performance.py # Main analysis script
│   └── utils.py              # Helper functions
└── outputs/
    ├── department_kpis.csv
    ├── performance_summary.csv
    └── charts/               # Generated visualizations
employee-performance-analytics/
├── data/
│   └── employees.csv          # 原始员工数据
├── src/
│   ├── create_db.py          # CSV转SQLite加载器
│   ├── queries.sql           # SQL KPI查询语句
│   ├── analyze_performance.py # 主分析脚本
│   └── utils.py              # 辅助函数
└── outputs/
    ├── department_kpis.csv
    ├── performance_summary.csv
    └── charts/               # 生成的可视化图表

Data Schema

数据结构

The project expects employee data with these columns:
ColumnTypeDescription
employee_id
intUnique identifier
name
stringEmployee name
department
stringDepartment (Engineering, Sales, etc.)
role
stringJob title
date
dateRecord date (YYYY-MM-DD)
tasks_completed
intDaily tasks completed
hours_worked
floatHours worked
rating
floatPerformance rating (1-5)
projects
intActive projects
absences
int1 if absent, 0 otherwise
项目所需的员工数据需包含以下列:
列名类型描述
employee_id
int唯一标识符
name
string员工姓名
department
string部门(如工程、销售等)
role
string职位名称
date
date记录日期(YYYY-MM-DD)
tasks_completed
int每日完成任务数
hours_worked
float工作时长
rating
float绩效评分(1-5)
projects
int参与的活跃项目数
absences
int缺勤标记(1表示缺勤,0表示出勤)

Key Commands

核心命令

1. Load Data into SQLite

1. 将数据加载到SQLite

bash
python src/create_db.py --csv data/employees.csv --db hr.db
Options:
  • --csv
    : Path to input CSV file
  • --db
    : Output SQLite database path
  • --table
    : Table name (default:
    employees
    )
bash
python src/create_db.py --csv data/employees.csv --db hr.db
选项说明:
  • --csv
    : 输入CSV文件路径
  • --db
    : 输出SQLite数据库路径
  • --table
    : 表名(默认值:
    employees

2. Run Performance Analysis

2. 运行绩效分析

bash
python src/analyze_performance.py --db hr.db --sql src/queries.sql --outdir outputs
Options:
  • --db
    : Path to SQLite database
  • --sql
    : Path to SQL queries file
  • --outdir
    : Output directory for CSV reports and charts
bash
python src/analyze_performance.py --db hr.db --sql src/queries.sql --outdir outputs
选项说明:
  • --db
    : SQLite数据库路径
  • --sql
    : SQL查询文件路径
  • --outdir
    : CSV报告和图表的输出目录

Core SQL Queries

核心SQL查询语句

The
queries.sql
file contains three main analytical views:
queries.sql
文件包含三个主要分析视图:

Department KPIs

部门KPI

sql
-- Department-level performance metrics
CREATE VIEW IF NOT EXISTS department_kpis AS
SELECT
    department,
    COUNT(DISTINCT employee_id) AS employee_count,
    AVG(rating) AS avg_rating,
    SUM(tasks_completed) AS total_tasks,
    SUM(hours_worked) AS total_hours,
    ROUND(AVG(CAST(absences AS FLOAT)), 2) AS absence_rate,
    ROUND(SUM(tasks_completed) * 1.0 / SUM(hours_worked), 2) AS tasks_per_hour
FROM employees
GROUP BY department
ORDER BY avg_rating DESC;
sql
-- 部门级绩效指标
CREATE VIEW IF NOT EXISTS department_kpis AS
SELECT
    department,
    COUNT(DISTINCT employee_id) AS employee_count,
    AVG(rating) AS avg_rating,
    SUM(tasks_completed) AS total_tasks,
    SUM(hours_worked) AS total_hours,
    ROUND(AVG(CAST(absences AS FLOAT)), 2) AS absence_rate,
    ROUND(SUM(tasks_completed) * 1.0 / SUM(hours_worked), 2) AS tasks_per_hour
FROM employees
GROUP BY department
ORDER BY avg_rating DESC;

Employee Summary

员工绩效汇总

sql
-- Individual employee performance aggregation
CREATE VIEW IF NOT EXISTS employee_summary AS
SELECT
    employee_id,
    name,
    department,
    role,
    SUM(tasks_completed) AS total_tasks,
    SUM(hours_worked) AS total_hours,
    AVG(rating) AS avg_rating,
    COUNT(DISTINCT projects) AS project_count,
    SUM(absences) AS total_absences,
    ROUND(SUM(tasks_completed) * 1.0 / SUM(hours_worked), 2) AS efficiency
FROM employees
GROUP BY employee_id, name, department, role
ORDER BY efficiency DESC;
sql
undefined

Daily Productivity

员工个人绩效聚合

sql
-- Day-by-day productivity tracking
CREATE VIEW IF NOT EXISTS daily_productivity AS
SELECT
    date,
    department,
    SUM(tasks_completed) AS daily_tasks,
    SUM(hours_worked) AS daily_hours,
    AVG(rating) AS daily_rating
FROM employees
GROUP BY date, department
ORDER BY date, department;
CREATE VIEW IF NOT EXISTS employee_summary AS SELECT employee_id, name, department, role, SUM(tasks_completed) AS total_tasks, SUM(hours_worked) AS total_hours, AVG(rating) AS avg_rating, COUNT(DISTINCT projects) AS project_count, SUM(absences) AS total_absences, ROUND(SUM(tasks_completed) * 1.0 / SUM(hours_worked), 2) AS efficiency FROM employees GROUP BY employee_id, name, department, role ORDER BY efficiency DESC;
undefined

Python API Usage

日常生产力

Creating Database from CSV

python
import pandas as pd
import sqlite3

def create_database(csv_path, db_path, table_name='employees'):
    """Load CSV into SQLite database."""
    df = pd.read_csv(csv_path)
    
    # Data validation
    required_cols = ['employee_id', 'name', 'department', 'date', 
                     'tasks_completed', 'hours_worked', 'rating']
    assert all(col in df.columns for col in required_cols), "Missing required columns"
    
    # Create database
    conn = sqlite3.connect(db_path)
    df.to_sql(table_name, conn, if_exists='replace', index=False)
    conn.close()
    print(f"✓ Database created: {db_path}")
sql
undefined

Usage

逐日生产力追踪

create_database('data/employees.csv', 'hr.db')
undefined
CREATE VIEW IF NOT EXISTS daily_productivity AS SELECT date, department, SUM(tasks_completed) AS daily_tasks, SUM(hours_worked) AS daily_hours, AVG(rating) AS daily_rating FROM employees GROUP BY date, department ORDER BY date, department;
undefined

Running SQL Queries

Python API 使用示例

从CSV创建数据库

python
import sqlite3
import pandas as pd

def execute_sql_file(db_path, sql_file_path):
    """Execute SQL script and return results."""
    conn = sqlite3.connect(db_path)
    
    with open(sql_file_path, 'r') as f:
        sql_script = f.read()
    
    # Execute all statements
    cursor = conn.cursor()
    cursor.executescript(sql_script)
    conn.commit()
    
    # Fetch view results
    dept_kpis = pd.read_sql_query("SELECT * FROM department_kpis", conn)
    emp_summary = pd.read_sql_query("SELECT * FROM employee_summary", conn)
    daily_prod = pd.read_sql_query("SELECT * FROM daily_productivity", conn)
    
    conn.close()
    
    return dept_kpis, emp_summary, daily_prod
python
import pandas as pd
import sqlite3

def create_database(csv_path, db_path, table_name='employees'):
    """将CSV加载到SQLite数据库中。"""
    df = pd.read_csv(csv_path)
    
    # 数据验证
    required_cols = ['employee_id', 'name', 'department', 'date', 
                     'tasks_completed', 'hours_worked', 'rating']
    assert all(col in df.columns for col in required_cols), "缺少必填列"
    
    # 创建数据库
    conn = sqlite3.connect(db_path)
    df.to_sql(table_name, conn, if_exists='replace', index=False)
    conn.close()
    print(f"✓ 数据库已创建:{db_path}")

Generating Visualizations

使用示例

python
import matplotlib.pyplot as plt
import seaborn as sns

def plot_department_ratings(dept_kpis, output_path):
    """Bar chart of average rating by department."""
    plt.figure(figsize=(12, 7))
    
    sns.barplot(
        data=dept_kpis,
        x='department',
        y='avg_rating',
        palette='viridis'
    )
    
    plt.title('Average Performance Rating by Department', fontsize=16, weight='bold')
    plt.xlabel('Department', fontsize=12)
    plt.ylabel('Average Rating', fontsize=12)
    plt.xticks(rotation=45, ha='right')
    plt.ylim(0, 5)
    plt.grid(axis='y', alpha=0.3)
    plt.tight_layout()
    plt.savefig(output_path, dpi=300, bbox_inches='tight')
    plt.close()

def plot_performance_vs_hours(emp_summary, output_path):
    """Scatter plot of tasks vs hours worked."""
    plt.figure(figsize=(12, 7))
    
    scatter = plt.scatter(
        emp_summary['total_hours'],
        emp_summary['total_tasks'],
        c=emp_summary['avg_rating'],
        cmap='RdYlGn',
        s=100,
        alpha=0.6,
        edgecolors='black'
    )
    
    plt.colorbar(scatter, label='Avg Rating')
    plt.title('Tasks Completed vs Hours Worked', fontsize=16, weight='bold')
    plt.xlabel('Total Hours Worked', fontsize=12)
    plt.ylabel('Total Tasks Completed', fontsize=12)
    plt.grid(alpha=0.3)
    plt.tight_layout()
    plt.savefig(output_path, dpi=300, bbox_inches='tight')
    plt.close()

def plot_efficiency_distribution(emp_summary, output_path):
    """Histogram of task completion rate."""
    plt.figure(figsize=(12, 7))
    
    plt.hist(
        emp_summary['efficiency'].dropna(),
        bins=30,
        color='steelblue',
        edgecolor='black',
        alpha=0.7
    )
    
    plt.axvline(
        emp_summary['efficiency'].median(),
        color='red',
        linestyle='--',
        linewidth=2,
        label=f"Median: {emp_summary['efficiency'].median():.2f}"
    )
    
    plt.title('Task Completion Rate Distribution', fontsize=16, weight='bold')
    plt.xlabel('Tasks per Hour', fontsize=12)
    plt.ylabel('Number of Employees', fontsize=12)
    plt.legend()
    plt.grid(alpha=0.3)
    plt.tight_layout()
    plt.savefig(output_path, dpi=300, bbox_inches='tight')
    plt.close()
create_database('data/employees.csv', 'hr.db')
undefined

Complete Analysis Pipeline

执行SQL查询

python
import os
import sqlite3
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path

class HRAnalyzer:
    """Complete HR analytics pipeline."""
    
    def __init__(self, db_path, sql_path, output_dir):
        self.db_path = db_path
        self.sql_path = sql_path
        self.output_dir = Path(output_dir)
        self.output_dir.mkdir(parents=True, exist_ok=True)
        (self.output_dir / 'charts').mkdir(exist_ok=True)
        
    def load_data(self):
        """Execute SQL and load analytical views."""
        conn = sqlite3.connect(self.db_path)
        
        # Execute SQL script
        with open(self.sql_path, 'r') as f:
            conn.executescript(f.read())
        
        # Load views
        self.dept_kpis = pd.read_sql_query("SELECT * FROM department_kpis", conn)
        self.emp_summary = pd.read_sql_query("SELECT * FROM employee_summary", conn)
        self.daily_prod = pd.read_sql_query("SELECT * FROM daily_productivity", conn)
        
        conn.close()
        print("✓ Data loaded from SQL views")
        
    def export_reports(self):
        """Save CSV reports."""
        self.dept_kpis.to_csv(
            self.output_dir / 'department_kpis.csv',
            index=False
        )
        self.emp_summary.to_csv(
            self.output_dir / 'performance_summary.csv',
            index=False
        )
        print(f"✓ Reports saved to {self.output_dir}")
        
    def generate_visualizations(self):
        """Create all performance charts."""
        charts_dir = self.output_dir / 'charts'
        
        # Department ratings
        plt.figure(figsize=(12, 7))
        sns.barplot(data=self.dept_kpis, x='department', y='avg_rating', palette='viridis')
        plt.title('Average Rating by Department', fontsize=16, weight='bold')
        plt.xticks(rotation=45, ha='right')
        plt.tight_layout()
        plt.savefig(charts_dir / 'avg_rating_by_department.png', dpi=300)
        plt.close()
        
        # Performance vs hours
        plt.figure(figsize=(12, 7))
        plt.scatter(
            self.emp_summary['total_hours'],
            self.emp_summary['total_tasks'],
            c=self.emp_summary['avg_rating'],
            cmap='RdYlGn',
            s=100,
            alpha=0.6
        )
        plt.colorbar(label='Avg Rating')
        plt.title('Performance vs Hours Worked', fontsize=16, weight='bold')
        plt.xlabel('Total Hours')
        plt.ylabel('Total Tasks')
        plt.tight_layout()
        plt.savefig(charts_dir / 'performance_vs_hours.png', dpi=300)
        plt.close()
        
        # Efficiency distribution
        plt.figure(figsize=(12, 7))
        plt.hist(self.emp_summary['efficiency'].dropna(), bins=30, color='steelblue', edgecolor='black')
        plt.axvline(self.emp_summary['efficiency'].median(), color='red', linestyle='--', linewidth=2)
        plt.title('Task Completion Rate Distribution', fontsize=16, weight='bold')
        plt.xlabel('Tasks per Hour')
        plt.ylabel('Count')
        plt.tight_layout()
        plt.savefig(charts_dir / 'task_completion_rate.png', dpi=300)
        plt.close()
        
        print(f"✓ Visualizations saved to {charts_dir}")
        
    def run_full_analysis(self):
        """Execute complete analytics pipeline."""
        self.load_data()
        self.export_reports()
        self.generate_visualizations()
        print("✓ Analysis complete!")
python
import sqlite3
import pandas as pd

def execute_sql_file(db_path, sql_file_path):
    """执行SQL脚本并返回结果。"""
    conn = sqlite3.connect(db_path)
    
    with open(sql_file_path, 'r') as f:
        sql_script = f.read()
    
    # 执行所有语句
    cursor = conn.cursor()
    cursor.executescript(sql_script)
    conn.commit()
    
    # 获取视图结果
    dept_kpis = pd.read_sql_query("SELECT * FROM department_kpis", conn)
    emp_summary = pd.read_sql_query("SELECT * FROM employee_summary", conn)
    daily_prod = pd.read_sql_query("SELECT * FROM daily_productivity", conn)
    
    conn.close()
    
    return dept_kpis, emp_summary, daily_prod

Usage

生成可视化图表

analyzer = HRAnalyzer( db_path='hr.db', sql_path='src/queries.sql', output_dir='outputs' ) analyzer.run_full_analysis()
undefined
python
import matplotlib.pyplot as plt
import seaborn as sns

def plot_department_ratings(dept_kpis, output_path):
    """按部门展示平均评分的柱状图。"""
    plt.figure(figsize=(12, 7))
    
    sns.barplot(
        data=dept_kpis,
        x='department',
        y='avg_rating',
        palette='viridis'
    )
    
    plt.title('各部门平均绩效评分', fontsize=16, weight='bold')
    plt.xlabel('部门', fontsize=12)
    plt.ylabel('平均评分', fontsize=12)
    plt.xticks(rotation=45, ha='right')
    plt.ylim(0, 5)
    plt.grid(axis='y', alpha=0.3)
    plt.tight_layout()
    plt.savefig(output_path, dpi=300, bbox_inches='tight')
    plt.close()

def plot_performance_vs_hours(emp_summary, output_path):
    """任务完成数与工作时长的散点图。"""
    plt.figure(figsize=(12, 7))
    
    scatter = plt.scatter(
        emp_summary['total_hours'],
        emp_summary['total_tasks'],
        c=emp_summary['avg_rating'],
        cmap='RdYlGn',
        s=100,
        alpha=0.6,
        edgecolors='black'
    )
    
    plt.colorbar(scatter, label='平均评分')
    plt.title('任务完成数与工作时长对比', fontsize=16, weight='bold')
    plt.xlabel('总工作时长', fontsize=12)
    plt.ylabel('总任务完成数', fontsize=12)
    plt.grid(alpha=0.3)
    plt.tight_layout()
    plt.savefig(output_path, dpi=300, bbox_inches='tight')
    plt.close()

def plot_efficiency_distribution(emp_summary, output_path):
    """任务完成率分布直方图。"""
    plt.figure(figsize=(12, 7))
    
    plt.hist(
        emp_summary['efficiency'].dropna(),
        bins=30,
        color='steelblue',
        edgecolor='black',
        alpha=0.7
    )
    
    plt.axvline(
        emp_summary['efficiency'].median(),
        color='red',
        linestyle='--',
        linewidth=2,
        label=f"中位数: {emp_summary['efficiency'].median():.2f}"
    )
    
    plt.title('任务完成率分布', fontsize=16, weight='bold')
    plt.xlabel('每小时任务数', fontsize=12)
    plt.ylabel('员工数量', fontsize=12)
    plt.legend()
    plt.grid(alpha=0.3)
    plt.tight_layout()
    plt.savefig(output_path, dpi=300, bbox_inches='tight')
    plt.close()

Common Patterns

完整分析流程

Custom KPI Queries

python
def get_top_performers(db_path, n=10):
    """Retrieve top N employees by efficiency."""
    conn = sqlite3.connect(db_path)
    query = """
    SELECT 
        name, 
        department, 
        efficiency, 
        avg_rating
    FROM employee_summary
    ORDER BY efficiency DESC
    LIMIT ?
    """
    top_performers = pd.read_sql_query(query, conn, params=(n,))
    conn.close()
    return top_performers

def get_department_trends(db_path, department):
    """Get time-series data for specific department."""
    conn = sqlite3.connect(db_path)
    query = """
    SELECT 
        date, 
        daily_tasks, 
        daily_hours, 
        daily_rating
    FROM daily_productivity
    WHERE department = ?
    ORDER BY date
    """
    trends = pd.read_sql_query(query, conn, params=(department,))
    conn.close()
    return trends
python
import os
import sqlite3
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path

class HRAnalyzer:
    """完整HR分析流程。"""
    
    def __init__(self, db_path, sql_path, output_dir):
        self.db_path = db_path
        self.sql_path = sql_path
        self.output_dir = Path(output_dir)
        self.output_dir.mkdir(parents=True, exist_ok=True)
        (self.output_dir / 'charts').mkdir(exist_ok=True)
        
    def load_data(self):
        """执行SQL并加载分析视图。"""
        conn = sqlite3.connect(self.db_path)
        
        # 执行SQL脚本
        with open(self.sql_path, 'r') as f:
            conn.executescript(f.read())
        
        # 加载视图
        self.dept_kpis = pd.read_sql_query("SELECT * FROM department_kpis", conn)
        self.emp_summary = pd.read_sql_query("SELECT * FROM employee_summary", conn)
        self.daily_prod = pd.read_sql_query("SELECT * FROM daily_productivity", conn)
        
        conn.close()
        print("✓ 已从SQL视图加载数据")
        
    def export_reports(self):
        """保存CSV报告。"""
        self.dept_kpis.to_csv(
            self.output_dir / 'department_kpis.csv',
            index=False
        )
        self.emp_summary.to_csv(
            self.output_dir / 'performance_summary.csv',
            index=False
        )
        print(f"✓ 报告已保存至 {self.output_dir}")
        
    def generate_visualizations(self):
        """生成所有绩效图表。"""
        charts_dir = self.output_dir / 'charts'
        
        # 部门评分
        plt.figure(figsize=(12, 7))
        sns.barplot(data=self.dept_kpis, x='department', y='avg_rating', palette='viridis')
        plt.title('各部门平均评分', fontsize=16, weight='bold')
        plt.xticks(rotation=45, ha='right')
        plt.tight_layout()
        plt.savefig(charts_dir / 'avg_rating_by_department.png', dpi=300)
        plt.close()
        
        # 绩效与工作时长对比
        plt.figure(figsize=(12, 7))
        plt.scatter(
            self.emp_summary['total_hours'],
            self.emp_summary['total_tasks'],
            c=self.emp_summary['avg_rating'],
            cmap='RdYlGn',
            s=100,
            alpha=0.6
        )
        plt.colorbar(label='平均评分')
        plt.title('绩效与工作时长对比', fontsize=16, weight='bold')
        plt.xlabel('总工作时长')
        plt.ylabel('总任务完成数')
        plt.tight_layout()
        plt.savefig(charts_dir / 'performance_vs_hours.png', dpi=300)
        plt.close()
        
        # 效率分布
        plt.figure(figsize=(12, 7))
        plt.hist(self.emp_summary['efficiency'].dropna(), bins=30, color='steelblue', edgecolor='black')
        plt.axvline(self.emp_summary['efficiency'].median(), color='red', linestyle='--', linewidth=2)
        plt.title('任务完成率分布', fontsize=16, weight='bold')
        plt.xlabel('每小时任务数')
        plt.ylabel('员工数量')
        plt.tight_layout()
        plt.savefig(charts_dir / 'task_completion_rate.png', dpi=300)
        plt.close()
        
        print(f"✓ 可视化图表已保存至 {charts_dir}")
        
    def run_full_analysis(self):
        """执行完整分析流程。"""
        self.load_data()
        self.export_reports()
        self.generate_visualizations()
        print("✓ 分析完成!")

Filtering and Aggregation

使用示例

python
def analyze_by_role(db_path, role_filter):
    """Aggregate performance by specific role."""
    conn = sqlite3.connect(db_path)
    query = """
    SELECT 
        role,
        AVG(rating) as avg_rating,
        AVG(tasks_completed) as avg_tasks,
        AVG(hours_worked) as avg_hours
    FROM employees
    WHERE role LIKE ?
    GROUP BY role
    """
    role_stats = pd.read_sql_query(query, conn, params=(f'%{role_filter}%',))
    conn.close()
    return role_stats
analyzer = HRAnalyzer( db_path='hr.db', sql_path='src/queries.sql', output_dir='outputs' ) analyzer.run_full_analysis()
undefined

Adding New Metrics

常见应用模式

自定义KPI查询

python
def calculate_workload_balance(emp_summary):
    """Calculate workload balance score."""
    emp_summary['workload_score'] = (
        emp_summary['total_tasks'] / emp_summary['total_tasks'].max() * 0.4 +
        emp_summary['total_hours'] / emp_summary['total_hours'].max() * 0.3 +
        emp_summary['avg_rating'] / 5 * 0.3
    )
    return emp_summary
python
def get_top_performers(db_path, n=10):
    """获取效率最高的前N名员工。"""
    conn = sqlite3.connect(db_path)
    query = """
    SELECT 
        name, 
        department, 
        efficiency, 
        avg_rating
    FROM employee_summary
    ORDER BY efficiency DESC
    LIMIT ?
    """
    top_performers = pd.read_sql_query(query, conn, params=(n,))
    conn.close()
    return top_performers

def get_department_trends(db_path, department):
    """获取特定部门的时间序列数据。"""
    conn = sqlite3.connect(db_path)
    query = """
    SELECT 
        date, 
        daily_tasks, 
        daily_hours, 
        daily_rating
    FROM daily_productivity
    WHERE department = ?
    ORDER BY date
    """
    trends = pd.read_sql_query(query, conn, params=(department,))
    conn.close()
    return trends

Configuration

过滤与聚合

Custom Chart Styling

python
undefined
python
def analyze_by_role(db_path, role_filter):
    """按特定职位聚合绩效数据。"""
    conn = sqlite3.connect(db_path)
    query = """
    SELECT 
        role,
        AVG(rating) as avg_rating,
        AVG(tasks_completed) as avg_tasks,
        AVG(hours_worked) as avg_hours
    FROM employees
    WHERE role LIKE ?
    GROUP BY role
    """
    role_stats = pd.read_sql_query(query, conn, params=(f'%{role_filter}%',))
    conn.close()
    return role_stats

Set global matplotlib style

添加新指标

plt.style.use('seaborn-v0_8-darkgrid') sns.set_palette("husl")
python
def calculate_workload_balance(emp_summary):
    """计算工作负载平衡得分。"""
    emp_summary['workload_score'] = (
        emp_summary['total_tasks'] / emp_summary['total_tasks'].max() * 0.4 +
        emp_summary['total_hours'] / emp_summary['total_hours'].max() * 0.3 +
        emp_summary['avg_rating'] / 5 * 0.3
    )
    return emp_summary

Custom colors

配置说明

自定义图表样式

DEPT_COLORS = { 'Engineering': '#3498db', 'Sales': '#e74c3c', 'Finance': '#2ecc71', 'HR': '#f39c12', 'Support': '#9b59b6' }
undefined
python
undefined

Database Configuration

设置全局matplotlib样式

python
undefined
plt.style.use('seaborn-v0_8-darkgrid') sns.set_palette("husl")

For larger datasets, enable performance optimizations

自定义颜色

def optimize_database(db_path): """Apply SQLite performance settings.""" conn = sqlite3.connect(db_path) cursor = conn.cursor()
cursor.execute("PRAGMA journal_mode = WAL")
cursor.execute("PRAGMA synchronous = NORMAL")
cursor.execute("PRAGMA cache_size = 10000")
cursor.execute("CREATE INDEX IF NOT EXISTS idx_dept ON employees(department)")
cursor.execute("CREATE INDEX IF NOT EXISTS idx_date ON employees(date)")

conn.commit()
conn.close()
undefined
DEPT_COLORS = { 'Engineering': '#3498db', 'Sales': '#e74c3c', 'Finance': '#2ecc71', 'HR': '#f39c12', 'Support': '#9b59b6' }
undefined

Troubleshooting

数据库配置

Missing Columns Error

python
undefined
python
undefined

Validate CSV before loading

针对大型数据集,启用性能优化

required_columns = [ 'employee_id', 'name', 'department', 'role', 'date', 'tasks_completed', 'hours_worked', 'rating', 'projects', 'absences' ]
df = pd.read_csv('data/employees.csv') missing = set(required_columns) - set(df.columns) if missing: raise ValueError(f"Missing columns: {missing}")
undefined
def optimize_database(db_path): """应用SQLite性能设置。""" conn = sqlite3.connect(db_path) cursor = conn.cursor()
cursor.execute("PRAGMA journal_mode = WAL")
cursor.execute("PRAGMA synchronous = NORMAL")
cursor.execute("PRAGMA cache_size = 10000")
cursor.execute("CREATE INDEX IF NOT EXISTS idx_dept ON employees(department)")
cursor.execute("CREATE INDEX IF NOT EXISTS idx_date ON employees(date)")

conn.commit()
conn.close()
undefined

SQL View Not Found

故障排查

列缺失错误

python
undefined
python
undefined

Check if views exist

加载前验证CSV

def verify_views(db_path): conn = sqlite3.connect(db_path) cursor = conn.cursor() cursor.execute("SELECT name FROM sqlite_master WHERE type='view'") views = [row[0] for row in cursor.fetchall()] conn.close()
expected = ['department_kpis', 'employee_summary', 'daily_productivity']
missing = set(expected) - set(views)
if missing:
    print(f"⚠ Missing views: {missing}. Re-run queries.sql")
else:
    print("✓ All views exist")
undefined
required_columns = [ 'employee_id', 'name', 'department', 'role', 'date', 'tasks_completed', 'hours_worked', 'rating', 'projects', 'absences' ]
df = pd.read_csv('data/employees.csv') missing = set(required_columns) - set(df.columns) if missing: raise ValueError(f"缺少列: {missing}")
undefined

Division by Zero in Efficiency

SQL视图未找到

python
undefined
python
undefined

Safe efficiency calculation

检查视图是否存在

emp_summary['efficiency'] = emp_summary.apply( lambda row: row['total_tasks'] / row['total_hours'] if row['total_hours'] > 0 else None, axis=1 )
undefined
def verify_views(db_path): conn = sqlite3.connect(db_path) cursor = conn.cursor() cursor.execute("SELECT name FROM sqlite_master WHERE type='view'") views = [row[0] for row in cursor.fetchall()] conn.close()
expected = ['department_kpis', 'employee_summary', 'daily_productivity']
missing = set(expected) - set(views)
if missing:
    print(f"⚠ 缺少视图: {missing}。请重新运行queries.sql")
else:
    print("✓ 所有视图均存在")
undefined

Date Parsing Issues

效率计算中出现除零错误

python
undefined
python
undefined

Ensure proper date format

安全计算效率

df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d', errors='coerce') df = df.dropna(subset=['date'])
undefined
emp_summary['efficiency'] = emp_summary.apply( lambda row: row['total_tasks'] / row['total_hours'] if row['total_hours'] > 0 else None, axis=1 )
undefined

Advanced Use Cases

日期解析问题

Time Series Analysis

python
def plot_monthly_trends(db_path, department):
    """Monthly productivity trends for a department."""
    conn = sqlite3.connect(db_path)
    query = """
    SELECT 
        strftime('%Y-%m', date) as month,
        AVG(rating) as avg_rating,
        SUM(tasks_completed) as total_tasks
    FROM employees
    WHERE department = ?
    GROUP BY month
    ORDER BY month
    """
    df = pd.read_sql_query(query, conn, params=(department,))
    conn.close()
    
    fig, ax1 = plt.subplots(figsize=(14, 7))
    ax2 = ax1.twinx()
    
    ax1.plot(df['month'], df['avg_rating'], 'b-', linewidth=2, label='Avg Rating')
    ax2.bar(df['month'], df['total_tasks'], alpha=0.3, color='gray', label='Total Tasks')
    
    ax1.set_xlabel('Month')
    ax1.set_ylabel('Average Rating', color='b')
    ax2.set_ylabel('Total Tasks', color='gray')
    plt.title(f'{department} Performance Trends')
    plt.xticks(rotation=45)
    plt.tight_layout()
    plt.show()
python
undefined

Comparative Analysis

确保日期格式正确

python
def compare_departments(dept_kpis):
    """Generate department comparison report."""
    comparison = dept_kpis[['department', 'avg_rating', 'tasks_per_hour', 'absence_rate']]
    
    # Normalize metrics
    for col in ['avg_rating', 'tasks_per_hour']:
        comparison[f'{col}_norm'] = (
            (comparison[col] - comparison[col].min()) / 
            (comparison[col].max() - comparison[col].min())
        )
    
    comparison['performance_index'] = (
        comparison['avg_rating_norm'] * 0.5 + 
        comparison['tasks_per_hour_norm'] * 0.5
    )
    
    return comparison.sort_values('performance_index', ascending=False)
This skill provides comprehensive guidance for using the Employee Performance Analytics project to build HR dashboards, analyze workforce metrics, and generate actionable insights from employee data.
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d', errors='coerce') df = df.dropna(subset=['date'])
undefined

高级应用场景

时间序列分析

python
def plot_monthly_trends(db_path, department):
    """特定部门的月度绩效趋势。"""
    conn = sqlite3.connect(db_path)
    query = """
    SELECT 
        strftime('%Y-%m', date) as month,
        AVG(rating) as avg_rating,
        SUM(tasks_completed) as total_tasks
    FROM employees
    WHERE department = ?
    GROUP BY month
    ORDER BY month
    """
    df = pd.read_sql_query(query, conn, params=(department,))
    conn.close()
    
    fig, ax1 = plt.subplots(figsize=(14, 7))
    ax2 = ax1.twinx()
    
    ax1.plot(df['month'], df['avg_rating'], 'b-', linewidth=2, label='平均评分')
    ax2.bar(df['month'], df['total_tasks'], alpha=0.3, color='gray', label='总任务数')
    
    ax1.set_xlabel('月份')
    ax1.set_ylabel('平均评分', color='b')
    ax2.set_ylabel('总任务数', color='gray')
    plt.title(f'{department}绩效趋势')
    plt.xticks(rotation=45)
    plt.tight_layout()
    plt.show()

对比分析

python
def compare_departments(dept_kpis):
    """生成部门对比报告。"""
    comparison = dept_kpis[['department', 'avg_rating', 'tasks_per_hour', 'absence_rate']]
    
    # 指标归一化
    for col in ['avg_rating', 'tasks_per_hour']:
        comparison[f'{col}_norm'] = (
            (comparison[col] - comparison[col].min()) / 
            (comparison[col].max() - comparison[col].min())
        )
    
    comparison['performance_index'] = (
        comparison['avg_rating_norm'] * 0.5 + 
        comparison['tasks_per_hour_norm'] * 0.5
    )
    
    return comparison.sort_values('performance_index', ascending=False)
本工具提供了全面的指导,帮助你使用员工绩效分析项目构建HR仪表板、分析劳动力指标,并从员工数据中生成可执行的洞察。