employee-performance-analytics-hr
Compare original and translation side by side
🇺🇸
Original
English🇨🇳
Translation
ChineseEmployee Performance Analytics HR Skill
员工绩效分析HR工具
Overview
概述
Employee Performance Analytics is a Python and SQL-based HR analytics tool that transforms employee data into actionable insights. It uses SQLite for KPI aggregation and pandas/matplotlib for visualization, generating departmental performance reports, efficiency metrics, and workload analysis.
The project provides an end-to-end analytics pipeline: data loading → SQL feature engineering → Python analysis → visualization exports.
员工绩效分析是一款基于Python和SQL的HR分析工具,可将员工数据转化为可执行的洞察。它使用SQLite进行KPI聚合,借助pandas/matplotlib实现可视化,生成部门绩效报告、效率指标及工作量分析结果。
该项目提供端到端的分析流程:数据加载 → SQL特征工程 → Python分析 → 可视化导出。
Installation
安装
bash
undefinedbash
undefinedClone the repository
克隆仓库
git clone https://github.com/AmirhosseinHonardoust/Employee-Performance-Analytics.git
cd Employee-Performance-Analytics
git clone https://github.com/AmirhosseinHonardoust/Employee-Performance-Analytics.git
cd Employee-Performance-Analytics
Create virtual environment
创建虚拟环境
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
python -m venv .venv
source .venv/bin/activate # Windows系统执行:.venv\Scripts\activate
Install dependencies
安装依赖
pip install -r requirements.txt
undefinedpip install -r requirements.txt
undefinedDependencies
依赖项
txt
pandas>=2.0.0
matplotlib>=3.7.0
seaborn>=0.12.0
sqlite3 # Built-in with Python
numpy>=1.24.0txt
pandas>=2.0.0
matplotlib>=3.7.0
seaborn>=0.12.0
sqlite3 # Python内置库
numpy>=1.24.0Project Structure
项目结构
employee-performance-analytics/
├── data/
│ └── employees.csv # Raw employee data
├── src/
│ ├── create_db.py # CSV to SQLite loader
│ ├── queries.sql # SQL KPI queries
│ ├── analyze_performance.py # Main analysis script
│ └── utils.py # Helper functions
└── outputs/
├── department_kpis.csv
├── performance_summary.csv
└── charts/ # Generated visualizationsemployee-performance-analytics/
├── data/
│ └── employees.csv # 原始员工数据
├── src/
│ ├── create_db.py # CSV转SQLite加载器
│ ├── queries.sql # SQL KPI查询语句
│ ├── analyze_performance.py # 主分析脚本
│ └── utils.py # 辅助函数
└── outputs/
├── department_kpis.csv
├── performance_summary.csv
└── charts/ # 生成的可视化图表Data Schema
数据结构
The project expects employee data with these columns:
| Column | Type | Description |
|---|---|---|
| int | Unique identifier |
| string | Employee name |
| string | Department (Engineering, Sales, etc.) |
| string | Job title |
| date | Record date (YYYY-MM-DD) |
| int | Daily tasks completed |
| float | Hours worked |
| float | Performance rating (1-5) |
| int | Active projects |
| int | 1 if absent, 0 otherwise |
项目所需的员工数据需包含以下列:
| 列名 | 类型 | 描述 |
|---|---|---|
| int | 唯一标识符 |
| string | 员工姓名 |
| string | 部门(如工程、销售等) |
| string | 职位名称 |
| date | 记录日期(YYYY-MM-DD) |
| int | 每日完成任务数 |
| float | 工作时长 |
| float | 绩效评分(1-5) |
| int | 参与的活跃项目数 |
| int | 缺勤标记(1表示缺勤,0表示出勤) |
Key Commands
核心命令
1. Load Data into SQLite
1. 将数据加载到SQLite
bash
python src/create_db.py --csv data/employees.csv --db hr.dbOptions:
- : Path to input CSV file
--csv - : Output SQLite database path
--db - : Table name (default:
--table)employees
bash
python src/create_db.py --csv data/employees.csv --db hr.db选项说明:
- : 输入CSV文件路径
--csv - : 输出SQLite数据库路径
--db - : 表名(默认值:
--table)employees
2. Run Performance Analysis
2. 运行绩效分析
bash
python src/analyze_performance.py --db hr.db --sql src/queries.sql --outdir outputsOptions:
- : Path to SQLite database
--db - : Path to SQL queries file
--sql - : Output directory for CSV reports and charts
--outdir
bash
python src/analyze_performance.py --db hr.db --sql src/queries.sql --outdir outputs选项说明:
- : SQLite数据库路径
--db - : SQL查询文件路径
--sql - : CSV报告和图表的输出目录
--outdir
Core SQL Queries
核心SQL查询语句
The file contains three main analytical views:
queries.sqlqueries.sqlDepartment KPIs
部门KPI
sql
-- Department-level performance metrics
CREATE VIEW IF NOT EXISTS department_kpis AS
SELECT
department,
COUNT(DISTINCT employee_id) AS employee_count,
AVG(rating) AS avg_rating,
SUM(tasks_completed) AS total_tasks,
SUM(hours_worked) AS total_hours,
ROUND(AVG(CAST(absences AS FLOAT)), 2) AS absence_rate,
ROUND(SUM(tasks_completed) * 1.0 / SUM(hours_worked), 2) AS tasks_per_hour
FROM employees
GROUP BY department
ORDER BY avg_rating DESC;sql
-- 部门级绩效指标
CREATE VIEW IF NOT EXISTS department_kpis AS
SELECT
department,
COUNT(DISTINCT employee_id) AS employee_count,
AVG(rating) AS avg_rating,
SUM(tasks_completed) AS total_tasks,
SUM(hours_worked) AS total_hours,
ROUND(AVG(CAST(absences AS FLOAT)), 2) AS absence_rate,
ROUND(SUM(tasks_completed) * 1.0 / SUM(hours_worked), 2) AS tasks_per_hour
FROM employees
GROUP BY department
ORDER BY avg_rating DESC;Employee Summary
员工绩效汇总
sql
-- Individual employee performance aggregation
CREATE VIEW IF NOT EXISTS employee_summary AS
SELECT
employee_id,
name,
department,
role,
SUM(tasks_completed) AS total_tasks,
SUM(hours_worked) AS total_hours,
AVG(rating) AS avg_rating,
COUNT(DISTINCT projects) AS project_count,
SUM(absences) AS total_absences,
ROUND(SUM(tasks_completed) * 1.0 / SUM(hours_worked), 2) AS efficiency
FROM employees
GROUP BY employee_id, name, department, role
ORDER BY efficiency DESC;sql
undefinedDaily Productivity
员工个人绩效聚合
sql
-- Day-by-day productivity tracking
CREATE VIEW IF NOT EXISTS daily_productivity AS
SELECT
date,
department,
SUM(tasks_completed) AS daily_tasks,
SUM(hours_worked) AS daily_hours,
AVG(rating) AS daily_rating
FROM employees
GROUP BY date, department
ORDER BY date, department;CREATE VIEW IF NOT EXISTS employee_summary AS
SELECT
employee_id,
name,
department,
role,
SUM(tasks_completed) AS total_tasks,
SUM(hours_worked) AS total_hours,
AVG(rating) AS avg_rating,
COUNT(DISTINCT projects) AS project_count,
SUM(absences) AS total_absences,
ROUND(SUM(tasks_completed) * 1.0 / SUM(hours_worked), 2) AS efficiency
FROM employees
GROUP BY employee_id, name, department, role
ORDER BY efficiency DESC;
undefinedPython API Usage
日常生产力
Creating Database from CSV
—
python
import pandas as pd
import sqlite3
def create_database(csv_path, db_path, table_name='employees'):
"""Load CSV into SQLite database."""
df = pd.read_csv(csv_path)
# Data validation
required_cols = ['employee_id', 'name', 'department', 'date',
'tasks_completed', 'hours_worked', 'rating']
assert all(col in df.columns for col in required_cols), "Missing required columns"
# Create database
conn = sqlite3.connect(db_path)
df.to_sql(table_name, conn, if_exists='replace', index=False)
conn.close()
print(f"✓ Database created: {db_path}")sql
undefinedUsage
逐日生产力追踪
create_database('data/employees.csv', 'hr.db')
undefinedCREATE VIEW IF NOT EXISTS daily_productivity AS
SELECT
date,
department,
SUM(tasks_completed) AS daily_tasks,
SUM(hours_worked) AS daily_hours,
AVG(rating) AS daily_rating
FROM employees
GROUP BY date, department
ORDER BY date, department;
undefinedRunning SQL Queries
Python API 使用示例
—
从CSV创建数据库
python
import sqlite3
import pandas as pd
def execute_sql_file(db_path, sql_file_path):
"""Execute SQL script and return results."""
conn = sqlite3.connect(db_path)
with open(sql_file_path, 'r') as f:
sql_script = f.read()
# Execute all statements
cursor = conn.cursor()
cursor.executescript(sql_script)
conn.commit()
# Fetch view results
dept_kpis = pd.read_sql_query("SELECT * FROM department_kpis", conn)
emp_summary = pd.read_sql_query("SELECT * FROM employee_summary", conn)
daily_prod = pd.read_sql_query("SELECT * FROM daily_productivity", conn)
conn.close()
return dept_kpis, emp_summary, daily_prodpython
import pandas as pd
import sqlite3
def create_database(csv_path, db_path, table_name='employees'):
"""将CSV加载到SQLite数据库中。"""
df = pd.read_csv(csv_path)
# 数据验证
required_cols = ['employee_id', 'name', 'department', 'date',
'tasks_completed', 'hours_worked', 'rating']
assert all(col in df.columns for col in required_cols), "缺少必填列"
# 创建数据库
conn = sqlite3.connect(db_path)
df.to_sql(table_name, conn, if_exists='replace', index=False)
conn.close()
print(f"✓ 数据库已创建:{db_path}")Generating Visualizations
使用示例
python
import matplotlib.pyplot as plt
import seaborn as sns
def plot_department_ratings(dept_kpis, output_path):
"""Bar chart of average rating by department."""
plt.figure(figsize=(12, 7))
sns.barplot(
data=dept_kpis,
x='department',
y='avg_rating',
palette='viridis'
)
plt.title('Average Performance Rating by Department', fontsize=16, weight='bold')
plt.xlabel('Department', fontsize=12)
plt.ylabel('Average Rating', fontsize=12)
plt.xticks(rotation=45, ha='right')
plt.ylim(0, 5)
plt.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.savefig(output_path, dpi=300, bbox_inches='tight')
plt.close()
def plot_performance_vs_hours(emp_summary, output_path):
"""Scatter plot of tasks vs hours worked."""
plt.figure(figsize=(12, 7))
scatter = plt.scatter(
emp_summary['total_hours'],
emp_summary['total_tasks'],
c=emp_summary['avg_rating'],
cmap='RdYlGn',
s=100,
alpha=0.6,
edgecolors='black'
)
plt.colorbar(scatter, label='Avg Rating')
plt.title('Tasks Completed vs Hours Worked', fontsize=16, weight='bold')
plt.xlabel('Total Hours Worked', fontsize=12)
plt.ylabel('Total Tasks Completed', fontsize=12)
plt.grid(alpha=0.3)
plt.tight_layout()
plt.savefig(output_path, dpi=300, bbox_inches='tight')
plt.close()
def plot_efficiency_distribution(emp_summary, output_path):
"""Histogram of task completion rate."""
plt.figure(figsize=(12, 7))
plt.hist(
emp_summary['efficiency'].dropna(),
bins=30,
color='steelblue',
edgecolor='black',
alpha=0.7
)
plt.axvline(
emp_summary['efficiency'].median(),
color='red',
linestyle='--',
linewidth=2,
label=f"Median: {emp_summary['efficiency'].median():.2f}"
)
plt.title('Task Completion Rate Distribution', fontsize=16, weight='bold')
plt.xlabel('Tasks per Hour', fontsize=12)
plt.ylabel('Number of Employees', fontsize=12)
plt.legend()
plt.grid(alpha=0.3)
plt.tight_layout()
plt.savefig(output_path, dpi=300, bbox_inches='tight')
plt.close()create_database('data/employees.csv', 'hr.db')
undefinedComplete Analysis Pipeline
执行SQL查询
python
import os
import sqlite3
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
class HRAnalyzer:
"""Complete HR analytics pipeline."""
def __init__(self, db_path, sql_path, output_dir):
self.db_path = db_path
self.sql_path = sql_path
self.output_dir = Path(output_dir)
self.output_dir.mkdir(parents=True, exist_ok=True)
(self.output_dir / 'charts').mkdir(exist_ok=True)
def load_data(self):
"""Execute SQL and load analytical views."""
conn = sqlite3.connect(self.db_path)
# Execute SQL script
with open(self.sql_path, 'r') as f:
conn.executescript(f.read())
# Load views
self.dept_kpis = pd.read_sql_query("SELECT * FROM department_kpis", conn)
self.emp_summary = pd.read_sql_query("SELECT * FROM employee_summary", conn)
self.daily_prod = pd.read_sql_query("SELECT * FROM daily_productivity", conn)
conn.close()
print("✓ Data loaded from SQL views")
def export_reports(self):
"""Save CSV reports."""
self.dept_kpis.to_csv(
self.output_dir / 'department_kpis.csv',
index=False
)
self.emp_summary.to_csv(
self.output_dir / 'performance_summary.csv',
index=False
)
print(f"✓ Reports saved to {self.output_dir}")
def generate_visualizations(self):
"""Create all performance charts."""
charts_dir = self.output_dir / 'charts'
# Department ratings
plt.figure(figsize=(12, 7))
sns.barplot(data=self.dept_kpis, x='department', y='avg_rating', palette='viridis')
plt.title('Average Rating by Department', fontsize=16, weight='bold')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.savefig(charts_dir / 'avg_rating_by_department.png', dpi=300)
plt.close()
# Performance vs hours
plt.figure(figsize=(12, 7))
plt.scatter(
self.emp_summary['total_hours'],
self.emp_summary['total_tasks'],
c=self.emp_summary['avg_rating'],
cmap='RdYlGn',
s=100,
alpha=0.6
)
plt.colorbar(label='Avg Rating')
plt.title('Performance vs Hours Worked', fontsize=16, weight='bold')
plt.xlabel('Total Hours')
plt.ylabel('Total Tasks')
plt.tight_layout()
plt.savefig(charts_dir / 'performance_vs_hours.png', dpi=300)
plt.close()
# Efficiency distribution
plt.figure(figsize=(12, 7))
plt.hist(self.emp_summary['efficiency'].dropna(), bins=30, color='steelblue', edgecolor='black')
plt.axvline(self.emp_summary['efficiency'].median(), color='red', linestyle='--', linewidth=2)
plt.title('Task Completion Rate Distribution', fontsize=16, weight='bold')
plt.xlabel('Tasks per Hour')
plt.ylabel('Count')
plt.tight_layout()
plt.savefig(charts_dir / 'task_completion_rate.png', dpi=300)
plt.close()
print(f"✓ Visualizations saved to {charts_dir}")
def run_full_analysis(self):
"""Execute complete analytics pipeline."""
self.load_data()
self.export_reports()
self.generate_visualizations()
print("✓ Analysis complete!")python
import sqlite3
import pandas as pd
def execute_sql_file(db_path, sql_file_path):
"""执行SQL脚本并返回结果。"""
conn = sqlite3.connect(db_path)
with open(sql_file_path, 'r') as f:
sql_script = f.read()
# 执行所有语句
cursor = conn.cursor()
cursor.executescript(sql_script)
conn.commit()
# 获取视图结果
dept_kpis = pd.read_sql_query("SELECT * FROM department_kpis", conn)
emp_summary = pd.read_sql_query("SELECT * FROM employee_summary", conn)
daily_prod = pd.read_sql_query("SELECT * FROM daily_productivity", conn)
conn.close()
return dept_kpis, emp_summary, daily_prodUsage
生成可视化图表
analyzer = HRAnalyzer(
db_path='hr.db',
sql_path='src/queries.sql',
output_dir='outputs'
)
analyzer.run_full_analysis()
undefinedpython
import matplotlib.pyplot as plt
import seaborn as sns
def plot_department_ratings(dept_kpis, output_path):
"""按部门展示平均评分的柱状图。"""
plt.figure(figsize=(12, 7))
sns.barplot(
data=dept_kpis,
x='department',
y='avg_rating',
palette='viridis'
)
plt.title('各部门平均绩效评分', fontsize=16, weight='bold')
plt.xlabel('部门', fontsize=12)
plt.ylabel('平均评分', fontsize=12)
plt.xticks(rotation=45, ha='right')
plt.ylim(0, 5)
plt.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.savefig(output_path, dpi=300, bbox_inches='tight')
plt.close()
def plot_performance_vs_hours(emp_summary, output_path):
"""任务完成数与工作时长的散点图。"""
plt.figure(figsize=(12, 7))
scatter = plt.scatter(
emp_summary['total_hours'],
emp_summary['total_tasks'],
c=emp_summary['avg_rating'],
cmap='RdYlGn',
s=100,
alpha=0.6,
edgecolors='black'
)
plt.colorbar(scatter, label='平均评分')
plt.title('任务完成数与工作时长对比', fontsize=16, weight='bold')
plt.xlabel('总工作时长', fontsize=12)
plt.ylabel('总任务完成数', fontsize=12)
plt.grid(alpha=0.3)
plt.tight_layout()
plt.savefig(output_path, dpi=300, bbox_inches='tight')
plt.close()
def plot_efficiency_distribution(emp_summary, output_path):
"""任务完成率分布直方图。"""
plt.figure(figsize=(12, 7))
plt.hist(
emp_summary['efficiency'].dropna(),
bins=30,
color='steelblue',
edgecolor='black',
alpha=0.7
)
plt.axvline(
emp_summary['efficiency'].median(),
color='red',
linestyle='--',
linewidth=2,
label=f"中位数: {emp_summary['efficiency'].median():.2f}"
)
plt.title('任务完成率分布', fontsize=16, weight='bold')
plt.xlabel('每小时任务数', fontsize=12)
plt.ylabel('员工数量', fontsize=12)
plt.legend()
plt.grid(alpha=0.3)
plt.tight_layout()
plt.savefig(output_path, dpi=300, bbox_inches='tight')
plt.close()Common Patterns
完整分析流程
Custom KPI Queries
—
python
def get_top_performers(db_path, n=10):
"""Retrieve top N employees by efficiency."""
conn = sqlite3.connect(db_path)
query = """
SELECT
name,
department,
efficiency,
avg_rating
FROM employee_summary
ORDER BY efficiency DESC
LIMIT ?
"""
top_performers = pd.read_sql_query(query, conn, params=(n,))
conn.close()
return top_performers
def get_department_trends(db_path, department):
"""Get time-series data for specific department."""
conn = sqlite3.connect(db_path)
query = """
SELECT
date,
daily_tasks,
daily_hours,
daily_rating
FROM daily_productivity
WHERE department = ?
ORDER BY date
"""
trends = pd.read_sql_query(query, conn, params=(department,))
conn.close()
return trendspython
import os
import sqlite3
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
class HRAnalyzer:
"""完整HR分析流程。"""
def __init__(self, db_path, sql_path, output_dir):
self.db_path = db_path
self.sql_path = sql_path
self.output_dir = Path(output_dir)
self.output_dir.mkdir(parents=True, exist_ok=True)
(self.output_dir / 'charts').mkdir(exist_ok=True)
def load_data(self):
"""执行SQL并加载分析视图。"""
conn = sqlite3.connect(self.db_path)
# 执行SQL脚本
with open(self.sql_path, 'r') as f:
conn.executescript(f.read())
# 加载视图
self.dept_kpis = pd.read_sql_query("SELECT * FROM department_kpis", conn)
self.emp_summary = pd.read_sql_query("SELECT * FROM employee_summary", conn)
self.daily_prod = pd.read_sql_query("SELECT * FROM daily_productivity", conn)
conn.close()
print("✓ 已从SQL视图加载数据")
def export_reports(self):
"""保存CSV报告。"""
self.dept_kpis.to_csv(
self.output_dir / 'department_kpis.csv',
index=False
)
self.emp_summary.to_csv(
self.output_dir / 'performance_summary.csv',
index=False
)
print(f"✓ 报告已保存至 {self.output_dir}")
def generate_visualizations(self):
"""生成所有绩效图表。"""
charts_dir = self.output_dir / 'charts'
# 部门评分
plt.figure(figsize=(12, 7))
sns.barplot(data=self.dept_kpis, x='department', y='avg_rating', palette='viridis')
plt.title('各部门平均评分', fontsize=16, weight='bold')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.savefig(charts_dir / 'avg_rating_by_department.png', dpi=300)
plt.close()
# 绩效与工作时长对比
plt.figure(figsize=(12, 7))
plt.scatter(
self.emp_summary['total_hours'],
self.emp_summary['total_tasks'],
c=self.emp_summary['avg_rating'],
cmap='RdYlGn',
s=100,
alpha=0.6
)
plt.colorbar(label='平均评分')
plt.title('绩效与工作时长对比', fontsize=16, weight='bold')
plt.xlabel('总工作时长')
plt.ylabel('总任务完成数')
plt.tight_layout()
plt.savefig(charts_dir / 'performance_vs_hours.png', dpi=300)
plt.close()
# 效率分布
plt.figure(figsize=(12, 7))
plt.hist(self.emp_summary['efficiency'].dropna(), bins=30, color='steelblue', edgecolor='black')
plt.axvline(self.emp_summary['efficiency'].median(), color='red', linestyle='--', linewidth=2)
plt.title('任务完成率分布', fontsize=16, weight='bold')
plt.xlabel('每小时任务数')
plt.ylabel('员工数量')
plt.tight_layout()
plt.savefig(charts_dir / 'task_completion_rate.png', dpi=300)
plt.close()
print(f"✓ 可视化图表已保存至 {charts_dir}")
def run_full_analysis(self):
"""执行完整分析流程。"""
self.load_data()
self.export_reports()
self.generate_visualizations()
print("✓ 分析完成!")Filtering and Aggregation
使用示例
python
def analyze_by_role(db_path, role_filter):
"""Aggregate performance by specific role."""
conn = sqlite3.connect(db_path)
query = """
SELECT
role,
AVG(rating) as avg_rating,
AVG(tasks_completed) as avg_tasks,
AVG(hours_worked) as avg_hours
FROM employees
WHERE role LIKE ?
GROUP BY role
"""
role_stats = pd.read_sql_query(query, conn, params=(f'%{role_filter}%',))
conn.close()
return role_statsanalyzer = HRAnalyzer(
db_path='hr.db',
sql_path='src/queries.sql',
output_dir='outputs'
)
analyzer.run_full_analysis()
undefinedAdding New Metrics
常见应用模式
—
自定义KPI查询
python
def calculate_workload_balance(emp_summary):
"""Calculate workload balance score."""
emp_summary['workload_score'] = (
emp_summary['total_tasks'] / emp_summary['total_tasks'].max() * 0.4 +
emp_summary['total_hours'] / emp_summary['total_hours'].max() * 0.3 +
emp_summary['avg_rating'] / 5 * 0.3
)
return emp_summarypython
def get_top_performers(db_path, n=10):
"""获取效率最高的前N名员工。"""
conn = sqlite3.connect(db_path)
query = """
SELECT
name,
department,
efficiency,
avg_rating
FROM employee_summary
ORDER BY efficiency DESC
LIMIT ?
"""
top_performers = pd.read_sql_query(query, conn, params=(n,))
conn.close()
return top_performers
def get_department_trends(db_path, department):
"""获取特定部门的时间序列数据。"""
conn = sqlite3.connect(db_path)
query = """
SELECT
date,
daily_tasks,
daily_hours,
daily_rating
FROM daily_productivity
WHERE department = ?
ORDER BY date
"""
trends = pd.read_sql_query(query, conn, params=(department,))
conn.close()
return trendsConfiguration
过滤与聚合
Custom Chart Styling
—
python
undefinedpython
def analyze_by_role(db_path, role_filter):
"""按特定职位聚合绩效数据。"""
conn = sqlite3.connect(db_path)
query = """
SELECT
role,
AVG(rating) as avg_rating,
AVG(tasks_completed) as avg_tasks,
AVG(hours_worked) as avg_hours
FROM employees
WHERE role LIKE ?
GROUP BY role
"""
role_stats = pd.read_sql_query(query, conn, params=(f'%{role_filter}%',))
conn.close()
return role_statsSet global matplotlib style
添加新指标
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")
python
def calculate_workload_balance(emp_summary):
"""计算工作负载平衡得分。"""
emp_summary['workload_score'] = (
emp_summary['total_tasks'] / emp_summary['total_tasks'].max() * 0.4 +
emp_summary['total_hours'] / emp_summary['total_hours'].max() * 0.3 +
emp_summary['avg_rating'] / 5 * 0.3
)
return emp_summaryCustom colors
配置说明
—
自定义图表样式
DEPT_COLORS = {
'Engineering': '#3498db',
'Sales': '#e74c3c',
'Finance': '#2ecc71',
'HR': '#f39c12',
'Support': '#9b59b6'
}
undefinedpython
undefinedDatabase Configuration
设置全局matplotlib样式
python
undefinedplt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")
For larger datasets, enable performance optimizations
自定义颜色
def optimize_database(db_path):
"""Apply SQLite performance settings."""
conn = sqlite3.connect(db_path)
cursor = conn.cursor()
cursor.execute("PRAGMA journal_mode = WAL")
cursor.execute("PRAGMA synchronous = NORMAL")
cursor.execute("PRAGMA cache_size = 10000")
cursor.execute("CREATE INDEX IF NOT EXISTS idx_dept ON employees(department)")
cursor.execute("CREATE INDEX IF NOT EXISTS idx_date ON employees(date)")
conn.commit()
conn.close()undefinedDEPT_COLORS = {
'Engineering': '#3498db',
'Sales': '#e74c3c',
'Finance': '#2ecc71',
'HR': '#f39c12',
'Support': '#9b59b6'
}
undefinedTroubleshooting
数据库配置
Missing Columns Error
—
python
undefinedpython
undefinedValidate CSV before loading
针对大型数据集,启用性能优化
required_columns = [
'employee_id', 'name', 'department', 'role',
'date', 'tasks_completed', 'hours_worked',
'rating', 'projects', 'absences'
]
df = pd.read_csv('data/employees.csv')
missing = set(required_columns) - set(df.columns)
if missing:
raise ValueError(f"Missing columns: {missing}")
undefineddef optimize_database(db_path):
"""应用SQLite性能设置。"""
conn = sqlite3.connect(db_path)
cursor = conn.cursor()
cursor.execute("PRAGMA journal_mode = WAL")
cursor.execute("PRAGMA synchronous = NORMAL")
cursor.execute("PRAGMA cache_size = 10000")
cursor.execute("CREATE INDEX IF NOT EXISTS idx_dept ON employees(department)")
cursor.execute("CREATE INDEX IF NOT EXISTS idx_date ON employees(date)")
conn.commit()
conn.close()undefinedSQL View Not Found
故障排查
—
列缺失错误
python
undefinedpython
undefinedCheck if views exist
加载前验证CSV
def verify_views(db_path):
conn = sqlite3.connect(db_path)
cursor = conn.cursor()
cursor.execute("SELECT name FROM sqlite_master WHERE type='view'")
views = [row[0] for row in cursor.fetchall()]
conn.close()
expected = ['department_kpis', 'employee_summary', 'daily_productivity']
missing = set(expected) - set(views)
if missing:
print(f"⚠ Missing views: {missing}. Re-run queries.sql")
else:
print("✓ All views exist")undefinedrequired_columns = [
'employee_id', 'name', 'department', 'role',
'date', 'tasks_completed', 'hours_worked',
'rating', 'projects', 'absences'
]
df = pd.read_csv('data/employees.csv')
missing = set(required_columns) - set(df.columns)
if missing:
raise ValueError(f"缺少列: {missing}")
undefinedDivision by Zero in Efficiency
SQL视图未找到
python
undefinedpython
undefinedSafe efficiency calculation
检查视图是否存在
emp_summary['efficiency'] = emp_summary.apply(
lambda row: row['total_tasks'] / row['total_hours']
if row['total_hours'] > 0 else None,
axis=1
)
undefineddef verify_views(db_path):
conn = sqlite3.connect(db_path)
cursor = conn.cursor()
cursor.execute("SELECT name FROM sqlite_master WHERE type='view'")
views = [row[0] for row in cursor.fetchall()]
conn.close()
expected = ['department_kpis', 'employee_summary', 'daily_productivity']
missing = set(expected) - set(views)
if missing:
print(f"⚠ 缺少视图: {missing}。请重新运行queries.sql")
else:
print("✓ 所有视图均存在")undefinedDate Parsing Issues
效率计算中出现除零错误
python
undefinedpython
undefinedEnsure proper date format
安全计算效率
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d', errors='coerce')
df = df.dropna(subset=['date'])
undefinedemp_summary['efficiency'] = emp_summary.apply(
lambda row: row['total_tasks'] / row['total_hours']
if row['total_hours'] > 0 else None,
axis=1
)
undefinedAdvanced Use Cases
日期解析问题
Time Series Analysis
—
python
def plot_monthly_trends(db_path, department):
"""Monthly productivity trends for a department."""
conn = sqlite3.connect(db_path)
query = """
SELECT
strftime('%Y-%m', date) as month,
AVG(rating) as avg_rating,
SUM(tasks_completed) as total_tasks
FROM employees
WHERE department = ?
GROUP BY month
ORDER BY month
"""
df = pd.read_sql_query(query, conn, params=(department,))
conn.close()
fig, ax1 = plt.subplots(figsize=(14, 7))
ax2 = ax1.twinx()
ax1.plot(df['month'], df['avg_rating'], 'b-', linewidth=2, label='Avg Rating')
ax2.bar(df['month'], df['total_tasks'], alpha=0.3, color='gray', label='Total Tasks')
ax1.set_xlabel('Month')
ax1.set_ylabel('Average Rating', color='b')
ax2.set_ylabel('Total Tasks', color='gray')
plt.title(f'{department} Performance Trends')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()python
undefinedComparative Analysis
确保日期格式正确
python
def compare_departments(dept_kpis):
"""Generate department comparison report."""
comparison = dept_kpis[['department', 'avg_rating', 'tasks_per_hour', 'absence_rate']]
# Normalize metrics
for col in ['avg_rating', 'tasks_per_hour']:
comparison[f'{col}_norm'] = (
(comparison[col] - comparison[col].min()) /
(comparison[col].max() - comparison[col].min())
)
comparison['performance_index'] = (
comparison['avg_rating_norm'] * 0.5 +
comparison['tasks_per_hour_norm'] * 0.5
)
return comparison.sort_values('performance_index', ascending=False)This skill provides comprehensive guidance for using the Employee Performance Analytics project to build HR dashboards, analyze workforce metrics, and generate actionable insights from employee data.
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d', errors='coerce')
df = df.dropna(subset=['date'])
undefined—
高级应用场景
—
时间序列分析
—
python
def plot_monthly_trends(db_path, department):
"""特定部门的月度绩效趋势。"""
conn = sqlite3.connect(db_path)
query = """
SELECT
strftime('%Y-%m', date) as month,
AVG(rating) as avg_rating,
SUM(tasks_completed) as total_tasks
FROM employees
WHERE department = ?
GROUP BY month
ORDER BY month
"""
df = pd.read_sql_query(query, conn, params=(department,))
conn.close()
fig, ax1 = plt.subplots(figsize=(14, 7))
ax2 = ax1.twinx()
ax1.plot(df['month'], df['avg_rating'], 'b-', linewidth=2, label='平均评分')
ax2.bar(df['month'], df['total_tasks'], alpha=0.3, color='gray', label='总任务数')
ax1.set_xlabel('月份')
ax1.set_ylabel('平均评分', color='b')
ax2.set_ylabel('总任务数', color='gray')
plt.title(f'{department}绩效趋势')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()—
对比分析
—
python
def compare_departments(dept_kpis):
"""生成部门对比报告。"""
comparison = dept_kpis[['department', 'avg_rating', 'tasks_per_hour', 'absence_rate']]
# 指标归一化
for col in ['avg_rating', 'tasks_per_hour']:
comparison[f'{col}_norm'] = (
(comparison[col] - comparison[col].min()) /
(comparison[col].max() - comparison[col].min())
)
comparison['performance_index'] = (
comparison['avg_rating_norm'] * 0.5 +
comparison['tasks_per_hour_norm'] * 0.5
)
return comparison.sort_values('performance_index', ascending=False)本工具提供了全面的指导,帮助你使用员工绩效分析项目构建HR仪表板、分析劳动力指标,并从员工数据中生成可执行的洞察。