mfg-predictive-maintenance
Predictive Maintenance
Framework
IRON LAW: Predictive > Preventive > Reactive (but each has its place)
Reactive (fix after failure): cheapest per-event, most expensive in downtime
Preventive (fix on schedule): prevents some failures, causes unnecessary maintenance
Predictive (fix based on condition): lowest total cost, requires sensor investment
Not ALL equipment justifies predictive maintenance. Apply to equipment where
unplanned downtime cost >> sensor investment cost.
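The screening rule above can be sketched as a simple comparison. This is a minimal illustration, not a prescribed method: the function name, its parameters, and the `min_ratio` cutoff standing in for ">>" are all assumptions chosen for the example.

```python
def justifies_predictive(downtime_cost_per_hour,
                         expected_downtime_hours_per_year,
                         annual_sensor_cost,
                         min_ratio=5.0):
    """Return True when expected unplanned-downtime cost dwarfs sensor cost.

    min_ratio is a hypothetical stand-in for ">>"; choose a value that
    reflects how conservative the plant wants to be.
    """
    annual_downtime_cost = downtime_cost_per_hour * expected_downtime_hours_per_year
    return annual_downtime_cost >= min_ratio * annual_sensor_cost


# Illustrative numbers only (not from any real plant).
press_line = justifies_predictive(8000, 20, 10000)   # 160,000 vs 10,000
conveyor = justifies_predictive(200, 10, 3000)       # 2,000 vs 3,000
print(press_line, conveyor)
```

Equipment that fails the check stays on reactive or preventive maintenance, which matches the "each has its place" caveat.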
Maintenance Strategy Comparison
| Strategy | When to Maintain | Advantage | Disadvantage | Best For |
|---|---|---|---|---|
| Reactive | After failure | Zero upfront cost | Max downtime, safety risk | Non-critical, cheap-to-replace equipment |
| Preventive | On schedule (time/cycles) | Predictable, simple | Over-maintenance (replacing parts that still work) | Equipment with known wear patterns |
| Predictive | Based on condition data | Minimizes both downtime and maintenance cost | Requires sensors, data infrastructure, models | Critical, expensive equipment whose failure cascades to other equipment |
P-F Curve (Potential Failure → Functional Failure)
Condition
│
│ ●─── P (Potential failure detected by sensor)
│  ╲
│   ╲  ← P-F Interval (time to act)
│    ╲
│     ● F (Functional failure: equipment stops)
│
└──────────────────── Time
The P-F interval is your window of opportunity. Detect at P, schedule
repair before F. The longer the P-F interval, the more planning time.
Sensor Data Types
| Data Type | What It Detects | Equipment |
|---|---|---|
| Vibration | Bearing wear, imbalance, misalignment | Rotating machinery (motors, pumps, turbines) |
| Temperature | Overheating, friction, electrical faults | Motors, transformers, bearings |
| Current/Power | Load changes, electrical degradation | Electric motors, drives |
| Acoustic | Leaks, cavitation, micro-cracks | Pressure systems, pipes, valves |
| Oil analysis | Wear particles, contamination | Gearboxes, hydraulic systems |
ML Models for RUL (Remaining Useful Life)
| Approach | Method | Data Required |
|---|---|---|
| Statistical | Weibull distribution, exponential degradation | Historical failure times |
| Classical ML | Random Forest, Gradient Boosting on sensor features | Labeled run-to-failure datasets |
| Deep Learning | LSTM, 1D-CNN on raw sensor time series | Large volumes of sensor data |
| Anomaly Detection | Isolation Forest, Autoencoder | Normal operation data only (no failure labels needed) |
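As an illustration of the statistical row, the sketch below uses a two-parameter Weibull model to estimate how likely a unit is to survive a further planning horizon, and its median remaining life. It assumes the shape and scale parameters have already been fitted from historical failure times; the fitting step is not shown, and the function names and example values are hypothetical.

```python
import math


def conditional_reliability(age, horizon, shape, scale):
    """P(survive `horizon` more hours | survived to `age`) under a Weibull model.

    Uses R(t) = exp(-(t / scale) ** shape), so R(age + horizon) / R(age)
    reduces to a single exponential of the difference of the two exponents.
    """
    return math.exp((age / scale) ** shape - ((age + horizon) / scale) ** shape)


def median_rul(age, shape, scale):
    """Additional hours after which conditional survival drops to 50%."""
    return scale * ((age / scale) ** shape + math.log(2)) ** (1.0 / shape) - age


# Assumed parameters: shape > 1 means wear-out (failure rate rises with age).
shape, scale = 2.0, 1000.0
age = 500.0
print(conditional_reliability(age, 200.0, shape, scale))
print(median_rul(age, shape, scale))
```

With a shape of 1 the model is memoryless, so the median remaining life no longer depends on age; that is a quick sanity check on fitted parameters for equipment expected to wear out.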
Implementation Steps
Phase 1: Select Equipment (criticality analysis)
- Which equipment has highest downtime cost?
- Which has cascading failure effects?
- Prioritize: high cost × high frequency
Phase 2: Install Sensors
- Match sensor type to failure mode (see table above)
- Establish data pipeline: sensor → edge/cloud → storage
Phase 3: Build Baseline
- Collect 3-6 months of normal operation data
- Establish "healthy" patterns
Phase 4: Develop Models
- Start simple: threshold-based alerts (vibration > X = warning)
- Graduate to ML models as data accumulates
- Anomaly detection if you have few/no failure examples
Phase 5: Operationalize
- Integrate alerts into maintenance workflow (CMMS)
- Define response procedures for each alert level
- Measure: reduction in unplanned downtime, maintenance cost savings
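Phases 3 and 4 can be sketched together: derive alert thresholds from healthy baseline readings, then classify new readings against them. This is a minimal sketch using only the standard library; the function names, the sigma multipliers, and the sample values are assumptions for illustration, not recommended settings.

```python
import statistics


def build_thresholds(baseline, warn_sigmas=3.0, alarm_sigmas=6.0):
    """Derive (warning, alarm) levels from healthy-operation readings.

    The sigma multipliers are hypothetical defaults; tune them per asset.
    """
    mean = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)  # sample standard deviation, needs >= 2 points
    return mean + warn_sigmas * sigma, mean + alarm_sigmas * sigma


def classify(reading, thresholds):
    """Map a single reading to an alert level."""
    warn, alarm = thresholds
    if reading >= alarm:
        return "alarm"
    if reading >= warn:
        return "warning"
    return "ok"


# Made-up vibration readings (e.g. mm/s RMS) from normal operation.
baseline = [2.0, 2.2, 1.8, 2.1, 1.9]
thresholds = build_thresholds(baseline)
for reading in (2.1, 2.6, 3.5):
    print(reading, classify(reading, thresholds))
```

Starting with two alert levels keeps the response procedures in Phase 5 simple; an ML model can later replace `classify` without changing the workflow around it.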
ROI Calculation
Net Annual Savings = (Unplanned downtime hours reduced × Downtime cost/hour)
+ (Preventive maintenance events avoided × Cost per event)
- (Annualized sensor + infrastructure + model development cost)
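A minimal sketch of the formula above. It assumes the cost term is expressed as a yearly figure (for example, one-time spend spread over the expected service life), which the source does not specify. The function names and the payback helper are illustrative, and the numbers are invented.

```python
def gross_annual_savings(downtime_hours_reduced, downtime_cost_per_hour,
                         pm_events_avoided, cost_per_pm_event):
    """Savings before subtracting the cost of the predictive program."""
    return (downtime_hours_reduced * downtime_cost_per_hour
            + pm_events_avoided * cost_per_pm_event)


def net_annual_savings(gross_savings, yearly_program_cost):
    """Formula above: gross savings minus sensor, infrastructure, and model cost."""
    return gross_savings - yearly_program_cost


def payback_months(one_time_investment, gross_savings):
    """Months needed for gross savings to cover the one-time investment."""
    if gross_savings <= 0:
        return float("inf")
    return 12.0 * one_time_investment / gross_savings


# Invented example: 40 h less downtime at $5,000/h, 10 avoided PM events at $2,000.
gross = gross_annual_savings(40, 5000, 10, 2000)
print(gross)
print(net_annual_savings(gross, 60000))
print(round(payback_months(60000, gross), 1))
```

The payback figure feeds the "Payback: {months}" cell of the Projected ROI table in the output template.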
Output Format
Predictive Maintenance Plan: {Equipment/Line}
Equipment Criticality
| Equipment | Downtime Cost/hr | Failure Frequency | Cascading? | Priority |
|---|---|---|---|---|
| {name} | ${X} | {X/year} | Y/N | H/M/L |
Sensor Plan
| Equipment | Failure Mode | Sensor Type | P-F Interval |
|---|---|---|---|
| {name} | {mode} | {sensor} | {est. hours/days} |
Projected ROI
| Metric | Before | After | Savings |
|---|---|---|---|
| Unplanned downtime | {hrs/year} | {hrs/year} | ${X}/year |
| Maintenance cost | ${X}/year | ${X}/year | ${X}/year |
| Sensor investment | — | ${X} one-time | Payback: {months} |
Gotchas
- Start with vibration monitoring: It is the most mature and best-understood predictive technique. By a commonly cited rule of thumb, roughly 80% of rotating equipment failures can be predicted by vibration analysis alone.
- Data quality > model complexity: A simple threshold alert on clean sensor data outperforms a sophisticated ML model on noisy, incomplete data. Fix data quality first.
- False positives kill adoption: If the model cries wolf too often, maintenance teams ignore it. Tune for high precision (few false alarms) even at the cost of some missed detections early on.
- Cultural change is harder than technology: Shifting from "run to failure" culture requires management buy-in and maintenance team training. Technology alone won't change behavior.
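The precision trade-off in the false-positive point can be made concrete by tracking alert outcomes. The sketch below is one hypothetical way to score an alerting setup from maintenance records; the function name and the counts are illustrative.

```python
def alert_quality(true_alarms, false_alarms, missed_failures):
    """Return (precision, recall) for a period of alert history.

    precision: share of raised alerts that pointed at a real problem.
    recall:    share of real problems that raised an alert.
    """
    raised = true_alarms + false_alarms
    actual = true_alarms + missed_failures
    precision = true_alarms / raised if raised else 0.0
    recall = true_alarms / actual if actual else 0.0
    return precision, recall


# Invented counts: 8 confirmed alerts, 2 false alarms, 4 failures with no alert.
precision, recall = alert_quality(8, 2, 4)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Raising the alert threshold usually trades recall for precision, which is the direction the gotcha recommends while the team is still building trust in the alerts.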
References
- For a sensor selection guide by equipment type, see references/sensor-guide.md
- For an LSTM-based RUL model tutorial, see references/rul-tutorial.md