mfg-predictive-maintenance

Predictive Maintenance

Framework

IRON LAW: Predictive > Preventive > Reactive (but each has its place)

Reactive (fix after failure): lowest upfront cost, most expensive in downtime
Preventive (fix on schedule): prevents some failures, causes unnecessary maintenance
Predictive (fix based on condition): lowest total cost, requires sensor investment

Not ALL equipment justifies predictive maintenance. Apply to equipment where
unplanned downtime cost >> sensor investment cost.
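
The selection rule above can be sketched as a small decision function. This is an illustration under stated assumptions, not a prescribed method: the `Asset` fields, the `choose_strategy` name, and the 10× margin standing in for ">>" are hypothetical, and the sketch reduces the strategy comparison to two checks (downtime cost versus monitoring cost, then whether wear is predictable).

```python
from dataclasses import dataclass


@dataclass
class Asset:
    name: str
    downtime_cost_per_hour: float    # $ lost per hour of unplanned stoppage
    unplanned_hours_per_year: float  # expected unplanned downtime without monitoring
    monitoring_cost_per_year: float  # annualized sensor + infrastructure + model cost
    known_wear_pattern: bool         # True if wear tracks time or cycles


# Hypothetical margin standing in for ">>"; tune per site.
PREDICTIVE_MARGIN = 10.0


def choose_strategy(asset: Asset) -> str:
    """Return 'predictive', 'preventive', or 'reactive' for one asset."""
    annual_downtime_cost = asset.downtime_cost_per_hour * asset.unplanned_hours_per_year
    # Predictive pays off only when downtime cost dwarfs the monitoring cost.
    if annual_downtime_cost >= PREDICTIVE_MARGIN * asset.monitoring_cost_per_year:
        return "predictive"
    # Otherwise, scheduled maintenance suits assets whose wear is predictable.
    if asset.known_wear_pattern:
        return "preventive"
    # Everything else is run to failure.
    return "reactive"


print(choose_strategy(Asset("feed pump", 5_000, 20, 8_000, False)))
```

Here the pump's $100,000/year downtime exposure exceeds 10 × $8,000 of monitoring cost, so it is classed as predictive.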

Maintenance Strategy Comparison

| Strategy | When to Maintain | Advantage | Disadvantage | Best For |
|---|---|---|---|---|
| Reactive | After failure | Zero upfront cost | Maximum downtime, safety risk | Non-critical, cheap-to-replace equipment |
| Preventive | On schedule (time/cycles) | Predictable, simple | Over-maintenance (replacing parts that still work) | Equipment with known wear patterns |
| Predictive | Based on condition data | Minimizes both downtime and maintenance cost | Requires sensors, data infrastructure, models | Critical, expensive equipment whose failure cascades to other systems |

P-F Curve (Potential Failure → Functional Failure)

Condition
  │  ●─── P (Potential failure detected by sensor)
  │     ╲
  │      ╲  ← P-F Interval (time to act)
  │       ╲
  │        ● F (Functional failure — equipment stops)
  └──────────────────── Time

The P-F interval is your window of opportunity. Detect at P, schedule
repair before F. The longer the P-F interval, the more planning time.
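
Once a sensor has flagged P, a rough estimate of how much of the P-F interval remains can come from fitting a straight line to the condition indicator and extrapolating it to the level treated as functional failure. This is a minimal sketch that assumes roughly linear degradation (real P-F curves often steepen toward F, so a linear fit may be optimistic); the function name, the readings, and the 8.0 failure level are hypothetical.

```python
def hours_until_threshold(times_h, values, failure_threshold):
    """Fit a least-squares line to (time, value) readings and extrapolate it.

    Returns the estimated hours from the last reading until the trend crosses
    failure_threshold, 0.0 if it has already crossed, or None if the trend is
    flat or improving (no crossing predicted).
    """
    n = len(times_h)
    if n < 2:
        raise ValueError("need at least two readings")
    mean_t = sum(times_h) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    if sxx == 0:
        raise ValueError("readings need distinct timestamps")
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times_h, values))
    slope = sxy / sxx
    if slope <= 0:
        return None  # not degrading
    intercept = mean_v - slope * mean_t
    t_cross = (failure_threshold - intercept) / slope  # time the line hits F
    return max(0.0, t_cross - times_h[-1])


# Hypothetical vibration readings (mm/s) taken once a day after P was detected.
print(hours_until_threshold([0, 24, 48, 72], [2.0, 2.6, 3.2, 3.8], 8.0))
```

For readings rising from 2.0 to 3.8 over 72 hours, the trend reaches 8.0 roughly 168 hours after the last reading; that figure is the planning window to compare against the time needed to schedule the repair.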

Sensor Data Types

| Data Type | What It Detects | Equipment |
|---|---|---|
| Vibration | Bearing wear, imbalance, misalignment | Rotating machinery (motors, pumps, turbines) |
| Temperature | Overheating, friction, electrical faults | Motors, transformers, bearings |
| Current/Power | Load changes, electrical degradation | Electric motors, drives |
| Acoustic | Leaks, cavitation, micro-cracks | Pressure systems, pipes, valves |
| Oil analysis | Wear particles, contamination | Gearboxes, hydraulic systems |
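
Raw vibration streams are usually reduced to a few summary features per time window before thresholds or models are applied. The sketch below computes RMS, peak, crest factor (peak / RMS), and kurtosis with the standard library; this feature set is a common choice but an assumption here, and the sine-wave input is synthetic.

```python
import math


def vibration_features(samples):
    """Summary features for one window of vibration samples."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [x - mean for x in samples]
    rms = math.sqrt(sum(x * x for x in samples) / n)  # overall energy level
    peak = max(abs(x) for x in samples)
    var = sum(c * c for c in centered) / n
    # Non-excess kurtosis: fourth central moment / variance squared.
    kurtosis = (sum(c ** 4 for c in centered) / n) / var ** 2 if var > 0 else 0.0
    return {
        "rms": rms,
        "peak": peak,
        "crest_factor": peak / rms if rms > 0 else 0.0,  # spikiness relative to energy
        "kurtosis": kurtosis,
    }


# Synthetic window: 10 full cycles of a unit sine wave, 100 samples per cycle.
window = [math.sin(2 * math.pi * k / 100) for k in range(1000)]
print(vibration_features(window))
```

For a clean sine wave the crest factor is about 1.41 and the kurtosis about 1.5; roughly Gaussian vibration gives a kurtosis near 3, and sharp impacts push both crest factor and kurtosis up.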

ML Models for RUL (Remaining Useful Life)

| Approach | Method | Data Required |
|---|---|---|
| Statistical | Weibull distribution, exponential degradation | Historical failure times |
| Classical ML | Random Forest, Gradient Boosting on sensor features | Labeled run-to-failure datasets |
| Deep Learning | LSTM, 1D-CNN on raw sensor time series | Large volumes of sensor data |
| Anomaly Detection | Isolation Forest, Autoencoder | Normal operation data only (no failure labels needed) |
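
For the Statistical row, a two-parameter Weibull model can be fitted to historical failure times. This sketch uses median-rank regression with Bernard's approximation, standard library only; it assumes complete data (every unit ran to failure, no censored units), and the failure times are made up. Real fleets usually include units still running, which calls for a censoring-aware fit such as maximum likelihood.

```python
import math


def fit_weibull_mrr(failure_times):
    """Median-rank regression for a 2-parameter Weibull; returns (beta, eta)."""
    times = sorted(failure_times)
    n = len(times)
    if n < 2:
        raise ValueError("need at least two failure times")
    xs, ys = [], []
    for i, t in enumerate(times, start=1):
        f = (i - 0.3) / (n + 0.4)  # Bernard's median-rank approximation
        xs.append(math.log(t))
        ys.append(math.log(-math.log(1.0 - f)))  # linearized Weibull CDF
    mx, my = sum(xs) / n, sum(ys) / n
    beta = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))  # slope = shape parameter
    intercept = my - beta * mx
    eta = math.exp(-intercept / beta)  # scale parameter (characteristic life)
    return beta, eta


def reliability(t, beta, eta):
    """Probability of surviving past age t."""
    return math.exp(-((t / eta) ** beta))


def prob_failure_next(age, horizon, beta, eta):
    """P(failure within `horizon` hours | survived to `age`)."""
    return 1.0 - reliability(age + horizon, beta, eta) / reliability(age, beta, eta)


# Hypothetical run hours at failure for six identical pumps.
beta, eta = fit_weibull_mrr([1200, 1850, 2300, 2700, 3100, 3600])
print(beta, eta, prob_failure_next(2000, 200, beta, eta))
```

A fitted beta above 1 indicates wear-out (failure rate rising with age); beta near 1 means failures are roughly random in time, so age alone says little about when the next one will come.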

Implementation Steps

Phase 1: Select Equipment (criticality analysis)
  • Which equipment has highest downtime cost?
  • Which has cascading failure effects?
  • Prioritize: high cost × high frequency
Phase 2: Install Sensors
  • Match sensor type to failure mode (see table above)
  • Establish data pipeline: sensor → edge/cloud → storage
Phase 3: Build Baseline
  • Collect 3-6 months of normal operation data
  • Establish "healthy" patterns
Phase 4: Develop Models
  • Start simple: threshold-based alerts (vibration > X = warning)
  • Graduate to ML models as data accumulates
  • Anomaly detection if you have few/no failure examples
Phase 5: Operationalize
  • Integrate alerts into maintenance workflow (CMMS)
  • Define response procedures for each alert level
  • Measure: reduction in unplanned downtime, maintenance cost savings
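
Phases 3 and 4 can start as small as the sketch below: summarize healthy baseline data as a mean and standard deviation, then map each new reading to an alert level. The 3σ warning and 5σ critical multipliers, the level names, and the function names are hypothetical; the check is one-sided (only readings above baseline alert, as for vibration or temperature), and a z-score threshold is a simpler stand-in for, not an implementation of, Isolation Forest or an Autoencoder.

```python
import statistics
from dataclasses import dataclass


@dataclass
class Baseline:
    mean: float
    std: float


def build_baseline(healthy_readings):
    """Phase 3: summarize normal-operation data (e.g. daily vibration RMS)."""
    return Baseline(statistics.fmean(healthy_readings),
                    statistics.stdev(healthy_readings))


# Hypothetical alert multipliers; raise them if false alarms erode trust.
WARNING_SIGMA = 3.0
CRITICAL_SIGMA = 5.0


def alert_level(reading, baseline):
    """Phase 4: map one reading to 'normal', 'warning', or 'critical'."""
    if baseline.std == 0:
        raise ValueError("baseline has no variation; collect more data")
    z = (reading - baseline.mean) / baseline.std  # distance from healthy mean
    if z >= CRITICAL_SIGMA:
        return "critical"
    if z >= WARNING_SIGMA:
        return "warning"
    return "normal"


baseline = build_baseline([2.0, 2.1, 1.9, 2.0, 2.2, 1.8, 2.0, 2.1, 1.9, 2.0])
for reading in (2.1, 2.5, 3.0):
    print(reading, alert_level(reading, baseline))
```

With that baseline (mean 2.0, standard deviation about 0.115), the three readings come out as normal, warning, and critical; each level would then map to a response procedure in the CMMS.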

ROI Calculation

Net annual savings = (Unplanned downtime hours reduced × Downtime cost/hour)
                   + (Preventive maintenance events avoided × Cost per event)
                   - (Annualized sensor + infrastructure + model development cost)
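
The formula can be sketched as a function that also yields the payback period used in the Projected ROI table. It treats sensor, infrastructure, and model development as a one-time investment spread over a hypothetical 3-year life (`amortization_years=1` charges it all in one year); the function name and the example figures are assumptions for illustration.

```python
import math


def roi_summary(downtime_hours_reduced, downtime_cost_per_hour,
                pm_events_avoided, cost_per_pm_event,
                one_time_investment, amortization_years=3.0):
    """Annual savings and payback for a predictive-maintenance project."""
    downtime_savings = downtime_hours_reduced * downtime_cost_per_hour
    pm_savings = pm_events_avoided * cost_per_pm_event
    gross = downtime_savings + pm_savings
    # Spread the one-time investment over its assumed useful life.
    annualized_cost = one_time_investment / amortization_years
    # Months of gross savings needed to recover the investment.
    payback_months = 12 * one_time_investment / gross if gross > 0 else math.inf
    return {
        "gross_annual_savings": gross,
        "net_annual_savings": gross - annualized_cost,
        "payback_months": payback_months,
    }


print(roi_summary(40, 10_000, 20, 2_500, 150_000))
```

With 40 downtime hours avoided at $10,000/hour, 20 preventive events avoided at $2,500 each, and a $150,000 investment, this gives $450,000 gross and $400,000 net per year and a 4-month payback.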

Output Format

Predictive Maintenance Plan: {Equipment/Line}

Equipment Criticality

| Equipment | Downtime Cost/hr | Failure Frequency | Cascading? | Priority |
|---|---|---|---|---|
| {name} | ${X} | {X/year} | Y/N | H/M/L |

Sensor Plan

| Equipment | Failure Mode | Sensor Type | P-F Interval |
|---|---|---|---|
| {name} | {mode} | {sensor} | {est. hours/days} |

Projected ROI

| Metric | Before | After | Savings |
|---|---|---|---|
| Unplanned downtime | {hrs/year} | {hrs/year} | ${X}/year |
| Maintenance cost | ${X}/year | ${X}/year | ${X}/year |
| Sensor investment | | ${X} one-time | Payback: {months} |

Gotchas

  • Start with vibration monitoring: It's the most mature, best-understood predictive technique. A commonly cited estimate is that about 80% of rotating-equipment failures can be predicted by vibration analysis alone.
  • Data quality > model complexity: A simple threshold alert on clean sensor data outperforms a sophisticated ML model on noisy, incomplete data. Fix data quality first.
  • False positives kill adoption: If the model cries wolf too often, maintenance teams ignore it. Early on, tune for high precision (few false alarms), even at the cost of some missed detections (lower recall).
  • Cultural change is harder than technology: Shifting from "run to failure" culture requires management buy-in and maintenance team training. Technology alone won't change behavior.
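
The precision-first advice above can be made concrete by scoring past alerts against what maintenance actually found. A minimal sketch, assuming a labeled history of (anomaly score, confirmed fault) pairs; the function names, the scores, and the 0.9 precision target are hypothetical. It returns the lowest threshold that still meets the precision target, which keeps as much recall as that target allows.

```python
def precision_recall(scores, labels, threshold):
    """Alerts fire when score >= threshold; labels are True for confirmed faults."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y)
    precision = tp / (tp + fp) if tp + fp else 1.0  # no alerts, no false alarms
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def pick_threshold(scores, labels, min_precision=0.9):
    """Lowest candidate threshold meeting the precision target, or None."""
    for t in sorted(set(scores)):  # ascending, so recall only falls as t rises
        p, r = precision_recall(scores, labels, t)
        if p >= min_precision and r > 0:
            return t
    return None


scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
labels = [False, False, True, False, False, True, False, True, True]
print(pick_threshold(scores, labels, min_precision=0.9))
```

This returns 0.8 for the sample history: at that threshold both alerts are real faults, but half of the four faults are missed. Relaxing the target to 0.7 drops the threshold to 0.6, where three of four alerts are real and three of four faults are caught.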

References

  • For a sensor selection guide by equipment type, see references/sensor-guide.md
  • For an LSTM-based RUL model tutorial, see references/rul-tutorial.md