mfg-predictive-maintenance

Predictive Maintenance

Framework

IRON LAW: Predictive > Preventive > Reactive (but each has its place)

Reactive (fix after failure): no upfront cost, but the most expensive in downtime
Preventive (fix on schedule): prevents some failures, but causes unnecessary maintenance
Predictive (fix based on condition): lowest total cost on critical equipment, but requires sensor investment

Not ALL equipment justifies predictive maintenance. Apply to equipment where
unplanned downtime cost >> sensor investment cost.

Maintenance Strategy Comparison

Strategy	When to Maintain	Advantage	Disadvantage	Best For
Reactive	After failure	Zero upfront cost	Max downtime, safety risk	Non-critical, cheap-to-replace equipment
Preventive	On schedule (time/cycles)	Predictable, simple	Over-maintenance (replacing parts that still work)	Equipment with known wear patterns
Predictive	Based on condition data	Minimizes both downtime and maintenance cost	Requires sensors, data infrastructure, models	Critical, high-value equipment whose failure has cascading effects

P-F Curve (Potential Failure → Functional Failure)

Condition
  │
  │  ●─── P (Potential failure detected by sensor)
  │     ╲
  │      ╲  ← P-F Interval (time to act)
  │       ╲
  │        ● F (Functional failure — equipment stops)
  │
  └──────────────────── Time

The P-F interval is your window of opportunity. Detect at P, schedule
repair before F. The longer the P-F interval, the more planning time.
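
A minimal sketch of the timing logic, assuming condition checks run at a fixed interval (online monitoring is effectively a very short interval). In the worst case the potential failure appears just after a check and is only seen at the next one, so the warning left after detection is the P-F interval minus the inspection interval. A commonly quoted rule of thumb is to inspect at half the P-F interval; the sketch instead derives the longest interval that still leaves a required response time. The function names and the `response_time` parameter (time needed to plan, source parts, and schedule the repair) are hypothetical.

```python
def net_warning_time(pf_interval: float, inspection_interval: float) -> float:
    """Worst-case warning left after detection.

    If P appears just after a check, it is only detected one full
    inspection interval later.
    """
    return pf_interval - inspection_interval


def max_inspection_interval(pf_interval: float, response_time: float) -> float:
    """Longest check interval that still leaves `response_time` before F.

    Raises ValueError when even continuous monitoring cannot meet the target.
    """
    interval = pf_interval - response_time
    if interval <= 0:
        raise ValueError("P-F interval is shorter than the required response time")
    return interval


# Hypothetical bearing: ~30-day P-F interval, 7 days needed to plan a repair
print(max_inspection_interval(pf_interval=30, response_time=7))  # 23
print(net_warning_time(pf_interval=30, inspection_interval=15))  # 15
```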

Sensor Data Types

Data Type	What It Detects	Equipment
Vibration	Bearing wear, imbalance, misalignment	Rotating machinery (motors, pumps, turbines)
Temperature	Overheating, friction, electrical faults	Motors, transformers, bearings
Current/Power	Load changes, electrical degradation	Electric motors, drives
Acoustic	Leaks, cavitation, micro-cracks	Pressure systems, pipes, valves
Oil analysis	Wear particles, contamination	Gearboxes, hydraulic systems
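
Raw vibration data is usually reduced to a few features per time window before alerting or modeling. A minimal, stdlib-only sketch of common time-domain features, assuming a window of acceleration samples: RMS tracks overall vibration energy, while crest factor (peak / RMS) and kurtosis rise when the signal becomes impulsive, as it can with early bearing damage. The function name and sample window are illustrative; real systems typically add frequency-domain (FFT or envelope) analysis, which this sketch does not cover.

```python
import math
from typing import Sequence


def vibration_features(samples: Sequence[float]) -> dict[str, float]:
    """Time-domain features for one window of vibration samples."""
    n = len(samples)
    if n == 0:
        raise ValueError("empty window")
    mean = sum(samples) / n
    centered = [x - mean for x in samples]
    rms = math.sqrt(sum(x * x for x in samples) / n)  # overall energy
    peak = max(abs(x) for x in samples)
    variance = sum(c * c for c in centered) / n
    # Population kurtosis: about 3.0 for Gaussian noise, higher for impulsive signals
    kurtosis = (sum(c ** 4 for c in centered) / n) / variance ** 2 if variance else 0.0
    return {
        "rms": rms,
        "peak": peak,
        "crest_factor": peak / rms if rms else 0.0,
        "kurtosis": kurtosis,
    }


# Illustrative window: small oscillation with one impulse at the end
window = [0.1, -0.1, 0.1, -0.1, 0.1, -0.1, 0.1, 1.0]
print(vibration_features(window))
```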

ML Models for RUL (Remaining Useful Life)

Approach	Method	Data Required
Statistical	Weibull distribution, exponential degradation	Historical failure times
Classical ML	Random Forest, Gradient Boosting on sensor features	Labeled run-to-failure datasets
Deep Learning	LSTM, 1D-CNN on raw sensor time series	Large volumes of sensor data
Anomaly Detection	Isolation Forest, Autoencoder	Normal operation data only (no failure labels needed)
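
For the statistical row, a minimal stdlib-only sketch: fit a Weibull distribution to historical failure times by median rank regression (Bernard's approximation for the plotting positions), then turn the fit into the probability of failing within a planning horizon given the unit has survived to its current age. It assumes every time is an observed failure with no censored units (still running, or replaced before failing), which real maintenance records rarely satisfy; maximum-likelihood fitting with censoring is the more robust choice. Function names and example times are hypothetical.

```python
import math
from typing import Sequence


def fit_weibull_mrr(failure_times: Sequence[float]) -> tuple[float, float]:
    """Estimate Weibull shape (beta) and scale (eta) by median rank regression.

    Assumes complete data: every time is an observed failure.
    """
    times = sorted(failure_times)
    n = len(times)
    if n < 2:
        raise ValueError("need at least two failure times")
    xs, ys = [], []
    for i, t in enumerate(times, start=1):
        f = (i - 0.3) / (n + 0.4)  # Bernard's median-rank approximation
        xs.append(math.log(t))
        ys.append(math.log(-math.log(1.0 - f)))
    # Least squares of y on x, where y = beta * ln(t) - beta * ln(eta)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("failure times must not all be equal")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    beta = sxy / sxx
    intercept = y_mean - beta * x_mean
    eta = math.exp(-intercept / beta)
    return beta, eta


def conditional_failure_prob(age: float, horizon: float, beta: float, eta: float) -> float:
    """P(failure before age + horizon | survived to age) for a Weibull model."""
    hazard_now = (age / eta) ** beta
    hazard_later = ((age + horizon) / eta) ** beta
    return 1.0 - math.exp(hazard_now - hazard_later)


# Hypothetical run-hours at failure for one pump model
beta, eta = fit_weibull_mrr([1200, 1850, 2300, 2700, 3400])
# beta > 1 suggests wear-out; chance of failing in the next 500 h at age 2000 h
print(beta, eta, conditional_failure_prob(2000, 500, beta, eta))
```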

Implementation Steps

Phase 1: Select Equipment (criticality analysis)

Which equipment has highest downtime cost?
Which has cascading failure effects?
Prioritize: high cost × high frequency
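
Phase 1's prioritization can start as a ranking by expected annual downtime cost (failures per year × hours down per failure × cost per hour), boosted for assets whose failure cascades. A minimal sketch; the `Asset` fields, the 2× cascade multiplier, and the sample figures are hypothetical stand-ins for your own criticality scheme.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    name: str
    downtime_cost_per_hr: float   # lost production + labor, per hour
    failures_per_year: float
    hours_down_per_failure: float
    cascading: bool               # does a failure stop or damage downstream equipment?


CASCADE_MULTIPLIER = 2.0  # hypothetical weighting; tune to your plant


def expected_annual_downtime_cost(a: Asset) -> float:
    return a.failures_per_year * a.hours_down_per_failure * a.downtime_cost_per_hr


def priority_score(a: Asset) -> float:
    score = expected_annual_downtime_cost(a)
    return score * CASCADE_MULTIPLIER if a.cascading else score


assets = [
    Asset("Main compressor", 12_000, 2, 8, True),
    Asset("Packaging conveyor", 1_500, 6, 2, False),
    Asset("Cooling pump", 4_000, 3, 4, False),
]
for a in sorted(assets, key=priority_score, reverse=True):
    print(f"{a.name}: {priority_score(a):,.0f}")
# Main compressor: 384,000
# Cooling pump: 48,000
# Packaging conveyor: 18,000
```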

Phase 2: Install Sensors

Match sensor type to failure mode (see table above)
Establish data pipeline: sensor → edge/cloud → storage

Phase 3: Build Baseline

Collect 3-6 months of normal operation data
Establish "healthy" patterns
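
One simple way to turn the baseline into alert limits is to take the mean and standard deviation of a healthy-period feature (for example, vibration RMS) and set warning and alarm levels a fixed number of standard deviations above the mean. The 3σ/6σ multipliers and the function name are hypothetical; many plants instead start from vendor limits or ISO 10816/20816 vibration severity zones. A mean-plus-k-sigma rule also assumes the healthy data comes from stable operating conditions.

```python
import statistics
from typing import Sequence


def baseline_limits(healthy: Sequence[float], warn_k: float = 3.0,
                    alarm_k: float = 6.0) -> tuple[float, float]:
    """Return (warning, alarm) limits as mean + k * sample standard deviation.

    Assumes readings come from a representative healthy period
    (same load, speed, and sensor mounting as normal operation).
    """
    if len(healthy) < 2:
        raise ValueError("need at least two healthy readings")
    mu = statistics.fmean(healthy)
    sigma = statistics.stdev(healthy)
    return mu + warn_k * sigma, mu + alarm_k * sigma


healthy_rms = [2.0, 2.2, 1.8, 2.1, 1.9]  # hypothetical mm/s readings
warn, alarm = baseline_limits(healthy_rms)
print(round(warn, 2), round(alarm, 2))  # 2.47 2.95
```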

Phase 4: Develop Models

Start simple: threshold-based alerts (vibration > X = warning)
Graduate to ML models as data accumulates
Use anomaly detection if you have few or no failure examples
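
A minimal sketch of Phase 4's starting point: classify each reading against warning and alarm limits, and only raise an alert once a level has persisted for several consecutive readings, trading a little detection delay for fewer one-off false alarms. The limits, the `persistence` count, and the level names are hypothetical placeholders for the "X" in the text.

```python
from typing import Iterable, Iterator

SEVERITY = {"OK": 0, "WARNING": 1, "ALARM": 2}


def classify(value: float, warn: float, alarm: float) -> str:
    """Map one reading to a level using fixed limits (alarm >= warn assumed)."""
    if value >= alarm:
        return "ALARM"
    if value >= warn:
        return "WARNING"
    return "OK"


def alerts(readings: Iterable[float], warn: float, alarm: float,
           persistence: int = 3) -> Iterator[tuple[int, str]]:
    """Yield (index, level) once a level has held for `persistence` readings in a row.

    Only escalations are emitted; a single OK reading clears the state and re-arms.
    An ALARM reading also counts toward the WARNING run.
    """
    warn_run = alarm_run = 0
    state = "OK"
    for i, value in enumerate(readings):
        level = classify(value, warn, alarm)
        warn_run = warn_run + 1 if level != "OK" else 0
        alarm_run = alarm_run + 1 if level == "ALARM" else 0
        if alarm_run >= persistence:
            confirmed = "ALARM"
        elif warn_run >= persistence:
            confirmed = "WARNING"
        else:
            confirmed = "OK"
        if confirmed == "OK":
            state = "OK"  # condition cleared: re-arm
        elif SEVERITY[confirmed] > SEVERITY[state]:
            state = confirmed
            yield i, confirmed  # escalation only


readings = [1.0, 2.6, 2.7, 2.8, 2.9, 3.1, 3.2, 3.3, 1.0]
print(list(alerts(readings, warn=2.5, alarm=3.0)))  # [(3, 'WARNING'), (7, 'ALARM')]
```

A single spike (for example `[1.0, 5.0, 1.0, 1.0]`) produces no alert, which is the point of the persistence check.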

Phase 5: Operationalize

Integrate alerts into maintenance workflow (CMMS)
Define response procedures for each alert level
Measure: reduction in unplanned downtime, maintenance cost savings

ROI Calculation

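Gross annual savings = (Unplanned downtime hours reduced × Downtime cost/hour)
                     + (Preventive maintenance events avoided × Cost per event)

First-year net savings = Gross annual savings
                         - (Sensor + infrastructure + model development cost)

Payback (months) = (Sensor + infrastructure + model development cost) / Gross annual savings × 12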
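
The ROI arithmetic as a minimal sketch that keeps the one-time investment separate from the recurring savings, so a payback period can be reported as in the output template below. All example inputs are hypothetical, and the sketch ignores recurring costs (sensor upkeep, connectivity, model maintenance), which a real business case should subtract from the gross savings.

```python
def gross_annual_savings(downtime_hours_reduced: float, downtime_cost_per_hour: float,
                         pm_events_avoided: float, cost_per_pm_event: float) -> float:
    """Savings from avoided unplanned downtime plus avoided preventive jobs."""
    return (downtime_hours_reduced * downtime_cost_per_hour
            + pm_events_avoided * cost_per_pm_event)


def roi_summary(gross: float, one_time_investment: float) -> dict[str, float]:
    """First-year net savings and simple (undiscounted) payback in months."""
    if gross <= 0:
        raise ValueError("no savings: the investment never pays back")
    return {
        "gross_annual_savings": gross,
        "first_year_net_savings": gross - one_time_investment,
        "payback_months": one_time_investment / gross * 12,
    }


# Hypothetical line: 40 h downtime avoided at $10,000/h,
# 20 preventive jobs avoided at $1,500 each, $150,000 for sensors + platform + models
gross = gross_annual_savings(40, 10_000, 20, 1_500)
print(roi_summary(gross, 150_000))
```

With these made-up inputs the gross savings are $430,000, first-year net savings are $280,000, and payback is about 4.2 months.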

Output Format

Predictive Maintenance Plan: {Equipment/Line}

Equipment Criticality

Equipment	Downtime Cost/hr	Failure Frequency	Cascading?	Priority
{name}	${X}	{X/year}	Y/N	H/M/L

Sensor Plan

Equipment	Failure Mode	Sensor Type	P-F Interval
{name}	{mode}	{sensor}	{est. hours/days}

Projected ROI

Metric	Before	After	Savings
Unplanned downtime	{hrs/year}	{hrs/year}	${X}/year
Maintenance cost	${X}/year	${X}/year	${X}/year
Sensor investment	—	${X} one-time	Payback: {months}

Gotchas

Start with vibration monitoring: it is one of the most mature, best-understood predictive techniques. An often-quoted industry estimate is that around 80% of rotating-equipment failures can be predicted by vibration analysis alone.
Data quality > model complexity: A simple threshold alert on clean sensor data outperforms a sophisticated ML model on noisy, incomplete data. Fix data quality first.
False positives kill adoption: If the model cries wolf too often, maintenance teams ignore it. Tune for high precision (few false alarms) even at the cost of some missed detections early on.
Cultural change is harder than technology: Shifting from "run to failure" culture requires management buy-in and maintenance team training. Technology alone won't change behavior.
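
To make "tune for high precision" measurable, a minimal sketch that scores past alerts against confirmed faults (for example, work orders where the technician found the predicted fault). The matching rule, where an alert counts as a true positive if a confirmed fault on the same asset follows within a lookahead window, and all names are hypothetical; adapt it to how your CMMS records outcomes.

```python
from typing import Sequence


def alert_precision_recall(alert_days: Sequence[float], fault_days: Sequence[float],
                           window: float) -> tuple[float, float]:
    """Precision and recall of alerts for one asset.

    An alert is a true positive if a confirmed fault occurs within `window`
    days after it; a fault counts as detected if an alert preceded it within
    `window` days. Returns 0.0 for a metric whose denominator is empty.
    """
    tp_alerts = sum(1 for a in alert_days if any(0 <= f - a <= window for f in fault_days))
    detected = sum(1 for f in fault_days if any(0 <= f - a <= window for a in alert_days))
    precision = tp_alerts / len(alert_days) if alert_days else 0.0
    recall = detected / len(fault_days) if fault_days else 0.0
    return precision, recall


# Hypothetical history: 4 alerts, 2 confirmed faults, 14-day lookahead
print(alert_precision_recall([10, 40, 55, 90], [20, 60], window=14))  # (0.5, 1.0)
```

Here half the alerts were false alarms even though both faults were caught, the pattern that erodes trust if it persists.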

References

For a sensor selection guide by equipment type, see `references/sensor-guide.md`.
For an LSTM-based RUL model tutorial, see `references/rul-tutorial.md`.
