model-deployment


ML Model Deployment

Deploy trained models to production with proper serving and monitoring.

Deployment Options


| Method    | Use Case               | Latency  |
|-----------|------------------------|----------|
| REST API  | Web services           | Medium   |
| Batch     | Large-scale processing | N/A      |
| Streaming | Real-time              | Low      |
| Edge      | On-device              | Very low |

FastAPI Model Server

```python
from fastapi import FastAPI
from pydantic import BaseModel
import joblib
import numpy as np

app = FastAPI()
model = joblib.load('model.pkl')

class PredictionRequest(BaseModel):
    features: list[float]

class PredictionResponse(BaseModel):
    prediction: float
    probability: float

@app.get('/health')
def health():
    return {'status': 'healthy'}

@app.post('/predict', response_model=PredictionResponse)
def predict(request: PredictionRequest):
    features = np.array(request.features).reshape(1, -1)
    prediction = model.predict(features)[0]
    probability = model.predict_proba(features)[0].max()
    return PredictionResponse(prediction=prediction, probability=probability)
```
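A deployment smoke test can poll the /health endpoint above until the server answers. Here is a minimal client-side sketch using only the standard library; the URL, the timeout defaults, and the injectable `probe` hook are illustrative choices of this sketch, not part of the server above:

```python
import time
import urllib.request
import urllib.error

def wait_until_healthy(url, timeout=30.0, interval=0.5, probe=None):
    """Poll a health endpoint until it returns HTTP 200 or time runs out.

    `probe` can be injected for testing; by default it performs a real GET.
    Returns True if the service became healthy within `timeout` seconds.
    """
    if probe is None:
        def probe():
            try:
                with urllib.request.urlopen(url, timeout=2) as resp:
                    return resp.status == 200
            except (urllib.error.URLError, OSError):
                return False
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if probe():
            return True
        time.sleep(interval)
    return False

# Example: wait_until_healthy("http://localhost:8000/health", timeout=60)
```

Polling readiness before routing traffic (or before running integration tests) avoids the race where the container is up but the model is still loading.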

Docker Deployment

```dockerfile
FROM python:3.11-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY model.pkl .
COPY app.py .

EXPOSE 8000
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
```

Model Monitoring

```python
from datetime import datetime

class ModelMonitor:
    def __init__(self):
        self.predictions = []
        self.latencies = []

    def log_prediction(self, input_data, prediction, latency):
        self.predictions.append({
            'input': input_data,
            'prediction': prediction,
            'latency': latency,
            'timestamp': datetime.now()
        })
        self.latencies.append(latency)

    def detect_drift(self, reference_distribution):
        # Compare current predictions to the reference distribution;
        # see references/model-monitoring-drift.md for a full implementation.
        pass
```

Deployment Checklist

  • Model validated on test set
  • API endpoints documented
  • Health check endpoint
  • Authentication configured
  • Logging and monitoring setup
  • Model versioning in place
  • Rollback procedure documented

Quick Start: Deploy Model in 6 Steps


1. Save trained model

```python
import joblib
joblib.dump(model, 'model.pkl')
```

2. Create FastAPI app (see references/fastapi-production-server.md)

app.py with /predict and /health endpoints

3. Create Dockerfile

```bash
cat > Dockerfile << 'EOF'
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY app.py model.pkl ./
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
EOF
```

4. Build and test locally

```bash
docker build -t model-api:v1.0.0 .
docker run -p 8000:8000 model-api:v1.0.0
```

5. Push to registry

```bash
docker tag model-api:v1.0.0 registry.example.com/model-api:v1.0.0
docker push registry.example.com/model-api:v1.0.0
```

6. Deploy to Kubernetes

```bash
kubectl apply -f deployment.yaml
kubectl rollout status deployment/model-api
```

Known Issues Prevention

1. No Health Checks = Downtime

Problem: Load balancer sends traffic to unhealthy pods, causing 503 errors.
Solution: Implement both liveness and readiness probes:

app.py:

```python
from fastapi import HTTPException

@app.get("/health")  # Liveness: Is the service alive?
async def health():
    return {"status": "healthy"}

@app.get("/ready")  # Readiness: Can it handle traffic?
async def ready():
    try:
        _ = model_store.model  # Verify the model is loaded
        return {"status": "ready"}
    except Exception:
        raise HTTPException(503, "Not ready")
```

deployment.yaml:

```yaml
livenessProbe:
  httpGet:
    path: /health
    port: 8000
  initialDelaySeconds: 30
readinessProbe:
  httpGet:
    path: /ready
    port: 8000
  initialDelaySeconds: 5
```

2. Model Not Found Errors in Container

Problem: `FileNotFoundError: model.pkl` when the container starts.
Solution: Verify that the model file is copied in the Dockerfile and that the path matches what the code expects:

```dockerfile
# ❌ Wrong: model lands where the code does not expect it
COPY model.pkl /app/models/   # But code expects /app/model.pkl

# ✅ Correct: consistent paths
COPY model.pkl /models/model.pkl
ENV MODEL_PATH=/models/model.pkl
```

In Python:

```python
model_path = os.getenv("MODEL_PATH", "/models/model.pkl")
```

3. Unhandled Input Validation = 500 Errors

Problem: Invalid inputs crash the API with unhandled exceptions.
Solution: Use Pydantic for automatic validation:

```python
from typing import List

import numpy as np
from pydantic import BaseModel, Field, validator

class PredictionRequest(BaseModel):
    features: List[float] = Field(..., min_items=1, max_items=100)

    @validator('features')
    def validate_finite(cls, v):
        if not all(np.isfinite(val) for val in v):
            raise ValueError("All features must be finite")
        return v
```

FastAPI auto-validates and returns 422 for invalid requests:

```python
@app.post("/predict")
async def predict(request: PredictionRequest):
    # The request is guaranteed valid here
    pass
```
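In contexts without Pydantic (batch scripts, worker queues), the same finite-and-bounded check can be sketched with the standard library alone. `validate_features` and its limits are illustrative names chosen for this sketch, not part of the API above:

```python
import math

def validate_features(features, max_items=100):
    """Reject empty, oversized, or non-finite feature vectors.

    Mirrors the Pydantic validation above using only the standard library:
    raises ValueError on bad input, returns the features unchanged otherwise.
    """
    if not features:
        raise ValueError("features must contain at least one value")
    if len(features) > max_items:
        raise ValueError(f"features must contain at most {max_items} values")
    if not all(math.isfinite(v) for v in features):
        raise ValueError("All features must be finite")
    return features
```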

4. No Drift Monitoring = Silent Degradation

Problem: Model performance degrades over time, and no one notices until users complain.
Solution: Implement drift detection (see references/model-monitoring-drift.md):

```python
monitor = ModelMonitor(reference_data=training_data, drift_threshold=0.1)

@app.post("/predict")
async def predict(request: PredictionRequest):
    prediction = model.predict(features)
    monitor.log_prediction(features, prediction, latency)

    # Alert if drift is detected
    if monitor.should_retrain():
        alert_manager.send_alert("Model drift detected - retrain recommended")

    return prediction
```
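As a rough illustration of what a retrain trigger might key on, here is a minimal mean-shift check using only the standard library. It is a crude stand-in for the KS-test approach in references/model-monitoring-drift.md, and measuring the shift in reference standard deviations is an assumption of this sketch:

```python
import statistics

def mean_shift_drift(reference, current, threshold=0.1):
    """Flag drift when the mean of recent predictions moves away from the
    reference mean by more than `threshold` reference standard deviations.

    Only catches shifts in the mean, not changes in shape or variance;
    use a proper two-sample test for production monitoring.
    """
    ref_mean = statistics.fmean(reference)
    ref_std = statistics.pstdev(reference)
    if ref_std == 0:
        return statistics.fmean(current) != ref_mean
    shift = abs(statistics.fmean(current) - ref_mean) / ref_std
    return shift > threshold
```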

5. Missing Resource Limits = OOM Kills

Problem: The pod is killed by the Kubernetes OOMKiller and the service goes down.
Solution: Set memory/CPU requests and limits:

```yaml
resources:
  requests:
    memory: "512Mi"  # Guaranteed
    cpu: "500m"
  limits:
    memory: "1Gi"    # Max allowed
    cpu: "1000m"
```

Monitor actual usage:

```bash
kubectl top pods
```

6. No Rollback Plan = Stuck on Bad Deploy

Problem: The new model version has bugs, and there is no way to revert quickly.
Solution: Tag images with versions and keep the previous deployment:

```bash
# Deploy with a version tag
kubectl set image deployment/model-api model-api=registry/model-api:v1.2.0

# If there are issues, roll back to the previous version
kubectl rollout undo deployment/model-api

# Or pin a specific version
kubectl set image deployment/model-api model-api=registry/model-api:v1.1.0
```

7. Synchronous Prediction = Slow Batch Processing

Problem: Processing 10,000 predictions one by one takes hours.
Solution: Implement a batch endpoint:

```python
class BatchPredictionRequest(BaseModel):
    instances: list[list[float]]

@app.post("/predict/batch")
async def predict_batch(request: BatchPredictionRequest):
    # Process all instances at once (vectorized) - much faster
    features = np.array(request.instances)
    predictions = model.predict(features)
    return {"predictions": predictions.tolist()}
```
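On the client side, very large workloads can be split into fixed-size chunks before calling the batch endpoint, bounding request size and memory. A minimal sketch (the chunk size of 4 here is arbitrary, chosen only to illustrate):

```python
def chunked(items, size):
    """Yield successive fixed-size chunks of a list; the last may be shorter."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# Example: 10 instances in batches of 4 -> chunk sizes 4, 4, 2
batches = list(chunked(list(range(10)), 4))
```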

8. No CI/CD Validation = Deploy Bad Models

Problem: Deploying a model that fails basic tests breaks production.
Solution: Validate in the CI pipeline (see references/cicd-ml-models.md):

```yaml
# .github/workflows/deploy.yml
- name: Validate model performance
  run: |
    python scripts/validate_model.py \
      --model model.pkl \
      --test-data test.csv \
      --min-accuracy 0.85  # Fail if below threshold
```

Best Practices

  • Version everything: Models (semantic versioning), Docker images, deployments
  • Monitor continuously: Latency, error rate, drift, resource usage
  • Test before deploy: Unit tests, integration tests, performance benchmarks
  • Deploy gradually: Canary (10%), then full rollout
  • Plan for rollback: Keep previous version, document procedure
  • Log predictions: Enable debugging and drift detection
  • Set resource limits: Prevent OOM kills and resource contention
  • Use health checks: Enable proper load balancing
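The "deploy gradually" practice can be made concrete with deterministic traffic splitting: hash a stable request key and route a fixed fraction to the canary. A sketch (the function name and sha256-bucket scheme are illustrative; the 10% default mirrors the canary figure above):

```python
import hashlib

def routes_to_canary(request_id: str, canary_percent: int = 10) -> bool:
    """Deterministically route ~canary_percent% of traffic to the canary.

    Hashing a stable key (user or request ID) keeps each caller pinned to
    the same version, which makes canary metrics easier to compare.
    """
    digest = hashlib.sha256(request_id.encode()).digest()
    bucket = digest[0] * 256 + digest[1]  # 0..65535, near-uniform buckets
    return bucket % 100 < canary_percent
```

In practice this logic usually lives in the ingress or service mesh rather than application code; the sketch just shows the idea.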

When to Load References

Load reference files for detailed implementations:
  • FastAPI Production Server: Load
    references/fastapi-production-server.md
    for complete production-ready FastAPI implementation with error handling, validation (Pydantic models), logging, health/readiness probes, batch predictions, model versioning, middleware, exception handlers, and performance optimizations (caching, async)
  • Model Monitoring & Drift: Load
    references/model-monitoring-drift.md
    for ModelMonitor implementation with KS-test drift detection, Jensen-Shannon divergence, Prometheus metrics integration, alert configuration (Slack, email), continuous monitoring service, and dashboard endpoints
  • Containerization & Deployment: Load
    references/containerization-deployment.md
    for multi-stage Dockerfiles, model versioning in containers, Docker Compose setup, A/B testing with Nginx, Kubernetes deployments (rolling update, blue-green, canary), GitHub Actions CI/CD, and deployment checklists
  • CI/CD for ML Models: Load
    references/cicd-ml-models.md
    for complete GitHub Actions pipeline with model validation, data validation, automated testing, security scanning, performance benchmarks, automated rollback, and deployment strategies