model-deployment


ML Model Deployment

Deploy trained models to production with proper serving and monitoring.

Deployment Options


| Method    | Use Case               | Latency  |
|-----------|------------------------|----------|
| REST API  | Web services           | Medium   |
| Batch     | Large-scale processing | N/A      |
| Streaming | Real-time              | Low      |
| Edge      | On-device              | Very low |

FastAPI Model Server

```python
from fastapi import FastAPI
from pydantic import BaseModel
import joblib
import numpy as np

app = FastAPI()
model = joblib.load('model.pkl')

class PredictionRequest(BaseModel):
    features: list[float]

class PredictionResponse(BaseModel):
    prediction: float
    probability: float

@app.get('/health')
def health():
    return {'status': 'healthy'}

@app.post('/predict', response_model=PredictionResponse)
def predict(request: PredictionRequest):
    features = np.array(request.features).reshape(1, -1)
    prediction = model.predict(features)[0]
    probability = model.predict_proba(features)[0].max()
    return PredictionResponse(prediction=prediction, probability=probability)
```
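A deployment smoke test can poll the /health endpoint above until the server answers. Here is a minimal client-side sketch using only the standard library; the URL, the timeout defaults, and the injectable `probe` hook are illustrative choices of this sketch, not part of the server above:

```python
import time
import urllib.request
import urllib.error

def wait_until_healthy(url, timeout=30.0, interval=0.5, probe=None):
    """Poll a health endpoint until it returns HTTP 200 or time runs out.

    `probe` can be injected for testing; by default it performs a real GET.
    Returns True if the service became healthy within `timeout` seconds.
    """
    if probe is None:
        def probe():
            try:
                with urllib.request.urlopen(url, timeout=2) as resp:
                    return resp.status == 200
            except (urllib.error.URLError, OSError):
                return False
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if probe():
            return True
        time.sleep(interval)
    return False

# Example: wait_until_healthy("http://localhost:8000/health", timeout=60)
```

Polling readiness before routing traffic (or before running integration tests) avoids the race where the container is up but the model is still loading.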

Docker Deployment

```dockerfile
FROM python:3.11-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY model.pkl .
COPY app.py .

EXPOSE 8000
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
```

Model Monitoring

```python
from datetime import datetime

class ModelMonitor:
    def __init__(self):
        self.predictions = []
        self.latencies = []

    def log_prediction(self, input_data, prediction, latency):
        self.predictions.append({
            'input': input_data,
            'prediction': prediction,
            'latency': latency,
            'timestamp': datetime.now()
        })
        self.latencies.append(latency)

    def detect_drift(self, reference_distribution):
        # Compare current predictions to the reference distribution;
        # see references/model-monitoring-drift.md for a full implementation.
        pass
```

Deployment Checklist

  • Model validated on test set
  • API endpoints documented
  • Health check endpoint
  • Authentication configured
  • Logging and monitoring setup
  • Model versioning in place
  • Rollback procedure documented

Quick Start: Deploy Model in 6 Steps


1. Save trained model

```python
import joblib
joblib.dump(model, 'model.pkl')
```

2. Create FastAPI app (see references/fastapi-production-server.md)

app.py with /predict and /health endpoints

3. Create Dockerfile

```bash
cat > Dockerfile << 'EOF'
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY app.py model.pkl ./
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
EOF
```

4. Build and test locally

```bash
docker build -t model-api:v1.0.0 .
docker run -p 8000:8000 model-api:v1.0.0
```

5. Push to registry

```bash
docker tag model-api:v1.0.0 registry.example.com/model-api:v1.0.0
docker push registry.example.com/model-api:v1.0.0
```

6. Deploy to Kubernetes

```bash
kubectl apply -f deployment.yaml
kubectl rollout status deployment/model-api
```

Known Issues Prevention

1. No Health Checks = Downtime

Problem: Load balancer sends traffic to unhealthy pods, causing 503 errors.
Solution: Implement both liveness and readiness probes:

app.py:

```python
from fastapi import HTTPException

@app.get("/health")  # Liveness: Is the service alive?
async def health():
    return {"status": "healthy"}

@app.get("/ready")  # Readiness: Can it handle traffic?
async def ready():
    try:
        _ = model_store.model  # Verify the model is loaded
        return {"status": "ready"}
    except Exception:
        raise HTTPException(503, "Not ready")
```

deployment.yaml:

```yaml
livenessProbe:
  httpGet:
    path: /health
    port: 8000
  initialDelaySeconds: 30
readinessProbe:
  httpGet:
    path: /ready
    port: 8000
  initialDelaySeconds: 5
```

2. Model Not Found Errors in Container

Problem: `FileNotFoundError: model.pkl` when the container starts.
Solution: Verify that the model file is copied in the Dockerfile and that the path matches what the code expects:

```dockerfile
# ❌ Wrong: model lands where the code does not expect it
COPY model.pkl /app/models/   # But code expects /app/model.pkl

# ✅ Correct: consistent paths
COPY model.pkl /models/model.pkl
ENV MODEL_PATH=/models/model.pkl
```

In Python:

```python
model_path = os.getenv("MODEL_PATH", "/models/model.pkl")
```

3. Unhandled Input Validation = 500 Errors

Problem: Invalid inputs crash the API with unhandled exceptions.
Solution: Use Pydantic for automatic validation:

```python
from typing import List

import numpy as np
from pydantic import BaseModel, Field, validator

class PredictionRequest(BaseModel):
    features: List[float] = Field(..., min_items=1, max_items=100)

    @validator('features')
    def validate_finite(cls, v):
        if not all(np.isfinite(val) for val in v):
            raise ValueError("All features must be finite")
        return v
```

FastAPI auto-validates and returns 422 for invalid requests:

```python
@app.post("/predict")
async def predict(request: PredictionRequest):
    # The request is guaranteed valid here
    pass
```
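In contexts without Pydantic (batch scripts, worker queues), the same finite-and-bounded check can be sketched with the standard library alone. `validate_features` and its limits are illustrative names chosen for this sketch, not part of the API above:

```python
import math

def validate_features(features, max_items=100):
    """Reject empty, oversized, or non-finite feature vectors.

    Mirrors the Pydantic validation above using only the standard library:
    raises ValueError on bad input, returns the features unchanged otherwise.
    """
    if not features:
        raise ValueError("features must contain at least one value")
    if len(features) > max_items:
        raise ValueError(f"features must contain at most {max_items} values")
    if not all(math.isfinite(v) for v in features):
        raise ValueError("All features must be finite")
    return features
```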

4. No Drift Monitoring = Silent Degradation

Problem: Model performance degrades over time, and no one notices until users complain.
Solution: Implement drift detection (see references/model-monitoring-drift.md):

```python
monitor = ModelMonitor(reference_data=training_data, drift_threshold=0.1)

@app.post("/predict")
async def predict(request: PredictionRequest):
    prediction = model.predict(features)
    monitor.log_prediction(features, prediction, latency)

    # Alert if drift is detected
    if monitor.should_retrain():
        alert_manager.send_alert("Model drift detected - retrain recommended")

    return prediction
```
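As a rough illustration of what a retrain trigger might key on, here is a minimal mean-shift check using only the standard library. It is a crude stand-in for the KS-test approach in references/model-monitoring-drift.md, and measuring the shift in reference standard deviations is an assumption of this sketch:

```python
import statistics

def mean_shift_drift(reference, current, threshold=0.1):
    """Flag drift when the mean of recent predictions moves away from the
    reference mean by more than `threshold` reference standard deviations.

    Only catches shifts in the mean, not changes in shape or variance;
    use a proper two-sample test for production monitoring.
    """
    ref_mean = statistics.fmean(reference)
    ref_std = statistics.pstdev(reference)
    if ref_std == 0:
        return statistics.fmean(current) != ref_mean
    shift = abs(statistics.fmean(current) - ref_mean) / ref_std
    return shift > threshold
```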

5. Missing Resource Limits = OOM Kills

Problem: The pod is killed by the Kubernetes OOMKiller and the service goes down.
Solution: Set memory/CPU requests and limits:

```yaml
resources:
  requests:
    memory: "512Mi"  # Guaranteed
    cpu: "500m"
  limits:
    memory: "1Gi"    # Max allowed
    cpu: "1000m"
```

Monitor actual usage:

```bash
kubectl top pods
```

6. No Rollback Plan = Stuck on Bad Deploy

Problem: The new model version has bugs, and there is no way to revert quickly.
Solution: Tag images with versions and keep the previous deployment:

```bash
# Deploy with a version tag
kubectl set image deployment/model-api model-api=registry/model-api:v1.2.0

# If there are issues, roll back to the previous version
kubectl rollout undo deployment/model-api

# Or pin a specific version
kubectl set image deployment/model-api model-api=registry/model-api:v1.1.0
```

7. Synchronous Prediction = Slow Batch Processing

Problem: Processing 10,000 predictions one by one takes hours.
Solution: Implement a batch endpoint:

```python
class BatchPredictionRequest(BaseModel):
    instances: list[list[float]]

@app.post("/predict/batch")
async def predict_batch(request: BatchPredictionRequest):
    # Process all instances at once (vectorized) - much faster
    features = np.array(request.instances)
    predictions = model.predict(features)
    return {"predictions": predictions.tolist()}
```
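On the client side, very large workloads can be split into fixed-size chunks before calling the batch endpoint, bounding request size and memory. A minimal sketch (the chunk size of 4 here is arbitrary, chosen only to illustrate):

```python
def chunked(items, size):
    """Yield successive fixed-size chunks of a list; the last may be shorter."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# Example: 10 instances in batches of 4 -> chunk sizes 4, 4, 2
batches = list(chunked(list(range(10)), 4))
```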

8. No CI/CD Validation = Deploy Bad Models

Problem: Deploying a model that fails basic tests breaks production.
Solution: Validate in the CI pipeline (see references/cicd-ml-models.md):

```yaml
# .github/workflows/deploy.yml
- name: Validate model performance
  run: |
    python scripts/validate_model.py \
      --model model.pkl \
      --test-data test.csv \
      --min-accuracy 0.85  # Fail if below threshold
```

Best Practices

  • Version everything: Models (semantic versioning), Docker images, deployments
  • Monitor continuously: Latency, error rate, drift, resource usage
  • Test before deploy: Unit tests, integration tests, performance benchmarks
  • Deploy gradually: Canary (10%), then full rollout
  • Plan for rollback: Keep previous version, document procedure
  • Log predictions: Enable debugging and drift detection
  • Set resource limits: Prevent OOM kills and resource contention
  • Use health checks: Enable proper load balancing
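The "deploy gradually" practice can be made concrete with deterministic traffic splitting: hash a stable request key and route a fixed fraction to the canary. A sketch (the function name and sha256-bucket scheme are illustrative; the 10% default mirrors the canary figure above):

```python
import hashlib

def routes_to_canary(request_id: str, canary_percent: int = 10) -> bool:
    """Deterministically route ~canary_percent% of traffic to the canary.

    Hashing a stable key (user or request ID) keeps each caller pinned to
    the same version, which makes canary metrics easier to compare.
    """
    digest = hashlib.sha256(request_id.encode()).digest()
    bucket = digest[0] * 256 + digest[1]  # 0..65535, near-uniform buckets
    return bucket % 100 < canary_percent
```

In practice this logic usually lives in the ingress or service mesh rather than application code; the sketch just shows the idea.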

When to Load References

Load reference files for detailed implementations:
  • FastAPI Production Server: Load
    references/fastapi-production-server.md
    for complete production-ready FastAPI implementation with error handling, validation (Pydantic models), logging, health/readiness probes, batch predictions, model versioning, middleware, exception handlers, and performance optimizations (caching, async)
  • Model Monitoring & Drift: Load
    references/model-monitoring-drift.md
    for ModelMonitor implementation with KS-test drift detection, Jensen-Shannon divergence, Prometheus metrics integration, alert configuration (Slack, email), continuous monitoring service, and dashboard endpoints
  • Containerization & Deployment: Load
    references/containerization-deployment.md
    for multi-stage Dockerfiles, model versioning in containers, Docker Compose setup, A/B testing with Nginx, Kubernetes deployments (rolling update, blue-green, canary), GitHub Actions CI/CD, and deployment checklists
  • CI/CD for ML Models: Load
    references/cicd-ml-models.md
    for complete GitHub Actions pipeline with model validation, data validation, automated testing, security scanning, performance benchmarks, automated rollback, and deployment strategies