# ML Model Deployment
Deploy trained models to production with proper serving and monitoring.
## Deployment Options
| Method | Use Case | Latency |
|---|---|---|
| REST API | Web services | Medium |
| Batch | Large-scale processing | N/A |
| Streaming | Real-time | Low |
| Edge | On-device | Very low |
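The batch row wins on throughput because the model is applied to many rows in one call instead of once per request. A minimal sketch with a stand-in linear model (a real deployment would load a trained model, e.g. with joblib):

```python
class LinearModel:
    """Stand-in for a trained model; in practice this comes from training."""

    def __init__(self, weights):
        self.weights = weights

    def predict(self, batch):
        # Score the whole batch in one call: the model is loaded and
        # applied once, instead of once per incoming request
        return [sum(w * x for w, x in zip(self.weights, row)) for row in batch]

model = LinearModel([0.5, -1.0, 2.0])
preds = model.predict([[1, 2, 3], [4, 5, 6]])  # one call, two predictions
```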
## FastAPI Model Server
```python
from fastapi import FastAPI
from pydantic import BaseModel
import joblib
import numpy as np

app = FastAPI()
model = joblib.load('model.pkl')

class PredictionRequest(BaseModel):
    features: list[float]

class PredictionResponse(BaseModel):
    prediction: float
    probability: float

@app.get('/health')
def health():
    return {'status': 'healthy'}

@app.post('/predict', response_model=PredictionResponse)
def predict(request: PredictionRequest):
    features = np.array(request.features).reshape(1, -1)
    prediction = model.predict(features)[0]
    probability = model.predict_proba(features)[0].max()
    return PredictionResponse(prediction=prediction, probability=probability)
```
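With the server running (`uvicorn app:app --port 8000`), any HTTP client can call `/predict`. A minimal client sketch using only the standard library; the URL and port are assumptions matching the examples in this document:

```python
import json
import urllib.request

def build_request(features, url="http://localhost:8000/predict"):
    # Build a POST request whose body matches the PredictionRequest schema
    body = json.dumps({"features": features}).encode()
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )

def predict(features):
    # Send the request and decode the PredictionResponse JSON
    with urllib.request.urlopen(build_request(features)) as resp:
        return json.loads(resp.read())
```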
## Docker Deployment
```dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY model.pkl .
COPY app.py .
EXPOSE 8000
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
```
## Model Monitoring
```python
from datetime import datetime

class ModelMonitor:
    def __init__(self):
        self.predictions = []
        self.latencies = []

    def log_prediction(self, input_data, prediction, latency):
        self.predictions.append({
            'input': input_data,
            'prediction': prediction,
            'latency': latency,
            'timestamp': datetime.now()
        })
        self.latencies.append(latency)

    def detect_drift(self, reference_distribution):
        # Compare the current prediction distribution to the reference
        pass
```
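The `detect_drift` stub can be filled in many ways; the KS-test version lives in references/model-monitoring-drift.md. As a minimal illustration, a mean-shift check against the reference distribution:

```python
def mean_shift_drift(recent, reference, threshold=0.1):
    # Flag drift when the mean of recent predictions moves more than
    # `threshold` reference standard deviations away from the reference mean
    ref_mean = sum(reference) / len(reference)
    ref_std = (sum((x - ref_mean) ** 2 for x in reference) / len(reference)) ** 0.5
    recent_mean = sum(recent) / len(recent)
    if ref_std == 0:
        return recent_mean != ref_mean
    return abs(recent_mean - ref_mean) / ref_std > threshold
```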
## Deployment Checklist
- Model validated on test set
- API endpoints documented
- Health check endpoint
- Authentication configured
- Logging and monitoring setup
- Model versioning in place
- Rollback procedure documented
## Quick Start: Deploy Model in 6 Steps
```bash
# 1. Save the trained model (run in Python):
#    import joblib
#    joblib.dump(model, 'model.pkl')

# 2. Create the FastAPI app (see references/fastapi-production-server.md):
#    app.py with /predict and /health endpoints

# 3. Create the Dockerfile
cat > Dockerfile << 'EOF'
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY app.py model.pkl ./
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
EOF

# 4. Build and test locally
docker build -t model-api:v1.0.0 .
docker run -p 8000:8000 model-api:v1.0.0

# 5. Push to the registry
docker tag model-api:v1.0.0 registry.example.com/model-api:v1.0.0
docker push registry.example.com/model-api:v1.0.0

# 6. Deploy to Kubernetes
kubectl apply -f deployment.yaml
kubectl rollout status deployment/model-api
```
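Step 6 assumes a `deployment.yaml`; a minimal sketch (the name, image, and replica count are placeholder choices — the probes and resource limits discussed under Known Issues Prevention belong here too):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: model-api
spec:
  replicas: 2
  selector:
    matchLabels:
      app: model-api
  template:
    metadata:
      labels:
        app: model-api
    spec:
      containers:
        - name: model-api
          image: registry.example.com/model-api:v1.0.0
          ports:
            - containerPort: 8000
```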
## Known Issues Prevention
### 1. No Health Checks = Downtime
Problem: Load balancer sends traffic to unhealthy pods, causing 503 errors.
Solution: Implement both liveness and readiness probes:
```python
# app.py
@app.get("/health")  # Liveness: is the service alive?
async def health():
    return {"status": "healthy"}

@app.get("/ready")  # Readiness: can it handle traffic?
async def ready():
    try:
        _ = model_store.model  # Verify the model is loaded
        return {"status": "ready"}
    except Exception:
        raise HTTPException(503, "Not ready")
```

```yaml
# deployment.yaml
livenessProbe:
  httpGet:
    path: /health
    port: 8000
  initialDelaySeconds: 30
readinessProbe:
  httpGet:
    path: /ready
    port: 8000
  initialDelaySeconds: 5
```
### 2. Model Not Found Errors in the Container
Problem: `FileNotFoundError: model.pkl` when the container starts.
Solution: Verify the model file is copied in the Dockerfile and the path matches:
```dockerfile
# ❌ Wrong: model in a different directory than the code expects
COPY model.pkl /app/models/  # But the code expects /app/model.pkl

# ✅ Correct: consistent paths
COPY model.pkl /models/model.pkl
ENV MODEL_PATH=/models/model.pkl
```

In Python:

```python
model_path = os.getenv("MODEL_PATH", "/models/model.pkl")
```
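Building on the `MODEL_PATH` convention, the loader can fail fast with a clear message instead of crashing later; a hedged sketch (the default path is an assumption carried over from the example above):

```python
import os

def resolve_model_path(default="/models/model.pkl"):
    # Resolve the model path from the environment and fail fast if missing
    path = os.getenv("MODEL_PATH", default)
    if not os.path.exists(path):
        raise FileNotFoundError(
            f"Model not found at {path}; check the COPY path in the Dockerfile"
        )
    return path
```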
### 3. Unhandled Input Validation = 500 Errors
Problem: Invalid inputs crash API with unhandled exceptions.
Solution: Use Pydantic for automatic validation:
```python
from typing import List

import numpy as np
from pydantic import BaseModel, Field, validator

class PredictionRequest(BaseModel):
    features: List[float] = Field(..., min_items=1, max_items=100)

    @validator('features')
    def validate_finite(cls, v):
        if not all(np.isfinite(val) for val in v):
            raise ValueError("All features must be finite")
        return v

# FastAPI auto-validates and returns 422 for invalid requests
@app.post("/predict")
async def predict(request: PredictionRequest):
    # The request is guaranteed valid here
    ...
```
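The validator's core check can be exercised on its own without a running server; a small standard-library sketch of the same logic:

```python
import math

def all_finite(values):
    # Mirror the Pydantic validator: reject NaN and infinities
    return all(math.isfinite(v) for v in values)
```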
### 4. No Drift Monitoring = Silent Degradation
Problem: Model performance degrades over time, no one notices until users complain.
Solution: Implement drift detection (see references/model-monitoring-drift.md):
```python
monitor = ModelMonitor(reference_data=training_data, drift_threshold=0.1)

@app.post("/predict")
async def predict(request: PredictionRequest):
    prediction = model.predict(features)
    monitor.log_prediction(features, prediction, latency)
    # Alert if drift is detected
    if monitor.should_retrain():
        alert_manager.send_alert("Model drift detected - retrain recommended")
    return prediction
```

### 5. Missing Resource Limits = OOM Kills
Problem: Pod killed by Kubernetes OOMKiller, service goes down.
Solution: Set memory/CPU limits and requests:
```yaml
resources:
  requests:
    memory: "512Mi"  # Guaranteed
    cpu: "500m"
  limits:
    memory: "1Gi"  # Maximum allowed
    cpu: "1000m"
```

Monitor actual usage:

```bash
kubectl top pods
```
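Choosing the numbers is easier with a rough estimate of the model's footprint; one crude lower bound is the pickled size (runtime resident memory will be higher, so confirm with `kubectl top pods`):

```python
import pickle

def rough_model_footprint_mib(model):
    # Crude lower bound on memory: size of the pickled model in MiB;
    # the deserialized model and request buffers will need more than this
    return len(pickle.dumps(model)) / (1024 * 1024)
```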
### 6. No Rollback Plan = Stuck on a Bad Deploy
Problem: New model version has bugs, no way to revert quickly.
Solution: Tag images with versions, keep previous deployment:
```bash
# Deploy with a version tag
kubectl set image deployment/model-api model-api=registry/model-api:v1.2.0

# If there are issues, roll back to the previous revision
kubectl rollout undo deployment/model-api

# Or pin a specific version
kubectl set image deployment/model-api model-api=registry/model-api:v1.1.0
```
### 7. Synchronous Prediction = Slow Batch Processing
Problem: Processing 10,000 predictions one-by-one takes hours.
Solution: Implement batch endpoint:
```python
@app.post("/predict/batch")
async def predict_batch(request: BatchPredictionRequest):
    # Process all instances at once (vectorized) - much faster than one-by-one
    features = np.array(request.instances)
    predictions = model.predict(features)
    return {"predictions": predictions.tolist()}
```

### 8. No CI/CD Validation = Deploying Bad Models
Problem: Deploying a model that fails basic tests breaks production.
Solution: Validate in CI pipeline (see references/cicd-ml-models.md):
```yaml
# .github/workflows/deploy.yml
- name: Validate model performance
  run: |
    python scripts/validate_model.py \
      --model model.pkl \
      --test-data test.csv \
      --min-accuracy 0.85  # Fail if below threshold
```
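The `scripts/validate_model.py` referenced above is not shown in this document; its core can be a simple accuracy gate that fails the CI job with a non-zero exit. A hedged sketch:

```python
import sys

def accuracy_gate(y_true, y_pred, min_accuracy=0.85):
    # Fail the CI job (non-zero exit) when accuracy drops below the threshold
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    accuracy = correct / len(y_true)
    if accuracy < min_accuracy:
        sys.exit(f"Accuracy {accuracy:.3f} below threshold {min_accuracy}")
    return accuracy
```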
## Best Practices
- Version everything: Models (semantic versioning), Docker images, deployments
- Monitor continuously: Latency, error rate, drift, resource usage
- Test before deploy: Unit tests, integration tests, performance benchmarks
- Deploy gradually: Canary (10%), then full rollout
- Plan for rollback: Keep previous version, document procedure
- Log predictions: Enable debugging and drift detection
- Set resource limits: Prevent OOM kills and resource contention
- Use health checks: Enable proper load balancing
## When to Load References
Load reference files for detailed implementations:

- FastAPI Production Server: load `references/fastapi-production-server.md` for a complete production-ready FastAPI implementation with error handling, validation (Pydantic models), logging, health/readiness probes, batch predictions, model versioning, middleware, exception handlers, and performance optimizations (caching, async)
- Model Monitoring & Drift: load `references/model-monitoring-drift.md` for the ModelMonitor implementation with KS-test drift detection, Jensen-Shannon divergence, Prometheus metrics integration, alert configuration (Slack, email), a continuous monitoring service, and dashboard endpoints
- Containerization & Deployment: load `references/containerization-deployment.md` for multi-stage Dockerfiles, model versioning in containers, Docker Compose setup, A/B testing with Nginx, Kubernetes deployments (rolling update, blue-green, canary), GitHub Actions CI/CD, and deployment checklists
- CI/CD for ML Models: load `references/cicd-ml-models.md` for a complete GitHub Actions pipeline with model validation, data validation, automated testing, security scanning, performance benchmarks, automated rollback, and deployment strategies