<!-- Adapted from: claude-skills/engineering-team/senior-ml-engineer -->

ML Engineering Guide

Production-grade ML/AI systems, MLOps, and model deployment.

When to Use

  • Deploying ML models to production
  • Building ML platforms and infrastructure
  • Implementing MLOps pipelines
  • Integrating LLMs into production systems
  • Setting up model monitoring and drift detection

Tech Stack

  Category        Tools
  ML Frameworks   PyTorch, TensorFlow, Scikit-learn, XGBoost
  LLM Frameworks  LangChain, LlamaIndex, DSPy
  Data Tools      Spark, Airflow, dbt, Kafka, Databricks
  Deployment      Docker, Kubernetes, AWS/GCP/Azure
  Monitoring      MLflow, Weights & Biases, Prometheus
  Databases       PostgreSQL, BigQuery, Snowflake, Pinecone

Production Patterns

Model Deployment Pipeline

```python
# Model serving with FastAPI
from fastapi import FastAPI
import torch

app = FastAPI()
model = torch.load("model.pth")
model.eval()  # switch to inference mode (disables dropout, batchnorm updates)

@app.post("/predict")
async def predict(data: dict):
    tensor = preprocess(data)  # project-specific input preprocessing
    with torch.no_grad():
        prediction = model(tensor)
    return {"prediction": prediction.tolist()}
```

Feature Store Integration

```python
# Fetch online features for a single entity from a Feast feature store
from feast import FeatureStore

store = FeatureStore(repo_path=".")
features = store.get_online_features(
    features=["user_features:age", "user_features:location"],
    entity_rows=[{"user_id": 123}],
).to_dict()
```

Model Monitoring

```python
# Data drift detection with Evidently: compare current traffic
# against a reference (training) dataset
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

report = Report(metrics=[DataDriftPreset()])
report.run(reference_data=ref_df, current_data=curr_df)
```
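Drift can also be quantified without a dedicated library. A minimal sketch of the Population Stability Index (PSI) over equal-width histogram bins — the `psi` helper and the 0.2 threshold convention are illustrative, not a standard API:

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between two 1-D samples.

    Values above ~0.2 are commonly treated as significant drift
    (a rule of thumb, not a hard law).
    """
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0

    def frac(sample):
        counts = [0] * bins
        for x in sample:
            i = min(int((x - lo) / width), bins - 1)
            counts[i] += 1
        # floor at a small epsilon so empty bins don't produce log(0)
        return [max(c / len(sample), 1e-6) for c in counts]

    e, a = frac(expected), frac(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

# identical distributions give a PSI of zero
print(round(psi([1, 2, 3, 4, 5] * 20, [1, 2, 3, 4, 5] * 20), 6))  # prints 0.0
```

In production the reference histogram comes from the training set and the current one from a sliding window of live requests.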

MLOps Best Practices

Development

  • Test-driven development for ML pipelines
  • Version-control models and data
  • Reproducible experiments with MLflow
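Reproducibility starts with knowing exactly which configuration produced a run. A stdlib-only sketch (independent of MLflow; the `run_id` helper is illustrative) that derives a stable run ID from an experiment config:

```python
import hashlib
import json

def run_id(config: dict) -> str:
    """Deterministic short ID for an experiment configuration.

    Sorting keys makes the hash independent of dict insertion order,
    so the same config always maps to the same run ID.
    """
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()[:12]

a = run_id({"lr": 0.001, "epochs": 10, "model": "resnet18"})
b = run_id({"model": "resnet18", "epochs": 10, "lr": 0.001})
assert a == b  # key order does not change the ID
```

Tagging artifacts, metrics, and logs with this ID makes any result traceable back to its exact configuration.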

Production

  • A/B testing infrastructure
  • Canary deployments for models
  • Automated retraining pipelines
  • Model monitoring and drift detection
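A/B tests and canary rollouts both need stable traffic splitting. A minimal sketch (the `assign_bucket` helper is hypothetical, not tied to any framework) that buckets users deterministically by hashing their ID:

```python
import hashlib

def assign_bucket(user_id: str, canary_fraction: float = 0.05) -> str:
    """Deterministically route a user to 'canary' or 'stable'.

    Hashing (rather than random.random) means a user always hits the
    same model version across requests, which keeps metrics clean.
    """
    digest = hashlib.sha256(user_id.encode()).digest()
    # map the first 8 bytes of the digest to a float in [0, 1)
    fraction = int.from_bytes(digest[:8], "big") / 2**64
    return "canary" if fraction < canary_fraction else "stable"

buckets = [assign_bucket(f"user-{i}", 0.1) for i in range(10_000)]
share = buckets.count("canary") / len(buckets)  # close to 0.1
```

Promoting the canary is then just raising `canary_fraction` in config; rolling back drops it to zero without touching user assignments.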

Performance Targets

  Metric         Target
  P50 latency    < 50 ms
  P95 latency    < 100 ms
  P99 latency    < 200 ms
  Throughput     > 1000 RPS
  Availability   99.9%
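Checking these targets against real traffic means computing percentiles from recorded request times; a stdlib sketch (the synthetic Gaussian workload is just for illustration):

```python
import random
import statistics

def latency_percentiles(samples_ms):
    """Return (p50, p95, p99) from a list of latency samples in ms."""
    # quantiles with n=100 returns the 1st..99th percentile cut points
    q = statistics.quantiles(samples_ms, n=100)
    return q[49], q[94], q[98]

random.seed(0)
samples = [random.gauss(40, 10) for _ in range(10_000)]
p50, p95, p99 = latency_percentiles(samples)
# for this synthetic workload: p50 ≈ 40 ms, p95 ≈ 56 ms, p99 ≈ 63 ms
```

Real latency distributions are heavy-tailed rather than Gaussian, which is exactly why P99 is tracked separately from the median.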

LLM Integration Patterns

RAG System

```python
# Basic RAG with LangChain
from langchain.vectorstores import Pinecone
from langchain.embeddings import OpenAIEmbeddings
from langchain.chains import RetrievalQA

vectorstore = Pinecone.from_existing_index(
    index_name="docs",
    embedding=OpenAIEmbeddings(),
)
qa = RetrievalQA.from_chain_type(
    llm=llm,  # any LangChain-compatible LLM
    retriever=vectorstore.as_retriever(),
)
```
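Under the hood, the retrieval step is nearest-neighbor search over embedding vectors. A dependency-free sketch with toy hand-made vectors (a real system uses an embedding model and a vector index like Pinecone):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def retrieve(query_vec, docs, k=2):
    """Return the k documents whose vectors are most similar to the query."""
    ranked = sorted(docs, key=lambda d: cosine(query_vec, d["vec"]), reverse=True)
    return [d["text"] for d in ranked[:k]]

docs = [
    {"text": "deploy models with docker", "vec": [1.0, 0.1, 0.0]},
    {"text": "monitor drift in production", "vec": [0.0, 1.0, 0.2]},
    {"text": "containerize the api", "vec": [0.9, 0.2, 0.1]},
]
# the two docker/container docs are closest to the query direction
print(retrieve([1.0, 0.0, 0.0], docs, k=2))
```

The retrieved texts are then stuffed into the LLM prompt as context, which is all `RetrievalQA` does on top of this.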

Prompt Management

```python
# Structured prompts with DSPy
import dspy

class QA(dspy.Signature):
    """Answer questions based on context."""
    context = dspy.InputField()
    question = dspy.InputField()
    answer = dspy.OutputField()

qa = dspy.Predict(QA)
```
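Prompts deserve the same versioning discipline as models: an output is only reproducible if the exact prompt that produced it is kept. A stdlib-only sketch of a versioned template registry (the `PromptRegistry` class is a hypothetical helper, not part of DSPy):

```python
from string import Template

class PromptRegistry:
    """Keep every prompt version so past model outputs stay reproducible."""

    def __init__(self):
        self._versions: dict[str, list[str]] = {}

    def register(self, name: str, template: str) -> int:
        """Store a new version of a named prompt; returns its version number."""
        self._versions.setdefault(name, []).append(template)
        return len(self._versions[name])  # 1-based version number

    def render(self, name: str, version: int, **fields) -> str:
        """Fill in a specific stored version with concrete field values."""
        return Template(self._versions[name][version - 1]).substitute(**fields)

registry = PromptRegistry()
registry.register("qa", "Answer using only $context.\nQ: $question\nA:")
v2 = registry.register("qa", "Context: $context\nQuestion: $question\nAnswer:")
prompt = registry.render("qa", v2, context="docs", question="what is drift?")
```

Logging the `(name, version)` pair alongside each LLM call makes prompt changes auditable the same way model versions are.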

Common Commands

```bash
# Development
python -m pytest tests/ -v --cov
python -m black src/
python -m pylint src/

# Training
python scripts/train.py --config prod.yaml
mlflow run . -P epochs=10

# Deployment
docker build -t model:v1 .
kubectl apply -f k8s/model-serving.yaml

# Monitoring
mlflow ui --port 5000
```

Security & Compliance

  • Authentication for model endpoints
  • Data encryption (at rest & in transit)
  • PII handling and anonymization
  • GDPR/CCPA compliance
  • Model access audit logging
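Endpoint authentication can start as a shared API key checked in constant time; a minimal sketch (the header name and `MODEL_API_KEY` environment variable are illustrative choices):

```python
import hmac
import os

# in production the key comes from a secret manager, never a literal
API_KEY = os.environ.get("MODEL_API_KEY", "local-dev-key")

def is_authorized(headers: dict) -> bool:
    """Check the x-api-key header against the configured key.

    hmac.compare_digest runs in constant time, which avoids leaking
    key prefixes through response-timing differences.
    """
    supplied = headers.get("x-api-key", "")
    return hmac.compare_digest(supplied, API_KEY)

assert is_authorized({"x-api-key": API_KEY})
assert not is_authorized({"x-api-key": "wrong"})
assert not is_authorized({})
```

In a FastAPI service this check would live in a dependency applied to the prediction route, and each accept/reject decision would also feed the audit log.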