senior-ml-engineer

Senior ML/AI Engineer

World-class senior ML/AI engineer skill for production-grade AI/ML/data systems.

Quick Start

Main Capabilities

Core Tool 1

python scripts/model_deployment_pipeline.py --input data/ --output results/

Core Tool 2

python scripts/rag_system_builder.py --target project/ --analyze

Core Tool 3

python scripts/ml_monitoring_suite.py --config config.yaml --deploy

Core Expertise

This skill covers world-class capabilities in:
  • Advanced production patterns and architectures
  • Scalable system design and implementation
  • Performance optimization at scale
  • MLOps and DataOps best practices
  • Real-time processing and inference
  • Distributed computing frameworks
  • Model deployment and monitoring
  • Security and compliance
  • Cost optimization
  • Team leadership and mentoring

Tech Stack

  • Languages: Python, SQL, R, Scala, Go
  • ML Frameworks: PyTorch, TensorFlow, Scikit-learn, XGBoost
  • Data Tools: Spark, Airflow, dbt, Kafka, Databricks
  • LLM Frameworks: LangChain, LlamaIndex, DSPy
  • Deployment: Docker, Kubernetes, AWS/GCP/Azure
  • Monitoring: MLflow, Weights & Biases, Prometheus
  • Databases: PostgreSQL, BigQuery, Snowflake, Pinecone

Reference Documentation

1. MLOps Production Patterns

Comprehensive guide available in references/mlops_production_patterns.md, covering:
  • Advanced patterns and best practices
  • Production implementation strategies
  • Performance optimization techniques
  • Scalability considerations
  • Security and compliance
  • Real-world case studies

2. LLM Integration Guide

Complete workflow documentation in references/llm_integration_guide.md, including:
  • Step-by-step processes
  • Architecture design patterns
  • Tool integration guides
  • Performance tuning strategies
  • Troubleshooting procedures

3. RAG System Architecture

Technical reference guide in references/rag_system_architecture.md, with:
  • System design principles
  • Implementation examples
  • Configuration best practices
  • Deployment strategies
  • Monitoring and observability

Production Patterns

Pattern 1: Scalable Data Processing

Enterprise-scale data processing with distributed computing:
  • Horizontal scaling architecture
  • Fault-tolerant design
  • Real-time and batch processing
  • Data quality validation
  • Performance monitoring
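
The data quality validation step above can be sketched as a rule-based batch filter. This is a minimal illustration, not part of the skill's scripts: the names `Rules`, `ValidationReport`, `validate_batch`, and `EXAMPLE_RULES`, along with the example fields, are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

# Hypothetical rule set: field name -> predicate that must hold for a valid record.
Rules = dict[str, Callable[[Any], bool]]


@dataclass
class ValidationReport:
    valid: list[dict] = field(default_factory=list)
    # Each rejected entry pairs the record with the names of the failing fields.
    rejected: list[tuple[dict, list[str]]] = field(default_factory=list)

    @property
    def reject_rate(self) -> float:
        total = len(self.valid) + len(self.rejected)
        return len(self.rejected) / total if total else 0.0


def validate_batch(records: list[dict], rules: Rules) -> ValidationReport:
    """Split a batch into valid records and rejected records with reasons."""
    report = ValidationReport()
    for record in records:
        # A field fails if it is missing or its predicate returns False.
        failures = [
            name
            for name, check in rules.items()
            if name not in record or not check(record[name])
        ]
        if failures:
            report.rejected.append((record, failures))
        else:
            report.valid.append(record)
    return report


# Illustrative rules only; real pipelines would derive these from a schema.
EXAMPLE_RULES: Rules = {
    "user_id": lambda v: isinstance(v, int) and v > 0,
    "amount": lambda v: isinstance(v, (int, float)) and v >= 0,
}
```

Tracking `reject_rate` per batch gives a simple signal to feed into performance monitoring; a sudden rise usually points to an upstream schema or source change.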

Pattern 2: ML Model Deployment

Production ML system with high availability:
  • Model serving with low latency
  • A/B testing infrastructure
  • Feature store integration
  • Model monitoring and drift detection
  • Automated retraining pipelines
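
One common way to approach the drift detection item above is the Population Stability Index (PSI), which compares the binned distribution of a feature at training time with its distribution in production. The sketch below is one possible implementation using only the standard library; the function name, the equal-width binning, and the `eps` floor are assumptions of this example, and the source does not say which drift metric the skill actually uses.

```python
import math
from typing import Sequence


def population_stability_index(
    expected: Sequence[float],
    actual: Sequence[float],
    bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """PSI between a reference sample (expected) and a live sample (actual)."""
    if not expected or not actual:
        raise ValueError("both samples must be non-empty")
    lo, hi = min(expected), max(expected)
    if hi == lo:
        raise ValueError("reference sample has zero range")
    width = (hi - lo) / bins

    def proportions(values: Sequence[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            # Values outside the reference range are clamped to the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # Floor each proportion at eps so the logarithm stays defined.
        return [max(c / len(values), eps) for c in counts]

    p = proportions(expected)
    q = proportions(actual)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))
```

A frequently quoted rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift, but the thresholds that should trigger an automated retraining pipeline depend on the feature and the model, so treat them as starting points to tune.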

Pattern 3: Real-Time Inference

High-throughput inference system:
  • Batching and caching strategies
  • Load balancing
  • Auto-scaling
  • Latency optimization
  • Cost optimization
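
The batching and caching strategies listed above can be combined in a thin wrapper around the model call. The sketch below is illustrative only: `CachedBatchPredictor` and the `predict_batch` callable are hypothetical names, and it assumes inputs are hashable and that the model returns one output per input in order.

```python
from collections import OrderedDict
from typing import Callable, Hashable, Sequence


class CachedBatchPredictor:
    """Serve repeated inputs from an LRU cache; send only misses to the model, in batches."""

    def __init__(
        self,
        predict_batch: Callable[[list], list],  # hypothetical model call
        batch_size: int = 32,
        cache_size: int = 1024,
    ) -> None:
        if batch_size < 1 or cache_size < 1:
            raise ValueError("batch_size and cache_size must be positive")
        self.predict_batch = predict_batch
        self.batch_size = batch_size
        self.cache_size = cache_size
        self._cache: OrderedDict = OrderedDict()

    def predict(self, inputs: Sequence[Hashable]) -> list:
        # Deduplicate while preserving order, then keep only cache misses.
        misses = [x for x in dict.fromkeys(inputs) if x not in self._cache]
        for start in range(0, len(misses), self.batch_size):
            chunk = misses[start:start + self.batch_size]
            for key, value in zip(chunk, self.predict_batch(chunk)):
                self._cache[key] = value
        results = []
        for x in inputs:
            self._cache.move_to_end(x)  # mark as most recently used
            results.append(self._cache[x])
        # Evict least recently used entries once the request is answered.
        while len(self._cache) > self.cache_size:
            self._cache.popitem(last=False)
        return results
```

Larger batches generally improve throughput at the cost of per-request latency, so `batch_size` is a knob to tune against the latency targets rather than a fixed value.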

Best Practices

Development

  • Test-driven development
  • Code reviews and pair programming
  • Documentation as code
  • Version control everything
  • Continuous integration

Production

  • Monitor everything critical
  • Automate deployments
  • Feature flags for releases
  • Canary deployments
  • Comprehensive logging
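
Feature flags and canary deployments both need a stable way to decide which users see a new release. A minimal sketch, assuming a percentage-based rollout keyed on a user identifier; the function names and the hashing scheme are choices made for this example, not something the source prescribes.

```python
import hashlib


def rollout_bucket(user_id: str, feature: str) -> int:
    """Map a (feature, user) pair to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def is_enabled(user_id: str, feature: str, rollout_percent: int) -> bool:
    """Return True if this user falls inside the current rollout percentage."""
    if not 0 <= rollout_percent <= 100:
        raise ValueError("rollout_percent must be between 0 and 100")
    return rollout_bucket(user_id, feature) < rollout_percent
```

Because the bucket depends only on the feature name and user id, a user who is included at 10% stays included as the rollout grows to 50% or 100%, which keeps the canary population consistent between deployments.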

Team Leadership

  • Mentor junior engineers
  • Drive technical decisions
  • Establish coding standards
  • Foster learning culture
  • Cross-functional collaboration

Performance Targets

Latency:
  • P50: < 50ms
  • P95: < 100ms
  • P99: < 200ms
Throughput:
  • Requests/second: > 1000
  • Concurrent users: > 10,000
Availability:
  • Uptime: 99.9%
  • Error rate: < 0.1%
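
To make the targets above checkable, the sketch below computes nearest-rank percentiles from a list of latency samples and converts an availability target into a downtime budget. The function and constant names are assumptions of this example, and nearest-rank is only one of several percentile definitions; monitoring tools may interpolate differently.

```python
from typing import Sequence

# Targets from the list above: percentile -> upper bound in milliseconds.
LATENCY_TARGETS_MS = {50: 50.0, 95: 100.0, 99: 200.0}


def percentile(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile for an integer pct in 1..100."""
    if not samples:
        raise ValueError("samples must be non-empty")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100 avoids floating-point rank errors.
    rank = max(1, -(-pct * len(ordered) // 100))
    return ordered[rank - 1]


def check_latency(samples_ms: Sequence[float]) -> dict[str, bool]:
    """Report, per percentile, whether the measured latency is under target."""
    return {
        f"P{pct}": percentile(samples_ms, pct) < limit
        for pct, limit in LATENCY_TARGETS_MS.items()
    }


def downtime_budget_minutes(availability_pct: float, period_days: int = 30) -> float:
    """Minutes of allowed downtime in a period for a given availability target."""
    return (1 - availability_pct / 100) * period_days * 24 * 60
```

For the 99.9% uptime target, `downtime_budget_minutes(99.9)` works out to roughly 43.2 minutes per 30-day period, which is a useful number to keep in mind when planning deployments and incident response.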

Security & Compliance

  • Authentication & authorization
  • Data encryption (at rest & in transit)
  • PII handling and anonymization
  • GDPR/CCPA compliance
  • Regular security audits
  • Vulnerability management
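
As one illustration of the PII handling item above, the sketch below replaces email addresses in free text with keyed-hash tokens. The function names, the simplified email pattern, and the 12-character token length are assumptions of this example. Note that keyed hashing is pseudonymization rather than full anonymization: anyone holding the key can re-link tokens to known values, so the key needs the same protection as the data.

```python
import hashlib
import hmac
import re

# Simplified pattern for illustration; it does not cover every valid address.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def pseudonymize(value: str, secret_key: bytes) -> str:
    """Return a stable keyed hash (HMAC-SHA256, hex) of a sensitive value."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def redact_emails(text: str, secret_key: bytes) -> str:
    """Replace each email address with a short token derived from its keyed hash."""

    def _token(match: re.Match) -> str:
        # Lowercase first so the same address always maps to the same token.
        digest = pseudonymize(match.group(0).lower(), secret_key)
        return f"<email:{digest[:12]}>"

    return EMAIL_RE.sub(_token, text)
```

Stable tokens keep joins and per-user aggregates possible in logs and training data without storing the raw address.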

Common Commands

Development

python -m pytest tests/ -v --cov
python -m black src/
python -m pylint src/

Training

python scripts/train.py --config prod.yaml
python scripts/evaluate.py --model best.pth

Deployment

docker build -t service:v1 .
kubectl apply -f k8s/
helm upgrade service ./charts/

Monitoring

kubectl logs -f deployment/service
python scripts/health_check.py

Resources

  • Advanced Patterns: references/mlops_production_patterns.md
  • Implementation Guide: references/llm_integration_guide.md
  • Technical Reference: references/rag_system_architecture.md
  • Automation Scripts: scripts/ directory

Senior-Level Responsibilities

As a world-class senior professional:
  1. Technical Leadership
    • Drive architectural decisions
    • Mentor team members
    • Establish best practices
    • Ensure code quality
  2. Strategic Thinking
    • Align with business goals
    • Evaluate trade-offs
    • Plan for scale
    • Manage technical debt
  3. Collaboration
    • Work across teams
    • Communicate effectively
    • Build consensus
    • Share knowledge
  4. Innovation
    • Stay current with research
    • Experiment with new approaches
    • Contribute to community
    • Drive continuous improvement
  5. Production Excellence
    • Ensure high availability
    • Monitor proactively
    • Optimize performance
    • Respond to incidents