senior-ml-engineer
Senior ML/AI Engineer
Senior ML/AI engineering skill for building production-grade AI, ML, and data systems.
Quick Start
Main Capabilities
```bash
# Core Tool 1
python scripts/model_deployment_pipeline.py --input data/ --output results/

# Core Tool 2
python scripts/rag_system_builder.py --target project/ --analyze

# Core Tool 3
python scripts/ml_monitoring_suite.py --config config.yaml --deploy
```
Core Expertise
This skill covers:
- Advanced production patterns and architectures
- Scalable system design and implementation
- Performance optimization at scale
- MLOps and DataOps best practices
- Real-time processing and inference
- Distributed computing frameworks
- Model deployment and monitoring
- Security and compliance
- Cost optimization
- Team leadership and mentoring
Tech Stack
Languages: Python, SQL, R, Scala, Go
ML Frameworks: PyTorch, TensorFlow, Scikit-learn, XGBoost
Data Tools: Spark, Airflow, dbt, Kafka, Databricks
LLM Frameworks: LangChain, LlamaIndex, DSPy
Deployment: Docker, Kubernetes, AWS/GCP/Azure
Monitoring: MLflow, Weights & Biases, Prometheus
Databases: PostgreSQL, BigQuery, Snowflake, Pinecone
Reference Documentation
1. MLOps Production Patterns
Comprehensive guide available in references/mlops_production_patterns.md, covering:
- Advanced patterns and best practices
- Production implementation strategies
- Performance optimization techniques
- Scalability considerations
- Security and compliance
- Real-world case studies
2. LLM Integration Guide
Complete workflow documentation in references/llm_integration_guide.md, including:
- Step-by-step processes
- Architecture design patterns
- Tool integration guides
- Performance tuning strategies
- Troubleshooting procedures
3. RAG System Architecture
Technical reference guide in references/rag_system_architecture.md, with:
- System design principles
- Implementation examples
- Configuration best practices
- Deployment strategies
- Monitoring and observability
Production Patterns
Pattern 1: Scalable Data Processing
Enterprise-scale data processing with distributed computing:
- Horizontal scaling architecture
- Fault-tolerant design
- Real-time and batch processing
- Data quality validation
- Performance monitoring
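As one illustration of the data quality validation item above, the following is a minimal sketch of a rule-based batch validator. The rule names, the record fields (`user_id`, `amount`), and the `ValidationReport` type are hypothetical and not part of this skill's scripts. In a distributed job, the same function could be applied per partition.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

# A rule is a (name, predicate) pair. All names below are illustrative.
Rule = tuple[str, Callable[[dict[str, Any]], bool]]


@dataclass
class ValidationReport:
    valid: list[dict[str, Any]] = field(default_factory=list)
    # Each rejected entry keeps the record and the names of the rules it failed.
    rejected: list[tuple[dict[str, Any], list[str]]] = field(default_factory=list)

    @property
    def pass_rate(self) -> float:
        total = len(self.valid) + len(self.rejected)
        return len(self.valid) / total if total else 1.0


def validate_batch(records: list[dict[str, Any]], rules: list[Rule]) -> ValidationReport:
    """Split a batch into valid and rejected records instead of failing the whole batch."""
    report = ValidationReport()
    for record in records:
        failures = [name for name, check in rules if not check(record)]
        if failures:
            report.rejected.append((record, failures))
        else:
            report.valid.append(record)
    return report


# Hypothetical rules for a hypothetical schema.
RULES: list[Rule] = [
    ("user_id_present", lambda r: bool(r.get("user_id"))),
    ("amount_non_negative",
     lambda r: isinstance(r.get("amount"), (int, float)) and r["amount"] >= 0),
]
```

Routing rejected records to a quarantine location, rather than dropping them, keeps the pipeline fault tolerant while preserving evidence for debugging.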
Pattern 2: ML Model Deployment
Production ML system with high availability:
- Model serving with low latency
- A/B testing infrastructure
- Feature store integration
- Model monitoring and drift detection
- Automated retraining pipelines
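Drift detection can take many forms. One common choice, shown here only as a sketch, is the Population Stability Index (PSI) between a training-time feature sample and a recent production sample. The function name, the equal-width binning, and the `eps` floor are assumptions made for this example. The often-quoted thresholds (about 0.1 for moderate drift, 0.25 for significant drift) are rules of thumb, not values from this skill.

```python
import math
from typing import Sequence


def population_stability_index(
    expected: Sequence[float],
    actual: Sequence[float],
    bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """PSI between a reference sample and a production sample of one numeric feature."""
    lo, hi = min(expected), max(expected)
    # Fall back to a unit width when the reference sample is constant.
    width = (hi - lo) / bins or 1.0

    def bin_fractions(values: Sequence[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            # Values outside the reference range are clamped into the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # Floor each fraction at eps so the logarithm below is always defined.
        return [max(c / len(values), eps) for c in counts]

    e = bin_fractions(expected)
    a = bin_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

A monitoring job could compute this per feature on a schedule and trigger the retraining pipeline when the value stays above a chosen threshold.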
Pattern 3: Real-Time Inference
High-throughput inference system:
- Batching and caching strategies
- Load balancing
- Auto-scaling
- Latency optimization
- Cost optimization
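The batching and caching item above can be sketched as a thin wrapper around a model's batch prediction function. `CachedBatchPredictor`, its parameters, and the `model_calls` counter are hypothetical names used for illustration. A production server would typically also batch across concurrent requests, which this sketch does not attempt.

```python
from collections import OrderedDict
from typing import Any, Callable, Hashable, Sequence


class CachedBatchPredictor:
    """Serve repeated inputs from an LRU cache and send misses to the model in chunks."""

    def __init__(
        self,
        predict_batch: Callable[[list[Hashable]], Sequence[Any]],
        max_batch_size: int = 32,
        cache_size: int = 1024,
    ) -> None:
        self._predict_batch = predict_batch
        self._max_batch_size = max_batch_size
        self._cache_size = cache_size
        self._cache: OrderedDict[Hashable, Any] = OrderedDict()
        self.model_calls = 0  # Exposed so callers can observe batching behavior.

    def predict(self, inputs: Sequence[Hashable]) -> list[Any]:
        results: dict[Hashable, Any] = {}
        misses: list[Hashable] = []
        # dict.fromkeys deduplicates while preserving first-seen order.
        for item in dict.fromkeys(inputs):
            if item in self._cache:
                self._cache.move_to_end(item)  # Mark as recently used.
                results[item] = self._cache[item]
            else:
                misses.append(item)

        for start in range(0, len(misses), self._max_batch_size):
            chunk = misses[start:start + self._max_batch_size]
            self.model_calls += 1
            for item, output in zip(chunk, self._predict_batch(chunk)):
                results[item] = output
                self._cache[item] = output
                if len(self._cache) > self._cache_size:
                    self._cache.popitem(last=False)  # Evict least recently used.

        return [results[item] for item in inputs]
```

Results are collected in a per-request dictionary so that cache eviction during a large request cannot drop an answer before it is returned.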
Best Practices
Development
- Test-driven development
- Code reviews and pair programming
- Documentation as code
- Version control everything
- Continuous integration
Production
- Monitor everything critical
- Automate deployments
- Feature flags for releases
- Canary deployments
- Comprehensive logging
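Feature flags and canary deployments both need a stable way to decide which requests see the new behavior. One possible approach, sketched below, is deterministic hash bucketing. The function names and the use of the flag name as a salt are assumptions for this example, not an API from any particular flag service.

```python
import hashlib


def rollout_bucket(key: str, salt: str) -> int:
    """Map a key to a stable bucket in [0, 100) using SHA-256."""
    digest = hashlib.sha256(f"{salt}:{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def is_enabled(user_id: str, flag_name: str, rollout_percent: int) -> bool:
    """Return True when the user falls inside the rollout percentage for this flag.

    Salting with the flag name keeps different flags from always selecting
    the same users.
    """
    return rollout_bucket(user_id, flag_name) < rollout_percent
```

Because the bucket depends only on the user and the flag, raising `rollout_percent` from 5 to 25 keeps every user who was already in the canary group and only adds new ones.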
Team Leadership
- Mentor junior engineers
- Drive technical decisions
- Establish coding standards
- Foster learning culture
- Cross-functional collaboration
Performance Targets
Latency:
- P50: < 50ms
- P95: < 100ms
- P99: < 200ms
Throughput:
- Requests/second: > 1000
- Concurrent users: > 10,000
Availability:
- Uptime: 99.9%
- Error rate: < 0.1%
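To make these targets checkable, the sketch below computes nearest-rank percentiles from latency samples and compares them with the limits above. It also converts an availability target into a downtime budget: 99.9% uptime over a 30-day period leaves roughly 43.2 minutes of downtime. The function names and the nearest-rank method are choices made for this example. Monitoring systems often estimate percentiles from histograms instead.

```python
from typing import Sequence

# Latency limits in milliseconds, taken from the targets listed above.
LATENCY_TARGETS_MS = {50: 50.0, 95: 100.0, 99: 200.0}


def percentile(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile; pct is an integer in (0, 100]."""
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100 avoids floating-point rounding issues.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def check_latency(samples_ms: Sequence[float]) -> dict[str, bool]:
    """Report, per percentile, whether the observed latency is under its limit."""
    return {
        f"P{pct}": percentile(samples_ms, pct) < limit
        for pct, limit in LATENCY_TARGETS_MS.items()
    }


def downtime_budget_minutes(availability_pct: float, period_days: int = 30) -> float:
    """Minutes of allowed downtime in a period for a given availability target."""
    return (1 - availability_pct / 100) * period_days * 24 * 60
```

Note that the limits are strict: a P50 of exactly 50 ms does not meet a target of less than 50 ms.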
Security & Compliance
- Authentication & authorization
- Data encryption (at rest & in transit)
- PII handling and anonymization
- GDPR/CCPA compliance
- Regular security audits
- Vulnerability management
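For the PII handling item, one common building block is keyed pseudonymization, sketched here with HMAC-SHA256 from the Python standard library. The function names and the default field list (`email`, `phone`) are hypothetical. Note that pseudonymized data is generally still treated as personal data under GDPR, so this reduces exposure but is not full anonymization.

```python
import hashlib
import hmac
from typing import Any


def pseudonymize(value: str, secret_key: bytes) -> str:
    """Replace a value with a keyed hash so it can still be joined on, but not read."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def mask_email(email: str) -> str:
    """Keep only the first character and the domain, e.g. for log output."""
    local, sep, domain = email.partition("@")
    if not sep or not local:
        return "***"
    return f"{local[0]}***@{domain}"


def scrub_record(
    record: dict[str, Any],
    secret_key: bytes,
    pii_fields: tuple[str, ...] = ("email", "phone"),  # Hypothetical field names.
) -> dict[str, Any]:
    """Return a copy of the record with the listed PII fields pseudonymized."""
    return {
        key: pseudonymize(str(value), secret_key)
        if key in pii_fields and value is not None
        else value
        for key, value in record.items()
    }
```

The secret key should be stored and rotated like any other credential, since anyone holding it can recompute the mapping for known inputs.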
Common Commands
```bash
# Development
python -m pytest tests/ -v --cov
python -m black src/
python -m pylint src/

# Training
python scripts/train.py --config prod.yaml
python scripts/evaluate.py --model best.pth

# Deployment
docker build -t service:v1 .
kubectl apply -f k8s/
helm upgrade service ./charts/

# Monitoring
kubectl logs -f deployment/service
python scripts/health_check.py
```
Resources
- Advanced Patterns: references/mlops_production_patterns.md
- Implementation Guide: references/llm_integration_guide.md
- Technical Reference: references/rag_system_architecture.md
- Automation Scripts: scripts/ directory
Senior-Level Responsibilities
As a senior engineer, you are expected to cover the following areas:
- Technical Leadership
  - Drive architectural decisions
  - Mentor team members
  - Establish best practices
  - Ensure code quality
- Strategic Thinking
  - Align with business goals
  - Evaluate trade-offs
  - Plan for scale
  - Manage technical debt
- Collaboration
  - Work across teams
  - Communicate effectively
  - Build consensus
  - Share knowledge
- Innovation
  - Stay current with research
  - Experiment with new approaches
  - Contribute to community
  - Drive continuous improvement
- Production Excellence
  - Ensure high availability
  - Monitor proactively
  - Optimize performance
  - Respond to incidents