# Backend Principle Eng Python ML Pro Max
Principal backend engineering intelligence for Python AI/ML systems. Actions: plan, design, build, implement, review, fix, optimize, refactor, debug, secure, scale ML services and pipelines. Focus: data quality, reproducibility, reliability, performance, security, observability, model evaluation, MLOps.
Install: `npx skill4agent add prakharmnnit/skills-and-personas backend-principle-eng-python-ml-pro-max`
Principal-level guidance for Python AI/ML backends, training pipelines, and inference services. Emphasizes data integrity, reproducibility, and production reliability.
## When to Apply
- Designing or refactoring ML training or inference systems
- Reviewing ML code for data leakage, evaluation quality, and reliability
- Building feature pipelines, batch scoring, or real-time serving
- Incident response for model regressions or data drift
## Priority Model (highest to lowest)
| Priority | Category | Goal | Signals |
|---|---|---|---|
| 1 | Data Quality & Leakage | Trust the data | Clean splits, lineage, leakage checks |
| 2 | Correctness & Reproducibility | Same inputs, same outputs | Versioned data, pinned deps, deterministic runs |
| 3 | Reliability & Resilience | Stable training and serving | Timeouts, retries, graceful degradation |
| 4 | Model Evaluation & Safety | Real-world performance | Offline + online eval, bias checks |
| 5 | Performance & Cost | Efficient training/inference | GPU utilization, batching, cost budgets |
| 6 | Observability & Monitoring | Fast detection | Drift, latency, error budgets |
| 7 | Security & Privacy | Protect sensitive data | Access controls, data minimization |
| 8 | Operability & MLOps | Sustainable delivery | CI/CD, model registry, rollback |
## Quick Reference (Rules)
### 1. Data Quality & Leakage (CRITICAL)
- `lineage`: Track dataset provenance and transformations
- `leakage`: Strict train/val/test separation, with time-based splits when needed (see the sketch below)
- `features`: Feature definitions are versioned and documented
- `validation`: Schema and distribution checks on every data ingest
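A minimal sketch of a leakage-safe time-based split, assuming a pandas DataFrame with an `event_time` column (the column name is illustrative, not something this skill prescribes):

```python
import pandas as pd

def time_based_split(df: pd.DataFrame, train_end: str, val_end: str):
    """Chronological split so future rows never leak into training."""
    df = df.sort_values("event_time")
    train = df[df["event_time"] < train_end]
    val = df[(df["event_time"] >= train_end) & (df["event_time"] < val_end)]
    test = df[df["event_time"] >= val_end]
    # Guard against overlap between splits; fails loudly on a bad split.
    assert train["event_time"].max() < val["event_time"].min()
    assert val["event_time"].max() < test["event_time"].min()
    return train, val, test
```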
### 2. Correctness & Reproducibility (CRITICAL)
- `versioning`: Data, code, and model versions are pinned
- `determinism`: Fixed seeds and deterministic ops where possible (see the sketch below)
- `config`: Single source of truth for hyperparameters
- `artifact`: Immutable model artifacts and metadata
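A minimal reproducibility sketch that pins seeds in one place and fingerprints the run config for artifact metadata; the `torch` calls are guarded because this skill does not mandate a specific framework:

```python
import hashlib
import json
import os
import random

import numpy as np

def seed_everything(seed: int = 42) -> None:
    """Pin every RNG this process touches; call once at startup."""
    random.seed(seed)
    np.random.seed(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)  # only affects subprocesses
    try:  # optional: only if the stack uses PyTorch
        import torch
        torch.manual_seed(seed)
        # Raises at runtime if a non-deterministic op is used.
        torch.use_deterministic_algorithms(True)
    except ImportError:
        pass

def config_fingerprint(config: dict) -> str:
    """Stable hash of hyperparameters to stamp into artifact metadata."""
    blob = json.dumps(config, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()[:12]
```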
### 3. Reliability & Resilience (CRITICAL)
- `timeouts`: Explicit timeouts for all external calls
- `retries`: Bounded retries with jitter (see the sketch below)
- `fallbacks`: Safe fallback models or rules when inference fails
- `idempotency`: Idempotent batch scoring so retries are safe
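A minimal sketch of bounded retries with exponential backoff and full jitter around an external call, using `requests`; the URL, payload shape, and timeout values are illustrative:

```python
import random
import time

import requests

def call_with_retries(url: str, payload: dict, attempts: int = 3) -> dict:
    for attempt in range(attempts):
        try:
            # Explicit (connect, read) timeout: never wait forever.
            resp = requests.post(url, json=payload, timeout=(3.0, 10.0))
            resp.raise_for_status()
            return resp.json()
        except requests.RequestException:
            if attempt == attempts - 1:
                raise  # Bounded: give up and let the caller degrade safely.
            # Exponential backoff with full jitter avoids thundering herds.
            time.sleep(random.uniform(0, 2 ** attempt))
    raise ValueError("attempts must be >= 1")
```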
### 4. Model Evaluation & Safety (HIGH)
- `offline-eval`: Metrics aligned to product goals
- `online-eval`: Shadow or canary before full rollout
- `bias`: Bias and fairness checks for sensitive domains
- `calibration`: Calibrate probabilities for decision thresholds (see the sketch below)
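A minimal calibration check using scikit-learn's `calibration_curve` and `brier_score_loss` before picking a decision threshold; the 10-bin choice is a common default, not a requirement of this skill:

```python
import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.metrics import brier_score_loss

def calibration_report(y_true: np.ndarray, y_prob: np.ndarray) -> None:
    """Print per-bin calibration plus Brier score before thresholding."""
    frac_pos, mean_pred = calibration_curve(y_true, y_prob, n_bins=10)
    print(f"Brier score: {brier_score_loss(y_true, y_prob):.4f}")
    for pred, actual in zip(mean_pred, frac_pos):
        # In a well-calibrated model, predicted ~= observed positive rate.
        print(f"predicted={pred:.2f}  observed={actual:.2f}")
```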
### 5. Performance & Cost (HIGH)
- `batching`: Batch inference to improve throughput (see the sketch below)
- `caching`: Cache features and embeddings when safe
- `profiling`: Profile training and inference hot spots
- `cost-budgets`: Define and enforce cost ceilings
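A minimal sketch of fixed-size micro-batching for scoring; `predict_fn` stands in for whatever batched model call your stack provides:

```python
from typing import Callable, Sequence

def batch_predict(
    inputs: Sequence[dict],
    predict_fn: Callable[[Sequence[dict]], list],
    batch_size: int = 64,
) -> list:
    """Score inputs in fixed-size batches instead of one call per row."""
    outputs: list = []
    for start in range(0, len(inputs), batch_size):
        batch = inputs[start : start + batch_size]
        outputs.extend(predict_fn(batch))
    return outputs
```

Batching amortizes per-call overhead (network round trips, GPU kernel launches) across many rows, which is usually the first throughput win in batch scoring.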
### 6. Observability & Monitoring (HIGH)
- `drift`: Monitor data and concept drift (see the sketch below)
- `latency`: Track P95/P99 for inference
- `quality`: Monitor model quality against ground truth
- `alerts`: SLO-based alerts with runbooks
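A minimal Population Stability Index (PSI) sketch for numeric feature drift, assuming NumPy arrays of a training baseline and live traffic; the commonly cited ~0.1 warn / ~0.25 alert thresholds are industry heuristics, not values from this skill:

```python
import numpy as np

def psi(baseline: np.ndarray, live: np.ndarray, bins: int = 10) -> float:
    """Population Stability Index for one numeric feature."""
    edges = np.quantile(baseline, np.linspace(0, 1, bins + 1))
    live = np.clip(live, edges[0], edges[-1])  # fold outliers into edge bins
    base_pct = np.histogram(baseline, edges)[0] / len(baseline)
    live_pct = np.histogram(live, edges)[0] / len(live)
    base_pct = np.clip(base_pct, 1e-6, None)  # avoid log(0) in empty bins
    live_pct = np.clip(live_pct, 1e-6, None)
    return float(np.sum((live_pct - base_pct) * np.log(live_pct / base_pct)))
```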
### 7. Security & Privacy (HIGH)
- `access`: Least privilege for data and model artifacts
- `pii`: Redact or tokenize sensitive fields (see the sketch below)
- `secrets`: Store secrets in a vault/KMS; never in code or logs
- `compliance`: Retention and deletion policies
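A minimal redaction sketch applied before logging; these regexes are illustrative only, and a real deployment should use a vetted PII scrubber:

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def redact(text: str) -> str:
    """Replace obvious PII patterns with placeholders before logging."""
    text = EMAIL.sub("[EMAIL]", text)
    return SSN.sub("[SSN]", text)

# logger.info(redact(raw_payload))  # never log raw user input
```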
### 8. Operability & MLOps (MEDIUM)
- `registry`: Model registry with lineage and approvals
- `rollout`: Canary, blue/green, or shadow deployments
- `rollback`: Fast revert on regression
- `ci-cd`: Automated tests for data, training, and serving (see the sketch below)
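A minimal CI data test in the pytest style; the loader path, column names, and bounds are illustrative and would come from a versioned schema in practice:

```python
import pandas as pd

def load_training_frame() -> pd.DataFrame:
    # Placeholder loader; point at the versioned dataset in practice.
    return pd.read_parquet("data/train.parquet")

def test_schema_and_labels():
    df = load_training_frame()
    assert {"user_id", "event_time", "label"} <= set(df.columns), "schema drifted"
    assert df["label"].isin([0, 1]).all(), "labels outside {0, 1}"
    assert not df["user_id"].isna().any(), "null entity keys"
```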
## Execution Workflow
- Define product goals, metrics, and safety constraints
- Validate data sources and prevent leakage
- Define features and versioned pipelines
- Train with reproducible configs and tracked artifacts
- Evaluate offline, then validate online via shadow or canary
- Deploy with monitoring for drift, latency, and quality
- Establish rollback and retraining triggers
## Language-Specific Guidance
See `references/python-ml-core.md` for stack defaults, MLOps patterns, and tooling.