| Task | Tool/Framework | Command | When to Use |
|---|---|---|---|
| EDA & Profiling | Pandas, Great Expectations | | Initial data exploration and quality checks |
| Feature Engineering | Pandas, Polars, Feature Stores | | Creating lag, rolling, categorical features |
| Model Training | Gradient boosting, linear models, scikit-learn | | Strong baselines for tabular ML |
| Hyperparameter Tuning | Optuna, Ray Tune | | Optimizing model parameters |
| SQL Transformation | SQLMesh | | Building staging/intermediate/marts layers |
| Experiment Tracking | MLflow, W&B | | Versioning experiments and models |
| Model Evaluation | scikit-learn, custom metrics | | Validating model performance |
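
As a rough illustration of the "Feature Engineering" row, the sketch below builds one lag feature, one rolling feature, and one categorical encoding with Pandas. The column names (`date`, `segment`, `sales`) and the helper `add_basic_features` are hypothetical and chosen only for this example. The data is treated as a single series, so a real multi-entity dataset would likely need a per-group shift instead.

```python
import pandas as pd


def add_basic_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add a lag, a rolling mean, and an integer category code (illustrative only)."""
    out = df.sort_values("date").copy()
    # Lag feature: the previous row's value of the target.
    out["sales_lag_1"] = out["sales"].shift(1)
    # Rolling feature: mean of the three previous rows. Shifting first keeps
    # the current row's own value out of its feature (avoids target leakage).
    out["sales_roll_mean_3"] = out["sales"].shift(1).rolling(window=3).mean()
    # Categorical feature: integer codes, assigned in sorted category order.
    out["segment_code"] = out["segment"].astype("category").cat.codes
    return out


# Hypothetical demo data, not taken from any real project.
demo = pd.DataFrame(
    {
        "date": pd.date_range("2024-01-01", periods=5, freq="D"),
        "segment": ["a", "b", "a", "b", "a"],
        "sales": [10.0, 20.0, 30.0, 40.0, 50.0],
    }
)
features = add_basic_features(demo)
print(features)
```

The first rows of the lag and rolling columns are missing by construction, since there is no earlier history to draw from. Gradient-boosting libraries generally tolerate missing values, while linear models usually need them imputed or dropped.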
User needs ML for: [Problem Type]
- Tabular data?
  - Small-medium (<1M rows)? -> LightGBM (fast, efficient)
  - Large and complex (>1M rows)? -> LightGBM first, then NN if needed
  - High-dim sparse (text, counts)? -> Linear models, then shallow NN
- Time series?
  - Seasonality? -> LightGBM, then see ai-ml-timeseries
  - Long-term dependencies? -> Transformers (see ai-ml-timeseries)
- Text or mixed modalities?
  - LLMs/Transformers -> See ai-llm
- SQL transformations?
  - SQLMesh (staging/intermediate/marts layers)
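
The decision tree above can be written as a small lookup function, which is sometimes handy for checking that every branch has an answer. This is only a sketch: the function name `recommend_model`, its parameters, and the problem-type labels are hypothetical. The tree does not say what to do at exactly 1M rows, so the code assumes that boundary falls in the "large" branch.

```python
def recommend_model(
    problem: str,
    n_rows: int = 0,
    sparse_high_dim: bool = False,
    long_term_dependencies: bool = False,
) -> str:
    """Return the starting point suggested by the decision tree (illustrative)."""
    if problem == "tabular":
        if sparse_high_dim:
            return "Linear models, then shallow NN"
        # Assumption: exactly 1M rows is treated as "large".
        if n_rows < 1_000_000:
            return "LightGBM (fast, efficient)"
        return "LightGBM first, then NN if needed"
    if problem == "time_series":
        if long_term_dependencies:
            return "Transformers (see ai-ml-timeseries)"
        # The tree lists seasonality as the other time-series case.
        return "LightGBM, then see ai-ml-timeseries"
    if problem == "text_or_mixed":
        return "LLMs/Transformers (see ai-llm)"
    if problem == "sql":
        return "SQLMesh (staging/intermediate/marts layers)"
    raise ValueError(f"Unknown problem type: {problem!r}")


print(recommend_model("tabular", n_rows=50_000))
print(recommend_model("time_series", long_term_dependencies=True))
```

Raising on an unknown problem type, rather than returning a default, keeps a mistyped label from silently mapping to the wrong branch.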

- assets/project/template-standard.md
- assets/project/template-quick.md
- assets/features/template-feature-engineering.md
- assets/eda/template-eda.md
- assets/evaluation/template-evaluation-report.md
- assets/evaluation/template-model-card.md
- assets/review/experiment-review-template.md
- ../data-lake-platform/assets/transformation/sqlmesh/template-sqlmesh-project.md
- ../data-lake-platform/assets/transformation/sqlmesh/template-sqlmesh-model.md
- ../data-lake-platform/assets/transformation/sqlmesh/template-sqlmesh-incremental.md
- ../data-lake-platform/assets/transformation/sqlmesh/template-sqlmesh-dag.md
- ../data-lake-platform/assets/transformation/sqlmesh/template-sqlmesh-testing.md