
MLflow: ML Lifecycle Management Platform


When to Use This Skill


Use MLflow when you need to:
  • Track ML experiments with parameters, metrics, and artifacts
  • Manage model registry with versioning and stage transitions
  • Deploy models to various platforms (local, cloud, serving)
  • Reproduce experiments with project configurations
  • Compare model versions and performance metrics
  • Collaborate on ML projects with team workflows
  • Integrate with any ML framework (framework-agnostic)
Users: 20,000+ organizations | GitHub Stars: 23k+ | License: Apache 2.0

Installation


```bash
# Install MLflow
pip install mlflow

# Install with extras (includes SQLAlchemy, boto3, etc.)
pip install mlflow[extras]

# Start the MLflow UI
mlflow ui
```

Quick Start


Basic Tracking


```python
import mlflow

# Start a run
with mlflow.start_run():
    # Log parameters
    mlflow.log_param("learning_rate", 0.001)
    mlflow.log_param("batch_size", 32)

    # Your training code
    model = train_model()

    # Log metrics
    mlflow.log_metric("train_loss", 0.15)
    mlflow.log_metric("val_accuracy", 0.92)

    # Log model
    mlflow.sklearn.log_model(model, "model")
```

Autologging (Automatic Tracking)


```python
import mlflow
from sklearn.ensemble import RandomForestClassifier

# Enable autologging
mlflow.autolog()

# Train (automatically logged)
model = RandomForestClassifier(n_estimators=100, max_depth=5)
model.fit(X_train, y_train)

# Metrics, parameters, and the model are logged automatically
```

Core Concepts


1. Experiments and Runs


Experiment: a logical container for related runs.
Run: a single execution of ML code, with its parameters, metrics, and artifacts.

```python
import mlflow

# Create or set the experiment
mlflow.set_experiment("my-experiment")

# Start a run
with mlflow.start_run(run_name="baseline-model"):
    # Log params
    mlflow.log_param("model", "ResNet50")
    mlflow.log_param("epochs", 10)

    # Train
    model = train()

    # Log metrics
    mlflow.log_metric("accuracy", 0.95)

    # Log model
    mlflow.pytorch.log_model(model, "model")

    # The run ID is generated automatically
    print(f"Run ID: {mlflow.active_run().info.run_id}")
```

2. Logging Parameters


```python
with mlflow.start_run():
    # Single parameter
    mlflow.log_param("learning_rate", 0.001)

    # Multiple parameters
    mlflow.log_params({
        "batch_size": 32,
        "epochs": 50,
        "optimizer": "Adam",
        "dropout": 0.2
    })

    # Nested parameters (as dict)
    config = {
        "model": {
            "architecture": "ResNet50",
            "pretrained": True
        },
        "training": {
            "lr": 0.001,
            "weight_decay": 1e-4
        }
    }

    # Log as JSON string or individual params
    for key, value in config.items():
        mlflow.log_param(key, str(value))
```

3. Logging Metrics


```python
with mlflow.start_run():
    # Training loop
    for epoch in range(NUM_EPOCHS):
        train_loss = train_epoch()
        val_loss = validate()

        # Log metrics at each step
        mlflow.log_metric("train_loss", train_loss, step=epoch)
        mlflow.log_metric("val_loss", val_loss, step=epoch)

        # Log multiple metrics
        mlflow.log_metrics({
            "train_accuracy": train_acc,
            "val_accuracy": val_acc
        }, step=epoch)

    # Log final metrics (no step)
    mlflow.log_metric("final_accuracy", final_acc)
```

4. Logging Artifacts


```python
import os

import matplotlib.pyplot as plt
import mlflow

with mlflow.start_run():
    # Log a file
    model.save('model.pkl')
    mlflow.log_artifact('model.pkl')

    # Log a directory
    os.makedirs('plots', exist_ok=True)
    plt.savefig('plots/loss_curve.png')
    mlflow.log_artifacts('plots')

    # Log text
    with open('config.txt', 'w') as f:
        f.write(str(config))
    mlflow.log_artifact('config.txt')

    # Log a dict as JSON
    mlflow.log_dict({'config': config}, 'config.json')
```

5. Logging Models


```python
# PyTorch
import mlflow.pytorch

with mlflow.start_run():
    model = train_pytorch_model()
    mlflow.pytorch.log_model(model, "model")

# Scikit-learn
import mlflow.sklearn

with mlflow.start_run():
    model = train_sklearn_model()
    mlflow.sklearn.log_model(model, "model")

# Keras/TensorFlow
import mlflow.keras

with mlflow.start_run():
    model = train_keras_model()
    mlflow.keras.log_model(model, "model")

# HuggingFace Transformers
import mlflow.transformers

with mlflow.start_run():
    mlflow.transformers.log_model(
        transformers_model={"model": model, "tokenizer": tokenizer},
        artifact_path="model",
    )
```

Autologging


Automatically log metrics, parameters, and models for popular frameworks.

Enable Autologging


```python
import mlflow

# Enable for all supported frameworks
mlflow.autolog()

# Or enable for a specific framework
mlflow.sklearn.autolog()
mlflow.pytorch.autolog()
mlflow.keras.autolog()
mlflow.xgboost.autolog()
```

Autologging with Scikit-learn


```python
import mlflow
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Enable autologging
mlflow.sklearn.autolog()

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Train (automatically logs params, metrics, and the model)
with mlflow.start_run():
    model = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=42)
    model.fit(X_train, y_train)
    # Metrics such as accuracy and f1_score are logged automatically,
    # along with the fitted model and training duration
```

Autologging with PyTorch Lightning


```python
import mlflow
import pytorch_lightning as pl

# Enable autologging
mlflow.pytorch.autolog()

# Train
with mlflow.start_run():
    trainer = pl.Trainer(max_epochs=10)
    trainer.fit(model, datamodule=dm)
    # Hyperparameters, training metrics, and the best
    # model checkpoint are logged automatically
```

Model Registry


Manage model lifecycle with versioning and stage transitions.

Register Model


```python
import mlflow

# Log and register a model
with mlflow.start_run():
    model = train_model()

    mlflow.sklearn.log_model(
        model,
        "model",
        registered_model_name="my-classifier",  # Register immediately
    )

# Or register later
run_id = "abc123"
model_uri = f"runs:/{run_id}/model"
mlflow.register_model(model_uri, "my-classifier")
```

Model Stages


Transition models between stages: None → Staging → Production → Archived.

```python
from mlflow.tracking import MlflowClient

client = MlflowClient()

# Promote to staging
client.transition_model_version_stage(
    name="my-classifier",
    version=3,
    stage="Staging",
)

# Promote to production
client.transition_model_version_stage(
    name="my-classifier",
    version=3,
    stage="Production",
    archive_existing_versions=True,  # Archive old production versions
)

# Archive a model
client.transition_model_version_stage(
    name="my-classifier",
    version=2,
    stage="Archived",
)
```

Load Model from Registry


```python
import mlflow.pyfunc

# Load the latest production model
model = mlflow.pyfunc.load_model("models:/my-classifier/Production")

# Load a specific version
model = mlflow.pyfunc.load_model("models:/my-classifier/3")

# Load from staging
model = mlflow.pyfunc.load_model("models:/my-classifier/Staging")

# Use the model
predictions = model.predict(X_test)
```
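The `models:/` URIs above are plain strings following the `models:/<name>/<version-or-stage>` convention. A tiny helper (hypothetical, not part of MLflow's API) can keep them consistent across a codebase:

```python
def model_uri(name, version_or_stage):
    """Build a Model Registry URI such as models:/my-classifier/Production.

    `version_or_stage` may be a version number (e.g. 3) or a stage
    name (e.g. "Staging"); MLflow accepts either in the URI.
    """
    return f"models:/{name}/{version_or_stage}"

print(model_uri("my-classifier", "Production"))  # models:/my-classifier/Production
print(model_uri("my-classifier", 3))             # models:/my-classifier/3
```

The resulting string is passed directly to `mlflow.pyfunc.load_model` as shown above.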

Model Versioning


```python
from mlflow.tracking import MlflowClient

client = MlflowClient()

# List all versions
versions = client.search_model_versions("name='my-classifier'")
for v in versions:
    print(f"Version {v.version}: {v.current_stage}")

# Get the latest version by stage
latest_prod = client.get_latest_versions("my-classifier", stages=["Production"])
latest_staging = client.get_latest_versions("my-classifier", stages=["Staging"])

# Get model version details
version_info = client.get_model_version(name="my-classifier", version="3")
print(f"Run ID: {version_info.run_id}")
print(f"Stage: {version_info.current_stage}")
print(f"Tags: {version_info.tags}")
```

Model Annotations


```python
from mlflow.tracking import MlflowClient

client = MlflowClient()

# Add a description
client.update_model_version(
    name="my-classifier",
    version="3",
    description="ResNet50 classifier trained on 1M images with 95% accuracy",
)

# Add tags
client.set_model_version_tag(
    name="my-classifier",
    version="3",
    key="validation_status",
    value="approved",
)
client.set_model_version_tag(
    name="my-classifier",
    version="3",
    key="deployed_date",
    value="2025-01-15",
)
```

Searching Runs


Find runs programmatically.

```python
from mlflow.tracking import MlflowClient

client = MlflowClient()

# Search all runs in an experiment
experiment_id = client.get_experiment_by_name("my-experiment").experiment_id
runs = client.search_runs(
    experiment_ids=[experiment_id],
    filter_string="metrics.accuracy > 0.9",
    order_by=["metrics.accuracy DESC"],
    max_results=10,
)
for run in runs:
    print(f"Run ID: {run.info.run_id}")
    print(f"Accuracy: {run.data.metrics['accuracy']}")
    print(f"Params: {run.data.params}")

# Search with complex filters
runs = client.search_runs(
    experiment_ids=[experiment_id],
    filter_string="""
        metrics.accuracy > 0.9
        AND params.model = 'ResNet50'
        AND tags.dataset = 'ImageNet'
    """,
    order_by=["metrics.f1_score DESC"],
)
```
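Filter strings like the ones above are easy to get subtly wrong when built by hand. A small sketch of a helper (hypothetical, not an MLflow API) that composes `metrics.` / `params.` / `tags.` clauses from plain dicts:

```python
def build_filter(metric_thresholds=None, params=None, tags=None):
    """Compose a search_runs filter string from simple dicts.

    metric_thresholds maps metric name -> minimum value;
    params and tags map name -> exact string to match.
    """
    clauses = []
    for name, threshold in (metric_thresholds or {}).items():
        clauses.append(f"metrics.{name} > {threshold}")
    for name, value in (params or {}).items():
        clauses.append(f"params.{name} = '{value}'")
    for name, value in (tags or {}).items():
        clauses.append(f"tags.{name} = '{value}'")
    return " AND ".join(clauses)

print(build_filter(
    metric_thresholds={"accuracy": 0.9},
    params={"model": "ResNet50"},
    tags={"dataset": "ImageNet"},
))
# metrics.accuracy > 0.9 AND params.model = 'ResNet50' AND tags.dataset = 'ImageNet'
```

The result can be passed as `filter_string` to `client.search_runs`.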

Integration Examples


PyTorch


```python
import mlflow
import torch
import torch.nn as nn

# Enable autologging
mlflow.pytorch.autolog()

with mlflow.start_run():
    # Log config
    config = {"lr": 0.001, "epochs": 10, "batch_size": 32}
    mlflow.log_params(config)

    # Train
    model = create_model()
    optimizer = torch.optim.Adam(model.parameters(), lr=config["lr"])

    for epoch in range(config["epochs"]):
        train_loss = train_epoch(model, optimizer, train_loader)
        val_loss, val_acc = validate(model, val_loader)

        # Log metrics
        mlflow.log_metrics({
            "train_loss": train_loss,
            "val_loss": val_loss,
            "val_accuracy": val_acc,
        }, step=epoch)

    # Log model
    mlflow.pytorch.log_model(model, "model")
```

HuggingFace Transformers


```python
import mlflow
from transformers import Trainer, TrainingArguments

# Enable autologging
mlflow.transformers.autolog()

training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
)

# Start an MLflow run
with mlflow.start_run():
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
    )

    # Train (automatically logged)
    trainer.train()

    # Log the final model to the registry
    mlflow.transformers.log_model(
        transformers_model={"model": trainer.model, "tokenizer": tokenizer},
        artifact_path="model",
        registered_model_name="hf-classifier",
    )
```

XGBoost


```python
import mlflow
import xgboost as xgb

# Enable autologging
mlflow.xgboost.autolog()

with mlflow.start_run():
    dtrain = xgb.DMatrix(X_train, label=y_train)
    dval = xgb.DMatrix(X_val, label=y_val)

    params = {
        "max_depth": 6,
        "learning_rate": 0.1,
        "objective": "binary:logistic",
        "eval_metric": ["logloss", "auc"],
    }

    # Train (automatically logged)
    model = xgb.train(
        params,
        dtrain,
        num_boost_round=100,
        evals=[(dtrain, "train"), (dval, "val")],
        early_stopping_rounds=10,
    )
    # Model and metrics are logged automatically
```

Best Practices


1. Organize with Experiments


```python
# ✅ Good: separate experiments for different tasks
mlflow.set_experiment("sentiment-analysis")
mlflow.set_experiment("image-classification")
mlflow.set_experiment("recommendation-system")

# ❌ Bad: everything in one experiment
mlflow.set_experiment("all-models")
```

2. Use Descriptive Run Names


```python
# ✅ Good: descriptive names
with mlflow.start_run(run_name="resnet50-imagenet-lr0.001-bs32"):
    train()

# ❌ Bad: no name (auto-generated UUID)
with mlflow.start_run():
    train()
```

3. Log Comprehensive Metadata


```python
with mlflow.start_run():
    # Log hyperparameters
    mlflow.log_params({
        "learning_rate": 0.001,
        "batch_size": 32,
        "epochs": 50
    })

    # Log system info
    mlflow.set_tags({
        "dataset": "ImageNet",
        "framework": "PyTorch 2.0",
        "gpu": "A100",
        "git_commit": get_git_commit()
    })

    # Log data info
    mlflow.log_param("train_samples", len(train_dataset))
    mlflow.log_param("val_samples", len(val_dataset))
```

4. Track Model Lineage


```python
# Link runs to understand lineage
with mlflow.start_run(run_name="preprocessing"):
    data = preprocess()
    mlflow.log_artifact("data.csv")
    preprocessing_run_id = mlflow.active_run().info.run_id

with mlflow.start_run(run_name="training"):
    # Reference the upstream run
    mlflow.set_tag("preprocessing_run_id", preprocessing_run_id)
    model = train(data)
```

5. Use Model Registry for Deployment


```python
# ✅ Good: load from the registry in production
model_uri = "models:/my-classifier/Production"
model = mlflow.pyfunc.load_model(model_uri)

# ❌ Bad: hard-code run IDs
model_uri = "runs:/abc123/model"
model = mlflow.pyfunc.load_model(model_uri)
```

Deployment


Serve Model Locally


```bash
# Serve a registered model
mlflow models serve -m "models:/my-classifier/Production" -p 5001

# Serve from a run
mlflow models serve -m "runs:/<RUN_ID>/model" -p 5001

# Test the endpoint
curl http://127.0.0.1:5001/invocations \
  -H 'Content-Type: application/json' \
  -d '{"inputs": [[1.0, 2.0, 3.0, 4.0]]}'
```
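The same endpoint can be called from Python. A minimal standard-library sketch, assuming a model is being served locally on port 5001 as above (`invocations_payload` and `score` are illustrative names, not MLflow APIs):

```python
import json
import urllib.request

def invocations_payload(rows):
    """Build the JSON body expected by the /invocations endpoint."""
    return json.dumps({"inputs": rows}).encode("utf-8")

def score(rows, url="http://127.0.0.1:5001/invocations"):
    """POST feature rows to a locally served MLflow model."""
    req = urllib.request.Request(
        url,
        data=invocations_payload(rows),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Inspect the payload shape without calling a server
print(invocations_payload([[1.0, 2.0, 3.0, 4.0]]).decode())
# {"inputs": [[1.0, 2.0, 3.0, 4.0]]}
```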

Deploy to Cloud


```bash
# Deploy to AWS SageMaker
mlflow sagemaker deploy -m "models:/my-classifier/Production" --region-name us-west-2

# Deploy to Azure ML
mlflow azureml deploy -m "models:/my-classifier/Production"
```

Configuration


Tracking Server


```bash
# Start a tracking server with a backend store
mlflow server \
  --backend-store-uri postgresql://user:password@localhost/mlflow \
  --default-artifact-root s3://my-bucket/mlflow \
  --host 0.0.0.0 \
  --port 5000
```

Client Configuration


```python
import mlflow

# Set the tracking URI
mlflow.set_tracking_uri("http://localhost:5000")
```

Or use an environment variable:

```bash
export MLFLOW_TRACKING_URI=http://localhost:5000
```

Resources


See Also


  • references/tracking.md - Comprehensive tracking guide
  • references/model-registry.md - Model lifecycle management
  • references/deployment.md - Production deployment patterns