domain-ml


Machine Learning Domain

Layer 3: Domain Constraints

Domain Constraints → Design Implications

| Domain Rule | Design Constraint | Rust Implication |
|---|---|---|
| Large data | Efficient memory | Zero-copy, streaming |
| GPU acceleration | CUDA/Metal support | candle, tch-rs |
| Model portability | Standard formats | ONNX |
| Batch processing | Throughput over latency | Batched inference |
| Numerical precision | Float handling | ndarray, careful f32/f64 |
| Reproducibility | Deterministic | Seeded random, versioning (sketch below) |
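
Of these rows, reproducibility is the one most often broken silently. A minimal sketch of the seeded-randomness half using the rand crate; the shuffle helper is illustrative, not part of this skill:

```rust
use rand::rngs::StdRng;
use rand::{Rng, SeedableRng};

// Pin the rand version in Cargo.toml: StdRng's algorithm may change
// across major releases, which silently breaks reproducibility.
fn shuffled_indices(n: usize, seed: u64) -> Vec<usize> {
    // The same seed yields the same "random" split on every run and machine.
    let mut rng = StdRng::seed_from_u64(seed);
    let mut idx: Vec<usize> = (0..n).collect();
    // Fisher-Yates shuffle driven by the seeded generator.
    for i in (1..n).rev() {
        let j = rng.gen_range(0..=i);
        idx.swap(i, j);
    }
    idx
}
```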

Critical Constraints

Memory Efficiency

RULE: Avoid copying large tensors
WHY: Memory bandwidth is the bottleneck
RUST: References, views, in-place ops
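
A minimal sketch of what references, views, and in-place ops look like with ndarray (both helper names are illustrative):

```rust
use ndarray::Array2;

// Zero-copy: `column` returns a view into the existing buffer.
fn first_column_sum(m: &Array2<f32>) -> f32 {
    m.column(0).sum()
}

// In-place: rescales the tensor without allocating a second one.
fn scale_in_place(m: &mut Array2<f32>, factor: f32) {
    m.mapv_inplace(|x| x * factor);
}
```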

GPU Utilization

RULE: Batch operations for GPU efficiency
WHY: GPUs pay fixed overhead per kernel launch
RUST: Batch sizes, async data loading
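
A minimal sketch of the async-loading half with a bounded tokio channel; `load_batch` and `infer` are placeholders for real I/O and the framework's forward pass:

```rust
use tokio::sync::mpsc;

async fn load_batch(i: usize) -> Vec<f32> {
    vec![i as f32; 1024] // placeholder for disk/network I/O
}

async fn infer(_batch: Vec<f32>) {
    // placeholder for the framework's forward pass
}

async fn pipeline(num_batches: usize) {
    // Bounded channel: up to 4 batches are prefetched ahead of the GPU.
    let (tx, mut rx) = mpsc::channel::<Vec<f32>>(4);

    // Producer task loads data concurrently with inference below.
    tokio::spawn(async move {
        for i in 0..num_batches {
            if tx.send(load_batch(i).await).await.is_err() {
                break; // consumer hung up
            }
        }
    });

    // Consumer: the GPU works on batch N while batch N+1 is loading.
    while let Some(batch) = rx.recv().await {
        infer(batch).await;
    }
}
```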

Model Portability

RULE: Use standard model formats
WHY: Train in Python, deploy in Rust
RUST: ONNX via tract or candle

Trace Down ↓

From constraints to design (Layer 2):
"Need efficient data pipelines"
    ↓ m10-performance: Streaming, batching
    ↓ polars: Lazy evaluation

"Need GPU inference"
    ↓ m07-concurrency: Async data loading
    ↓ candle/tch-rs: CUDA backend

"Need model loading"
    ↓ m12-lifecycle: Lazy init, caching
    ↓ tract: ONNX runtime

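On the polars leg, lazy evaluation means the query is only planned until `collect()`, so polars can push the filter into the scan and stream the input. A sketch assuming a hypothetical features.csv with score and label columns:

```rust
use polars::prelude::*;

fn label_means() -> PolarsResult<DataFrame> {
    LazyCsvReader::new("features.csv") // hypothetical input file
        .finish()?
        .filter(col("score").gt(lit(0.5)))
        .group_by([col("label")])
        .agg([col("score").mean()])
        .collect() // nothing is read or computed before this call
}
```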

Use Case → Framework

| Use Case | Recommended | Why |
|---|---|---|
| Inference only | tract (ONNX) | Lightweight, portable |
| Training + inference | candle, burn | Pure Rust, GPU |
| PyTorch models | tch-rs | Direct bindings |
| Data pipelines | polars | Fast, lazy eval |

Key Crates

| Purpose | Crate |
|---|---|
| Tensors | ndarray |
| ONNX inference | tract |
| ML framework | candle, burn |
| PyTorch bindings | tch-rs |
| Data processing | polars |
| Embeddings | fastembed |

Design Patterns

| Pattern | Purpose | Implementation |
|---|---|---|
| Model loading | Once, reuse | `OnceLock<Model>` |
| Batching | Throughput | Collect then process |
| Streaming | Large data | Iterator-based (sketch below) |
| GPU async | Parallelism | Data loading parallel to compute |
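
The streaming row above, sketched with a plain buffered reader; the record format and counting step are stand-ins for real parsing:

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader};

// Iterator-based streaming: holds one record in memory at a time
// instead of materializing the whole dataset.
fn count_records(path: &str) -> io::Result<usize> {
    let reader = BufReader::new(File::open(path)?);
    let mut count = 0;
    for line in reader.lines() {
        let line = line?;
        if !line.is_empty() {
            count += 1; // parse/featurize one record here
        }
    }
    Ok(count)
}
```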

Code Pattern: Inference Server

```rust
use std::sync::OnceLock;
use tract_onnx::prelude::*;

// Alias for a runnable, optimized tract plan; keeps signatures readable.
type OnnxModel = SimplePlan<TypedFact, Box<dyn TypedOp>, Graph<TypedFact, Box<dyn TypedOp>>>;

static MODEL: OnceLock<OnnxModel> = OnceLock::new();

fn get_model() -> &'static OnnxModel {
    // Loaded once on first use, then shared by every request.
    MODEL.get_or_init(|| {
        tract_onnx::onnx()
            .model_for_path("model.onnx")
            .unwrap()
            .into_optimized()
            .unwrap()
            .into_runnable()
            .unwrap()
    })
}

async fn predict(input: Vec<f32>) -> anyhow::Result<Vec<f32>> {
    let model = get_model();
    // Reshape the flat vector into a batch of one: (1, len).
    let input: Tensor = tract_ndarray::arr1(&input)
        .into_shape((1, input.len()))?
        .into();
    let result = model.run(tvec!(input.into()))?;
    Ok(result[0].to_array_view::<f32>()?.iter().copied().collect())
}
```

Code Pattern: Batched Inference

```rust
// Sketch: `stack_inputs`, `unstack_outputs`, and `model` are placeholders
// for the tensor layout and framework in use (tract, candle, tch-rs, ...).
async fn batch_predict(inputs: Vec<Vec<f32>>, batch_size: usize) -> Vec<Vec<f32>> {
    let mut results = Vec::with_capacity(inputs.len());

    for batch in inputs.chunks(batch_size) {
        // Stack inputs into one batch tensor so the GPU amortizes
        // per-kernel-launch overhead across the whole chunk.
        let batch_tensor = stack_inputs(batch);

        // One forward pass per batch instead of per item.
        let batch_output = model.run(batch_tensor).await;

        // Split the batched output back into per-item results.
        results.extend(unstack_outputs(batch_output));
    }

    results
}
```


Common Mistakes

| Mistake | Domain Violation | Fix |
|---|---|---|
| Clone tensors | Memory waste | Use views |
| Single inference | GPU underutilized | Batch processing |
| Load model per request | Slow | Singleton pattern |
| Sync data loading | GPU idle | Async pipeline |

Trace to Layer 1

| Constraint | Layer 2 Pattern | Layer 1 Implementation |
|---|---|---|
| Memory efficiency | Zero-copy | ndarray views |
| Model singleton | Lazy init | `OnceLock<Model>` |
| Batch processing | Chunked iteration | `chunks()` + parallel |
| GPU async | Concurrent loading | `tokio::spawn` + GPU |

Related Skills

| When | See |
|---|---|
| Performance | m10-performance |
| Lazy initialization | m12-lifecycle |
| Async patterns | m07-concurrency |
| Memory efficiency | m01-ownership |