daft-udf-tuning

Daft UDF Tuning

Optimize User-Defined Functions for performance.

UDF Types

| Type | Decorator | Use Case |
|---|---|---|
| Stateless | `@daft.func` | Simple transforms. Use `async` for I/O-bound tasks. |
| Stateful | `@daft.cls` | Expensive init (e.g., loading models). Supports `gpus=N`. |
| Batch | `@daft.func.batch` | Vectorized CPU/GPU ops (NumPy/PyTorch). Faster than row-at-a-time calls. |
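The stateful pattern exists to amortize expensive setup. A plain-Python sketch of the idea — no Daft required; `ExpensiveModel` is a stand-in for a real model load:

```python
class ExpensiveModel:
    """Stand-in for a model whose load is slow but whose calls are cheap."""
    def predict(self, x):
        return x * 2

# Stateless style: re-initialize on every call (what a naive per-row UDF does)
def predict_stateless(x):
    return ExpensiveModel().predict(x)

# Stateful style: initialize once, reuse across calls (what @daft.cls enables)
class Predictor:
    def __init__(self):
        self.model = ExpensiveModel()  # in Daft, runs once per worker

    def __call__(self, x):
        return self.model.predict(x)

predictor = Predictor()
results = [predictor(i) for i in range(4)]  # model constructed once, not four times
```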

Quick Recipes

1. Async I/O (Web APIs)

```python
import aiohttp
import daft

@daft.func
async def fetch(url: str) -> str:
    async with aiohttp.ClientSession() as s:
        async with s.get(url) as resp:  # s.get() is a context manager, not awaited directly
            return await resp.text()
```
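Why async helps: many in-flight requests overlap their waits on one event loop instead of blocking serially. A stdlib-only sketch of the effect, with `asyncio.sleep` standing in for network latency:

```python
import asyncio
import time

async def fake_fetch(url: str) -> str:
    await asyncio.sleep(0.05)  # stand-in for ~50 ms of network latency
    return f"body:{url}"

async def main():
    urls = [f"https://example.com/{i}" for i in range(20)]
    start = time.perf_counter()
    # gather() runs all fetches concurrently; results keep input order
    bodies = await asyncio.gather(*(fake_fetch(u) for u in urls))
    return bodies, time.perf_counter() - start

bodies, elapsed = asyncio.run(main())
# 20 overlapped 50 ms waits finish in roughly 0.05 s, not ~1 s of serial waiting
```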

2. GPU Batch Inference (PyTorch/Models)

```python
import daft

@daft.cls(gpus=1)
class Classifier:
    def __init__(self):
        self.model = load_model().cuda()  # runs once per worker

    @daft.method.batch(batch_size=32)
    def predict(self, images):
        return self.model(images.to_pylist())
```
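`batch_size=32` means `predict` receives up to 32 rows per invocation. A stdlib sketch of the chunking this implies (illustrative, not Daft internals):

```python
def into_batches(rows, batch_size):
    """Yield successive fixed-size chunks; the last chunk may be short."""
    for i in range(0, len(rows), batch_size):
        yield rows[i:i + batch_size]

rows = list(range(100))
sizes = [len(b) for b in into_batches(rows, 32)]  # three full batches plus a remainder
```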

Run with concurrency

```python
df.with_column("preds", Classifier(max_concurrency=4).predict(df["img"]))
```
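Concurrency and GPU requests multiply: each of the `max_concurrency` instances claims its own `gpus=N` devices, so running all instances at once needs their product. A quick sanity check (`required_gpus` is a hypothetical helper, not a Daft API):

```python
def required_gpus(max_concurrency: int, gpus_per_instance: int) -> int:
    # Devices needed for every UDF instance to run simultaneously
    return max_concurrency * gpus_per_instance

# The Classifier example: 4 instances x 1 GPU each
budget = required_gpus(4, 1)
```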

Tuning Keys

  • `max_concurrency`: Total parallel UDF instances.
  • `gpus=N`: GPUs requested per instance.
  • `batch_size`: Rows per call. Too small = per-call overhead; too big = OOM.
  • `into_batches(N)`: Pre-slice partitions if memory is tight.
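The `batch_size` tradeoff in numbers: fixed per-call dispatch cost is amortized over more rows as batches grow, while peak memory grows linearly with batch size. A back-of-envelope model (all constants hypothetical):

```python
def per_row_overhead_ms(batch_size, call_overhead_ms=1.0):
    # Fixed dispatch cost spread across the rows in one call
    return call_overhead_ms / batch_size

def peak_batch_mb(batch_size, row_mb=4.0):
    # Rows resident in memory at once scale linearly with batch size
    return batch_size * row_mb

small, big = 1, 1024
overheads = (per_row_overhead_ms(small), per_row_overhead_ms(big))  # big batch amortizes better
peaks = (peak_batch_mb(small), peak_batch_mb(big))                  # but holds far more memory
```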