h100-sglang-diffusion


# H100 — SGLang Diffusion

## Overview

Use this skill to do SGLang diffusion development on the H100 box through `h100_sglang`. The default container is `sglang_bbuf` and the repo lives at `/data/bbuf/repos/sglang`.

Prefer this skill when:

- Validating diffusion Triton / CUDA JIT kernels
- Running diffusion model smoke tests (`DiffGenerator`, flux, etc.)
- Comparing eager vs `torch.compile` diffusion performance
- Verifying `python[diffusion]` editable install changes

This environment is already prepared:

- `sglang_bbuf` is running on `lmsysorg/sglang:dev`
- the repo is cloned at `/data/bbuf/repos/sglang`
- editable installs for `python[all]` and `python[diffusion]` are already done
- `/data/.cache` is mounted to `/root/.cache`
- InfiniBand paths are mounted for RDMA-aware workflows: `/sys/class/infiniband`, `/dev/infiniband`, and `/usr/sbin/show_gids`

## Quick Start

1. Check the host, container, and GPU state.

   ```bash
   ssh h100_sglang 'hostname && whoami'
   ssh h100_sglang 'docker ps --format "table {{.Names}}\t{{.Status}}" | sed -n "1,20p"'
   ssh h100_sglang 'nvidia-smi --query-gpu=index,name,utilization.gpu,memory.used,memory.total --format=csv,noheader,nounits'
   ```

2. Enter the container and confirm HF token visibility.

   ```bash
   ssh h100_sglang 'docker exec -it sglang_bbuf /bin/zsh'
   cd /data/bbuf/repos/sglang
   echo ${HF_TOKEN:+set}
   ```

   If `HF_TOKEN` is missing, export it before any Hub-backed diffusion run:

   ```bash
   export HF_TOKEN=<your-hf-token>
   export HUGGINGFACE_HUB_TOKEN="$HF_TOKEN"
   ```

   For non-interactive `docker exec ... bash -lc "<cmd>"` runs, set both variables inline instead of relying on shell startup files:

   ```bash
   ssh h100_sglang 'docker exec sglang_bbuf env HF_TOKEN=<your-hf-token> HUGGINGFACE_HUB_TOKEN=<your-hf-token> zsh -lc "..."'
   ```

3. Pick a free GPU.

   Use a GPU with `0` utilization and only a few MiB allocated. Always set `CUDA_VISIBLE_DEVICES=<gpu_id>` for diffusion validation commands.

4. If the container is not running, start it.

   ```bash
   ssh h100_sglang 'docker start sglang_bbuf'
   ```
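Picking a free GPU can be scripted instead of eyeballed. A minimal sketch: parse the CSV from the `nvidia-smi` query in step 1 and take the first GPU with 0 utilization and only a few MiB allocated. The sample data below stands in for the real query output; in practice you would pipe the `ssh h100_sglang 'nvidia-smi ...'` command into the same `awk` filter.

```shell
# Sample lines in the same CSV shape as the nvidia-smi query above:
# index, name, utilization.gpu, memory.used, memory.total
sample='0, NVIDIA H100, 97, 72344, 81559
1, NVIDIA H100, 0, 4, 81559
2, NVIDIA H100, 0, 18211, 81559'

# First GPU with 0 utilization and < 64 MiB allocated (64 is an arbitrary floor).
free_gpu=$(printf '%s\n' "$sample" | awk -F', ' '$3 == 0 && $4 < 64 { print $1; exit }')
echo "CUDA_VISIBLE_DEVICES=$free_gpu"
```

GPU 2 is skipped here on purpose: 0 utilization with ~18 GiB resident usually means another process is holding memory.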

## Safe Remote Workflow

1. Inspect the repo state before editing.

   ```bash
   ssh h100_sglang 'docker exec sglang_bbuf zsh -lc "cd /data/bbuf/repos/sglang && git branch --show-current && git status --short"'
   ```

2. Fast-forward to the latest clean `main` before creating a validation worktree.

   ```bash
   ssh h100_sglang 'docker exec sglang_bbuf zsh -lc "cd /data/bbuf/repos/sglang && git fetch origin && git checkout main && git pull --ff-only origin main"'
   ```

3. Never write directly into `/data/bbuf/repos/sglang` when it is dirty.

4. Use one of these isolation strategies.

   Create a detached worktree for remote-only experiments:

   ```bash
   ssh h100_sglang 'docker exec sglang_bbuf zsh -lc "cd /data/bbuf/repos/sglang && git worktree add --detach /tmp/sglang_validate_h100 HEAD"'
   ```

   Stream the local working tree into the container (validates exactly what is local right now):

   ```bash
   COPYFILE_DISABLE=1 tar --exclude=.git -cf - . | \
     ssh h100_sglang 'docker exec -i sglang_bbuf sh -lc "rm -rf /tmp/sglang_local_validate && mkdir -p /tmp/sglang_local_validate && tar -xf - -C /tmp/sglang_local_validate"'
   ssh h100_sglang 'docker exec sglang_bbuf zsh -lc "find /tmp/sglang_local_validate -name '\''._*'\'' -delete"'
   ```

   For patch-oriented validation:

   - fast-forward remote `main`
   - create a detached worktree from that commit
   - stream or `git apply` only the focused local diff into the worktree

   This keeps `/data/bbuf/repos/sglang` clean while still validating the exact local delta.
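The patch-oriented flow can be exercised locally end to end. This sketch builds a throwaway repo in a temp directory to play the role of `/data/bbuf/repos/sglang`, adds a detached worktree, and applies a focused diff into the worktree only; file names and contents are made up for the demo.

```shell
set -e
demo=$(mktemp -d)

# Stand-in for the remote repo: one committed file.
git init -q "$demo/repo"
git -C "$demo/repo" -c user.email=a@b -c user.name=demo commit -q --allow-empty -m init
printf 'x = 1\n' > "$demo/repo/mod.py"
git -C "$demo/repo" add mod.py
git -C "$demo/repo" -c user.email=a@b -c user.name=demo commit -q -m 'add mod'

# Detached worktree from the current commit, as in the workflow above.
git -C "$demo/repo" worktree add --detach "$demo/validate" HEAD

# Produce a focused local diff, then apply it only to the worktree.
printf 'x = 2\n' > "$demo/repo/mod.py"
git -C "$demo/repo" diff > "$demo/local.patch"
git -C "$demo/repo" checkout -q -- mod.py          # main checkout stays clean
git -C "$demo/validate" apply "$demo/local.patch"  # worktree carries the delta

cat "$demo/validate/mod.py"
git -C "$demo/repo" status --short   # prints nothing: repo is clean
```

The same shape transfers to the container: run the `git worktree add` inside `sglang_bbuf`, stream the patch over ssh, and `git apply` it in `/tmp/sglang_validate_h100`.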

## Diffusion Validation Workflow

### 1. Syntax / Import Check

Always start here before running any GPU kernel or model test.

```bash
ssh h100_sglang 'docker exec sglang_bbuf zsh -lc "cd /tmp/sglang_local_validate && python -m compileall python/sglang/jit_kernel/diffusion/triton python/sglang/multimodal_gen/runtime/layers"'
```

For broader coverage:

```bash
ssh h100_sglang 'docker exec sglang_bbuf zsh -lc "cd /tmp/sglang_local_validate && python -m compileall python/sglang"'
```
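`python -m compileall` works as a pass/fail gate because it exits nonzero when any file fails to byte-compile. A quick local illustration with two throwaway files in place of the sglang tree:

```shell
tmp=$(mktemp -d)
printf 'def ok():\n    return 1\n' > "$tmp/good.py"   # valid syntax
printf 'def broken(:\n' > "$tmp/bad.py"               # syntax error

# Exit status 0 on success, nonzero on any compile failure.
python3 -m compileall -q "$tmp/good.py" && echo "good: pass"
python3 -m compileall -q "$tmp/bad.py" || echo "bad: fail (nonzero exit)"
```

Note that `compileall` only catches syntax errors; a separate `python -c "import sglang"` style probe is still needed to surface import-time failures.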

### 2. JIT Kernel Smoke

Run a targeted smoke script covering the changed primitives before any model-level test. Cover at least these when relevant:

- `rms_norm_fn`
- `RMSNorm` under `torch.compile`
- `norm_infer`
- `apply_rotary_embedding`

Pipe the smoke script through `docker exec -i`:

```bash
ssh h100_sglang 'docker exec -i sglang_bbuf env CUDA_VISIBLE_DEVICES=0 PYTHONPATH=python python' < /path/to/local_smoke.py
```
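The `... python' < script.py` pattern works because `python` with no arguments executes whatever arrives on stdin. A local stand-in for the pipe, with a trivial assertion in place of a real kernel check (a real script would exercise `rms_norm_fn` / `norm_infer` on CUDA tensors):

```shell
tmp=$(mktemp -d)
cat > "$tmp/local_smoke.py" <<'EOF'
# Stand-in smoke body: assert a property, print a verdict.
assert sum(range(10)) == 45
print("smoke: OK")
EOF

# Same shape as the remote command, minus the ssh/docker wrapper:
#   ssh h100_sglang 'docker exec -i sglang_bbuf env ... python' < local_smoke.py
result=$(python3 < "$tmp/local_smoke.py")
echo "$result"
```

Keep these stdin-fed scripts free of `multiprocessing` entry points; section 5 below explains why those need a real file.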

### 3. Fused Modulation Regression

Run this after any change to `jit_kernel/diffusion/triton`:

```bash
ssh h100_sglang 'docker exec sglang_bbuf env CUDA_VISIBLE_DEVICES=0 PYTHONPATH=python zsh -lc "cd /tmp/sglang_local_validate && pytest -q python/sglang/jit_kernel/tests/test_qwen_image_modulation.py"'
```

### 4. General Diffusion Tests

```bash
ssh h100_sglang 'docker exec sglang_bbuf env CUDA_VISIBLE_DEVICES=0 PYTHONPATH=python zsh -lc "cd /tmp/sglang_local_validate && pytest -q path/to/diffusion_test.py"'
```

### 5. Model-Level Smoke (`DiffGenerator`)
Only run this after steps 1–4 pass.

Use a real `.py` file with an `if __name__ == "__main__":` guard; `multiprocessing.spawn` will fail if the entry point is stdin or unguarded top-level code.
```bash
# stream the script file to the container
scp /path/to/local_smoke_model.py h100_sglang:/tmp/smoke_model.py
ssh h100_sglang 'docker exec sglang_bbuf env CUDA_VISIBLE_DEVICES=0 HF_TOKEN=<your-hf-token> HUGGINGFACE_HUB_TOKEN=<your-hf-token> PYTHONPATH=/tmp/sglang_local_validate/python zsh -lc "python /tmp/smoke_model.py"'
```

Treat checkpoint, dependency, and environment failures separately from code regressions.
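Why the guard matters: under the spawn start method each worker re-imports the entry module, so unguarded top-level code runs again in every child (and a stdin entry point has no module file for the child to re-import at all). A minimal local illustration of the required script shape, with the `DiffGenerator` work stubbed out by a queue message:

```shell
tmp=$(mktemp -d)
cat > "$tmp/smoke_model.py" <<'EOF'
import multiprocessing as mp

def worker(q):
    # A real smoke test would construct DiffGenerator and run one step here.
    q.put("worker: OK")

if __name__ == "__main__":
    # spawn re-imports this module in the child; without the guard, the
    # Process(...) lines below would execute again there and recurse.
    ctx = mp.get_context("spawn")
    q = ctx.Queue()
    p = ctx.Process(target=worker, args=(q,))
    p.start()
    print(q.get())
    p.join()
EOF

python3 "$tmp/smoke_model.py"
```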


### 6. Server-Level Smoke

Only attempt this after the model-level smoke passes.

```bash
ssh h100_sglang 'docker exec sglang_bbuf env CUDA_VISIBLE_DEVICES=0 PYTHONPATH=python zsh -lc "cd /tmp/sglang_local_validate && python -m sglang.launch_server --model-path <model> --port 30000 &"'
```
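Backgrounding the launch returns immediately, well before the server can accept requests, so follow it with a readiness poll before sending a smoke request. A generic sketch of the poll, demonstrated against a plain `python -m http.server` stand-in on an arbitrary port (swap in the real server's port and health endpoint from your launch flags):

```shell
# Stand-in server on a made-up demo port.
python3 -m http.server 30517 --bind 127.0.0.1 >/dev/null 2>&1 &
server_pid=$!

# Poll until the port answers or ~15s elapse.
ready=0
for _ in $(seq 1 30); do
  if python3 -c 'import urllib.request; urllib.request.urlopen("http://127.0.0.1:30517/", timeout=1)' 2>/dev/null; then
    ready=1; break
  fi
  sleep 0.5
done
echo "ready=$ready"
kill "$server_pid" 2>/dev/null
```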

## Torch Compile Attribution

When a benchmark compares eager vs `torch.compile`, do not stop at the speedup number. Capture matching eager and compile traces or perf dumps, then run:

```bash
ssh h100_sglang 'docker exec sglang_bbuf zsh -lc "cd /tmp/sglang_local_validate && python scripts/analyze_diffusion_torch_compile.py"'
```

## Cleanup

```bash
ssh h100_sglang 'docker exec sglang_bbuf rm -rf /tmp/sglang_local_validate /tmp/sglang_validate_h100 /tmp/smoke_model.py'
```