h100-sglang-diffusion
H100 — SGLang Diffusion
Overview
Use this skill to do SGLang diffusion development on the H100 box through `h100_sglang`.
The default container is `sglang_bbuf` and the repo lives at `/data/bbuf/repos/sglang`.

Prefer this skill when:
- Validating diffusion Triton / CUDA JIT kernels
- Running diffusion model smoke tests (`DiffGenerator`, flux, etc.)
- Comparing eager vs `torch.compile` diffusion performance
- Verifying `python[diffusion]` editable install changes

This environment is already prepared:
- `sglang_bbuf` is running on `lmsysorg/sglang:dev`
- the repo is cloned at `/data/bbuf/repos/sglang`
- editable installs for `python[all]` and `python[diffusion]` are already done
- `/data/.cache` is mounted to `/root/.cache`
- Infiniband paths are mounted for RDMA-aware workflows: `/sys/class/infiniband`, `/dev/infiniband`, and `/usr/sbin/show_gids`
Quick Start
- Check the host, container, and GPU state.

  ```bash
  ssh h100_sglang 'hostname && whoami'
  ssh h100_sglang 'docker ps --format "table {{.Names}}\t{{.Status}}" | sed -n "1,20p"'
  ssh h100_sglang 'nvidia-smi --query-gpu=index,name,utilization.gpu,memory.used,memory.total --format=csv,noheader,nounits'
  ```

- Enter the container and confirm HF token visibility.

  ```bash
  ssh h100_sglang 'docker exec -it sglang_bbuf /bin/zsh'
  cd /data/bbuf/repos/sglang
  echo ${HF_TOKEN:+set}
  ```

  If `HF_TOKEN` is missing, export it before any Hub-backed diffusion run:

  ```bash
  export HF_TOKEN=<your-hf-token>
  export HUGGINGFACE_HUB_TOKEN="$HF_TOKEN"
  ```

  For non-interactive `docker exec ... bash -lc "<cmd>"` runs, export both variables inline instead of relying on shell startup:

  ```bash
  ssh h100_sglang 'docker exec sglang_bbuf env HF_TOKEN=<your-hf-token> HUGGINGFACE_HUB_TOKEN=<your-hf-token> zsh -lc "..."'
  ```

- Pick a free GPU.

  Use a GPU with 0% utilization and only a few MiB allocated.
  Always set `CUDA_VISIBLE_DEVICES=<gpu_id>` for diffusion validation commands.

- If the container is not running, start it.

  ```bash
  ssh h100_sglang 'docker start sglang_bbuf'
  ```
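The free-GPU criterion above can be scripted instead of eyeballed. A minimal sketch, assuming the exact CSV field order of the `nvidia-smi` query in step 1; `pick_free_gpu` is a hypothetical helper, not part of the repo:

```python
def pick_free_gpu(nvidia_smi_csv: str, max_util: int = 0, max_mem_mib: int = 100):
    """Return the index of the first GPU whose utilization and used memory
    fall under the thresholds, or None if every GPU is busy.

    Expects the output of:
    nvidia-smi --query-gpu=index,name,utilization.gpu,memory.used,memory.total \
        --format=csv,noheader,nounits
    """
    for line in nvidia_smi_csv.strip().splitlines():
        index, _name, util, mem_used, _mem_total = [f.strip() for f in line.split(",")]
        if int(util) <= max_util and int(mem_used) <= max_mem_mib:
            return int(index)
    return None
```

Feed it the captured `nvidia-smi` output, then export `CUDA_VISIBLE_DEVICES` with the returned index.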
Safe Remote Workflow
- Inspect the repo state before editing.

  ```bash
  ssh h100_sglang 'docker exec sglang_bbuf zsh -lc "cd /data/bbuf/repos/sglang && git branch --show-current && git status --short"'
  ```

- Fast-forward to the latest clean `main` before creating a validation worktree.

  ```bash
  ssh h100_sglang 'docker exec sglang_bbuf zsh -lc "cd /data/bbuf/repos/sglang && git fetch origin && git checkout main && git pull --ff-only origin main"'
  ```

- Never write directly into `/data/bbuf/repos/sglang` when it is dirty.

- Use one of these isolation strategies.

  Create a detached worktree for remote-only experiments:

  ```bash
  ssh h100_sglang 'docker exec sglang_bbuf zsh -lc "cd /data/bbuf/repos/sglang && git worktree add --detach /tmp/sglang_validate_h100 HEAD"'
  ```

  Stream the local working tree into the container (validates exactly what is local right now):

  ```bash
  COPYFILE_DISABLE=1 tar --exclude=.git -cf - . | \
    ssh h100_sglang 'docker exec -i sglang_bbuf sh -lc "rm -rf /tmp/sglang_local_validate && mkdir -p /tmp/sglang_local_validate && tar -xf - -C /tmp/sglang_local_validate"'
  ssh h100_sglang 'docker exec sglang_bbuf zsh -lc "find /tmp/sglang_local_validate -name '\''._*'\'' -delete"'
  ```

  For patch-oriented validation:
  - fast-forward remote `main`
  - create a detached worktree from that commit
  - stream or `git apply` only the focused local diff into the worktree

  This keeps `/data/bbuf/repos/sglang` clean while still validating the exact local delta.
Diffusion Validation Workflow
1. Syntax / Import Check
Always start here before running any GPU kernel or model test.

```bash
ssh h100_sglang 'docker exec sglang_bbuf zsh -lc "cd /tmp/sglang_local_validate && python -m compileall python/sglang/jit_kernel/diffusion/triton python/sglang/multimodal_gen/runtime/layers"'
```

For broader coverage:

```bash
ssh h100_sglang 'docker exec sglang_bbuf zsh -lc "cd /tmp/sglang_local_validate && python -m compileall python/sglang"'
```
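What this gate does and does not catch can be seen in miniature with `py_compile`, which `compileall` calls per file. A torch-free illustration, not tied to the sglang tree:

```python
import py_compile
import tempfile

def compiles_cleanly(source: str) -> bool:
    """Byte-compile a snippet the way `python -m compileall` does for each
    file: syntax errors fail, but imports are NOT executed, so a broken
    import path still needs the kernel smoke in step 2 to surface."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(source)
        path = f.name
    try:
        py_compile.compile(path, doraise=True)
        return True
    except py_compile.PyCompileError:
        return False
```

Because only parsing happens, a green `compileall` run is necessary but not sufficient; it simply keeps obvious syntax breakage from wasting a GPU.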
2. JIT Kernel Smoke
Run a targeted smoke script covering the changed primitives before any model-level test.
Cover at least these when relevant:
- `rms_norm_fn`
- `RMSNorm` under `torch.compile`
- `norm_infer`
- `apply_rotary_embedding`

Pipe the smoke script through `docker exec -i`:

```bash
ssh h100_sglang 'docker exec -i sglang_bbuf env CUDA_VISIBLE_DEVICES=0 PYTHONPATH=python python' < /path/to/local_smoke.py
```
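A smoke script for `rms_norm_fn` typically compares the kernel against a slow reference. A torch-free sketch of that reference math (the real script would run the same check with torch tensors against sglang's kernel, eagerly and under `torch.compile`):

```python
import math

def rms_norm_ref(x, weight, eps=1e-6):
    """Reference RMSNorm over a 1-D row:
    y_i = x_i / sqrt(mean(x^2) + eps) * w_i."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]
```

The smoke script's assertion is then `max |kernel - reference|` below a dtype-appropriate tolerance.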
3. Fused Modulation Regression
Run this after any change to `jit_kernel/diffusion/triton`:

```bash
ssh h100_sglang 'docker exec sglang_bbuf env CUDA_VISIBLE_DEVICES=0 PYTHONPATH=python zsh -lc "cd /tmp/sglang_local_validate && pytest -q python/sglang/jit_kernel/tests/test_qwen_image_modulation.py -q"'
```
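For orientation only: in DiT-style diffusion models, "modulation" usually means an elementwise adaLN scale-and-shift of the normalized hidden states. The exact semantics of the Qwen-Image kernel live in the test file above; the formula below is an assumption about the common pattern, not the kernel's actual signature:

```python
def modulate_ref(x, shift, scale):
    """Assumed adaLN-style modulation: y_i = x_i * (1 + scale_i) + shift_i.
    A fused Triton kernel computes the same thing in one memory pass; the
    regression test pins the kernel output to a reference like this."""
    return [v * (1.0 + sc) + sh for v, sh, sc in zip(x, shift, scale)]
```

This is why the regression belongs right after any Triton change: fusion must change performance, never the numerics.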
4. General Diffusion Tests
Run any affected diffusion test files from the validation tree:

```bash
ssh h100_sglang 'docker exec sglang_bbuf env CUDA_VISIBLE_DEVICES=0 PYTHONPATH=python zsh -lc "cd /tmp/sglang_local_validate && pytest -q path/to/diffusion_test.py -q"'
```
5. Model-Level Smoke (DiffGenerator)
Only after steps 1–4 pass.
Use a real `.py` file with an `if __name__ == "__main__":` guard; `multiprocessing.spawn` will fail if the entry point is stdin or unguarded top-level code.

```bash
# stream the script file to the container
scp /path/to/local_smoke_model.py h100_sglang:/tmp/smoke_model.py
ssh h100_sglang 'docker exec sglang_bbuf env CUDA_VISIBLE_DEVICES=0 HF_TOKEN=<your-hf-token> HUGGINGFACE_HUB_TOKEN=<your-hf-token> PYTHONPATH=/tmp/sglang_local_validate/python zsh -lc "python /tmp/smoke_model.py"'
```

Treat checkpoint, dependency, and environment failures separately from code regressions.
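A minimal shape for `local_smoke_model.py` that satisfies the spawn constraint. The `DiffGenerator` usage is sketched from this section's description and left commented out; treat the import path and argument names as placeholders, not the real API:

```python
def main() -> int:
    # Heavy imports belong inside main() so that spawn workers, which
    # re-import this module, do not re-trigger them at import time.
    # from sglang... import DiffGenerator          # real import path here
    # gen = DiffGenerator(model_path="<model>")    # placeholder arguments
    # gen.generate("a photo of an astronaut riding a horse")
    return 0

# multiprocessing.spawn re-imports this file in every worker process;
# without this guard the top-level code would run again in each worker.
if __name__ == "__main__":
    main()
```

Stdin piping (as in step 2) cannot be used here: spawn needs a real module file on disk to re-import, which is why the script is `scp`'d first.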
6. Server-Level Smoke
Only attempt after model-level smoke passes.

```bash
ssh h100_sglang 'docker exec sglang_bbuf env CUDA_VISIBLE_DEVICES=0 PYTHONPATH=python zsh -lc "cd /tmp/sglang_local_validate && python -m sglang.launch_server --model-path <model> --port 30000 &"'
```
Torch Compile Attribution
When a benchmark compares eager vs `torch.compile`, do not stop at the speedup number.
Capture matching eager and compile traces or perf dumps, then run:

```bash
ssh h100_sglang 'docker exec sglang_bbuf zsh -lc "cd /tmp/sglang_local_validate && python scripts/analyze_diffusion_torch_compile.py"'
```
Cleanup
```bash
ssh h100_sglang 'docker exec sglang_bbuf rm -rf /tmp/sglang_local_validate /tmp/sglang_validate_h100 /tmp/smoke_model.py'
```