trtllm-flashinfer-upgrade
FlashInfer Version Upgrade Skill
Automates upgrading the `flashinfer-python` package version across TensorRT-LLM.
When to Use
- User asks to upgrade / bump / update flashinfer
- Routine dependency update duty for flashinfer-python
Prerequisites
Step 0a: Determine GitHub Username
Query `gh` for the authenticated user's login:
```bash
GITHUB_USERNAME=$(gh api user --jq .login)
echo "$GITHUB_USERNAME"
```
If this fails, `gh` is not authenticated; resolve Step 0c first, then retry.
As a fallback, derive the username from the fork remote:
```bash
# [:/] matches both HTTPS and SSH remote URLs; the upstream NVIDIA remote is skipped.
GITHUB_USERNAME=$(git remote -v | grep -E 'github\.com[:/][^/]+/TensorRT-LLM' \
  | grep -v 'github\.com[:/]NVIDIA/' \
  | head -1 | sed -E 's|.*github\.com[:/]([^/]+)/TensorRT-LLM.*|\1|')
```
If neither works, ask the user via `AskUserQuestion`.
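The fallback above can also be sketched in Python, which makes the two URL forms easier to see. This is a minimal illustration, not part of the skill: `fork_owner` and the `example-user` remote are hypothetical names, and the sketch assumes the upstream remote is owned by `NVIDIA`.

```python
import re

# Matches both HTTPS (github.com/owner/...) and SSH (github.com:owner/...) URLs.
_REMOTE_RE = re.compile(r"github\.com[:/]([^/\s]+)/TensorRT-LLM")


def fork_owner(remote_output: str, upstream_owner: str = "NVIDIA"):
    """Return the first non-upstream owner found in `git remote -v` output, or None."""
    for line in remote_output.splitlines():
        match = _REMOTE_RE.search(line)
        if match and match.group(1).lower() != upstream_owner.lower():
            return match.group(1)
    return None


# Hypothetical `git remote -v` output with one upstream and one fork remote.
sample = (
    "origin\thttps://github.com/NVIDIA/TensorRT-LLM.git (fetch)\n"
    "fork\tgit@github.com:example-user/TensorRT-LLM.git (fetch)\n"
)
print(fork_owner(sample))  # example-user
```

Returning `None` when only the upstream remote exists keeps the "ask the user" path explicit instead of silently using the wrong owner.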
Step 0b: Verify Fork Remote
Check that a git remote pointing to the user's fork of TensorRT-LLM exists:
```bash
# Double quotes so ${GITHUB_USERNAME} expands; [:/] matches both HTTPS and SSH remote URLs.
git remote -v | grep -E "github\.com[:/]${GITHUB_USERNAME}/TensorRT-LLM"
```
If no fork remote is found, stop and notify the user:
No GitHub fork remote detected. A fork of `NVIDIA/TensorRT-LLM` is required to push branches and create PRs.
- Fork the repo at https://github.com/NVIDIA/TensorRT-LLM/fork
- Add it as a git remote:
```bash
git remote add fork https://github.com/<GITHUB_USERNAME>/TensorRT-LLM.git
```
- Re-run this skill.
Step 0c: Verify `gh` CLI Is Authenticated
This skill uses the GitHub CLI (`gh`) to push branches and open PRs. Confirm it is installed and authenticated:
```bash
gh auth status
```
Expected: `Logged in to github.com` with at least the `repo` scope. `repo` covers pushing to the user's fork and opening PRs on `NVIDIA/TensorRT-LLM`, so no separate fine-grained PATs are needed.
If `gh` reports "not logged in", instruct the user:
```bash
gh auth login
```
Choose: GitHub.com → HTTPS → authenticate with a web browser (or paste a PAT with `repo` scope).
Note on `GH_CONFIG_DIR`: If the user keeps multiple `gh` accounts (e.g. a personal account and a separate account for `NVIDIA/TensorRT-LLM` work), they may point `gh` at a non-default config directory. Check `CLAUDE.local.md` / `AGENTS.md` or the environment for `GH_CONFIG_DIR`; if unclear, ask the user. When set, prefix every `gh` invocation: `GH_CONFIG_DIR=<path> gh ...`.
Do not proceed with the upgrade workflow until `gh auth status` is clean and the fork remote (Step 0b) is confirmed.
Workflow
Execute these steps in order. Use `AskUserQuestion` for user choices and `WebFetch` / GitHub API for release data.
Step 1: Fetch Available Releases from GitHub
Fetch the release list from https://github.com/flashinfer-ai/flashinfer/releases.
Use `WebFetch` with that URL and extract all release tag names and dates. Collect both stable releases (e.g., `v0.6.7`) and pre-release / nightly tags (e.g., `v0.7.0.dev20260401`).
Alternatively, use the GitHub API via curl:
```bash
curl -s "https://api.github.com/repos/flashinfer-ai/flashinfer/releases?per_page=30" \
  | python3 -c "
import json, sys
releases = json.load(sys.stdin)
for r in releases:
    tag = r['tag_name']
    pre = ' (pre-release)' if r['prerelease'] else ' (stable)'
    date = r['published_at'][:10]
    print(f'{tag} {date}{pre}')
"
```
Step 2: Check Current Version
Read the current pinned version from `requirements.txt`:
```bash
grep flashinfer-python requirements.txt
```
Expected format: `flashinfer-python==X.Y.Z`
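To use the pinned version in later comparisons it has to be extracted from that line. A minimal sketch, assuming the `flashinfer-python==X.Y.Z` format shown above; `pinned_version` is a hypothetical helper name, and a git-style pin deliberately yields `None`.

```python
import re

# Captures the version after `flashinfer-python==`, stopping at whitespace,
# an environment marker (;) or a trailing comment (#).
_PIN_RE = re.compile(r"^\s*flashinfer-python\s*==\s*([^\s;#]+)", re.MULTILINE)


def pinned_version(requirements_text: str):
    """Return the version from a `flashinfer-python==X.Y.Z` line, or None if absent."""
    match = _PIN_RE.search(requirements_text)
    return match.group(1) if match else None


print(pinned_version("torch>=2.0\nflashinfer-python==0.6.4\n"))  # 0.6.4
```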
Step 3: Ask User Preferences
Ask the user three questions using `AskUserQuestion`:
- "Prefer the latest nightly release version?"
  - Options: "Yes, show nightly/dev releases" | "No, stable releases only (Recommended)"
  - This filters the release list shown in the next question.
- "Which flashinfer-python version do you want to upgrade to?"
  - Present up to 4 versions newer than the current version (filtered by the nightly preference above), with the latest as the recommended option.
  - If the current version is already the latest, inform the user and stop.
- "Also update `security_scanning/poetry.lock`?"
  - Options: "No, skip the lockfile (Recommended)" | "Yes, update version + hashes"
  - Default: No. The lockfile is typically regenerated by maintainers separately; editing it here can produce spurious hash diffs and stale `metadata.content-hash` values.
  - If the user answers Yes, follow the "Updating `security_scanning/poetry.lock` hashes" subsection below; otherwise skip it entirely (do not touch `security_scanning/poetry.lock`).
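The filtering behind the second question can be sketched as follows. This is an illustrative approximation rather than the skill's actual logic: `sort_key` and `candidates` are hypothetical helpers, the ordering is a simplified subset of PEP 440 (dev < rc < final < post), and it assumes that choosing nightlies shows them alongside stable releases. The sample tag list is made up apart from the tags quoted in this document.

```python
import re

_VERSION_RE = re.compile(r"^v?(\d+(?:\.\d+)*)(?:\.?(rc|post|dev)(\d+))?$")
_STAGE_ORDER = {"dev": 0, "rc": 1, None: 2, "post": 3}


def sort_key(tag: str):
    """Simplified ordering key: (release numbers, stage rank, stage number)."""
    match = _VERSION_RE.match(tag)
    if not match:
        raise ValueError(f"unrecognized version tag: {tag!r}")
    release = tuple(int(part) for part in match.group(1).split("."))
    return (release, _STAGE_ORDER[match.group(2)], int(match.group(3) or 0))


def candidates(tags, current, include_nightly=False, limit=4):
    """Newest-first list of up to `limit` tags that are newer than `current`."""
    current_key = sort_key(current)
    newer = []
    for tag in tags:
        key = sort_key(tag)
        is_prerelease = key[1] < _STAGE_ORDER[None]  # dev or rc
        if key > current_key and (include_nightly or not is_prerelease):
            newer.append((key, tag))
    return [tag for _, tag in sorted(newer, reverse=True)[:limit]]


tags = ["v0.6.4", "v0.6.5", "v0.6.7", "v0.6.7.post3", "v0.7.0.dev20260401"]
print(candidates(tags, "0.6.4"))
# ['v0.6.7.post3', 'v0.6.7', 'v0.6.5']
print(candidates(tags, "0.6.4", include_nightly=True))
# ['v0.7.0.dev20260401', 'v0.6.7.post3', 'v0.6.7', 'v0.6.5']
```

An empty result corresponds to the "already the latest" case, where the workflow informs the user and stops.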
Step 4: Update All Version References
After the user selects a target version, update these files:
| File | What to change | Always |
|---|---|---|
| `requirements.txt` | `flashinfer-python==X.Y.Z` version pin | Yes |
| `security_scanning/pyproject.toml` | `flashinfer-python` version pin | Yes |
| `ATTRIBUTIONS-Python.md` | `flashinfer-python` version reference | Yes |
| `security_scanning/poetry.lock` | Update version and hashes | Only if user opted in at Step 3 question 3 |
Updating `security_scanning/poetry.lock` hashes
Only perform this subsection if the user answered Yes to question 3 in Step 3. Otherwise skip it entirely.
The poetry.lock file contains SHA256 hashes for the wheel and sdist. Fetch them from PyPI:
```bash
curl -s "https://pypi.org/pypi/flashinfer-python/NEW_VERSION/json" \
  | python3 -c "
import json, sys
data = json.load(sys.stdin)
for f in data['urls']:
    print(f'{f[\"filename\"]} sha256:{f[\"digests\"][\"sha256\"]}')
"
```
Replace the old `files = [...]` block under `[[package]] name = "flashinfer-python"` with the new filenames and hashes. Also update the `[package.dependencies]` section if the new version has different dependencies (check PyPI JSON `requires_dist`).
Important: After manually editing both `security_scanning/pyproject.toml` and `security_scanning/poetry.lock`, the lockfile's `metadata.content-hash` becomes stale. Regenerate it by running:
```bash
cd security_scanning && poetry lock --no-update && cd ..
```
This refreshes the hash without changing any other package versions. On newer Poetry releases that no longer accept `--no-update`, plain `poetry lock` should behave the same way. If `poetry` is available, you can alternatively use `poetry add flashinfer-python@NEW_VERSION` in the `security_scanning/` directory to update both `pyproject.toml` and `poetry.lock` automatically (including the content-hash).
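Turning the PyPI response into the lockfile's `files = [...]` block is mechanical, so a small formatter helps avoid copy errors. A minimal sketch: `lock_file_entries` is a hypothetical helper, the sample filenames and digests are placeholders rather than real hashes, and sorting by filename is an assumption about how the existing entries are ordered.

```python
def lock_file_entries(pypi_json: dict) -> str:
    """Render a poetry.lock `files = [...]` block from a PyPI per-version JSON response."""
    lines = ["files = ["]
    for item in sorted(pypi_json["urls"], key=lambda entry: entry["filename"]):
        name = item["filename"]
        digest = item["digests"]["sha256"]
        lines.append(f'    {{file = "{name}", hash = "sha256:{digest}"}},')
    lines.append("]")
    return "\n".join(lines)


# Placeholder data shaped like the `urls` array of the PyPI JSON API.
sample = {
    "urls": [
        {"filename": "pkg-1.0.tar.gz", "digests": {"sha256": "bbbb"}},
        {"filename": "pkg-1.0-py3-none-any.whl", "digests": {"sha256": "aaaa"}},
    ]
}
print(lock_file_entries(sample))
```

The output still needs to be pasted under the `flashinfer-python` package entry by hand, and the content-hash regenerated as described above.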
Nightly / dev version special handling
If the user selects a nightly/dev version (e.g., `0.7.0.dev20260401`):
- The PyPI package may not exist; check first with `curl -s "https://pypi.org/pypi/flashinfer-python/VERSION/json"`.
- If not on PyPI, the `security_scanning/poetry.lock` hashes cannot be updated. Warn the user and leave a `# TODO: update hashes when published to PyPI` comment.
- The `requirements.txt` can pin to a git install instead:
  `flashinfer-python @ git+https://github.com/flashinfer-ai/flashinfer.git@TAG#egg=flashinfer-python`
  Ask the user which approach they prefer (PyPI pin vs git pin).
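The existence check and the two pinning styles above can be sketched like this. It is an illustration under assumptions: `on_pypi` and `requirement_line` are hypothetical helpers, the check treats only HTTP 404 as "not published", and the `opener` parameter exists so the function can be exercised without network access.

```python
import urllib.error
import urllib.request

PYPI_URL = "https://pypi.org/pypi/flashinfer-python/{version}/json"
GIT_URL = "git+https://github.com/flashinfer-ai/flashinfer.git"


def on_pypi(version: str, opener=urllib.request.urlopen) -> bool:
    """True if PyPI serves metadata for this version; False on HTTP 404."""
    try:
        with opener(PYPI_URL.format(version=version), timeout=10) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise  # other failures (rate limit, outage) must not be read as "missing"


def requirement_line(version: str, tag: str, use_git_pin: bool) -> str:
    """Build the requirements.txt entry for either pinning style."""
    if use_git_pin:
        return f"flashinfer-python @ {GIT_URL}@{tag}#egg=flashinfer-python"
    return f"flashinfer-python=={version}"
```

A caller would run `on_pypi(version)` first and offer the git pin only when it returns `False` or the user prefers it.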
Step 5: Verify Version Compatibility
After updating, check if any code has version-gated logic that needs adjusting:
```bash
grep -rn 'flashinfer.*__version__\|flashinfer.*version' \
  tensorrt_llm/ --include="*.py"
```
Known locations with version checks:
- `tensorrt_llm/_torch/speculative/interface.py`: `flashinfer.__version__ >= "0.6.4"`
If the new version is still >= the gated version, no changes are needed. Otherwise, flag it to the user.
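When judging whether the new version still passes a gate, compare the numbers rather than the raw strings: if a gate like the one above is evaluated as a plain string comparison, a version such as `0.10.0` would sort below `0.6.4`. A minimal sketch of a numeric check, with `release_tuple` and `satisfies_gate` as hypothetical helpers that ignore dev/rc/post suffixes:

```python
import re


def release_tuple(version: str):
    """Leading numeric release segment as a tuple, e.g. '0.10.0.post1' -> (0, 10, 0)."""
    match = re.match(r"v?(\d+(?:\.\d+)*)", version)
    if not match:
        raise ValueError(f"unrecognized version: {version!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def satisfies_gate(installed: str, gate: str) -> bool:
    """Numeric comparison of the release segments only."""
    return release_tuple(installed) >= release_tuple(gate)


print("0.10.0" >= "0.6.4")                # False: strings compare character by character
print(satisfies_gate("0.10.0", "0.6.4"))  # True: release numbers compare numerically
```

For full PEP 440 semantics, including pre-releases, `packaging.version.Version` is the more complete option where the `packaging` library is available.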
Step 6: Summary
Print a summary of all changes made:
- Old version → New version
- Files modified (with line numbers)
- Any warnings (e.g., poetry.lock hashes couldn't be updated for nightly)
- Remind user to run `pip install -r requirements.txt` to test locally
- Remind user to run relevant unit tests:
```bash
pytest tests/unittest/_torch/flashinfer/ -v
pytest tests/unittest/_torch/attention/test_flashinfer_attention.py -v
```
Step 7: Commit, Push, and Create PR
After all files are updated and verified:
If the user opted out of the `poetry.lock` update at Step 3 question 3, drop `security_scanning/poetry.lock` from the `git stash`, `git add`, and commit message in the snippets below.
7a. Create a new branch from upstream main
```bash
# Drop security_scanning/poetry.lock from this list if the user opted out.
git stash push -m "flashinfer-upgrade-wip" -- requirements.txt security_scanning/pyproject.toml security_scanning/poetry.lock ATTRIBUTIONS-Python.md
git checkout main
git pull --rebase https://github.com/NVIDIA/TensorRT-LLM.git main
git checkout -b "${GITHUB_USERNAME}/update_flashinfer_${NEW_VERSION}"
git stash pop
```
Where `GITHUB_USERNAME` is the value determined in Step 0a (e.g., `yihwang-nv`) and `NEW_VERSION` is the selected version (e.g., `0.6.7.post3`).
7b. Commit with DCO sign-off
```bash
# Drop security_scanning/poetry.lock from the git add list and the commit body if the user opted out.
git add requirements.txt security_scanning/pyproject.toml security_scanning/poetry.lock ATTRIBUTIONS-Python.md
git commit -s -m "[None][chore] Update flashinfer-python from OLD to NEW

Bump flashinfer-python dependency to the latest stable release.
Updated version pins in requirements.txt, security_scanning/pyproject.toml,
security_scanning/poetry.lock (if updated), and ATTRIBUTIONS-Python.md."
```
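Because the file list and the commit body both depend on the lockfile choice from Step 3, it can help to derive them from one flag. A minimal sketch with hypothetical helpers `files_to_stage` and `commit_message`; the body wording is adapted from the snippet above and names the target version instead of assuming it is the latest stable release.

```python
BASE_FILES = [
    "requirements.txt",
    "security_scanning/pyproject.toml",
    "ATTRIBUTIONS-Python.md",
]
LOCKFILE = "security_scanning/poetry.lock"


def files_to_stage(lock_updated: bool):
    """Files to pass to `git stash` / `git add`, with the lockfile only if it was edited."""
    return BASE_FILES + ([LOCKFILE] if lock_updated else [])


def commit_message(old: str, new: str, lock_updated: bool) -> str:
    """Subject line, blank line, then a body listing exactly the staged files."""
    files = files_to_stage(lock_updated)
    listed = ", ".join(files[:-1]) + f", and {files[-1]}"
    return (
        f"[None][chore] Update flashinfer-python from {old} to {new}\n\n"
        f"Bump flashinfer-python dependency to {new}.\n"
        f"Updated version pins in {listed}."
    )


print(commit_message("0.6.4", "0.6.7", lock_updated=False))
```

The sign-off trailer itself still comes from `git commit -s`; the sketch only builds the message text.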
7c. Push the branch to the user's fork
Identify the fork remote (from Step 0b; commonly named `fork`), then push:
```bash
FORK_REMOTE=fork  # adjust if the user named their fork remote differently
BRANCH="${GITHUB_USERNAME}/update_flashinfer_${NEW_VERSION}"
git push -u "${FORK_REMOTE}" "${BRANCH}"
```
If the push is rejected for auth reasons, confirm `gh auth status` shows `repo` scope; `gh` installs a git credential helper that reuses its token for HTTPS pushes (if the helper is missing, `gh auth setup-git` configures it). Users on a non-default config dir must export `GH_CONFIG_DIR` in the same shell.
7d. Open the PR on `NVIDIA/TensorRT-LLM`
```bash
gh pr create \
  --repo NVIDIA/TensorRT-LLM \
  --base main \
  --head "${GITHUB_USERNAME}:${BRANCH}" \
  --title "[None][chore] Update flashinfer-python from ${OLD_VERSION} to ${NEW_VERSION}" \
  --body "$(cat <<EOF
## Summary
- Bump flashinfer-python from ${OLD_VERSION} to ${NEW_VERSION} (latest stable)
- Updated version pins in requirements.txt, security_scanning/pyproject.toml, and ATTRIBUTIONS-Python.md (and security_scanning/poetry.lock if the user opted in)

## Test plan
- pip install -r requirements.txt installs successfully
- pytest tests/unittest/_torch/flashinfer/ -v
- pytest tests/unittest/_torch/attention/test_flashinfer_attention.py -v
- CI pre-merge passes
EOF
)"
```
`gh pr create` prints the new PR URL on success. Report it back to the user.
Files Reference
All files that contain flashinfer-python version pins:
| File | Pattern |
|---|---|
| `requirements.txt` | `flashinfer-python==X.Y.Z` |
| `security_scanning/pyproject.toml` | |
| `security_scanning/poetry.lock` | |
| `ATTRIBUTIONS-Python.md` | |
Notes
- The `setup.py` has a comment about git+https install URLs; there is no version pin to update there.
- The `.pre-commit-config.yaml` and `pyproject.toml` reference flashinfer source files, not versions; no changes needed.
- The `flashinfer/` submodule (if present) is separate from the `flashinfer-python` PyPI package.