exec-local-compile


Compile TensorRT-LLM on a compute node inside a Docker container. Use this when already on a compute node with GPUs visible.

7 installs

Source: nvidia/skills

Added on: 2026-05-22

NPX Install

npx skill4agent add nvidia/skills exec-local-compile

SKILL.md Content


Compile TensorRT-LLM (Local / Compute Node)

Compile TensorRT-LLM from source on a compute node inside a Docker container.

When to Use

Scenario	Use This Skill?
On a compute node with GPUs visible (`nvidia-smi` works)	Yes
On a SLURM login node (no GPUs)	No — use `exec-slurm-compile` instead

Prerequisites

You are inside a Docker/enroot container on a compute node
`nvidia-smi` succeeds (GPUs visible)
`/usr/local/tensorrt` exists (TensorRT installation in the container)

Instructions

Step 1: Verify Environment

Run `nvidia-smi` to confirm you are on a compute node with GPU access.
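A minimal sketch of this check, which also covers the TensorRT path from the prerequisites. The function name `check_prereqs` is hypothetical and not part of TensorRT-LLM; the default path is the one this document already assumes.

```bash
#!/usr/bin/env bash
# Sketch: verify the environment before starting a build.
# check_prereqs is a hypothetical helper name, not part of TensorRT-LLM.
check_prereqs() {
    local trt_root="${1:-/usr/local/tensorrt}"
    local status=0

    # GPUs must be visible: nvidia-smi has to exist and exit successfully.
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found: probably not a GPU compute node" >&2
        status=1
    elif ! nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi failed: GPUs are not visible" >&2
        status=1
    fi

    # The TensorRT installation directory must exist in the container.
    if [ ! -d "$trt_root" ]; then
        echo "TensorRT root not found: $trt_root" >&2
        status=1
    fi

    return "$status"
}
```

Calling `check_prereqs` with no argument checks `/usr/local/tensorrt`; a non-zero return status suggests switching to `exec-slurm-compile` or fixing the container before building.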

Step 2: Locate the Codebase

`cd` to the TensorRT-LLM repository. If the path is not provided by the user, ask for it.

Step 3: (Optional) Checkout Branch

If the user specifies a branch, check it out and pull. For example, for "compile ToT" (top of tree, i.e. the latest `main`):

bash

git checkout main && git pull

Step 4: Build

Run the build command. Builds are incremental by default; omit `-c` / `--clean` unless explicitly requested or the incremental build fails:

bash

./scripts/build_wheel.py --trt_root /usr/local/tensorrt --benchmarks --use_ccache -a "<arch>" -f --nvtx

Replace `<arch>` with the target GPU architecture (see Architecture Reference below). If not specified by the user, auto-detect from `nvidia-smi`.
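One possible way to do the auto-detection, sketched under two assumptions: the installed driver's `nvidia-smi` supports the `compute_cap` query field, and all GPUs on the node share the architecture of the first one. The helper names `cap_to_arch` and `detect_arch` are hypothetical.

```bash
#!/usr/bin/env bash
# Convert a compute capability such as "9.0" into an -a value such as "90-real".
# cap_to_arch and detect_arch are hypothetical helper names.
cap_to_arch() {
    local digits
    # Drop the dot and any stray whitespace: "9.0" -> "90".
    digits="$(printf '%s' "$1" | tr -d '.[:space:]')"
    printf '%s-real\n' "$digits"
}

# Query the compute capability of the first GPU and convert it.
# Assumes nvidia-smi supports the compute_cap query field.
detect_arch() {
    local cap
    cap="$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader 2>/dev/null | head -n 1)"
    # An empty result means the query failed or no GPU is visible.
    [ -n "$cap" ] || return 1
    cap_to_arch "$cap"
}
```

With these defined, the build command could take `-a "$(detect_arch)"` in place of `-a "<arch>"`. If detection fails, fall back to asking the user or to the Architecture Reference table.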

Step 5: Install

bash

pip install -e ".[devel]"

Step 6: Verify

bash

python3 -c "import tensorrt_llm; print(tensorrt_llm.__version__)"

Build Flags

Flag	Description
`--trt_root /usr/local/tensorrt`	TensorRT installation path (standard in NVIDIA containers)
`--benchmarks`	Build the C++ benchmarks
`-a "<arch>"`	Target GPU architecture(s)
`--nvtx`	Enable NVTX markers for profiling
`--use_ccache`	Use ccache for faster recompilation
`-f` / `--fast_build`	Skip some kernels for faster dev compilation. Always use for dev builds.
`-c` / `--clean`	Clean build directory before building. Only when needed (see below).
`--skip_building_wheel`	Build in-place without creating a wheel file
`--no-venv`	Skip virtual environment creation

Architecture Reference

Value	GPU Family
`"100-real"`	Blackwell (B200, GB200)
`"90-real"`	Hopper (H100, H200)
`"89-real"`	Ada Lovelace (L40S)
`"80-real"`	Ampere (A100)
`"90;100-real"`	Multiple architectures

Incremental vs. Clean Builds

Default to incremental builds — CMake only recompiles changed files, saving significant time.

Use a clean build (`-c`) only when:

The user explicitly requests a clean/fresh build
An incremental build fails with linker errors, stale object files, or CMake cache issues
Major branch changes (e.g., rebasing across many commits) that may invalidate the build cache
Build system files changed (`CMakeLists.txt`, `*.cmake`)
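The "incremental first, clean only on failure" policy can be sketched as a small wrapper. This is an illustration rather than part of the skill: `build_with_fallback` is a hypothetical helper that takes the build command as its arguments and retries once with `--clean` appended. It retries on any failure, so a genuine compile error in the source will simply fail both times.

```bash
#!/usr/bin/env bash
# Sketch: try an incremental build first, retry with --clean only if it fails.
# build_with_fallback is a hypothetical helper; pass the build command as arguments.
build_with_fallback() {
    # First attempt: run the command exactly as given (incremental build).
    if "$@"; then
        return 0
    fi
    # Second attempt: same command with --clean appended.
    echo "Incremental build failed; retrying with --clean" >&2
    "$@" --clean
}
```

For example, `build_with_fallback ./scripts/build_wheel.py --trt_root /usr/local/tensorrt --benchmarks --use_ccache -a "90-real" -f --nvtx` would run the Step 4 command for Hopper and add `--clean` only on the retry.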
