# gpu-container-setup

Automatically detects the GPU vendor, finds an appropriate PyTorch container image, launches it with the correct mounts, and validates GPU functionality. Supports NVIDIA, Ascend, Metax, Iluvatar, and AMD/ROCm. Use when the user says "setup container", "start pytorch container", or invokes `/gpu-container-setup`.
## Installation

```bash
npx skill4agent add flagos-ai/skills gpu-container-setup-flagos
```

## Supported Vendors

| Vendor | PyTorch Backend | Detection |
|---|---|---|
| NVIDIA | CUDA | |
| AMD | ROCm (HIP) | |
| Ascend | torch_npu | |
| Metax | torch_musa | |
| Iluvatar | torch_corex | |
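Detection typically amounts to probing for each vendor's management CLI on `PATH`. A minimal sketch of that idea: `nvidia-smi`, `rocm-smi`, and `npu-smi` are the standard tools for NVIDIA, AMD, and Ascend; the Metax and Iluvatar tool names below are assumptions, not confirmed by this skill.

```python
import json
import shutil

# Management CLI per vendor. nvidia-smi / rocm-smi / npu-smi are standard;
# mx-smi and ixsmi are assumed names for Metax / Iluvatar.
VENDOR_TOOLS = [
    ("nvidia", "nvidia-smi"),
    ("amd", "rocm-smi"),
    ("ascend", "npu-smi"),
    ("metax", "mx-smi"),
    ("iluvatar", "ixsmi"),
]

def detect_vendor(which=shutil.which):
    """Return the first vendor whose management tool is on PATH, else None."""
    for vendor, tool in VENDOR_TOOLS:
        if which(tool):
            return vendor
    return None

if __name__ == "__main__":
    print(json.dumps({"vendor": detect_vendor()}))
```

The real `detect_gpu.py` also reports device names and counts, which would require parsing each tool's output.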
## Options

- `--vendor <name>`: skip detection and use this vendor
- `--image <image>`: use this image instead of automatic discovery
- `--data <path>`: data directory to mount into the container
- `--name <name>`: container name (default: `pytorch-gpu`)

## Step 1: Detect GPU

```bash
python3 .claude/skills/gpu-container-setup/scripts/detect_gpu.py
```

Example output:

```json
{"vendor": "ascend", "devices": ["Ascend 910B"], "count": 8}
```

Pass `--vendor` to override detection.

## Step 2: Find Data Disk

```bash
python3 .claude/skills/gpu-container-setup/scripts/find_data_disk.py
```

Example output:

```json
{"data_disk": "/mnt/data", "found": true, "size": "2.0T", "available": "1.5T"}
```

## Step 3: Find Container Image

Discovery order:

1. Primary Vendor Hub (hardcoded)
2. BAAI Harbor
3. Web Search
4. Local Images
5. Ask User

| Vendor | Registry | API/Query |
|---|---|---|
| NVIDIA | | |
| Ascend | | Portal: https://ascendhub.huawei.com |
| Metax | | |
| Iluvatar | | |
| AMD | | |
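The discovery order is a simple priority fallback: take the first source that yields an image, and ask the user only when all sources miss. A hypothetical sketch of that loop; the source names and callable interface are illustrative, not the skill's actual API:

```python
# Each source is a (name, lookup) pair; lookup() returns an image ref or None.
def find_image(sources):
    """Return (source_name, image) from the first source that yields an image."""
    for name, lookup in sources:
        image = lookup()
        if image:
            return name, image
    return "ask_user", None  # step 5: fall back to asking the user

# Example with stubbed lookups: the vendor hub misses, BAAI Harbor hits.
sources = [
    ("vendor_hub", lambda: None),
    ("baai_harbor", lambda: "harbor.baai.ac.cn/flagrelease-public/ngctorch:2601"),
    ("web_search", lambda: None),
    ("local_images", lambda: None),
]
print(find_image(sources))
```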
### Source 1: Primary vendor hub

```bash
# Example: Query NGC for latest NVIDIA PyTorch
TAG=$(curl -s "https://api.ngc.nvidia.com/v2/repos/nvidia/pytorch/tags" | jq -r '.tags[].name' | grep -E '^[0-9]{2}\.[0-9]{2}-py3$' | sort -rV | head -1)
IMAGE="nvcr.io/nvidia/pytorch:${TAG}"
```

### Source 2: BAAI Harbor

```bash
# Query BAAI Harbor
curl -s "https://harbor.baai.ac.cn/api/v2.0/projects/flagrelease-public/repositories?page_size=100" | jq -r '.[].name' | grep "flagrelease-<vendor>"
```

### Source 3: Web search

Search for: `"<vendor> pytorch docker official"`

### Source 4: Local images

```bash
docker images | grep pytorch
```

Verify any candidate image before using it:

```bash
docker pull "${IMAGE}" && docker run --rm "${IMAGE}" python -c "import torch; print(torch.__version__)"
```

See `references/image-sources.md` for the full source list. When a new hub is found via web search, record it:

```bash
# After successful web search discovery:
# 1. Verify image works (pull + pytorch test + GPU test)
# 2. Extract registry URL pattern
# 3. Update references/image-sources.md Step 1 section with new vendor hub
```

## Step 4: Launch Container

Vendor-specific device and volume mounts are detailed in `references/mount-requirements.md`.

### NVIDIA

```bash
docker run -d --gpus all \
--name pytorch-gpu \
--shm-size=16g \
-v <data_disk>:/data \
<image> sleep infinity
```

### AMD (ROCm)

```bash
docker run -d \
--device=/dev/kfd --device=/dev/dri \
--group-add video --group-add render \
--name pytorch-gpu \
--shm-size=16g \
-v <data_disk>:/data \
<image> sleep infinity
```

### Ascend

```bash
docker run -d \
--device=/dev/davinci0 --device=/dev/davinci1 ... \
--device=/dev/davinci_manager \
--device=/dev/devmm_svm \
--device=/dev/hisi_hdc \
-v /usr/local/Ascend:/usr/local/Ascend:ro \
-v /usr/local/sbin/npu-smi:/usr/local/sbin/npu-smi:ro \
--name pytorch-gpu \
--shm-size=16g \
-v <data_disk>:/data \
<image> sleep infinity
```

### Metax

```bash
docker run -d \
--device=/dev/mx0 --device=/dev/mx1 ... \
-v /opt/metax:/opt/metax:ro \
--name pytorch-gpu \
--shm-size=16g \
-v <data_disk>:/data \
<image> sleep infinity
```

### Iluvatar

```bash
docker run -d \
--device=/dev/bi0 --device=/dev/bi1 ... \
-v /opt/iluvatar:/opt/iluvatar:ro \
--name pytorch-gpu \
--shm-size=16g \
-v <data_disk>:/data \
<image> sleep infinity
```

## Step 5: Validate PyTorch

```bash
docker cp .claude/skills/gpu-container-setup/scripts/validate_pytorch.py pytorch-gpu:/tmp/
```
```bash
docker exec pytorch-gpu python3 /tmp/validate_pytorch.py
```

Expected output:

```json
{
  "status": "PASS",
  "backend": "npu",
  "device_count": 8,
  "device_names": ["Ascend 910B", ...],
  "tests": {
    "device_detection": true,
    "tensor_creation": true,
    "matrix_multiply": true,
    "gpu_to_cpu_transfer": true
  }
}
```

To open an interactive shell:

```bash
docker exec -it pytorch-gpu bash
```

## Error Handling

| Error | Action |
|---|---|
| No GPU detected | Ask user for vendor or check drivers |
| Image pull fails | Try alternative registry or web search |
| Container start fails | Check device permissions, show error |
| Validation fails | Show detailed error, suggest fixes |
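For reference, a hypothetical sketch of how `validate_pytorch.py` might pick its backend and run the four checks reported in the expected output above. The module-to-device mapping for Metax (`torch_musa` -> `"musa"`) and Iluvatar (`torch_corex` exposing a CUDA-compatible device) is an assumption, not the script's confirmed behavior:

```python
import importlib.util

# Vendor plugin module -> torch device string (assumed mapping; see lead-in).
PLUGINS = [("torch_npu", "npu"), ("torch_musa", "musa"), ("torch_corex", "cuda")]

def pick_backend(find_spec=importlib.util.find_spec):
    """Device string for the first vendor plugin installed; plain CUDA otherwise."""
    for module, device in PLUGINS:
        if find_spec(module) is not None:
            return device
    return "cuda"

def run_checks(device):
    import torch  # deferred so pick_backend() works even without torch installed
    tests = {"device_detection": True}  # tensor creation below proves the device exists
    x = torch.randn(64, 64, device=device)
    tests["tensor_creation"] = tuple(x.shape) == (64, 64)
    y = x @ x  # matrix multiply on the accelerator
    tests["matrix_multiply"] = tuple(y.shape) == (64, 64)
    tests["gpu_to_cpu_transfer"] = y.cpu().device.type == "cpu"
    status = "PASS" if all(tests.values()) else "FAIL"
    return {"status": status, "backend": device, "tests": tests}
```

The real script additionally reports `device_count` and `device_names`, which come from the backend's device-enumeration API.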
## References

- `references/gpu-detection.md`
- `references/image-sources.md`
- `references/mount-requirements.md`

## Example Usage

```
User: /gpu-container-setup
User: setup a pytorch container
User: start container with ascend GPU
User: /gpu-container-setup --image nvcr.io/nvidia/pytorch:24.01-py3
User: /gpu-container-setup --image harbor.baai.ac.cn/flagrelease-public/ngctorch:2601
```