qzcli: Qizhi (启智) platform job management
A kubectl/docker-style CLI for managing GPU compute jobs on the Qizhi (启智) platform.
GitHub: tianyilt/qzcli_tool

Installation

```bash
pip install rich requests prompt_toolkit mcp
git clone https://github.com/tianyilt/qzcli_tool
cd qzcli_tool && pip install -e .
```

MCP Integration (optional)

To use qzcli as an MCP tool directly from Claude Code or Codex:

```bash
# Claude Code
claude mcp add qzcli -- qzcli-mcp

# Codex
codex mcp add qzcli -- qzcli-mcp
```

---
Configuration

Credentials are read in this priority order:

`CLI args > --password-stdin > env vars > QZCLI_ENV_FILE (.env) > ~/.qzcli/config.json > interactive input`

```bash
# Option A: env file (recommended)
mkdir -p ~/.qzcli
cat > ~/.qzcli/.env <<'EOF'
QZCLI_USERNAME="your_username"
QZCLI_PASSWORD="your_password"
EOF

# Option B: environment variables
export QZCLI_USERNAME="your_username"
export QZCLI_PASSWORD="your_password"
export QZCLI_API_URL="https://qz.yourorg.edu.cn"
```

Config files are stored in `~/.qzcli/`: `config.json`, `.cookie`, `resources.json`, `jobs.json`.
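
The priority order above amounts to "the first non-empty source wins". The following is a minimal bash sketch of that lookup, for illustration only: the function name `first_nonempty` and the sample values are hypothetical, and this is not qzcli's actual implementation.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the first non-empty argument.
# Arguments are listed from highest to lowest priority.
first_nonempty() {
  local candidate
  for candidate in "$@"; do
    if [ -n "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1  # every source was empty; a real tool would prompt here
}

# Example: no CLI value, env var set, config file value ignored.
cli_user=""
env_user="alice"
config_user="bob"
resolved_user="$(first_nonempty "$cli_user" "$env_user" "$config_user")"
echo "$resolved_user"
```

Here the environment value is used because the higher-priority CLI value is empty, and the config file value is never consulted.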

---
Quick Start

```bash
# 1. Login
qzcli login

# 2. Discover and cache workspaces/compute groups (run once, re-run after joining new workspaces)
qzcli res -u

# 3. Check available nodes
qzcli avail

# 4. List running jobs
qzcli ls -c -r
```

---
Authentication

```bash
# Interactive login
qzcli login

# With credentials
qzcli login -u YOUR_USERNAME -p 'YOUR_PASSWORD'

# Read password from stdin (for scripts)
echo 'YOUR_PASSWORD' | qzcli login -u YOUR_USERNAME --password-stdin

# Check current cookie
qzcli cookie --show

# Clear cookie
qzcli cookie --clear
```

**Note:** `qzcli avail` auto-refreshes the cookie if it expires and credentials are configured.
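
For unattended scripts, the `--password-stdin` form can read from a file instead of `echo`, which keeps the password out of shell history and out of the script itself. A sketch, assuming the password is stored on a single line in a file only you can read; the function name `qzcli_login_from_file` and any file path you pass to it are hypothetical choices, not something qzcli defines.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the documented login command.
# $1 = username, $2 = path to a file holding only the password.
qzcli_login_from_file() {
  local username="$1" password_file="$2"
  if [ ! -r "$password_file" ]; then
    echo "cannot read password file: $password_file" >&2
    return 1
  fi
  # Redirect the file to stdin so the password never appears in argv.
  qzcli login -u "$username" --password-stdin < "$password_file"
}
```

A typical call would be `qzcli_login_from_file YOUR_USERNAME /path/to/password_file`, after restricting the file with `chmod 600`.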

---
Resource Discovery

```bash
# List cached workspaces
qzcli res --list

# Refresh all workspace resource cache (run this first!)
qzcli res -u

# Refresh a specific workspace
qzcli res -w MY_WORKSPACE -u

# Set a human-readable alias for a workspace
qzcli res -w ws-xxxxxxxx --name "My Workspace"
```

---
Check Available Nodes

```bash
# All workspaces
qzcli avail

# Including low-priority task nodes (slower but more accurate)
qzcli avail --lp

# Specific workspace
qzcli avail -w MY_WORKSPACE

# Find compute groups with N free nodes
qzcli avail -n 4

# Export IDs for scripting
qzcli avail -n 4 -e

# Show idle node names
qzcli avail -w MY_WORKSPACE -v
```

---
Job Submission

Interactive (recommended for first-time use)

```bash
# Full interactive selection: workspace → project → compute group → spec
qzcli create -i

# Interactive for a specific workspace only
qzcli create -i -w "My Workspace"
```

The TUI shows GPU type, availability, and spec status at each level. Press `Enter` or `→` to go deeper, `←` to go back.

Non-interactive

```bash
# Using names (resolved from qzcli res cache)
qzcli create \
  --name "my-training-job" \
  --command "bash /path/to/train.sh" \
  --workspace "My Workspace" \
  --compute-group "My Compute Group" \
  --image YOUR_REGISTRY/team/image:tag \
  --instances 4 \
  --priority 10

# Using IDs directly
qzcli create \
  --name "my-job" \
  --command "bash /path/to/train.sh" \
  --workspace ws-YOUR_WORKSPACE_ID \
  --compute-group lcg-YOUR_LCG_ID \
  --spec YOUR_SPEC_ID \
  --image YOUR_REGISTRY/team/image:tag \
  --instances 4
```

**Key parameters:**

| Parameter | Default | Description |
|-----------|---------|-------------|
| `--name` / `-n` | required | Job name |
| `--command` / `-c` | required | Command to run |
| `--workspace` / `-w` | | Workspace name or ID (`ws-...`) |
| `--compute-group` / `-g` | auto | Compute group name or ID (`lcg-...`) |
| `--spec` / `-s` | auto | Resource spec ID |
| `--image` / `-m` | | Docker image |
| `--instances` | 1 | Number of instances |
| `--shm` | 1200 | Shared memory (GiB) |
| `--priority` | 10 | Priority (1–10) |
| `--dry-run` | | Preview only, don't submit |
| `--json` | | JSON output for scripting |

```bash
# Preview before submitting
qzcli create --name test --command "echo hi" --workspace "My Workspace" \
  --image YOUR_IMAGE --dry-run
```

Env-var passthrough (for existing submission scripts)

```bash
# Pass vars directly; do NOT use "export VAR; bash script.sh"
WORKSPACE_ID="ws-YOUR_WORKSPACE_ID" \
LCG_ID="lcg-YOUR_LCG_ID" \
SPEC_ID="YOUR_SPEC_ID" \
CHECKPOINT_DIR="/path/to/checkpoint" \
bash YOUR_SUBMIT_SCRIPT.sh
```
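
The `VAR=value command` prefix form places a variable in that one command's environment without exporting it into the calling shell. A small sketch of the mechanism, using a stand-in `sh -c` child instead of a real submission script; the variable name `DEMO_SPEC_ID` and its value are made up for the example.

```bash
#!/usr/bin/env bash
# Prefix assignment: the variable is visible to the child process only.
child_view="$(DEMO_SPEC_ID="spec-123" sh -c 'printf "%s" "${DEMO_SPEC_ID:-unset}"')"

# The calling shell itself is left untouched
# (assuming DEMO_SPEC_ID was not already set).
parent_view="${DEMO_SPEC_ID:-unset}"

echo "child sees:  $child_view"
echo "parent sees: $parent_view"
```

One practical consequence is that values from one submission cannot leak into the next command run from the same shell.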

HPC / CPU jobs (Slurm)

```bash
qzcli hpc \
  --name "my-cpu-job" \
  --workspace ws-YOUR_WORKSPACE_ID \
  --compute-group lcg-YOUR_LCG_ID \
  --predef-quota-id YOUR_QUOTA_ID \
  --cpu 55 --mem-gi 300 --instances 30 \
  --image YOUR_REGISTRY/team/cpu-image:tag \
  --entrypoint "cd /path/to/dir && bash run.sh"
```

Batch Submission

```bash
# Submit from config file
qzcli batch batch_config.json --delay 3

# Preview all jobs
qzcli batch batch_config.json --dry-run

# Continue on error
qzcli batch batch_config.json --continue-on-error
```

**Config format** (`batch_config.json`):

```json
{
  "defaults": {
    "workspace": "ws-YOUR_WORKSPACE_ID",
    "compute_group": "lcg-YOUR_LCG_ID",
    "spec": "YOUR_SPEC_ID",
    "image": "YOUR_REGISTRY/team/image:tag",
    "instances": 4,
    "priority": 10
  },
  "matrix": {
    "checkpoint": ["/path/to/ckpt1", "/path/to/ckpt2"],
    "step": [50000, 100000]
  },
  "name_template": "eval-{checkpoint_basename}-step{step}",
  "command_template": "bash eval.sh --checkpoint {checkpoint} --step {step}"
}
```

Matrix keys are expanded as a Cartesian product (2×2 = 4 jobs above). Use `{key_basename}` for path basenames.
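
To make the expansion concrete, the bash sketch below enumerates the job names the example config would produce, assuming `{checkpoint_basename}` behaves like `basename` applied to the path. It does not call qzcli; the function name `expand_matrix` is hypothetical, and the order in which jobs are generated is an assumption.

```bash
#!/usr/bin/env bash
# Values copied from the "matrix" section of the example config.
# Space-separated lists are fine here because the paths contain no spaces.
checkpoints="/path/to/ckpt1 /path/to/ckpt2"
steps="50000 100000"

# Hypothetical helper: print one job name per (checkpoint, step) pair,
# following name_template "eval-{checkpoint_basename}-step{step}".
expand_matrix() {
  local checkpoint step
  for checkpoint in $checkpoints; do
    for step in $steps; do
      printf 'eval-%s-step%s\n' "$(basename "$checkpoint")" "$step"
    done
  done
}

expand_matrix
```

This prints four names, one for each combination of the two checkpoints and the two steps.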

Shell loop (alternative)

```bash
for step in 040000 050000 060000; do
  qzcli create \
    --name "eval-step${step}" \
    --command "bash eval.sh --step $step" \
    --workspace "My Workspace" \
    --compute-group "My Compute Group" \
    --instances 4
  sleep 3
done
```
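
The loop hard-codes zero-padded step values. If the steps come from arithmetic instead, `printf '%06d'` can add the padding. A sketch, assuming six-digit step labels as in the example; `pad_step` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: left-pad a decimal step number to six digits.
# Pass the number without leading zeros; printf would otherwise
# interpret a value such as 040000 as octal.
pad_step() {
  printf '%06d\n' "$1"
}

# Generate the same three step labels used in the loop.
for raw_step in 40000 50000 60000; do
  pad_step "$raw_step"
done
```

The padded value can then be substituted for `$step` in the `qzcli create` call.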

Job Management

```bash
# List jobs
qzcli ls -c -w MY_WORKSPACE         # specific workspace
qzcli ls -c --all-ws                # all workspaces
qzcli ls -c -w MY_WORKSPACE -r      # running only
qzcli ls -c -w MY_WORKSPACE -n 50   # show 50

# Stop a job
qzcli stop JOB_ID

# Job status / details
qzcli status JOB_ID

# Watch all running jobs (refresh every 10s)
qzcli watch -i 10

# Workspace view with GPU utilization
qzcli ws
qzcli ws -a                         # all projects
qzcli ws -p "My Project"
```

---
Troubleshooting

| Problem | Cause | Fix |
|---|---|---|
| Cookie expired | Session gap | Re-run |
| | Stale cache | Run |
| No resources in | Cache empty | Run |
| | Not installed | |
| Spec not in workspace | ID mismatch | Match spec ID to the correct workspace |
| Silent job failure | Script | Check job logs directly |
| zsh glob errors | Remote shell is zsh | Wrap commands in |