qzcli

qzcli: Qizhi (启智) Platform Job Management

A kubectl/docker-style CLI for managing GPU compute jobs on the Qizhi (启智) platform.

Installation

```bash
pip install rich requests prompt_toolkit mcp
git clone https://github.com/tianyilt/qzcli_tool
cd qzcli_tool && pip install -e .
```

MCP Integration (optional)

To use qzcli as an MCP tool directly from Claude Code or Codex:

```bash
# Claude Code
claude mcp add qzcli -- qzcli-mcp

# Codex
codex mcp add qzcli -- qzcli-mcp
```

---

Configuration

Credentials are read in this priority order:
`CLI args > --password-stdin > env vars > QZCLI_ENV_FILE (.env) > ~/.qzcli/config.json > interactive input`

```bash
# Option A: env file (recommended)
mkdir -p ~/.qzcli
cat > ~/.qzcli/.env <<'EOF'
QZCLI_USERNAME="your_username"
QZCLI_PASSWORD="your_password"
EOF

# Option B: environment variables
export QZCLI_USERNAME="your_username"
export QZCLI_PASSWORD="your_password"
export QZCLI_API_URL="https://qz.yourorg.edu.cn"
```

Config files are stored in `~/.qzcli/`: `config.json`, `.cookie`, `resources.json`, `jobs.json`.
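
The priority order amounts to "the first source that supplies a value wins". The sketch below illustrates that rule only; it is not qzcli's own code, and the helper `first_nonempty` and the sample usernames are hypothetical.

```bash
# Hypothetical helper: print the first non-empty argument, fail if there is none.
first_nonempty() {
  for candidate in "$@"; do
    if [ -n "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Illustrative values: nothing passed on the command line, a username in the
# environment, and a different one in the config file.
cli_user=""
env_user="alice_from_env"
config_user="bob_from_config"

# Arguments are listed from highest to lowest priority.
resolved_user=$(first_nonempty "$cli_user" "$env_user" "$config_user")
echo "resolved username: $resolved_user"
```

Because the command-line value is empty here, the environment value is used and the config-file value is ignored.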

---

Quick Start

```bash
# 1. Login
qzcli login

# 2. Discover and cache workspaces/compute groups (run once, re-run after joining new workspaces)
qzcli res -u

# 3. Check available nodes
qzcli avail

# 4. List running jobs
qzcli ls -c -r
```

---

Authentication

```bash
# Interactive login
qzcli login

# With credentials
qzcli login -u YOUR_USERNAME -p 'YOUR_PASSWORD'

# Read password from stdin (for scripts)
echo 'YOUR_PASSWORD' | qzcli login -u YOUR_USERNAME --password-stdin

# Check current cookie
qzcli cookie --show

# Clear cookie
qzcli cookie --clear
```

**Note:** `qzcli avail` auto-refreshes the cookie if it expires and credentials are configured.

---

Resource Discovery

```bash
# List cached workspaces
qzcli res --list

# Refresh all workspace resource cache (run this first!)
qzcli res -u

# Refresh a specific workspace
qzcli res -w MY_WORKSPACE -u

# Set a human-readable alias for a workspace
qzcli res -w ws-xxxxxxxx --name "My Workspace"
```

---

Check Available Nodes

```bash
# All workspaces
qzcli avail

# Including low-priority task nodes (slower but more accurate)
qzcli avail --lp

# Specific workspace
qzcli avail -w MY_WORKSPACE

# Find compute groups with N free nodes
qzcli avail -n 4

# Export IDs for scripting
qzcli avail -n 4 -e

# Show idle node names
qzcli avail -w MY_WORKSPACE -v
```

---

Job Submission

Interactive (recommended for first-time use)

```bash
# Full interactive selection: workspace → project → compute group → spec
qzcli create -i

# Interactive for a specific workspace only
qzcli create -i -w "My Workspace"
```

The TUI shows GPU type, availability, and spec status at each level. Press `Enter` or `→` to go deeper and `←` to go back.

Non-interactive

```bash
# Using names (resolved from qzcli res cache)
qzcli create \
  --name "my-training-job" \
  --command "bash /path/to/train.sh" \
  --workspace "My Workspace" \
  --compute-group "My Compute Group" \
  --image YOUR_REGISTRY/team/image:tag \
  --instances 4 \
  --priority 10

# Using IDs directly
qzcli create \
  --name "my-job" \
  --command "bash /path/to/train.sh" \
  --workspace ws-YOUR_WORKSPACE_ID \
  --compute-group lcg-YOUR_LCG_ID \
  --spec YOUR_SPEC_ID \
  --image YOUR_REGISTRY/team/image:tag \
  --instances 4
```

**Key parameters:**

| Parameter | Default | Description |
|-----------|---------|-------------|
| `--name` / `-n` | required | Job name |
| `--command` / `-c` | required | Command to run |
| `--workspace` / `-w` | | Workspace name or ID (`ws-...`) |
| `--compute-group` / `-g` | auto | Compute group name or ID (`lcg-...`) |
| `--spec` / `-s` | auto | Resource spec ID |
| `--image` / `-m` | | Docker image |
| `--instances` | 1 | Number of instances |
| `--shm` | 1200 | Shared memory (GiB) |
| `--priority` | 10 | Priority (1–10) |
| `--dry-run` | | Preview only, don't submit |
| `--json` | | JSON output for scripting |

```bash
# Preview before submitting
qzcli create --name test --command "echo hi" --workspace "My Workspace" \
  --image YOUR_IMAGE --dry-run
```

Env-var passthrough (for existing submission scripts)

```bash
# Pass vars directly; do NOT use "export VAR; bash script.sh"
WORKSPACE_ID="ws-YOUR_WORKSPACE_ID" \
LCG_ID="lcg-YOUR_LCG_ID" \
SPEC_ID="YOUR_SPEC_ID" \
CHECKPOINT_DIR="/path/to/checkpoint" \
bash YOUR_SUBMIT_SCRIPT.sh
```
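
A small sketch of why the prefix form is the tidier way to pass these variables: assignments written in front of a command are visible to that one process and leave the calling shell unchanged. The temporary script below is a hypothetical stand-in for a real submission script and only prints what it receives.

```bash
# Throwaway stand-in for a submission script; it only reports what it sees.
script=$(mktemp)
cat > "$script" <<'EOF'
echo "WORKSPACE_ID=${WORKSPACE_ID:-unset} LCG_ID=${LCG_ID:-unset}"
EOF

# Prefix assignments are placed in the environment of this one child process.
seen_by_script=$(WORKSPACE_ID="ws-YOUR_WORKSPACE_ID" LCG_ID="lcg-YOUR_LCG_ID" bash "$script")
echo "$seen_by_script"

# The calling shell itself is left untouched by the prefix assignments.
seen_by_parent="${WORKSPACE_ID:-unset}"
echo "in parent: $seen_by_parent"

rm -f "$script"
```

With `export`, the values would instead stay set in the calling shell and leak into every later command in the same session.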

HPC / CPU jobs (Slurm)

```bash
qzcli hpc \
  --name "my-cpu-job" \
  --workspace ws-YOUR_WORKSPACE_ID \
  --compute-group lcg-YOUR_LCG_ID \
  --predef-quota-id YOUR_QUOTA_ID \
  --cpu 55 --mem-gi 300 --instances 30 \
  --image YOUR_REGISTRY/team/cpu-image:tag \
  --entrypoint "cd /path/to/dir && bash run.sh"
```

Batch Submission

```bash
# Submit from config file
qzcli batch batch_config.json --delay 3

# Preview all jobs
qzcli batch batch_config.json --dry-run

# Continue on error
qzcli batch batch_config.json --continue-on-error
```

**Config format** (`batch_config.json`):

```json
{
  "defaults": {
    "workspace": "ws-YOUR_WORKSPACE_ID",
    "compute_group": "lcg-YOUR_LCG_ID",
    "spec": "YOUR_SPEC_ID",
    "image": "YOUR_REGISTRY/team/image:tag",
    "instances": 4,
    "priority": 10
  },
  "matrix": {
    "checkpoint": ["/path/to/ckpt1", "/path/to/ckpt2"],
    "step": [50000, 100000]
  },
  "name_template": "eval-{checkpoint_basename}-step{step}",
  "command_template": "bash eval.sh --checkpoint {checkpoint} --step {step}"
}
```

The matrix keys are combined as a Cartesian product (2×2 = 4 jobs above). Use `{key_basename}` for path basenames.
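
To make the expansion concrete, the sketch below reproduces the 2×2 product for the example config with plain nested loops. It only illustrates the templating idea; the order in which qzcli enumerates the combinations is an assumption, and `job_names` is a hypothetical variable used to collect the results.

```bash
# Values copied from the "matrix" section of the example config.
job_names=""
for checkpoint in /path/to/ckpt1 /path/to/ckpt2; do
  for step in 50000 100000; do
    # basename strips the directory part, mirroring {checkpoint_basename}.
    name="eval-$(basename "$checkpoint")-step${step}"
    command="bash eval.sh --checkpoint ${checkpoint} --step ${step}"
    echo "${name}: ${command}"
    job_names="${job_names}${name} "
  done
done
```

Running `qzcli batch batch_config.json --dry-run` remains the authoritative way to see the jobs a config would produce.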

Shell loop (alternative)

```bash
for step in 040000 050000 060000; do
  qzcli create \
    --name "eval-step${step}" \
    --command "bash eval.sh --step $step" \
    --workspace "My Workspace" \
    --compute-group "My Compute Group" \
    --instances 4
  sleep 3
done
```
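
For longer sweeps, the zero-padded step list can be generated instead of typed out. This is a minimal sketch that assumes six-digit step names at intervals of 10000, as in the loop above; the variable `steps` is hypothetical.

```bash
# Build "040000 050000 060000" from a numeric range.
steps=""
s=40000
while [ "$s" -le 60000 ]; do
  # %06d left-pads the decimal number with zeros to six digits.
  steps="${steps}$(printf '%06d' "$s") "
  s=$((s + 10000))
done
echo "$steps"
```

The result can then drive the loop as `for step in $steps; do ... done` in bash.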

Job Management

```bash
# List jobs
qzcli ls -c -w MY_WORKSPACE          # specific workspace
qzcli ls -c --all-ws                 # all workspaces
qzcli ls -c -w MY_WORKSPACE -r       # running only
qzcli ls -c -w MY_WORKSPACE -n 50    # show 50

# Stop a job
qzcli stop JOB_ID

# Job status / details
qzcli status JOB_ID

# Watch all running jobs (refresh every 10s)
qzcli watch -i 10

# Workspace view with GPU utilization
qzcli ws
qzcli ws -a                # all projects
qzcli ws -p "My Project"
```

---

Troubleshooting

| Problem | Cause | Fix |
|---------|-------|-----|
| Cookie expired | Session gap | Re-run `qzcli login` |
| `未找到名称为 'xxx' 的工作空间` (no workspace named 'xxx' found) | Stale cache | Run `qzcli res -u` |
| No resources in `create -i` | Cache empty | Run `qzcli login && qzcli res -u` |
| `qzcli-mcp` not found | Not installed | `cd qzcli_tool && pip install -e .` |
| Spec not in workspace | ID mismatch | Match spec ID to the correct workspace |
| Silent job failure | Script exits via `sys.exit(0)` | Check job logs directly |
| zsh glob errors | Remote shell is zsh | Wrap commands in `bash -c` or use Python |
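
The `bash -c` workaround can be sketched as follows, assuming the glob errors are zsh's "no matches found" failures on patterns it tries to expand itself. Single quotes keep the calling shell from touching the pattern, so bash performs the expansion. The temporary directory and `.pt` file names are hypothetical.

```bash
# Scratch directory with two hypothetical checkpoint files.
pattern_dir=$(mktemp -d)
touch "$pattern_dir/step1.pt" "$pattern_dir/step2.pt"

# The single-quoted string reaches bash unchanged; "$1" is the directory
# passed after the placeholder "_" (which becomes $0 inside bash -c).
matched=$(bash -c 'ls "$1"/*.pt | wc -l | tr -d " "' _ "$pattern_dir")
echo "bash matched $matched files"

rm -rf "$pattern_dir"
```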