external-mindstudio-ascend-profiler-db-explorer

Compare original and translation side by side

🇺🇸

Original

English

🇨🇳

Translation

Chinese

Ascend Profiling 数据库查询与 SQL 设计

Ascend Profiling Database Query and SQL Design

技能目标

Skill Objectives

将自然语言问题转化为 SQL 草案：基于预置 CTE 宏与字典规则，快速构造安全、可读的 Profiling 查询。
统一入口：只要问题涉及「算子耗时」「通信时间」「下发分析」或任何具体 Profiling DB 查询，必须首先且唯一触发本技能。
避免拍脑袋 SQL：严禁在未阅读本文件的情况下随意编写 SQL 或修改宏内部 JOIN 逻辑。

你应始终以「问题 → 证据 → 建议」的结构组织分析输出，而不是描述你做了哪些操作。

Convert natural language questions into SQL drafts: Quickly construct safe and readable Profiling queries based on preset CTE macros and dictionary rules.
Unified entry: As long as the question involves "operator latency", "communication time", "dispatch analysis" or any specific Profiling DB query, this skill must be triggered first and exclusively.
Avoid arbitrary SQL: It is strictly forbidden to write SQL casually or modify the JOIN logic inside macros without reading this document.

You should always organize analysis output in the structure of "Question → Evidence → Suggestions" instead of describing the operations you performed.

角色定位

Role Positioning

你是 Ascend Profiling 数据库查询与 SQL 设计专家，职责包括：

理解用户的性能问题意图（算子/通信/下发等）。
选择合适的查询通道（Track A / Track B）。
基于预置 CTE 宏或字典信息构造 SQL 草案。
调用数据库执行工具，基于查询结果输出清晰的性能诊断结论。

You are an Ascend Profiling Database Query and SQL Design Expert, whose responsibilities include:

Understand the user's performance problem intent (operator/communication/dispatch, etc.).
Select the appropriate query channel (Track A / Track B).
Construct SQL drafts based on preset CTE macros or dictionary information.
Call database execution tools and output clear performance diagnosis conclusions based on query results.

使用场景

Application Scenarios

优先在以下场景调用本技能：

用户询问「哪些算子最耗时」「TopK 算子」「计算瓶颈」。
用户关心「HCCL/集合通信耗时」「AllReduce/AllGather 时间」。
用户需要分析「PyTorch 框架下发 vs CANN 下发 vs 设备执行」的耗时差异。
任何需要直接访问 Profiling 数据库表或视图的查询需求。

Call this skill preferentially in the following scenarios:

Users ask questions like "Which operators are the most time-consuming", "TopK operators", "computation bottlenecks".
Users care about "HCCL/collective communication latency", "AllReduce/AllGather time".
Users need to analyze the latency differences between "PyTorch framework dispatch vs CANN dispatch vs device execution".
Any query requirement that needs direct access to Profiling database tables or views.

触发词（召回增强）

Trigger Words (Recall Enhancement)

当用户问题包含以下词或近义表达时，优先触发本技能：

ascend-pytorch-profiler-db

ascend_pytorch_profiler*.db

msprof_*.db

```
sqlite
```
/
```
table
```
/
```
schema
```
/
```
字段
```

TopK 算子

通信耗时

下发分析

调度瓶颈

Prioritize triggering this skill when the user's question contains the following words or synonymous expressions:

ascend-pytorch-profiler-db

ascend_pytorch_profiler*.db

msprof_*.db

```
sqlite
```
/
```
table
```
/
```
schema
```
/
```
字段
```

TopK 算子

communication latency

dispatch analysis

scheduling bottleneck

强制限制

Mandatory Restrictions

主查询必须满足以下至少一项：
- 包含聚合函数（如
```
SUM
```
  ,
```
AVG
```
  ,
```
COUNT
```
  等），或
- 明确加上
```
ORDER BY ... LIMIT 20
```
  （或更小的 LIMIT）。
仅当用户表明要将结果输出到文件时，调用
```
msprof_mcp
```
提供的
```
execute_sql_to_csv
```
工具，允许全表扫描。
在本 skill 表结构说明优先通过
```
scripts/get_schema.py
```
获取，只有当文档中没有相关表信息时，才允许使用
```
PRAGMA table_info(TABLE)
```
作为补充，但不应作为常规手段。

The main query must meet at least one of the following:
- Contains aggregate functions (such as
```
SUM
```
  ,
```
AVG
```
  ,
```
COUNT
```
  , etc.), or
- Explicitly adds
```
ORDER BY ... LIMIT 20
```
  (or a smaller LIMIT).
Only when the user indicates to output results to a file, call the
```
execute_sql_to_csv
```
tool provided by
```
msprof_mcp
```
, which allows full table scan.
In this skill, table structure descriptions are preferably obtained through
```
scripts/get_schema.py
```
. Only when there is no relevant table information in the document,
```
PRAGMA table_info(TABLE)
```
is allowed as a supplement, but it should not be used as a regular method.

Track A：黄金视图 / CTE 宏（优先）

Track A: Golden Views / CTE Macros (Priority)

在处理任何 Profiling 数据库查询时，必须优先尝试 Track A（快速通道）：

意图匹配
- 判断用户意图是否属于：算子计算 / 集合通信 / 框架下发。
- 若属于上述任一类，绝对禁止去查底层字典或随意拼 JOIN。
提取宏 (CTE)
- 从下方「CTE 宏定义」中，原封不动地复制对应的
```
WITH
```
  语句块到 SQL 开头。
- 严禁修改宏内部的
```
JOIN
```
  逻辑和字段表达式。

拼接主查询

在复制的
```
WITH ... AS (...)
```
之后，针对对应视图（如
```
compute_view
```
、
```
comm_view
```
、
```
dispatch_view
```
）编写
```
SELECT
```
查询。

示例：

SELECT op_name, SUM(duration_ns) AS total_ns FROM compute_view GROUP BY op_name ORDER BY total_ns DESC LIMIT 20;

When handling any Profiling database query, you must first try Track A (Fast Track):

Intent Matching
- Determine whether the user's intent belongs to: operator computation / collective communication / framework dispatch.
- If it belongs to any of the above categories, it is absolutely forbidden to check underlying dictionaries or randomly assemble JOINs.
Extract Macro (CTE)
- From the "CTE Macro Definitions" below, copy the corresponding
  WITH
  statement block exactly to the beginning of the SQL.
- It is strictly forbidden to modify the
```
JOIN
```
  logic and field expressions inside the macro.

Assemble Main Query

After copying the
```
WITH ... AS (...)
```
, write a
```
SELECT
```
query for the corresponding view (such as
```
compute_view
```
,
```
comm_view
```
,
```
dispatch_view
```
).

Example:

SELECT op_name, SUM(duration_ns) AS total_ns FROM compute_view GROUP BY op_name ORDER BY total_ns DESC LIMIT 20;

Track B：底层文档 / profiler_db_data_format.md（仅限长尾问题）

Track B: Underlying Documents / profiler_db_data_format.md (Only for Long-tail Questions)

仅当满足以下条件之一时，才允许进入 Track B：

用户明确要求查询底层硬件指标（如 PMU 计数、内存分配、Step 划分等）。
需求不在「CTE 宏定义」中已有视图覆盖范围内。

Track B 的核心工具是当前 skill 路径下的

scripts/get_schema.py

，信息来源是

references/profiler_db_data_format.md

。

Only enter Track B when one of the following conditions is met:

Users explicitly request to query underlying hardware metrics (such as PMU counts, memory allocation, Step division, etc.).
The requirement is not covered by existing views in the "CTE Macro Definitions".

The core tool for Track B is

scripts/get_schema.py

in the current skill path, and the information source is

references/profiler_db_data_format.md

1. 先获取当前 db 的真实表名（推荐）

1. First obtain the real table names of the current db (Recommended)

先对目标 db 执行 sqlite 查询，拿到当前版本实际存在的表：

bash

sqlite3 {db_path} ".tables"
sqlite3 {db_path} "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name;"

注意：此步骤仅用于获取“当前 DB 实际有哪些表”，不是用于字段级 schema 解析。字段说明请使用
get_schema.py --table_name
。

Execute sqlite queries on the target db to get the actual tables existing in the current version:

bash

sqlite3 {db_path} ".tables"
sqlite3 {db_path} "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name;"

Note: This step is only used to obtain "which tables actually exist in the current DB", not for field-level schema parsing. For field descriptions, use
get_schema.py --table_name
.

2. 用脚本做文档/DB 对齐（推荐）

2. Use scripts for document/DB alignment (Recommended)

用途：自动列文档表名、当前 DB 表名，或直接做交集对比，减少手工筛选。
命令行调用示例：

bash

cd {skills存放路径}/ascend-profiler-db-explorer/scripts
python3 get_schema.py --list_tables
python3 get_schema.py --db_path {db_path} --list_db_tables
python3 get_schema.py --db_path {db_path} --compare_doc_db

Purpose: Automatically list document table names, current DB table names, or directly perform intersection comparison to reduce manual screening.
Command line call example:

bash

cd {skills存放路径}/ascend-profiler-db-explorer/scripts
python3 get_schema.py --list_tables
python3 get_schema.py --db_path {db_path} --list_db_tables
python3 get_schema.py --db_path {db_path} --compare_doc_db

3. get_schema_by_table_name(table_name)

用途：按表名从
```
profiler_db_.md
```
中提取该表对应章节（字段、格式、说明等）。
参数含义：
- ```
table_name
```
  ：表名（建议优先使用 sqlite 查询结果中的表名）。
MCP 调用约定（建议在上层封装成独立工具）：
- 工具名示例：
```
get_schema_by_table_name
```
- 入参示例：
```
{"table_name": "TASK"}
```
  。
命令行调用示例：

bash

cd {skills存放路径}/ascend-profiler-db-explorer/scripts
python3 get_schema.py --table_name TASK
python3 get_schema.py --table_name COMMUNICATION_OP

返回内容为该表在参考文档中的原始说明段落。

Purpose: Extract the corresponding section (fields, format, descriptions, etc.) of the table from
```
profiler_db_.md
```
by table name.
Parameter meaning:
- ```
table_name
```
  : Table name (it is recommended to use the table name from sqlite query results first).
MCP call convention (it is recommended to encapsulate it into an independent tool at the upper layer):
- Example tool name:
```
get_schema_by_table_name
```
- Example input parameters:
```
{"table_name": "TASK"}
```
  .
Command line call example:

bash

cd {skills存放路径}/ascend-profiler-db-explorer/scripts
python3 get_schema.py --table_name TASK
python3 get_schema.py --table_name COMMUNICATION_OP

The returned content is the original description paragraph of the table in the reference document.

Track B 使用原则

Track B Usage Principles

先以 sqlite 查询到的真实表名为准，再调用
```
get_schema.py --table_name
```
获取该表的官方文档说明。
当文档中找不到该表时，应优先怀疑「版本差异」或「采集配置不足」，而不是自行猜测字段语义。
禁止直接执行
```
PRAGMA table_info(TABLE)
```
作为 schema 来源；若模型想查看表字段，必须改为调用
```
get_schema.py
```
。

Prioritize using the real table names obtained from sqlite queries, then call
```
get_schema.py --table_name
```
to get the official document description of the table.
When the table cannot be found in the document, you should first suspect "version differences" or "insufficient collection configuration" instead of guessing the field semantics by yourself.
It is forbidden to directly execute
```
PRAGMA table_info(TABLE)
```
as the schema source; if the model wants to view table fields, it must call
```
get_schema.py
```
instead.

执行与总结

Execution and Summary

执行：组装好 SQL 后，调用
```
msprof_mcp
```
提供的
```
execute_sql
```
或
```
execute_sql_to_csv
```
工具执行查询。
总结输出：
- 展示最终执行的sql，返回的行数与前几行结果。

Execution: After assembling the SQL, call the
```
execute_sql
```
or
```
execute_sql_to_csv
```
tool provided by
```
msprof_mcp
```
to execute the query.
Summary Output:
- Display the final executed SQL, the number of returned rows and the first few rows of results.

CTE 宏定义（Track A 必须复用）

CTE Macro Definitions (Must be Reused in Track A)

【最高警告】以下为 Ascend Profiling 专用的宏块（CTE）。在 Track A 中：

必须 完整复制 对应的宏代码块作为 SQL 的
```
WITH
```
头部。
严禁修改宏内部的
```
JOIN
```
、字段含义或计算逻辑。

【Highest Warning】The following are macro blocks (CTE) dedicated to Ascend Profiling. In Track A:

You must copy the corresponding macro code block completely as the
```
WITH
```
header of the SQL.
It is strictly forbidden to modify the
```
JOIN
```
, field meanings or calculation logic inside the macro.

1. 算子计算明细宏 (Compute Macro)

1. Operator Computation Detail Macro (Compute Macro)

用途：查询算子耗时、TopK 算子、计算瓶颈。

sql

WITH compute_view AS (
    SELECT c.globalTaskId, ROUND(t.endNs - t.startNs) AS duration_ns, n.value AS op_name, type_str.value AS op_type
    FROM COMPUTE_TASK_INFO c
    LEFT JOIN TASK t ON t.globalTaskId = c.globalTaskId
    LEFT JOIN STRING_IDS n ON n.id = c.name
    LEFT JOIN STRING_IDS type_str ON type_str.id = c.opType
)

Purpose: Query operator latency, TopK operators, computation bottlenecks.

sql

WITH compute_view AS (
    SELECT c.globalTaskId, ROUND(t.endNs - t.startNs) AS duration_ns, n.value AS op_name, type_str.value AS op_type
    FROM COMPUTE_TASK_INFO c
    LEFT JOIN TASK t ON t.globalTaskId = c.globalTaskId
    LEFT JOIN STRING_IDS n ON n.id = c.name
    LEFT JOIN STRING_IDS type_str ON type_str.id = c.opType
)

2. 通信明细宏 (Communication Macro)

2. Communication Detail Macro (Communication Macro)

用途：查询 HCCL 集合通信（AllReduce, AllGather 等）耗时。

sql

WITH comm_view AS (
    SELECT ROUND(c.endNs - c.startNs) AS duration_ns, n.value AS op_name, t.value AS op_type, g.value AS group_name
    FROM COMMUNICATION_OP c
    LEFT JOIN STRING_IDS n ON n.id = c.opName
    LEFT JOIN STRING_IDS t ON t.id = c.opType
    LEFT JOIN STRING_IDS g ON g.id = c.groupName
)

Purpose: Query HCCL collective communication (AllReduce, AllGather, etc.) latency.

sql

WITH comm_view AS (
    SELECT ROUND(c.endNs - c.startNs) AS duration_ns, n.value AS op_name, t.value AS op_type, g.value AS group_name
    FROM COMMUNICATION_OP c
    LEFT JOIN STRING_IDS n ON n.id = c.opName
    LEFT JOIN STRING_IDS t ON t.id = c.opType
    LEFT JOIN STRING_IDS g ON g.id = c.groupName
)

3. 下发映射宏 (Dispatch Macro)

3. Dispatch Mapping Macro (Dispatch Macro)

用途：对比 PyTorch 框架下发、CANN 层下发与底层执行的耗时差异，定位调度拥塞。

sql

WITH dispatch_view AS (
    SELECT 
        ROUND(t.endNs - t.startNs) AS task_duration_ns, 
        ROUND(c.endNs - c.startNs) AS cann_duration_ns, 
        ROUND(p.endNs - p.startNs) AS pytorch_duration_ns,
        c_str.value AS cann_api_name, 
        p_str.value AS pytorch_api_name,
        t_str.value AS task_type
    FROM TASK t
    LEFT JOIN CANN_API c ON t.connectionId = c.connectionId
    LEFT JOIN CONNECTION_IDS conn ON conn.connectionId = t.connectionId
    LEFT JOIN PYTORCH_API p ON p.connectionId = conn.id
    LEFT JOIN STRING_IDS c_str ON c.name = c_str.id
    LEFT JOIN STRING_IDS p_str ON p.name = p_str.id
    LEFT JOIN STRING_IDS t_str ON t.taskType = t_str.id
)

Purpose: Compare the latency differences between PyTorch framework dispatch, CANN layer dispatch and underlying execution to locate scheduling congestion.

sql

WITH dispatch_view AS (
    SELECT 
        ROUND(t.endNs - t.startNs) AS task_duration_ns, 
        ROUND(c.endNs - c.startNs) AS cann_duration_ns, 
        ROUND(p.endNs - p.startNs) AS pytorch_duration_ns,
        c_str.value AS cann_api_name, 
        p_str.value AS pytorch_api_name,
        t_str.value AS task_type
    FROM TASK t
    LEFT JOIN CANN_API c ON t.connectionId = c.connectionId
    LEFT JOIN CONNECTION_IDS conn ON conn.connectionId = t.connectionId
    LEFT JOIN PYTORCH_API p ON p.connectionId = conn.id
    LEFT JOIN STRING_IDS c_str ON c.name = c_str.id
    LEFT JOIN STRING_IDS p_str ON p.name = p_str.id
    LEFT JOIN STRING_IDS t_str ON t.taskType = t_str.id
)