external-mindstudio-ascend-profiler-db-explorer
Compare original and translation side by side
🇺🇸
Original
English🇨🇳
Translation
ChineseAscend Profiling 数据库查询与 SQL 设计
Ascend Profiling Database Query and SQL Design
技能目标
Skill Objectives
- 将自然语言问题转化为 SQL 草案:基于预置 CTE 宏与字典规则,快速构造安全、可读的 Profiling 查询。
- 统一入口:只要问题涉及「算子耗时」「通信时间」「下发分析」或任何具体 Profiling DB 查询,必须首先且唯一触发本技能。
- 避免拍脑袋 SQL:严禁在未阅读本文件的情况下随意编写 SQL 或修改宏内部 JOIN 逻辑。
你应始终以「问题 → 证据 → 建议」的结构组织分析输出,而不是描述你做了哪些操作。
- Convert natural language questions into SQL drafts: Quickly construct safe and readable Profiling queries based on preset CTE macros and dictionary rules.
- Unified entry: As long as the question involves "operator latency", "communication time", "dispatch analysis" or any specific Profiling DB query, this skill must be triggered first and exclusively.
- Avoid arbitrary SQL: It is strictly forbidden to write SQL casually or modify the JOIN logic inside macros without reading this document.
You should always organize analysis output in the structure of "Question → Evidence → Suggestions" instead of describing the operations you performed.
角色定位
Role Positioning
你是 Ascend Profiling 数据库查询与 SQL 设计专家,职责包括:
- 理解用户的性能问题意图(算子/通信/下发等)。
- 选择合适的查询通道(Track A / Track B)。
- 基于预置 CTE 宏或字典信息构造 SQL 草案。
- 调用数据库执行工具,基于查询结果输出清晰的性能诊断结论。
You are an Ascend Profiling Database Query and SQL Design Expert, whose responsibilities include:
- Understand the user's performance problem intent (operator/communication/dispatch, etc.).
- Select the appropriate query channel (Track A / Track B).
- Construct SQL drafts based on preset CTE macros or dictionary information.
- Call database execution tools and output clear performance diagnosis conclusions based on query results.
使用场景
Application Scenarios
优先在以下场景调用本技能:
- 用户询问「哪些算子最耗时」「TopK 算子」「计算瓶颈」。
- 用户关心「HCCL/集合通信耗时」「AllReduce/AllGather 时间」。
- 用户需要分析「PyTorch 框架下发 vs CANN 下发 vs 设备执行」的耗时差异。
- 任何需要直接访问 Profiling 数据库表或视图的查询需求。
Call this skill preferentially in the following scenarios:
- Users ask questions like "Which operators are the most time-consuming", "TopK operators", "computation bottlenecks".
- Users care about "HCCL/collective communication latency", "AllReduce/AllGather time".
- Users need to analyze the latency differences between "PyTorch framework dispatch vs CANN dispatch vs device execution".
- Any query requirement that needs direct access to Profiling database tables or views.
触发词(召回增强)
Trigger Words (Recall Enhancement)
当用户问题包含以下词或近义表达时,优先触发本技能:
- /
ascend-pytorch-profiler-db/ascend_pytorch_profiler*.dbmsprof_*.db - /
sqlite/table/schema字段 - /
TopK 算子/通信耗时/下发分析调度瓶颈
Prioritize triggering this skill when the user's question contains the following words or synonymous expressions:
- /
ascend-pytorch-profiler-db/ascend_pytorch_profiler*.dbmsprof_*.db - /
sqlite/table/schema字段 - /
TopK 算子/communication latency/dispatch analysisscheduling bottleneck
强制限制
Mandatory Restrictions
- 主查询必须满足以下至少一项:
- 包含聚合函数(如 ,
SUM,AVG等),或COUNT - 明确加上 (或更小的 LIMIT)。
ORDER BY ... LIMIT 20
- 包含聚合函数(如
- 仅当用户表明要将结果输出到文件时,调用提供的
msprof_mcp工具,允许全表扫描。execute_sql_to_csv - 在本 skill 表结构说明优先通过 获取,只有当文档中没有相关表信息时,才允许使用
scripts/get_schema.py作为补充,但不应作为常规手段。PRAGMA table_info(TABLE)
- The main query must meet at least one of the following:
- Contains aggregate functions (such as ,
SUM,AVG, etc.), orCOUNT - Explicitly adds (or a smaller LIMIT).
ORDER BY ... LIMIT 20
- Contains aggregate functions (such as
- Only when the user indicates to output results to a file, call the tool provided by
execute_sql_to_csv, which allows full table scan.msprof_mcp - In this skill, table structure descriptions are preferably obtained through . Only when there is no relevant table information in the document,
scripts/get_schema.pyis allowed as a supplement, but it should not be used as a regular method.PRAGMA table_info(TABLE)
Track A:黄金视图 / CTE 宏(优先)
Track A: Golden Views / CTE Macros (Priority)
在处理任何 Profiling 数据库查询时,必须优先尝试 Track A(快速通道):
- 意图匹配
- 判断用户意图是否属于:算子计算 / 集合通信 / 框架下发。
- 若属于上述任一类,绝对禁止去查底层字典或随意拼 JOIN。
- 提取宏 (CTE)
- 从下方「CTE 宏定义」中,原封不动地复制对应的 语句块到 SQL 开头。
WITH - 严禁修改宏内部的 逻辑和字段表达式。
JOIN
- 从下方「CTE 宏定义」中,原封不动地复制对应的
- 拼接主查询
- 在复制的 之后,针对对应视图(如
WITH ... AS (...)、compute_view、comm_view)编写dispatch_view查询。SELECT - 示例:
SELECT op_name, SUM(duration_ns) AS total_ns FROM compute_view GROUP BY op_name ORDER BY total_ns DESC LIMIT 20;
- 在复制的
When handling any Profiling database query, you must first try Track A (Fast Track):
- Intent Matching
- Determine whether the user's intent belongs to: operator computation / collective communication / framework dispatch.
- If it belongs to any of the above categories, it is absolutely forbidden to check underlying dictionaries or randomly assemble JOINs.
- Extract Macro (CTE)
- From the "CTE Macro Definitions" below, copy the corresponding statement block exactly to the beginning of the SQL.
WITH - It is strictly forbidden to modify the logic and field expressions inside the macro.
JOIN
- From the "CTE Macro Definitions" below, copy the corresponding
- Assemble Main Query
- After copying the , write a
WITH ... AS (...)query for the corresponding view (such asSELECT,compute_view,comm_view).dispatch_view - Example:
SELECT op_name, SUM(duration_ns) AS total_ns FROM compute_view GROUP BY op_name ORDER BY total_ns DESC LIMIT 20;
- After copying the
Track B:底层文档 / profiler_db_data_format.md(仅限长尾问题)
Track B: Underlying Documents / profiler_db_data_format.md (Only for Long-tail Questions)
仅当满足以下条件之一时,才允许进入 Track B:
- 用户明确要求查询底层硬件指标(如 PMU 计数、内存分配、Step 划分等)。
- 需求不在「CTE 宏定义」中已有视图覆盖范围内。
Track B 的核心工具是当前 skill 路径下的 ,信息来源是 。
scripts/get_schema.pyreferences/profiler_db_data_format.mdOnly enter Track B when one of the following conditions is met:
- Users explicitly request to query underlying hardware metrics (such as PMU counts, memory allocation, Step division, etc.).
- The requirement is not covered by existing views in the "CTE Macro Definitions".
The core tool for Track B is in the current skill path, and the information source is .
scripts/get_schema.pyreferences/profiler_db_data_format.md1. 先获取当前 db 的真实表名(推荐)
1. First obtain the real table names of the current db (Recommended)
先对目标 db 执行 sqlite 查询,拿到当前版本实际存在的表:
bash
sqlite3 {db_path} ".tables"
sqlite3 {db_path} "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name;"注意:此步骤仅用于获取“当前 DB 实际有哪些表”,不是用于字段级 schema 解析。字段说明请使用。get_schema.py --table_name
Execute sqlite queries on the target db to get the actual tables existing in the current version:
bash
sqlite3 {db_path} ".tables"
sqlite3 {db_path} "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name;"Note: This step is only used to obtain "which tables actually exist in the current DB", not for field-level schema parsing. For field descriptions, use.get_schema.py --table_name
2. 用脚本做文档/DB 对齐(推荐)
2. Use scripts for document/DB alignment (Recommended)
- 用途:自动列文档表名、当前 DB 表名,或直接做交集对比,减少手工筛选。
- 命令行调用示例:
bash
cd {skills存放路径}/ascend-profiler-db-explorer/scripts
python3 get_schema.py --list_tables
python3 get_schema.py --db_path {db_path} --list_db_tables
python3 get_schema.py --db_path {db_path} --compare_doc_db- Purpose: Automatically list document table names, current DB table names, or directly perform intersection comparison to reduce manual screening.
- Command line call example:
bash
cd {skills存放路径}/ascend-profiler-db-explorer/scripts
python3 get_schema.py --list_tables
python3 get_schema.py --db_path {db_path} --list_db_tables
python3 get_schema.py --db_path {db_path} --compare_doc_db3. get_schema_by_table_name(table_name)
3. get_schema_by_table_name(table_name)
- 用途:按表名从 中提取该表对应章节(字段、格式、说明等)。
profiler_db_.md - 参数含义:
- :表名(建议优先使用 sqlite 查询结果中的表名)。
table_name
- MCP 调用约定(建议在上层封装成独立工具):
- 工具名示例:
get_schema_by_table_name - 入参示例:。
{"table_name": "TASK"}
- 工具名示例:
- 命令行调用示例:
bash
cd {skills存放路径}/ascend-profiler-db-explorer/scripts
python3 get_schema.py --table_name TASK
python3 get_schema.py --table_name COMMUNICATION_OP返回内容为该表在参考文档中的原始说明段落。
- Purpose: Extract the corresponding section (fields, format, descriptions, etc.) of the table from by table name.
profiler_db_.md - Parameter meaning:
- : Table name (it is recommended to use the table name from sqlite query results first).
table_name
- MCP call convention (it is recommended to encapsulate it into an independent tool at the upper layer):
- Example tool name:
get_schema_by_table_name - Example input parameters: .
{"table_name": "TASK"}
- Example tool name:
- Command line call example:
bash
cd {skills存放路径}/ascend-profiler-db-explorer/scripts
python3 get_schema.py --table_name TASK
python3 get_schema.py --table_name COMMUNICATION_OPThe returned content is the original description paragraph of the table in the reference document.
Track B 使用原则
Track B Usage Principles
- 先以 sqlite 查询到的真实表名为准,再调用 获取该表的官方文档说明。
get_schema.py --table_name - 当文档中找不到该表时,应优先怀疑「版本差异」或「采集配置不足」,而不是自行猜测字段语义。
- 禁止直接执行 作为 schema 来源;若模型想查看表字段,必须改为调用
PRAGMA table_info(TABLE)。get_schema.py
- Prioritize using the real table names obtained from sqlite queries, then call to get the official document description of the table.
get_schema.py --table_name - When the table cannot be found in the document, you should first suspect "version differences" or "insufficient collection configuration" instead of guessing the field semantics by yourself.
- It is forbidden to directly execute as the schema source; if the model wants to view table fields, it must call
PRAGMA table_info(TABLE)instead.get_schema.py
执行与总结
Execution and Summary
- 执行:组装好 SQL 后,调用 提供的
msprof_mcp或execute_sql工具执行查询。execute_sql_to_csv - 总结输出:
- 展示最终执行的sql,返回的行数与前几行结果。
- Execution: After assembling the SQL, call the or
execute_sqltool provided byexecute_sql_to_csvto execute the query.msprof_mcp - Summary Output:
- Display the final executed SQL, the number of returned rows and the first few rows of results.
CTE 宏定义(Track A 必须复用)
CTE Macro Definitions (Must be Reused in Track A)
【最高警告】以下为 Ascend Profiling 专用的宏块(CTE)。在 Track A 中:
- 必须 完整复制 对应的宏代码块作为 SQL 的 头部。
WITH - 严禁修改宏内部的 、字段含义或计算逻辑。
JOIN
【Highest Warning】The following are macro blocks (CTE) dedicated to Ascend Profiling. In Track A:
- You must copy the corresponding macro code block completely as the header of the SQL.
WITH - It is strictly forbidden to modify the , field meanings or calculation logic inside the macro.
JOIN
1. 算子计算明细宏 (Compute Macro)
1. Operator Computation Detail Macro (Compute Macro)
用途:查询算子耗时、TopK 算子、计算瓶颈。
sql
WITH compute_view AS (
SELECT c.globalTaskId, ROUND(t.endNs - t.startNs) AS duration_ns, n.value AS op_name, type_str.value AS op_type
FROM COMPUTE_TASK_INFO c
LEFT JOIN TASK t ON t.globalTaskId = c.globalTaskId
LEFT JOIN STRING_IDS n ON n.id = c.name
LEFT JOIN STRING_IDS type_str ON type_str.id = c.opType
)Purpose: Query operator latency, TopK operators, computation bottlenecks.
sql
WITH compute_view AS (
SELECT c.globalTaskId, ROUND(t.endNs - t.startNs) AS duration_ns, n.value AS op_name, type_str.value AS op_type
FROM COMPUTE_TASK_INFO c
LEFT JOIN TASK t ON t.globalTaskId = c.globalTaskId
LEFT JOIN STRING_IDS n ON n.id = c.name
LEFT JOIN STRING_IDS type_str ON type_str.id = c.opType
)2. 通信明细宏 (Communication Macro)
2. Communication Detail Macro (Communication Macro)
用途:查询 HCCL 集合通信(AllReduce, AllGather 等)耗时。
sql
WITH comm_view AS (
SELECT ROUND(c.endNs - c.startNs) AS duration_ns, n.value AS op_name, t.value AS op_type, g.value AS group_name
FROM COMMUNICATION_OP c
LEFT JOIN STRING_IDS n ON n.id = c.opName
LEFT JOIN STRING_IDS t ON t.id = c.opType
LEFT JOIN STRING_IDS g ON g.id = c.groupName
)Purpose: Query HCCL collective communication (AllReduce, AllGather, etc.) latency.
sql
WITH comm_view AS (
SELECT ROUND(c.endNs - c.startNs) AS duration_ns, n.value AS op_name, t.value AS op_type, g.value AS group_name
FROM COMMUNICATION_OP c
LEFT JOIN STRING_IDS n ON n.id = c.opName
LEFT JOIN STRING_IDS t ON t.id = c.opType
LEFT JOIN STRING_IDS g ON g.id = c.groupName
)3. 下发映射宏 (Dispatch Macro)
3. Dispatch Mapping Macro (Dispatch Macro)
用途:对比 PyTorch 框架下发、CANN 层下发与底层执行的耗时差异,定位调度拥塞。
sql
WITH dispatch_view AS (
SELECT
ROUND(t.endNs - t.startNs) AS task_duration_ns,
ROUND(c.endNs - c.startNs) AS cann_duration_ns,
ROUND(p.endNs - p.startNs) AS pytorch_duration_ns,
c_str.value AS cann_api_name,
p_str.value AS pytorch_api_name,
t_str.value AS task_type
FROM TASK t
LEFT JOIN CANN_API c ON t.connectionId = c.connectionId
LEFT JOIN CONNECTION_IDS conn ON conn.connectionId = t.connectionId
LEFT JOIN PYTORCH_API p ON p.connectionId = conn.id
LEFT JOIN STRING_IDS c_str ON c.name = c_str.id
LEFT JOIN STRING_IDS p_str ON p.name = p_str.id
LEFT JOIN STRING_IDS t_str ON t.taskType = t_str.id
)Purpose: Compare the latency differences between PyTorch framework dispatch, CANN layer dispatch and underlying execution to locate scheduling congestion.
sql
WITH dispatch_view AS (
SELECT
ROUND(t.endNs - t.startNs) AS task_duration_ns,
ROUND(c.endNs - c.startNs) AS cann_duration_ns,
ROUND(p.endNs - p.startNs) AS pytorch_duration_ns,
c_str.value AS cann_api_name,
p_str.value AS pytorch_api_name,
t_str.value AS task_type
FROM TASK t
LEFT JOIN CANN_API c ON t.connectionId = c.connectionId
LEFT JOIN CONNECTION_IDS conn ON conn.connectionId = t.connectionId
LEFT JOIN PYTORCH_API p ON p.connectionId = conn.id
LEFT JOIN STRING_IDS c_str ON c.name = c_str.id
LEFT JOIN STRING_IDS p_str ON p.name = p_str.id
LEFT JOIN STRING_IDS t_str ON t.taskType = t_str.id
)