mcp-developer-analysis
Compare original and translation side by side
🇺🇸
Original
English🇨🇳
Translation
ChineseMaterialize Developer Analysis
Materialize 开发者分析
Analyze a Materialize environment by querying system catalog tables via the MCP
Developer endpoint ( tool), and produce a structured
report with health status, performance findings, and optimization recommendations.
query_system_catalog通过MCP Developer端点(工具)查询系统目录表,分析Materialize环境,并生成包含健康状态、性能发现及优化建议的结构化报告。
query_system_catalogDiscovering Tables and Columns
发现表与列
Do NOT guess column names. Before writing queries, check if the
schema is available by running:
mz_ontologysql
SHOW TABLES FROM mz_ontology请勿猜测列名。 在编写查询语句前,先执行以下命令检查模式是否可用:
mz_ontologysql
SHOW TABLES FROM mz_ontologyIf mz_ontology is available
若mz_ontology可用
Use it to discover the correct tables, columns, join paths, and ID types:
| Table | What it tells you |
|---|---|
| What catalog entities exist and which |
| Relationships between entities (foreign keys, metrics, etc.). |
| Column names, types, and descriptions for each entity. |
| Typed ID domains (CatalogItemId, ReplicaId, etc.). |
Example queries:
sql
-- Find the right table for an entity
SELECT name, relation, description
FROM mz_ontology.mz_ontology_entity_types
WHERE name LIKE '%source%'
-- Find join paths between entities
SELECT name, source_entity, target_entity, properties, description
FROM mz_ontology.mz_ontology_link_types
WHERE source_entity = 'source' OR target_entity = 'source'
-- Find columns for a table
SELECT column_name, semantic_type, description
FROM mz_ontology.mz_ontology_properties
WHERE entity_type = 'source_status'使用它来查找正确的表、列、关联路径及ID类型:
| 表 | 作用 |
|---|---|
| 存在哪些目录实体,以及它们映射到哪个 |
| 实体间的关系(外键、指标等)。 |
| 每个实体的列名、类型及描述。 |
| 带类型的ID域(CatalogItemId、ReplicaId等)。 |
示例查询:
sql
-- 查找实体对应的表
SELECT name, relation, description
FROM mz_ontology.mz_ontology_entity_types
WHERE name LIKE '%source%'
-- 查找实体间的关联路径
SELECT name, source_entity, target_entity, properties, description
FROM mz_ontology.mz_ontology_link_types
WHERE source_entity = 'source' OR target_entity = 'source'
-- 查找表的列
SELECT column_name, semantic_type, description
FROM mz_ontology.mz_ontology_properties
WHERE entity_type = 'source_status'If mz_ontology is NOT available
若mz_ontology不可用
Use to verify column names before querying.
Refer to the Critical Rules below for known pitfalls.
SHOW COLUMNS FROM <schema>.<table>在查询前使用验证列名。参考下方关键规则中的常见误区。
SHOW COLUMNS FROM <schema>.<table>Critical Rules
关键规则
Known column name pitfalls
已知列名误区
Even with the ontology, be aware of these common mistakes:
| Wrong | Correct | Table |
|---|---|---|
| | |
| Must JOIN through | |
When unsure, run to verify.
SHOW COLUMNS FROM <schema>.<table>即使有本体可用,也要注意以下常见错误:
| 错误写法 | 正确写法 | 表 |
|---|---|---|
| | |
| 必须通过 | |
不确定时,执行进行验证。
SHOW COLUMNS FROM <schema>.<table>Do NOT query mz_dataflow_arrangement_sizes
请勿查询mz_dataflow_arrangement_sizes
NEVER query via MCP. It
fails for two reasons:
mz_introspection.mz_dataflow_arrangement_sizes- Cluster-scoped: Only returns data for the session's current cluster,
and the MCP tool does not support to switch clusters.
SET cluster = ... - Type mismatch: Its column is
id, notuint8liketext. JOINs fail withmz_catalog.mz_objects.id.operator does not exist: uint8 = text
Instead, use:
- — memory/CPU/disk percentage
mz_internal.mz_cluster_replica_utilization - — raw memory bytes
mz_internal.mz_cluster_replica_metrics - — find MVs/indexes that can be removed
mz_internal.mz_index_advice
绝对不要通过MCP查询,该查询会失败,原因有二:
mz_introspection.mz_dataflow_arrangement_sizes- 集群范围限制:仅返回会话当前集群的数据,且MCP工具不支持切换集群。
SET cluster = ... - 类型不匹配:其列类型为
id,而非uint8的mz_catalog.mz_objects.id类型,关联时会报错text。operator does not exist: uint8 = text
请改用以下视图:
- —— 内存/CPU/磁盘使用率
mz_internal.mz_cluster_replica_utilization - —— 原始内存字节数
mz_internal.mz_cluster_replica_metrics - —— 查找可移除的MV/索引
mz_internal.mz_index_advice
Type casting notes
类型转换说明
Some views use for ID columns instead of .
Avoid JOINing views with views unless you
cast IDs explicitly. The views all use IDs and are safe
to JOIN with .
mz_introspectionuint8textmz_introspectionmz_catalogmz_internaltextmz_catalog部分视图的ID列使用而非。除非显式转换ID类型,否则请勿将视图与视图关联。视图的ID均为类型,可安全与视图关联。
mz_introspectionuint8textmz_introspectionmz_catalogmz_internaltextmz_catalogDiscovering tables without the ontology
无本体时发现表的方法
If is not available, use these fallbacks:
mz_ontology- to check a table's columns
SHOW COLUMNS FROM <schema>.<table> - Do NOT use — this only shows tables, not views. Most system catalog objects are views.
SHOW TABLES FROM mz_internal LIKE '...'
若不可用,可使用以下替代方案:
mz_ontology- 使用查看表的列
SHOW COLUMNS FROM <schema>.<table> - 请勿使用—— 该命令仅显示表,不显示视图。大多数系统目录对象是视图。
SHOW TABLES FROM mz_internal LIKE '...'
Workflow Overview
工作流程概述
- Connect — Verify the MCP Developer tools are available
- Discover — Use the ontology + catalog queries to inventory all deployed objects
- Analyze — Assess performance metrics: freshness, hydration, memory, utilization
- Report — Produce a structured markdown report with findings and recommendations
- 连接 —— 验证MCP Developer工具是否可用
- 发现 —— 使用本体和目录查询清点所有已部署对象
- 分析 —— 评估性能指标:新鲜度、水化状态、内存、利用率
- 报告 —— 生成包含发现结果和建议的结构化Markdown报告
Step 1: Verify MCP Connection
步骤1:验证MCP连接
Confirm you have access to the tool. Run a quick test:
query_system_catalogquery_system_catalog: SELECT mz_version()If this fails, check:
- The MCP server is configured in
.mcp.json - The feature flag is enabled on the environment
enable_mcp_developer - Your authentication credentials are valid
确认你有权访问工具。执行快速测试:
query_system_catalogquery_system_catalog: SELECT mz_version()若测试失败,请检查:
- MCP服务器已在中配置
.mcp.json - 环境已启用功能标志
enable_mcp_developer - 你的认证凭证有效
Running Queries
执行查询
All queries are run via the MCP tool. Constraints:
query_system_catalog- One statement per call (no semicolons)
- Read-only: SELECT, SHOW, EXPLAIN only
- System tables only: no access to user tables
- No statements
SET
When filtering out system schemas, always exclude: , ,
, , and .
mz_catalogmz_internalpg_cataloginformation_schemamz_introspection所有查询均通过 MCP工具执行,约束如下:
query_system_catalog- 每次调用仅执行一条语句(无分号)
- 只读:仅支持SELECT、SHOW、EXPLAIN
- 仅系统表:无法访问用户表
- 不支持语句
SET
过滤系统模式时,需排除:、、、和。
mz_catalogmz_internalpg_cataloginformation_schemamz_introspectionStep 2: Discover — Inventory the Environment
步骤2:发现 —— 清点环境
Run the discovery queries to understand what is deployed. See
for the full query set. The discovery phase covers:
references/queries.md执行发现查询以了解已部署的对象。完整查询集请参考。发现阶段涵盖:
references/queries.mdEnvironment Overview
环境概述
- Materialize version ()
SELECT mz_version() - Clusters and replicas — names, sizes, and replica counts
- Schemas in use
- Materialize版本()
SELECT mz_version() - 集群和副本——名称、规格、副本数量
- 正在使用的模式
Deployed Objects Inventory
已部署对象清单
- Sources: type (Kafka, Postgres, MySQL, Webhook, etc.), cluster assignment, status
- Materialized Views: cluster assignment, indexes, dependencies
- Views: (non-materialized) and their usage patterns
- Sinks: type, destination, cluster assignment
- Indexes: what they're on, cluster assignment
- Connections: external system connections configured
Build a mental model of the data pipeline: what data comes in (sources), how it's
transformed (views/MVs), and where it goes out (sinks).
- 数据源(Sources):类型(Kafka、Postgres、MySQL、Webhook等)、集群分配、状态
- 物化视图(Materialized Views):集群分配、索引、依赖关系
- 视图(Views):(非物化)及其使用模式
- 数据输出(Sinks):类型、目标、集群分配
- 索引(Indexes):关联对象、集群分配
- 连接(Connections):已配置的外部系统连接
构建数据管道的心智模型:数据从何处进入(数据源)、如何转换(视图/MV)、流向何处(数据输出)。
Object Definitions
对象定义
Retrieve SQL definitions for materialized views, views, indexes, and sources
using . This is critical for optimization analysis —
the SQL definitions tell you how things are computed:
references/queries.md- Join patterns and join order
- Filter predicates (or lack thereof — missing temporal filters are a common issue)
- Aggregation strategies
- Whether MVs duplicate logic that could be shared
使用中的查询获取物化视图、视图、索引和数据源的SQL定义。这对优化分析至关重要——SQL定义能告诉你计算逻辑:
references/queries.md- 关联模式和关联顺序
- 过滤条件(或缺失的条件——缺少时间过滤是常见问题)
- 聚合策略
- MV是否存在可共享的重复逻辑
Step 3: Analyze — Performance and Resource Metrics
步骤3:分析 —— 性能与资源指标
Freshness (Lag Analysis)
新鲜度(延迟分析)
Query for per-object lag.
mz_internal.mz_materialization_lagImportant: The column is of type (a uint8),
not a standard timestamp. You cannot subtract it from directly. Cast to
get a human-readable time: .
write_frontiermz_timestampnow()to_timestamp(write_frontier::bigint / 1000)查询获取每个对象的延迟。
mz_internal.mz_materialization_lag重要提示:列类型为(uint8),而非标准时间戳。无法直接与相减,需转换为人类可读时间:。
write_frontiermz_timestampnow()to_timestamp(write_frontier::bigint / 1000)Hydration Status
水化状态
Query to check whether all dataflows are
hydrated. Non-hydrated objects after initial startup may indicate resource
pressure or configuration issues.
mz_internal.mz_hydration_statuses查询检查所有数据流是否已完成水化。初始启动后仍未水化的对象可能表明资源压力或配置问题。
mz_internal.mz_hydration_statusesMemory and Resource Consumption
内存与资源消耗
- for memory/CPU percentage per replica
mz_internal.mz_cluster_replica_utilization - for raw memory metrics
mz_internal.mz_cluster_replica_metrics - to identify which MVs/indexes can be optimized
mz_internal.mz_index_advice
- —— 每个副本的内存/CPU使用率
mz_internal.mz_cluster_replica_utilization - —— 原始内存指标
mz_internal.mz_cluster_replica_metrics - —— 识别可优化的MV/索引
mz_internal.mz_index_advice
Index Advice
索引建议
Query — Materialize's built-in advisor. Hint types:
mz_internal.mz_index_advice- "keep" — the MV/index is needed as-is
- "drop unless queried directly" — no structural dependencies; only useful for direct SELECT queries
- "convert to a view" — MV can be dematerialized entirely, saving all arrangement memory
- "convert to a view with an index" — convert MV to a view but keep its indexes
- "add index" — object would benefit from an index
查询——Materialize内置的建议工具。提示类型:
mz_internal.mz_index_advice- "keep" —— MV/索引需保持现状
- "drop unless queried directly" —— 无结构依赖;仅对直接SELECT查询有用
- "convert to a view" —— MV可完全转为非物化视图,节省所有排列内存
- "convert to a view with an index" —— 将MV转为视图但保留其索引
- "add index" —— 对象可从新增索引中获益
Cost Analysis (optional)
成本分析(可选)
Query to get credit rates per cluster
size, then calculate: .
mz_catalog.mz_cluster_replica_sizescredits_per_hour * replication_factor * 730 hours/monthWhen writing recommendations, always quantify the credit impact.
查询获取各集群规格的信用费率,然后计算:。
mz_catalog.mz_cluster_replica_sizescredits_per_hour * replication_factor * 730 hours/month撰写建议时,务必量化信用影响。
Object Dependencies
对象依赖关系
Query to understand the dependency graph.
mz_internal.mz_object_dependencies查询了解依赖关系图。
mz_internal.mz_object_dependenciesStep 4: Report — Generate the Analysis
步骤4:报告 —— 生成分析结果
Produce a structured markdown report:
markdown
undefined生成结构化Markdown报告:
markdown
undefinedEnvironment Analysis
环境分析
Date: <date>
Materialize Version: <version>
日期:<日期>
Materialize版本:<版本>
Executive Summary
执行摘要
<2-3 paragraph high-level assessment>
<2-3段的高层评估>
Cluster Topology
集群拓扑
| Cluster | Size | Replicas | Credits/Hr | Monthly Credits | Utilization |
| 集群 | 规格 | 副本数 | 每小时信用 | 月度信用 | 利用率 |
Deployed Objects
已部署对象
Sources (<count>)
数据源(<数量>)
Materialized Views (<count>)
物化视图(<数量>)
Sinks (<count>)
数据输出(<数量>)
Indexes (<count>)
索引(<数量>)
Performance Analysis
性能分析
Freshness
新鲜度
Hydration
水化状态
Cluster Utilization
集群利用率
Cost Analysis (if requested)
成本分析(若需)
Index Advice Summary
索引建议摘要
SQL-Level Analysis
SQL层面分析
Materialized View Definitions
物化视图定义
Index Analysis
索引分析
Optimization Recommendations
优化建议
<numbered list with specific SQL for each>
```
<带具体SQL命令的编号列表>
undefinedWriting Recommendations
撰写建议
Always include specific SQL commands. For example:
Good:
Recommendation: Dematerializeto save memory.my_schema.unused_mvsqlSHOW CREATE MATERIALIZED VIEW my_schema.unused_mv; DROP MATERIALIZED VIEW my_schema.unused_mv; CREATE VIEW my_schema.unused_mv AS <definition>;
Bad:
Recommendation: Consider dematerializing.my_schema.unused_mv
务必包含具体SQL命令。例如:
正确示例:
推荐: 将转为非物化视图以节省内存。my_schema.unused_mvsqlSHOW CREATE MATERIALIZED VIEW my_schema.unused_mv; DROP MATERIALIZED VIEW my_schema.unused_mv; CREATE VIEW my_schema.unused_mv AS <定义>;
错误示例:
推荐: 考虑将转为非物化视图。my_schema.unused_mv
Troubleshooting Runbooks
故障排查手册
For focused troubleshooting, use these diagnostic paths.
Always end with specific SQL commands to fix the issue.
针对聚焦式故障排查,使用以下诊断路径。最终务必提供具体的修复SQL命令。
"Why is my materialized view stale?"
"我的物化视图为什么过期?"
Diagnostic steps:
- Check for the MV's lag
mz_internal.mz_materialization_lag - Check — is it hydrated?
mz_internal.mz_hydration_statuses - Check — is the replica healthy?
mz_internal.mz_cluster_replica_statuses - Check — memory pressure causing restarts?
mz_internal.mz_cluster_replica_utilization - Check — upstream source errors?
mz_internal.mz_source_statuses
Common fixes:
If the cluster is overloaded (high memory/CPU):
sql
-- Option A: Scale up the cluster
ALTER CLUSTER <cluster_name> SET (SIZE = '<next_size_up>');
-- Option B: Move the MV to a different cluster
SHOW CREATE MATERIALIZED VIEW <schema>.<mv_name>;
DROP MATERIALIZED VIEW <schema>.<mv_name>;
CREATE MATERIALIZED VIEW <schema>.<mv_name> IN CLUSTER <new_cluster> AS <definition>;If the MV is not hydrated and the cluster recently restarted:
Hydration will complete on its own once the cluster stabilizes. If it persists,
the cluster likely needs more memory.
If an upstream source has errors:
sql
SELECT name, status, error, last_status_change_at
FROM mz_internal.mz_source_statuses
WHERE status != 'running'Fix the upstream source issue first — MV freshness depends on source health.
诊断步骤:
- 检查中该MV的延迟
mz_internal.mz_materialization_lag - 检查——是否已水化?
mz_internal.mz_hydration_statuses - 检查——副本是否健康?
mz_internal.mz_cluster_replica_statuses - 检查——内存压力是否导致重启?
mz_internal.mz_cluster_replica_utilization - 检查——上游数据源是否有错误?
mz_internal.mz_source_statuses
常见修复方案:
若集群过载(内存/CPU使用率高):
sql
-- 方案A:升级集群规格
ALTER CLUSTER <cluster_name> SET (SIZE = '<更高规格>');
-- 方案B:将MV迁移至其他集群
SHOW CREATE MATERIALIZED VIEW <schema>.<mv_name>;
DROP MATERIALIZED VIEW <schema>.<mv_name>;
CREATE MATERIALIZED VIEW <schema>.<mv_name> IN CLUSTER <新集群> AS <定义>;若MV未水化且集群最近重启过:
集群稳定后水化会自动完成。若问题持续,集群可能需要更多内存。
若上游数据源有错误:
sql
SELECT name, status, error, last_status_change_at
FROM mz_internal.mz_source_statuses
WHERE status != 'running'先修复上游数据源问题——MV的新鲜度依赖于数据源健康。
"Why is my cluster running out of memory?"
"我的集群为什么内存不足?"
Diagnostic steps:
- Check for memory percentage
mz_internal.mz_cluster_replica_utilization - Check for MVs that can be dematerialized
mz_internal.mz_index_advice - Check MV definitions for missing temporal filters
- Check for redundant indexes
Common fixes:
Dematerialize MVs that don't need to be materialized:
sql
SELECT o.name, o.type, sc.name AS schema_name, ia.hint, ia.details
FROM mz_internal.mz_index_advice ia
JOIN mz_catalog.mz_objects o ON ia.object_id = o.id
JOIN mz_catalog.mz_schemas sc ON o.schema_id = sc.id
WHERE ia.hint = 'convert to a view'
-- For each candidate:
SHOW CREATE MATERIALIZED VIEW <schema>.<mv_name>;
DROP MATERIALIZED VIEW <schema>.<mv_name>;
CREATE VIEW <schema>.<mv_name> AS <definition>;Drop unused indexes:
sql
SELECT o.name, o.type, sc.name AS schema_name, ia.hint, ia.details
FROM mz_internal.mz_index_advice ia
JOIN mz_catalog.mz_objects o ON ia.object_id = o.id
JOIN mz_catalog.mz_schemas sc ON o.schema_id = sc.id
WHERE ia.hint = 'drop unless queried directly'
-- Verify with the user before dropping
DROP INDEX <schema>.<index_name>;Scale up the cluster:
sql
ALTER CLUSTER <cluster_name> SET (SIZE = '<next_size_up>');诊断步骤:
- 检查的内存使用率
mz_internal.mz_cluster_replica_utilization - 检查中可转为非物化视图的MV
mz_internal.mz_index_advice - 检查MV定义是否缺少时间过滤
- 检查是否存在冗余索引
常见修复方案:
将无需物化的MV转为非物化视图:
sql
SELECT o.name, o.type, sc.name AS schema_name, ia.hint, ia.details
FROM mz_internal.mz_index_advice ia
JOIN mz_catalog.mz_objects o ON ia.object_id = o.id
JOIN mz_catalog.mz_schemas sc ON o.schema_id = sc.id
WHERE ia.hint = 'convert to a view'
-- 对每个候选对象执行:
SHOW CREATE MATERIALIZED VIEW <schema>.<mv_name>;
DROP MATERIALIZED VIEW <schema>.<mv_name>;
CREATE VIEW <schema>.<mv_name> AS <定义>;删除未使用的索引:
sql
SELECT o.name, o.type, sc.name AS schema_name, ia.hint, ia.details
FROM mz_internal.mz_index_advice ia
JOIN mz_catalog.mz_objects o ON ia.object_id = o.id
JOIN mz_catalog.mz_schemas sc ON o.schema_id = sc.id
WHERE ia.hint = 'drop unless queried directly'
-- 先与用户确认再删除
DROP INDEX <schema>.<index_name>;升级集群规格:
sql
ALTER CLUSTER <cluster_name> SET (SIZE = '<更高规格>');"Are my sources healthy? / Has my source finished snapshotting?"
"我的数据源是否健康?/ 我的数据源快照是否完成?"
Diagnostic steps:
- Check for source errors
mz_internal.mz_source_statuses - Check for ingestion progress
mz_internal.mz_source_statistics - Check for end-to-end lag
mz_internal.mz_materialization_lag
Common fixes:
If a source is stalled or erroring:
sql
SELECT name, status, error
FROM mz_internal.mz_source_statuses
WHERE status != 'running'
-- If the connection credentials are wrong:
ALTER SECRET <secret_name> AS '<new_value>';If is , the source is still loading its initial
snapshot. This is normal for large sources — wait for it to complete.
snapshot_committedfalse诊断步骤:
- 检查中的数据源错误
mz_internal.mz_source_statuses - 检查中的 ingestion 进度
mz_internal.mz_source_statistics - 检查中的端到端延迟
mz_internal.mz_materialization_lag
常见修复方案:
若数据源停滞或报错:
sql
SELECT name, status, error
FROM mz_internal.mz_source_statuses
WHERE status != 'running'
-- 若连接凭证错误:
ALTER SECRET <secret_name> AS '<新值>';若为,说明数据源仍在加载初始快照。对于大型数据源,这是正常现象——等待其完成即可。
snapshot_committedfalse"What's the health of my environment?"
"我的环境健康状况如何?"
Run these checks in order:
- — all replicas ready?
mz_internal.mz_cluster_replica_statuses - — all sources running?
mz_internal.mz_source_statuses - — all sinks running?
mz_internal.mz_sink_statuses - — resource pressure?
mz_internal.mz_cluster_replica_utilization
按以下顺序执行检查:
- —— 所有副本是否就绪?
mz_internal.mz_cluster_replica_statuses - —— 所有数据源是否正常运行?
mz_internal.mz_source_statuses - —— 所有数据输出是否正常运行?
mz_internal.mz_sink_statuses - —— 是否存在资源压力?
mz_internal.mz_cluster_replica_utilization
"What can I optimize to save costs?"
"我可以优化哪些内容以节省成本?"
- Check for optimization candidates
mz_internal.mz_index_advice - Check cluster utilization — are clusters over-provisioned?
- Check credit rates from
mz_catalog.mz_cluster_replica_sizes
- 检查中的优化候选对象
mz_internal.mz_index_advice - 检查集群利用率——集群是否过度配置?
- 从中查看信用费率
mz_catalog.mz_cluster_replica_sizes
Notes
注意事项
- All queries run through the MCP Developer endpoint are read-only.
- Query results are limited to system catalog tables — no access to user data.
- Access is governed by RBAC — you only see objects your credentials have access to.
- Freshness numbers are point-in-time snapshots. Re-run to check if lag is stable or growing.
- 通过MCP Developer端点执行的所有查询均为只读。
- 查询结果仅限于系统目录表——无法访问用户数据。
- 访问权限受RBAC管控——仅能查看凭证有权访问的对象。
- 新鲜度数据为时点快照。需重新执行查询以检查延迟是否稳定或持续增长。