Materialize Developer Analysis
Analyze a Materialize environment by querying system catalog tables via the MCP
Developer endpoint (
tool), and produce a structured
report with health status, performance findings, and optimization recommendations.
Discovering Tables and Columns
Do NOT guess column names. Before writing queries, check if the
schema is available by running:
sql
SHOW TABLES FROM mz_ontology
If mz_ontology is available
Use it to discover the correct tables, columns, join paths, and ID types:
| Table | What it tells you |
|---|
mz_ontology.mz_ontology_entity_types
| What catalog entities exist and which table they map to. |
mz_ontology.mz_ontology_link_types
| Relationships between entities (foreign keys, metrics, etc.). |
mz_ontology.mz_ontology_properties
| Column names, types, and descriptions for each entity. |
mz_ontology.mz_ontology_semantic_types
| Typed ID domains (CatalogItemId, ReplicaId, etc.). |
Example queries:
sql
-- Find the right table for an entity
SELECT name, relation, description
FROM mz_ontology.mz_ontology_entity_types
WHERE name LIKE '%source%'
-- Find join paths between entities
SELECT name, source_entity, target_entity, properties, description
FROM mz_ontology.mz_ontology_link_types
WHERE source_entity = 'source' OR target_entity = 'source'
-- Find columns for a table
SELECT column_name, semantic_type, description
FROM mz_ontology.mz_ontology_properties
WHERE entity_type = 'source_status'
If mz_ontology is NOT available
Use
SHOW COLUMNS FROM <schema>.<table>
to verify column names before querying.
Refer to the Critical Rules below for known pitfalls.
Critical Rules
Known column name pitfalls
Even with the ontology, be aware of these common mistakes:
| Wrong | Correct | Table |
|---|
| | , |
| Must JOIN through to then | mz_cluster_replica_utilization
|
When unsure, run
SHOW COLUMNS FROM <schema>.<table>
to verify.
Do NOT query mz_dataflow_arrangement_sizes
NEVER query mz_introspection.mz_dataflow_arrangement_sizes
via MCP. It
fails for two reasons:
- Cluster-scoped: Only returns data for the session's current cluster,
and the MCP tool does not support to switch clusters.
- Type mismatch: Its column is , not like
. JOINs fail with
operator does not exist: uint8 = text
.
Instead, use:
mz_internal.mz_cluster_replica_utilization
— memory/CPU/disk percentage
mz_internal.mz_cluster_replica_metrics
— raw memory bytes
mz_internal.mz_index_advice
— find MVs/indexes that can be removed
Type casting notes
Some
views use
for ID columns instead of
.
Avoid JOINing views with views unless you
cast IDs explicitly. The
views all use
IDs and are safe
to JOIN with
.
Discovering tables without the ontology
If
is not available, use these fallbacks:
SHOW COLUMNS FROM <schema>.<table>
to check a table's columns
- Do NOT use
SHOW TABLES FROM mz_internal LIKE '...'
— this only shows
tables, not views. Most system catalog objects are views.
Workflow Overview
- Connect — Verify the MCP Developer tools are available
- Discover — Use the ontology + catalog queries to inventory all deployed objects
- Analyze — Assess performance metrics: freshness, hydration, memory, utilization
- Report — Produce a structured markdown report with findings and recommendations
Step 1: Verify MCP Connection
Confirm you have access to the
tool. Run a quick test:
query_system_catalog: SELECT mz_version()
If this fails, check:
- The MCP server is configured in
- The feature flag is enabled on the environment
- Your authentication credentials are valid
Running Queries
All queries are run via the
MCP tool. Constraints:
- One statement per call (no semicolons)
- Read-only: SELECT, SHOW, EXPLAIN only
- System tables only: no access to user tables
- No statements
When filtering out system schemas, always exclude:
,
,
,
, and
.
Step 2: Discover — Inventory the Environment
Run the discovery queries to understand what is deployed. See
for the full query set. The discovery phase covers:
Environment Overview
- Materialize version ()
- Clusters and replicas — names, sizes, and replica counts
- Schemas in use
Deployed Objects Inventory
- Sources: type (Kafka, Postgres, MySQL, Webhook, etc.), cluster assignment, status
- Materialized Views: cluster assignment, indexes, dependencies
- Views: (non-materialized) and their usage patterns
- Sinks: type, destination, cluster assignment
- Indexes: what they're on, cluster assignment
- Connections: external system connections configured
Build a mental model of the data pipeline: what data comes in (sources), how it's
transformed (views/MVs), and where it goes out (sinks).
Object Definitions
Retrieve SQL definitions for materialized views, views, indexes, and sources
using
. This is critical for optimization analysis —
the SQL definitions tell you
how things are computed:
- Join patterns and join order
- Filter predicates (or lack thereof — missing temporal filters are a common issue)
- Aggregation strategies
- Whether MVs duplicate logic that could be shared
Step 3: Analyze — Performance and Resource Metrics
Freshness (Lag Analysis)
Query
mz_internal.mz_materialization_lag
for per-object lag.
Important: The
column is of type
(a uint8),
not a standard timestamp. You cannot subtract it from
directly. Cast to
get a human-readable time:
to_timestamp(write_frontier::bigint / 1000)
.
Hydration Status
Query
mz_internal.mz_hydration_statuses
to check whether all dataflows are
hydrated. Non-hydrated objects after initial startup may indicate resource
pressure or configuration issues.
Memory and Resource Consumption
mz_internal.mz_cluster_replica_utilization
for memory/CPU percentage per replica
mz_internal.mz_cluster_replica_metrics
for raw memory metrics
mz_internal.mz_index_advice
to identify which MVs/indexes can be optimized
Index Advice
Query
mz_internal.mz_index_advice
— Materialize's built-in advisor. Hint types:
- "keep" — the MV/index is needed as-is
- "drop unless queried directly" — no structural dependencies; only useful for direct SELECT queries
- "convert to a view" — MV can be dematerialized entirely, saving all arrangement memory
- "convert to a view with an index" — convert MV to a view but keep its indexes
- "add index" — object would benefit from an index
Cost Analysis (optional)
Query
mz_catalog.mz_cluster_replica_sizes
to get credit rates per cluster
size, then calculate:
credits_per_hour * replication_factor * 730 hours/month
.
When writing recommendations, always quantify the credit impact.
Object Dependencies
Query
mz_internal.mz_object_dependencies
to understand the dependency graph.
Step 4: Report — Generate the Analysis
Produce a structured markdown report:
markdown
# Environment Analysis
**Date**: <date>
**Materialize Version**: <version>
## Executive Summary
<2-3 paragraph high-level assessment>
## Cluster Topology
| Cluster | Size | Replicas | Credits/Hr | Monthly Credits | Utilization |
## Deployed Objects
### Sources (<count>)
### Materialized Views (<count>)
### Sinks (<count>)
### Indexes (<count>)
## Performance Analysis
### Freshness
### Hydration
### Cluster Utilization
## Cost Analysis (if requested)
## Index Advice Summary
## SQL-Level Analysis
### Materialized View Definitions
### Index Analysis
## Optimization Recommendations
<numbered list with specific SQL for each>
Writing Recommendations
Always include specific SQL commands. For example:
Good:
Recommendation: Dematerialize
to save memory.
sql
SHOW CREATE MATERIALIZED VIEW my_schema.unused_mv;
DROP MATERIALIZED VIEW my_schema.unused_mv;
CREATE VIEW my_schema.unused_mv AS <definition>;
Bad:
Recommendation: Consider dematerializing
.
Troubleshooting Runbooks
For focused troubleshooting, use these diagnostic paths.
Always end with specific SQL commands to fix the issue.
"Why is my materialized view stale?"
Diagnostic steps:
- Check
mz_internal.mz_materialization_lag
for the MV's lag
- Check
mz_internal.mz_hydration_statuses
— is it hydrated?
- Check
mz_internal.mz_cluster_replica_statuses
— is the replica healthy?
- Check
mz_internal.mz_cluster_replica_utilization
— memory pressure causing restarts?
- Check
mz_internal.mz_source_statuses
— upstream source errors?
Common fixes:
If the cluster is overloaded (high memory/CPU):
sql
-- Option A: Scale up the cluster
ALTER CLUSTER <cluster_name> SET (SIZE = '<next_size_up>');
-- Option B: Move the MV to a different cluster
SHOW CREATE MATERIALIZED VIEW <schema>.<mv_name>;
DROP MATERIALIZED VIEW <schema>.<mv_name>;
CREATE MATERIALIZED VIEW <schema>.<mv_name> IN CLUSTER <new_cluster> AS <definition>;
If the MV is not hydrated and the cluster recently restarted:
Hydration will complete on its own once the cluster stabilizes. If it persists,
the cluster likely needs more memory.
If an upstream source has errors:
sql
SELECT name, status, error, last_status_change_at
FROM mz_internal.mz_source_statuses
WHERE status != 'running'
Fix the upstream source issue first — MV freshness depends on source health.
"Why is my cluster running out of memory?"
Diagnostic steps:
- Check
mz_internal.mz_cluster_replica_utilization
for memory percentage
- Check
mz_internal.mz_index_advice
for MVs that can be dematerialized
- Check MV definitions for missing temporal filters
- Check for redundant indexes
Common fixes:
Dematerialize MVs that don't need to be materialized:
sql
SELECT o.name, o.type, sc.name AS schema_name, ia.hint, ia.details
FROM mz_internal.mz_index_advice ia
JOIN mz_catalog.mz_objects o ON ia.object_id = o.id
JOIN mz_catalog.mz_schemas sc ON o.schema_id = sc.id
WHERE ia.hint = 'convert to a view'
-- For each candidate:
SHOW CREATE MATERIALIZED VIEW <schema>.<mv_name>;
DROP MATERIALIZED VIEW <schema>.<mv_name>;
CREATE VIEW <schema>.<mv_name> AS <definition>;
Drop unused indexes:
sql
SELECT o.name, o.type, sc.name AS schema_name, ia.hint, ia.details
FROM mz_internal.mz_index_advice ia
JOIN mz_catalog.mz_objects o ON ia.object_id = o.id
JOIN mz_catalog.mz_schemas sc ON o.schema_id = sc.id
WHERE ia.hint = 'drop unless queried directly'
-- Verify with the user before dropping
DROP INDEX <schema>.<index_name>;
Scale up the cluster:
sql
ALTER CLUSTER <cluster_name> SET (SIZE = '<next_size_up>');
"Are my sources healthy? / Has my source finished snapshotting?"
Diagnostic steps:
- Check
mz_internal.mz_source_statuses
for source errors
- Check
mz_internal.mz_source_statistics
for ingestion progress
- Check
mz_internal.mz_materialization_lag
for end-to-end lag
Common fixes:
If a source is stalled or erroring:
sql
SELECT name, status, error
FROM mz_internal.mz_source_statuses
WHERE status != 'running'
-- If the connection credentials are wrong:
ALTER SECRET <secret_name> AS '<new_value>';
If
is
, the source is still loading its initial
snapshot. This is normal for large sources — wait for it to complete.
"What's the health of my environment?"
Run these checks in order:
mz_internal.mz_cluster_replica_statuses
— all replicas ready?
mz_internal.mz_source_statuses
— all sources running?
mz_internal.mz_sink_statuses
— all sinks running?
mz_internal.mz_cluster_replica_utilization
— resource pressure?
"What can I optimize to save costs?"
- Check
mz_internal.mz_index_advice
for optimization candidates
- Check cluster utilization — are clusters over-provisioned?
- Check credit rates from
mz_catalog.mz_cluster_replica_sizes
Notes
- All queries run through the MCP Developer endpoint are read-only.
- Query results are limited to system catalog tables — no access to user data.
- Access is governed by RBAC — you only see objects your credentials have access to.
- Freshness numbers are point-in-time snapshots. Re-run to check if lag is stable or growing.