mirror-doctor
Mirror Pipeline Doctor
Diagnose and fix existing Mirror pipeline problems by running CLI commands, identifying root causes, and executing fixes.
Boundaries
- Diagnose and fix EXISTING Mirror pipeline problems.
- Do not build new pipelines: use `/mirror` for config reference or `/turbo-builder` for new Turbo pipelines.
- Do not serve as a command reference: use `/mirror` for CLI syntax and flag lookups.
- Do not handle Turbo pipelines: use `/turbo-doctor` for `goldsky turbo` problems.
- Do not create secrets: use `/secrets` for credential management. But DO check whether secrets exist as part of diagnosis.
Diagnostic Workflow
Follow these steps in order. Each step builds on the previous one.
Step 1: Verify Authentication
Run `goldsky project list 2>&1` to confirm the user is logged in.
- If logged in: Note the project name and continue.
- If not logged in: Direct the user to `/auth-setup`. Do not proceed until auth works.
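The check above can be wrapped in a small helper. This is a minimal sketch: the `GOLDSKY` variable and the `check_auth` name are hypothetical, and it assumes `goldsky project list` exits non-zero when the user is not logged in, which should be confirmed against the actual CLI.

```bash
# GOLDSKY is a hypothetical override so the CLI binary can be swapped out.
GOLDSKY="${GOLDSKY:-goldsky}"

# Print the project list if the CLI call succeeds; otherwise point at /auth-setup.
# Assumption: `goldsky project list` exits non-zero when the user is not logged in.
check_auth() {
  if output="$("$GOLDSKY" project list 2>&1)"; then
    printf '%s\n' "$output"
    return 0
  fi
  echo "Not logged in (or the CLI failed). Complete /auth-setup first." >&2
  return 1
}
```

If the CLI reports a missing login through its output rather than its exit code, inspect the captured text instead.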
Step 2: Identify the Pipeline
Run `goldsky pipeline list --include-runtime-details 2>&1` to list all Mirror pipelines with their status.
If the user already named a pipeline, confirm it exists in the list. If not, show the list and ask which pipeline they want to diagnose.
Note both the desired status (ACTIVE, INACTIVE, PAUSED) and the runtime status (STARTING, RUNNING, FAILING, TERMINATED). Mirror pipelines have both, and the combination tells the story.
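Confirming that a named pipeline appears in the list can be scripted. The sketch below assumes the listing prints each pipeline name as its own whitespace-delimited token; the real output layout may differ, and `pipeline_in_list` is a hypothetical helper name.

```bash
# Succeed when NAME appears as a whole whitespace-delimited token on stdin.
# Assumption: the listing prints each pipeline name as its own token.
pipeline_in_list() {
  name="$1"
  awk -v n="$name" '
    { for (i = 1; i <= NF; i++) if ($i == n) found = 1 }
    END { exit found ? 0 : 1 }
  '
}
```

Possible usage: `goldsky pipeline list --include-runtime-details 2>&1 | pipeline_in_list my-pipeline`. Matching whole tokens avoids treating `my-pipeline-2` as a hit for `my-pipeline`.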
Step 3: Triage by Status
The desired + runtime status combination determines the diagnostic path:
| Desired | Runtime | Meaning | Action |
|---|---|---|---|
| ACTIVE | RUNNING | Healthy — pipeline is processing data | Ask user what symptom they're seeing. Proceed to Step 4. |
| ACTIVE | STARTING | Pipeline is initializing | Ask how long. If >10 min, proceed to Step 4. |
| ACTIVE | FAILING | Pipeline is encountering errors but hasn't terminated yet | Proceed to Step 4 immediately — this is time-sensitive. |
| ACTIVE | TERMINATED | Most common failure. Pipeline wanted to run but crashed. | Proceed to Step 4. |
| PAUSED | TERMINATED | User paused the pipeline (snapshot was taken). | Ask if they want to resume: |
| INACTIVE | TERMINATED | User stopped the pipeline (no snapshot). | Ask if they want to start: |
ACTIVE + TERMINATED is the most common case. The pipeline's desired status is ACTIVE (it should be running) but the runtime has terminated due to an error. Focus the diagnosis here.
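The triage table can be read as a lookup on the status pair. A minimal sketch follows; the function name `triage` and the action labels it prints are illustrative and not part of the Goldsky CLI.

```bash
# Map a desired + runtime status pair to the next step from the triage table.
# The printed labels are hypothetical shorthand for the table's Action column.
triage() {
  case "$1+$2" in
    ACTIVE+RUNNING)      echo "ask-symptom" ;;
    ACTIVE+STARTING)     echo "ask-duration" ;;
    ACTIVE+FAILING)      echo "diagnose-now" ;;
    ACTIVE+TERMINATED)   echo "diagnose" ;;
    PAUSED+TERMINATED)   echo "offer-resume" ;;
    INACTIVE+TERMINATED) echo "offer-start" ;;
    *)                   echo "unknown" ;;
  esac
}
```

Any pair outside the table falls through to `unknown`, which is a cue to look at the raw list output rather than guess.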
Step 4: Gather Diagnostic Data
Run these commands to understand what went wrong:
- Get error details and runtime metrics: `goldsky pipeline monitor <name> 2>&1`
- Check for in-flight requests blocking operations: `goldsky pipeline monitor <name> --update-request 2>&1`
- Get the pipeline definition to check for misconfig: `goldsky pipeline get <name> --definition 2>&1`
- Get pipeline info including version: `goldsky pipeline info <name> 2>&1`
- Check available snapshots: `goldsky pipeline snapshots list <name> 2>&1`
Run these in sequence and analyze the output before proceeding. The `monitor` output is the most important: it shows error messages, records received/written metrics, and runtime status transitions.
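To keep the output of each command for later comparison, the sequence can be captured to files. This is a sketch under assumptions: `GOLDSKY`, `collect_diagnostics`, and the file names are hypothetical, and it assumes every command exits on its own (if `monitor` keeps streaming in your CLI version, interrupt it or run it separately).

```bash
GOLDSKY="${GOLDSKY:-goldsky}"   # hypothetical override for the CLI binary

# Run each Step 4 command for pipeline NAME and save its output under OUTDIR.
# File names are illustrative. Failures are tolerated so every file is written.
collect_diagnostics() {
  name="$1"
  outdir="$2"
  mkdir -p "$outdir" || return 1
  "$GOLDSKY" pipeline monitor "$name" > "$outdir/monitor.txt" 2>&1 || true
  "$GOLDSKY" pipeline monitor "$name" --update-request > "$outdir/update-request.txt" 2>&1 || true
  "$GOLDSKY" pipeline get "$name" --definition > "$outdir/definition.yaml" 2>&1 || true
  "$GOLDSKY" pipeline info "$name" > "$outdir/info.txt" 2>&1 || true
  "$GOLDSKY" pipeline snapshots list "$name" > "$outdir/snapshots.txt" 2>&1 || true
}
```

Read `monitor.txt` first, since the error message there usually decides which pattern in the next step applies.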
Step 5: Match Error Patterns
Based on the diagnostic data, match against these known patterns:
Bad or Missing Secret
Symptoms: Pipeline terminates shortly after starting. Monitor shows credential or authentication errors.
Verify: Run `goldsky secret list 2>&1` and cross-reference with the `secret_name` values in the pipeline definition from Step 4.
Fix:
- If the secret doesn't exist, direct the user to `/secrets` to create it.
- If the secret exists but credentials are wrong, create a new secret (secrets are immutable, so you create a replacement with the same name).
- Restart: `goldsky pipeline restart <name> --from-snapshot last`
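The cross-reference in the Verify step can be done mechanically once both outputs are saved to files. The sketch below is hedged: `missing_secrets` is a hypothetical helper, it assumes the definition contains `secret_name: VALUE` lines, and it assumes the saved `goldsky secret list` output shows each secret name as its own whitespace-delimited token.

```bash
# Print each secret_name referenced in DEFINITION_FILE that does not appear
# as a whole token in SECRET_LIST_FILE (saved `goldsky secret list` output).
# Assumptions: `secret_name: VALUE` lines, and one token per secret name.
missing_secrets() {
  definition_file="$1"
  secret_list_file="$2"
  sed -n 's/^[[:space:]]*secret_name:[[:space:]]*//p' "$definition_file" |
    tr -d "\"'" | sort -u |
    while IFS= read -r secret; do
      [ -n "$secret" ] || continue
      awk -v n="$secret" '
        { for (i = 1; i <= NF; i++) if ($i == n) found = 1 }
        END { exit found ? 0 : 1 }
      ' "$secret_list_file" || printf '%s\n' "$secret"
    done
}
```

An empty result only shows that the names exist. It says nothing about whether the stored credentials are correct.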
Sink Unreachable
Symptoms: Connection timeout, connection refused, or network errors in the monitor output. Pipeline may cycle between FAILING and TERMINATED.
Common causes:
- Firewall not allowing inbound from AWS us-west-2 (Mirror pipelines write from this region)
- Database is down or restarted
- Connection pool exhausted
- Wrong port or host in the secret
Fix:
- Verify the sink is reachable from us-west-2.
- Check that the secret has the correct host, port, and credentials.
- Once connectivity is restored, restart: `goldsky pipeline restart <name> --from-snapshot last`
Resource Exhaustion
Symptoms: Pipeline runs for a while then terminates. Monitor may show high record counts or slow processing. Common during large backfills or pipelines with many sources/JOINs.
Fix:
- Resize: `goldsky pipeline resize <name> <size>`. Sizes are `s`, `m`, `l`, `xl`, `xxl`.
- Start small and go up. `s` handles most workloads (up to 300K records/sec, ~8 subgraph sources). Use `l` or larger for big chain backfills or heavy JOINs.
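"Start small and go up" amounts to stepping through the size ladder one rung at a time. A minimal sketch, assuming the sizes are listed in ascending order; `next_size` is a hypothetical helper.

```bash
# Print the next size up in the s < m < l < xl < xxl ladder.
# Returns non-zero for xxl (nothing larger) or an unrecognized size.
next_size() {
  case "$1" in
    s)  echo m ;;
    m)  echo l ;;
    l)  echo xl ;;
    xl) echo xxl ;;
    *)  return 1 ;;
  esac
}
```

Possible usage: `goldsky pipeline resize <name> "$(next_size m)"`, after confirming the resize with the user.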
In-Flight Request Blocking
Symptoms: User tries to update, delete, or restart the pipeline but gets "Cannot process request, found existing request in-flight."
Diagnose: `goldsky pipeline monitor <name> --update-request` shows what operation is in progress (usually a snapshot).
Fix:
- If the in-flight operation is a snapshot that's making progress, wait for it.
- If it's stuck or unwanted: `goldsky pipeline cancel-update <name>`
- Then retry the original operation.
Stuck Snapshot
Symptoms: Pipeline can't be paused, updated, or restarted because a snapshot creation is taking too long or failing. The `--update-request` monitor shows snapshot progress stuck at a percentage.
Fix:
- Cancel the stuck snapshot: `goldsky pipeline cancel-update <name>`
- Restart without waiting for a new snapshot: `goldsky pipeline restart <name> --from-snapshot last`
- If there's no usable snapshot: `goldsky pipeline restart <name> --from-snapshot none` (starts from scratch; warn the user this reprocesses data)
Transform SQL Error
Symptoms: Pipeline terminates with SQL-related error messages. Could be syntax errors, referencing a non-existent column, or type mismatches.
Diagnose: Check the pipeline definition (`goldsky pipeline get <name> --definition`) and look at the `transforms` section.
Fix:
- Identify the SQL error from the monitor output.
- Fix the SQL in the pipeline YAML file.
- Validate: `goldsky pipeline validate <file.yaml>`
- Reapply: `goldsky pipeline apply <file.yaml> --status ACTIVE --from-snapshot last`
Use `/mirror` for SQL transform syntax reference if needed.
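The validate and reapply steps can be chained so that a file that fails validation is never applied. This is a sketch: `GOLDSKY` and `validate_then_apply` are hypothetical names, and it assumes `goldsky pipeline validate` exits non-zero on an invalid file.

```bash
GOLDSKY="${GOLDSKY:-goldsky}"   # hypothetical override for the CLI binary

# Validate FILE first and reapply it only if validation succeeds.
# Assumption: `goldsky pipeline validate` exits non-zero on an invalid file.
validate_then_apply() {
  file="$1"
  "$GOLDSKY" pipeline validate "$file" || {
    echo "Validation failed; not applying $file" >&2
    return 1
  }
  "$GOLDSKY" pipeline apply "$file" --status ACTIVE --from-snapshot last
}
```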
Pipeline in Restart Loop
Symptoms: Pipeline repeatedly cycles through STARTING → FAILING → TERMINATED. Monitor shows the same error recurring.
This is usually a symptom of another root cause — bad secret, sink unreachable, or resource issues. The pipeline keeps trying to start but hits the same wall.
Fix:
- Identify the underlying error from the monitor (it's usually one of the patterns above).
- Fix the root cause first.
- Then restart: `goldsky pipeline restart <name> --from-snapshot last`
Sink Downtime Cascade
Symptoms: Pipeline was running fine, then the sink (database) went down temporarily. Pipeline auto-retried, then restarted its writers, then eventually terminated.
This is expected behavior — Mirror handles transient sink errors automatically (retry batch → restart writers → fail after prolonged issues).
Fix:
- Confirm the sink is back up and healthy.
- Restart from the last snapshot: `goldsky pipeline restart <name> --from-snapshot last`
- The pipeline will resume from where it left off, not reprocess everything.
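As a first pass over the Step 5 patterns, monitor text can be scanned for keywords. The sketch below is a rough heuristic only: apart from the in-flight message quoted in this document, the keywords are assumptions about what the errors might contain, and `classify_monitor_output` is a hypothetical helper. Patterns that depend on timing or history (resource exhaustion, restart loops, downtime cascades) are out of its reach.

```bash
# Guess a Step 5 pattern from monitor text on stdin. The keywords are
# illustrative assumptions, except the in-flight message quoted in this document.
classify_monitor_output() {
  text="$(tr '[:upper:]' '[:lower:]')"
  case "$text" in
    *"found existing request in-flight"*)           echo "in-flight-request" ;;
    *credential*|*authentication*)                  echo "bad-or-missing-secret" ;;
    *"connection refused"*|*"timed out"*|*timeout*) echo "sink-unreachable" ;;
    *sql*|*column*|*syntax*)                        echo "transform-sql-error" ;;
    *)                                              echo "unknown" ;;
  esac
}
```

Treat the result as a hint for where to look, then confirm against the symptoms and the pipeline definition before recommending a fix.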
Step 6: Present Diagnosis
After identifying the issue, present findings clearly:
Diagnosis
Pipeline: <name>
Status: <desired> + <runtime>
Issue: <one-line summary>
Root cause:
<What's wrong and why>
Evidence:
- <Error message or observation from monitor>
- <Relevant detail from pipeline definition>
Recommended fix:
- <Step 1>
- <Step 2>
Prevention:
<How to avoid this in the future, if applicable>
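The first part of this template can be filled in from a script. A minimal sketch: `print_diagnosis` is a hypothetical helper that covers the fields through Evidence and leaves the recommended fix and prevention sections to be written by hand.

```bash
# Print the start of the Step 6 report. Arguments: name, desired status,
# runtime status, one-line issue, root cause, then zero or more evidence lines.
print_diagnosis() {
  [ "$#" -ge 5 ] || {
    echo "usage: print_diagnosis NAME DESIRED RUNTIME ISSUE CAUSE [EVIDENCE...]" >&2
    return 2
  }
  name="$1" desired="$2" runtime="$3" issue="$4" cause="$5"
  shift 5
  printf '%s\n' "Diagnosis" "Pipeline: $name" "Status: $desired + $runtime" \
    "Issue: $issue" "Root cause:" "$cause" "Evidence:"
  for item in "$@"; do
    printf '%s\n' "- $item"
  done
}
```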
Step 7: Execute Fix
Offer to run the fix commands directly. Always confirm with the user before executing:
- Restart: `goldsky pipeline restart <name> --from-snapshot last`
- Resize: `goldsky pipeline resize <name> <size>`
- Cancel blocked operation: `goldsky pipeline cancel-update <name>`
- Restart from scratch: `goldsky pipeline restart <name> --from-snapshot none` (warn: reprocesses data)
- Reapply config: `goldsky pipeline apply <file.yaml> --status ACTIVE --from-snapshot last`
- Delete and recreate: `goldsky pipeline delete <name> -f` then `goldsky pipeline apply <file.yaml> --status ACTIVE` (last resort)
After executing, verify recovery by running `goldsky pipeline monitor <name>` and watching for STARTING → RUNNING transition.
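When the fix commands are run from a script rather than interactively by an agent, the confirm-before-executing rule can be enforced with a small wrapper. This is a sketch; `confirm_and_run` is a hypothetical helper, not a Goldsky feature.

```bash
# Show a command, ask for confirmation on stdin, and run it only on "y" or "yes".
confirm_and_run() {
  printf 'About to run: %s\nProceed? [y/N] ' "$*" >&2
  IFS= read -r answer || answer=""
  case "$answer" in
    y|Y|yes|YES) "$@" ;;
    *) echo "Skipped." >&2; return 1 ;;
  esac
}
```

Possible usage: `confirm_and_run goldsky pipeline restart my-pipeline --from-snapshot last`. Anything other than an explicit yes skips the command, which matches the default-to-safe intent of this step.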
Important Rules
- Always gather data before diagnosing. Never guess at the problem.
- Check both desired AND runtime status — the combination matters.
- Confirm with the user before running any destructive commands (delete, restart from scratch).
- `--from-snapshot last` preserves progress. `--from-snapshot none` starts over. Default to `last` unless there's a reason not to.
- Transient errors are auto-retried for up to 6 hours. Non-transient errors terminate immediately. If the pipeline terminated quickly after starting, it's likely a config issue (bad secret, wrong SQL), not a transient network blip.
- If the problem is beyond CLI diagnosis, suggest contacting support@goldsky.com with the pipeline name, error messages, and project ID.
When Bash is Not Available
If you don't have the Bash tool, output the diagnostic commands for the user to run, but structure them clearly:
- Give one command at a time.
- Explain what to look for in the output.
- Based on their description of the output, proceed with the diagnosis.
This is the fallback path — always prefer running commands directly when Bash is available.
Related
- `/mirror`: Pipeline YAML configuration, CLI flag reference, sink setup
- `/secrets`: Create and manage sink credentials
- `/auth-setup`: CLI installation and authentication
- `/turbo-doctor`: Diagnose Turbo pipeline problems (not Mirror)