diagnostic-issue-resolver

Diagnostic Issue Resolver


Diagnose and fix common TTS + Telegram bot issues through systematic symptom collection, automated diagnostics, and targeted fixes.
Platform: macOS (Apple Silicon)


When to Use This Skill


  • TTS audio is not playing or sounds wrong
  • Telegram bot is not responding to messages
  • Kokoro engine errors or timeouts
  • Lock file appears stuck
  • Audio plays twice (race condition)
  • MPS acceleration is not working
  • Queue appears full or backed up


Requirements


  • Access to ~/.claude/automation/claude-telegram-sync/ (bot source)
  • Access to ~/.local/share/kokoro/ (Kokoro engine)
  • Access to ~/.local/share/tts-telegram-sync/logs/ (centralized logs)


Known Issue Table


| Issue | Likely Cause | Diagnostic | Fix |
|---|---|---|---|
| No audio output | Stale TTS lock | `stat /tmp/kokoro-tts.lock` | `rm -f /tmp/kokoro-tts.lock` |
| Bot not responding | Process crashed | `pgrep -la 'bun.*src/main.ts'` | Restart: `cd ~/.claude/automation/claude-telegram-sync && bun --watch run src/main.ts` |
| Kokoro timeout | First-run model load | Check `~/.cache/huggingface/` | Wait for download, or re-run `kokoro-install.sh --install` |
| Queue full | Rapid-fire notifications | Check queue depth in audit log | Increase `TTS_MAX_QUEUE_DEPTH` in mise.toml or drain queue |
| Lock stuck forever | Heartbeat process died | `stat /tmp/kokoro-tts.lock` + `pgrep -x afplay` | If lock stale >30s AND no audio process, rm lock |
| No MPS acceleration | Wrong Python/torch | `python -c "import torch; print(torch.backends.mps.is_available())"` | Reinstall torch via `kokoro-install.sh --upgrade` |
| Double audio playback | Lock race condition | Check for multiple afplay processes | Kill all: `pkill -x afplay`, then restart |


Workflow Phases


Phase 1: Symptom Collection


Use AskUserQuestion to understand what the user is experiencing. Key questions:
  • What happened? (no audio, wrong audio, bot silent, error message)
  • When did it start? (after upgrade, suddenly, always)
  • What were you doing? (clipboard read, Telegram notification, manual TTS)

Phase 2: Automated Diagnostics


Based on symptoms, run the relevant subset of these checks:

# Lock state
ls -la /tmp/kokoro-tts.lock 2>/dev/null && stat -f "%Sm" /tmp/kokoro-tts.lock || echo "No lock file"

# Audio processes
pgrep -la afplay; pgrep -la say

# Bot process
pgrep -la 'bun.*src/main.ts'

# Kokoro health
~/.local/share/kokoro/.venv/bin/python -c "import kokoro; import torch; print(f'kokoro OK, MPS: {torch.backends.mps.is_available()}')"

# Recent errors in audit log
tail -20 ~/.local/share/tts-telegram-sync/logs/audit/*.ndjson 2>/dev/null | grep -i error

# Recent bot console output
tail -50 /private/tmp/telegram-bot.log 2>/dev/null | grep -i -E '(error|fail|timeout)'
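
The two log checks can also be folded into one helper. The sketch below is a hypothetical function, `recent_log_errors`, and is not part of the bot or its scripts. It assumes that problem entries contain one of the words error, fail, or timeout, as the grep patterns in the checks imply, and it prefixes each hit with the file it came from.

```shell
# Hypothetical helper (not shipped with tts-telegram-sync): print the lines
# among the last N lines of each given log file that mention error, fail,
# or timeout, prefixed with the file name.
# Usage: recent_log_errors N FILE...
recent_log_errors() {
  local n="$1"; shift
  local f
  for f in "$@"; do
    [ -f "$f" ] || continue   # skip missing files and unmatched globs
    tail -n "$n" "$f" | grep -i -E '(error|fail|timeout)' | while IFS= read -r line; do
      printf '%s: %s\n' "${f##*/}" "$line"
    done
  done
  return 0
}
```

A possible invocation, using the log paths from this skill: `recent_log_errors 50 /private/tmp/telegram-bot.log ~/.local/share/tts-telegram-sync/logs/audit/*.ndjson`.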

Phase 3: Root Cause Analysis


Map diagnostic output to the Known Issue Table above. Common patterns:
  • Lock file exists + mtime > 30s ago + no afplay = stale lock
  • No bot PID found = bot crashed
  • torch.backends.mps.is_available() returns False = MPS broken
  • Multiple afplay PIDs = race condition
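
The patterns above can be written down as a small decision function. This is a minimal sketch, assuming the caller has already gathered the four values from the Phase 2 checks; the function name `triage` and its argument order are hypothetical, not part of the skill.

```shell
# Hypothetical triage helper: maps raw diagnostic values to the findings
# listed above. All four arguments are supplied by the caller.
#   $1 lock age in seconds, or -1 if /tmp/kokoro-tts.lock does not exist
#   $2 number of running afplay processes
#   $3 number of running bot processes
#   $4 value printed by torch.backends.mps.is_available() ("True" or "False")
triage() {
  local lock_age="$1" afplay_count="$2" bot_count="$3" mps="$4"
  local found=0
  if [ "$lock_age" -gt 30 ] && [ "$afplay_count" -eq 0 ]; then
    echo "stale lock"; found=1
  fi
  if [ "$bot_count" -eq 0 ]; then
    echo "bot crashed"; found=1
  fi
  if [ "$mps" = "False" ]; then
    echo "MPS broken"; found=1
  fi
  if [ "$afplay_count" -gt 1 ]; then
    echo "race condition"; found=1
  fi
  [ "$found" -eq 1 ] || echo "no known pattern matched"
}
```

For example, `triage 45 0 1 True` reports a stale lock, because the lock is older than 30 seconds and no afplay process is running. More than one finding can be printed when several patterns match at once.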

Phase 4: Fix Application


Apply the targeted fix from the Known Issue Table. Always use the least disruptive fix first.
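
For the stale-lock case, one way to stay on the least disruptive path is to remove the lock only when both conditions from the Known Issue Table hold. The sketch below is an illustration under assumptions: `file_age_seconds` and `remove_lock_if_stale` are hypothetical names, and the defaults (lock path, 30-second threshold, afplay) are taken from the table rather than from the bot's source.

```shell
# Hypothetical helpers, not part of the skill's scripts.

# Print the age of a file in seconds. Tries GNU stat first, then BSD/macOS stat.
file_age_seconds() {
  local mtime
  mtime=$(stat -c %Y "$1" 2>/dev/null || stat -f %m "$1" 2>/dev/null) || return 1
  echo $(( $(date +%s) - mtime ))
}

# Remove the lock only if it is older than MAX_AGE_SECONDS and pgrep finds
# no process with the given name.
# Usage: remove_lock_if_stale [LOCK_PATH] [MAX_AGE_SECONDS] [AUDIO_PROCESS]
remove_lock_if_stale() {
  local lock="${1:-/tmp/kokoro-tts.lock}" max_age="${2:-30}" proc="${3:-afplay}"
  local age
  [ -e "$lock" ] || { echo "no lock"; return 0; }
  age=$(file_age_seconds "$lock") || { echo "cannot read lock mtime"; return 1; }
  if [ "$age" -le "$max_age" ]; then
    echo "lock is fresh (${age}s), leaving it"
  elif pgrep -x "$proc" >/dev/null 2>&1; then
    echo "lock is old but $proc is running, leaving it"
  else
    rm -f "$lock" && echo "removed stale lock (${age}s)"
  fi
}
```

Called with no arguments, `remove_lock_if_stale` applies the defaults above. It leaves a fresh lock, or an old lock with audio still playing, untouched, so it is safe to try before restarting anything.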

Phase 5: Verification


After applying the fix, verify the issue is resolved:

# Quick TTS test
~/.local/share/kokoro/.venv/bin/python ~/.local/share/kokoro/tts_generate.py \
  --text "Diagnostic test complete" --voice af_heart --lang en-us --speed 1.0 \
  --output /tmp/kokoro-tts-diag-test.wav && afplay /tmp/kokoro-tts-diag-test.wav && echo "OK"

# Full health check
~/eon/cc-skills/plugins/tts-telegram-sync/scripts/kokoro-install.sh --health

TodoWrite Task Templates


1. [Symptoms] Collect symptoms via AskUserQuestion
2. [Triage] Map symptoms to likely causes
3. [Lock] Check TTS lock state (mtime, PID, stale detection)
4. [Process] Check bot process and audio processes
5. [Kokoro] Verify Kokoro venv and MPS availability
6. [Logs] Check recent audit logs for errors
7. [Fix] Apply targeted fix for identified root cause
8. [Verify] Run health check to confirm resolution


Post-Change Checklist


  • Root cause identified and documented
  • Fix applied successfully
  • Health check passes
  • Test audio plays correctly
  • No stale locks or orphan processes remain


Troubleshooting


This skill IS the troubleshooting skill. If the standard diagnostics do not identify the issue:
  1. Check the full bot console log: cat /private/tmp/telegram-bot.log
  2. Check all NDJSON audit logs: ls -lt ~/.local/share/tts-telegram-sync/logs/audit/
  3. Check system audio: afplay /System/Library/Sounds/Tink.aiff (if this fails, it is a macOS audio issue, not TTS)
  4. Run a manual Kokoro generation outside the bot to isolate the problem
  5. If all else fails, do a full teardown and reinstall using clean-component-removal, then full-stack-bootstrap


Reference Documentation


  • Common Issues -- Expanded diagnostic procedures for each known issue
  • Lock Debugging -- Deep dive into the two-layer lock mechanism
  • Evolution Log -- Change history for this skill