check-production
/check-production
Audit production health. Output findings as a structured report.
What This Does
- Query Sentry for unresolved issues
- Check Vercel logs for recent errors
- Test health endpoints
- Check GitHub Actions for CI/CD failures
- Output prioritized findings (P0-P3)
This is a primitive. It only investigates and reports. Use /log-production-issues to create GitHub issues or /triage to fix.
Process
1. Sentry Check
```bash
# Run triage script if available
~/.claude/skills/triage/scripts/check_sentry.sh 2>/dev/null || echo "Sentry check unavailable"
```
Or spawn Sentry MCP query if configured.
2. Vercel Logs Check
```bash
# Check for recent errors
~/.claude/skills/triage/scripts/check_vercel_logs.sh 2>/dev/null || vercel logs --output json 2>/dev/null | head -50
```
3. Health Endpoints
```bash
# Test health endpoint
~/.claude/skills/triage/scripts/check_health_endpoints.sh 2>/dev/null || curl -sf "$(grep NEXT_PUBLIC_APP_URL .env.local 2>/dev/null | cut -d= -f2)/api/health" | jq .
```
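The fallback command reads `NEXT_PUBLIC_APP_URL` with `grep` and `cut`, which keeps any surrounding quotes and drops everything after a second `=`. A minimal TypeScript sketch of a more forgiving lookup follows, assuming a simple `KEY=value` file. The helper names (`readEnvVar`, `healthUrl`, `classifyHealth`) are hypothetical, and the 500 ms default is borrowed from the sample report's "should be <500ms".

```typescript
// Hypothetical helper: read one variable from dotenv-style text.
function readEnvVar(envText: string, key: string): string | undefined {
  for (const rawLine of envText.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue; // skip blanks and comments
    const eq = line.indexOf("=");
    if (eq === -1 || line.slice(0, eq).trim() !== key) continue;
    // Everything after the first "=" is the value, so "=" inside it survives.
    const value = line.slice(eq + 1).trim();
    // Strip one pair of matching surrounding quotes, if present.
    const quoted = /^(["'])(.*)\1$/.exec(value);
    return quoted ? quoted[2] : value;
  }
  return undefined;
}

// Build the health URL, tolerating a trailing slash on the base URL.
function healthUrl(envText: string): string | undefined {
  const base = readEnvVar(envText, "NEXT_PUBLIC_APP_URL");
  if (base === undefined || base === "") return undefined;
  return base.replace(/\/+$/, "") + "/api/health";
}

type HealthVerdict = "ok" | "slow" | "failing";

// Assumed threshold: responses at or above slowAfterMs count as slow.
function classifyHealth(httpOk: boolean, elapsedMs: number, slowAfterMs = 500): HealthVerdict {
  if (!httpOk) return "failing";
  return elapsedMs < slowAfterMs ? "ok" : "slow";
}
```

Returning `undefined` when the variable is missing lets the caller report "health check unavailable" instead of probing a bare `/api/health` path.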
4. GitHub CI/CD Check
```bash
# Check for failed workflow runs on default branch
gh run list --branch main --status failure --limit 5 2>/dev/null ||
  gh run list --branch master --status failure --limit 5 2>/dev/null

# Get details on most recent failure
gh run list --status failure --limit 1 --json databaseId,name,conclusion,createdAt,headBranch 2>/dev/null

# Check for stale/stuck workflows
gh run list --status in_progress --json databaseId,name,createdAt 2>/dev/null
```
**What to look for:**
- Failed runs on main/master branch (broken CI)
- Failed runs on feature branches blocking PRs
- Stuck/in-progress runs that should have completed
- Patterns in failure types (tests, lint, build, deploy)
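Deciding whether an in-progress run "should have completed" comes down to comparing its `createdAt` with the current time. The sketch below assumes the JSON array produced by `gh run list --status in_progress --json databaseId,name,createdAt`. The function name and the 60-minute default are illustrative assumptions, not part of the skill.

```typescript
// Shape of one entry, limited to the fields requested via --json.
interface WorkflowRun {
  databaseId: number;
  name: string;
  createdAt: string; // ISO 8601 timestamp
}

// Hypothetical helper: return runs that started more than maxAgeMinutes before `now`.
function findStuckRuns(json: string, now: Date, maxAgeMinutes = 60): WorkflowRun[] {
  const runs = JSON.parse(json) as WorkflowRun[];
  const cutoff = now.getTime() - maxAgeMinutes * 60 * 1000;
  return runs.filter((run) => Date.parse(run.createdAt) < cutoff);
}
```

Passing `now` in explicitly keeps the function deterministic. Pick a threshold that matches how long the slowest legitimate workflow normally takes.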
5. Quick Application Checks
```bash
# Check for error handling gaps
# Matches single-line empty blocks such as `catch (e) {}` or `catch {}`
grep -rE "catch\s*(\([^)]*\))?\s*\{\s*\}" --include="*.ts" --include="*.tsx" src/ app/ 2>/dev/null | head -5
# Empty catch blocks = silent failures
```
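A line-based grep cannot see a catch block whose braces sit on separate lines. As a rough complement, the sketch below scans a whole source string. `findEmptyCatches` is a hypothetical helper, and it deliberately ignores blocks that contain only a comment, since those usually document an intentional swallow.

```typescript
interface EmptyCatch {
  line: number; // 1-based line where the `catch` keyword starts
  text: string; // the matched source text
}

function findEmptyCatches(source: string): EmptyCatch[] {
  // `catch`, an optional (binding), then braces with only whitespace between them.
  const pattern = /catch\s*(\([^)]*\))?\s*\{\s*\}/g;
  const hits: EmptyCatch[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    // Count the lines that precede the match to get its line number.
    const line = source.slice(0, match.index).split("\n").length;
    hits.push({ line, text: match[0] });
  }
  return hits;
}
```

This is a regex heuristic, not a parser. It can be fooled by `catch` appearing inside strings or comments, so treat hits as leads for a P2 finding and not as proof.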
Output Format
```markdown
## Production Health Check

### P0: Critical (Active Production Issues)
- [SENTRY-123] PaymentIntent failed - 23 users affected (Score: 147)
  Location: api/checkout.ts:45
  First seen: 2h ago

### P1: High (Degraded Performance / Broken CI)
- Health endpoint slow: /api/health responding in 2.3s (should be <500ms)
- Vercel logs show 5xx errors in last hour (count: 12)
- [CI] Main branch failing: "Build" workflow (run #1234)
  Failed step: "Type check"
  Error: Type 'string' is not assignable to type 'number'

### P2: Medium (Warnings)
- 3 empty catch blocks found (silent failures)
- Health endpoint missing database connectivity check
- [CI] 3 feature branch workflows failing (blocking PRs)

### P3: Low (Improvements)
- Consider adding Sentry performance monitoring
- Health endpoint could include more service checks

## Summary
- P0: 1 | P1: 3 | P2: 3 | P3: 2
- Recommendation: Fix P0 immediately, then fix main branch CI
```
Priority Mapping
| Signal | Priority |
|---|---|
| Active errors affecting users | P0 |
| 5xx errors, slow responses | P1 |
| Main branch CI/CD failing | P1 |
| Feature branch CI blocking PRs | P2 |
| Silent failures, missing checks | P2 |
| Missing monitoring, improvements | P3 |
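The mapping can be expressed as a plain lookup so that every finding is prioritized the same way. This is a sketch only: the signal identifiers are hypothetical names for the table rows, while the priorities and the summary format follow the table and sample report.

```typescript
type Priority = "P0" | "P1" | "P2" | "P3";

// Hypothetical identifiers, one per row of the priority table.
type Signal =
  | "active_user_errors"
  | "server_errors_or_slow_responses"
  | "main_branch_ci_failing"
  | "feature_branch_ci_blocking_prs"
  | "silent_failures_or_missing_checks"
  | "missing_monitoring_or_improvements";

const PRIORITY_BY_SIGNAL: Record<Signal, Priority> = {
  active_user_errors: "P0",
  server_errors_or_slow_responses: "P1",
  main_branch_ci_failing: "P1",
  feature_branch_ci_blocking_prs: "P2",
  silent_failures_or_missing_checks: "P2",
  missing_monitoring_or_improvements: "P3",
};

interface Finding {
  signal: Signal;
  message: string;
}

// Group findings into the four priority buckets.
function prioritize(findings: Finding[]): Record<Priority, Finding[]> {
  const grouped: Record<Priority, Finding[]> = { P0: [], P1: [], P2: [], P3: [] };
  for (const finding of findings) {
    grouped[PRIORITY_BY_SIGNAL[finding.signal]].push(finding);
  }
  return grouped;
}

// Render the counts in the same "P0: n | P1: n | ..." shape as the Summary section.
function summaryLine(grouped: Record<Priority, Finding[]>): string {
  const order: Priority[] = ["P0", "P1", "P2", "P3"];
  return order.map((p) => `${p}: ${grouped[p].length}`).join(" | ");
}
```

Typing the table as `Record<Signal, Priority>` makes the compiler reject a new signal that has no priority assigned.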
Health Endpoint Anti-Pattern
Health checks that lie are worse than no health check. Example:
```typescript
// ❌ BAD: Reports "ok" without checking
return { status: "ok", services: { database: "ok" } };

// ✅ GOOD: Honest liveness probe (no fake service status)
return { status: "ok", timestamp: new Date().toISOString() };

// ✅ BETTER: Real readiness probe
const dbStatus = await checkDatabase() ? "ok" : "error";
return { status: dbStatus === "ok" ? "ok" : "degraded", services: { database: dbStatus } };
```
If you can't verify a service, don't report on it. A false "ok" status masks outages.
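A readiness probe is only honest if a hung or throwing dependency check shows up as a failure and does not stall the probe. One way to sketch that, assuming each check is an async function returning a boolean like `checkDatabase()` in the example: `runCheck`, `readiness`, and the 1000 ms default timeout are hypothetical choices.

```typescript
type ServiceStatus = "ok" | "error";
type Check = () => Promise<boolean>;

interface ReadinessReport {
  status: "ok" | "degraded";
  services: Record<string, ServiceStatus>;
  timestamp: string;
}

// Run one check; a thrown error or a timeout both count as "error".
async function runCheck(check: Check, timeoutMs: number): Promise<ServiceStatus> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<boolean>((resolve) => {
    timer = setTimeout(() => resolve(false), timeoutMs);
  });
  try {
    const healthy = await Promise.race([check(), timeout]);
    return healthy ? "ok" : "error";
  } catch {
    return "error";
  } finally {
    if (timer !== undefined) clearTimeout(timer);
  }
}

// Report only on services that were actually checked.
async function readiness(checks: Record<string, Check>, timeoutMs = 1000): Promise<ReadinessReport> {
  const results = await Promise.all(
    Object.entries(checks).map(
      async ([name, check]) => [name, await runCheck(check, timeoutMs)] as const
    )
  );
  const services: Record<string, ServiceStatus> = {};
  let allOk = true;
  for (const [name, status] of results) {
    services[name] = status;
    if (status !== "ok") allOk = false;
  }
  return { status: allOk ? "ok" : "degraded", services, timestamp: new Date().toISOString() };
}
```

A route handler could then return `await readiness({ database: checkDatabase })`. Services without a real check are left out of the report, so the response never claims more than it verified.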
Analytics Note
This skill checks production health (errors, logs, endpoints), not product analytics.
For analytics auditing, see /check-observability. Note:
- PostHog is REQUIRED for product analytics (has MCP server)
- Vercel Analytics is NOT acceptable (no CLI/API/MCP - unusable for our workflow)
If you need to investigate user behavior or funnels during incident response, query PostHog via MCP.
Related
- /log-production-issues - Create GitHub issues from findings
- /triage - Fix production issues
- /observability - Set up monitoring infrastructure