Basic Monitoring:
- [ ] Uptime monitoring (is site up?)
- [ ] Error tracking (are errors happening?)
- [ ] Performance monitoring (is it slow?)
- [ ] User activity (are people using it?)
- [ ] Critical alerts configured
- [ ] Check dashboard daily

1. Sign up for UptimeRobot
2. Add monitor for https://yourapp.com
3. Add your email for alerts
4. Get texted if site is down
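
To make concrete what an uptime monitor does on each check, here is a minimal sketch using only the Python standard library. It is not how UptimeRobot is implemented; the function names `fetch_status` and `is_up` and the rule "any 2xx or 3xx status counts as up" are assumptions chosen for illustration.

```python
import urllib.error
import urllib.request
from typing import Callable


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for url, or 0 if no response arrives."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return 0  # DNS failure, refused connection, timeout, and similar


def is_up(url: str, fetch: Callable[[str], int] = fetch_status) -> bool:
    """Assumed rule: any 2xx or 3xx status counts as 'up'."""
    return 200 <= fetch(url) < 400
```

Calling `is_up("https://yourapp.com")` from a scheduled job would approximate one monitor check. The `fetch` parameter exists only so the rule can be exercised without network access.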

Tell AI:
"Add Sentry error tracking:
- Capture all frontend errors
- Capture all API errors
- Include user context
- Send to Sentry dashboard"

Configure monitoring alerts:
- Critical: Text to [phone]
- Important: Email to [email]
- Send summary: Daily at 9am
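
The alert configuration above amounts to a small routing table. The sketch below shows one way to express it. The names `AlertPolicy` and `route_alert` and the channel strings are hypothetical, and real monitoring tools expose this through their own settings rather than code like this.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class AlertPolicy:
    """Hypothetical policy mirroring the three settings listed above."""
    critical_channel: str = "sms"      # Critical: text to [phone]
    important_channel: str = "email"   # Important: email to [email]
    summary_time: time = time(9, 0)    # Daily summary at 9am


def route_alert(severity: str, policy: AlertPolicy = AlertPolicy()) -> str:
    """Pick a delivery channel for an alert of the given severity."""
    if severity == "critical":
        return policy.critical_channel
    if severity == "important":
        return policy.important_channel
    # Everything else waits for the daily summary to limit alert noise.
    return "daily_summary"
```

Sending anything below "important" to the daily summary is a design choice in this sketch, not something the settings above require.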

Daily Check:
1. Open monitoring dashboard
2. Check uptime (should be 100% yesterday)
3. Check error count (any spikes?)
4. Check performance (slower than usual?)
5. Review any alerts from overnight
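
The "any spikes?" question in the daily check can be made less subjective by comparing today's error count with a recent baseline. This is a rough sketch under assumed thresholds: the function name `is_error_spike`, the 2x factor, and the minimum of 10 errors are illustrative and do not come from any monitoring tool.

```python
from statistics import mean
from typing import Sequence


def is_error_spike(today: int, previous_days: Sequence[int],
                   factor: float = 2.0, min_errors: int = 10) -> bool:
    """Return True when today's error count looks unusually high."""
    if today < min_errors:
        return False  # too few errors to be worth flagging
    if not previous_days:
        return True   # no baseline yet, so surface it for a manual look
    # Flag the day when it exceeds the recent average by the given factor.
    return today > factor * mean(previous_days)
```

For example, `is_error_spike(50, [10, 12, 8])` compares 50 with twice the average of 10 and flags the day.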

1. Open error tracking dashboard (Sentry)
2. Find the most frequent error
3. Read error message and stack trace
4. Note: How many users affected?
5. Note: Started when?
6. Check: Did we deploy recently?

Error in production:
[Paste error message and stack trace]
Affected: [X] users in last [Y] hours
Started: [timestamp]
Recent deploys: [any?]
Please:
1. Explain what's wrong
2. Propose hotfix
3. How to test before deploying

User Report Investigation:
1. Can you reproduce it?
2. Check monitoring for errors at that time
3. Check logs for that user
4. Check if others affected
5. Determine severity
Then use debug skill to fix.

User reported: [issue description]
User: [email or ID]
Timestamp: [when it happened]
Check monitoring and logs for this user at this time.
What errors or issues do you see?

Weekly Review:
- [ ] Error trends (going up or down?)
- [ ] Performance trends (slower?)
- [ ] New error types introduced
- [ ] Uptime issues resolved
- [ ] Alert noise (too many false alerts?)

Monthly Health:
- [ ] Compare to last month
- [ ] Any degradation?
- [ ] Any improvements?
- [ ] Monitoring gaps (what's not tracked?)

| Mistake | Fix |
|---|---|
| No monitoring set up | Set up before launch |
| Alert fatigue (too many alerts) | Only alert on critical issues |
| Checking once a month | Check daily (5 minutes) |
| Ignoring trends | Watch for degradation over time |
| No alerts configured | Set up text alerts for downtime |
| Monitoring but not acting | Use monitoring to find and fix issues |
Add application logging:
- Log all errors with context
- Log API requests/responses
- Log slow operations (>1s)
- Log authentication events
- Don't log sensitive data
Format: JSON with timestamp, level, message, context
Send to: [Platform logs or external service]
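
As a reference point for what the logging request above might produce, here is a minimal sketch built on Python's standard `logging` module. The helper names (`JsonFormatter`, `get_logger`, `log_if_slow`), the `context` key passed through `extra`, and the list of sensitive keys are all assumptions for illustration; your platform or logging service may expect a different shape.

```python
import json
import logging
import time
from contextlib import contextmanager
from datetime import datetime, timezone

# Assumed list of keys that must never reach the logs; extend for your app.
SENSITIVE_KEYS = {"password", "token", "authorization", "card_number"}


class JsonFormatter(logging.Formatter):
    """Render each record as JSON: timestamp, level, message, context."""

    def format(self, record: logging.LogRecord) -> str:
        context = getattr(record, "context", {})
        safe = {k: "[REDACTED]" if k.lower() in SENSITIVE_KEYS else v
                for k, v in context.items()}
        entry = {
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "message": record.getMessage(),
            "context": safe,
        }
        return json.dumps(entry, default=str)


def get_logger(name: str = "app", stream=None) -> logging.Logger:
    """Build a logger that writes JSON lines to stream (stderr if None)."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


@contextmanager
def log_if_slow(logger: logging.Logger, operation: str,
                threshold_s: float = 1.0):
    """Log a warning when the wrapped block takes longer than threshold_s."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        if elapsed > threshold_s:
            logger.warning("slow operation", extra={"context": {
                "operation": operation, "seconds": round(elapsed, 3)}})
```

A call such as `logger.error("login failed", extra={"context": {"user_id": 42}})` then emits one JSON line, and wrapping a database query in `with log_if_slow(logger, "load_orders"):` covers the ">1s" rule.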

Add monitoring for [service]:
- Alert on failures
- Track success rate
- Log errors with context
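
The three requests above (alert on failures, track success rate, log errors with context) can be sketched as a small wrapper around calls to an external service. The class name `ServiceMonitor`, the 100-call window, and the 95% threshold are hypothetical choices, not defaults of any real tool.

```python
import logging
from collections import deque

logger = logging.getLogger("service_monitor")


class ServiceMonitor:
    """Track recent call outcomes for one external service."""

    def __init__(self, name: str, window: int = 100,
                 min_success_rate: float = 0.95):
        self.name = name
        self.results = deque(maxlen=window)  # keeps only the latest outcomes
        self.min_success_rate = min_success_rate

    def record(self, success: bool) -> None:
        self.results.append(success)

    @property
    def success_rate(self) -> float:
        if not self.results:
            return 1.0  # no data yet, so assume healthy
        return sum(self.results) / len(self.results)

    def should_alert(self) -> bool:
        return self.success_rate < self.min_success_rate

    def call(self, func, *args, **kwargs):
        """Run func, record the outcome, and log failures with context."""
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.record(False)
            logger.exception("call to %s failed", self.name)
            raise
        self.record(True)
        return result
```

Wrapping a payment or email client call in `monitor.call(...)` and checking `monitor.should_alert()` after each call is one way to connect this to the alert channels you already configured.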

Incident Response:
1. Acknowledge alert (mark as seen)
2. Assess severity:
   - Critical: Site down, payments failing
   - High: Errors affecting many users
   - Medium: Isolated issues
3. Immediate action:
   - Critical: Hotfix or rollback
   - High: Fix within hours
   - Medium: Fix in next deploy
4. Update users if needed
5. Post-mortem after resolved

1. Assess impact (how many affected?)
2. Quick fix or rollback
3. Deploy hotfix
4. Verify fixed
5. Monitor closely for an hour
6. Update status page if you have one
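
The severity levels and immediate actions in the incident response steps can be written down as a small triage function, which helps keep decisions consistent under pressure. This is a sketch only: the names `Triage` and `triage` are hypothetical, and the cutoff of 50 users for "many users" is an assumption you should replace with a number that fits your user base.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triage:
    severity: str
    action: str


def triage(site_down: bool, payments_failing: bool, affected_users: int,
           many_users_threshold: int = 50) -> Triage:
    """Map incident facts to the severity and action listed above."""
    if site_down or payments_failing:
        return Triage("critical", "hotfix or rollback")
    if affected_users >= many_users_threshold:  # assumed meaning of "many"
        return Triage("high", "fix within hours")
    return Triage("medium", "fix in next deploy")
```

For instance, `triage(False, True, 3)` is critical even with few affected users, because failing payments outrank the user count in the list above.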