resilience-review
Resilience Review
Evaluate how your application behaves when things go wrong — network failures, API errors, slow connections, missing data, and edge cases. Most apps are built for the happy path; this review systematically tests the unhappy paths that real users encounter.
When to use
Use `/resilience-review` when:
- Before launching a user-facing feature
- After adding new API integrations or data sources
- When reliability is critical (healthcare, finance, e-commerce checkout)
- After production incidents caused by unhandled errors
- When moving from prototype to production quality
Standards Referenced
- Google SRE Principles — Error budgets, graceful degradation
- Netflix Chaos Engineering Principles — Verify steady state, inject real-world failures
- OWASP Error Handling — Secure and user-friendly error responses
- Nielsen Norman Group — Error message usability heuristics
Phase Overview
Phase 1: EDUCATE → Why resilience matters and what we test
Phase 2: SCOPE → Map failure points, dependencies, critical flows
Phase 3: ANALYZE → Browser-based fault injection and edge case testing
Phase 4: REPORT → Findings with evidence and user impact assessment
Phase 5: REMEDIATE → Fix guidance + YAML regression tests
Phase 1: Educate
Why this matters: Users don't experience your app in ideal conditions. 53% of mobile visits are abandoned if a page takes >3 seconds. Error pages with no guidance increase support tickets 5x. A blank screen is the worst possible failure mode — it tells the user nothing and offers no recovery path. Resilient apps maintain trust even when backend systems fail.
This review simulates real-world failure conditions in the browser and evaluates how your UI responds.
Phase 2: Scope
Gather context
- Auto-detect from codebase:
  - API calls and their endpoints
  - Error boundary components (React ErrorBoundary, Vue errorHandler)
  - Loading state implementations (spinners, skeletons, suspense)
  - Empty state components
  - Retry logic / error recovery patterns
  - Offline support (service workers, cache strategies)
  - Third-party service dependencies
- Ask the user (one at a time):
  - Target URL: Where is the app running?
  - Critical user flows: Which flows must never show a blank screen? (auto-detect from routes)
  - Key API dependencies: Which APIs does the frontend depend on? (auto-detected)
  - Known fragile areas: Any pages/features that break frequently? (optional)
- Map failure points:
  - API endpoints the frontend calls (and what happens if each fails)
  - Third-party dependencies (CDN, auth provider, analytics, maps, payment)
  - Data-dependent UI (what shows when data is empty, missing, or malformed)
  - User input edge cases (long text, special characters, empty submissions)
Phase 3: Analyze
Open a browser session with `new_session` using `record_evidence: true`. Run all applicable check categories.
Category A: Error Handling (ERR)
| Check ID | Check | Standard | Method |
|---|---|---|---|
| ERR-01 | API errors show user-friendly message (not blank screen) | UX best practice | Mock API to return 500, check UI response |
| ERR-02 | Network timeout shows appropriate state | UX best practice | Mock network delay (30s), check UI |
| ERR-03 | 404 page exists and is helpful | UX best practice | Navigate to non-existent route |
| ERR-04 | JavaScript errors don't crash the page | Error boundaries | Inject JS error, check if page recovers |
| ERR-05 | Error messages are actionable | NN/g heuristics | Check error messages for: what happened, why, what to do |
| ERR-06 | Errors don't expose technical details | OWASP | Check error messages for stack traces, SQL, internal paths |
| ERR-07 | Form validation errors are clear and positioned | UX best practice | Submit invalid forms, check error placement and text |
| ERR-08 | Error states allow retry without page refresh | UX best practice | After error, check for retry button or recovery action |
| ERR-09 | Concurrent error handling (multiple simultaneous failures) | Resilience | Mock multiple API failures, check UI doesn't cascade |
| ERR-10 | Error logging doesn't expose PII | OWASP / Privacy | Check |
Browser validation: Use `CODE` blocks to intercept network requests via `page.route()` to simulate failures. Check UI state after each failure. Use `get_browser_console_logs` for JavaScript errors.

```javascript
// Example: Mock API 500 error
await page.route('**/api/**', route => {
  route.fulfill({ status: 500, body: JSON.stringify({ error: 'Internal Server Error' }) });
});
```
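The same interception approach can cover other ERR checks. The sketch below is illustrative only: it assumes `page` is a Playwright `Page`, and the helper names `mockTimeout` and `findLeakedDetails` are hypothetical, not part of the skill. The regular expressions are rough heuristics for ERR-06 and will produce both false positives and misses.

```javascript
// Sketch only: `page` is assumed to be a Playwright Page; helper names are hypothetical.

// ERR-02: hold matching requests for `delayMs`, then abort them as timed out.
async function mockTimeout(page, pattern, delayMs) {
  await page.route(pattern, async route => {
    await new Promise(resolve => setTimeout(resolve, delayMs));
    await route.abort('timedout');
  });
}

// ERR-06: heuristic patterns for technical details that should not reach users.
const LEAK_PATTERNS = [
  { label: 'stack trace', regex: /\bat\s+\S+\s+\(.*:\d+:\d+\)/ },
  { label: 'SQL', regex: /\b(SELECT\b.+\bFROM|INSERT\s+INTO|UPDATE\b.+\bSET|DELETE\s+FROM)\b/ },
  { label: 'internal path', regex: /(^|[\s("'])(\/(usr|var|home|etc|srv|opt)\/\S+|[A-Za-z]:\\\S+)/ },
];

// Return the labels of every pattern found in the visible error text.
function findLeakedDetails(text) {
  return LEAK_PATTERNS.filter(p => p.regex.test(text)).map(p => p.label);
}
```

In a session, `findLeakedDetails` would be applied to the text of whatever error UI appears after a fault is injected; any non-empty result is a candidate ERR-06 finding to confirm by hand.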
Category B: Graceful Degradation (DEG)
| Check ID | Check | Standard | Method |
|---|---|---|---|
| DEG-01 | Page works with JavaScript disabled (basic content) | Progressive enhancement | Disable JS, check if content is accessible |
| DEG-02 | Page works on slow connection (3G simulation) | Performance | Throttle to Slow 3G, check load behavior |
| DEG-03 | Non-critical features degrade without breaking critical ones | Graceful degradation | Disable third-party scripts, check core functionality |
| DEG-04 | Offline state is handled (if applicable) | PWA best practice | Go offline, check UI state and messaging |
| DEG-05 | Third-party service failure doesn't block page load | Resilience | Block third-party domains, check page loads |
| DEG-06 | Image loading failure shows fallback | UX best practice | Block image URLs, check for alt text/placeholder |
| DEG-07 | Font loading failure doesn't hide text | FOUT handling | Block font URLs, check text remains visible |
| DEG-08 | Feature detection over browser sniffing | Progressive enhancement | Check code for |
Browser validation: Use `page.route()` to block specific resources. Use CDP to simulate network conditions. Disable JavaScript via browser settings. Verify each degradation scenario.
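As a rough illustration of the two techniques named above, the sketch below blocks resources by host or type and throttles the network through CDP. It assumes `page` is a Playwright `Page` running in Chromium (CDP sessions are Chromium-only); `blockResources` and `throttleNetwork` are hypothetical helper names, and any throughput numbers passed in are the tester's own choice rather than values defined by this review.

```javascript
// Sketch only: `page` is assumed to be a Playwright Page; helper names are hypothetical.

// DEG-05/06/07: abort requests by host name or resource type, let everything else through.
async function blockResources(page, { hosts = [], resourceTypes = [] }) {
  await page.route('**/*', route => {
    const request = route.request();
    const host = new URL(request.url()).hostname;
    const blocked =
      hosts.some(h => host === h || host.endsWith('.' + h)) ||
      resourceTypes.includes(request.resourceType());
    return blocked ? route.abort() : route.continue();
  });
}

// DEG-02: throttle through the Chrome DevTools Protocol (Chromium only).
async function throttleNetwork(page, { latencyMs, downloadBitsPerSec, uploadBitsPerSec }) {
  const client = await page.context().newCDPSession(page);
  await client.send('Network.emulateNetworkConditions', {
    offline: false,
    latency: latencyMs,
    downloadThroughput: downloadBitsPerSec / 8, // CDP expects bytes per second
    uploadThroughput: uploadBitsPerSec / 8,
  });
}
```

For example, `blockResources(page, { resourceTypes: ['image', 'font'] })` would exercise DEG-06 and DEG-07 in one pass, after which the page can be checked for placeholders and visible text.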
Category C: Empty & Edge States (EDGE)
| Check ID | Check | Standard | Method |
|---|---|---|---|
| EDGE-01 | Empty data state shows helpful message | UX best practice | Navigate to pages with no data, check display |
| EDGE-02 | Pagination handles zero results | UX best practice | Search for nonexistent term, check pagination |
| EDGE-03 | Long text doesn't break layout | Defensive CSS | Enter very long strings (500+ chars), check overflow |
| EDGE-04 | Special characters in input don't break UI | Input handling | Enter |
| EDGE-05 | Large data sets don't freeze UI | Performance | Load pages with maximum data, check responsiveness |
| EDGE-06 | Rapid user actions don't cause duplicate submissions | State management | Double-click submit buttons, rapid nav |
| EDGE-07 | Back/forward navigation maintains state | History management | Fill form, navigate away, come back |
| EDGE-08 | Refresh preserves expected state | State persistence | Refresh during multi-step flow, check state |
| EDGE-09 | Concurrent tab/session behavior | Session management | Open same page in two tabs, perform actions |
| EDGE-10 | Maximum file upload size handled | Input validation | Upload oversized file, check error message |
Browser validation: Navigate to pages and test each edge case. Use `act` to interact with forms, submit empty/extreme data. Use JavaScript to check for UI overflow, frozen states.
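The JavaScript overflow check mentioned above might look like the following sketch. Both helper names are hypothetical. The comparison of `scrollWidth` against `clientWidth` is a common heuristic, not a definitive test, so hits are best treated as candidates to confirm visually.

```javascript
// Sketch only: helper names are hypothetical.

// EDGE-03: build a long unbroken string that cannot wrap at word boundaries.
function longString(length = 500) {
  return 'W'.repeat(length);
}

// Given elements (or plain objects with the same numeric properties),
// return those whose content is wider than the box that contains it.
function findHorizontalOverflow(elements) {
  return Array.from(elements).filter(el => el.scrollWidth > el.clientWidth);
}

// Inside the page this could be called as, for example:
//   findHorizontalOverflow(document.querySelectorAll('main *')).map(el => el.tagName)
```

Entering `longString()` into a text field and then running the overflow check on the page that displays the value covers EDGE-03 in two steps.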
Category D: API Contract & Data Handling (API)
| Check ID | Check | Standard | Method |
|---|---|---|---|
| API-01 | UI handles all HTTP error codes gracefully | API contract | Mock 400, 401, 403, 404, 422, 429, 500, 503 |
| API-02 | UI handles null/undefined fields without crashing | Defensive coding | Mock API response with null fields |
| API-03 | UI handles empty arrays/objects | Defensive coding | Mock API response with empty collections |
| API-04 | UI handles unexpected data types | Defensive coding | Mock API response with wrong types |
| API-05 | Loading states shown during API calls | UX best practice | Add 2s delay to API, verify loading indicator |
| API-06 | Race conditions handled (stale responses) | State management | Trigger rapid sequential requests, verify latest wins |
| API-07 | Rate limiting (429) handled with user feedback | API contract | Mock 429 response, check UI feedback |
| API-08 | Authentication expiry handled mid-session | Session management | Mock 401 during session, check redirect to login |
Browser validation: Use `page.route()` to mock each response scenario. Verify UI state after each mock.
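One possible way to organize these mocks is to generate one response per scenario and swap them in turn. The sketch below assumes `page` is a Playwright `Page`; the helper names, the response payloads, and the `Retry-After` value are all hypothetical placeholders.

```javascript
// Sketch only: `page` is assumed to be a Playwright Page; helper names and payloads are hypothetical.

const ERROR_STATUSES = [400, 401, 403, 404, 422, 429, 500, 503];

// API-01 / API-07: one fulfill() payload per status; 429 also carries a Retry-After header.
function buildErrorScenarios() {
  return ERROR_STATUSES.map(status => ({
    status,
    contentType: 'application/json',
    headers: status === 429 ? { 'Retry-After': '30' } : {},
    body: JSON.stringify({ error: `Mocked ${status}` }),
  }));
}

// API-02 / API-03: a successful response whose fields are null or empty.
function buildSparseResponse() {
  return {
    status: 200,
    contentType: 'application/json',
    body: JSON.stringify({ user: null, items: [], settings: {} }),
  };
}

// Replace any earlier mock for `pattern`, then answer every match with `response`.
async function mockResponse(page, pattern, response) {
  await page.unroute(pattern);
  await page.route(pattern, route => route.fulfill(response));
}
```

A review loop would call `mockResponse` once per entry from `buildErrorScenarios()`, reload the flow under test, and record the UI state before moving to the next status.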
Category E: Recovery & User Communication (REC)
| Check ID | Check | Standard | Method |
|---|---|---|---|
| REC-01 | Retry mechanisms exist for transient failures | Resilience | Mock intermittent failure, check auto-retry |
| REC-02 | User can manually retry after failure | UX best practice | After error, verify retry action available |
| REC-03 | Progress is not lost on errors | UX best practice | Fill long form, trigger error, check data persists |
| REC-04 | User is informed of degraded functionality | Communication | When features fail, check for degradation notice |
| REC-05 | Recovery actions are clear and accessible | NN/g heuristics | After each error type, evaluate recovery UX |
| REC-06 | Status indicators for background operations | UX best practice | Start async operation, verify progress feedback |
Browser validation: Use fault injection then verify recovery paths.
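For REC-01, an intermittent failure can be approximated by failing only the first few requests. This is a minimal sketch assuming `page` is a Playwright `Page`; `failFirstN` is a hypothetical helper, and the 503 payload is a placeholder.

```javascript
// Sketch only: `page` is assumed to be a Playwright Page; `failFirstN` is a hypothetical helper.

// REC-01: answer the first `n` matching requests with 503, then let later ones reach the server.
async function failFirstN(page, pattern, n) {
  const stats = { seen: 0 };
  await page.route(pattern, route => {
    stats.seen += 1;
    if (stats.seen <= n) {
      return route.fulfill({
        status: 503,
        contentType: 'application/json',
        body: JSON.stringify({ error: 'Service Unavailable' }),
      });
    }
    return route.continue();
  });
  return stats; // the caller can read stats.seen after exercising the flow
}
```

If `stats.seen` ends up greater than `n` without any user interaction, that is a hint (not proof) that the app retried on its own; if it stays at `n`, the manual retry path from REC-02 is the next thing to check.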
Phase 4: Report
Generate a structured report saved to `shiplight/reports/resilience-review-{date}.md`:

```markdown
# Resilience Review Report

Date: {date}
URL: {url}
Critical flows tested: {list}
API dependencies tested: {count}
Failure scenarios simulated: {count}

## Overall Score: {X}/10 | Confidence: {X}%

## Score Breakdown

| Category | Score | Findings |
|---|---|---|
| Error Handling (ERR) | 5/10 | 2 critical, 1 high |
| Graceful Degradation (DEG) | 6/10 | 1 high, 2 medium |
| Empty & Edge States (EDGE) | 4/10 | 1 critical, 3 high |
| API Contract (API) | 7/10 | 1 high, 1 medium |
| Recovery (REC) | 3/10 | 2 high, 1 medium |

## Failure Matrix

| Failure Scenario | Expected Behavior | Actual Behavior | Status |
|---|---|---|---|
| API returns 500 | Error message + retry | Blank screen | FAIL |
| Network timeout | Loading → timeout message | Infinite spinner | FAIL |
| Empty data set | "No results" message | Blank page | FAIL |
| ... |

## Findings

(structured findings with evidence, screenshots of failure states)
```

Confidence Scoring
- 90-100%: Fault injected and failure behavior verified in browser
- 70-89%: Code analysis shows missing error handling, not validated at runtime
- 50-69%: Pattern-based assessment (e.g., no error boundary detected)
- Below 50%: Don't report
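The bands above could be applied mechanically when assembling the report. The sketch below is one possible encoding; the function names and the `{ id, confidence }` finding shape are hypothetical.

```javascript
// Sketch only: function names and the finding shape are hypothetical.

// Map a confidence percentage to the kind of evidence behind it, or null if it should be dropped.
function confidenceBasis(confidence) {
  if (confidence >= 90) return 'fault injected and verified in browser';
  if (confidence >= 70) return 'code analysis, not validated at runtime';
  if (confidence >= 50) return 'pattern-based assessment';
  return null; // below 50%: do not report
}

// Keep only findings that meet the reporting threshold.
function reportableFindings(findings) {
  return findings.filter(f => confidenceBasis(f.confidence) !== null);
}
```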
Phase 5: Remediate
1. Fix guidance (example)
```markdown
### ERR-01: API error shows blank screen instead of error message

Impact: Users see empty page, think app is broken, leave
File: src/pages/Dashboard.tsx:45
Current: `const data = await fetch('/api/data').then(r => r.json())`
Problem: No error handling: fetch throws on network error, .json() throws on non-JSON response
Fix:
- Wrap in try/catch
- Add error state: `const [error, setError] = useState(null)`
- Render error UI with retry button
- Add React Error Boundary as fallback
```
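As a framework-free sketch of the try/catch step, the data loading could be wrapped so it always returns either data or an error the UI can render. `loadJson` and its return shape are hypothetical, and the messages are placeholders. Note that `fetch` resolves normally on HTTP errors such as 500, so the status has to be checked explicitly in addition to catching exceptions.

```javascript
// Sketch only: `loadJson` is a hypothetical helper; `fetchImpl` defaults to the global fetch.

async function loadJson(url, fetchImpl = fetch) {
  try {
    const response = await fetchImpl(url);
    if (!response.ok) {
      // fetch does not reject on HTTP errors, so a 500 would otherwise pass through silently
      return {
        data: null,
        error: { status: response.status, message: 'The server could not load this data. Please try again.' },
      };
    }
    return { data: await response.json(), error: null };
  } catch (err) {
    // network failure, or a response body that is not valid JSON
    return {
      data: null,
      error: { status: null, message: 'Could not load data. Check your connection and try again.' },
    };
  }
}
```

In the component, the result's `error` would feed the `setError` state from the guidance, and the retry button would simply call `loadJson` again.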
2. YAML regression test
```yaml
- name: err-01-api-error-shows-message
  description: Verify API failure shows user-friendly error message instead of blank screen
  severity: critical
  standard: UX-Error-Handling
  steps:
    - CODE: |
        await page.route('**/api/data**', route => {
          route.fulfill({
            status: 500,
            contentType: 'application/json',
            body: JSON.stringify({ error: 'Internal Server Error' })
          });
        });
    - URL: /dashboard
    - WAIT_UNTIL: Page has finished attempting to load data
      timeout_seconds: 15
    - VERIFY: An error message is visible explaining that data could not be loaded
    - VERIFY: A retry button or recovery action is available to the user
    - VERIFY: The page is NOT blank (navigation and header are still visible)
```

Save all YAML tests to `shiplight/tests/resilience-review.test.yaml`.
Tips
- Use `page.route()` in CODE blocks; it's the primary tool for fault injection
- Test the most critical user flows first (checkout, signup, core feature)
- A blank screen is always a CRITICAL finding; it's the worst failure mode
- Check `get_browser_console_logs` for uncaught promise rejections; they indicate missing error handling
- Edge case testing (EDGE category) often reveals the most bugs per minute spent
- Close session with `close_session` and use `generate_html_report` for evidence
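Scanning console output for unhandled rejections can be scripted. The sketch below assumes the logs are available as an array of text lines, which is an assumption about the output of `get_browser_console_logs` rather than its documented format; `findUnhandledRejections` is a hypothetical helper.

```javascript
// Sketch only: assumes console output is available as an array of text lines.

// Return the lines that look like unhandled promise rejections.
function findUnhandledRejections(logLines) {
  return logLines.filter(line => /Uncaught \(in promise\)|unhandledrejection/i.test(line));
}
```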