resilience-review


Resilience Review


Evaluate how your application behaves when things go wrong — network failures, API errors, slow connections, missing data, and edge cases. Most apps are built for the happy path; this review systematically tests the unhappy paths that real users encounter.

When to use


Use `/resilience-review` when:
  • Before launching a user-facing feature
  • After adding new API integrations or data sources
  • When reliability is critical (healthcare, finance, e-commerce checkout)
  • After production incidents caused by unhandled errors
  • When moving from prototype to production quality

Standards Referenced


  • Google SRE Principles — Error budgets, graceful degradation
  • Netflix Chaos Engineering Principles — Verify steady state, inject real-world failures
  • OWASP Error Handling — Secure and user-friendly error responses
  • Nielsen Norman Group — Error message usability heuristics

Phase Overview


Phase 1: EDUCATE   → Why resilience matters and what we test
Phase 2: SCOPE     → Map failure points, dependencies, critical flows
Phase 3: ANALYZE   → Browser-based fault injection and edge case testing
Phase 4: REPORT    → Findings with evidence and user impact assessment
Phase 5: REMEDIATE → Fix guidance + YAML regression tests


Phase 1: Educate


Why this matters: Users don't experience your app in ideal conditions. 53% of mobile visits are abandoned if a page takes >3 seconds. Error pages with no guidance increase support tickets 5x. A blank screen is the worst possible failure mode — it tells the user nothing and offers no recovery path. Resilient apps maintain trust even when backend systems fail.
This review simulates real-world failure conditions in the browser and evaluates how your UI responds.


Phase 2: Scope


Gather context


  1. Auto-detect from codebase:
    • API calls and their endpoints
    • Error boundary components (React ErrorBoundary, Vue errorHandler)
    • Loading state implementations (spinners, skeletons, suspense)
    • Empty state components
    • Retry logic / error recovery patterns
    • Offline support (service workers, cache strategies)
    • Third-party service dependencies
  2. Ask the user (one at a time):
    • Target URL: Where is the app running?
    • Critical user flows: Which flows must never show a blank screen? (auto-detect from routes)
    • Key API dependencies: Which APIs does the frontend depend on? (auto-detected)
    • Known fragile areas: Any pages/features that break frequently? (optional)
  3. Map failure points:
    • API endpoints the frontend calls (and what happens if each fails)
    • Third-party dependencies (CDN, auth provider, analytics, maps, payment)
    • Data-dependent UI (what shows when data is empty, missing, or malformed)
    • User input edge cases (long text, special characters, empty submissions)
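
The mapping step can be made concrete by turning the endpoint list into an explicit test plan. The sketch below is illustrative only: `planFailureScenarios`, the scenario names, and the sample endpoints are assumptions introduced here, not part of the review tooling.

```javascript
// Hypothetical failure modes to exercise for every endpoint the frontend calls.
const DEFAULT_SCENARIOS = ['http-500', 'timeout', 'empty-response', 'malformed-json'];

// Build one plan entry per (endpoint, scenario) pair so no combination is skipped.
function planFailureScenarios(endpoints, scenarios = DEFAULT_SCENARIOS) {
  const plan = [];
  for (const endpoint of endpoints) {
    for (const scenario of scenarios) {
      plan.push({ endpoint, scenario, id: `${endpoint} :: ${scenario}` });
    }
  }
  return plan;
}

// Example with made-up endpoints: 2 endpoints x 4 scenarios.
const plan = planFailureScenarios(['/api/user', '/api/orders']);
console.log(plan.length); // 8
```

Each entry can then be ticked off as the corresponding check in Phase 3 is run.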


Phase 3: Analyze


Open a browser session with `new_session` using `record_evidence: true`. Run all applicable check categories.

Category A: Error Handling (ERR)


| Check ID | Check | Standard | Method |
|---|---|---|---|
| ERR-01 | API errors show user-friendly message (not blank screen) | UX best practice | Mock API to return 500, check UI response |
| ERR-02 | Network timeout shows appropriate state | UX best practice | Mock network delay (30s), check UI |
| ERR-03 | 404 page exists and is helpful | UX best practice | Navigate to non-existent route |
| ERR-04 | JavaScript errors don't crash the page | Error boundaries | Inject JS error, check if page recovers |
| ERR-05 | Error messages are actionable | NN/g heuristics | Check error messages for: what happened, why, what to do |
| ERR-06 | Errors don't expose technical details | OWASP | Check error messages for stack traces, SQL, internal paths |
| ERR-07 | Form validation errors are clear and positioned | UX best practice | Submit invalid forms, check error placement and text |
| ERR-08 | Error states allow retry without page refresh | UX best practice | After error, check for retry button or recovery action |
| ERR-09 | Concurrent error handling (multiple simultaneous failures) | Resilience | Mock multiple API failures, check UI doesn't cascade |
| ERR-10 | Error logging doesn't expose PII | OWASP / Privacy | Check `get_browser_console_logs` during errors |
Browser validation: Use CODE blocks to intercept network requests via `page.route()` to simulate failures. Check UI state after each failure. Use `get_browser_console_logs` for JavaScript errors.

```javascript
// Example: Mock API 500 error
await page.route('**/api/**', route => {
  route.fulfill({ status: 500, body: JSON.stringify({ error: 'Internal Server Error' }) });
});
```
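
For ERR-06, the text rendered after a mocked failure can be scanned for technical details. The following is a rough heuristic sketch, not an OWASP-defined rule set: `findTechnicalLeaks` and the patterns in `LEAK_PATTERNS` are assumptions introduced here, and they will produce both false positives and false negatives.

```javascript
// Heuristic patterns for details that should not reach end users (ERR-06).
// These are illustrative assumptions; tune them for your stack.
const LEAK_PATTERNS = {
  // A JavaScript-style stack frame such as "at render (file.js:45:12)".
  'stack-trace': /\bat\s+\S+\s+\(.*:\d+:\d+\)/,
  // Upper-case SQL keywords in statement order, or a SQLSTATE marker.
  'sql': /\b(SELECT\s.+\sFROM|INSERT\s+INTO|UPDATE\s+\w+\s+SET|DELETE\s+FROM)\b|SQLSTATE/,
  // Common Unix system directories or a Windows drive path.
  'internal-path': /(^|[\s("'])\/(usr|var|home|etc|srv|opt)\/\S+|[A-Za-z]:\\\S+/,
};

// Return the names of every pattern that matches the given text.
function findTechnicalLeaks(text) {
  return Object.keys(LEAK_PATTERNS).filter(name => LEAK_PATTERNS[name].test(text));
}

// Usage inside a CODE block, after injecting a failure:
//   const text = await page.innerText('body');
//   const leaks = findTechnicalLeaks(text);
```

An empty result does not prove the message is safe; it only means none of these particular patterns matched.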

Category B: Graceful Degradation (DEG)


| Check ID | Check | Standard | Method |
|---|---|---|---|
| DEG-01 | Page works with JavaScript disabled (basic content) | Progressive enhancement | Disable JS, check if content is accessible |
| DEG-02 | Page works on slow connection (3G simulation) | Performance | Throttle to Slow 3G, check load behavior |
| DEG-03 | Non-critical features degrade without breaking critical ones | Graceful degradation | Disable third-party scripts, check core functionality |
| DEG-04 | Offline state is handled (if applicable) | PWA best practice | Go offline, check UI state and messaging |
| DEG-05 | Third-party service failure doesn't block page load | Resilience | Block third-party domains, check page loads |
| DEG-06 | Image loading failure shows fallback | UX best practice | Block image URLs, check for alt text/placeholder |
| DEG-07 | Font loading failure doesn't hide text | FOUT handling | Block font URLs, check text remains visible |
| DEG-08 | Feature detection over browser sniffing | Progressive enhancement | Check code for `navigator.userAgent` vs feature detection |
Browser validation: Use `page.route()` to block specific resources. Use CDP to simulate network conditions. Disable JavaScript via browser settings. Verify each degradation scenario.
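
For DEG-03 and DEG-05, blocking third-party domains could look roughly like the sketch below. `makeThirdPartyBlocker` is a hypothetical helper name and `example.com` is a placeholder host; the handler relies only on the Playwright route methods `request().url()`, `continue()`, and `abort()`.

```javascript
// Returns a Playwright-style route handler that aborts requests to hosts
// other than `firstPartyHost` and its subdomains.
function makeThirdPartyBlocker(firstPartyHost) {
  return function handler(route) {
    const url = new URL(route.request().url());
    // Leave non-HTTP(S) requests alone.
    if (url.protocol !== 'http:' && url.protocol !== 'https:') {
      return route.continue();
    }
    const host = url.hostname;
    const isFirstParty = host === firstPartyHost || host.endsWith('.' + firstPartyHost);
    return isFirstParty ? route.continue() : route.abort();
  };
}

// Usage inside a CODE block (the host is a placeholder):
//   await page.route('**/*', makeThirdPartyBlocker('example.com'));
```

If the app serves first-party assets from a separate CDN domain, that domain would need to be allowed as well, otherwise the test measures the wrong failure.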

Category C: Empty & Edge States (EDGE)


| Check ID | Check | Standard | Method |
|---|---|---|---|
| EDGE-01 | Empty data state shows helpful message | UX best practice | Navigate to pages with no data, check display |
| EDGE-02 | Pagination handles zero results | UX best practice | Search for nonexistent term, check pagination |
| EDGE-03 | Long text doesn't break layout | Defensive CSS | Enter very long strings (500+ chars), check overflow |
| EDGE-04 | Special characters in input don't break UI | Input handling | Enter `<script>`, `"'&<>`, emoji, Unicode |
| EDGE-05 | Large data sets don't freeze UI | Performance | Load pages with maximum data, check responsiveness |
| EDGE-06 | Rapid user actions don't cause duplicate submissions | State management | Double-click submit buttons, rapid nav |
| EDGE-07 | Back/forward navigation maintains state | History management | Fill form, navigate away, come back |
| EDGE-08 | Refresh preserves expected state | State persistence | Refresh during multi-step flow, check state |
| EDGE-09 | Concurrent tab/session behavior | Session management | Open same page in two tabs, perform actions |
| EDGE-10 | Maximum file upload size handled | Input validation | Upload oversized file, check error message |
Browser validation: Navigate to pages and test each edge case. Use `act` to interact with forms, submit empty/extreme data. Use JavaScript to check for UI overflow, frozen states.
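
The JavaScript overflow check for EDGE-03 might be sketched as follows. `makeLongInput` and `findOverflowing` are hypothetical helper names; the comparison of `scrollWidth` against `clientWidth` is a common heuristic and will also flag elements that are clipped on purpose (for example with an ellipsis).

```javascript
// Build an input of `length` characters with no spaces, which is the hardest
// case for line wrapping.
function makeLongInput(length) {
  return 'W'.repeat(length);
}

// Given DOM elements (or any objects exposing scrollWidth and clientWidth),
// return those whose content is wider than their visible box.
function findOverflowing(elements) {
  return Array.from(elements).filter(el => el.scrollWidth > el.clientWidth);
}

// In the browser, after submitting makeLongInput(500) into a field:
//   findOverflowing(document.querySelectorAll('body *'))
```

Any element returned is a candidate for manual inspection in the recorded evidence rather than an automatic failure.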

Category D: API Contract & Data Handling (API)


| Check ID | Check | Standard | Method |
|---|---|---|---|
| API-01 | UI handles all HTTP error codes gracefully | API contract | Mock 400, 401, 403, 404, 422, 429, 500, 503 |
| API-02 | UI handles null/undefined fields without crashing | Defensive coding | Mock API response with null fields |
| API-03 | UI handles empty arrays/objects | Defensive coding | Mock API response with empty collections |
| API-04 | UI handles unexpected data types | Defensive coding | Mock API response with wrong types |
| API-05 | Loading states shown during API calls | UX best practice | Add 2s delay to API, verify loading indicator |
| API-06 | Race conditions handled (stale responses) | State management | Trigger rapid sequential requests, verify latest wins |
| API-07 | Rate limiting (429) handled with user feedback | API contract | Mock 429 response, check UI feedback |
| API-08 | Authentication expiry handled mid-session | Session management | Mock 401 during session, check redirect to login |
Browser validation: Use `page.route()` to mock each response scenario. Verify UI state after each mock.
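
For API-02, one way to generate the mocked payloads is to start from a known-good response and null out one field at a time. This is a minimal sketch under assumptions: `nullFieldVariants` is a hypothetical helper, it only covers top-level fields, and the `/api/user` pattern in the usage comment is a placeholder.

```javascript
// For each top-level field of `sample`, produce a copy with that field set to null.
// The original object is left untouched.
function nullFieldVariants(sample) {
  return Object.keys(sample).map(field => ({
    field,
    body: { ...sample, [field]: null },
  }));
}

// Usage inside a CODE block (endpoint pattern is a placeholder):
//   for (const { field, body } of nullFieldVariants(sampleUser)) {
//     await page.route('**/api/user**', route =>
//       route.fulfill({ status: 200, contentType: 'application/json', body: JSON.stringify(body) }));
//     // reload, verify the UI, then remove the mock before the next variant:
//     await page.unroute('**/api/user**');
//   }
```

Recording which `field` was nulled alongside each screenshot makes a crash easy to trace back to the unguarded property.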

Category E: Recovery & User Communication (REC)


| Check ID | Check | Standard | Method |
|---|---|---|---|
| REC-01 | Retry mechanisms exist for transient failures | Resilience | Mock intermittent failure, check auto-retry |
| REC-02 | User can manually retry after failure | UX best practice | After error, verify retry action available |
| REC-03 | Progress is not lost on errors | UX best practice | Fill long form, trigger error, check data persists |
| REC-04 | User is informed of degraded functionality | Communication | When features fail, check for degradation notice |
| REC-05 | Recovery actions are clear and accessible | NN/g heuristics | After each error type, evaluate recovery UX |
| REC-06 | Status indicators for background operations | UX best practice | Start async operation, verify progress feedback |
Browser validation: Use fault injection then verify recovery paths.
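
An intermittent failure for REC-01 can be approximated with a stateful route handler that fails a fixed number of times and then succeeds. The sketch below is an assumption about how one might do it: `failFirstN` is a hypothetical helper, and the 503 status and response bodies are arbitrary choices.

```javascript
// Returns a Playwright-style route handler that answers the first `n` requests
// with a 503 and every later request with a 200 carrying `okBody`.
function failFirstN(n, okBody) {
  let calls = 0;
  return function handler(route) {
    calls += 1;
    if (calls <= n) {
      return route.fulfill({
        status: 503,
        contentType: 'application/json',
        body: JSON.stringify({ error: 'Service Unavailable' }),
      });
    }
    return route.fulfill({
      status: 200,
      contentType: 'application/json',
      body: JSON.stringify(okBody),
    });
  };
}

// Usage inside a CODE block (pattern and body are placeholders):
//   await page.route('**/api/data**', failFirstN(2, { items: [] }));
```

If the UI recovers without user action, auto-retry exists (REC-01); if it recovers only after clicking a retry control, that covers REC-02 instead.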


Phase 4: Report


Generate a structured report saved to `shiplight/reports/resilience-review-{date}.md`:

Resilience Review Report


Date: {date}
URL: {url}
Critical flows tested: {list}
API dependencies tested: {count}
Failure scenarios simulated: {count}

Overall Score: {X}/10 | Confidence: {X}%


Score Breakdown


| Category | Score | Findings |
|---|---|---|
| Error Handling (ERR) | 5/10 | 2 critical, 1 high |
| Graceful Degradation (DEG) | 6/10 | 1 high, 2 medium |
| Empty & Edge States (EDGE) | 4/10 | 1 critical, 3 high |
| API Contract (API) | 7/10 | 1 high, 1 medium |
| Recovery (REC) | 3/10 | 2 high, 1 medium |

Failure Matrix


| Failure Scenario | Expected Behavior | Actual Behavior | Status |
|---|---|---|---|
| API returns 500 | Error message + retry | Blank screen | FAIL |
| Network timeout | Loading → timeout message | Infinite spinner | FAIL |
| Empty data set | "No results" message | Blank page | FAIL |
| ... | | | |

Findings


(structured findings with evidence, screenshots of failure states)

Confidence Scoring


  • 90-100%: Fault injected and failure behavior verified in browser
  • 70-89%: Code analysis shows missing error handling, not validated at runtime
  • 50-69%: Pattern-based assessment (e.g., no error boundary detected)
  • Below 50%: Don't report
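
The bands above can be applied mechanically when assembling the report. In this sketch the tier labels, the `confidence` field, and both function names are shorthand introduced here, not names the review defines.

```javascript
// Map a confidence percentage to the band that justifies it.
function confidenceTier(percent) {
  if (percent >= 90) return 'verified-in-browser';
  if (percent >= 70) return 'code-analysis';
  if (percent >= 50) return 'pattern-based';
  return null; // below 50%: not reported
}

// Keep only findings that meet the 50% reporting threshold.
function reportableFindings(findings) {
  return findings.filter(f => confidenceTier(f.confidence) !== null);
}
```

For example, a finding with `confidence: 45` would be dropped, while one at 72 would be reported as resting on code analysis alone.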


Phase 5: Remediate


1. Fix guidance (example)


ERR-01: API error shows blank screen instead of error message


Impact: Users see an empty page, assume the app is broken, and leave
File: src/pages/Dashboard.tsx:45
Current: `const data = await fetch('/api/data').then(r => r.json())`
Problem: No error handling. fetch rejects on a network error, .json() throws on a non-JSON response, and an HTTP 500 does not reject at all, so the error body is treated as data
Fix:
  • Wrap in try/catch and check `response.ok`
  • Add error state: `const [error, setError] = useState(null)`
  • Render error UI with retry button
  • Add React Error Boundary as fallback
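
The data-loading part of that fix might look like the sketch below. It is framework-agnostic and makes several assumptions: `loadData` is a hypothetical helper, the user-facing message texts are placeholders, and `fetchImpl` is injected only so the function can be exercised without a network (in the app you would pass the global `fetch`).

```javascript
// Load JSON and always resolve to { data, error, status } instead of throwing,
// so the caller can render an error state with a retry button.
async function loadData(fetchImpl, url) {
  try {
    const response = await fetchImpl(url);
    // fetch resolves even on HTTP 4xx/5xx, so check the status explicitly.
    if (!response.ok) {
      return {
        data: null,
        error: 'The server could not load your data. Please try again.',
        status: response.status,
      };
    }
    return { data: await response.json(), error: null, status: response.status };
  } catch (err) {
    // Network failure, or a body that is not valid JSON.
    return { data: null, error: 'Could not load data. Please try again.', status: null };
  }
}

// In a React component, the result would feed the error state, e.g.:
//   const result = await loadData(fetch, '/api/data');
//   if (result.error) setError(result.error);
```

Keeping the raw status out of the displayed message but available in the result lets the UI stay friendly while logs keep the detail.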

2. YAML regression test


```yaml
- name: err-01-api-error-shows-message
  description: Verify API failure shows user-friendly error message instead of blank screen
  severity: critical
  standard: UX-Error-Handling
  steps:
    - CODE: |
        await page.route('**/api/data**', route => {
          route.fulfill({
            status: 500,
            contentType: 'application/json',
            body: JSON.stringify({ error: 'Internal Server Error' })
          });
        });
    - URL: /dashboard
    - WAIT_UNTIL: Page has finished attempting to load data
      timeout_seconds: 15
    - VERIFY: An error message is visible explaining that data could not be loaded
    - VERIFY: A retry button or recovery action is available to the user
    - VERIFY: The page is NOT blank; navigation and header are still visible
```
Save all YAML tests to `shiplight/tests/resilience-review.test.yaml`.

Tips


  • Use `page.route()` in CODE blocks; it's the primary tool for fault injection
  • Test the most critical user flows first (checkout, signup, core feature)
  • A blank screen is always a CRITICAL finding; it's the worst failure mode
  • Check `get_browser_console_logs` for uncaught promise rejections, which indicate missing error handling
  • Edge case testing (EDGE category) often reveals the most bugs per minute spent
  • Close session with `close_session` and use `generate_html_report` for evidence