websocket-client-resilience
WebSocket Client Resilience
Six resilience patterns for WebSocket clients, extracted from real-world mobile network conditions.
Mobile WebSocket connections fail in ways that local development environments don't surface. P99 latency on 4G networks is 5-8 seconds, so a 5-second health check timeout flags slow but healthy connections as dead on slow networks.
When to use: Implementing WebSocket client reconnection logic, building real-time features with persistent connections, mobile app WebSocket handling, any client that maintains long-lived server connections.
When not to use: Server-side WebSocket handlers, HTTP request/response patterns, Server-Sent Events (SSE).
Rationalizations (Do Not Skip)
| Rationalization | Why It's Wrong | Required Action |
|---|---|---|
| "Our users are on fast networks" | Mobile users exist. Even desktop WiFi has transient blips. | Test with throttled networks |
| "Simple retry is enough" | Without jitter, all clients retry at once after an outage | Add randomized jitter |
| "One missed heartbeat means disconnected" | Network blips last 1-3 seconds. Single miss = false positive. | Use hysteresis (2+ misses) |
| "We'll add resilience later" | Reconnection logic is foundational. Retrofitting it is much harder. | Build it in from the start |
| "5 seconds is plenty of timeout" | Mobile P99 is 5-8s. That "timeout" is normal latency for mobile. | Use 10s+ for mobile |
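The "add randomized jitter" action can be sketched as follows. This is a minimal illustration, not the `getBackoffDelay` shipped in `resilience.ts`, whose signature this document does not show. The name `backoffDelayMs`, the 1 s base delay, and the 30 s cap are assumptions; the +/- 25% jitter comes from the Quick Reference.

```typescript
// Hypothetical helper for illustration; names and defaults are assumptions.
interface BackoffOptions {
  baseMs: number;      // delay before the first retry (assumed 1000 ms)
  maxMs: number;       // cap on the un-jittered delay (assumed 30000 ms)
  jitterRatio: number; // 0.25 means +/- 25% jitter
}

const DEFAULT_BACKOFF: BackoffOptions = { baseMs: 1000, maxMs: 30000, jitterRatio: 0.25 };

function backoffDelayMs(
  attempt: number,
  options: BackoffOptions = DEFAULT_BACKOFF,
  random: () => number = Math.random,
): number {
  // Exponential growth: base, 2x base, 4x base, ... capped at maxMs.
  const exponential = Math.min(options.maxMs, options.baseMs * Math.pow(2, attempt));
  // Map random() in [0, 1) to a factor in [1 - jitterRatio, 1 + jitterRatio).
  const factor = 1 + (random() * 2 - 1) * options.jitterRatio;
  return Math.round(exponential * factor);
}
```

Passing `random` as a parameter keeps the function deterministic in tests. A reconnect loop would call `backoffDelayMs(attempt)` before each retry and reset `attempt` to 0 after a successful connection.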
Included Utilities
```typescript
// WebSocket resilience pattern implementations (zero dependencies)
import {
  getBackoffDelay,
  circuitBreakerTransition,
  shouldDisconnect,
  CommandAckTracker,
  detectSequenceGap,
  classifyTimeout,
} from './resilience.ts';
```
Quick Reference
| Pattern | Detect | Fix | Severity |
|---|---|---|---|
| Backoff without jitter | | Add +/- 25% jitter | must-fail |
| No circuit breaker | Reconnect without failure counter | Trip after 5 failures, 60s cooldown | must-fail |
| Single heartbeat miss | | Require 2+ missed heartbeats | should-fail |
| No command ack | | Track pending commands, timeout at 30s | nice-to-have |
| No sequence tracking | | Track lastReceivedSequence, detect gaps | nice-to-have |
| Short mobile timeout | Health timeout < 10s | Use 10s+ for all health checks | must-fail |
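The circuit breaker row (trip after 5 failures, 60s cooldown) can be sketched as a small pure state machine. This is an assumed shape, not the `circuitBreakerTransition` from `resilience.ts`; the state names, the `tick` event, and the function name `nextBreakerState` are hypothetical.

```typescript
// Hypothetical state machine for illustration; not the API of resilience.ts.
type BreakerState =
  | { kind: "closed"; failures: number }
  | { kind: "open"; openedAt: number }
  | { kind: "half-open" };

type BreakerEvent = "success" | "failure" | "tick";

const FAILURE_THRESHOLD = 5; // from the table: trip after 5 failures
const COOLDOWN_MS = 60000;   // from the table: 60s cooldown

function nextBreakerState(state: BreakerState, event: BreakerEvent, now: number): BreakerState {
  switch (state.kind) {
    case "closed": {
      if (event === "success") return { kind: "closed", failures: 0 };
      if (event === "failure") {
        const failures = state.failures + 1;
        return failures >= FAILURE_THRESHOLD
          ? { kind: "open", openedAt: now }
          : { kind: "closed", failures };
      }
      return state;
    }
    case "open":
      // Stay open until the cooldown has elapsed, then allow one probe attempt.
      return event === "tick" && now - state.openedAt >= COOLDOWN_MS
        ? { kind: "half-open" }
        : state;
    case "half-open":
      if (event === "success") return { kind: "closed", failures: 0 };
      if (event === "failure") return { kind: "open", openedAt: now };
      return state;
  }
}

// Reconnect attempts are allowed in every state except "open".
function canAttemptReconnect(state: BreakerState): boolean {
  return state.kind !== "open";
}
```

Keeping the transition pure (the clock is passed in as `now`) makes the cooldown easy to test without timers. The caller feeds in `success` or `failure` after each connection attempt and a periodic `tick` while the breaker is open.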
Coverage
| Pattern | Utility | Status |
|---|---|---|
| 1. Backoff with jitter | `getBackoffDelay` | Code + tests |
| 2. Circuit breaker | `circuitBreakerTransition` | Code + tests |
| 3. Heartbeat hysteresis | `shouldDisconnect` | Code + tests |
| 4. Command acknowledgment | `CommandAckTracker` | Code + tests |
| 5. Sequence gap detection | `detectSequenceGap` | Code + tests |
| 6. Mobile-aware timeouts | `classifyTimeout` | Code + tests |
All 6 patterns have executable utilities and tests.
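As one illustration of what such a utility might look like, heartbeat hysteresis can be sketched as a counter of consecutive misses. The class `HeartbeatMonitor` and its method names are hypothetical and are not taken from `resilience.ts`; only the "2+ missed heartbeats" threshold comes from this document.

```typescript
// Hypothetical sketch of heartbeat hysteresis; names are assumptions.
class HeartbeatMonitor {
  private consecutiveMisses = 0;
  private readonly missThreshold: number;

  constructor(missThreshold: number = 2) {
    this.missThreshold = missThreshold;
  }

  // Call when a heartbeat (pong) arrives in time; clears the miss streak.
  recordHeartbeat(): void {
    this.consecutiveMisses = 0;
  }

  // Call when a heartbeat deadline passes with no response.
  // Returns true only once the streak reaches the threshold.
  recordMiss(): boolean {
    this.consecutiveMisses += 1;
    return this.consecutiveMisses >= this.missThreshold;
  }
}
```

With the default threshold, a single 1-3 second network blip produces one miss and no disconnect; only two misses in a row without an intervening heartbeat return `true`.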
Framework Adaptation
These patterns are framework-agnostic. They work with:
- Browser: Native `WebSocket`, Socket.IO, ws library
- React/Vue/Svelte: Wrap in composable/hook
- React Native / Flutter: Same patterns, different APIs
- Node.js: `ws` library for server-to-server WebSocket clients
The core principle: real-world network conditions are more variable than controlled environments. Design for mobile latency, not localhost.
See patterns.md for full before/after code examples and detection commands for each pattern.
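Because the patterns are plain logic, they can sit behind any of the transports above. As a framework-independent sketch, command acknowledgment with the 30s timeout from the Quick Reference might look like this. The class `PendingCommandTracker` and its methods are hypothetical; they are not the `CommandAckTracker` API, which is documented in patterns.md rather than here.

```typescript
// Hypothetical sketch of command acknowledgment tracking; names are assumptions.
class PendingCommandTracker {
  // Maps command id to the timestamp (ms) at which it was sent.
  private readonly pending = new Map<string, number>();
  private readonly timeoutMs: number;

  constructor(timeoutMs: number = 30000) {
    this.timeoutMs = timeoutMs;
  }

  // Record a command at send time.
  track(id: string, now: number): void {
    this.pending.set(id, now);
  }

  // Returns true if the ack matched a pending command, false otherwise.
  acknowledge(id: string): boolean {
    return this.pending.delete(id);
  }

  // Removes and returns the ids whose ack has not arrived within timeoutMs.
  collectTimedOut(now: number): string[] {
    const expired: string[] = [];
    this.pending.forEach((sentAt, id) => {
      if (now - sentAt >= this.timeoutMs) expired.push(id);
    });
    expired.forEach((id) => this.pending.delete(id));
    return expired;
  }
}
```

A client would call `track` when sending, `acknowledge` when the server's ack message arrives, and `collectTimedOut(Date.now())` on a periodic timer to surface or retry commands that were never confirmed.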