health-check-endpoints
Health Check Endpoints Skill
Step 1 — Liveness Endpoint
`GET /health/live`

Purpose: tells Kubernetes whether the process is alive. If this fails, the pod
is restarted. Do not check external dependencies here: a database outage should
not restart all pods.
```go
// Go example
import (
	"encoding/json"
	"net/http"
)

// livenessHandler always answers 200 and touches no external dependency.
func livenessHandler(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(http.StatusOK)
	json.NewEncoder(w).Encode(map[string]string{"status": "ok"})
}
```

Rules:
- Always returns `200 OK` unless the process is deadlocked or terminally broken.
- No database, cache, or broker calls.
- Response time < 5 ms.
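One way to check these rules is to call the handler in-process with `net/http/httptest`. The sketch below is illustrative only: the `probeLiveness` helper and the mux wiring are assumptions, not part of the skill.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// livenessHandler reports that the process is alive. It touches no
// external dependency, so a database outage cannot make it fail.
func livenessHandler(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(http.StatusOK)
	json.NewEncoder(w).Encode(map[string]string{"status": "ok"})
}

// probeLiveness (hypothetical helper) calls the endpoint in-process and
// returns the HTTP status code, the decoded "status" field, and the
// time the handler took.
func probeLiveness() (int, string, time.Duration) {
	mux := http.NewServeMux()
	mux.HandleFunc("/health/live", livenessHandler)

	req := httptest.NewRequest(http.MethodGet, "/health/live", nil)
	rec := httptest.NewRecorder()

	start := time.Now()
	mux.ServeHTTP(rec, req)
	elapsed := time.Since(start)

	var body map[string]string
	json.Unmarshal(rec.Body.Bytes(), &body)
	return rec.Code, body["status"], elapsed
}

func main() {
	code, status, elapsed := probeLiveness()
	fmt.Println(code, status, elapsed < 5*time.Millisecond)
}
```

The timing comparison is only a local sanity check; real latency should be measured against the deployed pod.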
Step 2 — Readiness Endpoint
`GET /health/ready`

Purpose: tells Kubernetes whether the pod can accept traffic. If this fails,
the pod is removed from the load balancer (but not restarted). Check all
dependencies the service needs to serve requests.
```go
// Go example
// Assumes db (for example a *sql.DB), cache (for example a go-redis client),
// CheckResult, and HealthResponse are defined elsewhere in the package.
import (
	"context"
	"encoding/json"
	"net/http"
	"time"
)

func readinessHandler(w http.ResponseWriter, r *http.Request) {
	checks := map[string]CheckResult{}
	overall := "ok"

	// One 2 s deadline bounds all dependency checks in this request.
	ctx, cancel := context.WithTimeout(r.Context(), 2*time.Second)
	defer cancel()

	// Database check: ping once and time that same call.
	start := time.Now()
	if err := db.PingContext(ctx); err != nil {
		checks["db"] = CheckResult{Status: "fail", Error: err.Error()}
		overall = "fail"
	} else {
		checks["db"] = CheckResult{Status: "ok", LatencyMs: time.Since(start).Milliseconds()}
	}

	// Redis check, bounded by the same deadline.
	start = time.Now()
	if err := cache.Ping(ctx).Err(); err != nil {
		checks["cache"] = CheckResult{Status: "fail", Error: err.Error()}
		overall = "fail"
	} else {
		checks["cache"] = CheckResult{Status: "ok", LatencyMs: time.Since(start).Milliseconds()}
	}

	status := http.StatusOK
	if overall == "fail" {
		status = http.StatusServiceUnavailable
	}
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(status)
	json.NewEncoder(w).Encode(HealthResponse{Status: overall, Checks: checks})
}
```
Step 3 — JSON Response Schema
Both endpoints use the same response shape:
```json
{
  "status": "ok",
  "checks": {
    "db": {
      "status": "ok",
      "latency_ms": 2
    },
    "cache": {
      "status": "ok",
      "latency_ms": 1
    },
    "broker": {
      "status": "fail",
      "error": "connection refused"
    }
  }
}
```

Top-level `status`: `"ok"` | `"degraded"` | `"fail"`.
Per-check `status`: `"ok"` | `"fail"`.
HTTP status: `200` for `ok`/`degraded`, `503` for `fail`.
Step 4 — Kubernetes Probe Configuration
```yaml
# deployment.yaml
livenessProbe:
  httpGet:
    path: /health/live
    port: 8080
  initialDelaySeconds: 10   # give the app time to start
  periodSeconds: 10
  failureThreshold: 3       # restart after 3 consecutive failures
readinessProbe:
  httpGet:
    path: /health/ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5
  timeoutSeconds: 3         # slightly above the 2 s dependency check timeout
  failureThreshold: 3       # remove from LB after 3 failures
  successThreshold: 1       # re-add after 1 success
```

Tune `initialDelaySeconds` to match the application's actual startup time.
Set `timeoutSeconds` to slightly above the dependency check timeout (e.g., 3 s).
Step 5 — Circuit Breaker Integration
When a dependency's circuit breaker is open, the readiness check for that
dependency should return `"fail"`:

```go
if circuitBreaker.IsOpen("db") {
	checks["db"] = CheckResult{Status: "fail", Error: "circuit open"}
	overall = "fail"
}
```

This prevents traffic from reaching the pod when the circuit breaker would reject
all requests anyway, reducing cascading error propagation.
Verify
- `GET /health/live` returns `200 OK` in < 5 ms with no external calls.
- `GET /health/ready` returns `503` when a dependency is down.
- `GET /health/ready` returns `200` once dependencies recover.
- Kubernetes `livenessProbe` and `readinessProbe` are configured.
- Dependency check timeouts are shorter than Kubernetes probe `timeoutSeconds`.
- Health endpoints are excluded from access logs (to avoid noise).
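The last item, keeping health endpoints out of access logs, can be handled in logging middleware. The sketch below assumes a plain `net/http` stack and a `/health/` path prefix; `accessLog` and `loggedPaths` are hypothetical names, and services that log at a proxy or ingress would filter there instead.

```go
package main

import (
	"bytes"
	"fmt"
	"log"
	"net/http"
	"net/http/httptest"
	"strings"
)

// accessLog (hypothetical middleware) logs every request except those
// whose path starts with /health/, then passes the request on.
func accessLog(logger *log.Logger, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !strings.HasPrefix(r.URL.Path, "/health/") {
			logger.Printf("%s %s", r.Method, r.URL.Path)
		}
		next.ServeHTTP(w, r)
	})
}

// loggedPaths (hypothetical helper) sends a GET for each path through the
// middleware and returns everything the access logger wrote.
func loggedPaths(paths ...string) string {
	var buf bytes.Buffer
	ok := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	handler := accessLog(log.New(&buf, "", 0), ok)
	for _, p := range paths {
		req := httptest.NewRequest(http.MethodGet, p, nil)
		handler.ServeHTTP(httptest.NewRecorder(), req)
	}
	return buf.String()
}

func main() {
	// Only the non-health request appears in the log output.
	fmt.Print(loggedPaths("/health/live", "/orders", "/health/ready"))
}
```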