health-check-endpoints

Health Check Endpoints Skill

Step 1 — Liveness Endpoint

GET /health/live

Purpose: tells Kubernetes whether the process is alive. If this fails, the pod is restarted. Do not check external dependencies here — a database outage should not restart all pods.

```go
// Go example
func livenessHandler(w http.ResponseWriter, r *http.Request) {
    w.Header().Set("Content-Type", "application/json")
    w.WriteHeader(http.StatusOK)
    json.NewEncoder(w).Encode(map[string]string{"status": "ok"})
}
```

Rules:

- Always returns `200 OK` unless the process is deadlocked or terminally broken.
- No database, cache, or broker calls.
- Response time < 5 ms.
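
To check these rules without starting a server, the handler can be exercised in memory with `net/http/httptest`. The sketch below is one way to do that; `newHealthMux` and `get` are hypothetical helper names introduced only for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// livenessHandler is the handler from the text: no dependency calls, always 200.
func livenessHandler(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(http.StatusOK)
	json.NewEncoder(w).Encode(map[string]string{"status": "ok"})
}

// newHealthMux (hypothetical helper) registers the liveness route on a fresh mux.
func newHealthMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/health/live", livenessHandler)
	return mux
}

// get (hypothetical helper) sends an in-memory GET request and returns
// the response status code and body.
func get(h http.Handler, path string) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code, rec.Body.String()
}

func main() {
	code, body := get(newHealthMux(), "/health/live")
	fmt.Printf("%d %s", code, body)
}
```

Because the handler touches nothing outside the process, the same check works in a unit test with no database or cache available.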

Step 2 — Readiness Endpoint

GET /health/ready

Purpose: tells Kubernetes whether the pod can accept traffic. If this fails, the pod is removed from the load balancer (but not restarted). Check all dependencies the service needs to serve requests.

```go
// Go example
func readinessHandler(w http.ResponseWriter, r *http.Request) {
    checks := map[string]CheckResult{}
    overall := "ok"

    // Bound all dependency checks with one timeout so the handler
    // answers before the Kubernetes probe gives up.
    ctx, cancel := context.WithTimeout(r.Context(), 2*time.Second)
    defer cancel()

    // Database check: time a single ping.
    start := time.Now()
    if err := db.PingContext(ctx); err != nil {
        checks["db"] = CheckResult{Status: "fail", Error: err.Error()}
        overall = "fail"
    } else {
        checks["db"] = CheckResult{Status: "ok", LatencyMs: time.Since(start).Milliseconds()}
    }

    // Redis check, under the same timeout.
    start = time.Now()
    if err := cache.Ping(ctx).Err(); err != nil {
        checks["cache"] = CheckResult{Status: "fail", Error: err.Error()}
        overall = "fail"
    } else {
        checks["cache"] = CheckResult{Status: "ok", LatencyMs: time.Since(start).Milliseconds()}
    }

    status := http.StatusOK
    if overall == "fail" {
        status = http.StatusServiceUnavailable
    }

    w.Header().Set("Content-Type", "application/json")
    w.WriteHeader(status)
    json.NewEncoder(w).Encode(HealthResponse{Status: overall, Checks: checks})
}
```
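
Each dependency check in the handler follows the same shape: bound the call with a timeout, run it, and record `ok` or `fail`. As a minimal sketch, that shape could be factored into a helper. `runCheck` and `exampleResults` are hypothetical names, the `CheckResult` fields are inferred from how the handler uses them, and the fake checks stand in for real database or cache clients.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// CheckResult holds the outcome of one dependency check
// (fields assumed from the handler in the text).
type CheckResult struct {
	Status    string
	LatencyMs int64
	Error     string
}

// runCheck (hypothetical helper) runs one check under its own timeout
// and measures how long a successful call took.
func runCheck(parent context.Context, timeout time.Duration, check func(context.Context) error) CheckResult {
	ctx, cancel := context.WithTimeout(parent, timeout)
	defer cancel()

	start := time.Now()
	if err := check(ctx); err != nil {
		return CheckResult{Status: "fail", Error: err.Error()}
	}
	return CheckResult{Status: "ok", LatencyMs: time.Since(start).Milliseconds()}
}

// exampleResults runs three fake dependencies: one healthy, one refusing
// connections, and one that never answers until the timeout fires.
func exampleResults() map[string]CheckResult {
	healthy := func(ctx context.Context) error { return nil }
	refused := func(ctx context.Context) error { return errors.New("connection refused") }
	hung := func(ctx context.Context) error {
		<-ctx.Done() // block until the timeout expires
		return ctx.Err()
	}

	parent := context.Background()
	return map[string]CheckResult{
		"db":     runCheck(parent, 2*time.Second, healthy),
		"cache":  runCheck(parent, 2*time.Second, refused),
		"broker": runCheck(parent, 50*time.Millisecond, hung),
	}
}

func main() {
	for name, res := range exampleResults() {
		fmt.Printf("%s: %+v\n", name, res)
	}
}
```

In a real handler the parent context would be `r.Context()`, so a probe that disconnects early also cancels the in-flight checks.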


Step 3 — JSON Response Schema

Both endpoints use the same response shape:

```json
{
  "status": "ok",
  "checks": {
    "db": {
      "status": "ok",
      "latency_ms": 2
    },
    "cache": {
      "status": "ok",
      "latency_ms": 1
    },
    "broker": {
      "status": "fail",
      "error": "connection refused"
    }
  }
}
```

Top-level `status`: `"ok"`, `"degraded"`, or `"fail"`. Per-check `status`: `"ok"` or `"fail"`. HTTP status: `200` for `ok` and `degraded`, `503` for `fail`.
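
The response shape and status mapping can be sketched as Go types plus two small functions. The JSON tags are inferred from the example response. The text does not say when the top-level status becomes `"degraded"`, so the rule used here (only non-critical checks failing) is an assumption, as are the names `overallStatus` and `httpStatusFor`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// CheckResult is one entry under "checks" (tags inferred from the example).
type CheckResult struct {
	Status    string `json:"status"`
	LatencyMs int64  `json:"latency_ms,omitempty"`
	Error     string `json:"error,omitempty"`
}

// HealthResponse is the body returned by both endpoints.
type HealthResponse struct {
	Status string                 `json:"status"`
	Checks map[string]CheckResult `json:"checks,omitempty"`
}

// overallStatus (hypothetical) returns "fail" if any critical check failed,
// "degraded" if only non-critical checks failed, and "ok" otherwise.
// The meaning of "degraded" is an assumption, not taken from the text.
func overallStatus(checks map[string]CheckResult, critical map[string]bool) string {
	status := "ok"
	for name, c := range checks {
		if c.Status != "fail" {
			continue
		}
		if critical[name] {
			return "fail"
		}
		status = "degraded"
	}
	return status
}

// httpStatusFor maps the top-level status to an HTTP code:
// 503 for "fail", 200 for "ok" and "degraded".
func httpStatusFor(overall string) int {
	if overall == "fail" {
		return http.StatusServiceUnavailable
	}
	return http.StatusOK
}

func main() {
	checks := map[string]CheckResult{
		"db":     {Status: "ok", LatencyMs: 2},
		"cache":  {Status: "ok", LatencyMs: 1},
		"broker": {Status: "fail", Error: "connection refused"},
	}
	critical := map[string]bool{"db": true, "cache": true}

	overall := overallStatus(checks, critical)
	body, _ := json.MarshalIndent(HealthResponse{Status: overall, Checks: checks}, "", "  ")
	fmt.Println(httpStatusFor(overall))
	fmt.Println(string(body))
}
```

With `omitempty`, a failed check omits `latency_ms` and a passing check omits `error`, which matches the example; a passing check measured at 0 ms would also omit its latency.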


Step 4 — Kubernetes Probe Configuration

```yaml
# deployment.yaml
livenessProbe:
  httpGet:
    path: /health/live
    port: 8080
  initialDelaySeconds: 10   # give the app time to start
  periodSeconds: 10
  failureThreshold: 3       # restart after 3 consecutive failures

readinessProbe:
  httpGet:
    path: /health/ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5
  failureThreshold: 3       # remove from LB after 3 failures
  successThreshold: 1       # re-add after 1 success
```

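Tune `initialDelaySeconds` to match the application's actual startup time.
Set `timeoutSeconds` slightly above the dependency check timeout (e.g., 3 s for the 2 s timeout used in Step 2). The Kubernetes default is 1 s, so leaving it unset would fail the probe before a slow dependency check finishes.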

Step 5 — Circuit Breaker Integration

When a dependency's circuit breaker is open, the readiness check for that dependency should return `"fail"`:

```go
if circuitBreaker.IsOpen("db") {
    checks["db"] = CheckResult{Status: "fail", Error: "circuit open"}
    overall = "fail"
}
```

This prevents traffic from reaching the pod when the circuit breaker would reject all requests anyway, reducing cascading error propagation.
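
The text does not say which circuit breaker library provides `IsOpen`. As a minimal sketch, assume a registry that opens a named circuit after a fixed number of consecutive failures. `breakerRegistry`, `RecordFailure`, `RecordSuccess`, and `readinessCheck` are hypothetical names, and the sketch has no half-open state or reset timer, which a production breaker would normally include.

```go
package main

import (
	"fmt"
	"sync"
)

// breakerRegistry (hypothetical) tracks consecutive failures per dependency.
type breakerRegistry struct {
	mu        sync.Mutex
	threshold int
	failures  map[string]int
}

func newBreakerRegistry(threshold int) *breakerRegistry {
	return &breakerRegistry{threshold: threshold, failures: map[string]int{}}
}

// RecordFailure counts one more consecutive failure for the dependency.
func (b *breakerRegistry) RecordFailure(name string) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.failures[name]++
}

// RecordSuccess resets the failure count, closing the circuit.
func (b *breakerRegistry) RecordSuccess(name string) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.failures[name] = 0
}

// IsOpen reports whether the dependency has reached the failure threshold.
func (b *breakerRegistry) IsOpen(name string) bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	return b.failures[name] >= b.threshold
}

// CheckResult mirrors the per-check result used in the readiness handler.
type CheckResult struct {
	Status string
	Error  string
}

// readinessCheck short-circuits to "fail" while the circuit is open,
// without calling the dependency at all.
func readinessCheck(cb *breakerRegistry, name string) CheckResult {
	if cb.IsOpen(name) {
		return CheckResult{Status: "fail", Error: "circuit open"}
	}
	return CheckResult{Status: "ok"}
}

func main() {
	cb := newBreakerRegistry(2)
	cb.RecordFailure("db")
	cb.RecordFailure("db")
	fmt.Printf("%+v\n", readinessCheck(cb, "db"))

	cb.RecordSuccess("db")
	fmt.Printf("%+v\n", readinessCheck(cb, "db"))
}
```

Checking the breaker before pinging the dependency also keeps the readiness endpoint fast during an outage, since it skips a call that is expected to time out.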


Verify

- `GET /health/live` returns `200 OK` in < 5 ms with no external calls.
- `GET /health/ready` returns `503` when a dependency is down.
- `GET /health/ready` returns `200` once dependencies recover.
- Kubernetes `livenessProbe` and `readinessProbe` are configured.
- Dependency check timeouts are shorter than Kubernetes probe `timeoutSeconds`.
- Health endpoints are excluded from access logs (to avoid noise).
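
The last item can be handled in an access-log middleware that skips the health paths. The sketch below assumes a plain `net/http` middleware chain and the standard `log` package; `accessLog` and `loggedRequests` are hypothetical names, and a service using a logging framework would apply the same path filter in that framework's own hook.

```go
package main

import (
	"bytes"
	"fmt"
	"log"
	"net/http"
	"net/http/httptest"
	"strings"
)

// accessLog (hypothetical middleware) logs method and path for every
// request except those under /health/, then calls the next handler.
func accessLog(logger *log.Logger, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !strings.HasPrefix(r.URL.Path, "/health/") {
			logger.Printf("%s %s", r.Method, r.URL.Path)
		}
		next.ServeHTTP(w, r)
	})
}

// loggedRequests sends in-memory GET requests through the middleware
// and returns everything the logger wrote.
func loggedRequests(paths ...string) string {
	var buf bytes.Buffer
	logger := log.New(&buf, "", 0) // no prefix, no timestamp

	ok := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	h := accessLog(logger, ok)

	for _, p := range paths {
		h.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest(http.MethodGet, p, nil))
	}
	return buf.String()
}

func main() {
	// Only the non-health request should appear in the log output.
	fmt.Print(loggedRequests("/health/live", "/orders", "/health/ready"))
}
```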
