infrastructure-health-check
Works with docker-compose, Caddy, Pi-hole, and Cloudflare services.
Infrastructure Health Check
Comprehensive health verification for all network infrastructure services.
Quick Start
Run a full infrastructure health check:
```bash
cd /home/dawiddutoit/projects/network && ./scripts/health-check.sh
```
Or invoke this skill with: "Check infrastructure health" or "Is everything running?"
Table of Contents
- When to Use This Skill
- What This Skill Does
- Instructions
- 3.1 Docker Container Status
- 3.2 Caddy HTTPS Verification
- 3.3 Pi-hole DNS Check
- 3.4 Cloudflare Tunnel Status
- 3.5 Webhook Endpoint Test
- 3.6 SSL Certificate Validity
- 3.7 Cloudflare Access Verification
- 3.8 Generate Health Report
- Supporting Files
- Expected Outcomes
- Requirements
- Red Flags to Avoid
When to Use This Skill
Explicit Triggers:
- "Check infrastructure health"
- "Is everything running?"
- "Check service status"
- "Verify SSL certificates"
- "Check tunnel connection"
- "Diagnose network issues"
Implicit Triggers:
- After restarting Docker services
- After network configuration changes
- Before deploying new services
- When services seem unresponsive
Debugging Triggers:
- "Why can't I access pihole.temet.ai?"
- "Services are not responding"
- "SSL certificate errors"
- "Authentication not working"
What This Skill Does
Performs 8 health checks and generates a comprehensive status report:
- Docker Containers - Verifies all containers are running and healthy
- Caddy HTTPS - Tests reverse proxy is serving HTTPS correctly
- Pi-hole DNS - Confirms DNS resolution is working
- Cloudflare Tunnel - Checks tunnel connectivity to Cloudflare
- Webhook Endpoint - Tests GitHub webhook accessibility
- SSL Certificates - Validates certificate validity and expiration
- Cloudflare Access - Verifies authentication is configured
- Overall Status - Aggregates results into pass/fail summary
Instructions
3.1 Docker Container Status
Check all containers are running:
```bash
cd /home/dawiddutoit/projects/network && docker compose ps --format "table {{.Name}}\t{{.Status}}\t{{.Health}}"
```
Expected containers:
| Container | Status | Purpose |
|---|---|---|
| pihole | Up (healthy) | DNS + Ad blocking |
| caddy | Up | Reverse proxy |
| cloudflared | Up | Cloudflare Tunnel |
| webhook | Up | GitHub auto-deploy |
Check for issues:
```bash
docker compose ps --filter "status=exited"
# docker compose ps filters on status only, so query health through docker ps
docker ps --filter "health=unhealthy"
```
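The container check above can be scripted. The following is a minimal sketch, assuming the `docker compose ps` output has first been reduced to tab-separated `name<TAB>status` lines. The function names `classify_container_status` and `report_containers` are illustrative and are not taken from `scripts/health-check.sh`.

```bash
# Classify one Status string as printed by docker (for example "Up 2 hours (healthy)").
# Anything marked (unhealthy), or not starting with "Up", counts as a failure.
classify_container_status() {
  case "$1" in
    *"(unhealthy)"*) echo "FAIL" ;;
    Up*)             echo "PASS" ;;
    *)               echo "FAIL" ;;
  esac
}

# Read "name<TAB>status" lines on stdin and report on each expected container.
# A container that does not appear in the input is reported as missing.
report_containers() {
  input=$(cat)
  for name in pihole caddy cloudflared webhook; do
    status=$(printf '%s\n' "$input" | awk -F '\t' -v n="$name" '$1 == n { print $2; exit }')
    if [ -z "$status" ]; then
      echo "[FAIL] $name: missing"
    else
      echo "[$(classify_container_status "$status")] $name: $status"
    fi
  done
}
```

One possible invocation, run from the project directory and assuming the installed Compose version accepts Go templates for `--format`: `docker compose ps -a --format '{{.Name}}\t{{.Status}}' | report_containers`.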
3.2 Caddy HTTPS Verification
Test Caddy is serving HTTPS for each domain:
```bash
# Test Pi-hole
curl -sI https://pihole.temet.ai --max-time 5 | head -1

# Test Jaeger
curl -sI https://jaeger.temet.ai --max-time 5 | head -1

# Test Langfuse
curl -sI https://langfuse.temet.ai --max-time 5 | head -1
```
**Expected:** `HTTP/2 200` or `HTTP/2 302` (redirect to auth)
**Check Caddy logs for errors:**
```bash
docker logs caddy --tail 20 2>&1 | grep -iE "error|warn|fail"
```
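To turn the expected status lines into a pass/fail result, something like the sketch below could be used. It assumes the input is the first line printed by `curl -sI`; the name `classify_http_status` is illustrative only.

```bash
# Map the first line of `curl -sI` output to PASS or FAIL.
# 200 (served) and 302 (redirect to auth) both count as healthy, as stated above.
classify_http_status() {
  # Header lines end in CRLF, so drop the carriage return before parsing.
  code=$(printf '%s\n' "$1" | tr -d '\r' | awk '{ print $2 }')
  case "$code" in
    200|302) echo "PASS" ;;
    *)       echo "FAIL" ;;
  esac
}
```

Example call: `classify_http_status "$(curl -sI https://pihole.temet.ai --max-time 5 | head -1)"`. An empty response (timeout or connection refused) falls through to `FAIL`.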
3.3 Pi-hole DNS Check
Verify DNS resolution is working:
```bash
# Check Pi-hole can resolve local domains
docker exec pihole dig +short @127.0.0.1 pihole.temet.ai

# Check from host
dig @localhost pihole.temet.ai +short

# Check external DNS
dig @1.1.1.1 pihole.temet.ai +short
```
**Expected:** Returns IP address (192.168.68.135 for local, Cloudflare IP for external)
**Check Pi-hole status:**
```bash
docker exec pihole pihole status
```
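The local expectation above can be checked mechanically. This is a rough sketch that assumes `dig +short` returned a single line; `is_ipv4` and `check_local_dns` are hypothetical helper names.

```bash
# Succeed if the argument has the shape of a dotted-quad IPv4 address.
# This checks the shape only and does not validate the range of each octet.
is_ipv4() {
  printf '%s\n' "$1" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$'
}

# Compare a local `dig +short` answer with the expected address
# (defaults to 192.168.68.135, the local address given above).
check_local_dns() {
  answer=$1
  expected=${2:-192.168.68.135}
  if [ "$answer" = "$expected" ]; then
    echo "[PASS] Local DNS: $answer"
  elif is_ipv4 "$answer"; then
    echo "[FAIL] Local DNS: got $answer, expected $expected"
  else
    echo "[FAIL] Local DNS: no answer"
  fi
}
```

Example call: `check_local_dns "$(dig @localhost pihole.temet.ai +short)"`. The external lookup is better tested with `is_ipv4` alone, since the Cloudflare address is not fixed.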
3.4 Cloudflare Tunnel Status
Verify tunnel is connected:
```bash
# Check tunnel logs for connection status
docker logs cloudflared --tail 30 2>&1 | grep -iE "connected|registered|error|failed"

# Check tunnel process is running
docker exec cloudflared pgrep -f cloudflared
```
**Expected output contains:**
- `Registered tunnel connection` - Tunnel is connected
- `Connection ... registered` - Healthy connection
**Warning signs:**
- `connection failed` - Network issues
- `error` - Configuration problems
- No recent log entries - Process may be stuck
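The log markers listed above can be reduced to a single state. The sketch below is one way to do that, assuming the most recent matching line reflects the current state; `tunnel_state` is an illustrative name, and only the marker strings come from the list above.

```bash
# Read cloudflared log lines on stdin and print: connected, failing, or unknown.
# The last matching line wins, so a failure after a registration reports "failing".
tunnel_state() {
  awk '
    /Registered tunnel connection/ || /Connection .* registered/ { state = "connected"; next }
    tolower($0) ~ /connection failed|error/ { state = "failing" }
    END { print (state == "" ? "unknown" : state) }
  '
}
```

Example call: `docker logs cloudflared --tail 30 2>&1 | tunnel_state`. A result of `unknown` corresponds to the "no recent log entries" warning sign.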
3.5 Webhook Endpoint Test
Verify webhook is accessible:
```bash
# Test webhook health endpoint locally

# Test via domain (if local)
curl -sI https://webhook.temet.ai/hooks/health --max-time 5 | head -1
```
**Expected:** `OK` response or `HTTP/2 200`
3.6 SSL Certificate Validity
Check certificate details for each domain:
```bash
for domain in pihole jaeger langfuse ha code; do
  echo "=== $domain.temet.ai ==="
  echo | openssl s_client -servername "$domain.temet.ai" \
    -connect "$domain.temet.ai:443" 2>/dev/null | \
    openssl x509 -noout -dates -issuer 2>/dev/null || echo "FAILED"
  echo
done
```
Expected output:
```
notBefore=<date>
notAfter=<date>
issuer=C = US, O = Let's Encrypt, CN = R11
```
Check certificate expiration:
```bash
# Check whether each certificate expires within 30 days (2592000 seconds)
for domain in pihole jaeger langfuse; do
  echo -n "$domain.temet.ai: "
  echo | openssl s_client -servername "$domain.temet.ai" \
    -connect "$domain.temet.ai:443" 2>/dev/null | \
    openssl x509 -noout -checkend 2592000 && echo "OK (>30 days)" || echo "RENEW SOON"
done
```
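`-checkend` only answers yes or no. To report a day count as well, the remaining time can be computed from epoch seconds. This is a minimal sketch; `classify_cert` is a hypothetical helper that takes the expiry time and the current time, both as Unix timestamps.

```bash
# Classify a certificate from its expiry time and the current time (epoch seconds).
# Uses the same 30-day (2592000 second) threshold as the -checkend call above.
classify_cert() {
  remaining=$(( $1 - $2 ))   # seconds left until expiry
  if [ "$remaining" -le 0 ]; then
    echo "EXPIRED"
  elif [ "$remaining" -le 2592000 ]; then
    echo "RENEW SOON ($(( remaining / 86400 )) days)"
  else
    echo "OK ($(( remaining / 86400 )) days)"
  fi
}
```

Assuming GNU `date` is available, the expiry timestamp could be derived with `date -d "$(openssl x509 -noout -enddate | cut -d= -f2)" +%s` on the certificate piped from `openssl s_client`, and the current time with `date +%s`.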
3.7 Cloudflare Access Verification
Check Access is configured for protected services:
```bash
# Test that Access is intercepting (should redirect to login)
curl -sI https://pihole.temet.ai --max-time 5 | grep -E "^(HTTP|location|cf-)"
```
**Expected for protected services:**
- `HTTP/2 302` with redirect to cloudflareaccess.com login
- OR `HTTP/2 200` if already authenticated
**Check Access configuration via API:**
```bash
source /home/dawiddutoit/projects/network/.env
curl -s "https://api.cloudflare.com/client/v4/accounts/${CLOUDFLARE_ACCOUNT_ID}/access/apps" \
  -H "Authorization: Bearer ${CLOUDFLARE_ACCESS_API_TOKEN}" | \
  python3 -c "import sys,json; apps=json.load(sys.stdin).get('result',[]); print('\n'.join([f\"{a['name']}: {a['domain']}\" for a in apps]))"
```
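The header expectations above can be evaluated in one pass. The sketch below assumes the full `curl -sI` header output is piped in; `access_state` and its three labels are illustrative and not part of any Cloudflare tooling.

```bash
# Read response headers on stdin and print: protected, reachable, or unexpected.
#   protected  = 302 whose Location points at cloudflareaccess.com
#   reachable  = 200 (already authenticated, or not behind Access)
access_state() {
  tr -d '\r' | awk '
    NR == 1 { code = $2 }
    tolower($1) == "location:" && tolower($2) ~ /cloudflareaccess\.com/ { login = 1 }
    END {
      if (code == "302" && login) print "protected"
      else if (code == "200") print "reachable"
      else print "unexpected"
    }
  '
}
```

Example call: `curl -sI https://pihole.temet.ai --max-time 5 | access_state`. Because an unauthenticated `curl` carries no session cookie, `reachable` on a service that should be protected is worth investigating.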
3.8 Generate Health Report
Aggregate all checks into a summary report:
```
========================================
Infrastructure Health Report
Generated: $(date)
========================================

DOCKER CONTAINERS
-----------------
[PASS] pihole: running (healthy)
[PASS] caddy: running
[PASS] cloudflared: running
[PASS] webhook: running

HTTPS ENDPOINTS
---------------
[PASS] pihole.temet.ai: HTTP/2 200
[PASS] jaeger.temet.ai: HTTP/2 200
[PASS] langfuse.temet.ai: HTTP/2 200

DNS RESOLUTION
--------------
[PASS] Local DNS: 192.168.68.135
[PASS] External DNS: resolving via Cloudflare

CLOUDFLARE TUNNEL
-----------------
[PASS] Tunnel: connected

WEBHOOK
-------
[PASS] Endpoint: responding

SSL CERTIFICATES
----------------
[PASS] pihole.temet.ai: valid, expires in 67 days
[PASS] jaeger.temet.ai: valid, expires in 67 days
[PASS] langfuse.temet.ai: valid, expires in 67 days

CLOUDFLARE ACCESS
-----------------
[PASS] pihole.temet.ai: protected
[PASS] jaeger.temet.ai: protected
[PASS] langfuse.temet.ai: protected
[PASS] webhook.temet.ai: bypass (public)

========================================
Overall Status: ALL CHECKS PASSED
========================================
```
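The overall status line can be derived from the `[PASS]` and `[FAIL]` prefixes in the report body. A minimal sketch follows, assuming every check emits exactly one such line. The `summarize` function, the failure wording, and the `NO CHECKS RUN` case are assumptions, not output of `health-check.sh`.

```bash
# Read report lines on stdin, count [PASS] and [FAIL] prefixes, print the
# overall status, and return 0 (all passed), 1 (failures) or 2 (nothing checked).
summarize() {
  awk '
    /^\[PASS\]/ { pass++ }
    /^\[FAIL\]/ { fail++ }
    END {
      if (pass + fail == 0) { print "Overall Status: NO CHECKS RUN"; exit 2 }
      if (fail > 0) { printf "Overall Status: %d CHECK(S) FAILED\n", fail; exit 1 }
      print "Overall Status: ALL CHECKS PASSED"
    }
  '
}
```

Returning a non-zero exit code on failure lets a caller such as cron or a deploy hook react without parsing the text.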
Supporting Files
| File | Purpose |
|---|---|
| | Automated health check script |
| | Common issues and solutions |
| | Example health check outputs |
Expected Outcomes
Success (All Checks Pass):
- All 4 containers running
- HTTPS endpoints responding with 200/302
- DNS resolving correctly
- Tunnel connected to Cloudflare
- Webhook accessible
- Certificates valid with >30 days remaining
- Access configured for protected services
Partial Failure:
- One or more containers down -> Restart with `docker compose up -d`
- Certificate expiring soon -> Will auto-renew, monitor
- Access misconfigured -> Run `./scripts/cf-access-setup.sh setup`
Critical Failure:
- Multiple containers down -> Check Docker daemon, disk space
- Tunnel disconnected -> Check internet, tunnel token
- DNS not resolving -> Check Pi-hole container, router DNS settings
- All certificates invalid -> Check Cloudflare API token
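The failure-to-action mapping above could be encoded as a lookup so that a report can print the next step beside each failed check. This is only a sketch: the category keys (`container`, `certificate`, and so on) are hypothetical, and the actions are copied from the lists above.

```bash
# Print the suggested next step for a failed check category.
suggest_fix() {
  case "$1" in
    container)   echo "Restart with: docker compose up -d" ;;
    certificate) echo "Will auto-renew; monitor" ;;
    access)      echo "Run: ./scripts/cf-access-setup.sh setup" ;;
    tunnel)      echo "Check internet, tunnel token" ;;
    dns)         echo "Check Pi-hole container, router DNS settings" ;;
    *)           echo "No suggestion for: $1" ;;
  esac
}
```

Example call: `suggest_fix dns` prints the DNS troubleshooting hint.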
Requirements
Environment:
- Docker and Docker Compose running
- Access to `/home/dawiddutoit/projects/network`
- `.env` file with Cloudflare credentials
- Network connectivity
Services:
- pihole container
- caddy container
- cloudflared container
- webhook container
Red Flags to Avoid
- Do not ignore certificate expiration warnings
- Do not skip DNS checks when troubleshooting access issues
- Do not assume tunnel is connected without checking logs
- Do not run health checks without network connectivity
- Do not ignore container health status (unhealthy state)
- Do not forget to check both local and external DNS resolution
- Do not assume HTTP 302 is a failure (it is the redirect to authentication)
Notes
- Health checks should be run from the Pi (192.168.68.135) for accurate local results
- Remote access testing requires being outside the home network
- Certificate auto-renewal happens 30 days before expiration
- Cloudflare Tunnel reconnects automatically after brief disconnections
- Pi-hole DNS may cache results for up to 5 minutes
- Run `./scripts/health-check.sh` for automated checking