infrastructure-monitor
Infrastructure Monitor
Set up comprehensive monitoring and observability.
Quick Start
Use Prometheus for metrics, Grafana for dashboards, and Loki for logs, then set up alerts for critical issues.
Instructions
Metrics with Prometheus
Application instrumentation:
javascript
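// Assumes an Express app; `app` was not defined in the original snippet.
const express = require('express');
const prometheus = require('prom-client');
const app = express();

// Histogram of request latency, labelled by method, route, and status code.
const httpRequestDuration = new prometheus.Histogram({
  name: 'http_request_duration_seconds',
  help: 'Duration of HTTP requests in seconds',
  labelNames: ['method', 'route', 'status_code']
});

// Time each request and record it once the response has been sent.
app.use((req, res, next) => {
  const start = Date.now();
  res.on('finish', () => {
    const duration = (Date.now() - start) / 1000;
    // req.route is undefined when no route matched (e.g. a 404), so use a fixed label.
    const route = req.route?.path ?? 'unmatched';
    httpRequestDuration.labels(req.method, route, res.statusCode).observe(duration);
  });
  next();
});
Prometheus config: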
yaml
scrape_configs:
  - job_name: 'app'
    static_configs:
      - targets: ['app:3000']
    scrape_interval: 15s
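The scrape job above pulls from `app:3000` and, unless `metrics_path` is set, requests `/metrics`. The instrumentation snippet does not show that endpoint, so here is a minimal sketch of one. It assumes `register` is prom-client's registry (`prometheus.register`), which provides `contentType` and an async `metrics()` method; `createMetricsHandler` is a hypothetical helper name, not part of any library.

```javascript
// Hypothetical helper: builds a Node-style request handler that serves metrics.
// `register` is assumed to be prom-client's registry (e.g. prometheus.register).
function createMetricsHandler(register) {
  return async function metricsHandler(req, res) {
    try {
      // Ask the registry for all metrics in the Prometheus text format.
      const body = await register.metrics();
      res.statusCode = 200;
      res.setHeader('Content-Type', register.contentType);
      res.end(body);
    } catch (err) {
      // A failed scrape should surface as a 500 rather than hang the request.
      res.statusCode = 500;
      res.end(String(err));
    }
  };
}
```

With the Express app from the instrumentation snippet, this could be mounted as `app.get('/metrics', createMetricsHandler(prometheus.register))`.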
Dashboards with Grafana
Key metrics to monitor:
- Request rate (requests/second)
- Error rate (errors/total requests)
- Response time (p50, p95, p99)
- CPU and memory usage
- Database query time
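As a starting point for Grafana panels, the metrics listed above could map to PromQL expressions like the ones sketched here. Only `http_request_duration_seconds` comes from this document. The `process_*` metrics assume prom-client's `collectDefaultMetrics()` has been enabled, and `db_query_duration_seconds` is a hypothetical histogram that the application would have to define itself.

```javascript
// Example PromQL expressions, keyed by panel title, over a 5-minute window.
const WINDOW = '5m';
const HTTP = 'http_request_duration_seconds';

// histogram_quantile estimates a percentile from a histogram's buckets.
function quantileQuery(metric, q) {
  return `histogram_quantile(${q}, sum by (le) (rate(${metric}_bucket[${WINDOW}])))`;
}

const panelQueries = {
  // A histogram's _count series increments once per observed request.
  'Request rate (req/s)': `sum(rate(${HTTP}_count[${WINDOW}]))`,
  'Error rate (5xx / total)':
    `sum(rate(${HTTP}_count{status_code=~"5.."}[${WINDOW}]))` +
    ` / sum(rate(${HTTP}_count[${WINDOW}]))`,
  'Response time p50': quantileQuery(HTTP, 0.5),
  'Response time p95': quantileQuery(HTTP, 0.95),
  'Response time p99': quantileQuery(HTTP, 0.99),
  // Assumed: default process metrics exposed by the application.
  'CPU usage (cores)': `rate(process_cpu_seconds_total[${WINDOW}])`,
  'Memory usage (bytes)': 'process_resident_memory_bytes',
  // Hypothetical metric name; not defined anywhere in this document.
  'Database query time p95': quantileQuery('db_query_duration_seconds', 0.95),
};

for (const [title, expr] of Object.entries(panelQueries)) {
  console.log(`${title}: ${expr}`);
}
```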
Logging with Loki
Structured logging:
javascript
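const winston = require('winston');

// Emit each log entry as a JSON object so fields can be parsed downstream.
const logger = winston.createLogger({
  format: winston.format.json(),
  transports: [
    new winston.transports.Console()
  ]
});

// Inside a request handler, where `user` and `req` are in scope:
logger.info('User logged in', { userId: user.id, ip: req.ip });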
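To illustrate what structured logging buys you, here is a dependency-free sketch of the one-JSON-object-per-line shape. `toLogLine` is a hypothetical helper written for this example, not winston's implementation, and the field values are made up.

```javascript
// Hypothetical helper: serialises one log event as a single line of JSON.
function toLogLine(level, message, fields = {}) {
  return JSON.stringify({
    timestamp: new Date().toISOString(),
    level,
    message,
    ...fields, // contextual fields such as userId stay machine-readable
  });
}

const line = toLogLine('info', 'User logged in', { userId: 42, ip: '203.0.113.7' });
console.log(line);

// Because each line is valid JSON, any consumer can recover the fields.
const parsed = JSON.parse(line);
console.log(parsed.userId, parsed.level);
```

Once such lines reach Loki, a LogQL query along the lines of `{app="api"} | json | userId="42"` could filter on the extracted field; the `app="api"` stream label is an assumption about how the logs are shipped.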
Alerting
Alert rules:
yaml
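groups:
  - name: app_alerts
    rules:
      - alert: HighErrorRate
        # Share of requests answered with a 5xx status over the last 5 minutes,
        # computed from the http_request_duration_seconds histogram defined
        # under "Metrics with Prometheus" (its label is status_code).
        expr: |
          sum(rate(http_request_duration_seconds_count{status_code=~"5.."}[5m]))
            / sum(rate(http_request_duration_seconds_count[5m])) > 0.05
        for: 5m
        annotations:
          summary: "High error rate detected"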
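The `for: 5m` clause means the expression must stay above the threshold at every evaluation for five minutes before the alert fires, which filters out short spikes. The sketch below simulates that pending-to-firing behaviour in plain JavaScript; `alertStates` and the sample values are hypothetical, and this is an illustration of the idea rather than Prometheus's actual evaluation code.

```javascript
// Returns the alert state at each sample: 'inactive', 'pending', or 'firing'.
function alertStates(samples, threshold, forSeconds) {
  let pendingSince = null; // time at which the condition first became true
  return samples.map(({ t, value }) => {
    if (value <= threshold) {
      pendingSince = null; // condition broken: the timer starts over
      return 'inactive';
    }
    if (pendingSince === null) pendingSince = t;
    // Fire only once the condition has held for the full duration.
    return t - pendingSince >= forSeconds ? 'firing' : 'pending';
  });
}

// Made-up values sampled once a minute (t in seconds).
const samples = [
  { t: 0, value: 0.01 },
  { t: 60, value: 0.08 },  // spike begins
  { t: 120, value: 0.02 }, // spike ends before 5 minutes: never fires
  { t: 180, value: 0.07 }, // sustained problem begins
  { t: 240, value: 0.07 },
  { t: 300, value: 0.07 },
  { t: 360, value: 0.07 },
  { t: 420, value: 0.07 },
  { t: 480, value: 0.07 }, // 300 s after t=180: fires here
];

console.log(alertStates(samples, 0.05, 300));
```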
Best Practices
- Monitor golden signals (latency, traffic, errors, saturation)
- Set up actionable alerts
- Use log aggregation
- Implement distributed tracing
- Create runbooks for alerts
- Review dashboards regularly