senior-backend
Senior Backend Engineer
Overview
Design and implement robust, scalable backend systems with a focus on API design, service architecture, data management, and operational excellence. This skill covers RESTful and GraphQL API patterns, message-driven architecture, caching strategies, rate limiting, health checks, and full observability with OpenTelemetry.
Announce at start: "I'm using the senior-backend skill for backend system design and implementation."
Phase 1: API Design
Goal: Define the contract before writing implementation code.
Actions
- Define resource models and relationships
- Design endpoint structure (REST) or schema (GraphQL)
- Establish authentication and authorization strategy
- Define rate limiting and throttling policies
- Create API documentation (OpenAPI/GraphQL schema)
API Style Decision Table
| Factor | REST | GraphQL | gRPC |
|---|---|---|---|
| Multiple consumers with different data needs | Poor fit | Strong fit | Poor fit |
| Simple CRUD operations | Strong fit | Overkill | Overkill |
| Real-time subscriptions | Requires WebSocket add-on | Built-in | Built-in (streaming) |
| Service-to-service | Good | Overkill | Strong fit |
| Public API | Strong fit | Good | Poor fit (tooling) |
| Mobile with bandwidth constraints | Overfetching risk | Strong fit | Strong fit |
STOP — Do NOT proceed to Phase 2 until:
- Resource models are defined
- Endpoint structure or schema is documented
- Auth strategy is chosen
- API contract is reviewable (OpenAPI/GraphQL schema)
Phase 2: Implementation
Goal: Build the service layer with clear separation of concerns.
Actions
- Set up project structure with clear layering
- Implement data access layer (repositories/DAOs)
- Build service layer with business logic
- Create API controllers/resolvers
- Add middleware (auth, logging, error handling, CORS)
- Implement caching strategy
RESTful URL Structure
```
GET    /api/v1/users               # List users (paginated)
GET    /api/v1/users/:id           # Get single user
POST   /api/v1/users               # Create user
PUT    /api/v1/users/:id           # Full update
PATCH  /api/v1/users/:id           # Partial update
DELETE /api/v1/users/:id           # Delete user
GET    /api/v1/users/:id/orders    # Nested resources
POST   /api/v1/users/:id/activate  # State transitions
```
HTTP Status Code Decision Table
| Code | Meaning | When to Use |
|---|---|---|
| 200 | OK | Successful GET, PUT, PATCH |
| 201 | Created | Successful POST creating resource |
| 204 | No Content | Successful DELETE |
| 400 | Bad Request | Validation errors |
| 401 | Unauthorized | Missing or invalid auth |
| 403 | Forbidden | Auth valid but insufficient permissions |
| 404 | Not Found | Resource does not exist |
| 409 | Conflict | Duplicate or state conflict |
| 422 | Unprocessable Entity | Semantically invalid input |
| 429 | Too Many Requests | Rate limit exceeded |
| 500 | Internal Server Error | Unexpected server failure |
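The status code table can be encoded as a single lookup so handlers do not pick codes ad hoc. This is a minimal sketch: the `ErrorKind` names and the `statusFor` and `successStatus` helpers are illustrative assumptions, not part of any framework.

```typescript
// Hypothetical error categories; the names are illustrative only.
type ErrorKind =
  | "validation"
  | "unauthenticated"
  | "forbidden"
  | "not_found"
  | "conflict"
  | "unprocessable"
  | "rate_limited"
  | "internal";

// One place that maps each category to the status code from the table.
const STATUS_BY_KIND: Record<ErrorKind, number> = {
  validation: 400,
  unauthenticated: 401,
  forbidden: 403,
  not_found: 404,
  conflict: 409,
  unprocessable: 422,
  rate_limited: 429,
  internal: 500,
};

function statusFor(kind: ErrorKind): number {
  return STATUS_BY_KIND[kind];
}

// Success codes by method. Assumes every POST creates a resource; a POST
// that only triggers a state transition would more likely return 200.
function successStatus(method: "GET" | "POST" | "PUT" | "PATCH" | "DELETE"): number {
  if (method === "POST") return 201;
  if (method === "DELETE") return 204;
  return 200;
}
```

Keeping the mapping in one table makes it easy to review against the API contract from Phase 1.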
Response Format
```json
// Success (single)
{ "data": { "id": "123", "name": "Alice" }, "meta": { "requestId": "req_abc123" } }
// Success (collection)
{ "data": [...], "meta": { "page": 1, "pageSize": 20, "totalCount": 150, "totalPages": 8 } }
// Error
{ "error": { "code": "VALIDATION_ERROR", "message": "Invalid input", "details": [...] } }
```
Caching Strategy Decision Table
| Strategy | Description | Use Case |
|---|---|---|
| Cache-Aside | App checks cache, falls back to DB | General purpose |
| Write-Through | Write to cache and DB simultaneously | Strong consistency |
| Write-Behind | Write to cache, async write to DB | High write throughput |
| Read-Through | Cache loads from DB on miss | Transparent caching |
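As an illustration of the Cache-Aside row, here is a minimal in-memory sketch with a TTL. The `CacheAside` class, the `loadFromDb` callback, and the injectable clock are assumptions made for the example; a real service would usually call an async store such as Redis.

```typescript
type Entry<V> = { value: V; expiresAt: number };

class CacheAside<V> {
  private store = new Map<string, Entry<V>>();
  private readonly loadFromDb: (key: string) => V;
  private readonly ttlMs: number;
  private readonly now: () => number;

  // `loadFromDb` stands in for a repository call; `now` is injectable for tests.
  constructor(loadFromDb: (key: string) => V, ttlMs: number, now: () => number = () => Date.now()) {
    this.loadFromDb = loadFromDb;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V {
    const hit = this.store.get(key);
    if (hit !== undefined && hit.expiresAt > this.now()) {
      return hit.value; // cache hit
    }
    const value = this.loadFromDb(key); // miss or expired: fall back to the DB
    this.store.set(key, { value, expiresAt: this.now() + this.ttlMs });
    return value;
  }

  // Call after writing to the DB so readers do not see stale data.
  invalidate(key: string): void {
    this.store.delete(key);
  }
}
```

The explicit `invalidate` method is the hook for the invalidation plan: without it, stale entries live until the TTL expires.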
STOP — Do NOT proceed to Phase 3 until:
- Project structure follows layered architecture
- Input validation is at the edge (Zod, Joi, class-validator)
- Error handling returns structured error responses
- Caching strategy is implemented with invalidation plan
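The checklist items on edge validation and structured errors can be sketched together. This example hand-rolls the validation to stay dependency-free; in practice a schema library such as Zod or Joi would take its place. The `parseCreateUser` and `pageMeta` helpers and the `name` and `email` fields are hypothetical.

```typescript
// Envelope shapes mirroring the response format used in this skill.
type Success<T> = { data: T; meta: Record<string, unknown> };
type Failure = { error: { code: string; message: string; details: string[] } };

// Builds the pagination meta block for a collection response.
function pageMeta(page: number, pageSize: number, totalCount: number) {
  return { page, pageSize, totalCount, totalPages: Math.ceil(totalCount / pageSize) };
}

// Validates an untrusted request body before it reaches the service layer.
function parseCreateUser(body: unknown): Success<{ name: string; email: string }> | Failure {
  const details: string[] = [];
  const input = (typeof body === "object" && body !== null ? body : {}) as Record<string, unknown>;
  const name = input["name"];
  const email = input["email"];
  if (typeof name !== "string" || name.trim() === "") details.push("name is required");
  if (typeof email !== "string" || !email.includes("@")) details.push("email must contain @");
  if (details.length > 0 || typeof name !== "string" || typeof email !== "string") {
    return { error: { code: "VALIDATION_ERROR", message: "Invalid input", details } };
  }
  return { data: { name, email }, meta: {} };
}
```

A controller would map the `Failure` branch to a 400 response and pass only the validated `data` onward.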
Phase 3: Hardening
Goal: Prepare the service for production operation.
Actions
- Add comprehensive error handling
- Implement health checks and readiness probes
- Set up observability (traces, metrics, logs)
- Load test critical paths
- Document runbooks for operational scenarios
Health Check Endpoints
```json
// GET /health: lightweight liveness check
{ "status": "healthy" }
// GET /health/ready: readiness with dependency checks
{
  "status": "healthy",
  "checks": {
    "database": { "status": "healthy", "latency": "5ms" },
    "redis": { "status": "healthy", "latency": "2ms" },
    "queue": { "status": "healthy", "latency": "8ms" }
  },
  "uptime": "72h15m",
  "version": "1.4.2"
}
```
Observability: RED Method Metrics
| Metric | Description | Implementation |
|---|---|---|
| Rate | Requests per second | Counter incremented per request |
| Errors | Error rate per second | Counter incremented per error |
| Duration | Latency distribution | Histogram (p50, p95, p99) |
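To make the three RED metrics concrete, the sketch below records them in memory and derives percentiles with the nearest-rank method. The `RedMetrics` class is an assumption for illustration; a production service would export counters and histograms through a metrics library (for example via OpenTelemetry) rather than keep raw samples.

```typescript
class RedMetrics {
  private requests = 0;
  private errors = 0;
  private durationsMs: number[] = [];

  // Call once per finished request.
  record(durationMs: number, failed: boolean): void {
    this.requests += 1;
    if (failed) this.errors += 1;
    this.durationsMs.push(durationMs);
  }

  // Nearest-rank percentile over the recorded durations (p between 0 and 100).
  percentile(p: number): number {
    if (this.durationsMs.length === 0) return 0;
    const sorted = this.durationsMs.slice().sort((a, b) => a - b);
    const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
    return sorted[rank - 1] ?? 0;
  }

  // Rate and Errors are averaged over the observation window.
  snapshot(windowSeconds: number) {
    return {
      ratePerSecond: this.requests / windowSeconds,
      errorsPerSecond: this.errors / windowSeconds,
      p50: this.percentile(50),
      p95: this.percentile(95),
      p99: this.percentile(99),
    };
  }
}
```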
Structured Logging Format
```json
{
  "timestamp": "2025-01-15T10:30:00.123Z",
  "level": "info",
  "message": "User created",
  "service": "user-service",
  "traceId": "abc123",
  "spanId": "def456",
  "userId": "usr_123",
  "duration": 45
}
```
Rate Limiting Algorithm Decision Table
| Algorithm | Pros | Cons | Best For |
|---|---|---|---|
| Fixed Window | Simple, low memory | Burst at boundaries | Internal APIs |
| Sliding Window | Smooth distribution | More memory | Public APIs |
| Token Bucket | Controlled bursts | Slightly complex | Industry standard |
| Leaky Bucket | Constant output | No burst allowed | Strict rate control |
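The Token Bucket row is the least obvious of the four, so here is a minimal single-process sketch. The `TokenBucket` class and its parameters are illustrative assumptions; a multi-instance deployment would need shared state, which this example does not cover.

```typescript
class TokenBucket {
  private tokens: number;
  private lastRefillMs: number;
  private readonly capacity: number;
  private readonly refillPerSecond: number;
  private readonly now: () => number;

  // `capacity` bounds the burst size; `refillPerSecond` is the sustained rate.
  constructor(capacity: number, refillPerSecond: number, now: () => number = () => Date.now()) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.now = now;
    this.tokens = capacity; // start full so an initial burst is allowed
    this.lastRefillMs = now();
  }

  // Returns true if the request may proceed, false if it should receive a 429.
  tryConsume(): boolean {
    const current = this.now();
    const elapsedSeconds = (current - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.lastRefillMs = current;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}
```

Refilling lazily on each call avoids a background timer: the bucket only needs the time elapsed since the last request.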
STOP — Hardening complete when:
- Health check endpoints respond correctly
- Structured logging is configured
- Metrics are exported (RED method)
- Load test completed on critical paths
- Error handling returns appropriate status codes
- 健康检查端点响应正常
- 结构化日志已配置完成
- 指标已按RED方法导出
- 核心路径压测完成
- 错误处理返回对应状态码
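One way to verify the first checklist item is to build the readiness payload from individual dependency probes. The sketch below assumes synchronous probes for brevity (real probes would be async), and the `readiness` and `readinessStatusCode` helpers are hypothetical names. Returning 503 when a dependency is down is a common convention, not something this skill prescribes.

```typescript
type CheckResult = { status: "healthy" | "unhealthy"; latency: string };

// Runs each probe and folds the results into the /health/ready response shape.
function readiness(
  probes: Record<string, () => boolean>,
  now: () => number = () => Date.now(),
): { status: "healthy" | "unhealthy"; checks: Record<string, CheckResult> } {
  const checks: Record<string, CheckResult> = {};
  let allHealthy = true;
  for (const name of Object.keys(probes)) {
    const probe = probes[name];
    const started = now();
    let ok = false;
    try {
      ok = probe !== undefined && probe();
    } catch {
      ok = false; // a throwing probe counts as unhealthy
    }
    if (!ok) allHealthy = false;
    checks[name] = { status: ok ? "healthy" : "unhealthy", latency: `${now() - started}ms` };
  }
  return { status: allHealthy ? "healthy" : "unhealthy", checks };
}

// Maps the overall status to an HTTP status code for the readiness endpoint.
function readinessStatusCode(status: "healthy" | "unhealthy"): number {
  return status === "healthy" ? 200 : 503;
}
```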
Event-Driven Architecture Patterns
Message Queue Pattern Decision Table
| Pattern | Use Case | Example |
|---|---|---|
| Pub/Sub | Broadcast to multiple consumers | User registered -> email, analytics, CRM |
| Work Queue | Distribute tasks across workers | Image processing, PDF generation |
| Request/Reply | Async request with response | Price calculation service |
| Dead Letter | Handle failed messages | Retry policy exceeded |
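The Work Queue and Dead Letter rows interact: a worker retries a failing message a bounded number of times, then parks it. The following in-memory sketch shows that flow under simplifying assumptions (synchronous handler, no backoff, no broker); `drainQueue` and the `Message` shape are hypothetical.

```typescript
type Message = { id: string; payload: string; attempts: number };

// Processes a queue with a retry limit. Messages that keep failing are moved
// to a dead-letter list instead of being retried forever.
function drainQueue(
  queue: Message[],
  handler: (m: Message) => void,
  maxAttempts: number,
): { processed: string[]; deadLetters: Message[] } {
  const processed: string[] = [];
  const deadLetters: Message[] = [];
  const pending = queue.map((m) => ({ ...m })); // copy so the input is not mutated
  while (pending.length > 0) {
    const message = pending.shift() as Message;
    message.attempts += 1;
    try {
      handler(message);
      processed.push(message.id);
    } catch {
      if (message.attempts >= maxAttempts) deadLetters.push(message);
      else pending.push(message); // requeue for another attempt
    }
  }
  return { processed, deadLetters };
}
```

Dead-lettered messages keep their attempt count, which helps when someone later inspects why the retry policy was exceeded.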
Event Schema
```json
{
  "eventId": "evt_abc123",
  "eventType": "user.created",
  "timestamp": "2025-01-15T10:30:00Z",
  "version": "1.0",
  "source": "user-service",
  "data": { "userId": "usr_123", "email": "alice@example.com" },
  "metadata": { "correlationId": "corr_xyz789", "causationId": "cmd_def456" }
}
```
GraphQL Anti-Patterns
| Anti-Pattern | Problem | Fix |
|---|---|---|
| N+1 queries | Performance degradation | DataLoader for batching |
| Unbounded queries | DoS vulnerability | Enforce depth and complexity limits |
| Over-fetching in resolvers | Wasted DB queries | Select only requested fields |
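The depth-limit fix for unbounded queries can be sketched as a recursive walk. This example uses a simplified, hypothetical `Selection` tree rather than a real parsed GraphQL AST, so treat it as an outline of the idea and not as a drop-in validation rule.

```typescript
// Simplified selection tree; a real server would walk the parsed query AST.
type Selection = { field: string; children?: Selection[] };

// Depth of a selection: 1 for a leaf, plus the deepest child otherwise.
function depthOf(selection: Selection): number {
  const children = selection.children ?? [];
  let deepest = 0;
  for (const child of children) {
    deepest = Math.max(deepest, depthOf(child));
  }
  return 1 + deepest;
}

// Rejects a query before execution if it nests deeper than allowed.
function assertWithinDepth(root: Selection, maxDepth: number): void {
  const depth = depthOf(root);
  if (depth > maxDepth) {
    throw new Error(`Query depth ${depth} exceeds limit ${maxDepth}`);
  }
}
```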
Anti-Patterns / Common Mistakes
| Anti-Pattern | Why It Is Wrong | Correct Approach |
|---|---|---|
| Exposing database IDs directly | Security risk, coupling to DB | Use UUIDs or prefixed IDs |
| Synchronous external service calls in request path | Single point of failure, latency | Async with queues or circuit breaker |
| N+1 query patterns | Linear performance degradation | Eager loading or DataLoader |
| Catching and swallowing errors | Silent failures, impossible debugging | Log and propagate with context |
| Shared mutable state across handlers | Race conditions, unpredictable behavior | Stateless request handling |
| Skipping input validation | Injection, data corruption | Validate at the edge, always |
| Generic 500 for all errors | Poor developer experience | Specific error codes and messages |
| No API versioning | Breaking changes affect all consumers | Version from day one |
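For the synchronous-external-call row, a circuit breaker keeps a failing dependency from dragging down every request. The sketch below is a minimal synchronous version with an injectable clock; the `CircuitBreaker` class, its thresholds, and the fallback callback are assumptions for illustration, and real calls would be async.

```typescript
class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;
  private open = false;
  private readonly failureThreshold: number;
  private readonly resetAfterMs: number;
  private readonly now: () => number;

  constructor(failureThreshold: number, resetAfterMs: number, now: () => number = () => Date.now()) {
    this.failureThreshold = failureThreshold;
    this.resetAfterMs = resetAfterMs;
    this.now = now;
  }

  call<T>(operation: () => T, fallback: () => T): T {
    let trial = false;
    if (this.open) {
      // While open, fail fast without touching the dependency.
      if (this.now() - this.openedAt < this.resetAfterMs) return fallback();
      trial = true; // half-open: allow a single trial call
    }
    try {
      const result = operation();
      this.failures = 0;
      this.open = false; // success closes the circuit
      return result;
    } catch {
      this.failures += 1;
      if (trial || this.failures >= this.failureThreshold) {
        this.open = true;
        this.openedAt = this.now();
      }
      return fallback();
    }
  }

  isOpen(): boolean {
    return this.open;
  }
}
```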
Documentation Lookup (Context7)
Use mcp__context7__resolve-library-id then mcp__context7__query-docs for up-to-date docs. Returned docs override memorized knowledge.
- express: for middleware patterns, routing, or request/response API
- fastify: for plugin system, hooks, or schema validation
- nestjs: for decorators, modules, providers, or guards
- prisma: for schema syntax, client API, or migration commands
Integration Points
| Skill | Relationship |
|---|---|
|  | Architecture decisions guide backend service boundaries |
|  | Backend security follows OWASP and auth patterns |
|  | Backend performance uses caching and query tuning |
|  | Backend test strategy defines integration test approach |
|  | Review verifies API design and error handling |
|  | API behavior becomes acceptance criteria |
|  | Backend serves the full-stack tRPC layer |
Key Principles
- API versioning from day one (/v1/)
- Input validation at the edge (Zod, Joi, class-validator)
- Idempotency keys for non-GET endpoints
- Graceful shutdown (drain connections, finish in-flight requests)
- Circuit breaker for external service calls
- Database migrations versioned and reversible
- Secrets in environment variables, never in code
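The idempotency-key principle is easy to state and easy to get wrong, so a small sketch may help. It assumes an in-memory `IdempotencyStore` (a hypothetical name) and a synchronous action; a real deployment would need shared storage with an expiry policy and protection against concurrent duplicates.

```typescript
// Remembers the response for each idempotency key so a retried request
// returns the original result instead of repeating the side effect.
class IdempotencyStore<R> {
  private responses = new Map<string, R>();

  execute(key: string, action: () => R): R {
    if (this.responses.has(key)) {
      return this.responses.get(key) as R; // replay the stored response
    }
    const response = action();
    this.responses.set(key, response);
    return response;
  }
}
```

A client that times out on a POST can then retry with the same key and receive the first response without creating a second resource.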
Skill Type
FLEXIBLE — Adapt API style and architecture to the project context. The three-phase process (design, implement, harden) is strongly recommended. Health checks, structured logging, and error handling are non-negotiable for production services.