senior-backend

Senior Backend Engineer

Overview

Design and implement robust, scalable backend systems with a focus on API design, service architecture, data management, and operational excellence. This skill covers RESTful and GraphQL API patterns, message-driven architecture, caching strategies, rate limiting, health checks, and full observability with OpenTelemetry.
Announce at start: "I'm using the senior-backend skill for backend system design and implementation."

Phase 1: API Design

Goal: Define the contract before writing implementation code.

Actions

  1. Define resource models and relationships
  2. Design endpoint structure (REST) or schema (GraphQL)
  3. Establish authentication and authorization strategy
  4. Define rate limiting and throttling policies
  5. Create API documentation (OpenAPI/GraphQL schema)

API Style Decision Table

| Factor | REST | GraphQL | gRPC |
| --- | --- | --- | --- |
| Multiple consumers with different data needs | Poor fit | Strong fit | Poor fit |
| Simple CRUD operations | Strong fit | Overkill | Overkill |
| Real-time subscriptions | Requires WebSocket add-on | Built-in | Built-in (streaming) |
| Service-to-service | Good | Overkill | Strong fit |
| Public API | Strong fit | Good | Poor fit (tooling) |
| Mobile with bandwidth constraints | Overfetching risk | Strong fit | Strong fit |

STOP — Do NOT proceed to Phase 2 until:

  • Resource models are defined
  • Endpoint structure or schema is documented
  • Auth strategy is chosen
  • API contract is reviewable (OpenAPI/GraphQL schema)

Phase 2: Implementation

Goal: Build the service layer with clear separation of concerns.

Actions

  1. Set up project structure with clear layering
  2. Implement data access layer (repositories/DAOs)
  3. Build service layer with business logic
  4. Create API controllers/resolvers
  5. Add middleware (auth, logging, error handling, CORS)
  6. Implement caching strategy
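The layering in the actions above can be sketched as follows. This is a minimal, synchronous illustration rather than a prescribed structure: `InMemoryUserRepository`, `UserService`, and `patchUserController` are hypothetical names, and a real service would typically use async database calls and a validation library at the edge.

```typescript
interface User { id: string; name: string; }

// Data access layer: the only place that knows how users are stored.
interface UserRepository {
  findById(id: string): User | undefined;
  save(user: User): void;
}

class InMemoryUserRepository implements UserRepository {
  private users = new Map<string, User>();
  findById(id: string): User | undefined { return this.users.get(id); }
  save(user: User): void { this.users.set(user.id, user); }
}

// Service layer: business rules only, no HTTP concepts.
class UserService {
  constructor(private readonly repo: UserRepository) {}

  rename(id: string, name: string): User | undefined {
    const user = this.repo.findById(id);
    if (!user) return undefined;
    const updated: User = { ...user, name };
    this.repo.save(updated);
    return updated;
  }
}

// Controller layer: validates input at the edge and maps results to HTTP.
function patchUserController(
  service: UserService,
  id: string,
  body: unknown,
): { status: number; body: unknown } {
  const name =
    typeof body === "object" && body !== null
      ? (body as { name?: unknown }).name
      : undefined;
  if (typeof name !== "string" || name.trim() === "") {
    return {
      status: 400,
      body: { error: { code: "VALIDATION_ERROR", message: "name must be a non-empty string" } },
    };
  }
  const user = service.rename(id, name.trim());
  return user
    ? { status: 200, body: { data: user } }
    : { status: 404, body: { error: { code: "NOT_FOUND", message: "User not found" } } };
}
```

Because the service depends on the `UserRepository` interface rather than a concrete store, the in-memory version can be swapped for a database-backed one without touching the controller.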

RESTful URL Structure

GET    /api/v1/users              # List users (paginated)
GET    /api/v1/users/:id          # Get single user
POST   /api/v1/users              # Create user
PUT    /api/v1/users/:id          # Full update
PATCH  /api/v1/users/:id          # Partial update
DELETE /api/v1/users/:id          # Delete user
GET    /api/v1/users/:id/orders   # Nested resources
POST   /api/v1/users/:id/activate # State transitions

HTTP Status Code Decision Table

| Code | Meaning | When to Use |
| --- | --- | --- |
| 200 | OK | Successful GET, PUT, PATCH |
| 201 | Created | Successful POST creating resource |
| 204 | No Content | Successful DELETE |
| 400 | Bad Request | Validation errors |
| 401 | Unauthorized | Missing or invalid auth |
| 403 | Forbidden | Auth valid but insufficient permissions |
| 404 | Not Found | Resource does not exist |
| 409 | Conflict | Duplicate or state conflict |
| 422 | Unprocessable Entity | Semantically invalid input |
| 429 | Too Many Requests | Rate limit exceeded |
| 500 | Internal Server Error | Unexpected server failure |
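One way to apply the table consistently is to map a small set of application error kinds to status codes in a single place. The sketch below assumes a hypothetical `AppError` class and `ErrorKind` union; neither comes from a framework.

```typescript
// Hypothetical error kinds; the names are illustrative only.
type ErrorKind =
  | "validation"
  | "unauthenticated"
  | "forbidden"
  | "not_found"
  | "conflict"
  | "unprocessable"
  | "rate_limited"
  | "unexpected";

const STATUS_BY_KIND: Record<ErrorKind, number> = {
  validation: 400,
  unauthenticated: 401,
  forbidden: 403,
  not_found: 404,
  conflict: 409,
  unprocessable: 422,
  rate_limited: 429,
  unexpected: 500,
};

class AppError extends Error {
  constructor(public readonly kind: ErrorKind, message: string) {
    super(message);
    this.name = "AppError";
    // Keeps instanceof working when compiled to older JavaScript targets.
    Object.setPrototypeOf(this, AppError.prototype);
  }
}

// Anything that is not an AppError is treated as an unexpected failure (500).
function statusFor(err: unknown): number {
  return err instanceof AppError ? STATUS_BY_KIND[err.kind] : 500;
}
```

An error-handling middleware could call `statusFor` once, so individual handlers throw typed errors instead of choosing status codes themselves.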

Response Format

json
// Success (single)
{ "data": { "id": "123", "name": "Alice" }, "meta": { "requestId": "req_abc123" } }

// Success (collection)
{ "data": [...], "meta": { "page": 1, "pageSize": 20, "totalCount": 150, "totalPages": 8 } }

// Error
{ "error": { "code": "VALIDATION_ERROR", "message": "Invalid input", "details": [...] } }
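The three envelopes above could be produced by small helpers so every handler returns the same shape. This is a sketch; `single`, `paginated`, and `failure` are hypothetical helper names, and `totalPages` is assumed to be derived by rounding up.

```typescript
interface PageMeta {
  page: number;
  pageSize: number;
  totalCount: number;
  totalPages: number;
}

interface SuccessEnvelope<T, M> { data: T; meta: M; }
interface ErrorEnvelope {
  error: { code: string; message: string; details: unknown[] };
}

// Single-resource success response.
function single<T>(data: T, requestId: string): SuccessEnvelope<T, { requestId: string }> {
  return { data, meta: { requestId } };
}

// Collection success response; totalPages is computed, never passed in.
function paginated<T>(
  items: T[],
  page: number,
  pageSize: number,
  totalCount: number,
): SuccessEnvelope<T[], PageMeta> {
  const totalPages = Math.ceil(totalCount / pageSize);
  return { data: items, meta: { page, pageSize, totalCount, totalPages } };
}

// Error response with a machine-readable code.
function failure(code: string, message: string, details: unknown[] = []): ErrorEnvelope {
  return { error: { code, message, details } };
}
```

With the example numbers above, `paginated([], 1, 20, 150).meta.totalPages` evaluates to 8.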

Caching Strategy Decision Table

| Strategy | Description | Use Case |
| --- | --- | --- |
| Cache-Aside | App checks cache, falls back to DB | General purpose |
| Write-Through | Write to cache and DB simultaneously | Strong consistency |
| Write-Behind | Write to cache, async write to DB | High write throughput |
| Read-Through | Cache loads from DB on miss | Transparent caching |
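As an illustration of the Cache-Aside row, the sketch below keeps entries in an in-process `Map` with a TTL and an explicit `invalidate` call. The `CacheAside` class is hypothetical; a production version would more likely be asynchronous and backed by a shared cache such as Redis. The clock is injected so expiry can be tested deterministically.

```typescript
class CacheAside<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();

  constructor(
    private readonly load: (key: string) => V, // fallback to the source of truth
    private readonly ttlMs: number,
    private readonly now: () => number = () => Date.now(),
  ) {}

  get(key: string): V {
    const hit = this.entries.get(key);
    if (hit && hit.expiresAt > this.now()) return hit.value; // cache hit
    const value = this.load(key); // cache miss or expired entry
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    return value;
  }

  // Call after a write to the underlying store so readers do not see stale data.
  invalidate(key: string): void {
    this.entries.delete(key);
  }
}
```

The TTL bounds staleness when an invalidation is missed, while `invalidate` covers the common case of a write that is known to change the cached value.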

STOP — Do NOT proceed to Phase 3 until:

  • Project structure follows layered architecture
  • Input validation is at the edge (Zod, Joi, class-validator)
  • Error handling returns structured error responses
  • Caching strategy is implemented with invalidation plan

Phase 3: Hardening

Goal: Prepare the service for production operation.

Actions

  1. Add comprehensive error handling
  2. Implement health checks and readiness probes
  3. Set up observability (traces, metrics, logs)
  4. Load test critical paths
  5. Document runbooks for operational scenarios

Health Check Endpoints

json
// GET /health — lightweight liveness check
{ "status": "healthy" }

// GET /health/ready — readiness with dependency checks
{
  "status": "healthy",
  "checks": {
    "database": { "status": "healthy", "latency": "5ms" },
    "redis": { "status": "healthy", "latency": "2ms" },
    "queue": { "status": "healthy", "latency": "8ms" }
  },
  "uptime": "72h15m",
  "version": "1.4.2"
}
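The readiness payload above can be assembled by aggregating individual dependency checks. In this sketch the `readiness` function and the choice of HTTP 503 for a failing dependency are assumptions, not part of the format shown; 503 is a common convention because most orchestrators treat non-2xx responses as "not ready".

```typescript
type CheckStatus = "healthy" | "unhealthy";

interface CheckResult {
  status: CheckStatus;
  latency: string;
}

interface ReadinessReport {
  status: CheckStatus;
  httpStatus: number;
  checks: Record<string, CheckResult>;
}

// The service is ready only when every dependency check is healthy.
function readiness(checks: Record<string, CheckResult>): ReadinessReport {
  const allHealthy = Object.keys(checks).every(
    (name) => checks[name].status === "healthy",
  );
  return {
    status: allHealthy ? "healthy" : "unhealthy",
    httpStatus: allHealthy ? 200 : 503,
    checks,
  };
}
```

The liveness endpoint stays separate and cheap, so a slow database does not cause the process itself to be restarted.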

Observability: RED Method Metrics

| Metric | Description | Implementation |
| --- | --- | --- |
| Rate | Requests per second | Counter incremented per request |
| Errors | Error rate per second | Counter incremented per error |
| Duration | Latency distribution | Histogram (p50, p95, p99) |
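To make the three RED metrics concrete, here is a small in-memory recorder. It is a sketch only: the `RedMetrics` class is hypothetical, it stores raw samples and uses nearest-rank percentiles, whereas a real exporter (for example via OpenTelemetry) would normally use bucketed histograms.

```typescript
class RedMetrics {
  private requests = 0;
  private errors = 0;
  private durationsMs: number[] = [];

  // Call once per completed request.
  record(durationMs: number, failed: boolean): void {
    this.requests += 1;
    if (failed) this.errors += 1;
    this.durationsMs.push(durationMs);
  }

  // Nearest-rank percentile over the recorded samples; p is in (0, 100].
  percentile(p: number): number {
    if (this.durationsMs.length === 0) return 0;
    const sorted = this.durationsMs.slice().sort((a, b) => a - b);
    const rank = Math.ceil((p * sorted.length) / 100);
    return sorted[Math.max(rank, 1) - 1];
  }

  // Rate and errors are expressed per second over the given window.
  snapshot(windowSeconds: number) {
    return {
      rate: this.requests / windowSeconds,
      errorRate: this.errors / windowSeconds,
      p50: this.percentile(50),
      p95: this.percentile(95),
      p99: this.percentile(99),
    };
  }
}
```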

Structured Logging Format

json
{
  "timestamp": "2025-01-15T10:30:00.123Z",
  "level": "info",
  "message": "User created",
  "service": "user-service",
  "traceId": "abc123",
  "spanId": "def456",
  "userId": "usr_123",
  "duration": 45
}
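A log line in the format above can be built by merging fixed service context with per-event fields. The `formatLog` function below is a hypothetical helper, not a specific logging library's API; the timestamp is passed in so output is reproducible.

```typescript
type LogLevel = "debug" | "info" | "warn" | "error";

interface LogContext {
  service: string;
  traceId?: string;
  spanId?: string;
}

// Produces one JSON object per line so log aggregators can parse it directly.
function formatLog(
  level: LogLevel,
  message: string,
  context: LogContext,
  fields: Record<string, unknown> = {},
  now: Date = new Date(),
): string {
  return JSON.stringify({
    timestamp: now.toISOString(),
    level,
    message,
    ...context,
    ...fields,
  });
}
```

Carrying `traceId` and `spanId` in the context is what lets a log line be joined with its trace in the observability backend.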

Rate Limiting Algorithm Decision Table

| Algorithm | Pros | Cons | Best For |
| --- | --- | --- | --- |
| Fixed Window | Simple, low memory | Burst at boundaries | Internal APIs |
| Sliding Window | Smooth distribution | More memory | Public APIs |
| Token Bucket | Controlled bursts | Slightly complex | Industry standard |
| Leaky Bucket | Constant output | No burst allowed | Strict rate control |
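The Token Bucket row can be illustrated with a single-process sketch. The `TokenBucket` class and its parameter names are hypothetical; a multi-instance deployment would need the bucket state in a shared store. The clock is injectable for testing.

```typescript
class TokenBucket {
  private tokens: number;
  private lastRefillMs: number;

  constructor(
    private readonly capacity: number, // maximum burst size
    private readonly refillPerSecond: number, // sustained rate
    private readonly now: () => number = () => Date.now(),
  ) {
    this.tokens = capacity;
    this.lastRefillMs = now();
  }

  // Returns true if the request may proceed, false if it should get a 429.
  tryConsume(cost: number = 1): boolean {
    const current = this.now();
    const elapsedSeconds = (current - this.lastRefillMs) / 1000;
    // Refill lazily based on elapsed time, capped at capacity.
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSeconds * this.refillPerSecond,
    );
    this.lastRefillMs = current;
    if (this.tokens < cost) return false;
    this.tokens -= cost;
    return true;
  }
}
```

`capacity` controls how large a burst is tolerated, while `refillPerSecond` sets the long-run average rate, which is the trade-off the table describes as "controlled bursts".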

STOP — Hardening complete when:

  • Health check endpoints respond correctly
  • Structured logging is configured
  • Metrics are exported (RED method)
  • Load test completed on critical paths
  • Error handling returns appropriate status codes

Event-Driven Architecture Patterns

Message Queue Pattern Decision Table

| Pattern | Use Case | Example |
| --- | --- | --- |
| Pub/Sub | Broadcast to multiple consumers | User registered -> email, analytics, CRM |
| Work Queue | Distribute tasks across workers | Image processing, PDF generation |
| Request/Reply | Async request with response | Price calculation service |
| Dead Letter | Handle failed messages | Retry policy exceeded |
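The interaction between a Work Queue and a Dead Letter queue can be sketched in memory as below. `processWithDeadLetter` is a hypothetical function and not a broker API; real brokers add delays or backoff between retries, which this sketch omits.

```typescript
interface QueueMessage<T> {
  id: string;
  body: T;
  attempts: number;
}

// Processes messages in order; a failing message is re-queued until it has
// been attempted maxAttempts times, then moved to the dead letter list.
function processWithDeadLetter<T>(
  messages: QueueMessage<T>[],
  handler: (body: T) => void,
  maxAttempts: number,
): { done: string[]; deadLetter: QueueMessage<T>[] } {
  const queue = messages.map((m) => ({ ...m })); // do not mutate the input
  const done: string[] = [];
  const deadLetter: QueueMessage<T>[] = [];

  while (queue.length > 0) {
    const msg = queue.shift()!;
    msg.attempts += 1;
    try {
      handler(msg.body);
      done.push(msg.id);
    } catch {
      if (msg.attempts >= maxAttempts) deadLetter.push(msg);
      else queue.push(msg); // retry later, behind the other messages
    }
  }
  return { done, deadLetter };
}
```

Moving poisoned messages aside keeps one bad payload from blocking the rest of the queue and leaves it available for inspection.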

Event Schema

json
{
  "eventId": "evt_abc123",
  "eventType": "user.created",
  "timestamp": "2025-01-15T10:30:00Z",
  "version": "1.0",
  "source": "user-service",
  "data": { "userId": "usr_123", "email": "alice@example.com" },
  "metadata": { "correlationId": "corr_xyz789", "causationId": "cmd_def456" }
}
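The envelope above can be expressed as a type plus two small factories. This is a sketch: `makeEvent`, `causedBy`, and the injected `deps` (ID generator and clock) are hypothetical. It assumes the common convention that a follow-up event keeps its parent's `correlationId` and uses the parent's `eventId` as its `causationId`.

```typescript
interface EventMetadata {
  correlationId: string;
  causationId: string;
}

interface EventEnvelope<T> {
  eventId: string;
  eventType: string;
  timestamp: string;
  version: string;
  source: string;
  data: T;
  metadata: EventMetadata;
}

interface EventDeps {
  id: () => string;
  now: () => Date;
}

function makeEvent<T>(
  eventType: string,
  source: string,
  data: T,
  metadata: EventMetadata,
  deps: EventDeps,
): EventEnvelope<T> {
  return {
    eventId: deps.id(),
    eventType,
    timestamp: deps.now().toISOString(),
    version: "1.0",
    source,
    data,
    metadata,
  };
}

// Creates a follow-up event that stays in the parent's correlation chain.
function causedBy<T, P>(
  parent: EventEnvelope<P>,
  eventType: string,
  source: string,
  data: T,
  deps: EventDeps,
): EventEnvelope<T> {
  const metadata = {
    correlationId: parent.metadata.correlationId,
    causationId: parent.eventId,
  };
  return makeEvent(eventType, source, data, metadata, deps);
}
```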

GraphQL Anti-Patterns

| Anti-Pattern | Problem | Fix |
| --- | --- | --- |
| N+1 queries | Performance degradation | DataLoader for batching |
| Unbounded queries | DoS vulnerability | Enforce depth and complexity limits |
| Over-fetching in resolvers | Wasted DB queries | Select only requested fields |
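A depth limit for the "Unbounded queries" row can be sketched over a simplified selection tree. The `Selection` type here is a hypothetical stand-in and not the AST of any GraphQL library; in practice the same recursion would run over the parsed query document before execution.

```typescript
// Simplified stand-in for a parsed selection set.
interface Selection {
  name: string;
  children?: Selection[];
}

// Depth of a field is 1 plus the depth of its deepest child.
function depthOf(selection: Selection): number {
  const children = selection.children || [];
  let deepestChild = 0;
  for (const child of children) {
    deepestChild = Math.max(deepestChild, depthOf(child));
  }
  return 1 + deepestChild;
}

// Rejects a query before any resolver runs if it nests too deeply.
function assertWithinDepth(root: Selection, maxDepth: number): void {
  const depth = depthOf(root);
  if (depth > maxDepth) {
    throw new Error(`Query depth ${depth} exceeds limit ${maxDepth}`);
  }
}
```

Depth alone does not bound cost for wide queries, which is why the table pairs it with a complexity limit.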

Anti-Patterns / Common Mistakes

| Anti-Pattern | Why It Is Wrong | Correct Approach |
| --- | --- | --- |
| Exposing database IDs directly | Security risk, coupling to DB | Use UUIDs or prefixed IDs |
| Synchronous external service calls in request path | Single point of failure, latency | Async with queues or circuit breaker |
| N+1 query patterns | Linear performance degradation | Eager loading or DataLoader |
| Catching and swallowing errors | Silent failures, impossible debugging | Log and propagate with context |
| Shared mutable state across handlers | Race conditions, unpredictable behavior | Stateless request handling |
| Skipping input validation | Injection, data corruption | Validate at the edge, always |
| Generic 500 for all errors | Poor developer experience | Specific error codes and messages |
| No API versioning | Breaking changes affect all consumers | Version from day one (/v1/) |
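For the circuit-breaker approach named in the table, a minimal synchronous sketch follows. The `CircuitBreaker` class, its thresholds, and the fallback mechanism are all assumptions for illustration; production breakers are usually asynchronous and often come from an existing resilience library.

```typescript
class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;
  private state: "closed" | "open" = "closed";

  constructor(
    private readonly failureThreshold: number, // consecutive failures before opening
    private readonly resetAfterMs: number, // how long to stay open before probing
    private readonly now: () => number = () => Date.now(),
  ) {}

  call<T>(operation: () => T, fallback: () => T): T {
    let probing = false;
    if (this.state === "open") {
      // While open, skip the external call entirely and fail fast.
      if (this.now() - this.openedAt < this.resetAfterMs) return fallback();
      probing = true; // half-open: allow a single trial call
    }
    try {
      const result = operation();
      this.failures = 0;
      this.state = "closed";
      return result;
    } catch {
      this.failures += 1;
      if (probing || this.failures >= this.failureThreshold) {
        this.state = "open";
        this.openedAt = this.now();
      }
      return fallback();
    }
  }

  isOpen(): boolean {
    return this.state === "open";
  }
}
```

While the breaker is open the failing dependency is not called at all, which protects both request latency and the dependency's chance to recover.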

Documentation Lookup (Context7)

Use mcp__context7__resolve-library-id then mcp__context7__query-docs for up-to-date docs. Returned docs override memorized knowledge.
  • express: for middleware patterns, routing, or request/response API
  • fastify: for plugin system, hooks, or schema validation
  • nestjs: for decorators, modules, providers, or guards
  • prisma: for schema syntax, client API, or migration commands

Integration Points

| Skill | Relationship |
| --- | --- |
| senior-architect | Architecture decisions guide backend service boundaries |
| security-review | Backend security follows OWASP and auth patterns |
| performance-optimization | Backend performance uses caching and query tuning |
| testing-strategy | Backend test strategy defines integration test approach |
| code-review | Review verifies API design and error handling |
| acceptance-testing | API behavior becomes acceptance criteria |
| senior-fullstack | Backend serves the full-stack tRPC layer |

Key Principles

  • API versioning from day one (/v1/)
  • Input validation at the edge (Zod, Joi, class-validator)
  • Idempotency keys for non-GET endpoints
  • Graceful shutdown (drain connections, finish in-flight requests)
  • Circuit breaker for external service calls
  • Database migrations versioned and reversible
  • Secrets in environment variables, never in code
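The idempotency-key principle can be sketched as a store that remembers the result of the first execution for each key. `IdempotencyStore` is a hypothetical in-memory illustration; a real implementation would need a shared store, key expiry, and handling for concurrent requests that carry the same key.

```typescript
class IdempotencyStore<R> {
  private results = new Map<string, R>();

  // Runs the operation once per key; replays return the stored result.
  execute(key: string, operation: () => R): R {
    if (this.results.has(key)) {
      return this.results.get(key) as R;
    }
    const result = operation();
    this.results.set(key, result);
    return result;
  }
}
```

A client that retries a POST after a timeout sends the same key, so the side effect (for example, creating an order) happens only once while the client still receives the original response.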

Skill Type

FLEXIBLE — Adapt API style and architecture to the project context. The three-phase process (design, implement, harden) is strongly recommended. Health checks, structured logging, and error handling are non-negotiable for production services.