Senior Backend Engineer
Overview
Design and implement robust, scalable backend systems with a focus on API design, service architecture, data management, and operational excellence. This skill covers RESTful and GraphQL API patterns, message-driven architecture, caching strategies, rate limiting, health checks, and full observability with OpenTelemetry.
Announce at start: "I'm using the senior-backend skill for backend system design and implementation."
Phase 1: API Design
Goal: Define the contract before writing implementation code.
Actions
- Define resource models and relationships
- Design endpoint structure (REST) or schema (GraphQL)
- Establish authentication and authorization strategy
- Define rate limiting and throttling policies
- Create API documentation (OpenAPI/GraphQL schema)
API Style Decision Table
| Factor | REST | GraphQL | gRPC |
|---|---|---|---|
| Multiple consumers with different data needs | Poor fit | Strong fit | Poor fit |
| Simple CRUD operations | Strong fit | Overkill | Overkill |
| Real-time subscriptions | Requires WebSocket add-on | Built-in | Built-in (streaming) |
| Service-to-service | Good | Overkill | Strong fit |
| Public API | Strong fit | Good | Poor fit (tooling) |
| Mobile with bandwidth constraints | Overfetching risk | Strong fit | Strong fit |
STOP — Do NOT proceed to Phase 2 until:
Phase 2: Implementation
Goal: Build the service layer with clear separation of concerns.
Actions
- Set up project structure with clear layering
- Implement data access layer (repositories/DAOs)
- Build service layer with business logic
- Create API controllers/resolvers
- Add middleware (auth, logging, error handling, CORS)
- Implement caching strategy
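The layering described in the actions above can be sketched roughly as follows. This is a minimal illustration, not a prescribed structure: `User`, `InMemoryUserRepository`, `UserService`, and `renameUserController` are hypothetical names, and the in-memory map stands in for a real data store.

```typescript
// Hypothetical domain type for illustration.
interface User {
  id: string;
  name: string;
}

// Data access layer: the only place that knows how users are stored.
interface UserRepository {
  findById(id: string): Promise<User | undefined>;
  save(user: User): Promise<void>;
}

class InMemoryUserRepository implements UserRepository {
  private readonly users = new Map<string, User>();

  async findById(id: string): Promise<User | undefined> {
    return this.users.get(id);
  }

  async save(user: User): Promise<void> {
    this.users.set(user.id, user);
  }
}

// Service layer: business rules, with no HTTP or storage details.
class UserService {
  constructor(private readonly repo: UserRepository) {}

  async rename(id: string, name: string): Promise<User | undefined> {
    const user = await this.repo.findById(id);
    if (!user) return undefined;
    const updated: User = { ...user, name: name.trim() };
    await this.repo.save(updated);
    return updated;
  }
}

// Controller layer: translates HTTP-shaped input and output to service calls.
async function renameUserController(
  service: UserService,
  params: { id: string },
  body: { name: string },
): Promise<{ status: number; body: unknown }> {
  const user = await service.rename(params.id, body.name);
  return user
    ? { status: 200, body: { data: user } }
    : { status: 404, body: { error: { code: "NOT_FOUND", message: "User not found" } } };
}
```

Because the service depends only on the `UserRepository` interface, the in-memory implementation can be swapped for a database-backed one without touching the service or controller.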
RESTful URL Structure
```
GET    /api/v1/users               # List users (paginated)
GET    /api/v1/users/:id           # Get single user
POST   /api/v1/users               # Create user
PUT    /api/v1/users/:id           # Full update
PATCH  /api/v1/users/:id           # Partial update
DELETE /api/v1/users/:id           # Delete user
GET    /api/v1/users/:id/orders    # Nested resources
POST   /api/v1/users/:id/activate  # State transitions
```
HTTP Status Code Decision Table
| Code | Meaning | When to Use |
|---|---|---|
| 200 | OK | Successful GET, PUT, PATCH |
| 201 | Created | Successful POST creating resource |
| 204 | No Content | Successful DELETE |
| 400 | Bad Request | Malformed syntax or failed structural validation (missing fields, wrong types) |
| 401 | Unauthorized | Missing or invalid auth |
| 403 | Forbidden | Auth valid but insufficient permissions |
| 404 | Not Found | Resource does not exist |
| 409 | Conflict | Duplicate or state conflict |
| 422 | Unprocessable Entity | Well-formed input that violates semantic or business rules |
| 429 | Too Many Requests | Rate limit exceeded |
| 500 | Internal Server Error | Unexpected server failure |
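One way to keep these status decisions consistent is to map domain-level error kinds to codes in a single lookup table. The sketch below assumes a hypothetical `DomainErrorKind` union and a plain-object error shape; neither comes from a specific framework.

```typescript
// Hypothetical domain error kinds; the names are illustrative only.
type DomainErrorKind =
  | "validation"
  | "unauthenticated"
  | "forbidden"
  | "not_found"
  | "conflict"
  | "unprocessable"
  | "rate_limited";

interface DomainError {
  kind: DomainErrorKind;
  message: string;
}

// Small constructor helper so call sites stay terse.
function domainError(kind: DomainErrorKind, message: string): DomainError {
  return { kind, message };
}

// One lookup table keeps the status decisions in a single place.
const STATUS_BY_KIND: Record<DomainErrorKind, number> = {
  validation: 400,
  unauthenticated: 401,
  forbidden: 403,
  not_found: 404,
  conflict: 409,
  unprocessable: 422,
  rate_limited: 429,
};

// Anything that is not a known domain error is treated as an unexpected failure.
function toHttpStatus(err: unknown): number {
  if (typeof err === "object" && err !== null) {
    const kind = (err as { kind?: unknown }).kind;
    if (typeof kind === "string" && Object.prototype.hasOwnProperty.call(STATUS_BY_KIND, kind)) {
      return STATUS_BY_KIND[kind as DomainErrorKind];
    }
  }
  return 500;
}
```

A central error-handling middleware could call `toHttpStatus` once, so individual handlers only need to throw or return a `DomainError`.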
Response Format
```json
// Success (single)
{ "data": { "id": "123", "name": "Alice" }, "meta": { "requestId": "req_abc123" } }
// Success (collection)
{ "data": [...], "meta": { "page": 1, "pageSize": 20, "totalCount": 150, "totalPages": 8 } }
// Error
{ "error": { "code": "VALIDATION_ERROR", "message": "Invalid input", "details": [...] } }
```
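The collection envelope's pagination fields follow from simple arithmetic: `totalPages` is the total count divided by the page size, rounded up. A minimal sketch, assuming hypothetical helper names `toCollection` and `toOffsetLimit`:

```typescript
interface PageMeta {
  page: number;
  pageSize: number;
  totalCount: number;
  totalPages: number;
}

interface CollectionEnvelope<T> {
  data: T[];
  meta: PageMeta;
}

// Build the collection envelope for one page of an already-counted result set.
function toCollection<T>(
  items: T[],
  page: number,
  pageSize: number,
  totalCount: number,
): CollectionEnvelope<T> {
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError("page must be an integer >= 1");
  }
  if (!Number.isInteger(pageSize) || pageSize < 1) {
    throw new RangeError("pageSize must be an integer >= 1");
  }
  return {
    data: items,
    meta: { page, pageSize, totalCount, totalPages: Math.ceil(totalCount / pageSize) },
  };
}

// Translate page and pageSize into the offset and limit a data store typically expects.
function toOffsetLimit(page: number, pageSize: number): { offset: number; limit: number } {
  return { offset: (page - 1) * pageSize, limit: pageSize };
}
```

With the numbers from the collection example, 150 items at 20 per page gives `Math.ceil(150 / 20)`, which is 8 pages.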
Caching Strategy Decision Table
| Strategy | Description | Use Case |
|---|---|---|
| Cache-Aside | App checks cache, falls back to DB | General purpose |
| Write-Through | Write to cache and DB synchronously in the same operation | Cache must stay consistent with the DB |
| Write-Behind | Write to cache, async write to DB | High write throughput |
| Read-Through | Cache loads from DB on miss | Transparent caching |
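As an illustration of the Cache-Aside row, the sketch below checks an in-memory cache first and falls back to a caller-supplied loader on a miss. The `CacheAside` class, its TTL handling, and the injectable clock are assumptions made for the example; a production setup would more likely sit in front of Redis or a similar store.

```typescript
interface CacheEntry<V> {
  value: V;
  expiresAt: number;
}

class CacheAside<V> {
  private readonly entries = new Map<string, CacheEntry<V>>();

  constructor(
    private readonly ttlMs: number,
    private readonly now: () => number = Date.now, // injectable clock for testing
  ) {}

  // Return the cached value if still fresh; otherwise load from the source and cache it.
  async get(key: string, loadFromSource: (key: string) => Promise<V>): Promise<V> {
    const hit = this.entries.get(key);
    if (hit && hit.expiresAt > this.now()) return hit.value;

    const value = await loadFromSource(key);
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    return value;
  }

  // Call after a write so the next read goes back to the source.
  invalidate(key: string): void {
    this.entries.delete(key);
  }
}
```

Invalidating on write rather than updating the cached value keeps the database as the single source of truth, at the cost of one extra miss after each write.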
STOP — Do NOT proceed to Phase 3 until:
Phase 3: Hardening
Goal: Prepare the service for production operation.
Actions
- Add comprehensive error handling
- Implement health checks and readiness probes
- Set up observability (traces, metrics, logs)
- Load test critical paths
- Document runbooks for operational scenarios
Health Check Endpoints
```json
// GET /health: lightweight liveness check
{ "status": "healthy" }
// GET /health/ready: readiness with dependency checks
{
  "status": "healthy",
  "checks": {
    "database": { "status": "healthy", "latency": "5ms" },
    "redis": { "status": "healthy", "latency": "2ms" },
    "queue": { "status": "healthy", "latency": "8ms" }
  },
  "uptime": "72h15m",
  "version": "1.4.2"
}
```
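A readiness response of this shape can be produced by running one probe per dependency and aggregating the results. The following is a sketch under assumptions: `Probe`, `runProbe`, and `readiness` are hypothetical names, and each probe is taken to be a function that resolves when the dependency responds and rejects otherwise.

```typescript
type CheckStatus = "healthy" | "unhealthy";

interface CheckResult {
  status: CheckStatus;
  latency: string;
}

// A dependency probe resolves on success and rejects (or throws) on failure.
type Probe = () => Promise<void>;

async function runProbe(probe: Probe, now: () => number): Promise<CheckResult> {
  const start = now();
  let outcome: CheckStatus = "healthy";
  try {
    await probe();
  } catch {
    outcome = "unhealthy";
  }
  return { status: outcome, latency: `${Math.round(now() - start)}ms` };
}

// Run all probes concurrently; the service is ready only if every dependency is healthy.
async function readiness(
  probes: Record<string, Probe>,
  now: () => number = Date.now,
): Promise<{ status: CheckStatus; checks: Record<string, CheckResult> }> {
  const pairs = await Promise.all(
    Object.entries(probes).map(
      async ([name, probe]) => [name, await runProbe(probe, now)] as const,
    ),
  );
  const checks: Record<string, CheckResult> = {};
  for (const [name, result] of pairs) {
    checks[name] = result;
  }
  const overall: CheckStatus = pairs.every(([, r]) => r.status === "healthy")
    ? "healthy"
    : "unhealthy";
  return { status: overall, checks };
}
```

A handler for `/health/ready` would typically return 200 when the overall status is healthy and 503 otherwise, so that orchestrators stop routing traffic to an instance whose dependencies are down.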
Observability: RED Method Metrics
| Metric | Description | Implementation |
|---|---|---|
| Rate | Requests per second | Counter incremented per request |
| Errors | Failed requests per second | Counter incremented per failed request |
| Duration | Latency distribution | Histogram (p50, p95, p99) |
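To make the three RED signals concrete, here is a small in-memory recorder. It is a teaching sketch only: `RedMetrics` is a hypothetical class that keeps raw samples and computes nearest-rank percentiles, whereas metrics libraries such as the OpenTelemetry SDK use bucketed histograms instead of storing every duration.

```typescript
class RedMetrics {
  private requests = 0;
  private errors = 0;
  private readonly durationsMs: number[] = [];

  // Record one finished request: its latency and whether it failed.
  record(durationMs: number, failed: boolean): void {
    this.requests += 1;
    if (failed) this.errors += 1;
    this.durationsMs.push(durationMs);
  }

  // Nearest-rank percentile over all recorded durations (p between 0 and 100).
  percentile(p: number): number {
    if (this.durationsMs.length === 0) return 0;
    const sorted = [...this.durationsMs].sort((a, b) => a - b);
    const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
    return sorted[rank - 1] ?? 0;
  }

  // Rates are the counters divided by the length of the observation window.
  snapshot(windowSeconds: number) {
    return {
      ratePerSec: this.requests / windowSeconds,
      errorsPerSec: this.errors / windowSeconds,
      p50: this.percentile(50),
      p95: this.percentile(95),
      p99: this.percentile(99),
    };
  }
}
```

In a request middleware, `record` would be called once per response with the measured duration and a flag derived from the status code.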
Structured Logging Format
```json
{
  "timestamp": "2025-01-15T10:30:00.123Z",
  "level": "info",
  "message": "User created",
  "service": "user-service",
  "traceId": "abc123",
  "spanId": "def456",
  "userId": "usr_123",
  "duration": 45
}
```
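A log line in this format can be produced by a thin wrapper around `JSON.stringify`. The sketch below assumes hypothetical helpers `formatLog` and `createLogger`; in practice a library such as pino or winston would usually fill this role.

```typescript
type LogLevel = "debug" | "info" | "warn" | "error";

interface LogContext {
  service: string;
  traceId?: string;
  spanId?: string;
}

// Serialize one log entry as a single JSON line; extra fields are merged in last.
function formatLog(
  level: LogLevel,
  message: string,
  context: LogContext,
  fields: Record<string, unknown> = {},
  now: Date = new Date(),
): string {
  return JSON.stringify({
    timestamp: now.toISOString(),
    level,
    message,
    ...context,
    ...fields,
  });
}

// A logger bound to a fixed context, so call sites only pass what varies.
function createLogger(context: LogContext, write: (line: string) => void = console.log) {
  return {
    info: (message: string, fields?: Record<string, unknown>) =>
      write(formatLog("info", message, context, fields)),
    error: (message: string, fields?: Record<string, unknown>) =>
      write(formatLog("error", message, context, fields)),
  };
}
```

Binding `traceId` and `spanId` into the logger context per request is what lets log lines be joined with the corresponding trace later.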
Rate Limiting Algorithm Decision Table
| Algorithm | Pros | Cons | Best For |
|---|---|---|---|
| Fixed Window | Simple, low memory | Burst at boundaries | Internal APIs |
| Sliding Window | Smooth distribution | More memory | Public APIs |
| Token Bucket | Controlled bursts | Slightly more complex | General-purpose default; APIs that should tolerate short bursts |
| Leaky Bucket | Constant output | No burst allowed | Strict rate control |
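The Token Bucket row can be sketched in a few lines: tokens accumulate at a fixed rate up to a capacity, and each request spends one. This is a single-process illustration with a hypothetical `TokenBucket` class and an injectable clock; a distributed limiter would keep the same state in a shared store.

```typescript
class TokenBucket {
  private tokens: number;
  private lastRefillMs: number;

  constructor(
    private readonly capacity: number,
    private readonly refillPerSecond: number,
    private readonly now: () => number = Date.now, // injectable clock for testing
  ) {
    this.tokens = capacity;
    this.lastRefillMs = now();
  }

  // Add the tokens earned since the last call, capped at the bucket capacity.
  private refill(): void {
    const nowMs = this.now();
    const elapsedSeconds = (nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.lastRefillMs = nowMs;
  }

  // Consume one token if available; a false result would map to HTTP 429.
  tryConsume(): boolean {
    this.refill();
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}
```

The capacity sets the largest burst a client can send at once, while the refill rate sets the sustained request rate.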
STOP — Hardening complete when:
Event-Driven Architecture Patterns
Message Queue Pattern Decision Table
| Pattern | Use Case | Example |
|---|---|---|
| Pub/Sub | Broadcast to multiple consumers | User registered -> email, analytics, CRM |
| Work Queue | Distribute tasks across workers | Image processing, PDF generation |
| Request/Reply | Async request with response | Price calculation service |
| Dead Letter | Handle failed messages | Retry policy exceeded |
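The Work Queue and Dead Letter rows combine naturally: a failed message is retried until a limit is reached, then set aside for inspection. The sketch below is an in-memory simplification with a hypothetical `WorkQueue` class and a synchronous handler; a real broker would add delays or backoff between attempts.

```typescript
interface QueueMessage<T> {
  id: string;
  payload: T;
  attempts: number;
}

class WorkQueue<T> {
  private readonly pending: QueueMessage<T>[] = [];
  readonly deadLetters: QueueMessage<T>[] = [];

  constructor(private readonly maxAttempts: number) {}

  enqueue(id: string, payload: T): void {
    this.pending.push({ id, payload, attempts: 0 });
  }

  // Process every pending message; failures are retried until maxAttempts is reached.
  drain(handler: (payload: T) => void): void {
    let message = this.pending.shift();
    while (message !== undefined) {
      message.attempts += 1;
      try {
        handler(message.payload);
      } catch {
        if (message.attempts >= this.maxAttempts) {
          this.deadLetters.push(message); // retry policy exceeded
        } else {
          this.pending.push(message); // requeue for another attempt
        }
      }
      message = this.pending.shift();
    }
  }
}
```

Keeping dead-lettered messages rather than dropping them preserves the payload and attempt count for later debugging or manual replay.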
Event Schema
```json
{
  "eventId": "evt_abc123",
  "eventType": "user.created",
  "timestamp": "2025-01-15T10:30:00Z",
  "version": "1.0",
  "source": "user-service",
  "data": { "userId": "usr_123", "email": "alice@example.com" },
  "metadata": { "correlationId": "corr_xyz789", "causationId": "cmd_def456" }
}
```
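A typed envelope helps keep every emitted event consistent with this schema. The following sketch assumes a hypothetical `createEventFactory` helper; the ID generator and clock are supplied by the caller, so nothing here prescribes how `eventId` values are produced.

```typescript
interface EventEnvelope<T> {
  eventId: string;
  eventType: string;
  timestamp: string;
  version: string;
  source: string;
  data: T;
  metadata: { correlationId: string; causationId?: string };
}

interface EnvelopeOptions {
  source: string;
  version: string;
  newId: () => string; // hypothetical ID generator supplied by the caller
  now?: () => Date;
}

// Returns a factory that stamps every event from one service with consistent fields.
function createEventFactory(options: EnvelopeOptions) {
  const now = options.now ?? (() => new Date());
  return function makeEvent<T>(
    eventType: string,
    data: T,
    metadata: { correlationId: string; causationId?: string },
  ): EventEnvelope<T> {
    return {
      eventId: options.newId(),
      eventType,
      timestamp: now().toISOString(),
      version: options.version,
      source: options.source,
      data,
      metadata,
    };
  };
}
```

Passing the incoming `correlationId` through unchanged, and setting `causationId` to the ID of the message that triggered the event, is what allows a chain of events to be reconstructed afterwards.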
GraphQL Anti-Patterns
| Anti-Pattern | Problem | Fix |
|---|---|---|
| N+1 queries | Performance degradation | DataLoader for batching |
| Unbounded queries | DoS vulnerability | Enforce depth and complexity limits |
| Over-fetching in resolvers | Wasted DB queries | Select only requested fields |
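The batching fix for N+1 queries works by collecting the keys requested during one tick and fetching them in a single call. The sketch below is a hand-rolled illustration of that idea, not the API of the `dataloader` package; `BatchLoader` and `BatchFn` are hypothetical names.

```typescript
type BatchFn<K, V> = (keys: K[]) => Promise<Map<K, V>>;

interface PendingLoad<K, V> {
  key: K;
  resolve: (value: V | undefined) => void;
  reject: (err: unknown) => void;
}

class BatchLoader<K, V> {
  private queue: PendingLoad<K, V>[] = [];

  constructor(private readonly batchFn: BatchFn<K, V>) {}

  // Queue the key; all keys requested before the next microtask are fetched in one batch.
  load(key: K): Promise<V | undefined> {
    return new Promise<V | undefined>((resolve, reject) => {
      this.queue.push({ key, resolve, reject });
      if (this.queue.length === 1) {
        void Promise.resolve().then(() => this.flush());
      }
    });
  }

  // Fetch the distinct keys once, then settle every waiting caller.
  private async flush(): Promise<void> {
    const batch = this.queue;
    this.queue = [];
    const uniqueKeys = Array.from(new Set(batch.map((item) => item.key)));
    try {
      const values = await this.batchFn(uniqueKeys);
      for (const item of batch) item.resolve(values.get(item.key));
    } catch (err) {
      for (const item of batch) item.reject(err);
    }
  }
}
```

In a GraphQL server, one loader instance would typically be created per request, so that resolvers for sibling fields share a batch without leaking cached data between users.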
Anti-Patterns / Common Mistakes
| Anti-Pattern | Why It Is Wrong | Correct Approach |
|---|---|---|
| Exposing sequential database IDs directly | Enumerable IDs invite scraping and couple clients to the DB | Use UUIDs or prefixed IDs |
| Synchronous external service calls in request path | Single point of failure, latency | Async with queues or circuit breaker |
| N+1 query patterns | Linear performance degradation | Eager loading or DataLoader |
| Catching and swallowing errors | Silent failures, impossible debugging | Log and propagate with context |
| Shared mutable state across handlers | Race conditions, unpredictable behavior | Stateless request handling |
| Skipping input validation | Injection, data corruption | Validate at the edge, always |
| Generic 500 for all errors | Poor developer experience | Specific error codes and messages |
| No API versioning | Breaking changes affect all consumers | Version from day one |
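The table recommends a circuit breaker for external service calls in the request path. A minimal sketch of the closed, open, and half-open states follows; the `CircuitBreaker` class, its thresholds, and the injectable clock are assumptions for illustration, and production code would more likely rely on an established resilience library.

```typescript
type BreakerState = "closed" | "open" | "half-open";

class CircuitBreaker {
  private state: BreakerState = "closed";
  private failures = 0;
  private openedAtMs = 0;

  constructor(
    private readonly failureThreshold: number,
    private readonly resetTimeoutMs: number,
    private readonly now: () => number = Date.now, // injectable clock for testing
  ) {}

  getState(): BreakerState {
    return this.state;
  }

  // Run the call unless the circuit is open; track outcomes to move between states.
  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === "open") {
      if (this.now() - this.openedAtMs < this.resetTimeoutMs) {
        throw new Error("circuit open: call rejected without reaching the dependency");
      }
      this.state = "half-open"; // timeout elapsed: let the next call through as a trial
    }
    try {
      const result = await fn();
      this.state = "closed";
      this.failures = 0;
      return result;
    } catch (err) {
      this.failures += 1;
      if (this.state === "half-open" || this.failures >= this.failureThreshold) {
        this.state = "open";
        this.openedAtMs = this.now();
      }
      throw err;
    }
  }
}
```

While the circuit is open, callers fail fast instead of waiting on a dependency that is already struggling, which bounds request latency and gives the dependency room to recover.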
Documentation Lookup (Context7)
Use `mcp__context7__resolve-library-id`, then `mcp__context7__query-docs`, for up-to-date docs. Returned docs override memorized knowledge.
- — for middleware patterns, routing, or request/response API
- — for plugin system, hooks, or schema validation
- — for decorators, modules, providers, or guards
- — for schema syntax, client API, or migration commands
Integration Points
| Skill | Relationship |
|---|---|
| | Architecture decisions guide backend service boundaries |
| | Backend security follows OWASP and auth patterns |
| | Backend performance uses caching and query tuning |
| | Backend test strategy defines integration test approach |
| | Review verifies API design and error handling |
| | API behavior becomes acceptance criteria |
| | Backend serves the full-stack tRPC layer |
Key Principles
- API versioning from day one
- Input validation at the edge (Zod, Joi, class-validator)
- Idempotency keys for non-idempotent writes (POST, and PATCH where it is not naturally idempotent)
- Graceful shutdown (drain connections, finish in-flight requests)
- Circuit breaker for external service calls
- Database migrations versioned and reversible
- Secrets in environment variables, never in code
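The idempotency-key principle above can be sketched as a store that runs a handler at most once per key and replays the stored response for repeats. This is an in-memory illustration with a hypothetical `IdempotencyStore` class; a production version would persist entries, expire them, and usually verify that a repeated key carries the same request payload.

```typescript
interface StoredResponse {
  status: number;
  body: unknown;
}

class IdempotencyStore {
  private readonly results = new Map<string, Promise<StoredResponse>>();

  // Run the handler once per key; repeats (including concurrent ones) reuse the first result.
  execute(key: string, handler: () => Promise<StoredResponse>): Promise<StoredResponse> {
    const existing = this.results.get(key);
    if (existing) return existing;

    const pending = handler().catch((err: unknown) => {
      this.results.delete(key); // a failed attempt may be retried with the same key
      throw err;
    });
    this.results.set(key, pending);
    return pending;
  }
}
```

Storing the in-flight promise rather than the finished response means that two concurrent retries with the same key share one execution instead of racing each other.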
Skill Type
FLEXIBLE — Adapt API style and architecture to the project context. The three-phase process (design, implement, harden) is strongly recommended. Health checks, structured logging, and error handling are non-negotiable for production services.