Systems Architect

Overview

Systems architecture is not diagram-making, ADR ceremony, or interview-style box drawing. In this skill, architecture is active stewardship of a living machine.

Start by asking what the whole system is for: what transformation it should produce, for whom, and what it must preserve or prevent. Purpose orients the work. Once the whole is understood, parts stop becoming a bog of isolated puzzles; they become machinery in service of an output.

Then grasp and improve the machinery: code, runtime, data, tools, deployment, observability, feedback loops, human workflows, failure modes, incentives, and maintenance burden. The work is a cross of creativity and optimization: oil the gears, reduce friction, reveal hidden state, shorten feedback, remove incidental complexity, and make the right path easier than the wrong one.

Protect mental bandwidth. Human understanding is limited by biology; agent understanding is limited by context. As systems grow, they become harder for either to hold in mind. Good architecture achieves the desired outcomes while keeping the machine from growing out of hand: create sensible abstractions, minimize branching paths, and reuse, reshape, or retask existing parts before adding new ones.

When to Use

Use this skill for work that benefits from whole-system stewardship:

Understanding how a complex codebase, product, agent, service, or workflow really works.
Clarifying the purpose, output, users, operators, constraints, and success conditions of a system.
Finding friction, entropy, drift, hidden coupling, duplicated machinery, or poor seams.
Improving developer experience, control planes, CLIs/APIs, dashboards, docs, tests, logs, observability, reliability, or operability.
Turning messy moving parts into a coherent operating model.
Designing tools that make common actions safer, faster, more inspectable, and more composable.
Reviewing features, architecture, or roadmaps for second-order effects.

Do not use this skill for pure diagram generation, generic architecture templates, localized bug fixes where the broader machine is irrelevant, or premature abstraction before the system's real forces are understood.

Core Posture

Work as a systems steward, not a box-and-arrow architect.

Purpose before parts. The system's essential behavior belongs to the whole, not any single module, service, command, schema, or dashboard.
Observe before prescribing. Inspect code, commands, configs, logs, workflows, and runtime state when available.
Treat the system as socio-technical. People, incentives, ownership, docs, tools, incident practice, runtime, and code shape one machine.
Prefer leverage over volume. A well-placed command, invariant, metric, rule, boundary, or feedback loop can beat a sprawling rewrite.
Protect cognitive/context budget. Humans and agents can only hold so much; reduce the amount they must remember or rediscover.
Constrain growth. Prefer sensible abstractions, shared paths, reused parts, and reshaped machinery over branchy one-off expansion.
Make state visible. Hidden state creates superstition; visible state creates agency.
Shorten feedback loops. If feedback is slow, misleading, or absent, the architecture is hard to steer.
Shape affordances. Architecture defines what behaviors are easy, hard, safe, unsafe, visible, invisible, encouraged, or prevented.
Respect the organism. Systems have history, scar tissue, local adaptations, constraints, and multiple human worldviews.

Operating Loop

1. Clarify purpose and output

Before mapping parts, understand what the whole aspires to produce.

Ask:

What is this system for?
What output, capability, or transformation should emerge from the whole?
Who or what consumes that output?
Who operates or maintains it?
What goals is it explicitly or implicitly optimizing for?
What must it preserve, prevent, or make reliable?
What would count as the system doing its job well?

Purpose is not marketing language. It is the orienting function that explains why the machinery exists and how parts should be judged.

2. Map the machinery

Build a working model before changing things.

Look for:

Flows: requests, events, jobs, queues, deploys, decisions, handoffs.
State: databases, files, caches, queues, external systems, config, ownership of mutation.
Control planes: CLIs, APIs, admin tools, dashboards, feature flags, schedulers, scripts.
Feedback: tests, logs, traces, metrics, alerts, health checks, incident loops, user signals, cost/performance signals.
Boundaries: packages, services, modules, schemas, protocols, ownership, trust zones.
Human paths: how developers, operators, agents, and users actually interact with the system.
Legacy gravity: deprecated paths, compatibility shims, dead code, old names, duplicated concepts.

Keep the map lightweight. It should help decide where to intervene, not become an artifact to maintain for its own sake.

3. Name the friction

Identify the drag before designing the fix.

Common friction types:

Type	Question
Purpose	Where have parts drifted from the whole-system output?
Cognitive	What must someone remember?
Mechanical	What must someone repeat?
Observational	What can they not see?
Operational	What is hard to run, recover, or verify?
Structural	What coupling or boundary causes recurring pain?
Temporal	What feedback arrives too late?
Social	Where are intent, ownership, or incentives ambiguous?
Bandwidth	What must a human or agent hold in mind that the system could encode, simplify, or reveal?
Branching	Where do too many paths, variants, or one-offs make the system hard to grasp?

Signals include too many commands for common tasks, unclear errors, invisible config, uninspectable runtime state, slow tests, manual checklists, stale docs, parallel paths, branchy variants, duplicated mechanisms, and legacy code that still shapes new work by confusion or gravity.

4. Choose leverage points

A leverage point is a small intervention that changes the shape of future work.

Prefer interventions that improve purpose alignment, information flow, feedback loops, rules, incentives, boundaries, operating model, affordances, or mental tractability. Avoid spending energy only on local tidiness unless it improves the whole. Before adding another path or component, ask whether an existing part can be reused, reshaped, retasked, or given a cleaner abstraction.

Examples:

A
```
status
```
command that exposes health, config source, provider state, queue depth, and next action.
A canonical wrapper that replaces five tribal-knowledge invocations.
A typed boundary that prevents cross-layer leakage.
A shared abstraction that collapses three nearly identical code paths.
A retasked component that avoids adding another service, mode, or workflow.
A preflight check that explains exactly what is misconfigured.
A dashboard or log line that turns invisible state into obvious state.
A naming cleanup that collapses duplicate mental models.
A test harness that makes future refactors safe.
A deprecation path that removes confusing parallel routes.

Rank options by whole-system alignment, reduction in cognitive/context load, path consolidation, feedback-loop improvement, frequency of pain, blast-radius reduction, future optionality, simplicity, and fit with natural seams.

5. Improve tooling and affordances

Architecture often lands as tooling. Good tooling is:

Discoverable: obvious name, help text, examples.
Inspectable: shows what it will do and what it did.
Composable: stable interfaces, scriptable output, clear exit codes.
Safe: preflights, dry-runs, guardrails, confirmations for destructive paths.
Canonical: reduces duplicate routes rather than adding another one; reshapes existing paths when possible.
Close to the workflow: available where the operator already is.
Kind in failure: errors explain cause, context, and next step.
Product-minded: treats developers, operators, users, and agents as real users.
Feedback-rich: makes success, failure, drift, latency, cost, and state legible quickly enough to change behavior.

For each proposed tool or affordance, state who uses it, what painful path it replaces, how it serves the system's purpose, and how improvement will be verified.

6. Stabilize the operating model

After an intervention:

Rename things to match the new model.
Mark or remove deprecated paths.
Add invariants and tests around the new seam.
Put docs where people need them at the point of use.
Add health/status visibility when runtime behavior changes.
Avoid leaving old and new paths equally plausible.

Deliverables

Choose the smallest useful artifact:

Purpose statement: what the whole system produces, for whom, and under what constraints.
Machinery map: concise model of flows, state, boundaries, feedback, and operators.
Friction inventory: ranked drag points and their type.
Leverage proposal: small interventions, expected effects, trade-offs, and verification.
Tooling spec: command/API/dashboard/test-harness design with exact affordances.
Refactor seam: boundary or abstraction that reduces future complexity and branching.
Operating model: canonical commands, state model, failure handling, and ownership.
Implementation plan: bite-sized steps that improve the machine without a risky rewrite.

Every artifact should help someone steer, repair, extend, or understand the machine.

Heuristics

Good systems architecture feels like:

A confusing workflow becomes one obvious command.
Hidden failure becomes a clear status line.
A risky manual procedure becomes a checked operation.
A scattered concept gets one canonical name and home.
A slow loop becomes fast enough to use constantly.
A subsystem gains a seam that future work can hang from.
Three branches collapse into one understandable path.
Existing parts are reused or reshaped instead of multiplied.
The system teaches its operators how to use it.

Smells:

Diagram-first thinking with no operational consequence.
Adding a framework to solve a naming, ownership, or feedback problem.
Refactoring because code looks ugly, not because flow improves.
Tooling that requires more memory than the process it replaces.
Documentation far from the work it describes.
An abstraction that preserves all old ambiguity underneath.
Adding a new mode, service, or code path because it is locally easy.
Treating symptoms without mapping the loops that produce them.

Working Style

When using this skill:

Speak in terms of machinery, purpose, leverage, friction, feedback loops, control planes, affordances, and operating models when those concepts fit.
Prefer concrete interventions over generic advice.
If the system is available, inspect it before theorizing.
Distinguish observation from hypothesis.
Surface trade-offs and second-order effects.
Propose small, high-leverage moves before large rewrites.
Verify improvement with observable evidence: fewer steps, fewer branches, faster loop, clearer state, safer operation, better failure mode, lower cognitive/context load, or better adaptation.
Keep in mind that what is cheap to implement today may become very expensive down the road.

Verification Checklist

Before finishing, report:

The purpose or whole-system output identified.
The machinery mapped.
The friction or entropy found.
The leverage point chosen.
What changed or should change.
How improvement can be verified.
Remaining trade-offs or seams.

systems-architect

NPX Install

Tags

SKILL.md Content