architecture-design

Architecture decision document

The real work is structured thinking about a technical decision — framing the problem, weighing the options against what actually matters, and committing with reasons. The written artifact is the trace that thinking leaves behind, so the next person — a new hire, a reviewer, your future self — can follow not just what was chosen but why, and could have reached the same conclusion from the same evidence. Keep that order of priority: a document that reads well but skips the thinking is worthless; treat the artifact as a consequence of the reasoning, never as the goal, or it curdles into bureaucracy. Code shows what was built; this record shows why it was built that way and what was rejected.
This is deliberately a pragmatic framework, not a fixed format. It blends the two industry documents people usually keep apart: an RFC (a proposal floated before building, to weigh options and invite comment) and an ADR (a terse record kept after, capturing the decision and its consequences). Treat the distinction as a spectrum, not a fork — most real documents sit somewhere in between: a proposal that, once accepted, becomes the record. So don't agonize over "is this an RFC or an ADR?" Pick the depth the decision warrants and the sections that carry the reasoning; the goal is a useful artifact, not compliance with a template. Lean on the method below for what to think about, and let the situation set how heavy each part should be.
Two things make these documents hard, and the method exists to counter both. First, it's tempting to follow hype or personal taste; decisions made without objective criteria cost the whole team later. Second, we tend to make things more complicated than they need to be — anyone can complicate, few can simplify, and simplifying is the real work. So: tie every choice back to a stated requirement, and cut anything that isn't pulling its weight.

Sharpen the axe first

"If I had eight hours to chop down a tree, I'd spend six sharpening the axe." — attributed to Abraham Lincoln.
Architecture is the highest-leverage, hardest-to-reverse work in software. Get it wrong and no amount of clean code downstream saves the project — the wrong foundation sinks everything built on it. So this document always deserves your highest effort and slowest thinking. There is no "quick mode" here, and a fast, thin pass is itself a failure. Thinking time is not the bottleneck — a rushed plausible answer that's subtly wrong costs far more than the hours spent getting it right.
Spend the bulk of your effort before the conclusion — sharpening: genuinely understanding the context, pinning down the requirements that actually constrain the choice, and exploring the alternatives in earnest. The written document is the chips that fly; the real work is the cut you make in your head first. Concretely:
  • Don't commit to the first design that seems to work. Generate real alternatives — at least two or three credible ones per dimension being decided — before you start narrowing. If you can only think of one option, you haven't looked hard enough yet.
  • Grill your own recommendation. For the option you favor, write down its strongest objection, not a strawman — and the specific conditions that would flip the decision the other way. A tradeoff table where every cell favors your pick is a warning sign, not a victory: you've stopped looking.
  • Steelman what you reject. State each rejected alternative at its best, so a reader who prefers it sees you understood it and still had reasons. That's what makes the decision trustworthy.
  • Be precise; ambiguity is where bad decisions hide. Concrete numbers (load, latency budgets, cost), named components, grounded claims with a link or a measurement. "Should be fast" hides a decision; "≤ 300ms at p95 under 2× peak, validated by load test" makes one.
  • Surface uncertainty honestly. Where you're guessing, say so, and say what evidence (a spike, a POC, a benchmark) would resolve it — then recommend running it. A POC to de-risk an irreversible choice is the axe-sharpening, not a delay.
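A checkable target such as "≤ 300ms at p95 under 2× peak" can be verified mechanically. The following is a minimal sketch, assuming latency samples in milliseconds collected from a load test; the function names, the nearest-rank percentile method, and the sample values are illustrative assumptions, not a prescribed tool.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def meets_latency_budget(samples_ms, budget_ms=300, pct=95):
    """True when the pct-th percentile latency is within the budget."""
    return percentile(samples_ms, pct) <= budget_ms


# Hypothetical samples (ms) from a load test run at 2x peak traffic.
samples_ms = [120, 140, 150, 160, 180, 200, 210, 230, 250, 280,
              130, 145, 155, 165, 185, 205, 215, 235, 255, 410]

print(percentile(samples_ms, 95))             # p95 of the samples
print(meets_latency_budget(samples_ms))       # checked against the 300ms budget
```

Note that the single slow sample does not break a p95 target but would break a p99 one, which is exactly the kind of detail "should be fast" hides.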
The sections below are what to think about, not a checklist to fill quickly. Slow down on the parts that carry the most risk — usually the requirements and the tradeoff analysis.
Write in the language of the request and its source material. Whatever language the task, the codebase, and the existing docs are in, write the document in that same language — match what its readers will expect.

Two shapes, one method

The RFC↔ADR spectrum shows up as two practical shapes. Same method underneath; the framing and the emphasis shift depending on when you're writing.
  • Forward-looking (RFC / design doc): you're choosing before building, to weigh options and invite comment. This is the full method below; the heart is the tradeoff analysis and the recorded decision. See references/example-rfc.md.
  • Retrospective (ADR / technical documentation): you're recording something already decided or built. Same spirit, reshaped: Context (situation before → motivations → scope), Architecture (components + step-by-step flows), Risks & mitigations, Lessons learned, Improvement points, and a version history table. See references/example-technical-doc.md.
Pick the shape from what the user is doing — deciding, or recording a decision already made. When unsure, ask. The sections below describe the forward-looking method; the retrospective variant reuses the same building blocks (context, design/architecture, risks) with a backward-looking framing.
A fill-in template for the forward-looking shape lives in assets/template.md. Start from it rather than inventing structure, but treat its sections as a checklist, not a cage: drop what doesn't apply, add what the decision needs.

The method

Work the steps in order, and finish each before starting the next — don't jump the gun. The sequence is the whole point, not ceremony. The single most common way these efforts fail is rushing to a solution before the problem is understood — a team arguing Lambda vs. Kubernetes before anyone has written down what the system must actually do. When you feel the pull to name a technology, a component, or an alternative while you're still in Context or Requirements, that pull is the warning sign: note the idea so you don't lose it, then get back to the problem. A design built on a shaky requirement is wasted work, and a tradeoff table over options nobody tied to a requirement is just opinion dressed up as analysis.
So the earlier steps gate the later ones: don't open the Design until Context and Requirements are genuinely settled, and don't run the Tradeoff analysis until the Design is on the table. Hold the problem in focus until it's truly understood; the solution discussion has to wait its turn.
The one moment to step back and read the whole document end to end is after the Tradeoff analysis reaches its conclusion: check that the requirements still hold, that every component traces to one, and that the decision follows from the analysis. That review is the payoff of the discipline, earned by working up to it; it is not permission to skip ahead.
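The gating rule (earlier steps must be settled before later ones open) can be expressed as a small check. This is only a sketch: the step names mirror the section titles of this guide, and the functions `may_start` and `next_open_step` are hypothetical helpers, not part of any existing tool.

```python
# Step names follow this guide's sections; the helpers are illustrative.
STEPS = (
    "Contextualize",
    "Requirements",
    "Design",
    "Tradeoff analysis",
    "Decision",
    "Conclude and communicate",
)


def may_start(step, settled):
    """A step may start only when every earlier step is settled."""
    earlier = STEPS[:STEPS.index(step)]
    return all(s in settled for s in earlier)


def next_open_step(settled):
    """Return the first step that is not settled yet, or None when all are done."""
    for step in STEPS:
        if step not in settled:
            return step
    return None


settled = {"Contextualize"}
print(may_start("Design", settled))   # Requirements is not settled yet
print(next_open_step(settled))        # the step to work on now
```

An idea for a technology that surfaces too early is not lost under this rule; it is simply parked until `may_start("Design", settled)` holds.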

1 — Contextualize

Focus on the problem, not the document. The point of this section is to make a reader understand the situation, so frame it around what's happening in the world — not around "this document describes…".
Tell the story so a newcomer follows it without prior knowledge: things were this way → then this happened → and because of that, we now need to decide X. Nothing is "obvious" — the obvious is exactly what a newcomer is missing, so say it. By the end the reader should be able to answer the two questions that matter most: what problem are we solving, and why does it matter now? Then state what is explicitly out of scope — naming non-goals keeps the work from sprawling.

2 — Requirements

Requirements are what is non-negotiable. The hard part — and where most efforts lose focus — is separating the architecturally-relevant requirements from the long tail of feature details that don't shape the structure. Be explicit about the cut, because everything downstream is judged against this list. A requirement is architecturally relevant when it meets at least one of these tests:
  • Hard to reverse — getting it wrong is expensive or near-impossible to undo later (data model, consistency model, a public contract, a security boundary).
  • Shapes the structure — it forces a component, a boundary, or an integration to exist; drop it and the design would look genuinely different.
  • Business-critical — the system fails its purpose if this isn't met ("we cannot lose an order").
  • Cross-cutting quality — a system-wide "-ility" with a real target: latency, throughput, availability, durability, security, cost, operability.
If a requirement passes none of these, it's a feature detail — capture it elsewhere; it doesn't belong in the analysis that drives the architecture. Keep the list short: each entry has to earn its place.
Split them into functional (what the system must do) and non-functional (how well — latency budgets, observability, test coverage, security, standardization). State them concretely enough to be checkable: "search response ≤ 3000ms at p95, validated under load" beats "search should be fast."
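The four relevance tests and the functional / non-functional split can be captured as data, which makes the cut explicit and reviewable. A minimal sketch, assuming each candidate requirement is tagged by hand; the `Requirement` class, its field names, and the sample entries are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    """One candidate requirement plus the four relevance tests (field names are illustrative)."""
    name: str
    kind: str  # "functional" or "non-functional"
    hard_to_reverse: bool = False
    shapes_structure: bool = False
    business_critical: bool = False
    cross_cutting_quality: bool = False

    def is_architecturally_relevant(self) -> bool:
        # Passing at least one of the four tests is enough.
        return any((
            self.hard_to_reverse,
            self.shapes_structure,
            self.business_critical,
            self.cross_cutting_quality,
        ))


def triage(candidates):
    """Split candidates into (architecturally relevant, feature details)."""
    relevant = [r for r in candidates if r.is_architecturally_relevant()]
    details = [r for r in candidates if not r.is_architecturally_relevant()]
    return relevant, details


# Hypothetical candidates, for illustration only.
candidates = [
    Requirement("We cannot lose an order", "functional", business_critical=True),
    Requirement("Search response <= 3000ms at p95 under load", "non-functional",
                cross_cutting_quality=True),
    Requirement("Export button label is configurable", "functional"),
]

relevant, details = triage(candidates)
print([r.name for r in relevant])
print([r.name for r in details])
```

Anything that lands in `details` is captured elsewhere and stays out of the analysis that drives the architecture.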

3 — Design

Now solve the requirements with technology — and hold onto that word, solve. Good architecture is the architecture that meets the requirements, nothing more mystical than that; elegance that doesn't serve a requirement isn't good design, it's decoration. This is the exact point where many lose the thread, so make the link explicit and keep traceability in both directions:
  • Every component exists because it addresses a requirement. If you can't name the requirement a component serves, it's scope creep — cut it or justify it.
  • Every requirement is met by something in the design. If a requirement maps to no component, the design is incomplete; that gap is the first thing to fix.
So specify the components and, for each, name the requirement(s) it answers. The test of a good design stays simple: does it meet the requirements? Don't get lost solving dilemmas nobody asked for.
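Traceability in both directions is easy to check once each component lists the requirements it answers. The sketch below assumes requirements are identified by short ids such as "R1"; the `trace_gaps` function, the ids, and the component names are hypothetical.

```python
def trace_gaps(components, requirements):
    """Check traceability in both directions.

    components: dict mapping component name -> set of requirement ids it answers.
    requirements: iterable of requirement ids.
    Returns (scope_creep, unmet): components serving no known requirement,
    and requirements that no component answers.
    """
    required = set(requirements)
    scope_creep = sorted(
        name for name, serves in components.items()
        if not (set(serves) & required)
    )
    covered = set().union(*components.values())
    unmet = sorted(required - covered)
    return scope_creep, unmet


# Hypothetical design, for illustration only.
requirements = ["R1", "R2", "R3"]
components = {
    "order-queue": {"R1"},
    "search-index": {"R2"},
    "recommendation-engine": set(),  # serves no stated requirement
}

scope_creep, unmet = trace_gaps(components, requirements)
print(scope_creep)  # components to cut or justify
print(unmet)        # gaps to fix first
```

A clean design under this check returns two empty lists; anything else is either scope creep or an incomplete design.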
Include at minimum:
  • one static diagram — the components and how they fit together;
  • one dynamic diagram — a flow or sequence showing how they interact over time.
If you can't render diagrams, describe them precisely (a numbered step-by-step flow, a component list with responsibilities and arrows) and leave a clear placeholder for the real diagram. This section takes refinement and keeps everyone aligned on the direction being taken; it's normal to iterate here.

4 — Tradeoff analysis

This is where the document earns its keep, and where the bulk of your effort belongs (see Sharpen the axe first). There is no silver bullet and no one-size-fits-all — every alternative has upsides, downsides, and risks, and all of them get analyzed and recorded with real depth. Surfacing a downside isn't weakening your case; it's what makes the eventual decision trustworthy. Push past the first pass: if the analysis came easily, you probably haven't found the alternative's real failure modes yet.
For each alternative, capture:
  • Pros — what the approach genuinely brings in its favor.
  • Cons — what it genuinely brings against it.
  • Risks — negative impacts that might happen and must be managed. Managing means dealing with uncertainty, so each risk gets four attributes:
    • Impact if it occurs — low / medium / high
    • Probability of occurring — low / medium / high
    • Mitigation — actions to stop the risk from happening
    • Contingency — how you'd act if it happens anyway
A table keeps this scannable and forces the discipline of filling every cell. Group alternatives by the dimension being decided (data store, provisioning, language, …) so related options sit side by side. The exact column layout is shown in references/example-rfc.md; reuse it.
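The four risk attributes can be enforced with a small record type, so that no cell is left empty. This is a sketch under assumptions: the `Risk` class and `risk_row` helper are hypothetical, and the column order used here is only illustrative, not the canonical layout from references/example-rfc.md.

```python
from dataclasses import dataclass

LEVELS = ("low", "medium", "high")


@dataclass(frozen=True)
class Risk:
    """A risk with the four attributes described above (class name is illustrative)."""
    description: str
    impact: str        # low / medium / high
    probability: str   # low / medium / high
    mitigation: str    # actions to stop the risk from happening
    contingency: str   # how to act if it happens anyway

    def __post_init__(self):
        for field_name in ("impact", "probability"):
            value = getattr(self, field_name)
            if value not in LEVELS:
                raise ValueError(f"{field_name} must be one of {LEVELS}, got {value!r}")
        for field_name in ("description", "mitigation", "contingency"):
            if not getattr(self, field_name).strip():
                raise ValueError(f"{field_name} must not be empty")


def risk_row(risk):
    """Render one table row; the column order is an assumption, not the canonical layout."""
    return " | ".join((risk.description, risk.impact, risk.probability,
                       risk.mitigation, risk.contingency))


# Hypothetical risk, for illustration only.
risk = Risk(
    description="Index rebuild exceeds the maintenance window",
    impact="high",
    probability="medium",
    mitigation="Rebuild incrementally on a replica",
    contingency="Serve from the previous index snapshot",
)
print(risk_row(risk))
```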
Crucially, weigh each alternative against the requirements from section 2, one by one. An option that wins on elegance but misses a hard requirement doesn't win.
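Weighing each alternative against the requirements one by one can be sketched as a set comparison. The example assumes requirements carry ids and that each alternative records which ids it meets; the function names, option names, and ids are hypothetical.

```python
def missed_requirements(alternatives, hard_requirements):
    """For each alternative, list the hard requirements it fails to meet.

    alternatives: dict mapping alternative name -> set of requirement ids it meets.
    """
    required = set(hard_requirements)
    return {
        name: sorted(required - set(met))
        for name, met in alternatives.items()
    }


def viable(alternatives, hard_requirements):
    """Alternatives that meet every hard requirement, in their original order."""
    missed = missed_requirements(alternatives, hard_requirements)
    return [name for name in alternatives if not missed[name]]


# Hypothetical options for one dimension (data store), for illustration only.
hard_requirements = ["R1", "R2"]
alternatives = {
    "option-a": {"R1", "R2"},
    "option-b": {"R1"},          # elegant, but misses R2
}

print(missed_requirements(alternatives, hard_requirements))
print(viable(alternatives, hard_requirements))
```

Only the options that survive this filter are worth comparing on pros, cons, and risks; an option that misses a hard requirement is out regardless of its other merits.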

5 — The decision itself

This is the moment the discipline has earned: now read the whole document end to end (see The method). Check that the requirements are still right, that every component traces to one, and that the analysis genuinely supports where it points, then revise what that pass exposes before you commit.
Then a decision has to be made and stated plainly — which alternative, and the reasoning that carried it. Name the decision style so the basis is on the record:
  • Autocratic — if it's your call to make, make it; consult others, but you own the outcome.
  • Democratic — one vote each, the majority decides. Not always applicable, but a fair tiebreaker when alternatives come out genuinely close.
Don't leave this implicit. A document that analyzes options but never commits leaves the reader exactly where they started.

6 — Conclude and communicate

Record the most relevant points of the decided architecture, and — just as importantly — make sure every stakeholder ends up on the same page about the decision and its impacts. A decision nobody hears about isn't really made. For forward-looking docs this is often a rollout/launch strategy and a task roadmap; for retrospective docs it's the lessons learned, open improvement points, and a version history table (version, date, author, change) so the document stays a living record.

Writing principles

  • Newcomer-readable. Assume the reader is meeting the project for the first time. Spell out the obvious; define the acronyms.
  • Every claim tied to a requirement or to evidence. Link to the dashboard, the benchmark, the PostHog report, the schema. Decisions backed by data outlive opinions.
  • Simplify ruthlessly. If a section, alternative, or requirement isn't earning its place, cut it.
  • Be honest about downsides and risks. The credibility of the decision rests on having genuinely considered what could go wrong.
  • Match the source's language, structure, and formatting. Mirror the headings, numbered flows, and table styles the examples use, in the reader's language.

References

  • references/example-rfc.md: a worked forward-looking RFC (search-service decision), with the full structure end to end and the canonical tradeoff-table layout.
  • references/example-technical-doc.md: a worked retrospective technical doc (social-auth implementation), covering context → architecture → flows → risks → lessons learned → version history.
  • assets/template.md: fill-in skeleton for the forward-looking shape.