OTP Thinking
Paradigm shifts for OTP design. These insights challenge typical concurrency and state management patterns.
The Iron Law
GENSERVER IS A BOTTLENECK BY DESIGN
A GenServer processes ONE message at a time. Before creating one, ask:
- Do I actually need serialized access?
- Will this become a throughput bottleneck?
- Can reads bypass the GenServer via ETS?
The ETS pattern: the GenServer owns the ETS table and serializes writes, while reads bypass it entirely by querying a table created with `:read_concurrency`.
No exceptions: don't wrap stateless functions in a GenServer. Don't create a GenServer "for organization".
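A minimal sketch of the ETS pattern, assuming a hypothetical `MyApp.Cache` module and table name `:my_app_cache`. Writes go through `GenServer.call/2` and are serialized; `get/1` runs `:ets.lookup/2` in the caller's own process, so reads never queue behind the GenServer's mailbox.

```elixir
defmodule MyApp.Cache do
  # Hypothetical cache: the GenServer owns a named ETS table and is its only writer.
  use GenServer

  @table :my_app_cache

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, opts, name: __MODULE__)

  # Writes are serialized through the GenServer.
  def put(key, value), do: GenServer.call(__MODULE__, {:put, key, value})

  # Reads run in the caller's process and never touch the GenServer.
  def get(key) do
    case :ets.lookup(@table, key) do
      [{^key, value}] -> {:ok, value}
      [] -> :error
    end
  end

  @impl true
  def init(_opts) do
    # :protected lets only the owner write while any process can read;
    # read_concurrency: true optimizes the table for many concurrent readers.
    table = :ets.new(@table, [:set, :named_table, :protected, read_concurrency: true])
    {:ok, table}
  end

  @impl true
  def handle_call({:put, key, value}, _from, table) do
    true = :ets.insert(table, {key, value})
    {:reply, :ok, table}
  end
end
```

Because the table belongs to the GenServer process, it is deleted when that process dies, so a supervisor restart begins with an empty cache.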
GenServer Patterns
| Function | Use For |
|---|---|
| `call/3` | Synchronous requests expecting replies |
| `cast/2` | Fire-and-forget messages |
When in doubt, use `call` to ensure back-pressure. Set appropriate timeouts for `call/3` (the default is 5 seconds).
Use `handle_continue/2` for post-init work; it keeps `init/1` fast and non-blocking.
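A sketch of these three patterns in one module, assuming a hypothetical `MyApp.Warmup` server whose slow setup is simulated by a single state update in `handle_continue/2`:

```elixir
defmodule MyApp.Warmup do
  # Hypothetical server: init/1 returns immediately and defers setup to handle_continue/2.
  use GenServer

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, opts)

  # Synchronous: the caller waits for a reply (or the timeout), which gives back-pressure.
  def fetch(pid, timeout \\ 5_000), do: GenServer.call(pid, :fetch, timeout)

  # Fire-and-forget: returns :ok immediately, however busy the server is.
  def bump(pid), do: GenServer.cast(pid, :bump)

  @impl true
  def init(_opts) do
    # Returning quickly means start_link/1 (and the parent supervisor) is not blocked.
    {:ok, %{loaded: nil, bumps: 0}, {:continue, :load}}
  end

  @impl true
  def handle_continue(:load, state) do
    # Stand-in for slow setup; it runs before any call or cast is processed.
    {:noreply, %{state | loaded: :ready}}
  end

  @impl true
  def handle_call(:fetch, _from, state), do: {:reply, state, state}

  @impl true
  def handle_cast(:bump, state), do: {:noreply, %{state | bumps: state.bumps + 1}}
end
```

Since `handle_continue/2` runs before the first message is handled, callers never observe the half-initialized state.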
Task.Supervisor, Not Task.async
| Pattern | On task crash |
|---|---|
| `Task.async` | Caller crashes (linked, unsupervised) |
| `Task.Supervisor.async` | Caller crashes (linked, supervised) |
| `Task.Supervisor.async_nolink` | Caller survives, can handle error |
Use Task.Supervisor for: production code, graceful shutdown, observability, and `async_nolink` when the caller must survive task crashes.
Use Task.async for: quick experiments, scripts, and cases where crashing together is acceptable.
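A minimal sketch of the `async_nolink` pattern, assuming a hypothetical `MyApp.SafeFetch.run/3` helper and a `Task.Supervisor` registered as `MyApp.TaskSupervisor` (normally started in your application's supervision tree). A crash or timeout in the task comes back as an `{:error, reason}` value instead of killing the caller:

```elixir
defmodule MyApp.SafeFetch do
  # Hypothetical helper: runs `fun` under a Task.Supervisor, monitored but not linked.
  def run(sup, fun, timeout \\ 5_000) do
    task = Task.Supervisor.async_nolink(sup, fun)

    # yield/2 gives {:ok, result}, {:exit, reason}, or nil on timeout;
    # on timeout the task is killed so it cannot leak.
    case Task.yield(task, timeout) || Task.shutdown(task, :brutal_kill) do
      {:ok, result} -> {:ok, result}
      {:exit, reason} -> {:error, reason}
      nil -> {:error, :timeout}
    end
  end
end
```

With `Task.async` in the same situation, the raised error would propagate over the link and take the caller down with it.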
DynamicSupervisor + Registry = Named Dynamic Processes
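DynamicSupervisor only supports `:one_for_one` (dynamic children have no ordering). Use Registry for names; never create atoms dynamically (atoms are never garbage-collected, and exhausting the atom table crashes the VM):
```elixir
defp via_tuple(id), do: {:via, Registry, {MyApp.Registry, id}}
```
`PartitionSupervisor` scales DynamicSupervisor for millions of children by running several DynamicSupervisor partitions (one per scheduler by default) instead of a single one.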
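A sketch that puts the `via_tuple/1` helper to work, assuming hypothetical `MyApp.Session` (an Agent per user) and `MyApp.Sessions` modules plus a `MyApp.SessionSupervisor` name; `MyApp.Registry` is the registry name used above. Session ids are arbitrary terms (strings here), so starting more sessions never creates atoms:

```elixir
defmodule MyApp.Session do
  # Hypothetical per-user process, registered by an arbitrary term instead of an atom.
  use Agent

  def start_link(id), do: Agent.start_link(fn -> %{id: id} end, name: via_tuple(id))

  def get(id), do: Agent.get(via_tuple(id), & &1)

  defp via_tuple(id), do: {:via, Registry, {MyApp.Registry, id}}
end

defmodule MyApp.Sessions do
  # Registry first: with :rest_for_one, a Registry restart also restarts the
  # DynamicSupervisor (and therefore the sessions whose names it held).
  def start_link do
    children = [
      {Registry, keys: :unique, name: MyApp.Registry},
      {DynamicSupervisor, name: MyApp.SessionSupervisor, strategy: :one_for_one}
    ]

    Supervisor.start_link(children, strategy: :rest_for_one)
  end

  # Returns {:ok, pid}, or {:error, {:already_started, pid}} if the id is already taken.
  def start_session(id),
    do: DynamicSupervisor.start_child(MyApp.SessionSupervisor, {MyApp.Session, id})
end
```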
:pg for Distributed, Registry for Local
| Tool | Scope | Use Case |
|---|---|---|
| Registry | Single node | Named dynamic processes |
| :pg | Cluster-wide | Process groups, pub/sub |
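A minimal pub/sub sketch on `:pg` (OTP 23+), assuming a hypothetical `MyApp.Rooms` module and a dedicated scope named `:my_app_rooms`. It runs on a single node here, but every node that starts the same scope sees the same group membership:

```elixir
defmodule MyApp.Rooms do
  # Hypothetical helper. A dedicated scope keeps these groups apart from other :pg users.
  @scope :my_app_rooms

  # Starts the scope process; in an application this belongs in the supervision tree.
  def start_link, do: :pg.start_link(@scope)

  # Membership is replicated to all nodes running this scope; :pg removes
  # members automatically when their processes exit.
  def join(room), do: :pg.join(@scope, {:room, room}, self())

  # Sends to every member, local or remote.
  def broadcast(room, msg) do
    for pid <- :pg.get_members(@scope, {:room, room}), do: send(pid, {:room_msg, room, msg})
    :ok
  end
end
```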
`:pg` supersedes the deprecated `:pg2`, which was removed in OTP 24.
Broadway vs Oban: Different Problems
| Tool | Use For |
|---|---|
| Broadway | External queues (SQS, Kafka, RabbitMQ) — data ingestion with batching |
| Oban | Background jobs with database persistence |
Broadway is NOT a job queue.
Broadway Gotchas
Processors are for runtime concurrency, not code organization. Dispatch to modules in `handle_message`; don't add processors for different message types.
`one_for_all` is for Broadway bugs, not your code. Your `handle_message` errors are caught and result in failed messages, not supervisor restarts.
Handle expected failures in the producer (connection loss, rate limits). Reserve `max_restarts` for unexpected bugs.
Supervision Strategies Encode Dependencies
| Strategy | Children Relationship |
|---|---|
| :one_for_one | Independent |
| :one_for_all | Interdependent (all restart) |
| :rest_for_one | Sequential dependency |
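Use `:max_restarts` and `:max_seconds` to prevent restart loops: if more than `:max_restarts` restarts occur within `:max_seconds` (default: 3 in 5 seconds), the supervisor terminates its children and exits, escalating the failure to its own supervisor.
Think about failure cascades BEFORE coding.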
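A sketch of a dependency encoded as a strategy, assuming a hypothetical `MyApp.Pipeline` supervisor in which plain Agents named `MyApp.Producer` and `MyApp.Consumer` stand in for real workers. Because the consumer depends on the producer, `:rest_for_one` restarts the consumer whenever the producer crashes, but not the other way round:

```elixir
defmodule MyApp.Pipeline do
  # Hypothetical tree: child order encodes "consumer depends on producer".
  use Supervisor

  def start_link(opts \\ []), do: Supervisor.start_link(__MODULE__, opts, name: __MODULE__)

  @impl true
  def init(_opts) do
    children = [
      %{id: :producer, start: {Agent, :start_link, [fn -> [] end, [name: MyApp.Producer]]}},
      %{id: :consumer, start: {Agent, :start_link, [fn -> 0 end, [name: MyApp.Consumer]]}}
    ]

    # A producer crash restarts the producer and every child started after it.
    # More than 3 restarts within 5 seconds makes this supervisor give up and exit.
    Supervisor.init(children, strategy: :rest_for_one, max_restarts: 3, max_seconds: 5)
  end
end
```

Swapping the strategy to `:one_for_one` would leave the consumer running against a freshly restarted producer, which is only safe if the two really are independent.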
Abstraction Decision Tree
```
Need state?
├── No → Plain function
└── Yes → Complex behavior?
    ├── No → Agent
    └── Yes → Supervision?
        ├── No → spawn_link
        └── Yes → Request/response?
            ├── No → Task.Supervisor
            └── Yes → Explicit states?
                ├── No → GenServer
                └── Yes → GenStateMachine
```
Storage Options
| Need | Use |
|---|---|
| Memory cache | ETS (`read_concurrency: true` for reads) |
| Static config | :persistent_term (faster reads than ETS; updates are expensive) |
| Disk persistence | DETS (2GB limit) |
| Transactions/Distribution | Mnesia |
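A minimal `:persistent_term` sketch, assuming a hypothetical `MyApp.Config` wrapper that stores one settings map at boot and reads it many times afterwards:

```elixir
defmodule MyApp.Config do
  # Hypothetical wrapper around :persistent_term for rarely-changing configuration.
  @key {__MODULE__, :settings}

  # Call rarely (e.g. once at startup): updating or erasing a persistent term
  # triggers a global garbage-collection scan across all processes.
  def load(settings) when is_map(settings), do: :persistent_term.put(@key, settings)

  # Reads are fast and do not copy the term onto the caller's heap.
  def get(name, default \\ nil) do
    @key |> :persistent_term.get(%{}) |> Map.get(name, default)
  end
end
```

Storing one map under a single key, rather than one term per setting, keeps the number of expensive updates to a minimum.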
:sys Debugs ANY OTP Process
```elixir
:sys.get_state(pid)     # Current state
:sys.trace(pid, true)   # Trace events (TURN OFF when done!)
:sys.trace(pid, false)  # Turn tracing off again
```
Telemetry Is Built Into Everything
Phoenix, Ecto, and most libraries emit telemetry events. Attach handlers:
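```elixir
# Remote capture (assumes this module defines handle_event/4); telemetry warns on local captures.
:telemetry.attach("my-handler", [:phoenix, :endpoint, :stop], &__MODULE__.handle_event/4, nil)
```
Use `Telemetry.Metrics` + reporters (StatsD, Prometheus, LiveDashboard).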
Red Flags - STOP and Reconsider
- GenServer wrapping stateless computation
- Task.async in production when you need error handling
- Creating atoms dynamically for process names
- Single GenServer becoming throughput bottleneck
- Using Broadway for background jobs (use Oban)
- Using Oban for external queue consumption (use Broadway)
- No supervision strategy reasoning
Any of these? Re-read The Iron Law and use the Abstraction Decision Tree.