feature-flags-architect

Feature Flags Architect

End-to-end discipline for feature flags: classify them, ship them, ramp them, and retire them. Most teams treat flags as throwaway `if`-statements; this skill treats them as a controlled lifecycle with measurable debt.

When to use

  • Adding a new flag and need a rollout plan
  • Auditing a codebase for stale or orphaned flags
  • Choosing a flag provider (LaunchDarkly vs GrowthBook vs Statsig vs Unleash vs Flipt vs build-your-own)
  • Designing a kill-switch path for a risky launch
  • Cleaning up flag debt before a release freeze
  • Reviewing whether a feature should ship behind a flag at all

Core principle: flags are a lifecycle, not an `if`

request → design → ship → ramp → cleanup → archive
Flags that skip cleanup become debt: dead branches, stale defaults, untested code paths, unbounded blast radius. The three scripts in this skill enforce the lifecycle.

Quick start

bash
# 1. Audit the repo for flag debt
python scripts/flag_debt_scanner.py --repo . --max-age-days 90

# 2. Plan a progressive rollout for a new flag
python scripts/rollout_planner.py --population 100000 --target-percent 100 --duration-days 14 --strategy ring

# 3. Verify every flag has a documented kill switch
python scripts/kill_switch_audit.py --repo . --flag-doc docs/feature-flags.md

The 4 flag types (taxonomy)

Different flag types have different lifespans and ownership. Misclassifying creates debt.
| Type | Purpose | Typical lifespan | Owner | Cleanup trigger |
|---|---|---|---|---|
| Release | Hide unfinished features in production | days–weeks | Eng | 100% rollout reached |
| Experiment | A/B test variants | weeks | Product/Marketing | Test concluded; winner picked |
| Operational | Circuit breakers, perf toggles, kill switches | months–years | Eng/SRE | Replaced by autoscaling/feature retirement |
| Permission | Entitlements per user/account/plan | years (permanent) | Product | Plan/role removed |

Only Release and Experiment flags should be on a debt-scanner watchlist. Operational and Permission flags are long-lived by design. See `references/flag_taxonomy.md` for the decision tree.
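
The watchlist rule above can be expressed as a small filter. This is a minimal sketch, not code from the skill's scripts; the `Flag` dataclass, `FlagType` enum, and `debt_watchlist` function are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class FlagType(Enum):
    RELEASE = "release"
    EXPERIMENT = "experiment"
    OPERATIONAL = "operational"
    PERMISSION = "permission"


# Only the short-lived types belong on the debt watchlist (see the table above).
WATCHLIST_TYPES = {FlagType.RELEASE, FlagType.EXPERIMENT}


@dataclass(frozen=True)
class Flag:
    # Hypothetical record shape; the skill does not prescribe one.
    name: str
    flag_type: FlagType
    owner: str


def debt_watchlist(flags):
    """Return the flags a debt scanner should track, sorted by name."""
    return sorted(
        (f for f in flags if f.flag_type in WATCHLIST_TYPES),
        key=lambda f: f.name,
    )
```

Misclassifying a Release flag as Operational would silently drop it from this list, which is one way classification errors turn into debt.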

The 3 Python tools

All three are stdlib-only. Run any of them with `--help` for usage.

flag_debt_scanner.py

Finds flags older than `--max-age-days` with low usage, suggesting candidates for cleanup.
bash
python scripts/flag_debt_scanner.py --repo . --max-age-days 90 --format text
python scripts/flag_debt_scanner.py --repo . --max-age-days 60 --format json > debt.json
Detection heuristic:
  1. Walk `--repo` for code references matching common flag-call patterns:
    • `flag("...")`, `isFlagEnabled("...")`, `featureFlag("...")`, `getFlag("...")`
    • `client.variation("...", ...)`, `unleash.isEnabled("...")`, `growthbook.feature("...")`
  2. For each unique flag identifier, find the oldest commit that introduced it (`git log --diff-filter=A -S <name>`).
  3. Flag as DEBT if introduced more than `--max-age-days` ago AND used in ≤ `--min-uses` places.
Outputs flag name, age in days, file references, suggested action. JSON mode is CI-friendly.
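
To make the heuristic concrete, here is a minimal sketch of steps 1 and 3. It is not the scanner's actual source: the regular expression, the function names, and the default of `min_uses=1` are assumptions made for illustration, and the git lookup in step 2 is left out.

```python
import re

# Matches the call patterns listed above. The exact regex is an assumption;
# the real scanner may recognise more (or fewer) forms.
FLAG_CALL = re.compile(
    r"""\b(?:flag|isFlagEnabled|featureFlag|getFlag
           |client\.variation|unleash\.isEnabled|growthbook\.feature)
        \(\s*["']([^"']+)["']""",
    re.VERBOSE,
)


def find_flag_uses(source):
    """Count references to each flag name found in a source string."""
    counts = {}
    for match in FLAG_CALL.finditer(source):
        name = match.group(1)
        counts[name] = counts.get(name, 0) + 1
    return counts


def is_debt(age_days, uses, max_age_days=90, min_uses=1):
    """Step 3: old AND rarely used. The min_uses default is hypothetical."""
    return age_days > max_age_days and uses <= min_uses
```

Requiring both conditions keeps an old but heavily used flag off the list; such a flag is more likely a misclassified Operational or Permission flag than a cleanup candidate.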

rollout_planner.py

Generates a phased rollout schedule from population size, target percent, duration, and strategy.
bash
python scripts/rollout_planner.py --population 100000 --target-percent 100 --duration-days 14 --strategy ring
python scripts/rollout_planner.py --population 50000 --target-percent 25 --duration-days 7 --strategy linear
python scripts/rollout_planner.py --population 1000000 --target-percent 100 --duration-days 30 --strategy log
Strategies:
  • `ring`: 1% → 5% → 25% → 50% → 100%, evenly spaced. Default for risky launches.
  • `linear`: constant rate per day. Default for medium-risk launches.
  • `log`: rapid early, slow tail. Default for low-risk launches where confidence is high.
  • `cohort`: by named cohort (internal → beta → free → paid → all).
Outputs a markdown table with date, percent, expected user count, abort criteria, and verification step per phase.
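
A rough sketch of how such a schedule could be computed for three of the strategies follows. It is an illustration under stated assumptions, not the planner's real implementation: the `plan_rollout` function, the `phases` parameter, and the particular logarithmic curve are hypothetical, and the `cohort` strategy, abort criteria, and verification steps are omitted.

```python
import math
from datetime import date, timedelta

RING_STEPS = [1, 5, 25, 50, 100]  # ring percentages listed above


def plan_rollout(population, target_percent, duration_days, strategy,
                 start, phases=5):
    """Return a list of (date, percent, expected_users) tuples.

    `phases` and the exact log curve are illustrative assumptions.
    """
    if strategy == "ring":
        # Cap each ring step at the target and drop duplicates.
        percents = sorted({min(p, target_percent) for p in RING_STEPS})
    elif strategy == "linear":
        percents = [target_percent * i / phases for i in range(1, phases + 1)]
    elif strategy == "log":
        # Large early steps, small late ones.
        percents = [
            target_percent * math.log1p(i) / math.log1p(phases)
            for i in range(1, phases + 1)
        ]
    else:
        raise ValueError(f"unsupported strategy: {strategy}")

    # Space the phases evenly from day 0 to the final day.
    gaps = max(len(percents) - 1, 1)
    schedule = []
    for i, pct in enumerate(percents):
        day = start + timedelta(days=int(i * duration_days / gaps))
        schedule.append((day, pct, round(population * pct / 100)))
    return schedule
```

With a population of 100000, the `ring` strategy in this sketch exposes 1000, 5000, 25000, 50000, and then all 100000 users.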

kill_switch_audit.py

Cross-references code-discovered flags against documentation to verify each has a kill switch path written down.
bash
python scripts/kill_switch_audit.py --repo . --flag-doc docs/feature-flags.md
python scripts/kill_switch_audit.py --repo . --flag-doc runbooks/flags.md --format json
What it checks:
  1. Every code-discovered flag has an entry in `--flag-doc`
  2. Each entry declares: owner, type, kill-switch trigger, monitoring dashboard
  3. Reports flags missing documentation (FAIL) or missing fields (WARN)
Use as a pre-merge gate before any new flag ships.
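
The checks above could look roughly like the sketch below. The layout of the flag document is not specified here, so this assumes a hypothetical format: one `## <flag-name>` heading per flag followed by `- field: value` bullets, with field keys `owner`, `type`, `kill-switch trigger`, and `dashboard`. The function names are likewise illustrative.

```python
import re

# Assumed field keys; the real audit script may name them differently.
REQUIRED_FIELDS = ("owner", "type", "kill-switch trigger", "dashboard")


def parse_flag_doc(text):
    """Parse '## name' sections with '- field: value' bullets into a dict."""
    entries, current = {}, None
    for line in text.splitlines():
        heading = re.match(r"^##\s+(\S+)\s*$", line)
        field = re.match(r"^-\s*([^:]+):\s*(\S.*)$", line)
        if heading:
            current = entries.setdefault(heading.group(1), {})
        elif field and current is not None:
            current[field.group(1).strip().lower()] = field.group(2).strip()
    return entries


def audit(code_flags, doc_text):
    """Map each flag to (status, details): FAIL, WARN, or PASS."""
    entries = parse_flag_doc(doc_text)
    report = {}
    for name in sorted(code_flags):
        if name not in entries:
            report[name] = ("FAIL", ["no documentation entry"])
            continue
        missing = [f for f in REQUIRED_FIELDS if f not in entries[name]]
        report[name] = ("WARN", missing) if missing else ("PASS", [])
    return report
```

A pre-merge gate built on this would exit non-zero when any flag reports FAIL.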

Provider chooser (5 + DIY)

| Provider | Best for | Pricing model | Lock-in risk | OSS option |
|---|---|---|---|---|
| LaunchDarkly | Enterprise, complex targeting, audit/compliance | Per-MAU, expensive | High | No |
| GrowthBook | Mid-market, A/B testing focused, OSS-friendly | Per-MAU + OSS | Low | Yes (self-host) |
| Statsig | Growth/product teams, advanced experimentation | Free tier + per-MAU | Medium | No |
| Unleash | OSS-first, self-hosted, dev-friendly | OSS + Enterprise | Low | Yes |
| Flipt | Lightweight, k8s-native, simple needs | OSS-only | None | Yes |
| DIY | <100 flags, no targeting, full control | None | None | N/A |

Decision rules:
  • <50 flags + no targeting → DIY with config file or env vars
  • Need analytics + experimentation → Statsig or GrowthBook
  • Compliance/SOC2 audit logs required → LaunchDarkly
  • Self-hosting required (data residency / air-gapped) → Unleash or Flipt
  • See `references/provider_comparison.md` for detail.
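
The decision rules can be read as a short function. This is only a sketch: the function name and parameters are hypothetical, and the order in which the rules are checked (self-hosting first, then audit logs, then experimentation) is an assumption, since the rules above do not say which one wins when several apply.

```python
def suggest_providers(flag_count, needs_targeting=False,
                      needs_experimentation=False,
                      needs_audit_logs=False, needs_self_hosting=False):
    """Return candidate providers per the decision rules above.

    Rule precedence is an illustrative assumption.
    """
    if needs_self_hosting:
        return ["Unleash", "Flipt"]
    if needs_audit_logs:
        return ["LaunchDarkly"]
    if needs_experimentation:
        return ["Statsig", "GrowthBook"]
    if flag_count < 50 and not needs_targeting:
        return ["DIY"]
    # No rule matched; fall back to the full comparison document.
    return []
```

An empty result means the rules do not settle the choice and the full comparison is needed.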

Workflows

Workflow 1: Ship a new feature behind a flag

1. Classify: which of the 4 flag types?
   → Release (most common for engineering work)
2. Run rollout_planner.py to design the ramp
3. Add flag entry to docs/feature-flags.md BEFORE writing code:
   - name, owner, type, kill-switch trigger, dashboard URL
4. Write the code with the flag
5. Run kill_switch_audit.py — must pass before merge
6. Deploy at 0%; verify kill switch works
7. Execute rollout schedule; abort if abort criteria met
8. At 100% for 7+ days: remove flag, delete dead branch, archive doc entry

Workflow 2: Quarterly flag cleanup

1. Run flag_debt_scanner.py --repo . --max-age-days 90 > debt.md
2. For each flagged item:
   a. Confirm it reached 100% (or was killed)
   b. Find the issue/PR that introduced it; verify owner agrees to remove
   c. Delete dead branches; remove flag config
   d. Run kill_switch_audit.py — should now show one fewer flag
3. Update CHANGELOG: "Removed N stale flags"

Workflow 3: Choose a provider

1. Estimate flag count (current + 12-month projection)
2. Required features:
   - Targeting rules (user, account, geo, %)?
   - A/B testing + stats?
   - Audit log / SOC2?
   - Self-hosting / data residency?
3. Pricing budget (MAU * cost-per-MAU)
4. See provider_comparison.md decision tree
5. Build a 30-day proof-of-concept before signing

Workflow 4: Design a kill switch

1. Identify the failure modes:
   - Latency spike (which threshold?)
   - Error rate spike (which threshold?)
   - Business metric regression (which threshold?)
2. Wire each to an abort:
   - Manual: dashboard link + on-call playbook
   - Automated: alert threshold flips flag back to 0%
3. Test the kill switch in staging BEFORE production rollout
4. Document in flag-doc; pass kill_switch_audit.py
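
The automated abort in step 2 might be wired up along these lines. Everything here is a sketch under assumptions: `AbortRule`, `InMemoryFlagStore`, and `evaluate_kill_switch` are hypothetical names, the store stands in for whatever provider client is actually in use, and every rule treats a higher value as worse (a business-metric regression would need to be expressed as a drop percentage).

```python
from dataclasses import dataclass


@dataclass
class AbortRule:
    metric: str
    threshold: float  # abort when the observed value exceeds this


class InMemoryFlagStore:
    """Hypothetical stand-in for a real flag provider client."""

    def __init__(self):
        self.percent = {}

    def set_percent(self, flag, percent):
        self.percent[flag] = percent


def evaluate_kill_switch(flag, metrics, rules, store):
    """Flip the flag back to 0% if any rule is breached.

    Returns the names of the breached metrics (empty if none).
    """
    breached = [
        r.metric for r in rules
        if metrics.get(r.metric, 0.0) > r.threshold
    ]
    if breached:
        store.set_percent(flag, 0)
    return breached
```

Running the same function against staging metrics is one way to exercise the kill switch before production rollout, as step 3 requires.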

References

  • `references/flag_taxonomy.md`: 4 types, decision tree, ownership, lifespan
  • `references/provider_comparison.md`: LaunchDarkly / GrowthBook / Statsig / Unleash / Flipt / DIY trade-offs
  • `references/rollout_strategies.md`: ring / linear / log / cohort / geo, abort criteria, monitoring
  • `references/flag_lifecycle.md`: request → design → ship → ramp → cleanup → archive

Slash command

`/flag-cleanup`: runs the full cleanup workflow on the current repo (scan for debt, generate a removal plan, audit kill switches).

Asset templates

  • `assets/flag_request_template.md`: fill-in form for new flag requests (name, owner, type, kill switch, rollout plan)

Anti-patterns

  • Permanent flag with `if (FLAG_FOO)` in 50 places: should be a Permission flag with a runtime config, not a Release flag
  • Flag with no owner: when the original engineer leaves, no one cleans it up
  • No kill switch documented: when the feature breaks, no one knows how to disable it
  • A/B test that ran 6 months: pick a winner; running indefinitely is debt
  • Flags used for purely cosmetic changes: ship via deploy, not flag

Verifiable success

A team using this skill should achieve:
  • 100% of new flags pass `kill_switch_audit.py` at merge time
  • `flag_debt_scanner.py --max-age-days 90` returns ≤5 stale flags repo-wide
  • Every flag has a documented owner, type, and kill switch
  • Mean time to retire a Release flag: <60 days from 100% rollout
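
Two of these targets are numeric and easy to check automatically. The sketch below assumes the inputs have already been collected (a list of stale flag names and a list of days-to-retirement per Release flag); the `check_success` function and its parameters are hypothetical and do not reflect the scanner's actual output schema.

```python
from statistics import mean


def check_success(stale_flags, retire_days, max_stale=5,
                  max_mean_retire_days=60):
    """Evaluate the stale-flag and retirement-time targets listed above."""
    return {
        # At most 5 stale flags repo-wide.
        "stale_flags_ok": len(stale_flags) <= max_stale,
        # Mean retirement time under 60 days; vacuously true with no data.
        "retirement_ok": (
            mean(retire_days) < max_mean_retire_days if retire_days else True
        ),
    }
```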