feature-flags-architect
Feature Flags Architect
End-to-end discipline for feature flags: classify them, ship them, ramp them, and retire them. Most teams treat flags as throwaway `if` statements; this skill treats them as a controlled lifecycle with measurable debt.
When to use
- Adding a new flag and need a rollout plan
- Auditing a codebase for stale or orphaned flags
- Choosing a flag provider (LaunchDarkly vs GrowthBook vs Statsig vs Unleash vs Flipt vs build-your-own)
- Designing a kill-switch path for a risky launch
- Cleaning up flag debt before a release freeze
- Reviewing whether a feature should ship behind a flag at all
Core principle: flags are a lifecycle, not an `if` statement
```
request → design → ship → ramp → cleanup → archive
```

Flags that skip cleanup become debt: dead branches, stale defaults, untested code paths, and an unbounded blast radius. The three scripts in this skill enforce the lifecycle.
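The lifecycle can be modeled as a small state machine that refuses any transition skipping a stage, so a flag cannot reach `archive` without passing through `cleanup`. This is a hypothetical encoding for illustration: the `Stage` enum, the `ALLOWED` map, and the `advance` helper are not part of the skill's scripts.

```python
from enum import Enum


class Stage(Enum):
    """Lifecycle stages from the core principle (hypothetical encoding)."""
    REQUEST = "request"
    DESIGN = "design"
    SHIP = "ship"
    RAMP = "ramp"
    CLEANUP = "cleanup"
    ARCHIVE = "archive"


# Enum iteration follows definition order, so each stage may only advance
# to the one defined right after it; ARCHIVE is terminal.
_ORDER = list(Stage)
ALLOWED = {s: {_ORDER[i + 1]} for i, s in enumerate(_ORDER[:-1])}
ALLOWED[Stage.ARCHIVE] = set()


def advance(current: Stage, target: Stage) -> Stage:
    """Return the new stage, or raise if the transition skips a step."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target
```

Under this model, pulling a flag back to 0% during a ramp is a change of percentage, not of stage; the flag stays in `ramp` until someone decides to clean it up.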
Quick start
```bash
# 1. Audit the repo for flag debt
python scripts/flag_debt_scanner.py --repo . --max-age-days 90

# 2. Plan a progressive rollout for a new flag
python scripts/rollout_planner.py --population 100000 --target-percent 100 --duration-days 14 --strategy ring

# 3. Verify every flag has a documented kill switch
python scripts/kill_switch_audit.py --repo . --flag-doc docs/feature-flags.md
```
The 4 flag types (taxonomy)
Different flag types have different lifespans and ownership. Misclassifying creates debt.
| Type | Purpose | Typical lifespan | Owner | Cleanup trigger |
|---|---|---|---|---|
| Release | Hide unfinished features in production | days–weeks | Eng | 100% rollout reached |
| Experiment | A/B test variants | weeks | Product/Marketing | Test concluded; winner picked |
| Operational | Circuit breakers, perf toggles, kill switches | months–years | Eng/SRE | Replaced by autoscaling/feature retirement |
| Permission | Entitlements per user/account/plan | years (permanent) | Product | Plan/role removed |
Only Release and Experiment flags belong on a debt-scanner watchlist; Operational and Permission flags are long-lived by design. See `references/flag_taxonomy.md` for the decision tree.
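The watchlist rule from the taxonomy table can be encoded directly, so that long-lived Operational and Permission flags never surface as debt. A minimal sketch; the `FlagType`, `Flag`, and `debt_candidates` names are hypothetical and not part of the skill's scripts.

```python
from dataclasses import dataclass
from enum import Enum


class FlagType(Enum):
    RELEASE = "release"
    EXPERIMENT = "experiment"
    OPERATIONAL = "operational"
    PERMISSION = "permission"


# Only the short-lived types are tracked as potential debt.
WATCHLIST = {FlagType.RELEASE, FlagType.EXPERIMENT}


@dataclass
class Flag:
    name: str
    kind: FlagType
    owner: str
    age_days: int


def debt_candidates(flags, max_age_days=90):
    """Return names of watchlisted flags older than max_age_days."""
    return [f.name for f in flags
            if f.kind in WATCHLIST and f.age_days > max_age_days]
```

A 400-day-old circuit breaker is expected to exist, so it is filtered out by type rather than by age.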
The 3 Python tools

All three are stdlib-only. Run any of them with `--help` for usage.
flag_debt_scanner.py

Finds flags older than `--max-age-days` with low usage and lists them as cleanup candidates.

```bash
python scripts/flag_debt_scanner.py --repo . --max-age-days 90 --format text
python scripts/flag_debt_scanner.py --repo . --max-age-days 60 --format json > debt.json
```

Detection heuristic:
- Walk `--repo` for code references matching common flag-call patterns:
  - `flag("...")`, `isFlagEnabled("...")`, `featureFlag("...")`, `getFlag("...")`
  - `client.variation("...", ...)`, `unleash.isEnabled("...")`, `growthbook.feature("...")`
- For each unique flag identifier, find the oldest commit that introduced it (`git log --diff-filter=A -S <name>`).
- Flag it as DEBT if it was introduced more than `--max-age-days` ago AND is used in ≤ `--min-uses` places.

Outputs flag name, age in days, file references, and a suggested action. JSON mode is CI-friendly.
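The heuristic above can be sketched with a regex over source files plus a `git log` lookup. This is not the script's actual code: the regex, the `SOURCE_EXTS` set, and the function names are assumptions. The age lookup here deliberately drops `--diff-filter=A`, which restricts matches to newly added files, and uses `--reverse` so the first timestamp is the oldest commit that changed the flag's occurrence count.

```python
import os
import re
import subprocess
import time
from collections import Counter

# Call shapes from the heuristic; the capture group is the flag key.
FLAG_CALL = re.compile(
    r"\b(?:flag|isFlagEnabled|featureFlag|getFlag|client\.variation"
    r"|unleash\.isEnabled|growthbook\.feature)\(\s*[\"']([^\"']+)[\"']"
)
SOURCE_EXTS = {".py", ".js", ".ts", ".tsx", ".go", ".java", ".rb"}  # assumption


def scan_text(text: str) -> Counter:
    """Count flag-key references in one file's contents."""
    return Counter(FLAG_CALL.findall(text))


def scan_repo(root: str) -> Counter:
    """Walk root (skipping .git) and total the references per flag key."""
    uses = Counter()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for fname in filenames:
            if os.path.splitext(fname)[1] in SOURCE_EXTS:
                path = os.path.join(dirpath, fname)
                with open(path, encoding="utf-8", errors="ignore") as fh:
                    uses.update(scan_text(fh.read()))
    return uses


def age_days(repo: str, name: str) -> int:
    """Days since the oldest commit that changed the count of `name`."""
    out = subprocess.run(
        ["git", "-C", repo, "log", "-S", name, "--reverse", "--format=%ct"],
        capture_output=True, text=True, check=True,
    ).stdout.split()
    return int((time.time() - int(out[0])) // 86400) if out else 0


def classify(uses: Counter, ages: dict, max_age_days: int, min_uses: int) -> list:
    """DEBT rule: older than max_age_days AND used in <= min_uses places."""
    return sorted(n for n, c in uses.items()
                  if ages.get(n, 0) > max_age_days and c <= min_uses)
```

Usage: `uses = scan_repo(".")`, then `ages = {n: age_days(".", n) for n in uses}` and `classify(uses, ages, 90, 1)`.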
rollout_planner.py

Generates a phased rollout schedule from population size, target percent, duration, and strategy.

```bash
python scripts/rollout_planner.py --population 100000 --target-percent 100 --duration-days 14 --strategy ring
python scripts/rollout_planner.py --population 50000 --target-percent 25 --duration-days 7 --strategy linear
python scripts/rollout_planner.py --population 1000000 --target-percent 100 --duration-days 30 --strategy log
```

Strategies:
- `ring`: 1% → 5% → 25% → 50% → 100%, evenly spaced. Default for risky launches.
- `linear`: constant rate per day. Default for medium-risk launches.
- `log`: rapid early, slow tail. Default for low-risk launches you are confident in.
- `cohort`: by named cohort (internal → beta → free → paid → all).

Outputs a markdown table with date, percent, expected user count, abort criteria, and verification step per phase.
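The numeric strategies can be sketched as functions from day to percent. This covers `ring`, `linear`, and `log` only and omits abort criteria and verification steps; the even integer day spacing for `ring` and the `log1p` curve for `log` are assumptions, not the planner's documented formulas.

```python
import math

RING_STEPS = (1, 5, 25, 50, 100)  # ring percentages listed above


def plan(population: int, target: float, days: int, strategy: str):
    """Return [(day, percent, expected_users)] for a numeric strategy."""
    if strategy == "ring":
        # Keep ring steps below the target, then finish exactly at the target.
        steps = [p for p in RING_STEPS if p < target] + [target]
        n = len(steps)
        # Spread phases evenly from day 0 to the last day.
        days_at = [i * days // (n - 1) if n > 1 else 0 for i in range(n)]
        pcts = steps
    elif strategy == "linear":
        days_at = list(range(1, days + 1))
        pcts = [target * d / days for d in days_at]  # same increase each day
    elif strategy == "log":
        days_at = list(range(1, days + 1))
        # log1p(d) / log1p(days) climbs fast early and flattens near the target.
        pcts = [target * math.log1p(d) / math.log1p(days) for d in days_at]
    else:
        raise ValueError(f"unsupported strategy: {strategy}")
    return [(d, round(p, 1), round(population * p / 100))
            for d, p in zip(days_at, pcts)]
```

For `plan(100000, 100, 14, "ring")` this yields phases on days 0, 3, 7, 10, and 14 at 1%, 5%, 25%, 50%, and 100%.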
kill_switch_audit.py

Cross-references code-discovered flags against documentation to verify that each has a written kill-switch path.

```bash
python scripts/kill_switch_audit.py --repo . --flag-doc docs/feature-flags.md
python scripts/kill_switch_audit.py --repo . --flag-doc runbooks/flags.md --format json
```

What it checks:
- Every code-discovered flag has an entry in `--flag-doc`
- Each entry declares: owner, type, kill-switch trigger, monitoring dashboard
- Reports flags missing documentation (FAIL) or missing fields (WARN)

Use it as a pre-merge gate before any new flag ships.
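The cross-check can be sketched as a parser for the flag doc plus a comparison against the flags found in code. The doc layout assumed here is hypothetical: one `## flag-name` heading per flag, followed by `- key: value` lines using the labels in `REQUIRED`. The real script may expect a different format.

```python
import re

REQUIRED = ("owner", "type", "kill-switch", "dashboard")  # assumed field labels
HEADING = re.compile(r"^##\s+(\S+)\s*$")
FIELD = re.compile(r"^-\s*([\w-]+)\s*:\s*(\S.*)$")


def parse_flag_doc(text: str) -> dict:
    """Map flag name -> {field: value} from '## name' sections of '- key: value' lines."""
    entries, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        if m := HEADING.match(line):
            current = entries.setdefault(m.group(1), {})
        elif current is not None and (m := FIELD.match(line)):
            current[m.group(1).lower()] = m.group(2).strip()
    return entries


def audit(code_flags, doc_text: str) -> list:
    """Return (flag, level, detail): FAIL if undocumented, WARN per missing field."""
    entries = parse_flag_doc(doc_text)
    findings = []
    for flag in sorted(code_flags):
        if flag not in entries:
            findings.append((flag, "FAIL", "no entry in flag doc"))
            continue
        for field in REQUIRED:
            if field not in entries[flag]:
                findings.append((flag, "WARN", f"missing {field}"))
    return findings
```

A CI gate would then fail the build if any finding has level `FAIL`; an empty value (for example `- owner:`) does not match `FIELD` and is reported as missing.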
Provider chooser (5 + DIY)
| Provider | Best for | Pricing model | Lock-in risk | OSS option |
|---|---|---|---|---|
| LaunchDarkly | Enterprise, complex targeting, audit/compliance | Per-MAU, expensive | High | No |
| GrowthBook | Mid-market, A/B testing focused, OSS-friendly | Per-MAU + OSS | Low | Yes (self-host) |
| Statsig | Growth/product teams, advanced experimentation | Free tier + per-MAU | Medium | No |
| Unleash | OSS-first, self-hosted, dev-friendly | OSS + Enterprise | Low | Yes |
| Flipt | Lightweight, k8s-native, simple needs | OSS-only | None | Yes |
| DIY | <100 flags, no targeting, full control | None | None | N/A |
Decision rules:
- <50 flags + no targeting → DIY with config file or env vars
- Need analytics + experimentation → Statsig or GrowthBook
- Compliance/SOC2 audit logs required → LaunchDarkly
- Self-hosting required (data residency / air-gapped) → Unleash or Flipt
- See `references/provider_comparison.md` for details.
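For the DIY route (few flags, no targeting), a config file with environment-variable overrides is often enough. A minimal sketch, assuming a hypothetical JSON file of `{"flag-name": true}` pairs and a hypothetical `FLAG_<NAME>` environment-variable convention; neither comes from the skill itself.

```python
import json
import os

TRUTHY = {"1", "true", "yes", "on"}


def load_flags(path=None, environ=None) -> dict:
    """Merge JSON file defaults with FLAG_<NAME> environment overrides (env wins)."""
    environ = os.environ if environ is None else environ
    flags = {}
    if path and os.path.exists(path):
        with open(path, encoding="utf-8") as fh:
            flags.update({k: bool(v) for k, v in json.load(fh).items()})
    for key, value in environ.items():
        if key.startswith("FLAG_"):
            # FLAG_NEW_CHECKOUT -> "new-checkout"
            name = key[len("FLAG_"):].lower().replace("_", "-")
            flags[name] = value.strip().lower() in TRUTHY
    return flags


def is_enabled(flags: dict, name: str) -> bool:
    """Unknown flags default to off, so a typo fails closed."""
    return flags.get(name, False)
```

Flipping an environment variable and restarting the process acts as the kill switch; once you need per-user targeting or percentage ramps, a provider starts to pay for itself.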
Workflows
Workflow 1: Ship a new feature behind a flag
```
1. Classify: which of the 4 flag types?
   → Release (most common for engineering work)
2. Run rollout_planner.py to design the ramp
3. Add flag entry to docs/feature-flags.md BEFORE writing code:
   - name, owner, type, kill-switch trigger, dashboard URL
4. Write the code with the flag
5. Run kill_switch_audit.py: must pass before merge
6. Deploy at 0%; verify kill switch works
7. Execute rollout schedule; abort if abort criteria met
8. At 100% for 7+ days: remove flag, delete dead branch, archive doc entry
```
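Steps 6 and 7 rely on each user getting a stable answer as the percentage grows, so that nobody flips back and forth between variants during the ramp. One common approach, sketched here as an assumption (providers implement their own bucketing), hashes the flag key and user ID into a fixed bucket and compares it with the current percentage. `bucket` and `is_on` are hypothetical names.

```python
import hashlib


def bucket(flag: str, user_id: str) -> int:
    """Stable bucket in [0, 10000) for this (flag, user) pair."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % 10_000


def is_on(flag: str, user_id: str, percent: float, killed: bool = False) -> bool:
    """True if the user falls inside the current ramp percentage."""
    if killed:  # the kill switch overrides the schedule
        return False
    # percent * 100 converts to basis points, allowing steps like 0.5%.
    return bucket(flag, user_id) < percent * 100
```

Because the bucket never changes, every user enabled at 5% is still enabled at 25%, and deploying at 0% guarantees nobody sees the new path until the ramp starts. Including the flag key in the hash keeps different flags from ramping the same users first.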
Workflow 2: Quarterly flag cleanup
```
1. Run flag_debt_scanner.py --repo . --max-age-days 90 > debt.md
2. For each flagged item:
   a. Confirm it reached 100% (or was killed)
   b. Find the issue/PR that introduced it; verify the owner agrees to remove it
   c. Delete dead branches; remove flag config
   d. Run kill_switch_audit.py: it should now show one fewer flag
3. Update CHANGELOG: "Removed N stale flags"
```
Workflow 3: Choose a provider
```
1. Estimate flag count (current + 12-month projection)
2. Required features:
   - Targeting rules (user, account, geo, %)?
   - A/B testing + stats?
   - Audit log / SOC2?
   - Self-hosting / data residency?
3. Pricing budget (MAU * cost-per-MAU)
4. See the decision tree in references/provider_comparison.md
5. Build a 30-day proof-of-concept before signing
```
Workflow 4: Design a kill switch
```
1. Identify the failure modes:
   - Latency spike (which threshold?)
   - Error rate spike (which threshold?)
   - Business metric regression (which threshold?)
2. Wire each to an abort:
   - Manual: dashboard link + on-call playbook
   - Automated: alert threshold flips flag back to 0%
3. Test the kill switch in staging BEFORE production rollout
4. Document in flag-doc; pass kill_switch_audit.py
```
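The automated abort in step 2 reduces to comparing live metrics with the thresholds chosen in step 1 and dropping the flag to 0% on any breach. A minimal sketch: `Thresholds` and `next_percent` are hypothetical names, the threshold values in the usage note are placeholders, and wiring this to a real alerting system or provider API is left out.

```python
from dataclasses import dataclass


@dataclass
class Thresholds:
    p99_latency_ms: float   # latency spike threshold
    error_rate: float       # fraction, e.g. 0.02 for 2%
    min_conversion: float   # business-metric floor


def next_percent(current: float, p99_ms: float, error_rate: float,
                 conversion: float, limits: Thresholds) -> tuple:
    """Return (new_percent, reasons); any breach flips the flag back to 0%."""
    reasons = []
    if p99_ms > limits.p99_latency_ms:
        reasons.append(f"p99 latency {p99_ms}ms > {limits.p99_latency_ms}ms")
    if error_rate > limits.error_rate:
        reasons.append(f"error rate {error_rate:.2%} > {limits.error_rate:.2%}")
    if conversion < limits.min_conversion:
        reasons.append(f"conversion {conversion:.2%} < {limits.min_conversion:.2%}")
    return (0.0 if reasons else current), reasons
```

Returning the reasons alongside the new percentage gives the on-call engineer the evidence for the rollback; with `Thresholds(800, 0.02, 0.03)`, a 5% error rate at a 25% ramp returns `0.0` plus one reason.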
References
- `references/flag_taxonomy.md`: 4 types, decision tree, ownership, lifespan
- `references/provider_comparison.md`: LaunchDarkly / GrowthBook / Statsig / Unleash / Flipt / DIY trade-offs
- `references/rollout_strategies.md`: ring / linear / log / cohort / geo, abort criteria, monitoring
- `references/flag_lifecycle.md`: request → design → ship → ramp → cleanup → archive
Slash command
`/flag-cleanup`

Asset templates
- `assets/flag_request_template.md`: fill-in form for new flag requests (name, owner, type, kill switch, rollout plan)
Anti-patterns
- Permanent `if (FLAG_FOO)` checked in 50 places: this should be a Permission flag backed by runtime config, not a Release flag
- Flag with no owner: when the original engineer leaves, no one cleans it up
- No documented kill switch: when the feature breaks, no one knows how to disable it
- A/B test that has run for 6 months: pick a winner; running indefinitely is debt
- Flags used as toggles for cosmetic changes: ship those via a normal deploy, not a flag
Verifiable success
A team using this skill should achieve:
- 100% of new flags pass `kill_switch_audit.py` at merge time
- `flag_debt_scanner.py --max-age-days 90` returns ≤5 stale flags repo-wide
- Every flag has a documented owner, type, and kill switch
- Mean time to retire a Release flag: <60 days from 100% rollout
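The two numeric targets can be checked mechanically from the scanner's stale count and a record of when each Release flag hit 100% and when it was removed. A minimal sketch; where those dates come from (flag doc, changelog, provider audit log) is an assumption, and `mean_days_to_retire` and `meets_targets` are hypothetical names.

```python
from datetime import date
from statistics import mean


def mean_days_to_retire(records) -> float:
    """records: non-empty (full_rollout_date, retired_date) pairs for retired Release flags."""
    return mean((retired - full).days for full, retired in records)


def meets_targets(stale_count: int, records, max_stale=5, max_mean_days=60) -> bool:
    """Check the stale-flag ceiling and the mean time-to-retire target."""
    return stale_count <= max_stale and mean_days_to_retire(records) < max_mean_days
```

For example, one flag retired 30 days after full rollout and another retired after 40 days give a mean of 35 days, which meets the target as long as the scanner reports at most 5 stale flags.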