caio-review


/cs:caio-review — CAIO Forcing Questions

Command:
/cs:caio-review <plan>
The eval-demanding CAIO pressure-tests any plan that involves AI. It asks six questions that must be answered before any AI feature ships, any multi-year vendor commitment is signed, or any AI team expands.

When to Run

  • Before shipping any new AI-powered feature
  • Before signing a multi-year AI vendor contract (API or self-hosted infra)
  • Before EU launch of any AI feature
  • Before a major AI team hire (especially ML engineer or research scientist)
  • Before a fine-tuning project commitment
  • Before adopting AI in a regulated domain (employment, credit, healthcare, education, etc.)
  • When the founder uses the word "AI" near "competitive advantage" or "moat"

The Six CAIO Questions

1. What does this AI need to be good at, and how would you measure it?

No eval set = no ship. Before any AI feature deploys, define the eval criteria.
  • 50-100 representative inputs minimum
  • Expected outputs OR rubric for grading
  • Edge cases: ambiguous, adversarial, format-edge
  • If you can't write down what "good" looks like, you don't have a feature; you have a vibe.
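A minimal sketch of what "no eval set = no ship" could look like in code. The names `EvalCase`, `run_eval`, and `ship_gate`, the exact-match grading, and the 0.9 threshold are illustrative assumptions, not part of the CAIO skill; a rubric-based grader could replace the string comparison.

```python
from dataclasses import dataclass
from typing import Callable

MIN_CASES = 50  # lower bound of the "50-100 representative inputs" guideline


@dataclass
class EvalCase:
    prompt: str
    expected: str          # expected output; a rubric grader could replace exact match
    kind: str = "typical"  # "typical", "ambiguous", "adversarial", or "format-edge"


def run_eval(cases: list[EvalCase], model: Callable[[str], str]) -> float:
    """Return the fraction of cases where the model output equals the expected output."""
    if len(cases) < MIN_CASES:
        raise ValueError(f"eval set has {len(cases)} cases; need at least {MIN_CASES}")
    passed = sum(1 for c in cases if model(c.prompt).strip() == c.expected.strip())
    return passed / len(cases)


def ship_gate(pass_rate: float, threshold: float = 0.9) -> bool:
    """Hypothetical gate: ship only if the pass rate meets the agreed threshold."""
    return pass_rate >= threshold
```

The point of the size check is that an undersized eval set fails loudly instead of producing a pass rate that looks like evidence.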

2. What's the SLO on hallucination / error rate, and what's the fallback?

Every AI feature has a failure mode. Plan for it.
  • Quantified SLO: "<5% hallucination on factual queries"
  • Detection mechanism: monitoring, sampling, customer feedback loop
  • Fallback: human-in-the-loop review, lower-risk default response, refuse-to-answer
  • Blast radius if SLO breached: how many users affected, what is the cost?
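The SLO, detection, and fallback bullets above can be sketched as a sampled check. This assumes reviewers label a sample of responses as hallucinated or not; the function names and the refusal message are hypothetical, and the 5% threshold is taken from the example SLO above.

```python
SLO_MAX_HALLUCINATION_RATE = 0.05  # from the "<5% hallucination on factual queries" example


def hallucination_rate(labels: list[bool]) -> float:
    """labels[i] is True when reviewed sample i was judged a hallucination."""
    if not labels:
        raise ValueError("no reviewed samples")
    return sum(labels) / len(labels)


def slo_breached(labels: list[bool], max_rate: float = SLO_MAX_HALLUCINATION_RATE) -> bool:
    """The SLO is "rate < max_rate", so a rate at or above max_rate is a breach."""
    return hallucination_rate(labels) >= max_rate


def respond(answer: str, breached: bool) -> str:
    """Hypothetical fallback: refuse to answer while the SLO is breached."""
    if breached:
        return "I can't answer that reliably right now; routing to human review."
    return answer
```

For example, 3 hallucinations in 100 reviewed samples stays within the SLO, while 8 in 100 triggers the fallback path.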

3. What's the risk tier under EU AI Act, and is conformity assessment required?

Run ai_risk_classifier.py if any EU residents are affected OR the domain is regulated.
  • PROHIBITED → cannot launch in EU; re-scope
  • HIGH → conformity assessment + EU DB registration + 10 Articles of obligations (3-12 months, $50-200K)
  • LIMITED → transparency obligations (chatbot disclosure, AI-generated content marking)
  • MINIMAL → no specific obligations; NIST AI RMF voluntary
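One way to carry the tier into the review is a small lookup, sketched below. This is an illustration of the four tiers listed above, not the output format of `ai_risk_classifier.py`; the enum, the `NEXT_STEP` table, and the helper names are assumptions.

```python
from enum import Enum


class RiskTier(Enum):
    PROHIBITED = "PROHIBITED"
    HIGH = "HIGH"
    LIMITED = "LIMITED"
    MINIMAL = "MINIMAL"


# Consequence of each tier, copied from the list above.
NEXT_STEP = {
    RiskTier.PROHIBITED: "cannot launch in EU; re-scope",
    RiskTier.HIGH: "conformity assessment + EU DB registration (3-12 months, $50-200K)",
    RiskTier.LIMITED: "transparency obligations (chatbot disclosure, AI-generated content marking)",
    RiskTier.MINIMAL: "no specific obligations; NIST AI RMF voluntary",
}


def conformity_assessment_required(tier: RiskTier) -> bool:
    """Only the HIGH tier carries a conformity assessment in the list above."""
    return tier is RiskTier.HIGH


def can_launch_in_eu(tier: RiskTier) -> bool:
    """PROHIBITED is the only tier that blocks an EU launch outright."""
    return tier is not RiskTier.PROHIBITED
```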

4. API, fine-tune, or build?

Run model_buildvsbuy_calculator.py for the specific use case.
  • 80% of B2B SaaS use cases: API
  • 15%: fine-tune (when domain-specific behavior + labeled data + ML team + high volume)
  • <1%: build from scratch
  • Decision must consider economic breakeven AND practical feasibility (data, team, compliance)
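The feasibility side of the decision can be sketched as a simple rule, assuming the four fine-tune preconditions named above must all hold. This is not the logic of `model_buildvsbuy_calculator.py`; the `UseCase` fields are hypothetical, the BUILD condition is an assumption (the model itself is the product), and the sketch ignores the economic breakeven.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    needs_domain_specific_behavior: bool
    has_labeled_data: bool
    has_ml_team: bool
    high_volume: bool
    model_is_the_product: bool = False  # assumption: the only case where BUILD is considered


def recommend(uc: UseCase) -> str:
    """Default to API; escalate only when every precondition is met."""
    if uc.model_is_the_product:
        return "BUILD"
    fine_tune_ready = all([
        uc.needs_domain_specific_behavior,
        uc.has_labeled_data,
        uc.has_ml_team,
        uc.high_volume,
    ])
    return "FINE_TUNE" if fine_tune_ready else "API"
```

Making API the default mirrors the distribution above: a plan has to argue its way out of the API path, not into it.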

5. What's the 12-month cost trajectory at expected scale?

Run ai_cost_economics.py for the workload.
  • API: variable, scales linearly
  • Self-hosted: mostly fixed, breakeven typically 1-10B tokens/month for 70B-class
  • Hidden costs of self-hosted: ops, monitoring, model updates, capacity, failover, security
  • Hidden costs of API: vendor lock-in, capability drift, rate limits, data residency
  • Prompt caching is the most underrated lever; check provider support
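The breakeven arithmetic behind the first two bullets is simple enough to sketch. It assumes API cost is purely linear in tokens and self-hosted cost is purely fixed. The prices below ($5 per million tokens, $20,000/month fixed) are made-up illustrative numbers, not vendor quotes, and this is not the model used by `ai_cost_economics.py`.

```python
def api_monthly_cost(tokens_per_month: float, usd_per_million_tokens: float) -> float:
    """Variable cost: scales linearly with token volume."""
    return tokens_per_month / 1_000_000 * usd_per_million_tokens


def breakeven_tokens_per_month(self_hosted_fixed_usd: float, usd_per_million_tokens: float) -> float:
    """Monthly volume at which the API bill equals the fixed self-hosted cost."""
    return self_hosted_fixed_usd / usd_per_million_tokens * 1_000_000


def project_api_costs(start_tokens: float, monthly_growth: float,
                      usd_per_million_tokens: float, months: int = 12) -> list[float]:
    """API cost for each month, with volume compounding at monthly_growth (0.1 = 10%)."""
    return [
        api_monthly_cost(start_tokens * (1 + monthly_growth) ** m, usd_per_million_tokens)
        for m in range(months)
    ]


# Illustrative numbers only: $20,000/month fixed vs $5 per million tokens.
breakeven = breakeven_tokens_per_month(20_000.0, 5.0)  # 4 billion tokens/month
```

With these assumed numbers the breakeven lands at 4B tokens/month, inside the 1-10B range quoted above; the hidden costs in the list would push the real breakeven higher.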

6. What role unblocks this, and have we hired the prerequisite roles first?

Map each AI capability to a specific role. Founders often confuse AI engineer, ML engineer, and research scientist.
  • AI engineer: applied + full-stack + prompts + evals + deployment (most startups need this)
  • ML engineer: fine-tuning + retraining infra (only after platform engineer + labeled data)
  • Research scientist: model invention (only if model IS the product)
  • Don't hire a research scientist as the first AI hire; they need infrastructure to be productive

Workflow


1. Model selection check

python ../../../skills/chief-ai-officer-advisor/scripts/model_buildvsbuy_calculator.py use_case.json

2. Regulatory classification

python ../../../skills/chief-ai-officer-advisor/scripts/ai_risk_classifier.py use_case.json

3. Cost projection

python ../../../skills/chief-ai-officer-advisor/scripts/ai_cost_economics.py workload.json

Output Format


CAIO Review: <plan>

Date: YYYY-MM-DD

The Decision Being Made

[one sentence — which CAIO decision: model selection | risk classification | economics | next hire]

Eval Discipline

  • Eval set committed: yes/no
  • SLO defined: <metric> < <threshold>
  • Fallback behavior: <one line>

Model Selection (if applicable)

  • Recommended: API / FINE_TUNE / BUILD
  • 3-year TCO: $X (chosen path) vs $Y (alternatives)
  • Breakeven: <volume>

Risk Classification (if applicable)

  • EU AI Act tier: PROHIBITED / HIGH / LIMITED / MINIMAL
  • Conformity assessment required: yes/no
  • US state triggers: [list]
  • Required controls open: N

Cost Economics (if applicable)

  • Monthly cost at current volume: $X
  • Breakeven for self-hosted migration: <volume>
  • Migration cost if applicable: $X (3-6 months)

Org (if applicable)

  • Next hire: <role>
  • Why this, not the alternative: <one line>
  • Prerequisite hires in place: yes/no

Verdict

🟢 SHIP | 🟡 SHARPEN | 🔴 BLOCK

Next Steps

[3 concrete actions]

Routing

  • /cs:cdo-review: for any training-data implications
  • /cs:gc-review: for AI vendor contracts, output liability, training-data licensing
  • /cs:ciso-review: for prompt injection / jailbreak / training-data poisoning threat model
  • /cs:cfo-review: for multi-year vendor or GPU commitment TCO
  • /cs:chro-review: for AI team hires (comp, ladder, leveling)
  • /cs:decide: log the verdict
  • /cs:freeze 60: on multi-year AI commitments

Related

  • Agent: cs-caio-advisor
  • Skill: chief-ai-officer-advisor
  • Adjacent: ../../../skills/chief-data-officer-advisor/ (training data rights, data strategy)

Version: 1.0.0