caio-review
/cs:caio-review: CAIO Forcing Questions
Command: /cs:caio-review <plan>

The eval-demanding CAIO pressure-tests any plan that involves AI. It asks six questions before any AI feature ships, any multi-year vendor commitment is signed, or any AI team expands.
When to Run
- Before shipping any new AI-powered feature
- Before signing a multi-year AI vendor contract (API or self-hosted infra)
- Before EU launch of any AI feature
- Before a major AI team hire (especially ML engineer or research scientist)
- Before a fine-tuning project commitment
- Before adopting AI in a regulated domain (employment, credit, healthcare, education, etc.)
- When the founder uses the word "AI" near "competitive advantage" or "moat"
The Six CAIO Questions
1. What does this AI need to be good at, and how would you measure it?
No eval set = no ship. Before any AI feature deploys, define the eval criteria.
- 50-100 representative inputs minimum
- Expected outputs OR rubric for grading
- Edge cases: ambiguous, adversarial, format-edge
- If you can't write down what "good" looks like, you don't have a feature; you have a vibe.
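A minimal sketch of what a committed eval set could look like in code, assuming exact-match grading and a model exposed as a plain callable. The names `EvalCase`, `run_eval`, and `exact_match` are hypothetical and are not part of the skill's scripts; a rubric-based grader can be passed in place of `exact_match`.

```python
from dataclasses import dataclass
from typing import Callable

MIN_CASES = 50  # lower bound taken from the "50-100 representative inputs" guideline


@dataclass
class EvalCase:
    prompt: str
    expected: str          # expected output; a rubric grader could ignore this field
    kind: str = "typical"  # "typical", "ambiguous", "adversarial", or "format-edge"


def exact_match(output: str, expected: str) -> bool:
    """Simplest possible grader (an assumption); swap in a rubric grader as needed."""
    return output.strip().lower() == expected.strip().lower()


def run_eval(cases, model: Callable[[str], str], grader=exact_match) -> dict:
    """Return the pass rate overall and per case kind, plus a size check."""
    totals = {}
    passed = {}
    for case in cases:
        totals[case.kind] = totals.get(case.kind, 0) + 1
        if grader(model(case.prompt), case.expected):
            passed[case.kind] = passed.get(case.kind, 0) + 1
    by_kind = {kind: passed.get(kind, 0) / n for kind, n in totals.items()}
    overall = sum(passed.values()) / len(cases) if cases else 0.0
    return {
        "overall": overall,
        "by_kind": by_kind,
        "ready": len(cases) >= MIN_CASES,  # too few cases means the set is not ship-ready
    }
```

Reporting the pass rate per `kind` keeps edge-case regressions visible even when the overall number looks healthy.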
2. What's the SLO on hallucination / error rate, and what's the fallback?
Every AI feature has a failure mode. Plan for it.
- Quantified SLO: "<5% hallucination on factual queries"
- Detection mechanism: monitoring, sampling, customer feedback loop
- Fallback: human-in-loop review, lower-risk default response, refuse-to-answer
- Blast radius if SLO breached: how many users affected, what is the cost?
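One way to wire the SLO to a fallback is sketched below, assuming error rates come from a reviewed sample of responses. The `SloCheck` class and the escalation rule (human review up to twice the threshold, refusal beyond that) are illustrative assumptions, not something the skill prescribes.

```python
from dataclasses import dataclass


@dataclass
class SloCheck:
    threshold: float  # e.g. 0.05 for "<5% hallucination on factual queries"

    def error_rate(self, flagged: int, sampled: int) -> float:
        """Share of sampled responses that reviewers flagged as wrong."""
        if sampled <= 0:
            raise ValueError("need at least one sampled response")
        return flagged / sampled

    def action(self, flagged: int, sampled: int) -> str:
        """Pick a serving mode from the observed error rate.

        The 2x escalation band is a hypothetical policy chosen for illustration.
        """
        rate = self.error_rate(flagged, sampled)
        if rate < self.threshold:
            return "serve"          # within SLO
        if rate < 2 * self.threshold:
            return "human_review"   # breached: route through human-in-loop review
        return "refuse"             # badly breached: refuse-to-answer default
```

For example, `SloCheck(threshold=0.05).action(flagged=7, sampled=100)` selects human review, because the observed 7% rate breaches the 5% target but stays under the assumed escalation band.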
3. What's the risk tier under EU AI Act, and is conformity assessment required?
Run ai_risk_classifier.py if any EU residents are affected OR the domain is regulated.
- PROHIBITED → cannot launch in EU; re-scope
- HIGH → conformity assessment + EU DB registration + 10 Articles of obligations (3-12 months, $50-200K)
- LIMITED → transparency obligations (chatbot disclosure, AI-generated content marking)
- MINIMAL → no specific obligations; NIST AI RMF voluntary
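The tier list above can be treated as a lookup that gates the review. The sketch below assumes the tier string is supplied by the caller (for example, copied from the classifier's result); `OBLIGATIONS` and `review_gate` are hypothetical names, and this does not reproduce the output format of ai_risk_classifier.py.

```python
# Obligations per tier, transcribed from the list above.
OBLIGATIONS = {
    "PROHIBITED": ["cannot launch in EU; re-scope"],
    "HIGH": [
        "conformity assessment",
        "EU DB registration",
        "10 Articles of obligations",
    ],
    "LIMITED": ["chatbot disclosure", "AI-generated content marking"],
    "MINIMAL": [],  # no specific obligations; NIST AI RMF is voluntary
}


def review_gate(tier: str) -> dict:
    """Summarize what a given EU AI Act tier means for the launch decision."""
    tier = tier.upper()
    if tier not in OBLIGATIONS:
        raise ValueError(f"unknown tier: {tier!r}")
    return {
        "tier": tier,
        "can_launch_in_eu": tier != "PROHIBITED",
        "conformity_assessment_required": tier == "HIGH",
        "obligations": OBLIGATIONS[tier],
    }
```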
4. API, fine-tune, or build?
Run model_buildvsbuy_calculator.py for the specific use case.
- 80% of B2B SaaS use cases: API
- 15%: fine-tune (when domain-specific behavior + labeled data + ML team + high volume)
- <1%: build from scratch
- Decision must consider economic breakeven AND practical feasibility (data, team, compliance)
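The feasibility half of this decision can be sketched as a rule: fine-tuning qualifies only when all four conditions listed above hold. The `UseCase` fields and the `recommend` function are hypothetical, the `model_is_product` trigger for BUILD is an assumption borrowed from the hiring guidance, and the economic breakeven is deliberately left out of this sketch.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    needs_domain_behavior: bool
    has_labeled_data: bool
    has_ml_team: bool
    high_volume: bool
    model_is_product: bool = False  # assumption: the only trigger for BUILD here


def recommend(use_case: UseCase) -> str:
    """Return API, FINE_TUNE, or BUILD from practical feasibility alone."""
    if use_case.model_is_product:
        return "BUILD"
    fine_tune_ready = all([
        use_case.needs_domain_behavior,
        use_case.has_labeled_data,
        use_case.has_ml_team,
        use_case.high_volume,
    ])
    # Any missing prerequisite falls back to the default path: API.
    return "FINE_TUNE" if fine_tune_ready else "API"
```

Defaulting to API when any prerequisite is missing mirrors the guidance that most B2B SaaS use cases belong on an API.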
5. What's the 12-month cost trajectory at expected scale?
Run ai_cost_economics.py for the workload.
- API: variable, scales linearly
- Self-hosted: mostly fixed, breakeven typically 1-10B tokens/month for 70B-class
- Hidden costs of self-hosted: ops, monitoring, model updates, capacity, failover, security
- Hidden costs of API: vendor lock-in, capability drift, rate limits, data residency
- Prompt caching is the most underrated lever; check provider support
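The breakeven arithmetic behind these bullets can be sketched as follows, under a deliberately simplified model: API cost is linear in tokens, and self-hosted cost is treated as entirely fixed (the text says "mostly fixed"). Function names and all prices are hypothetical placeholders, and this is not the logic of ai_cost_economics.py.

```python
def api_monthly_cost(tokens: float, price_per_million: float) -> float:
    """Linear API cost: tokens processed times the blended price per 1M tokens."""
    return tokens / 1_000_000 * price_per_million


def breakeven_tokens(self_hosted_fixed_monthly: float, price_per_million: float) -> float:
    """Monthly token volume at which API spend equals the fixed self-hosted cost."""
    if price_per_million <= 0:
        raise ValueError("price_per_million must be positive")
    return self_hosted_fixed_monthly / price_per_million * 1_000_000


def trajectory(start_tokens: float, monthly_growth: float,
               price_per_million: float, months: int = 12) -> list:
    """API cost for each month, assuming compound growth in token volume."""
    return [
        api_monthly_cost(start_tokens * (1 + monthly_growth) ** month, price_per_million)
        for month in range(months)
    ]
```

With placeholder inputs of $20,000 per month in fixed self-hosting cost and $5 per million tokens, `breakeven_tokens(20_000, 5.0)` gives 4 billion tokens per month. Ops, monitoring, and migration costs would push the real breakeven higher.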
6. What role unblocks this — and have we hired prerequisites first?
Map AI capability to specific role. Founders confuse AI engineer / ML engineer / research scientist.
- AI engineer: applied + full-stack + prompts + evals + deployment (most startups need this)
- ML engineer: fine-tuning + retraining infra (only after platform engineer + labeled data)
- Research scientist: model invention (only if model IS the product)
- Don't hire research scientist as first AI hire — they need infrastructure to be productive
Workflow

    # 1. Model selection check
    python ../../../skills/chief-ai-officer-advisor/scripts/model_buildvsbuy_calculator.py use_case.json

    # 2. Regulatory classification
    python ../../../skills/chief-ai-officer-advisor/scripts/ai_risk_classifier.py use_case.json

    # 3. Cost projection
    python ../../../skills/chief-ai-officer-advisor/scripts/ai_cost_economics.py workload.json

Output Format

CAIO Review: <plan>
Date: YYYY-MM-DD

The Decision Being Made
[one sentence stating which CAIO decision: model selection | risk classification | economics | next hire]

Eval Discipline
- Eval set committed: yes/no
- SLO defined: <metric> < <threshold>
- Fallback behavior: <one line>

Model Selection (if applicable)
- Recommended: API / FINE_TUNE / BUILD
- 3-year TCO: $X (chosen path) vs $Y (alternatives)
- Breakeven: <volume>

Risk Classification (if applicable)
- EU AI Act tier: PROHIBITED / HIGH / LIMITED / MINIMAL
- Conformity assessment required: yes/no
- US state triggers: [list]
- Required controls open: N

Cost Economics (if applicable)
- Monthly cost at current volume: $X
- Breakeven for self-hosted migration: <volume>
- Migration cost if applicable: $X (3-6 months)

Org (if applicable)
- Next hire: <role>
- Why this, not the alternative: <one line>
- Prerequisite hires in place: yes/no

Verdict
🟢 SHIP | 🟡 SHARPEN | 🔴 BLOCK

Next Steps
[3 concrete actions]

Routing
- /cs:cdo-review: for any training-data implications
- /cs:gc-review: for AI vendor contracts, output liability, training-data licensing
- /cs:ciso-review: for prompt injection / jailbreak / training-data poisoning threat model
- /cs:cfo-review: for multi-year vendor or GPU commitment TCO
- /cs:chro-review: for AI team hires (comp, ladder, leveling)
- /cs:decide: log the verdict
- /cs:freeze 60: on multi-year AI commitments
Related
- Agent: cs-caio-advisor
- Skill: chief-ai-officer-advisor
- Adjacent: ../../../skills/chief-data-officer-advisor/ (training data rights, data strategy)

Version: 1.0.0