caio-review

/cs:caio-review — CAIO Forcing Questions

Command:

/cs:caio-review <plan>

The eval-demanding CAIO pressure-tests any plan that involves AI. It asks six questions before any AI feature ships, any multi-year vendor commitment is signed, or any AI team expands.

When to Run

Before shipping any new AI-powered feature
Before signing a multi-year AI vendor contract (API or self-hosted infra)
Before EU launch of any AI feature
Before a major AI team hire (especially ML engineer or research scientist)
Before a fine-tuning project commitment
Before adopting AI in a regulated domain (employment, credit, healthcare, education, etc.)
When the founder uses the word "AI" near "competitive advantage" or "moat"

The Six CAIO Questions

1. What does this AI need to be good at, and how would you measure it?

No eval set = no ship. Before any AI feature deploys, define the eval criteria.

50-100 representative inputs minimum
Expected outputs OR rubric for grading
Edge cases: ambiguous, adversarial, format-edge
If you can't write down what "good" looks like, you don't have a feature; you have a vibe.
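
The checklist above can be sketched as a small harness. This is a minimal illustration, not part of the skill's scripts: `EvalCase`, `run_eval`, the `kind` labels, and the `model` and `grade` callables are all hypothetical names chosen for the example, and the 50-case floor simply mirrors the minimum stated above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class EvalCase:
    prompt: str
    expected: str
    # Hypothetical labels: "representative", "ambiguous", "adversarial", "format-edge"
    kind: str = "representative"


def run_eval(
    cases: List[EvalCase],
    model: Callable[[str], str],
    grade: Callable[[str, str], bool],
    min_cases: int = 50,
) -> Dict[str, float]:
    """Return the pass rate per case kind; refuse to run on an undersized eval set."""
    if len(cases) < min_cases:
        raise ValueError(f"eval set has {len(cases)} cases; need at least {min_cases}")
    totals: Dict[str, int] = {}
    passed: Dict[str, int] = {}
    for case in cases:
        totals[case.kind] = totals.get(case.kind, 0) + 1
        # grade() stands in for either an exact expected-output match or a rubric.
        if grade(model(case.prompt), case.expected):
            passed[case.kind] = passed.get(case.kind, 0) + 1
    return {kind: passed.get(kind, 0) / n for kind, n in totals.items()}
```

Reporting the pass rate per kind, rather than one blended number, keeps a weak adversarial or format-edge score from hiding behind a strong representative one.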

2. What's the SLO on hallucination / error rate, and what's the fallback?

Every AI feature has a failure mode. Plan for it.

Quantified SLO: "<5% hallucination on factual queries"
Detection mechanism: monitoring, sampling, customer feedback loop
Fallback: human-in-loop review, lower-risk default response, refuse-to-answer
Blast radius if SLO breached: how many users affected, what is the cost?
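
One way to wire the SLO, detection, and fallback items together is sketched below. It assumes a sampled set of responses that reviewers have labeled as hallucinated or not, and a per-response confidence score; both are assumptions for illustration, as are the function names and the 0.7 confidence floor. The 5% threshold is taken from the example SLO above.

```python
from typing import List

SLO_MAX_HALLUCINATION_RATE = 0.05  # from the example SLO: "<5% hallucination on factual queries"


def hallucination_rate(labels: List[bool]) -> float:
    """labels[i] is True when sampled response i was judged a hallucination."""
    if not labels:
        raise ValueError("no sampled responses to evaluate")
    return sum(labels) / len(labels)


def slo_breached(labels: List[bool], threshold: float = SLO_MAX_HALLUCINATION_RATE) -> bool:
    """The SLO requires the rate to stay strictly below the threshold."""
    return hallucination_rate(labels) >= threshold


def choose_response(answer: str, confidence: float, breached: bool,
                    min_confidence: float = 0.7) -> str:
    """Pick a fallback: human review while the SLO is breached, refusal on low confidence."""
    if breached:
        return "ESCALATE_TO_HUMAN"  # human-in-loop review
    if confidence < min_confidence:
        return "I can't answer that reliably."  # refuse-to-answer
    return answer
```

The sample size behind `labels` matters: a 5% threshold checked on 20 responses moves in steps of 5 points, so a larger sample gives a more stable signal.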

3. What's the risk tier under EU AI Act, and is conformity assessment required?

Run `ai_risk_classifier.py` if any EU residents are affected OR domain is regulated.

PROHIBITED → cannot launch in EU; re-scope
HIGH → conformity assessment + EU DB registration + 10 Articles of obligations (3-12 months, $50-200K)
LIMITED → transparency obligations (chatbot disclosure, AI-generated content marking)
MINIMAL → no specific obligations; NIST AI RMF voluntary
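
The tier list above can be read as a lookup from tier to launch gate. The sketch below is only that lookup; it is not the output format of `ai_risk_classifier.py`, which this document does not show. `TIER_OBLIGATIONS` and `review_gate` are hypothetical names, and the obligation strings are copied from the list above.

```python
from typing import Dict, List

# Hypothetical lookup table; entries restate the tier list in this section.
TIER_OBLIGATIONS: Dict[str, List[str]] = {
    "PROHIBITED": ["cannot launch in EU; re-scope"],
    "HIGH": ["conformity assessment", "EU DB registration", "10 Articles of obligations"],
    "LIMITED": ["chatbot disclosure", "AI-generated content marking"],
    "MINIMAL": [],  # no specific obligations; NIST AI RMF is voluntary
}


def review_gate(tier: str) -> dict:
    """Translate a risk tier into the fields the review template asks for."""
    tier = tier.upper()
    if tier not in TIER_OBLIGATIONS:
        raise ValueError(f"unknown tier: {tier}")
    return {
        "tier": tier,
        "eu_launch_allowed": tier != "PROHIBITED",
        "conformity_assessment_required": tier == "HIGH",
        "obligations": TIER_OBLIGATIONS[tier],
    }
```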

4. API, fine-tune, or build?

Run `model_buildvsbuy_calculator.py` for the specific use case.

80% of B2B SaaS use cases: API
15%: fine-tune (when domain-specific behavior + labeled data + ML team + high volume)
<1%: build from scratch
Decision must consider economic breakeven AND practical feasibility (data, team, compliance)
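
The fine-tune condition above is a conjunction: all four prerequisites must hold. A minimal sketch of that feasibility check follows. It is not the logic of `model_buildvsbuy_calculator.py`, and it deliberately leaves out both the economic breakeven and the BUILD path, since this section gives no criteria for building from scratch. The prerequisite keys and `recommend_path` are hypothetical names.

```python
from typing import Dict, List, Tuple

# The four conditions listed for fine-tuning, as hypothetical dictionary keys.
FINE_TUNE_PREREQS = ("domain_specific_behavior", "labeled_data", "ml_team", "high_volume")


def recommend_path(facts: Dict[str, bool]) -> Tuple[str, List[str]]:
    """Return ("FINE_TUNE", []) only when every prerequisite holds;
    otherwise return ("API", <missing prerequisites>)."""
    missing = [name for name in FINE_TUNE_PREREQS if not facts.get(name, False)]
    if not missing:
        return ("FINE_TUNE", [])
    return ("API", missing)
```

Returning the missing prerequisites makes the recommendation actionable: it shows what would have to change before fine-tuning becomes feasible.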

5. What's the 12-month cost trajectory at expected scale?

Run `ai_cost_economics.py` for the workload.

API: variable, scales linearly
Self-hosted: mostly fixed, breakeven typically 1-10B tokens/month for 70B-class
Hidden costs of self-hosted: ops, monitoring, model updates, capacity, failover, security
Hidden costs of API: vendor lock-in, capability drift, rate limits, data residency
Prompt caching is the most underrated lever; check provider support
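
The linear-versus-fixed comparison above reduces to simple arithmetic, sketched here. This is not `ai_cost_economics.py`: the function names are hypothetical, the prices in the usage note are made-up illustration values, and the model treats self-hosting as a purely fixed monthly cost, ignoring the hidden costs listed above.

```python
from typing import List


def api_monthly_cost(tokens: float, price_per_million: float) -> float:
    """API cost scales linearly with token volume."""
    return tokens / 1_000_000 * price_per_million


def breakeven_tokens(self_hosted_fixed_monthly: float, price_per_million: float) -> float:
    """Monthly token volume at which API spend equals the fixed self-hosted cost."""
    return self_hosted_fixed_monthly / price_per_million * 1_000_000


def cost_trajectory(start_tokens: float, monthly_growth: float,
                    price_per_million: float, months: int = 12) -> List[float]:
    """Projected API cost per month, assuming compound monthly volume growth."""
    return [
        api_monthly_cost(start_tokens * (1 + monthly_growth) ** m, price_per_million)
        for m in range(months)
    ]
```

With an assumed blended price of $2 per million tokens and an assumed $10,000 per month of fixed self-hosted cost, `breakeven_tokens(10_000, 2.0)` is 5 billion tokens per month, which sits inside the 1-10B range quoted above. Real breakevens shift once ops, failover, and migration costs are added.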

6. What role unblocks this — and have we hired prerequisites first?

Map AI capability to specific role. Founders confuse AI engineer / ML engineer / research scientist.

AI engineer: applied + full-stack + prompts + evals + deployment (most startups need this)
ML engineer: fine-tuning + retraining infra (only after platform engineer + labeled data)
Research scientist: model invention (only if model IS the product)
Don't hire research scientist as first AI hire — they need infrastructure to be productive

Workflow

1. Model selection check

python ../../../skills/chief-ai-officer-advisor/scripts/model_buildvsbuy_calculator.py use_case.json

2. Regulatory classification

python ../../../skills/chief-ai-officer-advisor/scripts/ai_risk_classifier.py use_case.json

3. Cost projection

python ../../../skills/chief-ai-officer-advisor/scripts/ai_cost_economics.py workload.json

Output Format

CAIO Review: <plan>

Date: YYYY-MM-DD

The Decision Being Made

[one sentence — which CAIO decision: model selection | risk classification | economics | next hire]

Eval Discipline

Eval set committed: yes/no
SLO defined: <metric> < <threshold>
Fallback behavior: <one line>

Model Selection (if applicable)

Recommended: API / FINE_TUNE / BUILD
3-year TCO: $X (chosen path) vs $Y (alternatives)
Breakeven: <volume>

Risk Classification (if applicable)

EU AI Act tier: PROHIBITED / HIGH / LIMITED / MINIMAL
Conformity assessment required: yes/no
US state triggers: [list]
Required controls open: N

Cost Economics (if applicable)

Monthly cost at current volume: $X
Breakeven for self-hosted migration: <volume>
Migration cost if applicable: $X (3-6 months)

Org (if applicable)

Next hire: <role>
Why this, not the alternative: <one line>
Prerequisite hires in place: yes/no

Verdict

🟢 SHIP | 🟡 SHARPEN | 🔴 BLOCK

Next Steps

[3 concrete actions]

Routing

`/cs:cdo-review`: for any training-data implications
`/cs:gc-review`: for AI vendor contracts, output liability, training-data licensing
`/cs:ciso-review`: for prompt injection / jailbreak / training-data poisoning threat model
`/cs:cfo-review`: for multi-year vendor or GPU commitment TCO
`/cs:chro-review`: for AI team hires (comp, ladder, leveling)
`/cs:decide`: log the verdict
`/cs:freeze 60`: on multi-year AI commitments
