clinical-research


Prospective clinical study DESIGN: endpoints, sample size / power, and phase-gate feasibility. Every output is an estimate with stated assumptions routed to a named human owner. This skill never gives clinical advice as fact and never substitutes for a biostatistician or regulatory affairs.

Purpose

R&D clinical teams, medical monitors, and biostatistics functions work in the gap between "we have a hypothesis" and "we have a protocol ready for submission." This skill structures three of the hardest design decisions in that gap, each with its own deterministic tool:
  1. `sample_size_estimator.py`: Closed-form power / sample-size for two-arm means (Cohen's d), proportions (normal approximation), and survival (Schoenfeld events). Inflates for dropout. Prints an "ESTIMATE: confirm with a biostatistician" banner.
  2. `endpoint_selector.py`: Scores candidate endpoints across 5 weighted dimensions (clinical relevance, measurability, regulatory acceptance, sensitivity-to-change, burden) and classifies each as PRIMARY / KEY-SECONDARY / EXPLORATORY. Penalizes unvalidated surrogate endpoints.
  3. `phase_gate_scorer.py`: Scores a study plan 0-100 across recruitment feasibility, endpoint readiness, statistical power, operational complexity, and budget fit; returns GO / GO-WITH-CONDITIONS / REDESIGN / NO-GO plus the named owners who must sign.
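The closed-form estimates named in tool 1 can be sketched as follows. This is a minimal illustration, not the script's actual code: the function names are hypothetical, it assumes a two-sided test with 1:1 allocation, and it uses `statistics.NormalDist` where the tool uses its own built-in z-table.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # assumption: stands in for the tool's built-in z-table


def _z_sum(alpha: float, power: float) -> float:
    """Return z_(1 - alpha/2) + z_(power) for a two-sided test."""
    return _STD_NORMAL.inv_cdf(1 - alpha / 2) + _STD_NORMAL.inv_cdf(power)


def n_per_arm_means(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-arm n for a two-arm comparison of means with standardized effect d."""
    return math.ceil(2 * (_z_sum(alpha, power) / d) ** 2)


def n_per_arm_proportions(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-arm n for two proportions, unpooled normal approximation."""
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil(_z_sum(alpha, power) ** 2 * variance / (p1 - p2) ** 2)


def events_schoenfeld(hazard_ratio: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Total events required under Schoenfeld's formula with 1:1 allocation."""
    return math.ceil(4 * _z_sum(alpha, power) ** 2 / math.log(hazard_ratio) ** 2)


def inflate_for_dropout(n: int, dropout: float) -> int:
    """Inflate an evaluable-patient count by 1 / (1 - dropout)."""
    return math.ceil(n / (1 - dropout))


if __name__ == "__main__":
    print("ESTIMATE: confirm with a biostatistician")
    print("means, d=0.5:", n_per_arm_means(0.5), "per arm")
    print("survival, HR=0.7:", events_schoenfeld(0.7), "events")
```

Note that the Schoenfeld formula returns a number of events, not patients; converting events to enrollment needs an assumed event rate and follow-up, which is one reason the final justification belongs to a biostatistician.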

When to use

Invoke this skill when:
  • You are choosing a primary endpoint and need to defend it against surrogate-endpoint scrutiny.
  • You need a defensible first sample-size estimate for a protocol synopsis.
  • A study plan needs a feasibility read before a phase-gate review.
  • You are pressure-testing whether the planned enrollment is achievable given the eligible population and sites.
Do NOT use this skill to: prepare a regulatory submission or clinical evaluation report (use `ra-qm-team`), find or position a grant (use `research/grants`), design a live product A/B experiment (use `product-team/experiment-designer`), or replace a biostatistician's final sample-size justification.

Workflow

  1. Draft the synopsis. Fill `assets/protocol_synopsis_template.md` (objectives, design, population, endpoints, statistical plan placeholder, owners-to-sign).
  2. Select the endpoint. Run `endpoint_selector.py --input endpoints.json --profile {drug|device|biologic|diagnostic|digital-therapeutic}`. Read the classification and the surrogate flags. If more than one endpoint is classified PRIMARY, plan multiplicity control.
  3. Estimate the sample size. Run `sample_size_estimator.py --design {means|proportions|survival} ...`. Trace the effect size, difference, or hazard ratio to a published or anchor-based source; inflate for dropout.
  4. Score feasibility. Run `phase_gate_scorer.py --input study.json --profile <same> --phase {1|2|3|4}`, using the same profile as in step 2. Read the verdict, the blockers, and the named owners.
  5. Route for sign-off. Assemble the synopsis and estimates into the gate packet. The packet is a recommendation; a biostatistician, medical monitor, and regulatory owner sign.
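Step 4 reduces five dimension scores to one 0-100 composite and a verdict. The sketch below shows one way such a mapping could work. The weights, verdict bands, and function names are hypothetical placeholders chosen for illustration; the real values live in `phase_gate_scorer.py` and may differ by profile and phase.

```python
# Hypothetical weights (sum to 1.0); not taken from phase_gate_scorer.py.
WEIGHTS = {
    "recruitment_feasibility": 0.25,
    "endpoint_readiness": 0.20,
    "statistical_power": 0.25,
    "operational_complexity": 0.15,
    "budget_fit": 0.15,
}

# Hypothetical verdict bands as (minimum composite, label), highest first.
BANDS = [
    (80.0, "GO"),
    (65.0, "GO-WITH-CONDITIONS"),
    (45.0, "REDESIGN"),
    (0.0, "NO-GO"),
]


def composite(scores: dict[str, float]) -> float:
    """Weighted 0-100 composite of the five dimension scores (each 0-100)."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing dimension scores: {sorted(missing)}")
    return round(sum(WEIGHTS[name] * scores[name] for name in WEIGHTS), 1)


def verdict(score: float) -> str:
    """Map a composite score to the first band whose floor it meets."""
    for floor, label in BANDS:
        if score >= floor:
            return label
    return "NO-GO"


if __name__ == "__main__":
    plan = {
        "recruitment_feasibility": 70,
        "endpoint_readiness": 90,
        "statistical_power": 80,
        "operational_complexity": 60,
        "budget_fit": 50,
    }
    score = composite(plan)
    print(f"feasibility composite: {score} -> {verdict(score)}")
```

Whatever the real bands are, the verdict stays a recommendation: the named owners still sign the gate packet.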

Scripts

| Script | Purpose | Profiles |
| --- | --- | --- |
| `scripts/sample_size_estimator.py` | Power / sample-size for means, proportions, survival | n/a (design-driven) |
| `scripts/endpoint_selector.py` | 5-dimension endpoint scoring + classification + surrogate flag | drug, device, biologic, diagnostic, digital-therapeutic |
| `scripts/phase_gate_scorer.py` | Feasibility 0-100 + GO/GO-WITH-CONDITIONS/REDESIGN/NO-GO + owners | drug, device, biologic, diagnostic, digital-therapeutic |

All three are stdlib-only and support `--help`, `--sample`, and `--output {human,json}`.

Onboarding & customization

Run the onboarding questionnaire once before you start — it captures your defaults and named owners so every tool in this skill is pre-configured. Customization is the point: the answers actually change tool behavior.
```bash
python3 scripts/onboard.py            # interactive (also: --defaults, --set key=value, --reset)
python3 scripts/onboard.py --show     # see the questions + current effective config
```
Answers are saved to `~/.config/research-ops/clinical-research.json` (global) or `./.research-ops/clinical-research.json` (`--scope project`) and are read automatically by `config_loader.py`. They set the default development-area profile, default alpha / power / dropout, and the named biostatistician / medical monitor / regulatory owner printed on outputs. CLI flags always override saved config; setting `RESEARCH_OPS_NO_CONFIG=1` ignores saved config entirely.
The seven questions: development area · alpha · power · dropout · biostatistician · medical monitor · regulatory owner.
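The override order described above can be sketched as a layered merge. This is an illustration rather than the contents of `config_loader.py`: the function names, key names, and default values are hypothetical, and the choice to let project-scope answers win over global ones is an assumption the text does not state.

```python
import json
import os
from pathlib import Path

# Hypothetical built-in defaults; the real ones come from the onboarding answers.
DEFAULTS = {"alpha": 0.05, "power": 0.80, "dropout": 0.10}


def load_json(path: Path) -> dict:
    """Return the parsed JSON file, or an empty dict if absent or unreadable."""
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return {}


def effective_config(cli_overrides: dict, home=None, cwd=None, env=None) -> dict:
    """Merge defaults < global file < project file < CLI flags."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    cwd = Path.cwd() if cwd is None else Path(cwd)

    config = dict(DEFAULTS)
    if env.get("RESEARCH_OPS_NO_CONFIG") != "1":
        config.update(load_json(home / ".config/research-ops/clinical-research.json"))
        # Assumption: project scope overrides global scope.
        config.update(load_json(cwd / ".research-ops/clinical-research.json"))
    # CLI flags always win; None means the flag was not passed.
    config.update({k: v for k, v in cli_overrides.items() if v is not None})
    return config


if __name__ == "__main__":
    print(effective_config({"dropout": 0.15}))
```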

Optimize with autoresearch (opt-in)

This skill ships an isolated, opt-in bridge to `engineering/autoresearch-agent`. Only when you ask to "optimize" or "run a loop" does an autoresearch experiment iteratively improve a study plan against this skill's own feasibility score. `scripts/ar_evaluator.py` is the ground-truth evaluator; it prints `feasibility_composite: <0-100>` (higher is better).
```bash
/ar:setup --domain custom --name trial-feasibility \
  --target study.json \
  --eval "python3 ar_evaluator.py --target study.json" \
  --metric feasibility_composite --direction higher
/ar:loop custom/trial-feasibility
```
Isolated: there is no hard dependency. autoresearch runs only on demand, and the loop edits `study.json`, never the evaluator (locked ground truth).
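The only contract between the loop and the evaluator is the printed metric line. A consumer of that output could read it as in the sketch below; `parse_feasibility` is a hypothetical helper, not part of autoresearch or this skill, and it assumes the line appears on its own with a plain decimal value.

```python
import re

# Matches a line such as "feasibility_composite: 72.5" anywhere in the output.
METRIC_LINE = re.compile(r"^feasibility_composite:\s*(\d+(?:\.\d+)?)\s*$", re.MULTILINE)


def parse_feasibility(stdout: str) -> float:
    """Extract the 0-100 feasibility composite from evaluator output."""
    match = METRIC_LINE.search(stdout)
    if match is None:
        raise ValueError("no feasibility_composite line found")
    value = float(match.group(1))
    if not 0 <= value <= 100:
        raise ValueError(f"feasibility_composite out of range: {value}")
    return value


if __name__ == "__main__":
    print(parse_feasibility("Verdict: GO-WITH-CONDITIONS\nfeasibility_composite: 72.5\n"))
```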

References

  • `references/study_design_canon.md`: ICH E8(R1) general considerations; ICH E9 + E9(R1) estimand addendum; CONSORT 2010; SPIRIT 2013; FDA Multiple Endpoints guidance (2022).
  • `references/endpoint_and_power.md`: Cohen, Statistical Power Analysis; Schoenfeld (1983) survival sample size; FDA Surrogate Endpoint Table / BEST glossary; FDA PRO guidance (2009); Chow, Shao & Wang, Sample Size Calculations in Clinical Research.
  • `references/trial_operations.md`: ICH E6(R2/R3) GCP; TransCelerate risk-based monitoring; FDA RBM guidance; CTTI recruitment best practices; site-feasibility scoring literature.

Assumptions

  • Sample-size formulas use normal approximations with a built-in z-table. They are first-pass estimates; a biostatistician produces the final justification (and may use simulation, adaptive designs, or exact methods).
  • The endpoint scorer applies customary regulatory priors per development area via `--profile`. Company- or indication-specific precedent overrides the prior.
  • The phase-gate scorer bakes in a profile cost-per-patient benchmark; pass a real budget to override the default.
  • An unvalidated surrogate cannot anchor a PRIMARY endpoint — the scorer enforces this with a penalty.
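The surrogate rule in the last assumption can be illustrated with a small weighted scorer. Everything numeric here is hypothetical: the weights, the penalty size, the classification floors, and the example scores are placeholders, not values from `endpoint_selector.py`. The explicit guard that blocks PRIMARY is also an assumption added for clarity; the text says only that a penalty enforces the rule. Each dimension is scored 0-100, with a higher burden score meaning less burden.

```python
# Hypothetical dimension weights (sum to 1.0).
WEIGHTS = {
    "clinical_relevance": 0.30,
    "measurability": 0.20,
    "regulatory_acceptance": 0.25,
    "sensitivity_to_change": 0.15,
    "burden": 0.10,
}
SURROGATE_PENALTY = 20.0    # hypothetical deduction for an unvalidated surrogate
PRIMARY_FLOOR = 75.0        # hypothetical classification floors
KEY_SECONDARY_FLOOR = 55.0


def classify(endpoint: dict) -> tuple[float, str]:
    """Return (score, class) for one candidate endpoint."""
    score = sum(WEIGHTS[name] * endpoint[name] for name in WEIGHTS)
    unvalidated = endpoint.get("surrogate", False) and not endpoint.get("validated", False)
    if unvalidated:
        score = max(score - SURROGATE_PENALTY, 0.0)
    score = round(score, 2)
    if score >= PRIMARY_FLOOR and not unvalidated:
        return score, "PRIMARY"
    if score >= KEY_SECONDARY_FLOOR:
        return score, "KEY-SECONDARY"
    return score, "EXPLORATORY"


if __name__ == "__main__":
    clinical_outcome = {
        "clinical_relevance": 90, "measurability": 85, "regulatory_acceptance": 90,
        "sensitivity_to_change": 80, "burden": 70,
    }
    biomarker = {
        "clinical_relevance": 60, "measurability": 90, "regulatory_acceptance": 40,
        "sensitivity_to_change": 85, "burden": 90, "surrogate": True, "validated": False,
    }
    print(classify(clinical_outcome))
    print(classify(biomarker))
```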

Anti-patterns

  • Presenting a power estimate as fact. Every output is an estimate with a named owner who must sign.
  • Powering for a convenience effect size. The effect must trace to a published or anchor-based MCID, not to the n you can afford.
  • Anchoring a primary on an unvalidated surrogate. Surrogate endpoints need validation evidence for the indication.
  • Ignoring multiplicity. More than one primary endpoint requires pre-specified alpha allocation.
  • Skipping dropout inflation. Raw n undersizes the study; inflate by 1/(1 − dropout).
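For the multiplicity anti-pattern, two common pre-specified schemes are a Bonferroni split and a fixed-sequence (hierarchical) test. The sketch below is illustrative only, with hypothetical function names; the choice of method and the testing order belong in the statistical analysis plan.

```python
def bonferroni_alpha(alpha: float, n_endpoints: int) -> float:
    """Split the family-wise alpha equally across the primary endpoints."""
    if n_endpoints < 1:
        raise ValueError("n_endpoints must be at least 1")
    return alpha / n_endpoints


def fixed_sequence(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Test endpoints in their pre-specified order, each at the full alpha.

    Testing stops at the first non-significant endpoint; every later
    endpoint is treated as not significant regardless of its p-value.
    """
    results = []
    gate_open = True
    for p in p_values:
        significant = gate_open and p <= alpha
        results.append(significant)
        gate_open = significant
    return results


if __name__ == "__main__":
    print(bonferroni_alpha(0.05, 2))
    print(fixed_sequence([0.01, 0.04, 0.20, 0.001]))
```

In the fixed-sequence example the fourth endpoint has a small p-value but is not declared significant, because the third endpoint closed the gate. That is why the order must be fixed before unblinding.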

Distinct from

| Sibling / neighbor | Scope | Difference |
| --- | --- | --- |
| `ra-qm-team` | ISO 13485 QMS, ISO 14971 risk, EU MDR tech docs + clinical evaluation, FDA 510(k)/PMA/De Novo/QSR submission | That is the submission; clinical-research designs the study beforehand |
| `research/grants` | NIH funding discovery + positioning | That finds funding; this designs the trial |
| `product-team/experiment-designer` | Live product A/B hypothesis + sample size | That is a product experiment; this is a clinical trial |
| `research-finance` (sibling) | R&D program budget + burn | That funds the program; this scopes the study |

Quick examples

```bash
python3 scripts/sample_size_estimator.py --sample
python3 scripts/sample_size_estimator.py --design proportions --p1 0.30 --p2 0.45 --dropout 0.15
python3 scripts/endpoint_selector.py --sample
python3 scripts/phase_gate_scorer.py --sample --output json
```
The `endpoint_selector.py` sample flags an unvalidated serum-cytokine surrogate (cannot be primary) and ranks PASI-75 as the PRIMARY endpoint; the phase-gate sample returns a verdict with a named owner chain.
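As a hand check on the proportions command above, the unpooled normal-approximation formula can be worked through directly. This assumes a two-sided α = 0.05 and 80% power, which are illustrative values and not necessarily the tool's defaults. The script may use a pooled variance or different rounding, so its printed n can differ slightly.

```latex
\begin{align*}
n_{\text{per arm}}
  &= \frac{\left(z_{1-\alpha/2} + z_{1-\beta}\right)^2
           \left[\,p_1(1-p_1) + p_2(1-p_2)\,\right]}{(p_1 - p_2)^2} \\
  &= \frac{(1.960 + 0.842)^2 \,(0.2100 + 0.2475)}{(0.30 - 0.45)^2}
   \approx \frac{7.85 \times 0.4575}{0.0225}
   \approx 159.6 \;\Rightarrow\; 160 \\[4pt]
n_{\text{enrolled per arm}}
  &= \left\lceil \frac{160}{1 - 0.15} \right\rceil
   = \lceil 188.2 \rceil = 189
\end{align*}
```

Under these assumptions the first-pass estimate is about 189 enrolled per arm, or 378 in total, pending a biostatistician's confirmation.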

Forcing-question library (Matt Pocock grill discipline)

Walked one at a time by `/cs:grill-research-ops` or the orchestrator. Each question carries a recommended answer and a canon citation. Questions are never bundled.
  1. "Is your primary endpoint a clinical outcome or a surrogate — and if surrogate, is it on FDA's validated table?" Recommended: clinical outcome unless the surrogate is validated for this indication. Canon: FDA Surrogate Endpoint Table; BEST (Biomarkers, EndpointS, and other Tools) glossary.
  2. "What's the minimal clinically important difference you're powering for — and where did that number come from?" Recommended: a published or anchor-based MCID, cited; never a convenience effect size. Canon: ICH E9; Cohen Statistical Power Analysis.
  3. "What dropout rate are you assuming, and is the sample size inflated for it?" Recommended: inflate n by 1/(1 − dropout) using a justified rate. Canon: Chow, Shao & Wang; ICH E9(R1).
  4. "Single primary endpoint or multiple — and if multiple, what's the multiplicity control?" Recommended: pre-specify alpha allocation (hierarchical / Bonferroni). Canon: FDA Multiple Endpoints guidance (2022).
  5. "Who is the named biostatistician / medical monitor / regulatory owner signing this synopsis?" Recommended: name them now — this output is a recommendation, not a protocol. Canon: ICH E6(R2) GCP roles & responsibilities.
Walk depth-first. Lock 1-2 before opening 3-5. After all are answered, invoke `endpoint_selector.py`, `sample_size_estimator.py`, and `phase_gate_scorer.py`, in that order.