digital-health-clinical-asr-build

Clinical ASR Flywheel — Stage 2 (Build the benchmark)

⚠ Agent: read this entire SKILL.md before answering. This stage is conversational and gated. Specifically: ask the user 1–2 specialty-aware clarifying questions before proposing terms (Step 2a); walk them through the two-tier IPA pipeline and its fall-through (override → merriam-webster, then magpie_g2p) in Step 2c; pass the explicit QA-mode audition gate in Step 2d before full Cartesian synthesis; and name KER as the headline metric they'll see in Stage 3. Skipping any of these defeats the methodology.

You are the curate-and-synthesize stage. The user arrives from `/digital-health-clinical-asr-setup` and leaves with a NeMo-format `manifest.jsonl` plus the audio it references, both ready for scoring at `/digital-health-clinical-asr-eval`.

Be conversational. This is the warmest, most domain-aware step in the flywheel: you're asking a clinician (or someone who works with them) which terms hurt today and shaping a benchmark around their reality. Ask short, focused questions. Show the user what's being added. Don't lecture.

Data leaves your environment — disclose this to the user before any term is sent

This stage transmits user-curated content to two external services. Surface this to the user before invoking either call:

| Service | What gets sent | When |
|---|---|---|
| Merriam-Webster (`dictionaryapi.com` API or `merriam-webster.com` public site) | One HTTP request per term in the seed list; the term goes in the URL path | Step 2c (see the MW path choices below) |
| NVIDIA NVCF Magpie TTS (`grpc.nvcf.nvidia.com`) | Each generated clinical sentence (text, plus any SSML IPA wrappers) | Steps 2d and 2e, every synthesis call |

Both endpoints expect non-PHI synthetic content: the term list you curate and the sentences that `/data-designer` (or your fallback templates) generates from it. Do not pass real patient records, real ASR transcripts, or any PHI through this skill. If the term list itself is sensitive (proprietary drug codenames, unreleased product names, customer-confidential indications), confirm with the user that external-API transmission is acceptable under their organization's data-governance policy before proceeding.

If no MW transmission is acceptable: take Path C below (skip MW; pipeline falls through to Magpie G2P with reduced coverage on long-tail terms).

Purpose

Curate a clinical-specialty term list, generate eval audio for it through Magpie TTS with a two-tier IPA pipeline, and write a NeMo-format manifest tagged with the clinical-extension fields (`term`, `entity_category`, `ipa_source`, `voice_id`, `noise_level`, `context_type`). The output is the input to Stage 3.

By the end the user has:

$EVAL_DIR/cycle<N>/
├── audio/<slug>.wav        synthesized clips
├── manifest.jsonl          NeMo format + clinical extension
├── term_seed.csv           the curated input
└── pronunciation_overrides.csv   appendable across cycles

(`$EVAL_DIR` is the user's own choice; this skill does not impose a layout. The structure above is a recommendation, not a requirement.)

When to use this skill

Activate on user phrases like:

"Build a clinical ASR benchmark"
"Curate drug names / procedure names for ASR eval"
"Generate eval audio for medical terms"
"Create a NeMo manifest from clinical terms"
"Add oncology / cardiology / ortho terms to my benchmark"
"Audition the TTS pronunciation for these drug names"
"Make me a cycle-N manifest"

Do not activate in the cases below. If the message mentions `auth`, `API key`, `gRPC`, `streaming`, `riva-build`, `NIM deploy`, `NGC`, or `Docker`, route per the matching line and stop:

The user already has a manifest and wants to score it → `/digital-health-clinical-asr-eval`
The user wants to fine-tune on an existing manifest → `/digital-health-clinical-asr-finetune`
The user is asking generic TTS / SSML / voice-cloning / voice-catalog questions → `/read-aloud` (or `/riva-tts`)
TTS/ASR auth / API keys / gRPC / streaming → `/riva-tts` or `/riva-asr`
NIM deploy or `riva-build` / `riva-deploy` flags → `/riva-asr-custom` or `/riva-tts-custom`
NGC / Docker / NVIDIA Container Toolkit → `/riva-nim-setup`
The user is asking generic synthetic-data questions → `/data-designer`

Prerequisites

`/digital-health-clinical-asr-setup` completed: `NVIDIA_API_KEY` exported, Python deps installed, the six upstream skills confirmed.
`/read-aloud` (or `/riva-tts`) reachable. Hosted Magpie via NVCF is the default. Self-hosted Magpie NIM works but adds `/riva-nim-setup` to the prerequisite chain.
`/data-designer` reachable. Template fallback is acceptable for a first cycle if `/data-designer` is unavailable, but tag those rows so future cycles can re-generate.
A working directory the user owns. The skill recommends `$EVAL_DIR/cycle<N>/` but does not enforce it.

Instructions

2a. Specialty interview → `term_seed.csv`

Ask one question at a time. The goal is to surface 4–10 candidate terms with the right `entity_category`, not to write a textbook.

Questions, in order:

What specialty / workflow is this for? (oncology dictation, ICU handoff, psych intake, ortho post-op, …)
What ASR failure modes have you seen? — drug names, multi-word procedures, abbreviations, compound conditions.
Which terms come up daily vs which are the hard ones? — daily-common terms become the sanity baseline; daily-hard terms become the signal.

Propose 4–10 candidate terms with `entity_category`. Confirm with the user before writing. Then write `term_seed.csv` (CSV):

term,entity_category
cefazolin,drug
acetabular reamer,procedure
tibial plateau,anatomy
femoroacetabular impingement,condition
hemoglobin a1c,lab
respiratory therapist,role

The category vocabulary is fixed. KER keys off it. Allowed values:

drug | procedure | anatomy | condition | lab | role

If the user proposes a new category, push back: either it maps to one of the six, or the methodology needs a deliberate extension (which is a future cycle's job, not a one-off ad-hoc add).
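A minimal sketch of enforcing the fixed vocabulary before anything is written, assuming the two-column layout shown above. The function name `validate_term_seed` and the sample `device` row are hypothetical and only illustrate the rejection path; they are not part of the skill's own recipes.

```python
import csv
import io

# The six allowed values listed above.
ALLOWED_CATEGORIES = {"drug", "procedure", "anatomy", "condition", "lab", "role"}


def validate_term_seed(csv_text):
    """Return (rows, errors) for term_seed.csv content.

    Hypothetical helper: rows with an empty term or a category outside
    the fixed vocabulary are reported instead of being written.
    """
    rows, errors = [], []
    reader = csv.DictReader(io.StringIO(csv_text))
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        term = (row.get("term") or "").strip()
        category = (row.get("entity_category") or "").strip()
        if not term:
            errors.append(f"line {line_no}: empty term")
        elif category not in ALLOWED_CATEGORIES:
            errors.append(f"line {line_no}: {term!r} has unknown category {category!r}")
        else:
            rows.append({"term": term, "entity_category": category})
    return rows, errors


SEED = """term,entity_category
cefazolin,drug
acetabular reamer,procedure
tibial plateau,anatomy
stryker drill,device
"""

rows, errors = validate_term_seed(SEED)
for problem in errors:
    print("push back:", problem)
```

Reporting the offending row, rather than silently coercing it, keeps the "push back" conversation with the user explicit.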

2b. Sentence generation via `/data-designer`

Brief `/data-designer` with:

For each row in `term_seed.csv`, generate one or more natural English sentences embedding `term` in a way that fits the row's `entity_category`. Output schema: `{term, entity_category, sentence, context_type}`. Generate 3–5 `context_type` variants per term. Initial `context_type` vocabulary: `dictation`, `handoff`, `chart_note`, `history`. Sentence length 10–30 words.

The output of this step is a per-term sentence variants file. Any filename is fine — pick one and use it consistently across the cycle directory.

Template fallback. If `/data-designer` is unavailable, use a 4-template fallback (one per `context_type`) and substitute `term` mechanically. Tag those rows in the manifest (`context_type` is set, the sentence is just less natural) so a future cycle can regenerate.
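The fallback can be sketched roughly as follows. The four template strings, the function name `fallback_sentences`, and the `sentence_source` tag field are all assumptions made for illustration; the skill does not prescribe them.

```python
# Hypothetical templates, one per context_type; wording is illustrative only.
FALLBACK_TEMPLATES = {
    "dictation": "Dictating for the record, the plan today specifically involves {term} as discussed with the team.",
    "handoff": "For the next shift, please note that {term} was discussed during rounds this afternoon.",
    "chart_note": "Chart note updated to document {term} following review of the case with the attending.",
    "history": "The prior history is notable for {term} according to the outside records we received.",
}


def fallback_sentences(term, entity_category):
    """Mechanically substitute `term` into each template.

    Rows follow the {term, entity_category, sentence, context_type} schema,
    plus an assumed `sentence_source` tag so a later cycle can find and
    regenerate them.
    """
    return [
        {
            "term": term,
            "entity_category": entity_category,
            "sentence": template.format(term=term),
            "context_type": context_type,
            "sentence_source": "template_fallback",  # assumed tag name
        }
        for context_type, template in FALLBACK_TEMPLATES.items()
    ]


variants = fallback_sentences("cefazolin", "drug")
for row in variants:
    print(row["context_type"], "->", row["sentence"])
```

Because substitution is mechanical, some combinations will read awkwardly (a role term in the `history` template, for example), which is exactly why these rows are tagged for regeneration.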

2c. Two-tier IPA tagging (the load-bearing quality lever)

Every term passes through the pipeline in order: two IPA tiers (override, then Merriam-Webster), followed by a fall-through to Magpie's own G2P.

Override: `pronunciation_overrides.csv` carries verified IPA the team has audited. If `term` matches a row here, the override wins.
Merriam-Webster: for un-overridden terms, fetch the MW respelling, convert it to IPA, and validate the result against Magpie's en-US phoneme set. If both the conversion and the validation succeed, the term is tagged `merriam-webster`.
Magpie G2P (fall-through): if neither override nor MW produces a valid IPA, the plain text is passed to Magpie's neural G2P at synthesis time. The row is tagged `magpie_g2p`.

Every manifest row carries the `ipa_source` tag (`override | merriam-webster | magpie_g2p`). The delta between `merriam-webster` and `magpie_g2p` rows in the Stage 3 leaderboard is the evidence that the pronunciation strategy is working; call it out explicitly when you produce the leaderboard.

Three MW lookup paths. Paths A and B tag resolved terms `merriam-webster`; Path C leaves every un-overridden term to the `magpie_g2p` fall-through.

A: `dictionaryapi.com` JSON API + `DICTIONARY_API_KEY` (free at dictionaryapi.com). Recommended for standalone use.
B: HTML scrape of `merriam-webster.com`. No key, but brittle to site HTML changes.
C: skip MW and fall through to Magpie G2P, with weaker long-tail coverage.

The recipes for A and B, plus the full respelling→IPA table, live in `references/pronunciation-pipeline.md`. The Path A function takes `api_key` as an argument (it never reads `os.environ`); pass `None` to skip MW.

`pronunciation_overrides.csv` schema (CSV):

term,ipa,verified_by,verified_at,notes
cefazolin,sɛfəˈzoʊlɪn,brandoing,2026-05-13,confirmed against MW respelling + ear test

Append-only across cycles. Re-running the build later picks up new entries automatically.
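The tier order described in this step can be sketched as a small resolver. This is an illustration under stated assumptions, not the recipe from `references/pronunciation-pipeline.md`: `load_overrides`, `resolve_ipa`, and the `mw_lookup` callable are hypothetical names, and `mw_lookup` is assumed to return an already-validated IPA string or `None`.

```python
import csv
import io


def load_overrides(csv_text):
    """Map lowercase term -> verified IPA from pronunciation_overrides.csv content."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["term"].strip().lower(): row["ipa"].strip() for row in reader}


def resolve_ipa(term, overrides, mw_lookup=None):
    """Return (ipa_or_None, ipa_source) following the tier order above.

    `mw_lookup` is a hypothetical callable standing in for Path A or B;
    passing None corresponds to Path C (skip MW entirely).
    """
    key = term.strip().lower()
    if key in overrides:                      # tier 1: audited override wins
        return overrides[key], "override"
    if mw_lookup is not None:                 # tier 2: Merriam-Webster
        ipa = mw_lookup(term)
        if ipa:
            return ipa, "merriam-webster"
    return None, "magpie_g2p"                 # fall-through: plain text to Magpie G2P


OVERRIDES_CSV = """term,ipa,verified_by,verified_at,notes
cefazolin,sɛfəˈzoʊlɪn,brandoing,2026-05-13,confirmed against MW respelling + ear test
"""


def stub_mw_lookup(term):
    # Placeholder for a real MW call; the returned string is NOT real IPA.
    return "<ipa-from-mw>" if term == "cisplatin" else None


overrides = load_overrides(OVERRIDES_CSV)
for term in ("cefazolin", "cisplatin", "pembrolizumab"):
    print(term, resolve_ipa(term, overrides, stub_mw_lookup))
```

Because the overrides file is re-read on each build, appending a row is enough to flip a term's `ipa_source` on the next run.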

2d. QA-mode synthesis (do not skip this gate)

Before running the full Cartesian product, synthesize one wav per term with: first voice, clean noise, default context. Audition each clip with the user.

For every term tagged `magpie_g2p`, propose an IPA candidate using clinical suffix patterns, and validate it against Magpie's en-US phoneme set before suggesting it:

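| Suffix | Stress pattern (example) |
|---|---|
| `-mycin` | …ˈmaɪsɪn (vancomycin; gentamicin follows the same pattern but is spelled `-micin`) |
| `-prazole` | stress on the syllable before the suffix, …prəzoʊl (omeprazole oʊˈmɛprəzoʊl, esomeprazole) |
| `-statin` | …ˈstætɪn (atorvastatin, rosuvastatin) |
| `-sartan` | …ˈsɑːrtən (losartan, valsartan) |
| `-azole` | stress on the syllable before the suffix, …əzoʊl (fluconazole, ketoconazole) |
| `-cillin` | …ˈsɪlɪn (amoxicillin, piperacillin) |
| `-parin` | …ˈpɛərɪn (enoxaparin; heparin is an exception, ˈhɛpərɪn) |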

Phoneme-validation pattern: live-probe Magpie's en-US neural G2P with a candidate IPA. If Magpie accepts the SSML, the IPA is in its inventory. Use the suffix patterns above as a pre-filter (a cheap heuristic) and the live probe to confirm before committing to an override. The `magpie_validates_ipa(ipa, api_key, voice_id)` recipe, a minimal NVCF gRPC synthesis call that returns `True` or `False` and fails closed, is in `references/pronunciation-pipeline.md`.

Call it once per candidate IPA before showing it to the user. On user approval, append the verified IPA to `pronunciation_overrides.csv`. The row's `ipa_source` flips from `magpie_g2p` to `override` on the next manifest generation.

HITL audition gate before Step 2e (fail-closed). Do not synthesize the full Cartesian product, do not promote any staged IPA candidate to `pronunciation_overrides.csv`, and do not advance to Stage 3 until one of the following has happened explicitly in conversation:

The user confirms they have auditioned the QA clips and reports their verdict per clip (or per bucket: "the MW set sounds fine", "fix `pembrolizumab`", etc.). Provide the `afplay` (macOS) or `paplay` / `aplay` (Linux) commands so the user can play them, then halt and wait for their reply after listening. Paper-only approval via an AskUserQuestion prompt (clicking "Promote all" or "Lock in" without auditioning) does not satisfy this gate. Magpie-validating an IPA proves it's in the phoneme inventory; it does not prove it matches the intended pronunciation. Only the user's ears do that.
The user explicitly opts to skip audition for this cycle, in deliberate language (e.g. "skip audition, accept the risk that mispronunciations may dilute the Stage 3 KER signal; log it as a cycle-N caveat"), not as a side-effect of a single click-through. Record the skip in a cycle-level note (e.g. `$EVAL_DIR/cycle<N>/cycle_notes.md`) so a future operator can see the audition was deferred.

Magpie NVCF rate-limits aggressively on >100-row jobs, and a do-over costs both API credits and clock time — but the larger risk is shipping a manifest with mispronounced reference audio that quietly corrupts the Stage 3 KER signal. Time spent auditioning is cheaper than re-running the cycle.
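To hand the user ready-to-run playback commands for the audition gate, a sketch like the following can pick among the players named above. The helper name `playback_command` and its injectable `system` / `which` parameters are assumptions added here so the choice can be exercised without the players installed.

```python
import platform
import shutil


def playback_command(wav_path, system=None, which=shutil.which):
    """Return an argv list that plays `wav_path`, or None if no player is found.

    Hypothetical helper: prefers afplay on macOS, then paplay or aplay elsewhere.
    """
    system = system or platform.system()
    candidates = ["afplay"] if system == "Darwin" else ["paplay", "aplay"]
    for player in candidates:
        if which(player):
            return [player, str(wav_path)]
    return None


cmd = playback_command("audio/cefazolin.wav")
if cmd is None:
    print("No supported audio player found; ask the user how they want to listen.")
else:
    print("Play with:", " ".join(cmd))
```

The sketch only prints the command. Listening, and reporting a verdict, remains the user's step; the agent still halts and waits.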

2e. Full benchmark generation

After pronunciations are locked, generate the full Cartesian product `|terms| × |voices| × |noise_levels| × |context_types|`. Defaults: 2–4 Magpie en-US voices (e.g. Mia, Jason, Ray), noise levels `[clean, snr_15db, snr_5db]`, and context types `[dictation, handoff, chart_note, history]`.
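The size of the product is worth computing before any synthesis call is made. A minimal sketch, assuming plain string identifiers for voices (the real Magpie voice IDs may be longer) and a hypothetical `plan_rows` helper:

```python
from itertools import product

ROW_WARNING_THRESHOLD = 100  # the >100-row caution used throughout this stage


def plan_rows(terms, voices, noise_levels, context_types):
    """Enumerate every (term, voice, noise, context) combination as a row plan."""
    return [
        {"term": t, "voice_id": v, "noise_level": n, "context_type": c}
        for t, v, n, c in product(terms, voices, noise_levels, context_types)
    ]


terms = [
    "cisplatin", "paclitaxel", "pembrolizumab", "nivolumab", "carboplatin",
    "docetaxel", "bevacizumab", "trastuzumab", "cetuximab", "pemetrexed",
]
voices = ["Mia", "Jason"]                      # placeholder voice identifiers
noise_levels = ["clean", "snr_15db"]
context_types = ["dictation", "handoff", "chart_note"]

rows = plan_rows(terms, voices, noise_levels, context_types)
print(f"{len(rows)} rows planned")             # 10 * 2 * 2 * 3 = 120
if len(rows) > ROW_WARNING_THRESHOLD:
    print("Warning: more than 100 rows; expect rate-limit drops and plan a re-run pass.")
```

With the full defaults (3 voices, 3 noise levels, 4 context types), the same 10 terms would produce 360 rows, so trimming one axis is often the cheapest way to keep a first cycle manageable.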

Self-contained synthesis: no `/read-aloud` required. The `synthesize_row(row, all_overrides, out_dir, api_key)` recipe, which opens an NVCF gRPC stream, wraps overrides into SSML via `render_sentence_with_overrides`, and writes 16-bit mono PCM to `<out_dir>/audio/<slug>.wav`, is in `references/pronunciation-pipeline.md` (§Synthesis call). Key invariant: `all_overrides` carries every entry from `pronunciation_overrides.csv` (including context-word overrides like `intravenously`) so the renderer wraps any override whose verbatim text appears in `row['text']`. Wrapping only `row['term']` silently drops context-word overrides.
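The invariant above can be illustrated with a small stand-in renderer. This is not the `render_sentence_with_overrides` recipe from the reference file; `wrap_overrides_ssml` is a hypothetical name, the `intravenously` IPA is a placeholder, and the sketch assumes the standard SSML `<phoneme alphabet="ipa" ph="…">` element is what the TTS endpoint accepts.

```python
import re
from xml.sax.saxutils import escape, quoteattr


def wrap_overrides_ssml(text, overrides):
    """Wrap every override term found in `text` in an SSML phoneme tag.

    `overrides` maps lowercase term -> IPA. All entries are considered,
    not just the row's target term, so context-word overrides are kept.
    """
    if not overrides:
        return f"<speak>{escape(text)}</speak>"
    # Longest terms first so multi-word overrides win over their substrings.
    terms = sorted(overrides, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b", re.IGNORECASE
    )
    parts, pos = [], 0
    for match in pattern.finditer(text):
        parts.append(escape(text[pos:match.start()]))   # untouched text, XML-escaped
        ipa = overrides[match.group(0).lower()]
        parts.append(
            f'<phoneme alphabet="ipa" ph={quoteattr(ipa)}>{escape(match.group(0))}</phoneme>'
        )
        pos = match.end()
    parts.append(escape(text[pos:]))
    return "<speak>" + "".join(parts) + "</speak>"


all_overrides = {
    "cefazolin": "sɛfəˈzoʊlɪn",
    "intravenously": "ɪntrəˈviːnəsli",   # placeholder IPA for illustration
}
ssml = wrap_overrides_ssml("Give cefazolin intravenously before incision.", all_overrides)
print(ssml)
```

Passing only the target term's override would leave `intravenously` as plain text, which is the silent drop the invariant warns about.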

Noise injection (clean → `snr_15db` → `snr_5db`) and the manifest schema (NeMo canonical fields + clinical extension, plus pre-flight schema and audio-existence checks) both live in `references/manifest-schema.md`.
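As a rough picture of what a pre-flight pass might look like, the sketch below checks each manifest line for required fields and for the audio file it points to. It assumes the NeMo canonical fields are `audio_filepath`, `duration`, and `text`, combined with the six clinical-extension fields named in the Purpose section; the authoritative list is the one in `references/manifest-schema.md`, and `preflight` is a hypothetical name.

```python
import json
import os

# Assumed required fields: NeMo canonical + clinical extension.
REQUIRED_FIELDS = (
    "audio_filepath", "duration", "text",
    "term", "entity_category", "ipa_source",
    "voice_id", "noise_level", "context_type",
)


def preflight(manifest_lines, base_dir="."):
    """Return a list of human-readable problems; an empty list means the manifest passes."""
    problems = []
    for line_no, line in enumerate(manifest_lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines
        try:
            row = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append(f"line {line_no}: invalid JSON ({exc.msg})")
            continue
        missing = [field for field in REQUIRED_FIELDS if field not in row]
        if missing:
            problems.append(f"line {line_no}: missing fields {missing}")
        path = row.get("audio_filepath")
        if path and not os.path.isfile(os.path.join(base_dir, path)):
            problems.append(f"line {line_no}: audio not found: {path}")
    return problems


# Usage: a row with no audio path and no extension fields is reported.
print(preflight(['{"text": "no audio here"}']))
```

Running a check like this before hand-off means Stage 3 never starts scoring against rows whose audio is missing.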

Warn when the product exceeds 100 rows. Magpie NVCF rate-limits with ~5–10% `RESOURCE_EXHAUSTED` drops on big runs. Re-run the dropped rows.
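One way to structure the re-run pass is to collect dropped rows and retry only those, with exponential backoff between passes. In the sketch below, `RateLimited` is a stand-in exception and `synthesize` is any callable supplied by the caller; in a real gRPC client the condition would surface as a `RESOURCE_EXHAUSTED` status instead. All names here are hypothetical.

```python
import random
import time


class RateLimited(Exception):
    """Stand-in for a RESOURCE_EXHAUSTED response (hypothetical wrapper)."""


def synthesize_with_retry(rows, synthesize, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Try every row; re-run only the dropped ones, backing off between passes.

    Returns (done, still_dropped).
    """
    pending, done = list(rows), []
    for attempt in range(max_attempts):
        dropped = []
        for row in pending:
            try:
                synthesize(row)
                done.append(row)
            except RateLimited:
                dropped.append(row)
        if not dropped:
            return done, []
        pending = dropped
        if attempt < max_attempts - 1:
            # Exponential backoff with a little jitter before the next pass.
            sleep(base_delay * 2 ** attempt + random.uniform(0, 0.5))
    return done, pending


# Demo with a fake synthesizer that drops each row once, then succeeds.
seen = set()

def flaky_synthesize(row):
    if row not in seen:
        seen.add(row)
        raise RateLimited(row)

done, still_dropped = synthesize_with_retry(
    ["row-1", "row-2", "row-3"], flaky_synthesize, sleep=lambda seconds: None
)
print(len(done), "synthesized;", len(still_dropped), "still dropped")
```

Rows left in the second list after the final attempt are the ones to report to the user rather than silently omit from the manifest.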

Stage 2 completion checklist

Don't consider Stage 2 done until all five sub-steps have run. Agents commonly stop after 2a or 2b; the goal is a synthesized manifest plus a hand-off:

2a: `term_seed.csv`, 4–10 terms, `entity_category ∈ {drug, procedure, anatomy, condition, lab, role}`
2b: 3–5 `context_type` sentence variants per term
2c: every term tagged `ipa_source ∈ {override, merriam-webster, magpie_g2p}`
2d: QA wavs auditioned, IPA overrides locked with explicit user approval
2e: `manifest.jsonl` + per-row audio for the Cartesian product
Hand-off: name `/digital-health-clinical-asr-eval` as the next skill and KER as its headline metric

Writes go only into the user-chosen `$EVAL_DIR/cycle<N>/`. Don't write elsewhere, modify env, or install packages; those belong to `/digital-health-clinical-asr-setup`.

Examples

Scenario A: fresh oncology benchmark. User: "We're seeing chemo drug names mistranscribed. Where do I start?" → Step 2a: confirm the specialty is oncology and ask which drugs (immunotherapy biologics, platinum agents, taxanes). Propose ~10 candidates: `cisplatin`, `paclitaxel`, `pembrolizumab`, `nivolumab`, `carboplatin`, `docetaxel`, `bevacizumab`, `trastuzumab`, `cetuximab`, `pemetrexed`. Write `term_seed.csv` with `entity_category=drug` for every row. Step 2b: brief `/data-designer` for 4 context variants each = 40 sentences. Step 2c: MW lookup for each; biologics like `pembrolizumab` will likely fall to `magpie_g2p`, while platinum agents likely hit MW. Step 2d: synthesize one QA wav per term, walk the user through the `pembrolizumab` etc. clips, and propose IPA candidates with `-mab` suffix stress patterns. Step 2e: on approval, run 10 terms × 2 voices × 2 noise levels × 3 contexts = 120 rows.

Scenario B: appending to an existing cycle. User: "I have a cycle-1 manifest and I want to add 5 more procedures." → Re-run Steps 2a–2e for the new terms only: 2a (specialty interview for the new terms), 2b (sentence generation for the additions), 2c (IPA pipeline for the additions), 2d (audition the new terms), and 2e (synthesize only the new term rows). Append to the existing `manifest.jsonl`. Do not regenerate audio for existing terms; cycle isolation is intentional, so that leaderboards diff cycle N vs cycle N+1 cleanly.

Artifacts produced

`term_seed.csv`: curated terms with `entity_category`
`pronunciation_overrides.csv`: verified IPA, appendable across cycles
`manifest.jsonl`: NeMo format with clinical extension fields (one JSON object per line)
`audio/<slug>.wav`: synthesized clips, one per manifest row

Troubleshooting

TTS rate-limit drops (`RESOURCE_EXHAUSTED`) on >100-row generation → expected on Magpie NVCF. Confirm exponential backoff is active in `/read-aloud`; expect ~5–10% drops on big runs and re-run for the gaps.
Every row's `ipa_source` is `magpie_g2p` → MW lookup is failing across the board, or candidate IPAs are failing phoneme validation. Re-verify whichever MW path you configured (`DICTIONARY_API_KEY` for A; HTTPS reachability + parser for B), then check candidates against Magpie's en-US phoneme inventory.
Magpie mispronounces a term even with the IPA override → first verify the IPA is in the Magpie en-US phoneme inventory and the SSML wrapping is syntactically valid. If both check out, the underlying TTS bug is owned by `/read-aloud` (`/riva-tts`); route there for diagnosis. This skill provides the override mechanism but does not own the neural G2P or SSML parser.
Sentence variants from `/data-designer` are bland / template-like → check the brief; the schema-only prompt sometimes produces stereotyped output. Add 1–2 in-context examples to the brief and re-run.
Audio files exist but `manifest.jsonl` is short → the manifest writer skipped rows whose synthesis returned an NVCF error. Re-run the build with only the missing rows.

For anything not in this list, identify which upstream skill is implicated and route there. The `digital-health-clinical-asr-build` skill owns the methodology, not the TTS or DataDesigner internals.

Limitations

English-only by default. Magpie's en-US phoneme inventory is what the two-tier IPA pipeline validates against. Other locales need a different upstream phoneme set + override CSV format.
Six fixed entity categories. Extending `entity_category` is a deliberate methodology change, not a one-off tweak: KER breakdowns, leaderboard sections, and downstream finetune scripts all key off the vocabulary.
Tiny first cycles. Below ~20 terms, the by-`ipa_source` leaderboard split won't have enough rows in each bucket to be statistically meaningful. Build a meaningful cycle even if it costs a session.
Magpie NVCF rate-limits. ~5–10% drops on large jobs; budget a re-run pass.

Next steps

Forward: `/digital-health-clinical-asr-eval` to transcribe the manifest, score WER/CER/KER/SER, and produce the five-section leaderboard.
Back to setup (if anything in the env is broken): `/digital-health-clinical-asr-setup`.
Lateral for TTS-specific debugging: `/read-aloud` or `/riva-tts`.

References

`references/manifest-schema.md`: NeMo canonical fields + clinical extension; pre-flight schema and audio-existence checks; cross-cycle stability rules
