voice-batch-runner

Voice Batch Runner

Follow the shared `postplus-shared` release-shell rules.

Use this skill after persona, concept, and script work already exists.

This skill is for:

designing an initial voice from persona traits
generating script-specific audio takes
storing reusable voice profiles for later videos
preparing for future voice-identity capture or timbre-preserving generation

This skill is not for unconstrained voice casting.

Core Idea

Voice should be treated as a first-class persona asset, not a one-off byproduct of one script.

That means the system should separate:

- `voice profile`: how this persona should sound
- `voice identity`: the reusable voice source or captured timbre, if available
- `voice take`: one concrete audio file generated for one script

The script can change every time. The persona voice should remain stable.

Hosted Boundary Rule

keep request files, raw provider responses, and run manifests under `<work-folder>/.postplus/voice-batch-runner/` when they are internal execution state
keep only final user-facing audio exports outside `.postplus/`
if hosted voice capability is unavailable, unauthorized, or returns a persistent network error, stop immediately instead of switching to ad hoc shell glue
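The stop rule can be sketched as a wrapper around each hosted call. This is a minimal sketch, assuming the hosted adapter reports failures as an error carrying a `kind` and only labels an error `network` once it is persistent; `HostedVoiceError`, `runHostedStep`, and the `kind` values are hypothetical names, not part of any hosted API.

```javascript
// Hypothetical failure kinds that must halt the run instead of triggering a fallback.
const STOP_KINDS = new Set(["unavailable", "unauthorized", "network"]);

// Hypothetical error type; the adapter is assumed to set `kind`.
class HostedVoiceError extends Error {
  constructor(kind, message) {
    super(message);
    this.name = "HostedVoiceError";
    this.kind = kind;
  }
}

// Runs one hosted step. A stop-worthy failure is rethrown with a clear
// "Stopped at ..." message; there is no retry through ad hoc shell commands.
async function runHostedStep(stepName, call) {
  try {
    return await call();
  } catch (err) {
    if (err instanceof HostedVoiceError && STOP_KINDS.has(err.kind)) {
      throw new Error(`Stopped at ${stepName}: hosted voice capability ${err.kind} (${err.message})`);
    }
    throw err; // any other error surfaces unchanged
  }
}
```

Because the wrapper rethrows instead of retrying, the run ends with one message that states which step stopped and why.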

Skill Family Direction

This skill is the first member of a future voice skill family.

The family can naturally expand into:

- `voice-batch-runner`: current skill; orchestrates voice generation and persistence
- `voice-identity-capture`: future skill; captures or normalizes a reusable voice identity from approved reference audio
- `voice-review`: future skill; audits realism, pacing, and persona fit

For now, keep everything in `voice-batch-runner`, but design the data model so these skills can be split out later.

Fact Rule

Voice generation should be grounded in persona and content evidence.

Required upstream inputs:

approved persona registry
script text
persona voice baseline
video purpose or lane, if it changes the delivery style

Do not let the TTS model invent:

a totally different age or authority level
ad-like delivery when the persona is a work-friend creator
high-drama acting not supported by the benchmark tone
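One inexpensive guard against invented traits is to check a draft TTS description against the profile's `forbiddenTraits` before sending it. This is a minimal sketch, assuming forbidden traits are stored as plain phrases and that case-insensitive substring matching is an acceptable first filter. `findForbiddenTraits` and `assertDescriptionAllowed` are hypothetical helpers. A phrase match cannot catch every kind of drift, so it complements review rather than replacing it.

```javascript
// Returns the forbidden phrases that appear in a draft TTS voice description.
// Matching is case-insensitive plain substring matching (an assumption).
function findForbiddenTraits(voiceDescription, forbiddenTraits) {
  const text = voiceDescription.toLowerCase();
  return forbiddenTraits.filter((trait) => text.includes(trait.toLowerCase()));
}

// Refuses to send a description that names any forbidden trait.
function assertDescriptionAllowed(voiceDescription, forbiddenTraits) {
  const hits = findForbiddenTraits(voiceDescription, forbiddenTraits);
  if (hits.length > 0) {
    throw new Error(`Voice description contains forbidden traits: ${hits.join(", ")}`);
  }
}
```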

Source Selection Rule

Use persona and script inputs from the active project context.

If the task clearly belongs to one client or campaign folder, read from that context first.

Do not assume one client directory is the default source base for all voice work.
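The source rule amounts to a priority order with no global default. This is a minimal sketch, assuming the caller already knows the active project directory and, when the task clearly belongs to one, the client or campaign folder. `sourceSearchOrder` and its parameter names are hypothetical.

```javascript
// Returns the directories to read persona and script inputs from, in priority order.
// There is deliberately no hard-coded client directory as a global default.
function sourceSearchOrder({ activeProjectDir, clientContextDir }) {
  // A clearly matching client or campaign folder is read first; the active project follows.
  const order = [clientContextDir, activeProjectDir].filter(Boolean);
  if (order.length === 0) {
    throw new Error("No active project context: ask where the persona and script inputs live.");
  }
  return [...new Set(order)]; // drop duplicates when both point at the same folder
}
```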

Voice Objects

This workflow should distinguish three object types.

1. Voice Profile

The durable description of how the persona should sound.

Should include:

- `voiceProfileId`
- `personaId`
- `style`
- `pace`
- `tone`
- `language`
- `forbiddenTraits`
- `sourceBasis`
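A voice profile can be kept as a plain, frozen record that fails loudly when a field is missing. This is a minimal sketch, assuming every field listed above is required. `createVoiceProfile` and the sample values are hypothetical, and representing `sourceBasis` as a list of evidence references is an assumption about its shape.

```javascript
const VOICE_PROFILE_FIELDS = [
  "voiceProfileId", "personaId", "style", "pace",
  "tone", "language", "forbiddenTraits", "sourceBasis",
];

// Builds a frozen voice profile; throws if any listed field is missing.
function createVoiceProfile(fields) {
  const missing = VOICE_PROFILE_FIELDS.filter((key) => fields[key] === undefined);
  if (missing.length > 0) {
    throw new Error(`Voice profile is missing: ${missing.join(", ")}`);
  }
  const profile = {};
  for (const key of VOICE_PROFILE_FIELDS) profile[key] = fields[key];
  return Object.freeze(profile); // durable description: not edited in place per script
}

// Example with hypothetical values.
const exampleProfile = createVoiceProfile({
  voiceProfileId: "vp-001",
  personaId: "persona-ava",
  style: "friendly and efficient",
  pace: "medium",
  tone: "calm, practical",
  language: "en",
  forbiddenTraits: ["ad-like delivery", "high-drama acting"],
  sourceBasis: ["persona registry voice baseline"],
});
```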

2. Voice Identity

An optional reusable voice source.

This may later point to:

a provider voice id
a designed seed voice
a captured timbre from approved reference audio

Should include:

- `voiceIdentityId`
- `voiceProfileId`
- `provider`
- `providerVoiceId` or equivalent
- `referenceAudioPaths`
- `status`
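A voice identity ties a profile to an actual reusable source. This is a minimal sketch, assuming a hypothetical `candidate` / `approved` / `retired` lifecycle for `status`; the source only says an identity has a status. It also assumes an identity needs either a `providerVoiceId` or at least one reference audio path. `createVoiceIdentity` and `isReusableIdentity` are illustrative names.

```javascript
// Assumed lifecycle values; not defined by the source.
const IDENTITY_STATUSES = ["candidate", "approved", "retired"];

function createVoiceIdentity({
  voiceIdentityId, voiceProfileId, provider,
  providerVoiceId = null, referenceAudioPaths = [], status = "candidate",
}) {
  if (!voiceIdentityId || !voiceProfileId || !provider) {
    throw new Error("voiceIdentityId, voiceProfileId, and provider are required");
  }
  if (!IDENTITY_STATUSES.includes(status)) {
    throw new Error(`Unknown identity status: ${status}`);
  }
  // An identity needs some reusable source: a provider voice id or reference audio.
  if (providerVoiceId === null && referenceAudioPaths.length === 0) {
    throw new Error("Provide providerVoiceId or at least one reference audio path");
  }
  return { voiceIdentityId, voiceProfileId, provider, providerVoiceId, referenceAudioPaths: [...referenceAudioPaths], status };
}

// Only approved identities should back new script takes.
function isReusableIdentity(identity) {
  return identity.status === "approved";
}
```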

3. Voice Take

One concrete generated audio output for one script.

Should include:

- `voiceTakeId`
- `voiceProfileId`
- `voiceIdentityId`, if used
- `scriptId` or source path
- `audioPath`
- `requestPath`
- `responsePath`
- `manifestPath`
- `reviewStatus`
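A take record links one script to a profile and, optionally, an identity. This is a minimal sketch, assuming profiles and identities are plain objects like those described above and that `reviewStatus` starts as a hypothetical `pending` value. `createVoiceTake` and the `scriptSourcePath` parameter name are illustrative.

```javascript
// Builds one take record; the identity, when used, must belong to the same profile.
function createVoiceTake({
  voiceTakeId, profile, identity = null, scriptId = null, scriptSourcePath = null,
  audioPath, requestPath, responsePath, manifestPath,
}) {
  if (!scriptId && !scriptSourcePath) {
    throw new Error("A take needs scriptId or a script source path");
  }
  if (identity && identity.voiceProfileId !== profile.voiceProfileId) {
    throw new Error("Voice identity belongs to a different voice profile");
  }
  for (const [name, value] of Object.entries({ voiceTakeId, audioPath, requestPath, responsePath, manifestPath })) {
    if (!value) throw new Error(`Missing ${name}`);
  }
  return {
    voiceTakeId,
    voiceProfileId: profile.voiceProfileId,
    voiceIdentityId: identity ? identity.voiceIdentityId : null, // only set when an identity is used
    scriptId,
    scriptSourcePath,
    audioPath,
    requestPath,
    responsePath,
    manifestPath,
    reviewStatus: "pending", // assumed initial value
  };
}
```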

Default Workflow

1. Start from the persona registry

Before generating audio, confirm the persona registry contains:

voice baseline
approved image anchor
intended use cases

If the voice baseline is missing, write it first.
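The registry check can return human-readable gaps rather than a bare boolean. This is a minimal sketch, assuming a registry entry is a plain object. The field names `voiceBaseline`, `approvedImageAnchor`, and `intendedUseCases` are hypothetical stand-ins for the three items listed above, and `registryGaps` is illustrative.

```javascript
// Hypothetical registry field names mapped to the labels used in the checklist above.
const REQUIRED_REGISTRY_FIELDS = {
  voiceBaseline: "voice baseline",
  approvedImageAnchor: "approved image anchor",
  intendedUseCases: "intended use cases",
};

// Returns the missing items; an empty array means generation may proceed.
function registryGaps(personaEntry) {
  return Object.entries(REQUIRED_REGISTRY_FIELDS)
    .filter(([key]) => {
      const value = personaEntry[key];
      if (value === undefined || value === null) return true;
      return Array.isArray(value) ? value.length === 0 : String(value).trim() === "";
    })
    .map(([, label]) => label);
}
```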

2. Create or refine the voice profile

Translate persona traits into a provider-ready voice description.

Example dimensions:

calm vs energetic
practical vs polished
lightly nerdy vs polished professional
medium pace vs brisk pace
friendly and efficient vs authoritative
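One way to turn those dimensions into provider-ready text is to state both the chosen pole and the rejected one, so the model is told what to avoid. This is a minimal sketch, assuming the provider accepts a free-text description. `buildVoiceDescription` and its sentence template are hypothetical; the dimension values come from the list above.

```javascript
// Each dimension records the chosen pole and the rejected pole.
function buildVoiceDescription(dimensions) {
  const parts = dimensions.map(({ chosen, rejected }) => `${chosen}, not ${rejected}`);
  return `Speaker sounds ${parts.join("; ")}.`;
}

const description = buildVoiceDescription([
  { chosen: "calm", rejected: "energetic" },
  { chosen: "practical", rejected: "polished" },
  { chosen: "medium pace", rejected: "brisk pace" },
  { chosen: "friendly and efficient", rejected: "authoritative" },
]);
```

Naming the rejected pole keeps the description anchored to persona evidence instead of leaving the provider to guess.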

3. Generate an initial voice design

Use a voice-design model to generate a reference voice or first take from:

- `text`
- `voice_description`
- `language`

This first result should be reviewed before being treated as reusable.
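A normalized design request can carry those three inputs plus enough bookkeeping to keep the result from being reused before review. This is a minimal sketch, assuming the request JSON uses the three names above. The `mode`, `voiceProfileId`, and `reviewStatus` fields and the `buildDesignRequest` helper are hypothetical; the real shape is defined in `references/tool-contracts.md`.

```javascript
function buildDesignRequest({ voiceProfileId, text, voiceDescription, language }) {
  for (const [name, value] of Object.entries({ voiceProfileId, text, voiceDescription, language })) {
    if (typeof value !== "string" || value.trim() === "") throw new Error(`Missing ${name}`);
  }
  return {
    mode: "voice_design", // hypothetical discriminator
    voiceProfileId,
    text,
    voice_description: voiceDescription,
    language,
    reviewStatus: "pending", // not treated as reusable until reviewed
  };
}
```

Serialize the result with `JSON.stringify(request, null, 2)` and pass that file to `design_voice.mjs` through `--request`.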

4. Generate script-specific voice takes

Once a voice profile or voice identity exists:

keep the voice stable
swap in a new script text
generate a new take for each new video

The text changes. The voice continuity should not.
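When an approved identity exists, batch generation pins every voice field and varies only the text. This is a minimal sketch, assuming identities shaped like the voice identity described above, with a hypothetical `approved` status. `buildTakeRequests` and the `mode` field are illustrative; `referenceAudioPath` follows the `clone_voice_take` contract named in this skill.

```javascript
// Builds one clone request per script while pinning the same voice source.
function buildTakeRequests(identity, scripts) {
  if (identity.status !== "approved") {
    throw new Error("Use an approved voice identity for script takes");
  }
  return scripts.map((script) => ({
    mode: "voice_take", // hypothetical discriminator
    voiceProfileId: identity.voiceProfileId,
    voiceIdentityId: identity.voiceIdentityId,
    referenceAudioPath: identity.referenceAudioPaths[0] ?? null,
    scriptId: script.scriptId,
    text: script.text, // the only field that changes per video
  }));
}
```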

5. Review and iterate

Voice assets need structured review, not vague opinions.

Common review categories:

- `voice_too_salesy`
- `voice_too_slow`
- `voice_too_fast`
- `voice_not_young_enough`
- `voice_not_professional_enough`
- `voice_too_flat`
- `voice_too_broadcast`
- `voice_persona_drift`
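Structured review means issues come from the fixed category list, with free text limited to notes. This is a minimal sketch of a `review.json` payload, assuming hypothetical `approved` / `needs_revision` status values. `buildReview` is illustrative.

```javascript
const REVIEW_CATEGORIES = new Set([
  "voice_too_salesy", "voice_too_slow", "voice_too_fast", "voice_not_young_enough",
  "voice_not_professional_enough", "voice_too_flat", "voice_too_broadcast", "voice_persona_drift",
]);

// Builds a review payload; issues must use known categories, notes stay free-form.
function buildReview(voiceTakeId, issues, notes = "") {
  const unknown = issues.filter((issue) => !REVIEW_CATEGORIES.has(issue));
  if (unknown.length > 0) {
    throw new Error(`Unknown review categories: ${unknown.join(", ")}`);
  }
  const uniqueIssues = [...new Set(issues)];
  return {
    voiceTakeId,
    issues: uniqueIssues,
    reviewStatus: uniqueIssues.length === 0 ? "approved" : "needs_revision", // assumed values
    notes,
  };
}
```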

Path Selection Rule

Store outputs under the active project's voice asset structure when one already exists.

If no such structure exists yet, use a clear workspace output path and state where files were written.

If the output will become a durable client asset, prefer to confirm the destination with the user first.
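The path rule can be captured as a small decision that also records whether the path must be reported and whether the user should confirm it. This is a minimal sketch, assuming the caller has already detected any existing project voice directory. `chooseOutputDestination` and its fields are hypothetical.

```javascript
// projectVoiceRoot: existing voice asset directory in the active project, or null.
function chooseOutputDestination({ projectVoiceRoot, workspaceFallback, durableClientAsset, voiceTakeId }) {
  const root = projectVoiceRoot ?? workspaceFallback;
  if (!root) throw new Error("No explicit output path; stop and ask where to write.");
  return {
    outputDir: `${root.replace(/\/+$/, "")}/${voiceTakeId}`,
    usedFallback: projectVoiceRoot == null, // caller must then state where files were written
    needsUserConfirmation: Boolean(durableClientAsset),
  };
}
```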

Example Persistence Convention

One possible project-local layout is:

```text
voices/<voice-take-id>/
  request.json
  response.json
  manifest.json
  audio/
  review.json
```

Keep internal request files, raw provider responses, and run manifests under `<work-folder>/.postplus/voice-batch-runner/` when they are execution artifacts rather than the final handoff.
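Combining the example layout with the internal-state rule gives one path map per take. This is a minimal sketch under stated assumptions: the internal folder mirrors the `voices/<voice-take-id>/` layout, and `review.json` and `audio/` stay in the project-visible tree, a placement the source does not specify. `takeLayout` is hypothetical, and paths are joined as POSIX strings.

```javascript
// Maps one take to file paths. Execution artifacts go under the internal
// .postplus folder; audio and review stay in the visible voices/ tree (assumed).
function takeLayout(workFolder, voiceTakeId) {
  const root = workFolder.replace(/\/+$/, "");
  const internal = `${root}/.postplus/voice-batch-runner/voices/${voiceTakeId}`;
  const visible = `${root}/voices/${voiceTakeId}`;
  return {
    requestPath: `${internal}/request.json`,
    responsePath: `${internal}/response.json`,
    manifestPath: `${internal}/manifest.json`,
    reviewPath: `${visible}/review.json`,
    audioDir: `${visible}/audio`,
  };
}
```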

Tool Contract

This skill expects these tool adapters:

- `design_voice`
- `clone_voice_take`

`clone_voice_take` accepts a local file path in `referenceAudioPath`; the script uploads that file itself before calling the hosted clone endpoint.

Future extension:

- `capture_voice_identity`

See `references/tool-contracts.md` for details.
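The upload-then-clone contract can be sketched with the hosted calls injected, so the adapter logic stays testable. This is a minimal sketch, assuming anything without a URL scheme is a local file. `uploadFile`, `callCloneEndpoint`, and the `referenceAudioUrl` field are hypothetical placeholders for whatever the hosted service actually defines.

```javascript
// Treats anything without a URL scheme as a local file path (an assumption).
function isLocalPath(referenceAudioPath) {
  return !/^[a-z][a-z0-9+.-]*:\/\//i.test(referenceAudioPath);
}

// uploadFile and callCloneEndpoint are injected, hypothetical functions.
async function cloneVoiceTake(request, { uploadFile, callCloneEndpoint }) {
  const { referenceAudioPath, ...rest } = request;
  if (!referenceAudioPath) throw new Error("referenceAudioPath is required");
  const referenceAudioUrl = isLocalPath(referenceAudioPath)
    ? await uploadFile(referenceAudioPath) // the upload happens inside the script
    : referenceAudioPath;
  // The local path is not forwarded; the endpoint only sees the uploaded location.
  return callCloneEndpoint({ ...rest, referenceAudioUrl });
}
```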

Core Scripts

- `scripts/design_voice.mjs`
- `scripts/clone_voice_take.mjs`

These scripts take normalized request JSON files and write:

- `request.json`
- `response.json`
- `manifest.json`
- `review.json`
- downloaded audio under `audio/`
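A run manifest can list every file the script wrote, so a later review or handoff does not have to scan the folder. This is a minimal sketch with illustrative field names. It assumes paths are relative to the take folder and that a run with no downloaded audio should not produce a success manifest. `buildManifest` is hypothetical.

```javascript
// Builds the manifest for one run, listing the files written for this take.
function buildManifest({ voiceTakeId, script, audioFiles, startedAt, finishedAt }) {
  if (audioFiles.length === 0) {
    throw new Error("No audio downloaded; do not write a success manifest");
  }
  return {
    voiceTakeId,
    script, // e.g. "design_voice.mjs" or "clone_voice_take.mjs"
    startedAt,
    finishedAt,
    files: {
      request: "request.json",
      response: "response.json",
      review: "review.json",
      audio: audioFiles.map((name) => `audio/${name}`),
    },
  };
}
```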

Current Provider Direction

First likely provider path:

hosted voice design capability

Use it for initial voice design or first-pass takes.

Also relevant:

hosted voice clone capability

Use voice clone when:

you already have an approved reference audio for a persona
later scripts need new text but should preserve the same timbre and speaking style
you can provide the reference transcript for better matching

This fits the future requirement of "script changes, persona voice stays stable" better than voice design alone.
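The "use voice clone when" conditions above can be expressed as a single routing decision. This is a minimal sketch, assuming the caller already knows whether approved reference audio exists and whether the take must keep the same timbre. `chooseGenerationPath` and its inputs are hypothetical; the tool names come from this skill's tool contract.

```javascript
// Chooses the generation path for a take, following the clone conditions listed above.
function chooseGenerationPath({ approvedReferenceAudio, needsSameTimbre, referenceTranscript }) {
  if (approvedReferenceAudio && needsSameTimbre) {
    return {
      tool: "clone_voice_take",
      // A reference transcript is optional but improves matching.
      warnings: referenceTranscript ? [] : ["No reference transcript; matching may be weaker."],
    };
  }
  return { tool: "design_voice", warnings: [] };
}
```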

Read the provider notes before implementing:

- `references/hosted-tts-voice-design.md`
- `references/hosted-tts-voice-clone.md`

Future provider path:

a second model that preserves an approved voice timbre while reading new text

That future step should not change the outer workflow. It should only swap the tool adapter or voice identity backend.

Review Rule

Before generating a take, verify:

persona registry exists
voice baseline exists
script text is finalized enough for review
output path is explicit

After generating a take, review:

realism
persona fit
pacing
whether it sounds too much like an ad
whether it is reusable across many scripts

When reviewing cloned voice output, also check:

how well it preserves the target timbre
whether accent and speaking style drift from the reference
whether the reference audio quality is limiting the result
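The post-generation review can be driven by an explicit checklist that grows for cloned takes. This is a minimal sketch, assuming each take records which tool produced it in a hypothetical `generatedBy` field. The checklist labels paraphrase the lists above, and `reviewChecklist` and `reviewPasses` are illustrative.

```javascript
const BASE_CHECKS = ["realism", "persona fit", "pacing", "not ad-like", "reusable across scripts"];
const CLONE_CHECKS = [
  "target timbre preserved",
  "no accent or style drift from reference",
  "reference audio quality not limiting",
];

// Returns the checklist a reviewer must answer for one take.
function reviewChecklist(take) {
  return take.generatedBy === "clone_voice_take" ? [...BASE_CHECKS, ...CLONE_CHECKS] : [...BASE_CHECKS];
}

// A take passes only when every checklist item is explicitly marked true.
function reviewPasses(take, answers) {
  return reviewChecklist(take).every((item) => answers[item] === true);
}
```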

Example Commands

Design an initial persona-aligned voice:

```bash
node ${CLAUDE_SKILL_DIR}/scripts/design_voice.mjs \
  --request /path/to/request.json
```

Generate a new take from approved reference audio:

```bash
node ${CLAUDE_SKILL_DIR}/scripts/clone_voice_take.mjs \
  --request /path/to/request.json
```

Failure Mode

Stop and state the gap if:

no persona registry exists
no voice baseline exists
the script is still too unstable
the request does not specify whether this is voice design or a script-specific take

Do not solve missing voice strategy by randomly changing the TTS description.
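Collecting every gap before stopping gives the user one complete list instead of a series of partial failures. This is a minimal sketch, assuming the request states its mode as one of the hypothetical values `voice_design` or `voice_take`, and that script stability is already known as a boolean. `preflightGaps` is illustrative.

```javascript
// Returns every failure-mode gap; a non-empty result means stop and state the gaps.
function preflightGaps({ personaRegistry, voiceBaseline, scriptStable, mode }) {
  const gaps = [];
  if (!personaRegistry) gaps.push("no persona registry exists");
  if (!voiceBaseline) gaps.push("no voice baseline exists");
  if (!scriptStable) gaps.push("the script is still too unstable");
  if (mode !== "voice_design" && mode !== "voice_take") {
    gaps.push("the request does not say whether this is voice design or a script-specific take");
  }
  return gaps;
}
```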
