# HyperFrames Media Preprocessing
Three CLI commands that produce assets for compositions: `tts` (speech), `transcribe` (timestamps), and `remove-background` (transparent video). Each downloads a model on first run and caches it under `~/.cache/hyperframes/`. Drop the output into the project, then reference it from the composition HTML — see the `hyperframes` skill for the audio/video element conventions.
## Text-to-Speech (`tts`)
Generate speech audio locally with Kokoro-82M. No API key.
```bash
npx hyperframes tts "Text here" --voice af_nova --output narration.wav
npx hyperframes tts script.txt --voice bf_emma --output narration.wav
npx hyperframes tts --list   # all 54 voices
```
### Voice Selection
Match voice to content. Default is `af_heart`.

| Content type | Voice | Why |
|---|---|---|
| Product demo | | Warm, professional |
| Tutorial / how-to | | Neutral, easy to follow |
| Marketing / promo | | Energetic or authoritative |
| Documentation | | Clear British English, formal |
| Casual / social | | Approachable, natural |
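A minimal sketch of matching voice to content, assuming `af_heart` for a tutorial read and `bf_emma` (the British voice used in the examples above) for documentation; treat these picks as placeholders and check `--list` for the full set:

```bash
# Tutorial narration: warm default voice
npx hyperframes tts tutorial.txt --voice af_heart --output tutorial.wav

# Documentation narration: British English voice
npx hyperframes tts docs.txt --voice bf_emma --output docs.wav
```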
### Multilingual
Voice IDs encode language in the first letter: `a`=American English, `b`=British English, `e`=Spanish, `f`=French, `h`=Hindi, `i`=Italian, `j`=Japanese, `p`=Brazilian Portuguese, `z`=Mandarin. The CLI auto-detects the phonemizer locale from the prefix — no `--lang` needed when the voice matches the text.

```bash
npx hyperframes tts "La reunión empieza a las nueve" --voice ef_dora --output es.wav
npx hyperframes tts "今日はいい天気ですね" --voice jf_alpha --output ja.wav
```

Use `--lang` only to override auto-detection (stylized accents). Valid codes: `en-us`, `en-gb`, `es`, `fr-fr`, `hi`, `it`, `pt-br`, `ja`, `zh`. Non-English phonemization requires system-wide `espeak-ng` (`brew install espeak-ng` / `apt-get install espeak-ng`).
### Speed
- `0.7-0.8` — tutorial, complex content, accessibility
- `1.0` — natural pace (default)
- `1.1-1.2` — intros, transitions, upbeat content
- `1.5+` — rarely appropriate; test carefully
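A sketch of passing one of these values on the command line; the flag name is an assumption (it does not appear in the examples above), so confirm it against the CLI's own help output:

```bash
# --speed is an assumed flag name; verify with `npx hyperframes tts --help`
npx hyperframes tts tutorial.txt --voice af_heart --speed 0.8 --output tutorial.wav
```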
### Long Scripts
For more than a few paragraphs, write to a `.txt` file and pass the path. Inputs over ~5 minutes of speech may benefit from splitting into segments.
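For example, a sketch of keeping a longer narration in per-segment text files (file names and the split point are illustrative):

```bash
# One file per segment, one output per segment
npx hyperframes tts intro.txt    --voice af_heart --output intro.wav
npx hyperframes tts chapter1.txt --voice af_heart --output chapter1.wav
```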
### Requirements
Python 3.8+ with `kokoro-onnx` and `soundfile` (`pip install kokoro-onnx soundfile`). Model downloads on first use (~311 MB + ~27 MB voices, cached in `~/.cache/hyperframes/tts/`).
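A one-time setup sketch based on the requirements above (the install line comes from this doc; the version check is just a convenience):

```bash
python3 --version                   # needs 3.8+
pip install kokoro-onnx soundfile   # TTS runtime dependencies
```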
## Transcription (`transcribe`)
Produce a normalized `transcript.json` with word-level timestamps.

```bash
npx hyperframes transcribe audio.mp3
npx hyperframes transcribe video.mp4 --model small --language es
npx hyperframes transcribe subtitles.srt    # import existing
npx hyperframes transcribe subtitles.vtt
npx hyperframes transcribe openai-response.json
```
### Language Rule (Non-Negotiable)
Never use `.en` models unless the user explicitly states the audio is English. `.en` models (`small.en`, `medium.en`) translate non-English audio into English instead of transcribing it. This silently destroys the original language.

- Language known and non-English → `--model small --language <code>` (no `.en` suffix)
- Language known and English → `--model small.en`
- Language unknown → `--model small` (no `.en`, no `--language`) — whisper auto-detects

Default model is `small`, not `small.en`.
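The three cases above as concrete commands (file names are placeholders):

```bash
npx hyperframes transcribe interview.mp3 --model small --language es   # known non-English
npx hyperframes transcribe podcast.mp3 --model small.en                # user-confirmed English
npx hyperframes transcribe clip.mp3 --model small                      # unknown: whisper auto-detects
```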
### Model Sizes
| Model | Size | Speed | When to use |
|---|---|---|---|
| `tiny` | 75 MB | Fastest | Quick previews, testing pipeline |
| `base` | 142 MB | Fast | Short clips, clear audio |
| `small` | 466 MB | Moderate | Default — most content |
| `medium` | 1.5 GB | Slow | Important content, noisy audio, music |
| `large` | 3.1 GB | Slowest | Production quality |
Music with vocals: start at `medium` minimum; produced tracks often need manual SRT/VTT import. For caption-quality checks (mandatory after every transcription), the cleaning JS, retry rules, and the OpenAI/Groq API import path, see `hyperframes/references/transcript-guide.md`.
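For example, a produced track with vocals following the guidance above (the file name is illustrative):

```bash
npx hyperframes transcribe music-video.mp4 --model medium
```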
### Output Shape
Compositions consume a flat array of word objects. The `id` field (`w0`, `w1`, ...) is added during normalization for stable references in caption overrides; it's optional for backwards compatibility.

```json
[
  { "id": "w0", "text": "Hello", "start": 0.0, "end": 0.5 },
  { "id": "w1", "text": "world.", "start": 0.6, "end": 1.2 }
]
```
## Background Removal (`remove-background`)
Remove the background from a video or image so it can sit as a transparent overlay in a composition (e.g. an avatar floating on a background plate).

```bash
npx hyperframes remove-background avatar.mp4 -o transparent.webm   # default: VP9 alpha WebM
npx hyperframes remove-background avatar.mp4 -o transparent.mov    # ProRes 4444 (editing)
npx hyperframes remove-background portrait.jpg -o cutout.png       # single-image cutout
npx hyperframes remove-background avatar.mp4 -o transparent.webm --device cpu
npx hyperframes remove-background --info                           # detected providers
```

Uses `u2net_human_seg` (MIT). First run downloads ~168 MB of weights to `~/.cache/hyperframes/background-removal/models/`.
### Output Format
| Format | When |
|---|---|
| `.webm` | Default. Compositions play this directly via `<video>`. |
| `.mov` | Editing in DaVinci/Premiere/FCP. Large files. |
| `.png` | Single-image cutout (still subject, layered over a backdrop). |
Chrome decodes VP9 alpha natively, so the `.webm` plugs into a composition like any other muted-autoplay video — see the `hyperframes` skill for the `<video>` track conventions.
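To confirm a clip actually carries an alpha channel before wiring it into a composition, `ffprobe` (not part of this CLI, but usually installed alongside ffmpeg) can report the pixel format:

```bash
# Expect pix_fmt=yuva420p on a VP9 encode that carries alpha
ffprobe -v error -select_streams v:0 \
  -show_entries stream=codec_name,pix_fmt \
  -of default=noprint_wrappers=1 transparent.webm
```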
## TTS → Transcribe → Captions
When there's no pre-recorded voiceover, generate one and transcribe it back to get word-level timestamps for captions:

```bash
npx hyperframes tts script.txt --voice af_heart --output narration.wav
npx hyperframes transcribe narration.wav   # → transcript.json
```

Whisper extracts precise word boundaries from the generated audio, so caption timing matches delivery without hand-tuning.