audio-transcription

Audio Transcription

Follow shared release-shell rules in:
  • postplus-shared release-shell rules
Use this skill when the input is audio and the main job is:
  • transcript generation
  • subtitle-ready timing
  • rough speech search
  • multilingual audio transcription
This skill is not for video semantic understanding.

Hosted Endpoint

First-version hosted transcription endpoints:
  • hosted transcription capability
  • transcription-whisper
  • transcription-whisper-turbo
Use transcription-whisper by default when subtitle quality matters.
Use transcription-whisper-turbo when:
  • the user wants a cheaper rough pass
  • timestamps are not the primary requirement
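The selection rule above can be sketched as a small function. Only the two endpoint identifiers come from this document; the option names `wantsCheapRoughPass` and `timestampsArePrimary` are hypothetical, and the sketch assumes the turbo endpoint is chosen only when both listed conditions hold.

```javascript
// Endpoint identifiers taken from the list above.
const DEFAULT_ENDPOINT = "transcription-whisper";
const TURBO_ENDPOINT = "transcription-whisper-turbo";

/**
 * Pick a hosted transcription endpoint.
 * Both option names are hypothetical, chosen for this sketch only.
 */
function chooseEndpoint({ wantsCheapRoughPass = false, timestampsArePrimary = true } = {}) {
  // Assumption: turbo is used only when the user wants a cheap rough pass
  // AND timestamps are not the primary requirement.
  if (wantsCheapRoughPass && !timestampsArePrimary) {
    return TURBO_ENDPOINT;
  }
  // Everything else falls back to the default, subtitle-quality endpoint.
  return DEFAULT_ENDPOINT;
}
```

Requiring both conditions is the conservative reading: a request that still depends on timestamps stays on the default endpoint even if cost matters.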

Output Contract

Persist:
  • request.json
  • response.json
  • manifest.json
  • downloaded provider outputs under outputs/
Do not rely on the provider dashboard as the durable record.
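A minimal sketch of writing these records with Node.js, assuming one directory per run. The function name `persistRun`, its parameters, and the manifest fields are hypothetical; the real manifest schema is not specified here.

```javascript
/**
 * Write the durable records for one transcription run.
 * `runDir`, `request`, `response`, and the manifest fields are hypothetical.
 */
async function persistRun(runDir, request, response) {
  // Dynamic imports keep the sketch loadable from both ESM and CommonJS.
  const { mkdir, writeFile } = await import("node:fs/promises");
  const path = await import("node:path");

  // Create the run directory and its outputs/ subfolder in one call.
  await mkdir(path.join(runDir, "outputs"), { recursive: true });

  // Assumed manifest shape: a timestamp plus pointers to the saved files.
  const manifest = {
    createdAt: new Date().toISOString(),
    files: ["request.json", "response.json"],
    outputsDir: "outputs/",
  };

  // Pretty-print JSON so the records stay readable and diffable.
  const writeJson = (name, value) =>
    writeFile(path.join(runDir, name), JSON.stringify(value, null, 2) + "\n", "utf8");

  await writeJson("request.json", request);
  await writeJson("response.json", response);
  await writeJson("manifest.json", manifest);
  return manifest;
}
```

Writing the raw request and response before any post-processing means the local folder, not the provider dashboard, holds the full record of the run.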

Poll Behavior

Hosted transcription is asynchronous. The script polls the prediction result URL until status is completed or failed. Default poll window: 150 attempts × 2 s = 5 minutes.
Short audio clips typically complete in under 30 s. If a job exceeds 5 minutes, retry rather than increasing the timeout further.
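The polling loop can be sketched as follows. This is not the code in scripts/poll_transcription.mjs; `pollUntilDone` and the `fetchStatus` callback are hypothetical names, and the sketch assumes the callback resolves to an object with a `status` field. The defaults mirror the 150 attempts and 2 s interval stated above.

```javascript
const MAX_ATTEMPTS = 150; // default number of attempts from the poll window
const INTERVAL_MS = 2000; // 2 s between attempts

/**
 * Poll until the job reports `completed` or `failed`.
 * `fetchStatus` is a hypothetical async callback that returns { status, ... }.
 */
async function pollUntilDone(
  fetchStatus,
  { maxAttempts = MAX_ATTEMPTS, intervalMs = INTERVAL_MS } = {},
) {
  for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
    const result = await fetchStatus();
    // Both terminal states end the loop; the caller decides what to do next.
    if (result.status === "completed" || result.status === "failed") {
      return { ...result, attempts: attempt };
    }
    // Wait before the next attempt, but not after the final one.
    if (attempt < maxAttempts) {
      await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
  }
  // Window exhausted: surface an error so the caller can retry the job.
  throw new Error(`Transcription still pending after ${maxAttempts} attempts`);
}
```

Throwing when the window runs out, instead of waiting longer, matches the guidance to retry rather than extend the timeout.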

Default Workflow

  1. Normalize the transcription request.
  2. Submit to hosted Whisper capability.
  3. Save raw request and response locally.
  4. Poll if the job is asynchronous.
  5. Save downloaded transcript artifacts locally.
  6. Hand off to subtitle-packager if SRT/VTT is needed.
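The six steps above can be sketched as one orchestration function with each step injected. Every function on `steps`, the `needsSubtitles` field, and the use of a non-terminal `status` to detect an asynchronous job are assumptions made for illustration, not the actual script interfaces.

```javascript
/**
 * Run the workflow steps in order.
 * Every function on `steps` and every field used here is hypothetical.
 */
async function runTranscription(input, steps) {
  const request = await steps.normalize(input); // step 1
  let response = await steps.submit(request); // step 2
  await steps.saveRaw(request, response); // step 3

  // Step 4: assume a non-terminal status means the job is asynchronous.
  if (response.status !== "completed" && response.status !== "failed") {
    response = await steps.poll(response);
  }
  if (response.status === "failed") {
    throw new Error("Hosted transcription failed");
  }

  const artifacts = await steps.download(response); // step 5

  // Step 6: hand off only when SRT/VTT output was requested.
  if (request.needsSubtitles) {
    await steps.handOffToSubtitlePackager(artifacts);
  }
  return artifacts;
}
```

Injecting the steps keeps the ordering testable without calling the hosted service.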

Scripts

  • scripts/transcribe_audio.mjs
  • scripts/poll_transcription.mjs

Read These References

  • references/tool-contracts.md

Release-Shell Execution Contract

  • keep transcription requests, provider responses, manifests, and downloaded transcript artifacts under <work-folder>/.postplus/audio-transcription/
  • keep only final user-facing transcript exports outside .postplus/
  • start with a bounded first pass, usually one source file before larger batches
  • if hosted transcription capability is unavailable, unauthorized, or returns a stable network error, stop immediately instead of switching to ad hoc shell glue
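The folder and first-pass rules above can be sketched with three small helpers. All three function names are hypothetical, and the sketch assumes POSIX-style paths with `/` separators.

```javascript
/**
 * Internal working directory for this skill under a given work folder.
 * Assumes POSIX-style paths.
 */
function internalDir(workFolder) {
  // Strip trailing slashes so the result has exactly one separator.
  return `${workFolder.replace(/\/+$/, "")}/.postplus/audio-transcription/`;
}

/** True when a path lives somewhere under a .postplus/ folder. */
function isInternalPath(filePath) {
  return filePath.split("/").includes(".postplus");
}

/** Bounded first pass: by default, only the first source file. */
function firstPass(sourceFiles, limit = 1) {
  return sourceFiles.slice(0, limit);
}
```

For example, `internalDir("/tmp/job/")` yields `/tmp/job/.postplus/audio-transcription/`, and a final export such as `/tmp/job/transcript.srt` is not an internal path.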