file-transcribe

File Transcription Skill

Transcribe audio or video files to text using OpenAI Whisper, routed through the OpenKBS AI proxy. Inside OpenKBS Studio no API key is needed: the skill uses the project's credits. By default, the language is auto-detected from the audio.

When to use

Use this skill when the user asks to:
  • Transcribe an audio file (MP3, WAV, OGG, FLAC, M4A, etc.)
  • Transcribe a video file (MP4, MKV, MOV, AVI, WebM, etc.)
  • Convert speech to text from any media file
  • Get a text version of a recording, meeting, lecture, podcast, etc.

Command

bash
node .claude/skills/file-transcribe/transcribe.mjs <input-file> [output.txt]
  • <input-file>: path to audio or video file (required)
  • [output.txt]: output text file path (optional, defaults to transcript.txt)
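
The argument handling described above could look roughly like the following. This is a minimal sketch, not the actual contents of transcribe.mjs: the function name `parseCliArgs` and the error message are hypothetical, and only the required input path and the transcript.txt default come from this document.

```javascript
// Hypothetical helper: illustrates the documented CLI contract only.
// It is NOT taken from transcribe.mjs.
function parseCliArgs(argv) {
  // argv is expected in process.argv form: [node, script, input, output?]
  const [inputFile, outputFile = "transcript.txt"] = argv.slice(2);
  if (!inputFile) {
    throw new Error("Usage: transcribe.mjs <input-file> [output.txt]");
  }
  return { inputFile, outputFile };
}

// Example: only the input file is given, so the output path falls back
// to the documented default.
console.log(parseCliArgs(["node", "transcribe.mjs", ".uploads/meeting.mp3"]));
```

In a real script the call would presumably be `parseCliArgs(process.argv)`.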

Examples

bash
# Transcribe an uploaded MP3 (language auto-detected)
node .claude/skills/file-transcribe/transcribe.mjs .uploads/meeting.mp3

# Transcribe a video with custom output path
node .claude/skills/file-transcribe/transcribe.mjs .uploads/lecture.mp4 lecture_transcript.txt

# Force a specific language hint
WHISPER_LANG=en node .claude/skills/file-transcribe/transcribe.mjs .uploads/podcast.mp3

Environment variables (optional)

| Variable | Default | Description |
| --- | --- | --- |
| WHISPER_LANG | (auto-detect) | Language hint for Whisper (e.g. en, bg, de) |
| CHUNK_SECONDS | 600 | Chunk duration in seconds (for large files) |
| BATCH | 4 | Parallel transcription requests |
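
One way these variables and their defaults might be read is sketched below. The variable names and default values come from the table. The function name `readConfig`, the returned field names, and the positive-integer validation are assumptions made for illustration and may not match what transcribe.mjs does.

```javascript
// Sketch only: names and defaults follow the documented table; the
// validation rules and the shape of the returned object are assumptions.
function readConfig(env) {
  const positiveInt = (raw, fallback, name) => {
    if (raw === undefined || raw === "") return fallback;
    const n = Number(raw);
    if (!Number.isInteger(n) || n <= 0) {
      throw new Error(`${name} must be a positive integer, got "${raw}"`);
    }
    return n;
  };
  return {
    // undefined means "no hint", so Whisper auto-detects the language
    language: env.WHISPER_LANG || undefined,
    chunkSeconds: positiveInt(env.CHUNK_SECONDS, 600, "CHUNK_SECONDS"),
    batch: positiveInt(env.BATCH, 4, "BATCH"),
  };
}

// Example with an explicit environment object; a script would more
// likely pass process.env.
console.log(readConfig({ WHISPER_LANG: "en", BATCH: "2" }));
```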

How it works

  1. Detects if input is video or audio
  2. For video: extracts audio track as MP3
  3. Checks file size — files under 25MB go directly to Whisper; larger files are split into chunks
  4. Uploads to the preview server and calls the OpenKBS Whisper proxy
  5. Stitches chunk results in order and writes the final transcript
  6. Cleans up all temporary files
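
Steps 3 to 5 above can be sketched as follows. This is an illustration under stated assumptions, not the skill's real code: all function names are hypothetical, "25MB" is read as 25 × 1024 × 1024 bytes, chunks are planned by duration using CHUNK_SECONDS, and chunk texts are joined with a single space. The `transcribeChunk` callback stands in for the upload and Whisper proxy call, which this document does not specify.

```javascript
// Assumption: "25MB" is interpreted as 25 MiB.
const DIRECT_LIMIT_BYTES = 25 * 1024 * 1024;

// Files under the limit go straight to Whisper; anything else is chunked.
function needsChunking(sizeBytes) {
  return sizeBytes >= DIRECT_LIMIT_BYTES;
}

// Split a duration into consecutive windows of at most chunkSeconds.
function planChunks(durationSeconds, chunkSeconds = 600) {
  const chunks = [];
  for (let start = 0; start < durationSeconds; start += chunkSeconds) {
    chunks.push({
      index: chunks.length,
      start,
      length: Math.min(chunkSeconds, durationSeconds - start),
    });
  }
  return chunks;
}

// Transcribe up to `batch` chunks in parallel, then stitch the texts
// together in chunk order regardless of which request finished first.
async function transcribeInOrder(chunks, transcribeChunk, batch = 4) {
  const texts = new Array(chunks.length);
  for (let i = 0; i < chunks.length; i += batch) {
    const group = chunks.slice(i, i + batch);
    const results = await Promise.all(group.map((c) => transcribeChunk(c)));
    results.forEach((text, j) => {
      texts[i + j] = text;
    });
  }
  return texts.join(" ");
}

// A 25-minute recording with the default 600-second chunks.
console.log(planChunks(1500, 600).length); // 3
```

`Promise.all` returns results in the order of its input, which is what keeps the stitched transcript in sequence even when a later chunk completes first.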

Requirements

  • ffmpeg must be installed
  • One of:
    • OpenKBS Studio — works automatically (SERVER_URL + KB_ID are set by the container)
    • OPENAI_KEY — set this env var for direct OpenAI Whisper access outside of Studio
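
The "one of" requirement above amounts to choosing a backend from the environment. The sketch below shows one possible reading. Only the variable names SERVER_URL, KB_ID, and OPENAI_KEY come from this document. The function name, the returned shape, and the choice to prefer the Studio proxy when both are configured are assumptions.

```javascript
// Hypothetical helper: the precedence (proxy first, then direct OpenAI)
// is an assumption, not documented behavior.
function selectBackend(env) {
  if (env.SERVER_URL && env.KB_ID) {
    // Inside OpenKBS Studio the container provides both values.
    return { kind: "openkbs-proxy", serverUrl: env.SERVER_URL, kbId: env.KB_ID };
  }
  if (env.OPENAI_KEY) {
    // Outside Studio: call OpenAI Whisper directly with the user's key.
    return { kind: "openai-direct" };
  }
  throw new Error(
    "No backend available: set OPENAI_KEY, or run inside OpenKBS Studio " +
      "where SERVER_URL and KB_ID are provided."
  );
}

console.log(selectBackend({ OPENAI_KEY: "sk-example" }).kind); // openai-direct
```

Failing early with a clear message is a design choice of this sketch; it avoids starting audio extraction and chunking when no transcription backend can be reached.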

Workflow

  1. Verify the input file exists (check .uploads/ or the path the user specified)
  2. Run the transcription command to produce the raw transcript
  3. Read the raw transcript and perform a refinement pass:

Step 3: Refinement

After getting the raw transcript, you MUST read it and produce a polished version by fixing two things:
A. Error correction: Whisper mishears words, drops phrases, or garbles domain-specific terms. Analyze the full context of the conversation to infer what was actually said. For example, if the topic is clearly about databases and Whisper wrote "post-Greece queue well", that's "PostgreSQL". Fix these errors in-place while preserving the original meaning.
B. Speaker identification: Whisper does not label speakers. You must identify distinct speakers and label them. Use contextual clues: question→answer patterns, "I"/"you" shifts, topic ownership, speaking style differences. Label as Speaker 1:, Speaker 2:, etc. If names are mentioned in the conversation, use actual names instead.
Write the polished transcript to a separate file (e.g. transcript_final.txt) and keep the raw version for reference. Present the final version to the user.
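
The labeling convention in part B can be illustrated with a small formatter. This only shows the target output format; deciding who said what remains a contextual judgment that code like this does not perform. The turn structure, the function name `formatTurns`, and the sample dialogue are invented for illustration.

```javascript
// Illustrative only: assumes speaker attribution has already been done
// and each turn carries a numeric speaker id plus the corrected text.
function formatTurns(turns, names = {}) {
  return turns
    .map(({ speaker, text }) => {
      // Prefer a real name when the conversation revealed one.
      const label = names[speaker] ?? `Speaker ${speaker}`;
      return `${label}: ${text}`;
    })
    .join("\n");
}

const sampleTurns = [
  { speaker: 1, text: "Which database are we using?" },
  { speaker: 2, text: "PostgreSQL." },
];

// Speaker 2 was addressed by name in this invented example.
console.log(formatTurns(sampleTurns, { 2: "Maria" }));
```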

Step 4: Choose path

Once the refined transcript is ready, present the user with two options using AskUserQuestion:
  • Q&A then Requirement Specification (Recommended) — Ask clarifying questions first to resolve ambiguities, Whisper errors, and missing details, then generate the specification
  • Direct Requirement Specification — Generate the specification immediately from the transcript as-is

Step 5a: Q&A path (if chosen)

Enter plan mode FIRST, then conduct the Q&A inside plan mode. Analyze the transcript for ambiguities, contradictions, missing details, or things that look like transcription errors. Ask questions ONE AT A TIME — ask one question, wait for the user's answer, then use that answer as context for the next question. Do not batch multiple questions. Each subsequent question should be informed by all previous answers. Continue until all ambiguities are resolved. Then produce the Requirement Specification.
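
The one-question-at-a-time rule can be pictured as a sequential loop in which each question is derived from every earlier answer. In the sketch below, `nextQuestion` and `askUser` are hypothetical callbacks standing in for the analysis step and the user prompt. They are not real tool APIs, and the `null` return value as a stop signal is an assumption.

```javascript
// Sketch of the sequential Q&A loop. No question is asked until the
// previous answer has been received and added to the history.
async function runClarification(nextQuestion, askUser) {
  const history = [];
  for (;;) {
    // The next question sees all previous question/answer pairs.
    const question = await nextQuestion(history);
    if (question === null) break; // nothing left to clarify
    const answer = await askUser(question);
    history.push({ question, answer });
  }
  return history;
}

// Toy run: two fixed questions, answered automatically.
runClarification(
  (history) => (history.length < 2 ? `Question ${history.length + 1}?` : null),
  (question) => `Answer to "${question}"`
).then((history) => console.log(history.length)); // 2
```

Because each iteration awaits the answer before computing the next question, batching several questions at once is structurally impossible in this shape.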

Step 5b: Direct path (if chosen)

Enter plan mode and produce the Requirement Specification directly from the refined transcript.

Requirement Specification format

The document should cover: objectives, functional requirements, non-functional requirements, constraints, and acceptance criteria. Save as requirements.md (or a name the user prefers).
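
A possible skeleton for such a document is sketched below. The five section names come from the sentence above. The Markdown heading levels, the TODO placeholders, and the function name `requirementsSkeleton` are assumptions, since this document does not prescribe a layout.

```javascript
// Section names follow the documented list; everything else about the
// layout is an assumption made for illustration.
const SECTIONS = [
  "Objectives",
  "Functional requirements",
  "Non-functional requirements",
  "Constraints",
  "Acceptance criteria",
];

function requirementsSkeleton(title) {
  const body = SECTIONS.map((name) => `## ${name}\n\n- TODO\n`).join("\n");
  return `# ${title}\n\n${body}`;
}

console.log(requirementsSkeleton("Requirement Specification"));
```

The resulting string could then be written to requirements.md, or to whatever file name the user prefers.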