mk-youtube-audio-transcribe
YouTube Audio Transcribe
Transcribe audio files to text using local whisper.cpp (no cloud API required).
Quick Start
```
/mk-youtube-audio-transcribe <audio_file> [model] [language] [--force]
```

Parameters
| Parameter | Required | Default | Description |
|---|---|---|---|
| audio_file | Yes | - | Path to audio file |
| model | No | auto | Model: auto, tiny, base, small, medium, large-v3, belle-zh, kotoba-ja |
| language | No | auto | Language code: en, ja, zh, auto (auto-detect) |
| --force | No | false | Force re-transcribe even if cached file exists |
Examples
```bash
# Transcribe with auto model selection
/mk-youtube-audio-transcribe /path/to/audio/video.m4a

# Auto-select best model for Chinese → belle-zh
/mk-youtube-audio-transcribe video.m4a auto zh

# Auto-select best model for Japanese → kotoba-ja
/mk-youtube-audio-transcribe video.m4a auto ja

# Use small model, force English
/mk-youtube-audio-transcribe audio.mp3 small en

# Use medium model (explicit), Japanese
/mk-youtube-audio-transcribe podcast.wav medium ja
```
How it Works
- Execute: `{baseDir}/scripts/transcribe.sh "<audio_file>" "<model>" "<language>"`
- Auto-download the model if not found (with progress)
- Convert the audio to 16kHz mono WAV using ffmpeg
- Run whisper-cli for transcription
- Save the full JSON to `{baseDir}/data/<filename>.json`
- Save the plain text to `{baseDir}/data/<filename>.txt`
- Return file paths and metadata
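The ffmpeg and whisper-cli steps above can be sketched as command construction. This is a sketch only: the whisper-cli flags shown (`-m`, `-f`, `-l`, `-oj`) follow common whisper.cpp builds and are assumptions, not a copy of the bundled transcribe.sh.

```python
def build_commands(audio_file: str, model_path: str, language: str, wav_path: str):
    """Build the two pipeline commands (conversion + transcription)."""
    ffmpeg_cmd = [
        "ffmpeg", "-y", "-i", audio_file,
        "-ar", "16000",       # 16 kHz sample rate
        "-ac", "1",           # mono
        "-c:a", "pcm_s16le",  # 16-bit PCM
        wav_path,
    ]
    whisper_cmd = [
        "whisper-cli",
        "-m", model_path,     # ggml model file
        "-f", wav_path,
        "-l", language,       # "auto" enables language detection
        "-oj",                # also write JSON output (assumed flag)
    ]
    return ffmpeg_cmd, whisper_cmd

ffmpeg_cmd, whisper_cmd = build_commands(
    "video.m4a", "models/ggml-medium.bin", "en", "video.wav"
)
```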
┌─────────────────────────────┐
│ transcribe.sh │
│ audio_file, [model], [lang]│
└──────────────┬──────────────┘
│
▼
┌─────────────────────────────┐
│ ffmpeg: convert to WAV │
│ 16kHz, mono, pcm_s16le │
└──────────────┬──────────────┘
│
▼
┌─────────────────────────────┐
│ whisper-cli: transcribe │
│ with Metal acceleration │
└──────────────┬──────────────┘
│
▼
┌─────────────────────────────┐
│ Save to files │
│ .json (full) + .txt │
└──────────────┬──────────────┘
│
▼
┌─────────────────────────────┐
│ Return file paths │
│ {file_path, text_file_path}│
└─────────────────────────────┘

Output Format
Success:

```json
{
  "status": "success",
  "file_path": "{baseDir}/data/20091025__VIDEO_ID.json",
  "text_file_path": "{baseDir}/data/20091025__VIDEO_ID.txt",
  "language": "en",
  "duration": "3:32",
  "model": "medium",
  "char_count": 12345,
  "line_count": 100,
  "text_char_count": 10000,
  "text_line_count": 50,
  "cached": false,
  "video_id": "dQw4w9WgXcQ",
  "title": "Video Title",
  "channel": "Channel Name",
  "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
}
```

Cache hit (returns the existing transcription):

```json
{
  "status": "success",
  "file_path": "{baseDir}/data/20091025__VIDEO_ID.json",
  "cached": true,
  ...
}
```

Error (general):

```json
{
  "status": "error",
  "message": "Error description"
}
```

Error (unknown model):

```json
{
  "status": "error",
  "error_code": "UNKNOWN_MODEL",
  "message": "Unknown model: invalid-name",
  "available_models": ["tiny", "base", "small", "medium", "large-v3", "large-v3-turbo", "belle-zh", "kotoba-ja", "kotoba-ja-q5"]
}
```

When you receive an `UNKNOWN_MODEL` error: suggest a valid model from `available_models`.

Error (model not found):

```json
{
  "status": "error",
  "error_code": "MODEL_NOT_FOUND",
  "message": "Model 'medium' not found. Please download it first.",
  "model": "medium",
  "model_size": "1.4GB",
  "download_url": "https://huggingface.co/...",
  "download_command": "curl -L --progress-bar -o '/path/to/models/ggml-medium.bin' 'https://...' 2>&1"
}
```

When you receive a `MODEL_NOT_FOUND` error:

- Inform the user: "Downloading model '{model}' ({model_size})..."
- Execute `download_command` using the Bash tool with `timeout: 1800000` (30 minutes)
- After the download completes: re-run the original transcribe command

Error (model corrupted):

```json
{
  "status": "error",
  "error_code": "MODEL_CORRUPTED",
  "message": "Model 'medium' is corrupted or incomplete. Please re-download.",
  "model": "medium",
  "model_size": "1.4GB",
  "expected_sha256": "6c14d5adee5f86394037b4e4e8b59f1673b6cee10e3cf0b11bbdbee79c156208",
  "actual_sha256": "def456...",
  "model_path": "/path/to/models/ggml-medium.bin",
  "download_command": "rm '/path/to/models/ggml-medium.bin' && curl -L --progress-bar -o '/path/to/models/ggml-medium.bin' 'https://...' 2>&1"
}
```

When you receive a `MODEL_CORRUPTED` error:

- Inform the user: "Model '{model}' is corrupted. Re-downloading ({model_size})..."
- Execute `download_command` (removes the corrupted file and re-downloads) using the Bash tool with `timeout: 1800000` (30 minutes)
- After the download completes: re-run the original transcribe command
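The error branches described in this section amount to a small dispatch over the `error_code` field. The sketch below illustrates that flow; the function and action strings are hypothetical placeholders, not part of transcribe.sh.

```python
import json

def next_action(raw: str) -> str:
    """Map a transcribe.sh JSON result to the follow-up action described above."""
    result = json.loads(raw)
    if result["status"] == "success":
        return "read-output-files"
    code = result.get("error_code")
    if code == "UNKNOWN_MODEL":
        # Suggest a valid model from the error's available_models list
        return "suggest:" + result["available_models"][0]
    if code in ("MODEL_NOT_FOUND", "MODEL_CORRUPTED"):
        # Run download_command with timeout: 1800000, then retry transcription
        return "download-then-retry"
    return "report:" + result.get("message", "unknown error")
```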
Output Fields
| Field | Description |
|---|---|
| `file_path` | Absolute path to JSON file (with segments) |
| `text_file_path` | Absolute path to plain text file |
| `language` | Detected language code |
| `duration` | Audio duration |
| `model` | Model used for transcription |
| `char_count` | Character count of JSON file |
| `line_count` | Line count of JSON file |
| `text_char_count` | Character count of plain text file |
| `text_line_count` | Line count of plain text file |
| `video_id` | YouTube video ID (from centralized metadata store) |
| `title` | Video title (from centralized metadata store) |
| `channel` | Channel name (from centralized metadata store) |
| `url` | Full video URL (from centralized metadata store) |
Filename Format
Output files preserve the input audio filename's unified naming format with a date prefix:

`{YYYYMMDD}__{video_id}.{ext}`

Example: `20091025__dQw4w9WgXcQ.json`
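A minimal sketch of this naming scheme (the helper name is hypothetical):

```python
from datetime import date

def output_basename(upload_date: date, video_id: str, ext: str) -> str:
    """Build the unified {YYYYMMDD}__{video_id}.{ext} output name."""
    return f"{upload_date:%Y%m%d}__{video_id}.{ext}"

print(output_basename(date(2009, 10, 25), "dQw4w9WgXcQ", "json"))
# → 20091025__dQw4w9WgXcQ.json
```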
JSON File Format
The JSON file at `file_path` contains:

```json
{
  "text": "Full transcription text...",
  "language": "en",
  "duration": "3:32",
  "model": "medium",
  "segments": [
    {
      "start": "00:00:00.000",
      "end": "00:00:05.000",
      "text": "First segment..."
    }
  ]
}
```
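The segments can be read back and rendered as timestamped lines, for example (a sketch using the structure above):

```python
import json

raw = """
{
  "text": "Full transcription text...",
  "language": "en",
  "segments": [
    {"start": "00:00:00.000", "end": "00:00:05.000", "text": "First segment..."}
  ]
}
"""
transcript = json.loads(raw)

# One "[start - end] text" line per segment
lines = [f"[{s['start']} - {s['end']}] {s['text']}" for s in transcript["segments"]]
print(lines[0])
# → [00:00:00.000 - 00:00:05.000] First segment...
```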
Models
Standard Models
| Model | Size | RAM | Speed | Accuracy |
|---|---|---|---|---|
| auto | - | - | - | Auto-select based on language (default) |
| tiny | 74MB | ~273MB | Fastest | Low |
| base | 141MB | ~388MB | Fast | Medium |
| small | 465MB | ~852MB | Moderate | Good |
| medium | 1.4GB | ~2.1GB | Slow | High |
| large-v3 | 2.9GB | ~3.9GB | Slowest | Best |
| large-v3-turbo | 1.5GB | ~2.1GB | Moderate | High (optimized for speed) |
Language-Specialized Models
| Model | Language | Size | Description |
|---|---|---|---|
| belle-zh | Chinese | 1.5GB | BELLE-2 Chinese-specialized model |
| kotoba-ja | Japanese | 1.4GB | kotoba-tech Japanese-specialized model |
| kotoba-ja-q5 | Japanese | 513MB | Quantized version (faster, smaller) |
Auto-Selection (model=auto)
When model is `auto` (the default), the system automatically selects the best model based on language:

| Language | Auto-Selected Model |
|---|---|
| zh | belle-zh (Chinese-specialized) |
| ja | kotoba-ja (Japanese-specialized) |
| others | medium (general purpose) |

Example: `/mk-youtube-audio-transcribe video.m4a auto zh` → uses `belle-zh`
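The selection rule above amounts to a small lookup (a sketch; the function name is hypothetical):

```python
SPECIALIZED = {"zh": "belle-zh", "ja": "kotoba-ja"}

def select_model(model: str, language: str) -> str:
    """Resolve model=auto per the table above; explicit models pass through."""
    if model != "auto":
        return model
    return SPECIALIZED.get(language, "medium")

print(select_model("auto", "zh"))   # belle-zh
print(select_model("auto", "en"))   # medium
print(select_model("small", "zh"))  # small (explicit choice wins)
```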
Notes
- File caching: if a transcription already exists for this video, it is reused (returns `cached: true`)
- Force refresh: use the `--force` flag to re-transcribe even if a cached file exists
- Specify the language for best results; this enables auto-selection of specialized models (zh → belle-zh, ja → kotoba-ja)
- Use the Read tool to get file content from `file_path` or `text_file_path`
- Models must be downloaded before first use; otherwise a `MODEL_NOT_FOUND` error is returned with a download command
- Uses Metal acceleration on macOS for faster processing
- Supports auto language detection
- Audio is converted to 16kHz WAV for optimal results
- Requires ffmpeg and whisper-cli (pre-built in bin/)
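The caching and force-refresh behavior in the first two notes can be sketched as a simple existence check (hypothetical helper):

```python
from pathlib import Path

def needs_transcription(data_dir: Path, basename: str, force: bool = False) -> bool:
    """Skip work when {basename}.json already exists, unless --force is set."""
    cached = (data_dir / f"{basename}.json").exists()
    return force or not cached
```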
Model Download
Models must be downloaded before transcription. When you receive a `MODEL_NOT_FOUND` error, execute the `download_command` with `timeout: 1800000`.

```bash
# In terminal (to see progress bar)
./scripts/download-model.sh medium     # 1.4GB
./scripts/download-model.sh belle-zh   # 1.5GB (Chinese)
./scripts/download-model.sh kotoba-ja  # 1.4GB (Japanese)
./scripts/download-model.sh --list     # Show all available models
```
Next Step
After transcription completes, invoke `/mk-youtube-transcript-summarize` with the `text_file_path` from the output to generate a structured summary:

```
/mk-youtube-transcript-summarize <text_file_path>
```

IMPORTANT: Always use the Skill tool to invoke `/mk-youtube-transcript-summarize`. Do NOT generate summaries directly without loading the skill — it contains critical rules for compression ratio, section structure, data preservation, and language handling.