baoyu-youtube-transcript

YouTube Transcript

Downloads transcripts (subtitles/captions) from YouTube videos. Works with both manually created and auto-generated transcripts. No API key or browser required — uses YouTube's InnerTube API directly.
Fetches video metadata and cover image on first run, caches raw data for fast re-formatting.

Script Directory

Scripts are in the `scripts/` subdirectory. `{baseDir}` = this SKILL.md's directory path. Resolve the `${BUN_X}` runtime: if `bun` is installed → `bun`; if `npx` is available → `npx -y bun`; else suggest installing bun. Replace `{baseDir}` and `${BUN_X}` with actual values.

| Script | Purpose |
| --- | --- |
| `scripts/main.ts` | Transcript download CLI |
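
The resolution order for `${BUN_X}` can be sketched as a small function. This is a minimal illustration and not code from `scripts/main.ts`: `resolveBunX` and the `has` callback (assumed to report whether a command is on the PATH) are hypothetical names.

```typescript
// Hypothetical sketch of the ${BUN_X} resolution order described above.
// `has` is an assumed callback that reports whether a command is available.
type CommandProbe = (command: string) => boolean;

function resolveBunX(has: CommandProbe): string | null {
  if (has("bun")) return "bun";        // bun installed: use it directly
  if (has("npx")) return "npx -y bun"; // otherwise run bun through npx
  return null;                         // neither found: suggest installing bun
}
```

A `null` result corresponds to the "suggest installing bun" branch; how the probe itself is implemented is left open here.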

Usage

```bash
# Default: markdown with timestamps (English)
${BUN_X} {baseDir}/scripts/main.ts <youtube-url-or-id>

# Specify languages (priority order)
${BUN_X} {baseDir}/scripts/main.ts <url> --languages zh,en,ja

# Without timestamps
${BUN_X} {baseDir}/scripts/main.ts <url> --no-timestamps

# With chapter segmentation
${BUN_X} {baseDir}/scripts/main.ts <url> --chapters

# With speaker identification (requires AI post-processing)
${BUN_X} {baseDir}/scripts/main.ts <url> --speakers

# SRT subtitle file
${BUN_X} {baseDir}/scripts/main.ts <url> --format srt

# Translate transcript
${BUN_X} {baseDir}/scripts/main.ts <url> --translate zh-Hans

# List available transcripts
${BUN_X} {baseDir}/scripts/main.ts <url> --list

# Force re-fetch (ignore cache)
${BUN_X} {baseDir}/scripts/main.ts <url> --refresh
```

Options

| Option | Description | Default |
| --- | --- | --- |
| `<url-or-id>` | YouTube URL or video ID (multiple allowed) | Required |
| `--languages <codes>` | Language codes, comma-separated, in priority order | `en` |
| `--format <fmt>` | Output format: `text`, `srt` | `text` |
| `--translate <code>` | Translate to specified language code | |
| `--list` | List available transcripts instead of fetching | |
| `--timestamps` | Include `[HH:MM:SS → HH:MM:SS]` timestamps per paragraph | on |
| `--no-timestamps` | Disable timestamps | |
| `--chapters` | Chapter segmentation from video description | |
| `--speakers` | Raw transcript with metadata for speaker identification | |
| `--exclude-generated` | Skip auto-generated transcripts | |
| `--exclude-manually-created` | Skip manually created transcripts | |
| `--refresh` | Force re-fetch, ignore cached data | |
| `-o, --output <path>` | Save to specific file path | auto-generated |
| `--output-dir <dir>` | Base output directory | `youtube-transcript` |

Input Formats

Accepts any of these as video input:
  • Full URL: `https://www.youtube.com/watch?v=dQw4w9WgXcQ`
  • Short URL: `https://youtu.be/dQw4w9WgXcQ`
  • Embed URL: `https://www.youtube.com/embed/dQw4w9WgXcQ`
  • Shorts URL: `https://www.youtube.com/shorts/dQw4w9WgXcQ`
  • Video ID: `dQw4w9WgXcQ`
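
One way to normalize these inputs to a video ID is a handful of regular expressions. The sketch below is illustrative only and is not taken from `scripts/main.ts`: `extractVideoId` is a hypothetical name, and it assumes video IDs are exactly 11 characters drawn from letters, digits, `_` and `-`.

```typescript
// Hypothetical helper, not the script's actual parser.
// Assumption: a video ID is 11 characters from [A-Za-z0-9_-].
const VIDEO_ID = /^[A-Za-z0-9_-]{11}$/;

const URL_PATTERNS: RegExp[] = [
  /[?&]v=([A-Za-z0-9_-]{11})(?![A-Za-z0-9_-])/,                           // watch?v=ID
  /youtu\.be\/([A-Za-z0-9_-]{11})(?![A-Za-z0-9_-])/,                      // youtu.be/ID
  /youtube\.com\/(?:embed|shorts)\/([A-Za-z0-9_-]{11})(?![A-Za-z0-9_-])/, // embed/ID, shorts/ID
];

function extractVideoId(input: string): string | null {
  const trimmed = input.trim();
  // A bare ID is accepted as-is.
  if (VIDEO_ID.test(trimmed)) return trimmed;
  // Otherwise try each URL shape in turn.
  for (const pattern of URL_PATTERNS) {
    const id = pattern.exec(trimmed)?.[1];
    if (id !== undefined) return id;
  }
  return null; // not a recognized input format
}
```

Returning `null` instead of throwing lets a caller that accepts multiple inputs report each invalid one without aborting the rest.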

Output Formats

| Format | Extension | Description |
| --- | --- | --- |
| `text` | `.md` | Markdown with frontmatter (incl. `description`), title heading, summary, optional TOC/cover/timestamps/chapters/speakers |
| `srt` | `.srt` | SubRip subtitle format for video players |

Output Directory

youtube-transcript/
├── .index.json                          # Video ID → directory path mapping (for cache lookup)
└── {channel-slug}/{title-full-slug}/
    ├── meta.json                        # Video metadata (title, channel, description, duration, chapters, etc.)
    ├── transcript-raw.json              # Raw transcript snippets from YouTube API (cached)
    ├── transcript-sentences.json        # Sentence-segmented transcript (split by punctuation, merged across snippets)
    ├── imgs/
    │   └── cover.jpg                    # Video thumbnail
    ├── transcript.md                    # Markdown transcript (generated from sentences)
    └── transcript.srt                   # SRT subtitle (generated from raw snippets, if --format srt)
  • `{channel-slug}`: Channel name in kebab-case
  • `{title-full-slug}`: Full video title in kebab-case
The `--list` mode outputs to stdout only (no file saved).
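
To make the directory layout concrete, here is a rough sketch of how a channel name and title could map to a path. The names `kebab` and `outputDir` are hypothetical, and the slug rule is an ASCII-only simplification: the script's real slug rules (for example, how non-Latin titles are handled) are not specified in this document.

```typescript
// Assumed, simplified kebab-case rule: lowercase, collapse every run of
// characters outside [a-z0-9] into one hyphen, trim hyphens at both ends.
function kebab(value: string): string {
  return value
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Hypothetical helper mirroring the {channel-slug}/{title-full-slug} layout.
function outputDir(channel: string, title: string, base = "youtube-transcript"): string {
  return `${base}/${kebab(channel)}/${kebab(title)}`;
}
```

For instance, under this simplified rule `outputDir("Rick Astley", "Never Gonna Give You Up (Official Video)")` yields `youtube-transcript/rick-astley/never-gonna-give-you-up-official-video`.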

Caching

On first fetch, the script saves:
  • `meta.json`: video metadata, chapters, cover image path, language info
  • `transcript-raw.json`: raw transcript snippets from YouTube API (`{ text, start, duration }[]`)
  • `transcript-sentences.json`: sentence-segmented transcript (`{ text, start: "HH:mm:ss", end: "HH:mm:ss" }[]`), split by sentence-ending punctuation (`.?!…。?!` etc.), with timestamps allocated proportionally by character length and CJK-aware text merging
  • `imgs/cover.jpg`: video thumbnail
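
The sentence segmentation step can be pictured roughly as follows. This is a simplified sketch under stated assumptions, not the script's implementation: `toSentences`, `hms`, and the regular expressions are hypothetical, `start` and `duration` are assumed to be in seconds, and every `.` is treated as a sentence end (so decimals and abbreviations would be split incorrectly).

```typescript
interface Snippet { text: string; start: number; duration: number } // seconds (assumed)
interface Sentence { text: string; start: string; end: string }
interface TimedChar { ch: string; at: number }

const SENTENCE_END = /[.?!…。?!]/;                          // assumed punctuation set
const CJK = /[\u3040-\u30ff\u3400-\u9fff\uff00-\uffef]/;     // rough CJK range (assumption)

function hms(seconds: number): string {
  const t = Math.max(0, Math.floor(seconds));
  const pad = (n: number) => String(n).padStart(2, "0");
  return `${pad(Math.floor(t / 3600))}:${pad(Math.floor(t / 60) % 60)}:${pad(t % 60)}`;
}

function toSentences(snippets: Snippet[]): Sentence[] {
  // 1. Flatten snippets into characters, each with a time interpolated
  //    by its position in the snippet (proportional to character length).
  const chars: TimedChar[] = [];
  let lastEnd = 0;
  for (const snip of snippets) {
    const units = Array.from(snip.text.trim());
    if (units.length === 0) continue;
    const prev = chars[chars.length - 1];
    // Join snippets with a space unless either side of the seam is CJK.
    if (prev !== undefined && !CJK.test(prev.ch) && !CJK.test(units[0] ?? "")) {
      chars.push({ ch: " ", at: snip.start });
    }
    units.forEach((ch, i) => {
      chars.push({ ch, at: snip.start + (snip.duration * i) / units.length });
    });
    lastEnd = snip.start + snip.duration;
  }

  // 2. Cut at sentence-ending punctuation, keeping runs like "?!" together.
  const sentences: Sentence[] = [];
  let begin = 0;
  chars.forEach((cur, i) => {
    const next = chars[i + 1];
    const endsHere =
      SENTENCE_END.test(cur.ch) && !(next !== undefined && SENTENCE_END.test(next.ch));
    if (!endsHere && next !== undefined) return; // keep accumulating
    const slice = chars.slice(begin, i + 1);
    const text = slice.map((c) => c.ch).join("").trim();
    const first = slice.find((c) => c.ch !== " ");
    if (text.length > 0 && first !== undefined) {
      sentences.push({
        text,
        start: hms(first.at),
        end: hms(next !== undefined ? next.at : lastEnd),
      });
    }
    begin = i + 1;
  });
  return sentences;
}
```

Interpolating a time per character means a sentence that starts or ends mid-snippet still gets a plausible timestamp, which is the point of allocating time by character length.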
Subsequent runs for the same video use cached data (no network calls). Use `--refresh` to force re-fetch. If a different language is requested, the cache is automatically refreshed.
SRT output (`--format srt`) is generated from `transcript-raw.json`. Text/markdown output uses `transcript-sentences.json` for natural sentence boundaries.
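
As an illustration of the SRT path, the sketch below turns raw `{ text, start, duration }` snippets into SubRip cues. It is not the script's code: `toSrt` and `srtTime` are hypothetical names, and `start` and `duration` are assumed to be in seconds.

```typescript
interface Snippet { text: string; start: number; duration: number } // seconds (assumed)

// Format seconds as the SubRip timestamp HH:MM:SS,mmm.
function srtTime(seconds: number): string {
  const totalMs = Math.max(0, Math.round(seconds * 1000));
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const pad = (n: number, width = 2) => String(n).padStart(width, "0");
  const h = Math.floor(totalSec / 3600);
  const m = Math.floor(totalSec / 60) % 60;
  const s = totalSec % 60;
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(ms, 3)}`;
}

// One cue per raw snippet: index, time range, text, then a blank line.
function toSrt(snippets: Snippet[]): string {
  return snippets
    .map((snip, i) => {
      const end = snip.start + snip.duration;
      return `${i + 1}\n${srtTime(snip.start)} --> ${srtTime(end)}\n${snip.text.trim()}\n`;
    })
    .join("\n");
}
```

Working from raw snippets keeps each cue aligned with the original caption timing, whereas the sentence-level data has already been merged across snippets.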

Workflow

When the user provides a YouTube URL and wants the transcript:
  1. Run with `--list` first if the user hasn't specified a language, to show available options
  2. Always single-quote the URL when running the script: zsh treats `?` as a glob wildcard, so an unquoted YouTube URL causes "no matches found". Use `'https://www.youtube.com/watch?v=ID'`
  3. Default: run with `--chapters --speakers` for the richest output (chapters + speaker identification)
  4. The script auto-saves cached data + output file and prints the file path
  5. For `--speakers` mode: after the script saves the raw file, follow the speaker identification workflow below to post-process with speaker labels
When the user only wants a cover image or metadata, running the script with any option will also cache `meta.json` and `imgs/cover.jpg`.
When re-formatting the same video (e.g., first text then SRT), the cached data is reused, so no re-fetch is needed.
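
If the command line is assembled programmatically, the quoting rule from step 2 can be applied with standard POSIX single-quote escaping. The helpers below are a hypothetical sketch (`shellQuote` and `buildCommand` are not part of the script); the flags are the ones listed in this document.

```typescript
// POSIX single-quote escaping: wrap in '...' and replace each embedded
// single quote with '\'' (close quote, escaped quote, reopen quote).
function shellQuote(value: string): string {
  return "'" + value.replace(/'/g, "'\\''") + "'";
}

// Hypothetical helper: builds the command from resolved ${BUN_X} and {baseDir}.
// bunX is left unquoted because it may contain arguments ("npx -y bun").
function buildCommand(bunX: string, baseDir: string, url: string, flags: string[] = []): string {
  return [bunX, shellQuote(`${baseDir}/scripts/main.ts`), shellQuote(url), ...flags].join(" ");
}
```

Single quotes stop zsh from treating `?` (and `&`) in the URL as shell syntax, which is what prevents the "no matches found" error.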

Chapter & Speaker Workflow

Chapters (`--chapters`)

The script parses chapter timestamps from the video description (e.g., `0:00 Introduction`), segments the transcript by chapter boundaries, groups snippets into readable paragraphs, and saves the result as `.md` with a Table of Contents. No further processing needed.
If no chapter timestamps exist in the description, the transcript is output as grouped paragraphs without chapter headings.
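
Chapter parsing of this kind can be sketched with a line-based regular expression. This is an assumption-laden illustration rather than the script's parser: `parseChapters`, `chapterIndexAt`, and the accepted line shapes (`M:SS Title`, `H:MM:SS Title`, optional `-` or `:` separator) are hypothetical.

```typescript
interface Chapter { title: string; start: number } // start in seconds

// Matches lines such as "0:00 Introduction" or "1:02:30 - Deep dive" (assumed shapes).
const CHAPTER_LINE = /^\s*(?:(\d{1,2}):)?(\d{1,2}):(\d{2})\s+[-–:]?\s*(.+?)\s*$/;

function parseChapters(description: string): Chapter[] {
  const chapters: Chapter[] = [];
  for (const line of description.split(/\r?\n/)) {
    const m = CHAPTER_LINE.exec(line);
    if (m === null) continue; // not a chapter line
    const [, h, min, sec, title] = m;
    chapters.push({
      title: title ?? "",
      start: Number(h ?? "0") * 3600 + Number(min) * 60 + Number(sec),
    });
  }
  return chapters;
}

// Index of the chapter containing `time`, or -1 if `time` precedes the first chapter.
// Assumes chapters are sorted by start time.
function chapterIndexAt(chapters: Chapter[], time: number): number {
  let index = -1;
  chapters.forEach((chapter, i) => {
    if (chapter.start <= time) index = i;
  });
  return index;
}
```

An empty result from `parseChapters` corresponds to the fallback case where the transcript is emitted as grouped paragraphs without chapter headings.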

Speaker Identification (`--speakers`)

Speaker identification requires AI processing. The script outputs a raw `.md` file containing:
  • YAML frontmatter with video metadata (title, channel, date, cover, description, language)
  • Video description (for speaker name extraction)
  • Chapter list from description (if available)
  • Raw transcript in SRT format (pre-computed start/end timestamps, token-efficient)
After the script saves the raw file, spawn a sub-agent (use a cheaper model like Sonnet for cost efficiency) to process speaker identification:
  1. Read the saved `.md` file
  2. Read the prompt template at `{baseDir}/prompts/speaker-transcript.md`
  3. Process the raw transcript following the prompt:
    • Identify speakers using video metadata (title → guest, channel → host, description → names)
    • Detect speaker turns from conversation flow, question-answer patterns, and contextual cues
    • Segment into chapters (use description chapters if available, else create from topic shifts)
    • Format with `**Speaker Name:**` labels, paragraph grouping (2-4 sentences), and `[HH:MM:SS → HH:MM:SS]` timestamps
  4. Overwrite the `.md` file with the processed transcript (keep the YAML frontmatter)
When `--speakers` is used, `--chapters` is implied: the processed output always includes chapter segmentation.

Error Cases

| Error | Meaning |
| --- | --- |
| Transcripts disabled | Video has no captions at all |
| No transcript found | Requested language not available |
| Video unavailable | Video deleted, private, or region-locked |
| IP blocked | Too many requests, try again later |
| Age restricted | Video requires login for age verification |