youtube-transcript
YouTube Transcript Downloader
Download transcript from YouTube video, convert to clean plain text.
Workflow
Follow this exact order. Stop and inform the user if any step fails.
Step 1: Verify yt-dlp
bash
command -v yt-dlp
If not found, install:
- macOS: brew install yt-dlp
- Linux: sudo apt install -y yt-dlp
- Fallback: pip3 install yt-dlp
If installation fails, tell the user to install manually.
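The check-then-install logic above could be wrapped in two small helpers. This is only a sketch: the function names `suggest_ytdlp_install` and `ensure_ytdlp` are hypothetical, and picking the installer from `uname -s` is an assumption about how to tell macOS from Linux. Nothing is installed automatically; the helper only prints the suggested command.

```shell
#!/usr/bin/env bash

# Print the install command from the list above that matches this platform.
# (Hypothetical helper; it prints the command but does not run it.)
suggest_ytdlp_install() {
  case "$(uname -s)" in
    Darwin) echo "brew install yt-dlp" ;;
    Linux)  echo "sudo apt install -y yt-dlp" ;;
    *)      echo "pip3 install yt-dlp" ;;
  esac
}

# Return 0 when yt-dlp is already on PATH; otherwise print a suggestion
# to stderr and return 1 so the caller can stop the workflow.
ensure_ytdlp() {
  if command -v yt-dlp >/dev/null 2>&1; then
    return 0
  fi
  echo "yt-dlp not found. Suggested install command:" >&2
  suggest_ytdlp_install >&2
  return 1
}
```

A caller might run `ensure_ytdlp || exit 1` before Step 2, which matches the rule of stopping when a step fails.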
Step 2: Get video info and check subtitles
bash
VIDEO_URL="THE_URL"
yt-dlp --print "%(title)s" "$VIDEO_URL"
yt-dlp --list-subs "$VIDEO_URL"
Note what's available: manual subtitles (higher quality) vs. auto-generated.
Step 3: Download subtitles
Try in order until one succeeds:
Manual subtitles (best quality):
bash
yt-dlp --write-sub --skip-download --output "transcript_temp" "$VIDEO_URL"
Auto-generated subtitles (fallback):
bash
yt-dlp --write-auto-sub --skip-download --output "transcript_temp" "$VIDEO_URL"
Both produce a .vtt file.
If neither works, inform the user that this video has no subtitles.
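The "try in order" rule can be expressed as a loop over the two flags. The sketch below is an illustration, not part of the skill: `download_subs` and the `YTDLP` override variable are hypothetical names. It assumes that success is best judged by whether a `transcript_temp*.vtt` file actually appears, since the exit status alone may not show whether any subtitles were written.

```shell
#!/usr/bin/env bash

# Command used for downloading; overridable so a stub can stand in for yt-dlp.
YTDLP="${YTDLP:-yt-dlp}"

# Try manual subtitles first, then auto-generated ones.
# Prints the path of the first .vtt file found; returns 1 if none was produced.
download_subs() {
  local url="$1" flag vtt
  for flag in --write-sub --write-auto-sub; do
    "$YTDLP" "$flag" --skip-download --output "transcript_temp" "$url" >/dev/null 2>&1 || true
    # Look for the file itself instead of trusting the exit status.
    for vtt in transcript_temp*.vtt; do
      if [ -f "$vtt" ]; then
        printf '%s\n' "$vtt"
        return 0
      fi
    done
  done
  return 1
}
```

Usage could look like `VTT_FILE=$(download_subs "$VIDEO_URL") || echo "This video has no subtitles."`.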
Step 4: Convert VTT to plain text
Use the bundled conversion script to deduplicate and clean the VTT output:
bash
# Build a file-name-safe title: replace "/" and ":", delete "?" and double quotes.
VIDEO_TITLE=$(yt-dlp --print "%(title)s" "$VIDEO_URL" | tr '/' '_' | tr ':' '-' | tr -d '?"')
VTT_FILE=$(ls transcript_temp*.vtt 2>/dev/null | head -n 1)
python3 skills/youtube-transcript/scripts/vtt-to-txt.py "$VTT_FILE" "${VIDEO_TITLE}.txt"
Then clean up the temporary VTT file:
bash
rm "$VTT_FILE"
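The title clean-up in this step can be tried offline by moving it into a function. A minimal sketch, assuming a hypothetical helper named `sanitize_title`: it maps `/` to `_` and `:` to `-`, and deletes `?` and double quotes with `tr -d`, because GNU `tr` rejects an empty replacement set.

```shell
#!/usr/bin/env bash

# Turn a video title into a safer file name.
# "/" becomes "_", ":" becomes "-", and "?" and double quotes are deleted.
sanitize_title() {
  printf '%s' "$1" | tr '/' '_' | tr ':' '-' | tr -d '?"'
}

sanitize_title 'What is A/B: "test"?'
# prints: What is A_B- test
```

Other characters that some file systems reject (for example `*` or `|`) are left alone here and may need the same treatment.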
Step 5: Confirm to user
Tell the user the file name and location. Offer to read/display the content if they want to review it.
Why deduplication matters
YouTube auto-generated VTT files show captions progressively with overlapping timestamps, producing many duplicate lines. The conversion script uses a seen-set to preserve speaking order while removing duplicates.
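The seen-set idea can be illustrated in shell with awk. This is not the bundled `vtt-to-txt.py` script, whose contents are not shown here; it is a rough approximation under the assumption that header lines, cue timing lines, and inline tags should be dropped before deduplicating. The function name `vtt_to_text` is hypothetical.

```shell
#!/usr/bin/env bash

# Print the caption text of a VTT file, keeping only the first occurrence
# of each line so the original speaking order is preserved.
vtt_to_text() {
  awk '
    # Skip the file header and the cue timing lines.
    /^WEBVTT/ || /^Kind:/ || /^Language:/ || /-->/ { next }
    {
      gsub(/<[^>]*>/, "")            # remove inline timing and style tags
      gsub(/^[ \t]+|[ \t]+$/, "")    # trim surrounding whitespace
      # seen[] is the seen-set: print a line only the first time it appears.
      if ($0 != "" && !seen[$0]++) print
    }
  ' "$1"
}
```

One trade-off of a plain seen-set is that a line the speaker genuinely repeats later in the video is also dropped. Comparing each line only against the previous one would keep such repeats, at the cost of possibly missing some overlaps.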
Language selection
Without --sub-langs, yt-dlp chooses the subtitle language itself (typically English when it is available). To target a specific language:
bash
yt-dlp --write-auto-sub --sub-langs vi --skip-download --output "transcript_temp" "$VIDEO_URL"
Common codes: en (English), vi (Vietnamese), ja (Japanese), ko (Korean).
Error handling
Read references/error-handling.md for solutions to common issues (private videos, geo-blocking, SSL errors, missing subtitles).