Video & Podcast Digest Skill
Send a video/podcast link → get full transcript + structured summary
Supported Platforms
| Platform | Type | Subtitles | Whisper Transcription |
|---|---|---|---|
| YouTube | Video | ✅ | ✅ |
| Bilibili | Video | ✅ | ✅ |
| X/Twitter | Video | ❌ | ✅ |
| Xiaoyuzhou (小宇宙) | Podcast | ❌ | ✅ |
| Apple Podcasts | Podcast | ❌ | ✅ |
| Direct links (mp3/mp4/m3u8) | Any | ❌ | ✅ |
Trigger
Auto-triggered when a media URL is detected:
- YouTube: `youtube.com`, `youtu.be`
- Bilibili: `bilibili.com`, `b23.tv`
- X/Twitter: `x.com`, `twitter.com` (tweets with video)
- Xiaoyuzhou: `xiaoyuzhoufm.com`
- Apple Podcasts: `podcasts.apple.com`
- Direct: `.mp3`, `.mp4`, `.m3u8`, `.m4a`, `.webm`
Pipeline
Step 0: Detect Media Type
| URL Pattern | Type | Pipeline |
|---|---|---|
| `xiaoyuzhoufm.com` | Podcast | → Step 1b (Xiaoyuzhou) |
| `podcasts.apple.com` | Podcast | → Step 1c (Apple) |
| `bilibili.com`, `b23.tv` | Video | → Step 1d (Bilibili API) |
| Direct audio link | Audio | → Step 2b (direct download) |
| Other | Video | → Step 1a (subtitle extraction) |
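The routing in the table above could be sketched as a small shell function. This is only an illustration: `detect_step` is a hypothetical helper name, the domain patterns are taken from the Trigger list, and treating `.mp3`/`.m4a` as the "direct audio" case is an assumption rather than something the table states.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a media URL to the pipeline step that should handle it.
detect_step() {
  local url="$1"
  # Drop any query string or fragment so the extension check sees the path only.
  local path="${url%%[?#]*}"
  case "$path" in
    *xiaoyuzhoufm.com/*)        echo "1b" ;;  # Xiaoyuzhou podcast
    *podcasts.apple.com/*)      echo "1c" ;;  # Apple Podcasts
    *bilibili.com/*|*b23.tv/*)  echo "1d" ;;  # Bilibili API
    *.mp3|*.m4a)                echo "2b" ;;  # assumed direct-audio extensions
    *)                          echo "1a" ;;  # everything else: try subtitles first
  esac
}

detect_step "https://www.bilibili.com/video/BV1xxxxx"   # prints: 1d
```

Checking the most specific patterns first and leaving subtitle extraction as the fallback mirrors the "Other" row of the table.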
Step 1a: Video — Extract Subtitles
```bash
# Remove leftovers from previous runs
rm -f /tmp/media_sub*.vtt /tmp/media_audio.mp3 /tmp/media_transcript*.json /tmp/media_segment_*.mp3 2>/dev/null || true

# YouTube (prefer English, fall back to Chinese)
yt-dlp --skip-download --write-auto-sub --sub-lang "en,zh-Hans" -o "/tmp/media_sub" "VIDEO_URL"

# Bilibili
yt-dlp --skip-download --write-auto-sub --sub-lang "zh-Hans,zh" -o "/tmp/media_sub" "VIDEO_URL"
```
Check for subtitles:
```bash
ls /tmp/media_sub*.vtt 2>/dev/null
```
- Has subtitles → Read VTT content, skip to Step 3
- No subtitles → Step 2a (download audio)
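Before the VTT content is passed to Step 3, it usually helps to strip the cue timings and inline tags. The sketch below is one possible way to do that, not part of the original skill: `vtt_to_text` is a hypothetical helper, and it assumes the auto-generated captions repeat lines across consecutive cues, which is common but not guaranteed.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the plain caption text of a .vtt file.
vtt_to_text() {
  # $1: path to a WebVTT file
  tr -d '\r' < "$1" |
    sed 's/<[^>]*>//g' |          # remove inline tags such as <c> or <00:00:01.000>
    awk '
      { sub(/[ \t]+$/, "") }                               # trim trailing whitespace
      /^WEBVTT/ || /^(Kind|Language):/ || /-->/ { next }   # header and cue timing lines
      $0 == "" { next }                                    # blank lines
      $0 != prev { print; prev = $0 }                      # drop consecutive repeats
    '
}

# Usage (file name assumed from the -o "/tmp/media_sub" template above):
#   vtt_to_text /tmp/media_sub.en.vtt > /tmp/media_sub.txt
```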
Step 1b: Xiaoyuzhou (小宇宙) — Extract Audio URL
```bash
AUDIO_URL=$(curl -sL -H "User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36" \
  "EPISODE_URL" \
  | grep -oE 'https://media\.xyzcdn\.net/[^"]+\.(m4a|mp3)' \
  | head -1)
echo "Audio URL: $AUDIO_URL"
curl -L -o /tmp/media_audio.mp3 "$AUDIO_URL"
```
→ Step 2b
Step 1c: Apple Podcasts — via yt-dlp
```bash
yt-dlp -f "ba[ext=m4a]/ba/b" --extract-audio --audio-format mp3 --audio-quality 5 \
  -o "/tmp/media_audio.%(ext)s" "APPLE_PODCAST_URL"
```
→ Step 2b
Step 1d: Bilibili — API Direct Audio Stream
```bash
BV="BV1xxxxx"  # BV ID from the video URL

# Fetch metadata once, print a summary, and keep the CID for the next call
VIEW_JSON=$(curl -s "https://api.bilibili.com/x/web-interface/view?bvid=$BV" \
  -H "User-Agent: Mozilla/5.0" -H "Referer: https://www.bilibili.com/")
printf '%s' "$VIEW_JSON" \
  | python3 -c "import json,sys; d=json.load(sys.stdin)['data']; print(f\"Title: {d['title']}\nDuration: {d['duration']}s\nCID: {d['cid']}\")"
CID=$(printf '%s' "$VIEW_JSON" | python3 -c "import json,sys; print(json.load(sys.stdin)['data']['cid'])")

# Resolve the DASH audio stream URL
AUDIO_URL=$(curl -s "https://api.bilibili.com/x/player/playurl?bvid=$BV&cid=$CID&fnval=16&qn=64" \
  -H "User-Agent: Mozilla/5.0" -H "Referer: https://www.bilibili.com/" \
  | python3 -c "import json,sys; print(json.load(sys.stdin)['data']['dash']['audio'][0]['baseUrl'])")

# Download the stream and convert it to MP3
curl -L -o /tmp/media_audio.m4s \
  -H "User-Agent: Mozilla/5.0" -H "Referer: https://www.bilibili.com/" "$AUDIO_URL"
ffmpeg -y -i /tmp/media_audio.m4s -acodec libmp3lame -q:a 5 /tmp/media_audio.mp3
```
→ Step 2b
Step 2a: Video — Download Audio (when no subtitles)
```bash
yt-dlp --cookies-from-browser chrome -f "ba[ext=m4a]/ba/b" --extract-audio --audio-format mp3 --audio-quality 5 \
  -o "/tmp/media_audio.%(ext)s" "VIDEO_URL"
```
Step 2b: Check Audio Size & Segment
```bash
FILE_SIZE=$(stat -f%z /tmp/media_audio.* 2>/dev/null || stat -c%s /tmp/media_audio.* 2>/dev/null)
echo "File size: $FILE_SIZE bytes"
```
- ≤ 25MB → Step 2c (transcribe directly)
- > 25MB → Split into 10-minute segments:
```bash
DURATION=$(ffprobe -v error -show_entries format=duration -of csv=p=0 /tmp/media_audio.* | head -1)
SEGMENT_SEC=600
SEGMENTS=$(python3 -c "import math; print(math.ceil(float('$DURATION')/$SEGMENT_SEC))")
for i in $(seq 0 $((SEGMENTS-1))); do
  START=$((i * SEGMENT_SEC))
  ffmpeg -y -i /tmp/media_audio.* -ss $START -t $SEGMENT_SEC -acodec libmp3lame -q:a 5 \
    "/tmp/media_segment_${i}.mp3" 2>/dev/null
done
```
→ Call Step 2c for each segment sequentially (parallel requests trigger Groq 524 timeouts)
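As a worked example of the size check and segment arithmetic in this step, the sketch below wraps both in small functions. The names `needs_split` and `segment_count` are hypothetical, and reading "25MB" as 25 × 1024 × 1024 bytes is an assumption; the source does not say which unit is meant.

```bash
#!/usr/bin/env bash
MAX_BYTES=$((25 * 1024 * 1024))   # assumed meaning of "25MB": 26214400 bytes

# Succeeds (exit 0) when a file of $1 bytes is over the limit and must be split.
needs_split() {
  [ "$1" -gt "$MAX_BYTES" ]
}

# Print ceil(duration / segment_length); $1 may be fractional, $2 defaults to 600.
segment_count() {
  awk -v d="$1" -v s="${2:-600}" 'BEGIN { n = int(d / s); if (n * s < d) n++; print n }'
}

segment_count 3725.3   # a roughly 62-minute file -> prints 7
```

Rounding up matters here: a 3725-second file needs seven 600-second segments, and the last one simply comes out shorter than the rest.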
Step 2c: Whisper Transcription
Prerequisite: `GROQ_API_KEY` environment variable
```bash
if [ -z "$GROQ_API_KEY" ]; then
  echo "❌ GROQ_API_KEY not set. Get one at: https://console.groq.com/keys"
  exit 1
fi
curl -s -X POST "https://api.groq.com/openai/v1/audio/transcriptions" \
  -H "Authorization: Bearer $GROQ_API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@AUDIO_FILE" \
  -F "model=whisper-large-v3-turbo" \
  -F "response_format=verbose_json" \
  -F "language=zh" \
  > /tmp/media_transcript.json
python3 -c "import json; print(json.load(open('/tmp/media_transcript.json'))['text'])"
```
Step 3: Structured Summary
Video (≤20 min):
- Overview (1-2 sentences)
- Key Points (3-5 bullet points)
- Notable Quotes (if any)
- Action Items (if applicable)
Podcast (>20 min):
- Overview (2-3 sentences)
- Chapter Summary (segmented by topic)
- Key Points (5-8 bullet points)
- Notable Quotes
- Action Items (if applicable)
Error Handling
| Situation | Action |
|---|---|
| No subtitles + no GROQ_API_KEY | Prompt user to set API key |
| Audio >25MB | ffmpeg segment (10min/segment), transcribe sequentially |
| Podcast >2 hours | Warn user, confirm before proceeding |
| Groq 524 timeout | Do NOT parallelize — transcribe sequentially, sleep 5-8s between |
| Groq 429 rate limit | Wait for retry-after header, then retry |
| yt-dlp Bilibili 412 | Use Bilibili API instead (Step 1d) |
| yt-dlp YouTube bot detection | Add `--cookies-from-browser chrome` (as in Step 2a) |
| Spotify links | Not supported (DRM protected) |
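The 524 and 429 rows above can be combined into one sequential loop. The following is a rough sketch under several assumptions, not the skill's actual implementation: `retry_after_seconds` and `transcribe_segments` are hypothetical names, `/tmp/media_headers.txt` is an illustrative scratch file, the `Retry-After` value is assumed to be a whole number of seconds, and retries on 429 are not capped.

```bash
#!/usr/bin/env bash
# Print the Retry-After value from a saved header file, or a fallback
# (default 10) when the header is missing or not a plain integer.
retry_after_seconds() {
  local headers="$1" fallback="${2:-10}" value
  value=$(tr -d '\r' < "$headers" |
    awk -F': *' 'tolower($1) == "retry-after" { print $2; exit }')
  case "$value" in
    ''|*[!0-9]*) echo "$fallback" ;;
    *)           echo "$value" ;;
  esac
}

# Transcribe /tmp/media_segment_0.mp3, _1.mp3, ... one at a time.
transcribe_segments() {
  local i=0 seg code
  while [ -f "/tmp/media_segment_${i}.mp3" ]; do
    seg="/tmp/media_segment_${i}.mp3"
    code=$(curl -sS -o "/tmp/media_transcript_${i}.json" -D /tmp/media_headers.txt \
      -w '%{http_code}' -X POST "https://api.groq.com/openai/v1/audio/transcriptions" \
      -H "Authorization: Bearer $GROQ_API_KEY" \
      -F "file=@${seg}" -F "model=whisper-large-v3-turbo" \
      -F "response_format=verbose_json" -F "language=zh")
    if [ "$code" = "429" ]; then
      sleep "$(retry_after_seconds /tmp/media_headers.txt)"
      continue                      # retry the same segment
    fi
    [ "$code" = "200" ] || { echo "segment $i failed: HTTP $code" >&2; return 1; }
    i=$((i + 1))
    sleep 6                         # pause between requests (5-8s range from the table)
  done
}
```

Iterating by index rather than by glob keeps the segments in numeric order, so segment 10 is not transcribed before segment 2.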
Dependencies
- `yt-dlp`: video download + subtitle extraction
- `ffmpeg`: audio conversion + segmentation
- `curl`: audio download, API calls
- `GROQ_API_KEY`: Whisper transcription (free at https://console.groq.com/keys)
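A quick preflight check for these dependencies might look like the sketch below. `check_deps` is a hypothetical helper; `ffprobe` and `python3` are included because the pipeline steps call them, even though they are not listed above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report any command in "$@" that is not on PATH.
# Returns 0 when everything is available, 1 otherwise.
check_deps() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

check_deps yt-dlp ffmpeg ffprobe curl python3 \
  || echo "Install the missing tools before running the pipeline." >&2

# The API key is only required when Whisper transcription is needed.
[ -n "${GROQ_API_KEY:-}" ] || echo "GROQ_API_KEY is not set." >&2
```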