Video Transcript Extraction Expert (based on the Doubao video understanding model). Supports links from Bilibili, Douyin, Xiaohongshu, and YouTube, as well as local video files. Runs entirely in the background (headless) on the user's computer, with no pop-ups and no need to log in to video platforms. Outputs strict verbatim transcripts with "semantic segmentation + paragraph-level timestamps" (retains colloquial words, internet memes, and pauses). Long videos are automatically sliced so that the model does not summarize them.

Trigger scenarios:
- User says "generate transcript", "extract transcript", "convert to text", "video to text"
- User says "dictate video", "extract video copy", "video subtitles"
- User uses the /video-transcript command
- User pastes a video link (Bilibili/Douyin/Xiaohongshu/YouTube) with the intention of getting a text version
npx skill4agent add backtthefuture/huangshu video-transcript

Input video link or local path → Auto detection → Background download → Slicing → Compression → Doubao transcription → Strict verbatim transcript (stdout + local save)
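The "Auto detection" step can be illustrated with a minimal sketch that maps an input to a platform by its host name. This is not the skill's actual code: the function name `detect_platform` and the `PLATFORM_HOSTS` table are hypothetical, and only the host names come from the link forms this document lists.

```python
from urllib.parse import urlparse

# Host suffix -> platform label. Hosts are taken from the link forms listed
# in this document; the table itself is an illustrative assumption.
PLATFORM_HOSTS = {
    "bilibili.com": "Bilibili",
    "b23.tv": "Bilibili",
    "douyin.com": "Douyin",
    "xiaohongshu.com": "Xiaohongshu",
    "xhslink.com": "Xiaohongshu",
    "youtube.com": "YouTube",
    "youtu.be": "YouTube",
}


def detect_platform(source: str) -> str:
    """Return a platform label for a link, or "local" for anything else."""
    # Links are often pasted without a scheme (e.g. "b23.tv/xxx"),
    # so add one before parsing to get a usable host name.
    candidate = source if "://" in source else "https://" + source
    host = (urlparse(candidate).hostname or "").lower()
    for suffix, platform in PLATFORM_HOSTS.items():
        # Match the bare domain or any subdomain of it.
        if host == suffix or host.endswith("." + suffix):
            return platform
    # Unrecognized hosts and plain file paths are treated as local files.
    return "local"
```

Matching on the host suffix rather than the full URL keeps short links such as `v.douyin.com/xxx` and `www.` variants under the same platform label.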
VT_HOME="$(
  for d in "$HOME/.claude/skills/video-transcript" \
           "$(pwd)/.claude/skills/video-transcript" \
           "$(pwd)/skills/video-transcript" \
           "$HOME/.claude/plugins/video-transcript/video-transcript"; do
    [ -f "$d/SKILL.md" ] && echo "$d" && break
  done
)"
export VT_HOME
echo "VT_HOME=$VT_HOME"

export VT_HOME=<path>

"$VT_HOME/scripts/transcript.py"

python3 "$VT_HOME/scripts/transcript.py" --doctor

bash "$VT_HOME/install.sh"

python3 "$VT_HOME/scripts/transcript.py" "<URL or local path>"

Supported link forms:
- Bilibili: `https://www.bilibili.com/video/BVxxx`, `b23.tv/xxx`
- Douyin: `https://www.douyin.com/video/xxx`, `v.douyin.com/xxx`, `douyin.com/jingxuan?modal_id=xxx`
- Xiaohongshu: `xiaohongshu.com/discovery/item/xxx`, `xiaohongshu.com/explore/xxx`, `xhslink.com/xxx`
- YouTube: `youtube.com/watch?v=xxx`, `youtu.be/xxx`

Video detection completed:
- Platform: Bilibili / Title: 《xxx》
- Duration: 17 minutes 12 seconds, will be sliced into 3 segments for independent transcription
- Estimated time: 3 minutes 20 seconds ~ 5 minutes 25 seconds, running now, please wait...
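The slicing arithmetic behind a message like this can be sketched as follows. The 8-minute threshold and the 6-minute segment cap are the figures this document states; splitting into equal-length segments and the function name `plan_segments` are assumptions made for illustration, not the script's confirmed behavior.

```python
import math


def plan_segments(duration_s, threshold_s=8 * 60, max_segment_s=6 * 60):
    """Return (start, end) pairs in seconds covering the whole video.

    Videos at or below the threshold stay in one piece. Longer videos are
    split into the fewest equal segments that each fit under the cap
    (equal splitting is an assumption, not confirmed by the source).
    """
    if duration_s <= threshold_s:
        return [(0.0, float(duration_s))]
    count = math.ceil(duration_s / max_segment_s)
    length = duration_s / count
    return [(i * length, min((i + 1) * length, float(duration_s)))
            for i in range(count)]


# 17 minutes 12 seconds, as in the example message above.
segments = plan_segments(17 * 60 + 12)
print(len(segments))
```

For the 17 minute 12 second example, 1032 seconds divided by the 360 second cap rounds up to 3 segments, which matches the count in the message.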
# <Title>

Saved to: `$VT_HOME/outputs/<title>_transcript.md`

`--no-save`: do not save the .md file

`--output-dir <path>`: change the save path

═══════════════════════════════════════════════════════
📊 Video Detection
───────────────────────────────────────────────────────
Platform: Bilibili
Title: A city of 100,000 people disappeared between Zhejiang and Anhui
Duration: 17 minutes 12 seconds
Segments: 3 segments (each ≤ 6 minutes)
Estimated Time: 3 minutes 20 seconds ~ 5 minutes 25 seconds
═══════════════════════════════════════════════════════

# Video Title
> Duration 5:32 | Source: <URL or filename>
## 1. Introduce Topic [00:00 - 00:42]
Hello everyone, today we're going to talk about...
## 2. Core Viewpoint [00:42 - 02:15]
Well, my opinion is this, first of all...

Timestamp format: `[MM:SS - MM:SS]`

Silent stretches: `_(No voice here, XX seconds)_`

| Scenario | Handling |
|---|---|
| Run |
| Check |
| API 401 / "Model not authorized" | Volcano Ark Console → Model Plaza → Click "Activate" for Doubao-Seed-2.0-pro |
| Douyin image-and-text note (note link) | Inform the user that image-and-text notes are not supported; only videos are supported |
| Crawling failure due to platform frontend revision | Check |
| Long video is summarized instead of transcribed verbatim | Handled automatically (videos longer than 8 minutes are sliced); report it if the issue persists |
| Compressed video still > 50MB | The script will automatically iterate to reduce bitrate/resolution (max 4 rounds) |
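The "iterate to reduce bitrate/resolution (max 4 rounds)" behavior in the last row can be pictured with the sketch below. Only the 50 MB limit and the 4-round cap come from this document. The starting settings, the step-down policy, and the `encode` callback are hypothetical stand-ins, since the script's real encoder settings are not shown here.

```python
from typing import Callable, Tuple


def compress_until_small(
    encode: Callable[[int, int], float],
    limit_mb: float = 50.0,
    max_rounds: int = 4,
    bitrate_kbps: int = 800,   # assumed starting bitrate
    height: int = 720,         # assumed starting resolution
) -> Tuple[bool, int, float]:
    """Re-encode with lower settings until the file fits or rounds run out.

    encode(bitrate_kbps, height) is a hypothetical callback that performs
    one encode and returns the resulting size in MB.
    Returns (fits, rounds_used, last_size_mb).
    """
    size_mb = float("inf")
    for round_no in range(1, max_rounds + 1):
        size_mb = encode(bitrate_kbps, height)
        if size_mb <= limit_mb:
            return True, round_no, size_mb
        # Assumed policy: halve the bitrate and shrink the frame height,
        # with floors so the settings never reach zero.
        bitrate_kbps = max(bitrate_kbps // 2, 100)
        height = max(height * 2 // 3, 360)
    return False, max_rounds, size_mb


# Stand-in encoder for demonstration: size scales with bitrate only.
fits, rounds, size = compress_until_small(lambda kbps, h: kbps / 10)
print(fits, rounds, size)
```

Passing the encoder in as a callback keeps the retry logic testable without invoking a real transcoder. What the script does when all 4 rounds fail is not stated in this document, so the sketch simply reports the failure to the caller.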
| Parameter | Description |
|---|---|
| `<URL or local path>` | Video URL or local file path (required, optional for … |
| … | Video title (uses detected title by default) |
| … | Compression target size in MB, default 30 |
| `--no-save` | Do not save .md file (saved to … |
| `--output-dir <path>` | Change save path |
| `--doctor` | Check mode: verify dependencies and configurations |
$VT_HOME/.env