explainer
Create explainer videos with narration and AI-generated visuals. Triggers on: "解说视频", "explainer video", "explain this as a video", "tutorial video", "introduce X (video)", "解释一下XX(视频形式)".
Source: marswaveai/skills
NPX Install
npx skill4agent add marswaveai/skills explainer
SKILL.md Content
When to Use
- User wants to create an explainer or tutorial video
- User asks to "explain" something in video form
- User wants narrated content with AI-generated visuals
- User says "explainer video", "解说视频", "tutorial video"
When NOT to Use
- User wants audio-only content without visuals (use /speech or /podcast)
- User wants a podcast-style discussion (use /podcast)
- User wants to generate a standalone image (use /image-gen)
- User wants to read text aloud without video (use /speech)
Purpose
Generate explainer videos that combine a single narrator's voiceover with AI-generated visuals. Ideal for product introductions, concept explanations, and tutorials. Supports text-only script generation or full text + video output.
Hard Constraints
- No shell scripts. Construct curl commands from the API reference files listed in Resources
- Always read shared/authentication.md for API key and headers
- Follow shared/common-patterns.md for polling, errors, and interaction patterns
- Always read config following shared/config-pattern.md before any interaction
- Never hardcode speaker IDs; always fetch from the speakers API
- Never save files to ~/Downloads/; use .listenhub/explainer/ from config
- Explainer uses exactly 1 speaker
- Mode must be info (for Info style) or story (for Story style), never slides (use the /slides skill instead)
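The mode constraint can be enforced with a small guard before any request is built. A minimal sketch, assuming a hypothetical helper named style_to_mode (not part of the skill) that maps the user's style choice to the API mode value and rejects everything else, including slides:

```shell
# Hypothetical helper (not defined by the skill): map the chosen style
# to the API "mode" value and refuse anything other than info/story.
style_to_mode() {
  case "$1" in
    Info|info)   echo "info" ;;
    Story|story) echo "story" ;;
    *)
      echo "unsupported style '$1': explainer accepts info or story only" >&2
      return 1
      ;;
  esac
}

style_to_mode "Info"    # prints: info
style_to_mode "Story"   # prints: story
```

A non-zero return for any other value gives the caller a clear point to stop and redirect the user to the /slides skill.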
Step -1: API Key Check
Follow shared/config-pattern.md § API Key Check. If the key is missing, stop immediately.
Step 0: Config Setup
Follow shared/config-pattern.md Step 0.
If file doesn't exist, ask location, then create immediately:
bash
mkdir -p ".listenhub/explainer"
echo '{"outputDir":".listenhub","outputMode":"inline","language":null,"defaultStyle":null,"defaultSpeakers":{}}' > ".listenhub/explainer/config.json"
CONFIG_PATH=".listenhub/explainer/config.json"
# (or $HOME/.listenhub/explainer/config.json for global)
Then run Setup Flow below.
If file exists — read config, display summary, and confirm:
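Current config (explainer):
Output mode: {inline / download / both}
Language preference: {zh / en / not set}
Default style: {info / story / not set}
Default speaker: {speakerName / not set}
Ask: "Use the saved config?" → Confirm and continue / Reconfigure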
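Step 0 branches on whether a config file already exists. One way to sketch that check is below. The function name resolve_config_path and the lookup order (project-local before global) are assumptions for illustration; the authoritative rules live in shared/config-pattern.md.

```shell
# Assumption: a project-local config wins over the global one. The real
# lookup order is defined in shared/config-pattern.md, not here.
resolve_config_path() {
  local project_cfg=".listenhub/explainer/config.json"
  local global_cfg="$HOME/.listenhub/explainer/config.json"
  if [ -f "$project_cfg" ]; then
    echo "$project_cfg"
  elif [ -f "$global_cfg" ]; then
    echo "$global_cfg"
  else
    return 1   # no config yet: ask for a location and create one
  fi
}

if CONFIG_PATH=$(resolve_config_path); then
  echo "using config at $CONFIG_PATH"
else
  echo "no config found; run the Setup Flow"
fi
```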
Setup Flow (first run or reconfigure)
Ask these questions in order, then save all answers to config at once:
- outputMode: Follow shared/output-mode.md § Setup Flow Question.
- Language (optional): "Default language?"
  - "Chinese (zh)"
  - "English (en)"
  - "Choose manually each time" → keep null
- Style (optional): "Default style?"
  - "Info (informational presentation)"
  - "Story (narrative storytelling)"
  - "Choose manually each time" → keep null
After collecting answers, save immediately:
bash
# Follow shared/output-mode.md § Save to Config
NEW_CONFIG=$(echo "$CONFIG" | jq --arg m "$OUTPUT_MODE" '. + {"outputMode": $m}')
echo "$NEW_CONFIG" > "$CONFIG_PATH"
CONFIG=$(cat "$CONFIG_PATH")
Note: defaultSpeakers are saved after generation (see After Successful Generation section).
Interaction Flow
Step 1: Topic / Content
Free text input. Ask the user:
What would you like to explain or introduce?
Accept: topic description, text content, or concept to explain.
Step 2: Language
If config.language is set, pre-fill and show in summary; skip this question.
Otherwise ask:
Question: "What language?"
Options:
- "Chinese (zh)": Content in Mandarin Chinese
- "English (en)": Content in English
Step 3: Style
If config.defaultStyle is set, pre-fill and show in summary; skip this question.
Otherwise ask:
Question: "What style of explainer?"
Options:
- "Info": Informational, factual presentation style
- "Story": Narrative, storytelling approach
Step 4: Speaker Selection
Follow shared/speaker-selection.md for the full selection flow, including:
- Default from config.defaultSpeakers.{language} (skip step if set)
- Text table + free-text input
- Input matching and re-prompt on no match
Only 1 speaker is supported for explainer videos.
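The "input matching and re-prompt on no match" step might look roughly like the sketch below. Everything here is illustrative: match_speaker is a hypothetical helper, the "name|speakerId" table format is an assumption, and the sample rows are stand-ins (only cozy-man-english appears elsewhere in this document). In real use the table must come from the speakers API, never from hardcoded IDs.

```shell
# Stand-in data for illustration only; fetch the real list from the
# speakers API. "bright-woman-english" is a made-up ID.
SPEAKERS='Cozy Man|cozy-man-english
Bright Woman|bright-woman-english'

# Hypothetical helper: $1 = user input, $2 = table of "name|speakerId" lines.
# Prints the matching speakerId, or returns 1 so the caller can re-prompt.
match_speaker() {
  local input name id lname
  input=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  [ -n "$input" ] || return 1
  while IFS='|' read -r name id; do
    lname=$(printf '%s' "$name" | tr '[:upper:]' '[:lower:]')
    if [ "$input" = "$lname" ] || [ "$input" = "$id" ]; then
      echo "$id"
      return 0
    fi
  done <<EOF
$2
EOF
  return 1
}

match_speaker "COZY MAN" "$SPEAKERS" || echo "no match, ask again"
```

Matching is case-insensitive on the display name and exact on the ID; anything fuzzier is left to shared/speaker-selection.md.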
Step 5: Output Type
Question: "What output do you want?"
Options:
- "Text script only": Generate narration script, no video
- "Text + Video": Generate full explainer video with AI visuals
Step 6: Confirm & Generate
Summarize all choices:
Ready to generate explainer:
Topic: {topic}
Language: {language}
Style: {info/story}
Speaker: {speaker name}
Output: {text only / text + video}
Proceed?
Wait for explicit confirmation before calling any API.
Workflow
- Submit (foreground): POST /storybook/episodes with content, speaker, language, mode → extract episodeId
- Tell the user the task is submitted
- Poll (background): Run the following exact bash command with run_in_background: true and timeout: 600000. Do NOT use python3, awk, or any other JSON parser; use jq as shown:
bash
EPISODE_ID="<id-from-step-1>"
for i in $(seq 1 30); do
  RESULT=$(curl -sS "https://api.marswave.ai/openapi/v1/storybook/episodes/$EPISODE_ID" \
    -H "Authorization: Bearer $LISTENHUB_API_KEY" 2>/dev/null)
  STATUS=$(echo "$RESULT" | tr -d '\000-\037\177' | jq -r '.data.processStatus // "pending"')
  case "$STATUS" in
    success|completed) echo "$RESULT"; exit 0 ;;
    failed|error) echo "FAILED: $RESULT" >&2; exit 1 ;;
    *) sleep 10 ;;
  esac
done
echo "TIMEOUT" >&2; exit 2
- When notified, download and present script: Read OUTPUT_MODE from config. Follow shared/output-mode.md for behavior.
  inline or both: Present the script inline. Present:
  Explainer script generated!
  "{title}"
  View online: https://listenhub.ai/app/explainer/{episodeId}
  download or both: Also save the script file.
  - Create .listenhub/explainer/YYYY-MM-DD-{episodeId}/
  - Write {episodeId}.md from the generated script content
  - Present the download path in addition to the above summary.
- If video requested: POST /storybook/episodes/{episodeId}/video (foreground) → poll again (background) using the exact bash command below with run_in_background: true and timeout: 600000. Poll for videoStatus, not processStatus:
bash
EPISODE_ID="<id-from-step-1>"
for i in $(seq 1 30); do
  RESULT=$(curl -sS "https://api.marswave.ai/openapi/v1/storybook/episodes/$EPISODE_ID" \
    -H "Authorization: Bearer $LISTENHUB_API_KEY" 2>/dev/null)
  STATUS=$(echo "$RESULT" | tr -d '\000-\037\177' | jq -r '.data.videoStatus // "pending"')
  case "$STATUS" in
    success|completed) echo "$RESULT"; exit 0 ;;
    failed|error) echo "FAILED: $RESULT" >&2; exit 1 ;;
    *) sleep 10 ;;
  esac
done
echo "TIMEOUT" >&2; exit 2
- When notified, download and present result:
Present result
Read OUTPUT_MODE from config. Follow shared/output-mode.md for behavior.
inline or both: Present:
Explainer video generated!
Video link: {videoUrl}
Audio link: {audioUrl}
Duration: {duration}s
Credits used: {credits}
download or both:
bash
DATE=$(date +%Y-%m-%d)
JOB_DIR=".listenhub/explainer/${DATE}-{jobId}"
mkdir -p "$JOB_DIR"
curl -sS -o "${JOB_DIR}/{jobId}.mp3" "{audioUrl}"
Present the download path in addition to the above summary.
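Both polling loops in the Workflow share the same status handling and retry budget. The sketch below isolates that logic so it can be checked without calling the API. classify_status is a hypothetical name; the status strings, attempt count, sleep interval, and timeout are the ones used in the polling commands.

```shell
# Hypothetical helper mirroring the case statement in the polling loops.
classify_status() {
  case "$1" in
    success|completed) echo "done" ;;
    failed|error)      echo "failed" ;;
    *)                 echo "pending" ;;   # includes the jq fallback "pending"
  esac
}

# Retry budget: 30 attempts with a 10 s sleep between them.
ATTEMPTS=30
INTERVAL_S=10
TIMEOUT_MS=600000
SLEEP_BUDGET_MS=$((ATTEMPTS * INTERVAL_S * 1000))

echo "sleep budget: ${SLEEP_BUDGET_MS} ms of ${TIMEOUT_MS} ms"
# prints: sleep budget: 300000 ms of 600000 ms
```

The sleeps alone use half of the 600000 ms timeout, which leaves the other half for the curl requests themselves before the background task is cut off.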
After Successful Generation
Update config with the choices made this session:
bash
NEW_CONFIG=$(echo "$CONFIG" | jq \
--arg lang "{language}" \
--arg style "{info/story}" \
--arg speakerId "{speakerId}" \
'. + {"language": $lang, "defaultStyle": $style, "defaultSpeakers": (.defaultSpeakers + {($lang): [$speakerId]})}')
echo "$NEW_CONFIG" > "$CONFIG_PATH"
Estimated times:
- Text script only: 2-3 minutes
- Text + Video: 3-5 minutes
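The config update in After Successful Generation can be tried offline against a sample config to see the resulting shape. This sketch reuses the same jq filter with the placeholders filled in; the sample config is the one created in Step 0, and the values (en, info, cozy-man-english) are taken from this document's Example section. It assumes jq is installed.

```shell
# Sample config matching the shape created in Step 0.
CONFIG='{"outputDir":".listenhub","outputMode":"inline","language":null,"defaultStyle":null,"defaultSpeakers":{}}'

# Same filter as in After Successful Generation, with example values.
NEW_CONFIG=$(echo "$CONFIG" | jq -c \
  --arg lang "en" \
  --arg style "info" \
  --arg speakerId "cozy-man-english" \
  '. + {"language": $lang, "defaultStyle": $style, "defaultSpeakers": (.defaultSpeakers + {($lang): [$speakerId]})}')

# Inspect the merged result instead of writing it to $CONFIG_PATH.
echo "$NEW_CONFIG"
```

Because defaultSpeakers is keyed by language, saving a speaker for en leaves any existing zh entry untouched.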
API Reference
- Speaker list: shared/api-speakers.md
- Speaker selection guide: shared/speaker-selection.md
- Episode creation: shared/api-storybook.md
- Polling: shared/common-patterns.md § Async Polling
- Config pattern: shared/config-pattern.md
Composability
- Invokes: speakers API (for speaker selection); may invoke /speech for voiceover
- Invoked by: content-planner (Phase 3)
Example
User: "Create an explainer video introducing Claude Code"
Agent workflow:
- Topic: "Claude Code introduction"
- Ask language → "English"
- Ask style → "Info"
- Fetch speakers, user picks "cozy-man-english"
- Ask output → "Text + Video"
bash
curl -sS -X POST "https://api.marswave.ai/openapi/v1/storybook/episodes" \
  -H "Authorization: Bearer $LISTENHUB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "sources": [{"type": "text", "content": "Introduce Claude Code: what it is, key features, and how to get started"}],
    "speakers": [{"speakerId": "cozy-man-english"}],
    "language": "en",
    "mode": "info"
  }'
Poll until text is ready, then generate video if requested.
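When the topic comes from free-text user input, quotes or newlines in it can break a hand-written JSON body. One way around that, sketched here as a suggestion and not as the skill's prescribed method, is to build the payload with jq -n and --arg so every value is escaped. The field names are the ones from the request above; the variable names are illustrative.

```shell
# Values collected during the interaction flow (example values shown).
TOPIC='Introduce Claude Code: what it is, key features, and how to get started'
SPEAKER_ID='cozy-man-english'
LANGUAGE='en'
MODE='info'

# Build the request body; jq escapes every --arg value as a JSON string.
PAYLOAD=$(jq -n -c \
  --arg content "$TOPIC" \
  --arg speaker "$SPEAKER_ID" \
  --arg lang "$LANGUAGE" \
  --arg mode "$MODE" \
  '{sources: [{type: "text", content: $content}],
    speakers: [{speakerId: $speaker}],
    language: $lang,
    mode: $mode}')

echo "$PAYLOAD"
```

The result can then be passed to the same curl command with -d "$PAYLOAD" in place of the inline JSON.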