# Storytelling with genmedia
Use this skill when the user wants a sequence, not a single asset. Load
references as needed:

- references/shot-planning.md
- references/workflows.md
- references/examples.md

Load `model-routing` alongside this skill for default endpoint choices.

The goal is to produce clear story beats and executable genmedia runs. Avoid
generic inspiration copy, fake dialogue, and em dashes.
## Inputs to collect
Ask only when missing information affects execution.
- Format: ad, short film, music video, documentary, tutorial, social story.
- Duration and aspect ratio.
- Number of shots or allowed range.
- Main subject, character, product, or location.
- Continuity anchors: character, product, wardrobe, environment, color.
- Source media: first frame, reference image, product shot, audio track.
- Audio needs: narration, music, sound design, transcript, no audio.
- Preferred model or model family, if the user wants to decide quality, cost, speed, audio, or multi-shot tradeoffs.
## Genmedia workflow
- Start from routed endpoint IDs.

```bash
genmedia models --endpoint_id bytedance/seedance-2.0/text-to-video --json
genmedia models --endpoint_id bytedance/seedance-2.0/image-to-video --json
genmedia models --endpoint_id bytedance/seedance-2.0/reference-to-video --json
genmedia models --endpoint_id fal-ai/kling-video/v3/pro/text-to-video --json
genmedia models --endpoint_id alibaba/happy-horse/text-to-video --json
genmedia models --endpoint_id veed/fabric-1.0 --json
```

Use text search only as fallback discovery for an unsupported sequence control:

```bash
genmedia models "first frame last frame video generation" --json
genmedia docs "multi shot video generation" --json
```

- Inspect the schema before planning exact payloads.

```bash
genmedia schema <endpoint_id> --json
genmedia pricing <endpoint_id> --json
```

- Upload references.

```bash
genmedia upload ./first-frame.png --json
genmedia upload ./character.png --json
genmedia upload ./product.png --json
genmedia upload ./voiceover.wav --json
```

- Choose the sequence route.
  - Highest quality video: start with Seedance 2.0 endpoints from `model-routing`.
  - Native multi-prompt: use if the schema has shot arrays, prompt lists, or timeline fields.
  - First/last frame: use for controlled transitions between key frames.
  - Image-to-video per shot: use for maximum continuity from approved stills.
  - Manual per-shot generation: use when the model only supports one prompt.
  - Audio-first: generate or upload audio, then plan visual shot lengths.
  - Lip-sync or talking avatar: use Fabric 1.0 or Creatify Aurora from `model-routing`.
- Run long jobs async and download every result with a unique template.

```bash
genmedia run <endpoint_id> \
  --prompt "<shot or sequence prompt>" \
  --async \
  --json
genmedia status <endpoint_id> <request_id> \
  --download "./outputs/story/{request_id}_{index}.{ext}" \
  --json
```

- Return a shot table with endpoint, request ID, prompt summary, local path, and any continuity issues. Genmedia downloads clips; it does not replace a timeline editor unless the chosen model returns a complete stitched video.
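To keep the async step mechanical, the per-shot commands can be assembled as argv lists before anything runs. This sketch only constructs commands mirroring the `genmedia run` and `genmedia status` invocations shown above; actually executing them (for example via `subprocess.run`) is left out, and the helper names are illustrative:

```python
def run_command(endpoint_id: str, prompt: str) -> list[str]:
    # Async submission, mirroring `genmedia run` as used in this workflow.
    return ["genmedia", "run", endpoint_id, "--prompt", prompt, "--async", "--json"]

def status_command(endpoint_id: str, request_id: str,
                   out_dir: str = "./outputs/story") -> list[str]:
    # Poll and download with a unique per-request output template.
    template = f"{out_dir}/{{request_id}}_{{index}}.{{ext}}"
    return ["genmedia", "status", endpoint_id, request_id,
            "--download", template, "--json"]

cmd = run_command("bytedance/seedance-2.0/text-to-video",
                  "SHOT 1, 4s: Hook. A climber grips the final hold.")
```

Building argv lists instead of shell strings avoids quoting bugs when prompts contain commas, quotes, or newlines.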
## Shot planning
Plan every sequence as beats first:
- Hook: immediate visual reason to keep watching.
- Setup: who, what, where, and why it matters.
- Development: movement, discovery, proof, or escalation.
- Turn: reveal, transformation, result, or emotional change.
- Close: final image, product memory, CTA-safe frame, or unresolved mood.
For each shot, write:
- Shot number and duration.
- Story purpose.
- Visual prompt.
- Continuity anchor.
- Input reference, if any.
- Genmedia endpoint.
- Expected output path.
## Prompt build order
Use this structure for each shot:

```text
SHOT [number], [duration]:
[story purpose]. [subject and action]. [location and time]. [camera framing].
[camera movement]. [lighting and color]. [continuity anchor]. [transition or
relationship to previous shot].
```

Keep one shot to one clear action unless the selected model supports multi-shot
or timeline prompting.
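The template above can be filled mechanically, one argument per slot. This builder and its example values are illustrative:

```python
def build_shot_prompt(number, duration, purpose, subject_action, location_time,
                      framing, movement, lighting, anchor, transition) -> str:
    # Fills the SHOT template in the order given above; each argument is one slot.
    return (
        f"SHOT {number}, {duration}:\n"
        f"{purpose}. {subject_action}. {location_time}. {framing}.\n"
        f"{movement}. {lighting}. {anchor}. {transition}."
    )

prompt = build_shot_prompt(
    1, "4s", "Hook", "A climber grips the final hold", "granite wall at dawn",
    "low-angle close-up", "slow push-in", "cold blue light with warm rim",
    "orange chalk bag", "hard cut from black",
)
```

Because the slot order is fixed, every shot in a sequence reads the same way, which makes continuity drift easy to spot.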
## Model routing
- Highest quality video: `bytedance/seedance-2.0/text-to-video`, `bytedance/seedance-2.0/image-to-video`, or `bytedance/seedance-2.0/reference-to-video`.
- Fast or lower-cost video: `xai/grok-imagine-video/text-to-video` or `xai/grok-imagine-video/image-to-video`.
- Multi-shot sequence: Seedance 2.0 first, then `fal-ai/kling-video/v3/pro/text-to-video`, then `fal-ai/kling-video/v3/pro/image-to-video`, then `alibaba/happy-horse/text-to-video` or `alibaba/happy-horse/image-to-video`.
- Text-heavy keyframes, boards, UI frames, posters, or infographics: `openai/gpt-image-2` at `quality=high`.
- Talking avatar, native audio, or lip-sync: `veed/fabric-1.0`, `veed/fabric-1.0/text`, or `fal-ai/creatify/aurora`.
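The routing rules above amount to a preference-ordered lookup table. The `ROUTES` dict and `first_choice` helper are an illustrative sketch over the endpoint IDs listed in this section, not a genmedia feature:

```python
# Hypothetical routing table; candidates are ordered by preference.
ROUTES = {
    "highest_quality": ["bytedance/seedance-2.0/text-to-video",
                        "bytedance/seedance-2.0/image-to-video",
                        "bytedance/seedance-2.0/reference-to-video"],
    "fast": ["xai/grok-imagine-video/text-to-video",
             "xai/grok-imagine-video/image-to-video"],
    "multi_shot": ["bytedance/seedance-2.0/text-to-video",
                   "fal-ai/kling-video/v3/pro/text-to-video",
                   "fal-ai/kling-video/v3/pro/image-to-video",
                   "alibaba/happy-horse/text-to-video",
                   "alibaba/happy-horse/image-to-video"],
    "text_heavy_stills": ["openai/gpt-image-2"],  # run with quality=high
    "lip_sync": ["veed/fabric-1.0", "veed/fabric-1.0/text",
                 "fal-ai/creatify/aurora"],
}

def first_choice(need: str) -> str:
    # Take the highest-preference endpoint for a given need.
    return ROUTES[need][0]
```

Later candidates serve as fallbacks when the first choice rejects the payload or lacks a required schema field.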
## Quality bar
Before returning:
- Shot order has a clear narrative function.
- The first shot is strong enough for the platform.
- Continuity anchors are repeated without bloating every prompt.
- Camera motion is varied but not random.
- Durations add up to the requested runtime.
- Async request IDs and downloaded files are recorded.
- The model's actual schema, not assumptions, drove the final command.
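The mechanical parts of this checklist can be automated before returning. This sketch assumes each shot is a dict with `duration_s`, `prompt`, and `request_id` keys (an assumed shape, not a genmedia format); the narrative checks still need a human read:

```python
# Illustrative pre-return checks over a planned shot list.
def check_quality_bar(shots: list[dict], target_runtime_s: float,
                      anchor: str) -> list[str]:
    issues = []
    total = sum(s["duration_s"] for s in shots)
    # Durations must add up to the requested runtime.
    if abs(total - target_runtime_s) > 0.5:
        issues.append(f"durations sum to {total}s, target is {target_runtime_s}s")
    # Continuity anchor must appear in every prompt.
    if not all(anchor in s["prompt"] for s in shots):
        issues.append("continuity anchor missing from at least one prompt")
    # Every async request id must be recorded.
    if any(not s.get("request_id") for s in shots):
        issues.append("async request id not recorded for every shot")
    return issues

shots = [
    {"duration_s": 3, "prompt": "Hook. red jacket.", "request_id": "req-1"},
    {"duration_s": 5, "prompt": "Setup. red jacket.", "request_id": "req-2"},
]
```

An empty issue list clears the mechanical bar; shot order, hook strength, and camera variety remain judgment calls.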