# Storytelling with genmedia
Use this skill when the user wants a sequence, not a single asset. Load
references as needed:

- references/shot-planning.md
- references/workflows.md
- references/examples.md

Load `model-routing` alongside this skill for default endpoint choices.

The goal is to produce clear story beats and executable genmedia runs. Avoid
generic inspiration copy, fake dialogue, and em dashes.
## Inputs to collect
Ask only when missing information affects execution.
- Format: ad, short film, music video, documentary, tutorial, social story.
- Duration and aspect ratio.
- Number of shots or allowed range.
- Main subject, character, product, or location.
- Continuity anchors: character, product, wardrobe, environment, color.
- Source media: first frame, reference image, product shot, audio track.
- Audio needs: narration, music, sound design, transcript, no audio.
- Preferred model or model family, if the user wants to decide quality, cost, speed, audio, or multi-shot tradeoffs.
## Genmedia workflow
- Start from routed endpoint IDs.

```bash
genmedia models --endpoint_id bytedance/seedance-2.0/text-to-video --json
genmedia models --endpoint_id bytedance/seedance-2.0/image-to-video --json
genmedia models --endpoint_id bytedance/seedance-2.0/reference-to-video --json
genmedia models --endpoint_id fal-ai/kling-video/v3/pro/text-to-video --json
genmedia models --endpoint_id alibaba/happy-horse/text-to-video --json
genmedia models --endpoint_id veed/fabric-1.0 --json
```

Use text search only as fallback discovery for an unsupported sequence control:

```bash
genmedia models "first frame last frame video generation" --json
genmedia docs "multi shot video generation" --json
```

- Inspect the schema before planning exact payloads.

```bash
genmedia schema <endpoint_id> --json
genmedia pricing <endpoint_id> --json
```

- Upload references.

```bash
genmedia upload ./first-frame.png --json
genmedia upload ./character.png --json
genmedia upload ./product.png --json
genmedia upload ./voiceover.wav --json
```

- Choose the sequence route.
  - Highest quality video: start with Seedance 2.0 endpoints from `model-routing`.
  - Native multi-prompt: use if the schema has shot arrays, prompt lists, or timeline fields.
  - First/last frame: use for controlled transitions between key frames.
  - Image-to-video per shot: use for maximum continuity from approved stills.
  - Manual per-shot generation: use when the model only supports one prompt.
  - Audio-first: generate or upload audio, then plan visual shot lengths.
  - Lip-sync or talking avatar: use Fabric 1.0 or Creatify Aurora from `model-routing`.
- Run long jobs async and download every result with a unique template.

```bash
genmedia run <endpoint_id> \
  --prompt "<shot or sequence prompt>" \
  --async \
  --json
genmedia status <endpoint_id> <request_id> \
  --download "./outputs/story/{request_id}_{index}.{ext}" \
  --json
```

- Return a shot table with endpoint, request ID, prompt summary, local path, and any continuity issues. Genmedia downloads clips; it does not replace a timeline editor unless the chosen model returns a complete stitched video.
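To keep the async step mechanical, the per-shot commands can be assembled as argv lists before anything runs. This sketch only constructs commands mirroring the `genmedia run` and `genmedia status` invocations shown above; actually executing them (for example via `subprocess.run`) is left out, and the helper names are illustrative:

```python
def run_command(endpoint_id: str, prompt: str) -> list[str]:
    # Async submission, mirroring `genmedia run` as used in this workflow.
    return ["genmedia", "run", endpoint_id, "--prompt", prompt, "--async", "--json"]

def status_command(endpoint_id: str, request_id: str,
                   out_dir: str = "./outputs/story") -> list[str]:
    # Poll and download with a unique per-request output template.
    template = f"{out_dir}/{{request_id}}_{{index}}.{{ext}}"
    return ["genmedia", "status", endpoint_id, request_id,
            "--download", template, "--json"]

cmd = run_command("bytedance/seedance-2.0/text-to-video",
                  "SHOT 1, 4s: Hook. A climber grips the final hold.")
```

Building argv lists instead of shell strings avoids quoting bugs when prompts contain commas, quotes, or newlines.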
## Shot planning
Plan every sequence as beats first:
- Hook: immediate visual reason to keep watching.
- Setup: who, what, where, and why it matters.
- Development: movement, discovery, proof, or escalation.
- Turn: reveal, transformation, result, or emotional change.
- Close: final image, product memory, CTA-safe frame, or unresolved mood.
For each shot, write:
- Shot number and duration.
- Story purpose.
- Visual prompt.
- Continuity anchor.
- Input reference, if any.
- Genmedia endpoint.
- Expected output path.
## Prompt build order
Use this structure for each shot:

```text
SHOT [number], [duration]:
[story purpose]. [subject and action]. [location and time]. [camera framing].
[camera movement]. [lighting and color]. [continuity anchor]. [transition or
relationship to previous shot].
```

Keep one shot to one clear action unless the selected model supports multi-shot
or timeline prompting.
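The template above can be filled mechanically, one argument per slot. This builder and its example values are illustrative:

```python
def build_shot_prompt(number, duration, purpose, subject_action, location_time,
                      framing, movement, lighting, anchor, transition) -> str:
    # Fills the SHOT template in the order given above; each argument is one slot.
    return (
        f"SHOT {number}, {duration}:\n"
        f"{purpose}. {subject_action}. {location_time}. {framing}.\n"
        f"{movement}. {lighting}. {anchor}. {transition}."
    )

prompt = build_shot_prompt(
    1, "4s", "Hook", "A climber grips the final hold", "granite wall at dawn",
    "low-angle close-up", "slow push-in", "cold blue light with warm rim",
    "orange chalk bag", "hard cut from black",
)
```

Because the slot order is fixed, every shot in a sequence reads the same way, which makes continuity drift easy to spot.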
## Model routing
- Highest quality video: `bytedance/seedance-2.0/text-to-video`, `bytedance/seedance-2.0/image-to-video`, or `bytedance/seedance-2.0/reference-to-video`.
- Fast or lower-cost video: `xai/grok-imagine-video/text-to-video` or `xai/grok-imagine-video/image-to-video`.
- Multi-shot sequence: Seedance 2.0 first, then `fal-ai/kling-video/v3/pro/text-to-video`, then `fal-ai/kling-video/v3/pro/image-to-video`, then `alibaba/happy-horse/text-to-video` or `alibaba/happy-horse/image-to-video`.
- Text-heavy keyframes, boards, UI frames, posters, or infographics: `openai/gpt-image-2` at `quality=high`.
- Talking avatar, native audio, or lip-sync: `veed/fabric-1.0`, `veed/fabric-1.0/text`, or `fal-ai/creatify/aurora`.
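The routing rules above amount to a preference-ordered lookup table. The `ROUTES` dict and `first_choice` helper are an illustrative sketch over the endpoint IDs listed in this section, not a genmedia feature:

```python
# Hypothetical routing table; candidates are ordered by preference.
ROUTES = {
    "highest_quality": ["bytedance/seedance-2.0/text-to-video",
                        "bytedance/seedance-2.0/image-to-video",
                        "bytedance/seedance-2.0/reference-to-video"],
    "fast": ["xai/grok-imagine-video/text-to-video",
             "xai/grok-imagine-video/image-to-video"],
    "multi_shot": ["bytedance/seedance-2.0/text-to-video",
                   "fal-ai/kling-video/v3/pro/text-to-video",
                   "fal-ai/kling-video/v3/pro/image-to-video",
                   "alibaba/happy-horse/text-to-video",
                   "alibaba/happy-horse/image-to-video"],
    "text_heavy_stills": ["openai/gpt-image-2"],  # run with quality=high
    "lip_sync": ["veed/fabric-1.0", "veed/fabric-1.0/text",
                 "fal-ai/creatify/aurora"],
}

def first_choice(need: str) -> str:
    # Take the highest-preference endpoint for a given need.
    return ROUTES[need][0]
```

Later candidates serve as fallbacks when the first choice rejects the payload or lacks a required schema field.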
## Quality bar
Before returning:
- Shot order has a clear narrative function.
- The first shot is strong enough for the platform.
- Continuity anchors are repeated without bloating every prompt.
- Camera motion is varied but not random.
- Durations add up to the requested runtime.
- Async request IDs and downloaded files are recorded.
- The model's actual schema, not assumptions, drove the final command.
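The mechanical parts of this checklist can be automated before returning. This sketch assumes each shot is a dict with `duration_s`, `prompt`, and `request_id` keys (an assumed shape, not a genmedia format); the narrative checks still need a human read:

```python
# Illustrative pre-return checks over a planned shot list.
def check_quality_bar(shots: list[dict], target_runtime_s: float,
                      anchor: str) -> list[str]:
    issues = []
    total = sum(s["duration_s"] for s in shots)
    # Durations must add up to the requested runtime.
    if abs(total - target_runtime_s) > 0.5:
        issues.append(f"durations sum to {total}s, target is {target_runtime_s}s")
    # Continuity anchor must appear in every prompt.
    if not all(anchor in s["prompt"] for s in shots):
        issues.append("continuity anchor missing from at least one prompt")
    # Every async request id must be recorded.
    if any(not s.get("request_id") for s in shots):
        issues.append("async request id not recorded for every shot")
    return issues

shots = [
    {"duration_s": 3, "prompt": "Hook. red jacket.", "request_id": "req-1"},
    {"duration_s": 5, "prompt": "Setup. red jacket.", "request_id": "req-2"},
]
```

An empty issue list clears the mechanical bar; shot order, hook strength, and camera variety remain judgment calls.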