gemini-omni-flash-api

Gemini Omni Flash Skill

This skill uses the Gemini Omni Flash model (`gemini-omni-flash-preview`) to perform text-to-video generation, image-to-video generation, and video editing.

[!WARNING] Important regional restrictions: Uploading videos for video editing is NOT available in the EEA, Switzerland, the United Kingdom, and some US states. If a video-to-video edit completes quickly with empty output (`total_output_tokens: 0` or no video content), this restriction is the likely cause.

Core capabilities

- Video editing and refinement: Editing existing videos (maximum duration 10 seconds), applying stylistic changes, or performing inpainting/outpainting.
- Text to video: Generating videos from a text prompt.
- First-frame to video: Generating videos from a single input image.
- Image-referenced generation: Using style, character, or object references from images to guide video generation.

Workflow

1. Analyze request: Determine the target task (e.g., first-frame-to-video, reference-guided editing) and identify any input media assets.
2. Run SDK scripts:
   - Directly run the appropriate utility (`scripts/video/generate_video.py` or `scripts/upload_file.py`).
   - Configure settings like `--aspect-ratio` (e.g. `16:9`, `9:16`) and `--duration` (any integer between `3` and `10` seconds, e.g. `3`, `5`, `10`).
3. Retrieve and process output: Outputs are saved to the local filesystem (e.g. `media/`). Report the completed media path back to the user.
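The settings in step 2 can be checked before a job is submitted. The following is a minimal sketch, not the actual `generate_video.py` implementation: the parser, its defaults, and the helper names are hypothetical, and the accepted aspect ratios are limited to the two examples named in this document, which may not be the full list.

```python
import argparse

# Only the two ratios named in this document; the real script may accept more.
KNOWN_ASPECT_RATIOS = ("16:9", "9:16")


def duration_seconds(value: str) -> int:
    """Parse --duration and enforce the documented 3 to 10 second integer range."""
    try:
        seconds = int(value)
    except ValueError:
        raise argparse.ArgumentTypeError(
            f"duration must be an integer, got {value!r}"
        ) from None
    if not 3 <= seconds <= 10:
        raise argparse.ArgumentTypeError("duration must be between 3 and 10 seconds")
    return seconds


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser; the defaults below are assumptions for illustration."""
    parser = argparse.ArgumentParser(description="Sketch of generation settings")
    parser.add_argument("prompt")
    parser.add_argument("--aspect-ratio", choices=KNOWN_ASPECT_RATIOS, default="16:9")
    parser.add_argument("--duration", type=duration_seconds, default=5)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["A cat drinking tea", "--duration", "10"])
    print(args.aspect_ratio, args.duration)
```

Rejecting an out-of-range duration locally avoids spending an API call on a request that is expected to fail.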

Reference Documentation

- Interactions API: All operations and state management for the Gemini Omni Flash model (`gemini-omni-flash-preview`) are handled via the Interactions API.
- Files API: Input media files (such as reference images and videos) must be uploaded via the Files API before they can be referenced in generations. The uploaded file URI and MIME type are then included in the `interactions.create` input parts array.
- Interactions API Skill Reference: Platform-wide guidelines, current model specifications, and SDK usage rules for the Interactions API.
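To illustrate how an uploaded file's URI and MIME type might be carried into the input parts array, here is a minimal sketch that uses only the standard library. The dictionary keys (`type`, `uri`, `mime_type`, `text`), the `UploadedFile` class, and the `files/abc` placeholder are assumptions made for illustration; this document does not specify the request schema, so check the Interactions API reference before relying on these names.

```python
import mimetypes
from dataclasses import dataclass


@dataclass(frozen=True)
class UploadedFile:
    """What a Files API upload is assumed to hand back for later reference."""
    uri: str
    mime_type: str


def guess_mime_type(path: str) -> str:
    """Guess a MIME type from the file extension using the standard library."""
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None:
        raise ValueError(f"cannot determine MIME type for {path!r}")
    return mime_type


def build_input_parts(prompt: str, files: list[UploadedFile]) -> list[dict]:
    """Assemble an input parts list: media references first, then the prompt.

    The dictionary keys are illustrative assumptions, not a confirmed schema.
    """
    parts = []
    for uploaded in files:
        kind = uploaded.mime_type.split("/", 1)[0]  # "image" or "video"
        parts.append(
            {"type": kind, "uri": uploaded.uri, "mime_type": uploaded.mime_type}
        )
    parts.append({"type": "text", "text": prompt})
    return parts


if __name__ == "__main__":
    # "files/abc" is a made-up placeholder, not a real file URI.
    reference = UploadedFile("files/abc", guess_mime_type("reference.png"))
    print(build_input_parts("The waves crash against the shore.", [reference]))
```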

Dependencies and Prerequisites

- Python SDK (`google-genai`): Requires `google-genai >= 2.10.0` to support the new `interactions` client attribute. Install or upgrade using:
  ```bash
  pip install -U google-genai
  ```
- Python runtime: Requires Python >= 3.10 (for compatibility with modern `google-genai` SDK types and methods).
- ffmpeg and ffprobe: `prep_video.py`, `inspect_video.py`, and `generate_video.py` (when stripping audio via `--strip-audio`) require the `ffmpeg` and `ffprobe` binaries to be installed and available on your system `PATH`.
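These prerequisites can be verified up front. The sketch below uses only the standard library (`shutil.which` and `sys.version_info`); the function names are hypothetical and are not part of the skill's scripts.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)
REQUIRED_BINARIES = ("ffmpeg", "ffprobe")


def missing_binaries(names=REQUIRED_BINARIES) -> list[str]:
    """Return the names that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


def python_is_supported(version=None) -> bool:
    """Check a (major, minor) version against the documented minimum."""
    version = sys.version_info[:2] if version is None else version
    return tuple(version) >= MIN_PYTHON


if __name__ == "__main__":
    if not python_is_supported():
        print("Python >= 3.10 is required")
    for name in missing_binaries():
        print(f"{name} not found on PATH")
```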

Available scripts

Use the following Python scripts to upload media with the Files API, prepare input videos with ffmpeg, and generate video outputs using the Interactions API.

`upload_file.py`: Uploads local media (images and videos) to the Files API and polls until the file state is `ACTIVE`. If the video is larger than 25MB, it prints a warning noting that Gemini Omni Flash is optimized for editing 10-second videos at 720p/24fps, and recommends pre-processing with `prep_video.py` first to speed up the upload.

```bash
./scripts/upload_file.py path/to/image.png
```
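The poll-until-`ACTIVE` behaviour and the 25MB warning can be sketched independently of any SDK. In this hedged example, `get_state` is a hypothetical callable that returns the current state string, the `FAILED` state name and the timeout values are assumptions, and "25MB" is read as 25 MiB.

```python
import time
from typing import Callable

SIZE_WARNING_BYTES = 25 * 1024 * 1024  # assumes "25MB" means 25 MiB


def needs_prep_warning(size_bytes: int) -> bool:
    """True when a video is large enough to suggest running prep_video.py first."""
    return size_bytes > SIZE_WARNING_BYTES


def wait_until_active(
    get_state: Callable[[], str],
    timeout_s: float = 300.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll get_state() until it returns "ACTIVE" or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        state = get_state()
        if state == "ACTIVE":
            return
        if state == "FAILED":  # assumed name of the failure state
            raise RuntimeError("file processing failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"file still {state!r} after {timeout_s} seconds")
        sleep(interval_s)
```

Passing `sleep` as a parameter keeps the loop testable without real delays.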

`generate_video.py`: Performs end-to-end video generation and downloads the output video. It detects and uploads local media references (images or videos) before calling the Interactions API. Large video assets (>25MB) trigger pre-processing recommendations without blocking the upload.

Text to video:

```bash
./scripts/video/generate_video.py "A close-up of a cat drinking tea" --output media/cat_tea.mp4
```

Image to video (first frame and reference):

```bash
./scripts/video/generate_video.py "The waves crash against the shore." --image reference.png --output media/waves.mp4
```

Video interpolation: Provide exactly two images as keyframes to generate a transition video between them:

```bash
./scripts/video/generate_video.py "A smooth timelapse from sunrise to sunset" --image start.png --image end.png --output media/interpolation.mp4
```

Video editing (keep original audio):

```bash
./scripts/video/generate_video.py "Transform the style to Japanese anime" --video input.mp4 --output media/anime_style.mp4
```

Video editing (regenerate all audio from scratch):

```bash
./scripts/video/generate_video.py "Transform the style to Japanese anime" --video input.mp4 --strip-audio --output media/anime_style_new_audio.mp4
```

Turn-by-turn video editing (edit previous interaction): Edit a prior video generation without re-uploading assets by passing the interaction ID:

```bash
./scripts/video/generate_video.py "Change the setting to a snowy winter wonderland." --previous-interaction-id "abc123xyz..." --output media/winter_wonderland.mp4
```

Parallel batch execution (prompts file): Run multiple prompts from a line-by-line text file concurrently:

```bash
./scripts/video/generate_video.py --prompts-file prompts.txt --concurrency 3
```

Parallel batch execution (JSON config): Execute fully configured, distinct generation and editing jobs in parallel:

```bash
./scripts/video/generate_video.py --batch jobs.json --concurrency 3
```

Example `jobs.json`:

```json
[
  {
    "prompt": "Transform the style to Japanese anime.",
    "video": "input.mp4",
    "output": "media/anime_style.mp4",
    "strip_audio": false,
    "aspect_ratio": "16:9"
  },
  {
    "prompt": "A smooth timelapse from sunrise to sunset.",
    "image": ["start.png", "end.png"],
    "output": "media/interpolation.mp4"
  }
]
```
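A batch file like the one above can be sanity-checked before it is handed to `--batch`. The sketch below is an assumption-laden illustration: the validation rules are inferred only from the example entries (a string `prompt`, a list-valued `image`, a boolean `strip_audio`), and the script's real schema may be looser or stricter.

```python
import json


def load_jobs(text: str) -> list[dict]:
    """Parse batch JSON and require a top-level array, as in the example."""
    jobs = json.loads(text)
    if not isinstance(jobs, list):
        raise ValueError("batch file must contain a JSON array of jobs")
    return jobs


def validate_job(job: dict) -> list[str]:
    """Return a list of problems found in one job entry (empty if none).

    These rules are guesses based on the example file, not a documented schema.
    """
    problems = []
    if not isinstance(job.get("prompt"), str) or not job["prompt"].strip():
        problems.append("prompt must be a non-empty string")
    if not isinstance(job.get("image", []), list):
        problems.append("image must be a list of paths")
    if "strip_audio" in job and not isinstance(job["strip_audio"], bool):
        problems.append("strip_audio must be true or false")
    if job.get("strip_audio") and "video" not in job:
        problems.append("strip_audio only makes sense with a video input")
    return problems


if __name__ == "__main__":
    sample = '[{"prompt": "Make this video anime", "video": "input.mp4"}]'
    for index, job in enumerate(load_jobs(sample)):
        print(index, validate_job(job))
```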

`inspect_video.py`: Inspects a local video file (using `ffprobe`) to check its duration, resolution, frame rate (FPS), audio stream presence, and format details.

```bash
./scripts/video/inspect_video.py media/output.mp4
```

- To get a pre-parsed, structured JSON summary:
  ```bash
  ./scripts/video/inspect_video.py media/output.mp4 --json
  ```
- To get the complete, unmodified `ffprobe` raw JSON dump:
  ```bash
  ./scripts/video/inspect_video.py media/output.mp4 --raw
  ```
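For orientation, this is roughly how `ffprobe` JSON output can be reduced to the fields listed above. It is a sketch under stated assumptions, not the script's actual code: the function names and the summary keys are hypothetical, while the `ffprobe` flags and the `streams`/`format` fields used here are standard `ffprobe` output.

```python
from fractions import Fraction


def probe_command(path: str) -> list[str]:
    """ffprobe invocation that prints stream and container info as JSON."""
    return [
        "ffprobe", "-v", "error",
        "-print_format", "json",
        "-show_format", "-show_streams",
        path,
    ]


def summarize_ffprobe(probe: dict) -> dict:
    """Reduce parsed ffprobe JSON to duration, resolution, FPS, and audio presence."""
    streams = probe.get("streams", [])
    video = next((s for s in streams if s.get("codec_type") == "video"), None)
    if video is None:
        raise ValueError("no video stream found")
    fps = Fraction(video["r_frame_rate"])  # e.g. "24/1" or "30000/1001"
    return {
        "duration_s": float(probe["format"]["duration"]),
        "width": video["width"],
        "height": video["height"],
        "fps": round(float(fps), 3),
        "has_audio": any(s.get("codec_type") == "audio" for s in streams),
    }
```

The frame rate is parsed as a fraction because `ffprobe` reports it as a ratio string rather than a decimal.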
`prep_video.py`: Normalizes, trims, and formats any video file to fit standard Gemini Omni Flash generation and editing limits. It handles timecode-based trimming, optional frame rate conversion, and proportional scaling of large videos (max 1280x720 for landscape, 720x1280 for portrait) to optimize upload times without stretching. If the video is longer than 10 seconds and the script is run interactively (in a TTY), it prompts the user to select the first 10s, the last 10s, or a custom timecode (defaulting to the first 10s).

- Trim first 10s (default):
  ```bash
  ./scripts/video/prep_video.py path/to/source.mp4
  ```
  or explicitly specify the start and duration:
  ```bash
  ./scripts/video/prep_video.py path/to/source.mp4 --start 0 --duration 10
  ```
- Trim last 10s (automatically calculates the starting point based on source length):
  ```bash
  ./scripts/video/prep_video.py path/to/source.mp4 --start last
  ```
- Trim 10s starting at a specific timecode (MM:SS or HH:MM:SS):
  ```bash
  ./scripts/video/prep_video.py path/to/source.mp4 --start 00:03 --output media/custom.mp4
  ```
- Custom frame rate and resolution:
  ```bash
  ./scripts/video/prep_video.py path/to/source.mp4 --fps 30 --resolution 1920x1080
  ```
- Strip audio for audio regeneration:
  ```bash
  ./scripts/video/prep_video.py path/to/source.mp4 --strip-audio --output media/video_with_no_audio.mp4
  ```
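The proportional scaling and the "last 10s" start point described above come down to a little arithmetic. The sketch below shows one way to compute them. It is an assumption, not the script's implementation: treating square videos as landscape and rounding down to even dimensions are choices made here for illustration.

```python
from fractions import Fraction

MAX_CLIP_SECONDS = 10


def fit_within_limits(width: int, height: int) -> tuple[int, int]:
    """Scale down proportionally to fit 1280x720 (landscape) or 720x1280 (portrait).

    Videos already within the limit are returned unchanged (never upscaled).
    """
    landscape = width >= height  # square input treated as landscape (assumption)
    max_w, max_h = (1280, 720) if landscape else (720, 1280)
    scale = min(Fraction(max_w, width), Fraction(max_h, height), Fraction(1))
    new_w, new_h = int(width * scale), int(height * scale)
    # Round down to even values, which many encoders require; this can shift
    # the aspect ratio by at most one pixel.
    return new_w - new_w % 2, new_h - new_h % 2


def last_clip_start(source_seconds: float, clip_seconds: float = MAX_CLIP_SECONDS) -> float:
    """Start time that selects the final clip_seconds of the source."""
    return max(0.0, source_seconds - clip_seconds)


if __name__ == "__main__":
    print(fit_within_limits(1920, 1080), last_clip_start(25.5))
```

Exact fractions are used instead of floats so that common sizes such as 1920x1080 scale to exactly 1280x720.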

Using tags in prompts to set image roles

You can use tags in your prompt to make it clear whether each uploaded media is an initial frame or a reference.

1. Simple tags (recommended)

For simple cases where image roles are clear from the prompt, you can bind images to roles directly:

- `<FIRST_FRAME>`: Use the image as the starting frame of the video, for example: `<FIRST_FRAME> a woman is walking`
- `<IMAGE_REF_N>`: Use the image as a reference, for example: `in the style of <IMAGE_REF_0> a woman <IMAGE_REF_1> is walking` (combines a style reference from the first image with a subject reference from the second image). Image references start from 0.

An example with 6 reference images:

```none
[0-3s] A studio fashion sequence. Starting with woman <IMAGE_REF_0>, she is holding <IMAGE_REF_1>
[3-6s] Then we see the man <IMAGE_REF_2> holding <IMAGE_REF_3>
[6-10s] And finally another woman <IMAGE_REF_4> who is holding <IMAGE_REF_5> while walking.
```
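Because `<IMAGE_REF_N>` indices start at 0 and must line up with the supplied reference images, a quick consistency check can catch off-by-one mistakes before a prompt is sent. This is a hypothetical helper, not part of the skill's scripts; it assumes `image_count` counts reference images only (excluding any first frame) and that every supplied reference is meant to be used.

```python
import re

IMAGE_REF_PATTERN = re.compile(r"<IMAGE_REF_(\d+)>")


def referenced_indices(prompt: str) -> list[int]:
    """Sorted, de-duplicated IMAGE_REF indices that appear in the prompt."""
    return sorted({int(match) for match in IMAGE_REF_PATTERN.findall(prompt)})


def check_image_refs(prompt: str, image_count: int) -> list[str]:
    """Report tags without an image and images that no tag refers to."""
    problems = []
    used = referenced_indices(prompt)
    for index in used:
        if index >= image_count:
            problems.append(f"<IMAGE_REF_{index}> has no matching image")
    for index in range(image_count):
        if index not in used:
            problems.append(f"image {index} is never referenced")
    return problems


if __name__ == "__main__":
    prompt = "in the style of <IMAGE_REF_0> a woman <IMAGE_REF_1> is walking"
    print(check_image_refs(prompt, image_count=2))
```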

2. Explicitly declare sources and references

For more complex cases with multiple images and multiple roles, you can use explicit prefix tags paired with natural-language instruction suffixes.

Declaring sources and reference images:
- `[# Sources <FIRST_FRAME>@Image1]` will use the first image as the starting frame.
- `[# References <IMAGE_REF_0>@Image1]` will use the first image as a reference.
- `[# References <IMAGE_REF_1>@Image2]` will use the second image as a reference.
- `[# References <IMAGE_REF_0>@Image1 <IMAGE_REF_1>@Image2]` will use both images as references.
- `[# Sources <FIRST_FRAME>@Image1] [# References <IMAGE_REF_0>@Image2]` will use the first image as the starting frame and the second image as a reference.

Guiding instructions: Add guiding instructions at the end of your prompt:

- For a starting frame: "Use the given image as the starting frame."
- For reference images: "Use the given image(s) as references for video generation. The images should not be used as literal initial frames."

Example expanded prompt:

```none
[# Sources <FIRST_FRAME>@Image1] [# References <IMAGE_REF_0>@Image2] a woman <IMAGE_REF_0> is walking. Use Image1 as the starting frame. Use Image2 as a reference for the video generation.
```
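The declaration prefix follows a regular pattern, so it can be generated rather than typed by hand. The following is a minimal sketch with hypothetical function names; it assumes images are numbered `Image1`, `Image2`, and so on in upload order, with the first frame (if any) taking `Image1`, as in the examples above.

```python
FIRST_FRAME_HINT = "Use the given image as the starting frame."
REFERENCE_HINT = (
    "Use the given image(s) as references for video generation. "
    "The images should not be used as literal initial frames."
)


def declaration_prefix(has_first_frame: bool, reference_count: int) -> str:
    """Build the "[# Sources ...] [# References ...]" prefix."""
    blocks = []
    next_image = 1
    if has_first_frame:
        blocks.append(f"[# Sources <FIRST_FRAME>@Image{next_image}]")
        next_image += 1
    if reference_count:
        refs = " ".join(
            f"<IMAGE_REF_{i}>@Image{next_image + i}" for i in range(reference_count)
        )
        blocks.append(f"[# References {refs}]")
    return " ".join(blocks)


def expanded_prompt(body: str, has_first_frame: bool, reference_count: int) -> str:
    """Prefix declarations, then the body, then the guiding instructions."""
    parts = [declaration_prefix(has_first_frame, reference_count), body]
    if has_first_frame:
        parts.append(FIRST_FRAME_HINT)
    if reference_count:
        parts.append(REFERENCE_HINT)
    return " ".join(part for part in parts if part)


if __name__ == "__main__":
    print(expanded_prompt("a woman <IMAGE_REF_0> is walking.", True, 1))
```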

Audio handling in video editing

When editing a source video that contains audio, you must choose between keeping the original audio and regenerating all audio from scratch.

- Keep original audio: By default, Gemini Omni Flash preserves the existing audio layer (though it may modify or adapt it slightly during generation). Use this when the original background music, dialogue, or sound effects are desired.
- Regenerate all audio from scratch: If you want Gemini Omni Flash to create a brand-new audio layer tailored to the new visual style or prompt, you must upload the video with its audio stream stripped out. If any audio stream is present, Gemini Omni Flash will attempt to preserve or modify it instead of starting from scratch.
  - Use `--strip-audio` (or `-a`) when pre-processing with `scripts/video/prep_video.py` or executing `scripts/video/generate_video.py`.
  - This forces Gemini Omni Flash to perform full audio generation.
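For readers curious what stripping the audio stream involves, one common approach with `ffmpeg` is to copy the video stream unchanged and drop audio with `-an`. The sketch below shows that approach; whether the skill's scripts use exactly these flags is an assumption, and the function names are hypothetical.

```python
import subprocess


def strip_audio_command(source: str, destination: str) -> list[str]:
    """ffmpeg arguments that copy the video stream and drop all audio streams."""
    return [
        "ffmpeg",
        "-y",            # overwrite the destination if it already exists
        "-i", source,
        "-c:v", "copy",  # copy video without re-encoding
        "-an",           # disable audio in the output
        destination,
    ]


def strip_audio(source: str, destination: str) -> None:
    """Run ffmpeg; raises CalledProcessError if ffmpeg exits with an error."""
    subprocess.run(strip_audio_command(source, destination), check=True)


if __name__ == "__main__":
    print(" ".join(strip_audio_command("input.mp4", "media/video_with_no_audio.mp4")))
```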

Prompting Gemini Omni Flash

Single scene

By default, Gemini Omni Flash will try to create a video with a few different shots, crafting an interesting narrative based on the prompt.

If you need the output video to contain a single scene, you must prompt for that:

- In a single unbroken scene
- In a single continuous shot
- No scene cuts

For example:

```none
Continuous, unbroken handheld shot of a fluffy tabby cat sitting on a sunny windowsill, looking out into a leafy garden. The cat's tail twitches slowly, and its ears rotate slightly toward ambient noises. Sunbeams illuminate dust motes in the air. Sound design: Gentle breeze, distant bird chirps, quiet mechanical purring. No dialogue.
```

Removing unwanted elements

If generations contain things you don't want, you can include simple negatives to avoid them:

- No dialogue
- No embellishments
- No extra sound effects

Prompts for editing

Simple prompts work best for editing. Overly descriptive prompts can lead to unintended changes.

For example:

- Make this video anime
- Make the phone invisible
- Put a fashionable hat on this person
- Change the lighting to be more dramatic
- Change the text on the sign to say "Gemini Omni Flash"
- Add a cat that jumps onto his lap, he begins to pet it

When editing a specific aspect of the video, it can help to include: "Keep everything else the same".

Prompting the audio

By default, the model will try to generate an appropriate audio track for a video. This might not always be what you want. You can use your prompt to describe the type of audio you want. This is especially important if you want music in your video:

- Include calm background music
- The video has a high energy techno beat
- The audio is a low tinny radio broadcast in the background, playing a song
- Audio design: [a description of the audio you want]

When things should happen

You can prompt for things to happen at specific times in the video. No precise syntax is needed; natural language works. This is especially useful for creating your own scene cuts, rhythm, or rapid-fire sequences.

Simple examples:

- after 3 seconds, a woman enters the scene
- at 5s the chorus starts in the background audio
- every 2s cut to a new frame
- in a rapid fire sequence, every half a second (12 frames at 24fps) change the scene to a new location

You can also use a timecode syntax:

```none
[0-3s] A person is walking
[3-6s] They stop and turn around
[6-10s] They start running
```
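Timecoded prompts like this can be assembled from structured segments, which helps keep the ranges contiguous. The helper below is a hypothetical sketch: the contiguity check (each segment starting where the previous one ended, beginning at 0) mirrors the example and is an assumption, not a documented model requirement. `frames_for` reproduces the "half a second is 12 frames at 24fps" arithmetic.

```python
def timecoded_prompt(segments: list[tuple[int, int, str]]) -> str:
    """Render (start, end, description) segments as "[start-ends] description" lines."""
    lines = []
    previous_end = 0
    for start, end, description in segments:
        if start != previous_end or end <= start:
            raise ValueError(f"segment {start}-{end}s does not follow {previous_end}s")
        lines.append(f"[{start}-{end}s] {description}")
        previous_end = end
    return "\n".join(lines)


def frames_for(seconds: float, fps: int = 24) -> int:
    """Number of frames covered by a duration at a given frame rate."""
    return round(seconds * fps)


if __name__ == "__main__":
    print(timecoded_prompt([
        (0, 3, "A person is walking"),
        (3, 6, "They stop and turn around"),
        (6, 10, "They start running"),
    ]))
```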

Meta prompting

Rather than specifying everything directly in a prompt, you can ask the model to pay attention to certain things. You can give Gemini Omni Flash these sorts of prompts verbatim:

- Consider micro-detail, expression and timing to create a very rich, detailed but entirely natural scene.
- Be extremely detailed in your descriptions of characters and environments. Apply costume design principles to characters. Be very specific about the people, items and objects in the scene.
- Include plenty of appropriate detail in the background elements to make the scene feel realistic and natural.
- Make a rapid fire video that shows a different rare [thing] every 1s, upbeat music, include text to label the thing.

Text in videos works really well

Unlike previous video models, Gemini Omni Flash renders text in videos well. You can include a fair amount of text in your video and it will be rendered correctly and legibly. If text will occur naturally in your video, even in background elements, it can help to define what it should say.

For example:

- One word on the screen at a time: "did, you, know, that, Omni, can, do, awesome, text?" Each word appears for 1s with a different animated style. No dialogue.
- There is a street sign that says: "This is an AI generation by Omni", there is a storefront that says: "All you need AI", there's a car with the number plate: "OMN111"
