Seedance 2.0 API
Seedance 2.0 is an AI video generation model on the Volcengine Ark platform. It supports text-to-video, image-to-video, multimodal reference, video editing, video extension, and audio-synced video generation. This skill covers direct API usage (curl, Python SDK, etc.) without any custom CLI wrapper.
Models
| Model | Model ID | Best for |
|---|
| Seedance 2.0 | doubao-seedance-2-0-260128
| Highest quality |
| Seedance 2.0 fast | doubao-seedance-2-0-fast-260128
| Faster/cheaper, same capabilities |
Both models share the same capabilities. Use 2.0 for maximum quality, 2.0 fast when speed and cost matter more.
Authentication
The API uses an API Key via the
environment variable. Pass it as a Bearer token:
Authorization: Bearer $ARK_API_KEY
The user is responsible for obtaining and configuring their own API key from the Volcengine Ark console. Never hardcode or generate credentials.
Core Workflow
Video generation is asynchronous — a two-step process:
- Create a task via
POST /contents/generations/tasks
— returns a task ID
- Poll the task via
GET /contents/generations/tasks/{id}
— until is or
- When succeeded, the response contains with the MP4 download link (valid for 24h)
API Base URL
https://ark.cn-beijing.volces.com/api/v3
Endpoints
| Method | Path | Purpose |
|---|
| POST | /contents/generations/tasks
| Create a video generation task |
| GET | /contents/generations/tasks/{id}
| Get task status and result |
| GET | /contents/generations/tasks
| List tasks with filters |
| DELETE | /contents/generations/tasks/{id}
| Cancel queued / delete completed task |
Generation Modes
Seedance 2.0 supports these input combinations (all via the
array):
Text-to-Video
Just a text prompt — the simplest mode.
Image-to-Video (First Frame)
One image + optional text. The image becomes the video's first frame.
- Set image to (or omit — it defaults to first frame with a single image)
Image-to-Video (First + Last Frame)
Two images + optional text. Control both start and end frames.
- First image:
- Second image:
Multimodal Reference
The most powerful mode — combine images (0-9), videos (0-3), audio (0-3), and text.
- Images use
- Videos use
- Audio uses
- Audio cannot be the sole input; at least one image or video is required alongside it
Video Editing
Provide a reference video + text describing the edits (e.g., "replace the perfume with a cream jar").
Video Extension
Provide 1-3 reference videos + text describing how to extend or stitch them.
Content Array Types
Each item in the
array has a
and corresponding payload:
Text:
json
{ "type": "text", "text": "your prompt here" }
Image:
json
{
"type": "image_url",
"image_url": { "url": "<URL, base64, or asset://ID>" },
"role": "first_frame | last_frame | reference_image"
}
Video:
json
{
"type": "video_url",
"video_url": { "url": "<URL or asset://ID>" },
"role": "reference_video"
}
Audio:
json
{
"type": "audio_url",
"audio_url": { "url": "<URL, base64, or asset://ID>" },
"role": "reference_audio"
}
Request Parameters
| Parameter | Type | Default | Notes |
|---|
| string | required | Model ID (see table above) |
| array | required | Array of text/image/video/audio objects |
| string | | , , , , , , |
| integer | | Video length in seconds. Range: [4, 15]. Use for auto. |
| string | | or (1080p not supported for 2.0) |
| boolean | | Whether to generate synced audio |
| boolean | | Include watermark |
| integer | | Reproducibility seed. [-1, 2^32-1] |
| boolean | | Return the last frame as PNG (useful for chaining) |
| string | - | Webhook URL for status notifications |
| array | - | to enable web search (text-only) |
| integer | | Task timeout in seconds [3600, 259200] |
| string | - | End-user identifier for abuse monitoring |
Task Statuses
| Status | Meaning |
|---|
| Waiting in queue |
| Generation in progress |
| Done — available |
| Error — check and |
| Task was cancelled while queued |
| Task exceeded timeout |
Examples
curl — Text-to-Video
bash
# Step 1: Create task
curl -X POST https://ark.cn-beijing.volces.com/api/v3/contents/generations/tasks \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $ARK_API_KEY" \
-d '{
"model": "doubao-seedance-2-0-260128",
"content": [
{
"type": "text",
"text": "A cat yawning at the camera, soft lighting, close-up shot"
}
],
"generate_audio": true,
"ratio": "16:9",
"duration": 5,
"watermark": false
}'
# Step 2: Poll for result (replace TASK_ID)
curl -X GET https://ark.cn-beijing.volces.com/api/v3/contents/generations/tasks/TASK_ID \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $ARK_API_KEY"
curl — Image-to-Video (First Frame)
bash
curl -X POST https://ark.cn-beijing.volces.com/api/v3/contents/generations/tasks \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $ARK_API_KEY" \
-d '{
"model": "doubao-seedance-2-0-260128",
"content": [
{
"type": "text",
"text": "The girl opens her eyes and looks at the camera gently"
},
{
"type": "image_url",
"image_url": {
"url": "https://example.com/first-frame.png"
}
}
],
"generate_audio": true,
"ratio": "adaptive",
"duration": 5,
"watermark": false
}'
Python SDK — Full Workflow with Polling
python
import os
import time
from volcenginesdkarkruntime import Ark
# pip install 'volcengine-python-sdk[ark]'
client = Ark(
base_url="https://ark.cn-beijing.volces.com/api/v3",
api_key=os.environ.get("ARK_API_KEY"),
)
# Create task
result = client.content_generation.tasks.create(
model="doubao-seedance-2-0-260128",
content=[
{"type": "text", "text": "A cat yawning at the camera"},
{
"type": "image_url",
"image_url": {"url": "https://example.com/cat.png"},
},
],
generate_audio=True,
ratio="adaptive",
duration=5,
watermark=False,
)
task_id = result.id
print(f"Task created: {task_id}")
# Poll until done
while True:
task = client.content_generation.tasks.get(task_id=task_id)
if task.status == "succeeded":
print(f"Video URL: {task.content.video_url}")
break
elif task.status == "failed":
print(f"Failed: {task.error}")
break
else:
print(f"Status: {task.status}, waiting...")
time.sleep(30)
Python SDK — Multimodal Reference (images + video + audio)
python
result = client.content_generation.tasks.create(
model="doubao-seedance-2-0-260128",
content=[
{
"type": "text",
"text": "Use video 1's first-person perspective, use audio 1 as background music. A fruit tea commercial...",
},
{
"type": "image_url",
"image_url": {"url": "https://example.com/tea1.jpg"},
"role": "reference_image",
},
{
"type": "image_url",
"image_url": {"url": "https://example.com/tea2.jpg"},
"role": "reference_image",
},
{
"type": "video_url",
"video_url": {"url": "https://example.com/reference.mp4"},
"role": "reference_video",
},
{
"type": "audio_url",
"audio_url": {"url": "https://example.com/bgm.mp3"},
"role": "reference_audio",
},
],
generate_audio=True,
ratio="16:9",
duration=11,
watermark=False,
)
Python SDK — Video Extension
python
result = client.content_generation.tasks.create(
model="doubao-seedance-2-0-260128",
content=[
{"type": "text", "text": "Extend video 1 forward, the car drives into a desert oasis"},
{
"type": "video_url",
"video_url": {"url": "https://example.com/clip.mp4"},
"role": "reference_video",
},
],
generate_audio=True,
ratio="16:9",
duration=8,
watermark=False,
)
Chaining Videos with Last Frame
To create continuous multi-segment videos, use
on each task, then feed the last frame as the next task's first frame:
python
# Task 1: generate first segment
result1 = client.content_generation.tasks.create(
model="doubao-seedance-2-0-260128",
content=[{"type": "text", "text": "Scene 1 description..."}],
return_last_frame=True,
duration=5,
)
# ... poll until succeeded ...
task1 = client.content_generation.tasks.get(task_id=result1.id)
last_frame_url = task1.content.last_frame_url
# Task 2: use last frame as first frame
result2 = client.content_generation.tasks.create(
model="doubao-seedance-2-0-260128",
content=[
{"type": "text", "text": "Scene 2 continues from where scene 1 left off..."},
{
"type": "image_url",
"image_url": {"url": last_frame_url},
"role": "first_frame",
},
],
return_last_frame=True,
duration=5,
)
Prompt Tips
Seedance 2.0 understands natural language deeply. Structure prompts with:
- Subject + Action (required): who is doing what
- Environment + Aesthetics (optional): background, lighting, style
- Camera + Audio (optional): camera movement, sound effects
When referencing multiple inputs, use ordinal references: "image 1", "video 2", "audio 1" — these map to the order of same-type items in the
array (1-indexed).
For better instruction following with multiple reference images, use the bracket format:
[image 1] description of first image's role, [image 2] description of second image's role
.
To generate speech, put dialogue in quotes:
The man says: "Hello, welcome!"
Input Constraints
Images: jpeg/png/webp/bmp/tiff/gif. Aspect ratio 0.4-2.5. Size < 30MB. Dimensions 300-6000px per side.
Videos: mp4/mov. Duration 2-15s per clip, total across all clips <= 15s. Resolution 480p-720p. Size < 50MB. FPS 24-60.
Audio: wav/mp3. Duration 2-15s per clip, total <= 15s. Size < 15MB.
Text prompt: Chinese < 500 chars, English < 1000 words recommended.
Important Notes
- Generated video URLs expire after 24 hours — download or transfer promptly
- Task records are kept for 7 days only
- Seedance 2.0 does not support real human faces in uploaded reference images/videos (use virtual avatars via IDs or re-use model-generated videos for re-creation)
- Offline inference () is not supported for 2.0 models
- The parameter is not supported for 2.0 models — use instead
- is not supported for 2.0 models
- Web search (
tools: [{"type": "web_search"}]
) only works with text-only input
Listing and Managing Tasks
bash
# List recent tasks
curl -X GET "https://ark.cn-beijing.volces.com/api/v3/contents/generations/tasks?page_num=1&page_size=10" \
-H "Authorization: Bearer $ARK_API_KEY"
# Filter by status
curl -X GET "https://ark.cn-beijing.volces.com/api/v3/contents/generations/tasks?filter.status=succeeded" \
-H "Authorization: Bearer $ARK_API_KEY"
# Delete a task
curl -X DELETE "https://ark.cn-beijing.volces.com/api/v3/contents/generations/tasks/TASK_ID" \
-H "Authorization: Bearer $ARK_API_KEY"
Further Reference
This skill includes detailed reference files for deeper dives. Read them as needed:
references/api-reference.md
— Condensed endpoint specs, request/response schemas, and status codes
references/input-specs.md
— Exact constraints for images, videos, audio, text, and output resolution tables
- — Ready-to-use code recipes for every generation mode (text-to-video, first frame, multimodal, editing, extension, chaining, virtual avatars, polling)
references/prompt-tips.md
— Advanced prompt techniques: text overlays, subtitles, speech bubbles, multi-angle character reference, camera/audio reference patterns