luma-digital-human

Generate digital-human short videos with Luma / 拾光 / 拾光智能体 / 拾光工具 by composing voice clone, TTS, avatar, lip-sync, subtitle, and enhancement tools.

8 installs

Source: zl007700/luma-cli

Added on: 2026-06-03

NPX Install

```bash
npx skill4agent add zl007700/luma-cli luma-digital-human
```

SKILL.md Content

Luma Digital Human

Use this skill when an agent needs to create a digital-human spoken video from script text, a voice, and an avatar.

Read `../luma-shared/SKILL.md` first for common auth, project, output, and artifact rules.

Asset First

Inspect available voices and avatars:

```bash
luma-cli asset list voice
luma-cli asset list roles
```

If the user provides a reference voice sample, clone it first:

```bash
luma-cli voice clone ./voice.wav --name my_voice
```

If the user provides a video and says they want the sound, voice, tone, or audio from it, treat the video as a voice source and use voice clone. Do not upload that video as a `roles` avatar/source-role asset unless the user explicitly says they want the person/visual in the video to appear.
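When a video is only a voice source, one way to follow this rule is to strip its audio track locally and clone from that file. The sketch below assumes `ffmpeg` is installed. The helper name `clone_voice_from_video`, the `LUMA_CLI` and `FFMPEG` override variables, and the mono 16 kHz WAV settings are illustrative assumptions, not documented requirements of `luma-cli`.

```shell
# Hypothetical helper: clone a voice from the audio track of a video.
# LUMA_CLI / FFMPEG let you point at different binaries (assumed names).
LUMA_CLI="${LUMA_CLI:-luma-cli}"
FFMPEG="${FFMPEG:-ffmpeg}"

clone_voice_from_video() {
  video="$1"                     # source video that carries the voice
  name="$2"                      # name to register the cloned voice under
  wav="${video%.*}_voice.wav"    # e.g. ./clip.mp4 -> ./clip_voice.wav

  # -vn drops the video stream; -ac 1 / -ar 16000 are assumed settings.
  "$FFMPEG" -y -i "$video" -vn -ac 1 -ar 16000 "$wav" || return 1

  # Same command shape as the documented `voice clone` call.
  "$LUMA_CLI" voice clone "$wav" --name "$name"
}
```

Calling `clone_voice_from_video ./clip.mp4 my_voice` would then register the voice without ever uploading the video to the `roles` group.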

If the user provides a local avatar video, upload it:

```bash
luma-cli asset upload avatar.mp4 --group roles
```

Standard Flow

Create or select a project:

```bash
luma-cli project create demo
luma-cli project use demo
```

Generate voice:

```bash
luma-cli --json tts "script text" --voice my_voice --speech-rate 1.1 --output step2_tts.wav
```

The `--json` flag returns `audio_object_key` in the output envelope. Pass this key to the lip-sync step to skip a redundant upload.
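A minimal sketch of pulling the key out of the envelope with standard tools. It assumes the key appears as a plain `"audio_object_key": "<value>"` string pair; the rest of the envelope layout is not documented here, so the sample JSON and the helper name `extract_audio_key` are hypothetical.

```shell
# Read JSON on stdin and print the first audio_object_key value found.
# Assumption: the value is a simple quoted string with no escaped quotes.
extract_audio_key() {
  sed -n 's/.*"audio_object_key"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Made-up envelope; only the audio_object_key field name comes from the docs.
sample='{"ok": true, "data": {"audio_object_key": "tts/demo/step2.wav"}}'
printf '%s\n' "$sample" | extract_audio_key   # prints tts/demo/step2.wav
```

A JSON-aware tool would be sturdier than `sed` if the envelope ever contains escaped quotes or repeated keys.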

Generate lip-sync video (prefer `--audio-key` over `--audio`):

```bash
luma-cli lipsync --avatar 数字人男 --audio-key <audio_object_key> --output step3_lipsync.mp4
```

If `--audio-key` is omitted, lipsync falls back to the project's `latest_tts_key`, then to `--audio` file upload.
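The fallback order can be modeled as a small function. This is an illustrative model of the documented precedence, not the CLI's own code; the function name and the output labels are assumptions.

```shell
# Args: explicit --audio-key value, project latest_tts_key, --audio file path.
# Any of the three may be empty; the first non-empty one wins.
resolve_lipsync_audio() {
  audio_key="$1"; latest_tts_key="$2"; audio_file="$3"
  if [ -n "$audio_key" ]; then
    echo "audio-key:$audio_key"
  elif [ -n "$latest_tts_key" ]; then
    echo "latest_tts_key:$latest_tts_key"
  elif [ -n "$audio_file" ]; then
    echo "audio-file:$audio_file"
  else
    echo "no audio source available" >&2
    return 1
  fi
}

resolve_lipsync_audio "" "tts/latest.wav" "./voice.wav"   # prints latest_tts_key:tts/latest.wav
```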

Add subtitles:

```bash
luma-cli subtitle step3_lipsync.mp4 --output step5_subtitle.mp4
```

Optionally enhance:

```bash
luma-cli enhance step5_subtitle.mp4 --scale 2
```
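The TTS, lip-sync, and subtitle steps can be chained in one function, sketched below. The function name `make_digital_human`, the `LUMA_CLI` override variable, and the `sed`-based key extraction are assumptions for illustration; project selection is expected to have happened already, and enhancement is left out because it applies only to a selected final render.

```shell
LUMA_CLI="${LUMA_CLI:-luma-cli}"   # assumed override hook, not a CLI feature

make_digital_human() {
  script_text="$1"; voice="$2"; avatar="$3"

  # TTS with --json so the envelope (including audio_object_key) is on stdout.
  tts_json=$("$LUMA_CLI" --json tts "$script_text" --voice "$voice" --output step2_tts.wav) || return 1
  audio_key=$(printf '%s\n' "$tts_json" |
    sed -n 's/.*"audio_object_key"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1)

  # Prefer --audio-key; with no key, omit it and let lipsync apply its own fallback.
  if [ -n "$audio_key" ]; then
    "$LUMA_CLI" lipsync --avatar "$avatar" --audio-key "$audio_key" --output step3_lipsync.mp4 || return 1
  else
    "$LUMA_CLI" lipsync --avatar "$avatar" --output step3_lipsync.mp4 || return 1
  fi

  "$LUMA_CLI" subtitle step3_lipsync.mp4 --output step5_subtitle.mp4
}
```

A call such as `make_digital_human "script text" my_voice 数字人男` stops at the first failing step, so a failed TTS run never reaches lip-sync.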

Agent Notes

- Script must come from research, not imagination. For short-video production, back the script with `luma-cli research run` data or a known viral reference. Never invent a script topic without data support. See `../luma-workflow-viral-remix/SKILL.md` for the full research → rewrite flow.
- Use `voice.clone` when a user provides a voice sample.
- Uploaded videos are ambiguous. If the user did not clearly say whether the video is for voice clone, avatar/source role, PIP material, ASR/rewrite, or video processing, ask for a short confirmation before choosing a workflow.
- Do not assume a user-uploaded video is a digital-human avatar. If the user asks to use "the voice/audio/sound inside this video", extract or use the audio for voice clone instead.
- Use `asset.list voice` and `asset.list roles` when the user asks what is available.
- Use the latest project TTS output for lip-sync unless the user explicitly provides `--audio`.
- Keep script revisions outside the media commands. The CLI should receive only the text that is final for each generation attempt.
- Do not enhance every draft; enhance only the selected final render.

Use advanced backend parameters only when the user asks for them:

- TTS: `--trim-long-silence`
- Lip-sync: `--random-start`, `--guidance-scale`, `--num-inference-steps`, `--no-superres`, `--superres-scale`, `--multi-shot-json`
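A wrapper can recognize these flags and hold them back unless the user asked for them. The check below is a sketch: the helper name `is_advanced_flag` is hypothetical, and it only tests flag names from the list above without assuming whether a flag takes a value.

```shell
# Return 0 if the flag is one of the advanced parameters for the given tool.
is_advanced_flag() {
  tool="$1"; flag="$2"
  case "$tool:$flag" in
    tts:--trim-long-silence) return 0 ;;
    lipsync:--random-start|lipsync:--guidance-scale|lipsync:--num-inference-steps) return 0 ;;
    lipsync:--no-superres|lipsync:--superres-scale|lipsync:--multi-shot-json) return 0 ;;
    *) return 1 ;;
  esac
}

# Example: only forward an advanced flag when the user requested it explicitly.
if is_advanced_flag lipsync --guidance-scale; then
  echo "advanced flag: pass it only on explicit user request"
fi
```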
