luma-digital-human
Generate digital-human short videos with Luma / 拾光 / 拾光智能体 / 拾光工具 by composing voice clone, TTS, avatar, lip-sync, subtitle, and enhancement tools.
Source: zl007700/luma-cli
Install: `npx skill4agent add zl007700/luma-cli luma-digital-human`
# Luma Digital Human
Use this skill when an agent needs to create a digital-human spoken video from script text, a voice, and an avatar.
Read `../luma-shared/SKILL.md` first for common auth, project, output, and artifact rules.

## Asset First
Inspect available voices and avatars:
```bash
luma-cli asset list voice
luma-cli asset list roles
```

If the user provides a reference voice sample, clone it first:
```bash
luma-cli voice clone ./voice.wav --name my_voice
```

If the user provides a video and says they want the sound, voice, tone, or audio from it, treat the video as a voice source and use voice clone. Do not upload that video as a `roles` avatar/source-role asset unless the user explicitly says they want the person or visuals in the video to appear.

If the user provides a local avatar video, upload it:
```bash
luma-cli asset upload avatar.mp4 --group roles
```

## Standard Flow
1. Create or select a project:
   ```bash
   luma-cli project create demo
   luma-cli project use demo
   ```
2. Generate voice:
   ```bash
   luma-cli --json tts "script text" --voice my_voice --speech-rate 1.1 --output step2_tts.wav
   ```
   The `--json` flag returns `audio_object_key` in the output envelope. Use this key in step 3 to skip a redundant upload.
3. Generate lip-sync video (prefer `--audio-key` over `--audio`):
   ```bash
   luma-cli lipsync --avatar 数字人男 --audio-key <audio_object_key> --output step3_lipsync.mp4
   ```
   If `--audio-key` is omitted, lipsync falls back to the project's `latest_tts_key`, then to `--audio` file upload.
4. Add subtitles:
   ```bash
   luma-cli subtitle step3_lipsync.mp4 --output step5_subtitle.mp4
   ```
5. Optionally enhance:
   ```bash
   luma-cli enhance step5_subtitle.mp4 --scale 2
   ```
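Steps 2 through 4 can be chained in one shell function so that the TTS key goes straight into lipsync. This is a minimal sketch. It assumes that step 1 has already selected the project, that `luma-cli --json tts` prints its JSON envelope to stdout, and that the envelope contains a string field named `audio_object_key`. Where that field sits in the envelope is not documented here, so check real output before relying on the sketch. The function names `make_digital_human` and `extract_audio_key` are hypothetical. The `luma-cli` commands and flags are the ones listed in the steps.

```bash
#!/usr/bin/env bash
# Sketch only: assumes `luma-cli --json tts` prints a JSON envelope on stdout
# that contains a string field "audio_object_key" (nesting unknown).

# Hypothetical helper: print the first "audio_object_key": "<value>" found in
# the JSON read from stdin. This is a lightweight regex match, not a full JSON
# parser (it does not handle escaped quotes); prefer jq if it is installed.
extract_audio_key() {
  local json re='"audio_object_key"[[:space:]]*:[[:space:]]*"([^"]*)"'
  json=$(cat)
  if [[ $json =~ $re ]]; then
    printf '%s\n' "${BASH_REMATCH[1]}"
  else
    return 1
  fi
}

# Hypothetical wrapper: runs steps 2-4 against the already-selected project.
make_digital_human() {
  local script_text=$1 voice=$2 avatar=$3
  local tts_json key

  # Step 2: TTS with --json so the envelope carries audio_object_key.
  tts_json=$(luma-cli --json tts "$script_text" --voice "$voice" \
    --speech-rate 1.1 --output step2_tts.wav) || return 1

  key=$(printf '%s' "$tts_json" | extract_audio_key) || {
    echo "audio_object_key not found in TTS output" >&2
    return 1
  }

  # Step 3: lip-sync from the already-uploaded audio (no second upload).
  luma-cli lipsync --avatar "$avatar" --audio-key "$key" \
    --output step3_lipsync.mp4 || return 1

  # Step 4: subtitles. Step 5 (enhance) is deliberately left out; run it
  # only on the render the user selects as final.
  luma-cli subtitle step3_lipsync.mp4 --output step5_subtitle.mp4
}
```

Usage: `make_digital_human "script text" my_voice 数字人男`. After the user picks a final render, run `luma-cli enhance step5_subtitle.mp4 --scale 2` on that render only.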
## Agent Notes
- Script must come from research, not imagination. If the script is for a short-video production, the text source must be backed by data or a known viral reference. Never invent a script topic without data support. See `luma-cli research run` and `../luma-workflow-viral-remix/SKILL.md` for the full research → rewrite flow.
- Use `voice.clone` when a user provides a voice sample.
- Uploaded videos are ambiguous. If the user did not clearly say whether the video is for voice clone, avatar/source role, PIP material, ASR/rewrite, or video processing, ask for a short confirmation before choosing a workflow.
- Do not assume a user-uploaded video is a digital-human avatar. If the user asks to use "the voice/audio/sound inside this video", extract or use its audio for voice clone instead.
- Use `asset.list voice` and `asset.list roles` when the user asks what is available.
- Use the latest project TTS output for lip-sync unless the user explicitly provides `--audio`.
- Keep script drafting and revisions outside the media commands. The CLI should receive only the final text for each generation attempt.
- Do not enhance every draft; enhance only the selected final render.
- Use advanced backend parameters only when the user asks for them:
  - TTS: `--trim-long-silence`
  - Lip-sync: `--random-start`, `--guidance-scale`, `--num-inference-steps`, `--no-superres`, `--superres-scale`, `--multi-shot-json`
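The notes on uploaded videos amount to a small decision table. The hypothetical helper `route_uploaded_video` below sketches that table. It takes the intent the agent has already confirmed with the user (`voice`, `avatar`, or anything else) and prints the matching command instead of running it. Any other intent means the agent should ask first. The helper assumes that `voice clone` accepts the video file directly. If it only takes audio, extract the track first.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print (do not run) the luma-cli command for a
# user-supplied video, based on an intent the agent has already confirmed.
#   voice  -> the user wants the sound/voice/tone from the video
#   avatar -> the user explicitly wants the person/visual to appear
#   other  -> ambiguous: ask before choosing a workflow (exit status 2)
route_uploaded_video() {
  local file=$1 intent=$2 name=${3:-my_voice}
  case $intent in
    voice)
      # Assumption: `voice clone` accepts the video file as given.
      printf 'luma-cli voice clone %q --name %q\n' "$file" "$name"
      ;;
    avatar)
      printf 'luma-cli asset upload %q --group roles\n' "$file"
      ;;
    *)
      echo "ASK: is this video for voice clone, avatar/source role, PIP material, ASR/rewrite, or video processing?"
      return 2
      ;;
  esac
}
```

The helper prints the command instead of running it. This keeps a review step between deciding on a route and touching the user's assets, and it prevents an ambiguous video from being uploaded to `roles` by default.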