luma-digital-human

Generate digital-human short videos with Luma / 拾光 / 拾光智能体 / 拾光工具 by composing voice clone, TTS, avatar, lip-sync, subtitle, and enhancement tools.

NPX Install

npx skill4agent add zl007700/luma-cli luma-digital-human

Luma Digital Human

Use this skill when an agent needs to create a digital-human spoken video from script text, a voice, and an avatar.
Read ../luma-shared/SKILL.md first for common auth, project, output, and artifact rules.

Asset First

Inspect available voices and avatars:
bash
luma-cli asset list voice
luma-cli asset list roles
If the user provides a reference voice sample, clone it first:
bash
luma-cli voice clone ./voice.wav --name my_voice
If the user provides a video and says they want the sound, voice, tone, or audio from it, treat the video as a voice source and use voice clone. Do not upload that video as a roles avatar/source-role asset unless the user explicitly says they want the person/visual in the video to appear.
If the user provides a local avatar video, upload it:
bash
luma-cli asset upload avatar.mp4 --group roles
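
The routing rule above can be sketched as a small shell helper. This is only an illustration: the function name route_media, the intent labels "voice" and "avatar", and the LUMA variable are hypothetical and not part of luma-cli. Only the voice clone and asset upload invocations come from the commands shown above. Whether voice clone accepts a video file directly, or needs the audio track extracted first, is not stated here, so treat that as an assumption.

```bash
# Hypothetical helper: pick the Luma command from the user's stated intent.
# LUMA may be overridden to point at another binary or at a test stub.
LUMA="${LUMA:-luma-cli}"

route_media() {
  local intent="$1"   # "voice" or "avatar" (labels invented for this sketch)
  local file="$2"     # local media file supplied by the user
  local name="$3"     # voice name, used only for the "voice" intent

  case "$intent" in
    voice)
      # The user wants the sound/voice/tone: treat the file as a voice source.
      "$LUMA" voice clone "$file" --name "$name"
      ;;
    avatar)
      # The user explicitly wants the person/visual to appear.
      "$LUMA" asset upload "$file" --group roles
      ;;
    *)
      # Ambiguous intent: do not guess, ask the user first.
      echo "Unclear intent for $file: ask whether it is a voice source or an avatar." >&2
      return 2
      ;;
  esac
}

# Example call (not executed here):
#   route_media voice ./interview.mp4 my_voice
```

Returning a distinct non-zero status for the ambiguous case lets a calling agent stop and ask the user instead of silently picking a workflow.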

Standard Flow

  1. Create or select a project:
    bash
    luma-cli project create demo
    luma-cli project use demo
  2. Generate voice:
    bash
    luma-cli --json tts "script text" --voice my_voice --speech-rate 1.1 --output step2_tts.wav
    The --json flag returns audio_object_key in the output envelope. Use this key in step 3 to skip a redundant upload.
  3. Generate lip-sync video (prefer --audio-key over --audio):
    bash
    luma-cli lipsync --avatar 数字人男 --audio-key <audio_object_key> --output step3_lipsync.mp4
    If --audio-key is omitted, lipsync falls back to the project's latest_tts_key, then to --audio file upload.
  4. Add subtitles:
    bash
    luma-cli subtitle step3_lipsync.mp4 --output step5_subtitle.mp4
  5. Optionally enhance:
    bash
    luma-cli enhance step5_subtitle.mp4 --scale 2
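
Steps 2 through 4 can be chained in a script that passes audio_object_key from the TTS output to lipsync. The sketch below rests on several assumptions. The function names and the LUMA variable are hypothetical. It assumes that with --json the command prints only the JSON envelope on stdout. It also assumes the key appears in that envelope as "audio_object_key": "<value>"; the envelope's exact layout is not documented here, so the extraction deliberately avoids depending on nesting.

```bash
# LUMA may be overridden to point at another binary or at a test stub.
LUMA="${LUMA:-luma-cli}"

# Read JSON text on stdin and print the value of an "audio_object_key" field.
# Assumption: the field looks like "audio_object_key": "<value>" at any depth.
extract_audio_key() {
  tr -d '\n' |
    sed -n 's/.*"audio_object_key"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Hypothetical wrapper for steps 2-4: TTS, lip-sync by key, subtitles.
make_digital_human() {
  local script_text="$1" voice="$2" avatar="$3"
  local envelope key

  # Step 2: generate speech and capture the JSON envelope.
  envelope="$("$LUMA" --json tts "$script_text" --voice "$voice" --output step2_tts.wav)" || return 1

  # Stop early if the key is missing rather than calling lipsync without it.
  key="$(printf '%s' "$envelope" | extract_audio_key)"
  if [ -z "$key" ]; then
    echo "No audio_object_key found in the TTS output." >&2
    return 1
  fi

  # Step 3: lip-sync using the key, which avoids re-uploading the audio.
  "$LUMA" lipsync --avatar "$avatar" --audio-key "$key" --output step3_lipsync.mp4 || return 1

  # Step 4: burn in subtitles.
  "$LUMA" subtitle step3_lipsync.mp4 --output step5_subtitle.mp4
}

# Example call (not executed here):
#   make_digital_human "script text" my_voice 数字人男
```

Enhancement is left out of the wrapper on purpose, so that it can be run by hand on the render that is finally selected.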

Agent Notes

  • Script must come from research, not imagination. If the script is for a short-video production, the text source must be backed by luma-cli research run data or a known viral reference. Never invent a script topic without data support. See ../luma-workflow-viral-remix/SKILL.md for the full research → rewrite flow.
  • Use voice.clone when a user provides a voice sample.
  • Uploaded videos are ambiguous. If the user did not clearly say whether the video is for voice clone, avatar/source role, PIP material, ASR/rewrite, or video processing, ask a short confirmation before choosing a workflow.
  • Do not assume a user-uploaded video is a digital-human avatar. If the user asks to use "the voice/audio/sound inside this video", extract or use audio for voice clone instead.
  • Use asset.list voice and asset.list roles when the user asks what is available.
  • Use the latest project TTS output for lip-sync unless the user explicitly provides --audio.
  • Keep script revisions outside the media commands. The CLI should receive only the text that is final for each generation attempt.
  • Do not enhance every draft; enhance only the selected final render.
  • Use advanced backend parameters only when the user asks for them:
    • TTS: --trim-long-silence
    • Lip-sync: --random-start, --guidance-scale, --num-inference-steps, --no-superres, --superres-scale, --multi-shot-json
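
Two of the notes above, keeping script revisions out of the media commands and enhancing only the final render, can be illustrated with a pair of small helpers. This is a sketch under assumptions: the function names, the idea of holding the script in a text file, and the LUMA variable are hypothetical. Only the tts and enhance invocations are taken from the Standard Flow.

```bash
# LUMA may be overridden to point at another binary or at a test stub.
LUMA="${LUMA:-luma-cli}"

# Hypothetical helper: revise the script in a file, then hand the CLI the
# final text only when a generation attempt actually starts.
tts_from_script_file() {
  local script_file="$1" voice="$2" output="$3"
  local text

  # Refuse to spend a generation attempt on a missing or empty script.
  if [ ! -s "$script_file" ]; then
    echo "Script file $script_file is missing or empty." >&2
    return 1
  fi

  text="$(cat "$script_file")"
  "$LUMA" tts "$text" --voice "$voice" --output "$output"
}

# Hypothetical helper: run enhancement once, on the render chosen as final.
enhance_final() {
  "$LUMA" enhance "$1" --scale 2
}

# Example calls (not executed here):
#   tts_from_script_file ./script.txt my_voice step2_tts.wav
#   enhance_final step5_subtitle.mp4
```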