music


When to Use

  • User wants to generate original AI music from a prompt
  • User wants to create a cover from reference audio
  • User says "音乐", "music", "生成音乐", "generate music", "翻唱", "cover", "作曲", "compose", "create a song", or "做一首歌"

When NOT to Use

  • User wants text-to-speech reading (use /speech)
  • User wants a podcast discussion (use /podcast)
  • User wants an explainer video with narration (use /explainer)
  • User wants to transcribe audio to text (use /asr)

Purpose

Generate original AI music from text prompts, or create cover versions from reference audio. Two modes:
  1. Generate (original): Create a new song from a text prompt, with optional style, title, and instrumental-only options.
  2. Cover: Transform a reference audio file into a new version, with optional style modifications.

Hard Constraints

  • Always read config following shared/config-pattern.md before any interaction
  • Follow shared/cli-patterns.md for execution modes, error handling, and interaction patterns
  • Always follow shared/cli-authentication.md for auth checks
  • Never save files to ~/Downloads/ or .listenhub/; save artifacts to the current working directory with friendly topic-based names (see shared/config-pattern.md § Artifact Naming)
  • No speakers involved: music generation does not use speaker selection
  • Audio file constraints for cover mode: mp3, wav, flac, m4a, ogg, aac; max 20 MB
  • Long timeout: 600s default. Use run_in_background: true with timeout: 660000
<HARD-GATE> Use the AskUserQuestion tool for every multiple-choice step — do NOT print options as plain text. Ask one question at a time. Wait for the user's answer before proceeding to the next step. After all parameters are collected, summarize the choices and ask the user to confirm. Do NOT call any CLI command until the user has explicitly confirmed. </HARD-GATE>

Step -1: CLI Auth Check

Follow shared/cli-authentication.md. If the CLI is not installed or the user is not logged in, auto-install and auto-login; never ask the user to run commands manually.

Step 0: Config Setup

Follow shared/config-pattern.md Step 0 (Zero-Question Boot).
If the file doesn't exist, silently create it with defaults and proceed:
bash
mkdir -p ".listenhub/music"
echo '{"outputMode":"download","language":null}' > ".listenhub/music/config.json"
CONFIG_PATH=".listenhub/music/config.json"
CONFIG=$(cat "$CONFIG_PATH")
Do NOT ask any setup questions. Proceed directly to the Interaction Flow.
If the file exists, read the config silently and proceed:
bash
CONFIG_PATH=".listenhub/music/config.json"
[ ! -f "$CONFIG_PATH" ] && CONFIG_PATH="$HOME/.listenhub/music/config.json"
CONFIG=$(cat "$CONFIG_PATH")
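The two branches above can be folded into one lookup. The following is a minimal sketch, not the canonical routine from shared/config-pattern.md: `load_music_config` is a hypothetical helper name, and it assumes "file doesn't exist" means that neither the project-local path nor the `$HOME` fallback is present.

```shell
# Sketch only: load_music_config is a hypothetical helper, not part of the ListenHub CLI.
# Assumption: defaults are created only when neither config location exists.
load_music_config() {
  local_path=".listenhub/music/config.json"
  home_path="$HOME/.listenhub/music/config.json"

  if [ -f "$local_path" ]; then
    # Project-local config takes precedence.
    CONFIG_PATH="$local_path"
  elif [ -f "$home_path" ]; then
    # Fall back to the user-level config.
    CONFIG_PATH="$home_path"
  else
    # First run: write the defaults shown above without asking anything.
    mkdir -p "$(dirname "$local_path")"
    echo '{"outputMode":"download","language":null}' > "$local_path"
    CONFIG_PATH="$local_path"
  fi

  CONFIG=$(cat "$CONFIG_PATH")
}
```

Calling `load_music_config` once at the start of a session sets `CONFIG_PATH` and `CONFIG` for the later steps.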

Setup Flow (user-initiated reconfigure only)

Only run when the user explicitly asks to reconfigure. Display current settings:
当前配置 (music):
  输出方式:{inline / download / both}
  语言偏好:{zh / en / 未设置}
Then ask:
  1. outputMode: Follow shared/output-mode.md § Setup Flow Question.
  2. Language (optional): "默认语言?"
    • "中文 (zh)"
    • "English (en)"
    • "每次手动选择" → keep null
After collecting answers, save immediately:
bash
NEW_CONFIG=$(echo "$CONFIG" | jq --arg m "$OUTPUT_MODE" '. + {"outputMode": $m}')
if [ "$LANGUAGE" != "null" ]; then
  NEW_CONFIG=$(echo "$NEW_CONFIG" | jq --arg lang "$LANGUAGE" '. + {"language": $lang}')
fi
echo "$NEW_CONFIG" > "$CONFIG_PATH"
CONFIG=$(cat "$CONFIG_PATH")

Interaction Flow

Step 1: Mode

Ask the user which mode they want, unless the intent is already clear from their message (e.g., "翻唱" or "cover" implies cover mode; "作曲" or "compose" implies generate mode).
Question: "选择音乐生成模式:"
Options:
  - "原创 (Generate)" — 从文字描述生成全新歌曲
  - "翻唱 (Cover)" — 基于参考音频生成新版本
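The keyword shortcut described above could be sketched as a simple pattern match. This is an illustration only: `infer_music_mode` is a hypothetical helper, it covers just the four example keywords named in this step, and it returns `ask` whenever the intent is not clear so the mode question is still shown.

```shell
# Sketch only: infer_music_mode is a hypothetical helper.
# It checks the four example keywords from this step and nothing else.
infer_music_mode() {
  # Lower-case ASCII letters so "Cover" and "COMPOSE" also match.
  msg=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$msg" in
    *翻唱*|*cover*)   echo "cover" ;;
    *作曲*|*compose*) echo "generate" ;;
    *)                echo "ask" ;;   # intent unclear: ask the mode question
  esac
}
```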

Step 2a: Prompt (generate mode)

If the user chose Generate, ask for the song description:
"请描述你想要的歌曲(主题、情绪、歌词片段等):"
Accept free text. This maps to --prompt.

Step 2b: Reference Audio (cover mode)

If the user chose Cover, ask for the reference audio:
"请提供参考音频文件路径或 URL:"
Accept a local file path or URL. This maps to --audio.
Validate the input:
  • If a local path: verify the file exists and check the extension is one of: mp3, wav, flac, m4a, ogg, aac
  • If a URL: accept as-is (the CLI will validate)
  • Check file size does not exceed 20 MB for local files:
    bash
    FILE_SIZE=$(stat -f%z "{path}" 2>/dev/null || stat -c%s "{path}" 2>/dev/null)
    if [ "$FILE_SIZE" -gt 20971520 ]; then
      echo "File exceeds 20 MB limit"
    fi
If validation fails, inform the user and re-ask.
Optionally, the user may also provide a prompt to guide the cover style.
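The three checks above can be combined into one function. This is a minimal sketch under stated assumptions: `validate_reference_audio` is a hypothetical helper name, anything starting with `http://` or `https://` is treated as a URL, and extensions are compared case-insensitively (the source does not say either way). It uses `wc -c` for the size, which avoids the difference between BSD and GNU `stat`.

```shell
# Sketch only: validate_reference_audio is a hypothetical helper.
# Returns 0 when the input may be passed to --audio, 1 otherwise.
validate_reference_audio() {
  input="$1"

  # Assumption: http(s) inputs are URLs; the CLI validates those itself.
  case "$input" in
    http://*|https://*) return 0 ;;
  esac

  if [ ! -f "$input" ]; then
    echo "File not found: $input" >&2
    return 1
  fi

  # Compare the extension case-insensitively (an assumption, see lead-in).
  ext=$(printf '%s' "${input##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    mp3|wav|flac|m4a|ogg|aac) ;;
    *) echo "Unsupported format: $ext" >&2; return 1 ;;
  esac

  # 20 MB = 20 * 1024 * 1024 bytes; strip the padding some wc builds emit.
  size=$(wc -c < "$input" | tr -d ' ')
  if [ "$size" -gt 20971520 ]; then
    echo "File exceeds 20 MB limit" >&2
    return 1
  fi
  return 0
}
```

A non-zero return is the cue to tell the user what failed and ask for the reference audio again.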

Step 3: Style (optional)

Ask for an optional style descriptor:
"指定音乐风格?(如 pop、rock、jazz、电子、古风等,留空则由 AI 自动选择)"
Accept free text or empty. This maps to --style.

Step 4: Title (optional)

Ask for an optional title:
"歌曲标题?(留空则自动生成)"
Accept free text or empty. This maps to --title.

Step 5: Instrumental

Question: "是否纯音乐(无人声)?"
Options:
  - "否,带人声(默认)"
  - "是,纯音乐"
Default is "no" (with vocals). If the user selects "是", add the --instrumental flag.

Step 6: Confirm & Generate

Summarize all choices:
Generate mode:
准备生成音乐:

  模式:原创 (Generate)
  描述:{prompt}
  风格:{style / 自动}
  标题:{title / 自动}
  人声:{带人声 / 纯音乐}

  确认?
Cover mode:
准备生成音乐:

  模式:翻唱 (Cover)
  参考音频:{path-or-url}
  描述:{prompt / 无}
  风格:{style / 自动}
  标题:{title / 自动}
  人声:{带人声 / 纯音乐}

  确认?
Wait for explicit confirmation before running any CLI command.

Workflow

  1. Submit (background): Run the CLI command with run_in_background: true and timeout: 660000:
    Generate mode:
    bash
    listenhub music generate \
      --prompt "{prompt}" \
      --style "{style}" \
      --title "{title}" \
      --instrumental \
      --json
    Cover mode:
    bash
    listenhub music cover \
      --audio "{path-or-url}" \
      --prompt "{prompt}" \
      --style "{style}" \
      --title "{title}" \
      --instrumental \
      --json
    Flag notes:
    • --prompt: text description of the music (required for generate, optional for cover)
    • --audio: reference audio file path or URL (cover mode only, required)
    • --style: optional style/genre hint; omit if not provided
    • --title: optional track title; omit if not provided
    • --instrumental: add this flag for instrumental-only (no vocals); omit if not selected
    • Omit --prompt in cover mode if not provided
    The CLI handles polling internally. Music generation takes up to 10 minutes.
  2. Tell the user the task is submitted and that they will be notified when it finishes.
  3. When notified of completion, present the result:
    Parse the CLI JSON output for key fields:
    bash
    AUDIO_URL=$(echo "$RESULT" | jq -r '.audioUrl')
    TITLE=$(echo "$RESULT" | jq -r '.title // "Untitled"')
    DURATION=$(echo "$RESULT" | jq -r '.duration // empty')
    CREDITS=$(echo "$RESULT" | jq -r '.credits // empty')
    Read OUTPUT_MODE from config. Follow shared/output-mode.md for behavior.
    inline or both: Display audio URL as a clickable link.
    音乐已生成!
    
    标题:{title}
    在线收听:{audioUrl}
    时长:{duration}s
    消耗积分:{credits}
    download or both: Also download the file. Generate a slug from the title following shared/config-pattern.md § Artifact Naming.
    bash
    SLUG="{slug}"  # e.g. "summer-breeze"
    NAME="${SLUG}.mp3"
    # Dedup: if file exists, append -2, -3, etc.
    BASE="${NAME%.*}"; EXT="${NAME##*.}"; i=2
    while [ -e "$NAME" ]; do NAME="${BASE}-${i}.${EXT}"; i=$((i+1)); done
    # -f fails on HTTP errors instead of saving an error page as audio; -L follows redirects
    curl -fsSL -o "$NAME" "{audioUrl}"
    Present:
    已保存到当前目录:
      {NAME}
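The flag notes in step 1 say to omit options the user did not provide, while the command templates list every flag. A minimal sketch of assembling the argument list conditionally follows. `build_music_args` is a hypothetical helper, not a listenhub command; it prints one argument per line so the result can be inspected, and its `yes`/`no` value for the instrumental choice is an assumed convention.

```shell
# Sketch only: build_music_args is a hypothetical helper, not a listenhub command.
# Arguments: mode prompt audio style title instrumental(yes/no)
build_music_args() {
  mode="$1"; prompt="$2"; audio="$3"; style="$4"; title="$5"; instrumental="$6"

  # Rebuild the positional parameters so each value stays a single argument,
  # even when it contains spaces.
  set -- music "$mode"
  if [ "$mode" = "cover" ]; then set -- "$@" --audio "$audio"; fi
  if [ -n "$prompt" ]; then set -- "$@" --prompt "$prompt"; fi
  if [ -n "$style" ]; then set -- "$@" --style "$style"; fi
  if [ -n "$title" ]; then set -- "$@" --title "$title"; fi
  if [ "$instrumental" = "yes" ]; then set -- "$@" --instrumental; fi
  set -- "$@" --json

  # Print one argument per line; replacing this line with: listenhub "$@"
  # would run the command instead.
  printf '%s\n' "$@"
}

# Cover with only a style set: no --prompt, --title, or --instrumental appears.
build_music_args cover "" "demo.mp3" "jazz" "" "no"
```

Building the list this way keeps empty values such as `--style ""` out of the command, which matches the "omit if not provided" notes.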

After Successful Generation

Update config with the language used this session if the user explicitly specified one:
bash
if [ -n "$LANGUAGE" ]; then
  NEW_CONFIG=$(echo "$CONFIG" | jq --arg lang "$LANGUAGE" '. + {"language": $lang}')
  echo "$NEW_CONFIG" > "$CONFIG_PATH"
fi
Estimated times:
  • Music generation: 5-10 minutes

Resources

  • CLI authentication: shared/cli-authentication.md
  • CLI patterns: shared/cli-patterns.md
  • Config pattern: shared/config-pattern.md
  • Output mode: shared/output-mode.md

Composability

  • Invokes: nothing
  • Invoked by: content-planner (Phase 3)

Examples

Generate original:
"帮我做一首关于夏天海边的歌"
  1. Detect: generate mode ("做一首歌")
  2. Read config (first run: create defaults with outputMode: "download")
  3. Infer: mode = generate, prompt = "关于夏天海边的歌"
  4. Ask: style? title? instrumental?
  5. Confirm summary → user confirms
bash
listenhub music generate \
  --prompt "关于夏天海边的歌" \
  --json
Wait for the CLI to return the result, then download {slug}.mp3 to the current working directory.
Cover from file:
"用这个音频翻唱一下 demo.mp3,jazz 风格"
  1. Detect: cover mode ("翻唱")
  2. Validate: demo.mp3 exists, is a supported format, under 20 MB
  3. Infer: style = "jazz" from user input
  4. Ask: title? instrumental?
  5. Confirm summary → user confirms
bash
listenhub music cover \
  --audio "demo.mp3" \
  --style "jazz" \
  --json
Wait for the CLI to return the result, then download {slug}.mp3 to the current working directory.
Generate instrumental:
"Create an instrumental electronic track for a game intro"
  1. Detect: generate mode ("Create ... track")
  2. Infer: style = "electronic", instrumental = yes
  3. Ask: title?
  4. Confirm summary → user confirms
bash
listenhub music generate \
  --prompt "instrumental electronic track for a game intro" \
  --style "electronic" \
  --instrumental \
  --json
Wait for the CLI to return the result, then download {slug}.mp3 to the current working directory.