music

When to Use

User wants to generate original AI music from a prompt
User wants to create a cover from reference audio
User says "音乐", "music", "生成音乐", "generate music", "翻唱", "cover", "作曲", "compose", "create a song", or "做一首歌"

When NOT to Use

User wants text-to-speech reading (use `/speech`)
User wants a podcast discussion (use `/podcast`)
User wants an explainer video with narration (use `/explainer`)
User wants to transcribe audio to text (use `/asr`)

Purpose

Generate original AI music from text prompts, or create cover versions from reference audio. Two modes:

Generate (original): Create a new song from a text prompt, with optional style, title, and instrumental-only options.
Cover: Transform a reference audio file into a new version, with optional style modifications.

Hard Constraints

Always read config following `shared/config-pattern.md` before any interaction
Follow `shared/cli-patterns.md` for execution modes, error handling, and interaction patterns
Always follow `shared/cli-authentication.md` for auth checks
Never save files to `~/Downloads/` or `.listenhub/`; save artifacts to the current working directory with friendly topic-based names (see `shared/config-pattern.md` § Artifact Naming)
No speakers involved: music generation does not use speaker selection
Audio file constraints for cover mode: mp3, wav, flac, m4a, ogg, aac; max 20 MB
Long timeout: 600s default. Use `run_in_background: true` with `timeout: 660000`

<HARD-GATE> Use the AskUserQuestion tool for every multiple-choice step — do NOT print options as plain text. Ask one question at a time. Wait for the user's answer before proceeding to the next step. After all parameters are collected, summarize the choices and ask the user to confirm. Do NOT call any CLI command until the user has explicitly confirmed. </HARD-GATE>

Step -1: CLI Auth Check

Follow `shared/cli-authentication.md`. If the CLI is not installed or the user is not logged in, auto-install and auto-login; never ask the user to run commands manually.

Step 0: Config Setup

Follow `shared/config-pattern.md` Step 0 (Zero-Question Boot).

If file doesn't exist, silently create with defaults and proceed:

```bash
mkdir -p ".listenhub/music"
echo '{"outputMode":"download","language":null}' > ".listenhub/music/config.json"
CONFIG_PATH=".listenhub/music/config.json"
CONFIG=$(cat "$CONFIG_PATH")
```

Do NOT ask any setup questions. Proceed directly to the Interaction Flow.

If file exists, read config silently and proceed:

```bash
CONFIG_PATH=".listenhub/music/config.json"
[ ! -f "$CONFIG_PATH" ] && CONFIG_PATH="$HOME/.listenhub/music/config.json"
CONFIG=$(cat "$CONFIG_PATH")
```

Setup Flow (user-initiated reconfigure only)

Only run when the user explicitly asks to reconfigure. Display current settings:

当前配置 (music)：
  输出方式：{inline / download / both}
  语言偏好：{zh / en / 未设置}

Then ask:

outputMode: Follow `shared/output-mode.md` § Setup Flow Question.
Language (optional): "默认语言？"
- "中文 (zh)"
- "English (en)"
- "每次手动选择" → keep `null`

After collecting answers, save immediately:

```bash
NEW_CONFIG=$(echo "$CONFIG" | jq --arg m "$OUTPUT_MODE" '. + {"outputMode": $m}')
if [ "$LANGUAGE" != "null" ]; then
  NEW_CONFIG=$(echo "$NEW_CONFIG" | jq --arg lang "$LANGUAGE" '. + {"language": $lang}')
fi
echo "$NEW_CONFIG" > "$CONFIG_PATH"
CONFIG=$(cat "$CONFIG_PATH")
```

Interaction Flow

Step 1: Mode

Ask the user which mode they want, unless the intent is already clear from their message (e.g., "翻唱" or "cover" implies cover mode; "作曲" or "compose" implies generate mode).

Question: "选择音乐生成模式："
Options:
  - "原创 (Generate)" — 从文字描述生成全新歌曲
  - "翻唱 (Cover)" — 基于参考音频生成新版本
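
The intent check described above could be sketched as a small keyword match. This is only an illustration: `detect_mode` is a hypothetical helper, the keyword lists are limited to the phrases this document ties to a mode, and checking cover keywords first is an assumption. Anything else falls through to `ask`, meaning the Step 1 question should be shown.

```bash
# Hypothetical helper (not part of the CLI): guess the mode from the user's message.
# Prints "cover", "generate", or "ask" (intent unclear, so ask the Step 1 question).
detect_mode() {
  local msg
  # Lowercase ASCII letters so "Cover" and "COMPOSE" match; other bytes pass through.
  msg=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$msg" in
    *翻唱*|*cover*)               echo "cover" ;;     # checked first (assumption)
    *作曲*|*compose*|*做一首歌*)  echo "generate" ;;
    *)                            echo "ask" ;;
  esac
}

detect_mode "用这个音频翻唱一下 demo.mp3"   # prints: cover
detect_mode "Compose something upbeat"      # prints: generate
detect_mode "generate music"                # prints: ask
```

Generic triggers such as "music" or "生成音乐" do not identify a mode on their own, so they deliberately resolve to `ask`.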

Step 2a: Prompt (generate mode)

If the user chose Generate, ask for the song description:

"请描述你想要的歌曲（主题、情绪、歌词片段等）："

Accept free text. This maps to `--prompt`.

Step 2b: Reference Audio (cover mode)

If the user chose Cover, ask for the reference audio:

"请提供参考音频文件路径或 URL："

Accept a local file path or URL. This maps to `--audio`.

Validate the input:

If a local path: verify the file exists and check the extension is one of: `mp3`, `wav`, `flac`, `m4a`, `ogg`, `aac`
If a URL: accept as-is (the CLI will validate)

Check file size does not exceed 20 MB for local files:

```bash
FILE_SIZE=$(stat -f%z "{path}" 2>/dev/null || stat -c%s "{path}" 2>/dev/null)
if [ "$FILE_SIZE" -gt 20971520 ]; then
  echo "File exceeds 20 MB limit"
fi
```

If validation fails, inform the user and re-ask.
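
The existence, extension, and size checks could be combined into one function, roughly as sketched here. `validate_reference_audio` is a hypothetical name, not part of the CLI. Two details are assumptions rather than rules from this document: URLs are recognized by an `http://` or `https://` prefix, and extensions are compared case-insensitively.

```bash
# Hypothetical helper: validate a cover-mode reference before calling the CLI.
# Prints "ok" or a short reason; returns non-zero when validation fails.
validate_reference_audio() {
  local input="$1" ext size
  local max_bytes=$((20 * 1024 * 1024))   # 20 MB = 20971520 bytes

  # Assumption: anything starting with http:// or https:// is a URL.
  # URLs are accepted as-is because the CLI validates them.
  case "$input" in
    http://*|https://*) echo "ok"; return 0 ;;
  esac

  [ -f "$input" ] || { echo "file not found"; return 1; }

  # Assumption: compare the extension case-insensitively against the allowed list.
  ext=$(printf '%s' "${input##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    mp3|wav|flac|m4a|ogg|aac) ;;
    *) echo "unsupported format"; return 1 ;;
  esac

  # Try BSD stat first, then GNU stat.
  size=$(stat -f%z "$input" 2>/dev/null || stat -c%s "$input" 2>/dev/null)
  [ -n "$size" ] || { echo "cannot read file size"; return 1; }
  [ "$size" -le "$max_bytes" ] || { echo "file exceeds 20 MB limit"; return 1; }

  echo "ok"
}
```

A caller would re-ask for the reference audio whenever the function returns non-zero, using the printed reason in the message to the user.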

Optionally, the user may also provide a prompt to guide the cover style.

Step 3: Style (optional)

Ask for an optional style descriptor:

"指定音乐风格？（如 pop、rock、jazz、电子、古风等，留空则由 AI 自动选择）"

Accept free text or empty. This maps to `--style`.

Step 4: Title (optional)

Ask for an optional title:

"歌曲标题？（留空则自动生成）"

Accept free text or empty. This maps to `--title`.

Step 5: Instrumental

Question: "是否纯音乐（无人声）？"
Options:
  - "否，带人声（默认）"
  - "是，纯音乐"

Default is "no" (with vocals). If the user selects "是", add the `--instrumental` flag.

Step 6: Confirm & Generate

Summarize all choices:

Generate mode:

准备生成音乐：

  模式：原创 (Generate)
  描述：{prompt}
  风格：{style / 自动}
  标题：{title / 自动}
  人声：{带人声 / 纯音乐}

  确认？

Cover mode:

准备生成音乐：

  模式：翻唱 (Cover)
  参考音频：{path-or-url}
  描述：{prompt / 无}
  风格：{style / 自动}
  标题：{title / 自动}
  人声：{带人声 / 纯音乐}

  确认？

Wait for explicit confirmation before running any CLI command.

Workflow

Submit (background): Run the CLI command with `run_in_background: true` and `timeout: 660000`:

Generate mode:

```bash
listenhub music generate \
  --prompt "{prompt}" \
  --style "{style}" \
  --title "{title}" \
  --instrumental \
  --json
```

Cover mode:

```bash
listenhub music cover \
  --audio "{path-or-url}" \
  --prompt "{prompt}" \
  --style "{style}" \
  --title "{title}" \
  --instrumental \
  --json
```

Flag notes:
- `--prompt`: text description of the music (required for generate, optional for cover)
- `--audio`: reference audio file path or URL (cover mode only, required)
- `--style`: optional style/genre hint; omit if not provided
- `--title`: optional track title; omit if not provided
- `--instrumental`: add this flag for instrumental-only (no vocals); omit if not selected
- Omit `--prompt` in cover mode if not provided
The CLI handles polling internally. Music generation takes up to 10 minutes.
Tell the user the task is submitted and that they will be notified when it finishes.
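
Because most flags are optional, the command is easier to assemble incrementally than to template. The sketch below shows one way to do that. `run_music` and the `LISTENHUB_BIN` override are hypothetical conveniences for illustration, not features of the CLI; only the flags listed in the flag notes are used. It is meant to run only after the user has confirmed the summary.

```bash
# Hypothetical wrapper: build the argument list, adding optional flags only when set.
# LISTENHUB_BIN is an assumed override so the wrapper can be dry-run with `echo`.
# Usage: run_music MODE PROMPT AUDIO STYLE TITLE INSTRUMENTAL(yes|no)
run_music() {
  local mode="$1" prompt="$2" audio="$3" style="$4" title="$5" instrumental="$6"

  # --prompt is required in generate mode.
  if [ "$mode" = "generate" ] && [ -z "$prompt" ]; then
    echo "prompt is required in generate mode" >&2
    return 1
  fi

  set -- music "$mode"                      # start a fresh argument list
  if [ "$mode" = "cover" ]; then set -- "$@" --audio "$audio"; fi
  if [ -n "$prompt" ]; then set -- "$@" --prompt "$prompt"; fi
  if [ -n "$style" ]; then set -- "$@" --style "$style"; fi
  if [ -n "$title" ]; then set -- "$@" --title "$title"; fi
  if [ "$instrumental" = "yes" ]; then set -- "$@" --instrumental; fi
  set -- "$@" --json

  "${LISTENHUB_BIN:-listenhub}" "$@"
}

# Dry run: print the arguments instead of submitting a job.
LISTENHUB_BIN=echo run_music cover "" "demo.mp3" "jazz" "" no
# prints: music cover --audio demo.mp3 --style jazz --json
```

Passing each value as its own argument also avoids quoting problems when a prompt or title contains spaces or quotes.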

When notified of completion, present the result:

Parse the CLI JSON output for key fields:

```bash
AUDIO_URL=$(echo "$RESULT" | jq -r '.audioUrl')
TITLE=$(echo "$RESULT" | jq -r '.title // "Untitled"')
DURATION=$(echo "$RESULT" | jq -r '.duration // empty')
CREDITS=$(echo "$RESULT" | jq -r '.credits // empty')
```

Read `OUTPUT_MODE` from config. Follow `shared/output-mode.md` for behavior.

`inline` or `both`: Display audio URL as a clickable link.

音乐已生成！

标题：{title}
在线收听：{audioUrl}
时长：{duration}s
消耗积分：{credits}

`download` or `both`: Also download the file. Generate a slug from the title following `shared/config-pattern.md` § Artifact Naming.

```bash
SLUG="{slug}"  # e.g. "summer-breeze"
NAME="${SLUG}.mp3"
# Dedup: if file exists, append -2, -3, etc.
BASE="${NAME%.*}"; EXT="${NAME##*.}"; i=2
while [ -e "$NAME" ]; do NAME="${BASE}-${i}.${EXT}"; i=$((i+1)); done
curl -sS -o "$NAME" "{audioUrl}"
```

Present:

已保存到当前目录：
  {NAME}

After Successful Generation

Update config with the language used this session if the user explicitly specified one:

```bash
if [ -n "$LANGUAGE" ]; then
  NEW_CONFIG=$(echo "$CONFIG" | jq --arg lang "$LANGUAGE" '. + {"language": $lang}')
  echo "$NEW_CONFIG" > "$CONFIG_PATH"
fi
```

Estimated times:

Music generation: 5-10 minutes

Resources

CLI authentication: `shared/cli-authentication.md`
CLI patterns: `shared/cli-patterns.md`
Config pattern: `shared/config-pattern.md`
Output mode: `shared/output-mode.md`

Composability

Invokes: nothing
Invoked by: content-planner (Phase 3)

Examples

Generate original:

"帮我做一首关于夏天海边的歌"

Detect: generate mode ("做一首歌")
Read config (first run: create defaults with `outputMode: "download"`)
Infer: mode = generate, prompt = "关于夏天海边的歌"
Ask: style? title? instrumental?
Confirm summary → user confirms

```bash
listenhub music generate \
  --prompt "关于夏天海边的歌" \
  --json
```

Wait for CLI to return result, then download `{slug}.mp3` to cwd.

Cover from file:

"用这个音频翻唱一下 demo.mp3，jazz 风格"

Detect: cover mode ("翻唱")
Validate: `demo.mp3` exists, is a supported format, under 20 MB
Infer: style = "jazz" from user input
Ask: title? instrumental?
Confirm summary → user confirms

```bash
listenhub music cover \
  --audio "demo.mp3" \
  --style "jazz" \
  --json
```

Wait for CLI to return result, then download `{slug}.mp3` to cwd.

Generate instrumental:

"Create an instrumental electronic track for a game intro"

Detect: generate mode ("Create ... track")
Infer: style = "electronic", instrumental = yes
Ask: title?
Confirm summary → user confirms

```bash
listenhub music generate \
  --prompt "instrumental electronic track for a game intro" \
  --style "electronic" \
  --instrumental \
  --json
```

Wait for CLI to return result, then download `{slug}.mp3` to cwd.
