speech-to-text

Transcribe audio files to text. Supports automatic language detection, timestamps, and speaker labels.

Triggers

  • transcribe / transcript / transcription
  • speech to text / STT / audio to text
  • what does this audio say / convert audio
  • 转录 / 语音转文字 / 识别音频 (Chinese for "transcribe / speech to text / recognize audio")

Quick Start

```bash
# Transcribe with auto language detection
python3 skills/speech-to-text/scripts/stt.py audio.mp3

# Specify language explicitly
python3 skills/speech-to-text/scripts/stt.py interview.wav --language en

# Save transcript to file
python3 skills/speech-to-text/scripts/stt.py podcast.m4a -o transcript.txt

# Output full JSON (with timestamps and speaker labels)
python3 skills/speech-to-text/scripts/stt.py meeting.wav --json -o result.json
```
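To call the CLI from another Python program, a minimal sketch could assemble the argument list from the documented flags and run it with `subprocess.run`. The helpers `build_stt_command` and `transcribe` are hypothetical; the script path is the one used in the examples above and assumes you run from the repository root.

```python
import subprocess
import sys
from typing import List, Optional

# Path taken from the Quick Start examples (relative to the repository root).
STT_SCRIPT = "skills/speech-to-text/scripts/stt.py"


def build_stt_command(
    audio_path: str,
    language: Optional[str] = None,
    output: Optional[str] = None,
    as_json: bool = False,
) -> List[str]:
    """Hypothetical helper: build an stt.py invocation from documented flags only."""
    # Use the current interpreter rather than whatever `python3` is on PATH.
    cmd = [sys.executable, STT_SCRIPT, audio_path]
    if language:
        cmd += ["--language", language]  # omit to let the service auto-detect
    if as_json:
        cmd.append("--json")  # full response with timestamps and speaker labels
    if output:
        cmd += ["-o", output]  # otherwise the result goes to stdout
    return cmd


def transcribe(audio_path: str, **kwargs) -> str:
    """Run the CLI and return what it printed to stdout."""
    result = subprocess.run(
        build_stt_command(audio_path, **kwargs),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout
```

For example, `transcribe("interview.wav", language="en")` runs the second command above. When `output` is set, the transcript is saved to that path, so read the file rather than relying on the returned stdout.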

Arguments

| Argument | Default | Description |
|---|---|---|
| `file` | required | Audio file to transcribe (mp3, wav, m4a, ogg, flac, aac, webm). Max 50 MB, max 10 min. |
| `--language` / `-l` | auto-detect | BCP-47 language code (e.g. `en`, `zh`, `ja`). Omit to auto-detect. |
| `--output` / `-o` | stdout | Path to save transcript text (or JSON if `--json` is set). |
| `--json` | off | Output full JSON response with timestamps and speaker labels. |
| `--api-key` | from env/config | Noiz API key (overrides stored key). |
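Whether `stt.py` checks these limits locally before uploading is not stated, so a pre-flight check can catch problems early. This sketch checks the extension and size against the table above, and the 10-minute limit only for PCM WAV files, which Python's standard `wave` module can read; other formats would need an external decoder. `preflight_errors` is a hypothetical helper, and reading "50 MB" as 50,000,000 bytes is an assumption (the stricter of the two common readings).

```python
import wave
from pathlib import Path
from typing import List

# Formats and limits as listed in the Arguments table.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".ogg", ".flac", ".aac", ".webm"}
# Assumption: 50 MB taken as 50,000,000 bytes, the stricter reading.
MAX_BYTES = 50 * 1000 * 1000
MAX_SECONDS = 600  # 10 minutes


def preflight_errors(path: str) -> List[str]:
    """Hypothetical local check before upload; returns a list of problems found."""
    errors: List[str] = []
    p = Path(path)
    if p.suffix.lower() not in SUPPORTED_EXTENSIONS:
        errors.append(f"unsupported format: {p.suffix or '(no extension)'}")
    if not p.is_file():
        errors.append("file not found")
        return errors
    size = p.stat().st_size
    if size > MAX_BYTES:
        errors.append(f"file is {size} bytes; limit is {MAX_BYTES}")
    # Duration is only checked for PCM WAV, which the stdlib can parse.
    if p.suffix.lower() == ".wav":
        try:
            with wave.open(str(p), "rb") as w:
                seconds = w.getnframes() / w.getframerate()
        except (wave.Error, EOFError):
            return errors  # not readable by the stdlib; let the service decide
        if seconds > MAX_SECONDS:
            errors.append(f"audio is {seconds:.1f} s; limit is {MAX_SECONDS} s")
    return errors
```

An empty list means the file passed every check this sketch can perform locally.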

Output Format

Without `--json`, only the transcript text is printed:

```
Hello, welcome to today's podcast. We have a special guest joining us...
```

With `--json`, the full structured response is printed:

```json
{
  "language": "en",
  "transcript": "Hello, welcome to today's podcast...",
  "duration": 42.5,
  "segments": [
    {"text": "Hello, welcome to today's podcast.", "start": 0.0, "end": 3.2, "spk": 0},
    {"text": "We have a special guest joining us.", "start": 3.5, "end": 6.1, "spk": 0}
  ]
}
```
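The `--json` output lends itself to post-processing. A minimal sketch that turns `segments` into speaker-labeled, timestamped lines, assuming `start` and `end` are in seconds (consistent with `duration`) and `spk` is an integer speaker index; neither is spelled out above, and `fmt_time`, `speaker_lines`, and `print_speaker_lines` are hypothetical helpers.

```python
import json
from typing import Any, Dict, List


def fmt_time(seconds: float) -> str:
    """Render seconds as MM:SS.s, rounding to the nearest tenth first."""
    tenths = round(seconds * 10)
    minutes, rem = divmod(tenths, 600)  # 600 tenths per minute
    return f"{minutes:02d}:{rem / 10:04.1f}"


def speaker_lines(response: Dict[str, Any]) -> List[str]:
    """Turn a --json response into '[start-end] Speaker N: text' lines."""
    lines = []
    for seg in response.get("segments", []):
        label = f"Speaker {seg['spk']}" if "spk" in seg else "Speaker ?"
        span = f"{fmt_time(seg['start'])}-{fmt_time(seg['end'])}"
        lines.append(f"[{span}] {label}: {seg['text']}")
    return lines


def print_speaker_lines(path: str) -> None:
    """Load a saved response (e.g. result.json from Quick Start) and print it."""
    with open(path, encoding="utf-8") as f:
        for line in speaker_lines(json.load(f)):
            print(line)
```

Applied to the sample response above, the first line reads `[00:00.0-00:03.2] Speaker 0: Hello, welcome to today's podcast.`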

Supported Languages

Common codes: `en` (English), `zh` (Chinese), `ja` (Japanese), `ko` (Korean), `es` (Spanish), `fr` (French), `de` (German), `pt` (Portuguese), `ru` (Russian), `ar` (Arabic). Omit `--language` to auto-detect.

Configuration

```bash
# Save your API key once
python3 skills/speech-to-text/scripts/stt.py config --set-api-key YOUR_KEY

# Or set via environment variable
export NOIZ_API_KEY=YOUR_KEY
```

Get your API key at [developers.noiz.ai](https://developers.noiz.ai/api-keys).
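If you wrap the tool, a sketch of resolving the key from the three documented sources may help. The docs only say `--api-key` overrides the stored key; checking `NOIZ_API_KEY` before the saved file is an assumption, and `resolve_api_key` is a hypothetical helper.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Location documented under "Security & data disclosure".
DEFAULT_KEY_FILE = Path.home() / ".config" / "noiz" / "api_key"


def resolve_api_key(
    cli_key: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
    key_file: Path = DEFAULT_KEY_FILE,
) -> Optional[str]:
    """Hypothetical lookup order: --api-key, then NOIZ_API_KEY, then the saved file."""
    if cli_key:
        return cli_key  # an explicit key always wins (documented)
    env = os.environ if env is None else env
    if env.get("NOIZ_API_KEY"):
        return env["NOIZ_API_KEY"]  # assumption: env is checked before the file
    if key_file.is_file():
        saved = key_file.read_text(encoding="utf-8").strip()
        return saved or None
    return None
```

Passing `env` and `key_file` explicitly keeps the function easy to test without touching your real configuration.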

Pricing

Billed at $0.0006 per second of audio. A 10-minute file costs ~$0.36. New accounts include 10,000 free TTS characters; STT is billed separately.
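The 10-minute figure works out as 600 s × $0.0006/s = $0.36. A small sketch for budgeting a batch, for example from the `duration` field of a `--json` response; `estimate_cost_usd` is a hypothetical helper, and it assumes cost scales linearly with the exact duration, since how partial seconds are rounded is not documented.

```python
RATE_PER_SECOND_USD = 0.0006  # from the Pricing section


def estimate_cost_usd(duration_seconds: float) -> float:
    """Rough STT cost estimate: duration times the per-second rate.

    Assumption: billing is proportional to exact duration; rounding of
    partial seconds is not documented.
    """
    if duration_seconds < 0:
        raise ValueError("duration must be non-negative")
    # Round to hide float noise such as 0.36000000000000004.
    return round(duration_seconds * RATE_PER_SECOND_USD, 6)
```

For the 42.5-second sample response in Output Format, `estimate_cost_usd(42.5)` gives 0.0255 (about 2.5 cents).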

Security & data disclosure

  • Credential storage: The API key is saved to `~/.config/noiz/api_key` (permissions `0600`). The `NOIZ_API_KEY` environment variable is also supported.
  • Network calls: The audio file is uploaded to `https://noiz.ai/v1/speech-to-text` for transcription. No data is sent until you run the command.
  • File limits: Max 50 MB per file, max 10 minutes (600 seconds) of audio.
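To confirm the stored key file is not readable by other users, a sketch using the standard `os` and `stat` modules; `key_file_is_private` is a hypothetical helper. It treats any mode with no group or other bits set (for example `0600` or `0400`) as private, and is meaningful only on POSIX systems.

```python
import os
import stat
from pathlib import Path


def key_file_is_private(path: Path) -> bool:
    """True if no group or other permission bits are set (POSIX only)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
```

For example, `key_file_is_private(Path.home() / ".config" / "noiz" / "api_key")`; if it returns `False`, `os.chmod(path, 0o600)` restores the documented permissions.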

Requirements
