speech-to-text
Transcribe any audio file to text. Supports multilingual auto-detection, timestamps, and speaker labels.
Triggers
- transcribe / transcript / transcription
- speech to text / STT / audio to text
- what does this audio say / convert audio
- 转录 / 语音转文字 / 识别音频
Quick Start

    # Transcribe with auto language detection
    python3 skills/speech-to-text/scripts/stt.py audio.mp3

    # Specify language explicitly
    python3 skills/speech-to-text/scripts/stt.py interview.wav --language en

    # Save transcript to file
    python3 skills/speech-to-text/scripts/stt.py podcast.m4a -o transcript.txt

    # Output full JSON (with timestamps and speaker labels)
    python3 skills/speech-to-text/scripts/stt.py meeting.wav --json -o result.json

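If you want to call the script from Python rather than a shell, the commands above could be assembled as in the sketch below. It uses only the flags shown in Quick Start (`--language`, `-o`, `--json`); the helper names `build_stt_command` and `run_stt` are hypothetical and not part of the skill.

```python
import subprocess
import sys
from typing import List, Optional

# Script path as it appears in the Quick Start commands.
STT_SCRIPT = "skills/speech-to-text/scripts/stt.py"


def build_stt_command(audio_path: str,
                      language: Optional[str] = None,
                      output_path: Optional[str] = None,
                      as_json: bool = False) -> List[str]:
    """Assemble the argument list for stt.py (hypothetical helper)."""
    cmd = [sys.executable, STT_SCRIPT, audio_path]
    if language:
        cmd += ["--language", language]  # omit to let the service auto-detect
    if as_json:
        cmd.append("--json")
    if output_path:
        cmd += ["-o", output_path]
    return cmd


def run_stt(audio_path: str, **options) -> str:
    """Run the script and return whatever it prints to stdout."""
    result = subprocess.run(build_stt_command(audio_path, **options),
                            capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Only prints the command; nothing is uploaded until run_stt() is called.
    print(build_stt_command("meeting.wav", as_json=True, output_path="result.json"))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.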
Arguments
| Argument | Default | Description |
|---|---|---|
| audio file (positional) | required | Audio file to transcribe (mp3, wav, m4a, ogg, flac, aac, webm). Max 50 MB, max 10 min. |
| `--language` | auto-detect | BCP-47 language code (e.g. `en`, `zh`). |
| `-o` | stdout | Path to save the transcript text (or JSON if `--json` is set). |
| `--json` | off | Output full JSON response with timestamps and speaker labels. |
| API key option | from env/config | Noiz API key (overrides stored key). |
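
A pre-flight check can catch files the service would reject before anything is uploaded. The sketch below checks the extension list and the 50 MB limit from the table; it assumes 1 MB means 1024 × 1024 bytes and does not check the 10-minute limit, since reading the duration would need an audio library. `check_audio_file` is a hypothetical helper, not part of the skill.

```python
import os

# Formats and size limit as listed in the Arguments table.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".ogg", ".flac", ".aac", ".webm"}
MAX_FILE_BYTES = 50 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes


def check_audio_file(path: str) -> list:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")
    if not os.path.isfile(path):
        problems.append("file not found")
    elif os.path.getsize(path) > MAX_FILE_BYTES:
        problems.append("file larger than 50 MB")
    return problems
```

Returning a list rather than raising on the first problem lets a caller report every issue at once.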
Output Format
Without `--json`, only the transcript text is printed:

    Hello, welcome to today's podcast. We have a special guest joining us...

With `--json`, the full structured response is printed:

    {
      "language": "en",
      "transcript": "Hello, welcome to today's podcast...",
      "duration": 42.5,
      "segments": [
        {"text": "Hello, welcome to today's podcast.", "start": 0.0, "end": 3.2, "spk": 0},
        {"text": "We have a special guest joining us.", "start": 3.5, "end": 6.1, "spk": 0}
      ]
    }

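The JSON response can be turned into a readable, speaker-labelled transcript with a few lines of Python. The sketch below works on the sample response shown above and assumes that `start` and `end` are seconds and that `spk` is an integer speaker index; the function names are hypothetical.

```python
import json

# Sample response copied from the Output Format section.
SAMPLE = """
{
  "language": "en",
  "transcript": "Hello, welcome to today's podcast...",
  "duration": 42.5,
  "segments": [
    {"text": "Hello, welcome to today's podcast.", "start": 0.0, "end": 3.2, "spk": 0},
    {"text": "We have a special guest joining us.", "start": 3.5, "end": 6.1, "spk": 0}
  ]
}
"""


def to_timestamp(seconds: float) -> str:
    """Format seconds as MM:SS, truncating to whole seconds."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def format_segments(result: dict) -> list:
    """Build one display line per segment: time span, speaker, text."""
    lines = []
    for seg in result.get("segments", []):
        span = f"{to_timestamp(seg['start'])}-{to_timestamp(seg['end'])}"
        lines.append(f"[{span}] Speaker {seg['spk']}: {seg['text']}")
    return lines


for line in format_segments(json.loads(SAMPLE)):
    print(line)
# [00:00-00:03] Speaker 0: Hello, welcome to today's podcast.
# [00:03-00:06] Speaker 0: We have a special guest joining us.
```

To process a saved result instead, load it with `json.load(open("result.json"))` and pass the resulting dict to `format_segments`.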
Supported Languages
Common codes: `en` (English), `zh` (Chinese), `ja` (Japanese), `ko` (Korean), `es` (Spanish), `fr` (French), `de` (German), `pt` (Portuguese), `ru` (Russian), `ar` (Arabic). Omit `--language` to auto-detect.
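
The sketch below shows one way to map a user's language choice onto the `--language` flag. The table holds only the common codes listed above, which the source does not present as exhaustive, so an unlisted code is passed through rather than rejected. Both helper names are hypothetical.

```python
from typing import List, Optional

# The common codes listed above; other codes may also be accepted.
COMMON_LANGUAGES = {
    "en": "English", "zh": "Chinese", "ja": "Japanese", "ko": "Korean",
    "es": "Spanish", "fr": "French", "de": "German", "pt": "Portuguese",
    "ru": "Russian", "ar": "Arabic",
}


def is_common_code(code: str) -> bool:
    """True when the code is one of the listed common codes."""
    return code.strip().lower() in COMMON_LANGUAGES


def language_args(code: Optional[str]) -> List[str]:
    """Return the CLI arguments for a language choice; [] means auto-detect."""
    if not code or not code.strip():
        return []
    return ["--language", code.strip()]
```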
Configuration

    # Save your API key once
    python3 skills/speech-to-text/scripts/stt.py config --set-api-key YOUR_KEY

    # Or set via environment variable
    export NOIZ_API_KEY=YOUR_KEY

Get your API key at [developers.noiz.ai](https://developers.noiz.ai/api-keys).
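
A client that reuses the same credentials could look the key up roughly as follows. The source says only that an explicitly passed key overrides the stored one and that the default comes "from env/config"; checking the environment variable before the key file, and treating the key file as plain text, are both assumptions of this sketch. `resolve_api_key` is a hypothetical helper.

```python
import os
from pathlib import Path
from typing import Optional

# Storage path as listed under "Security & data disclosure".
KEY_FILE = Path.home() / ".config" / "noiz" / "api_key"


def resolve_api_key(explicit: Optional[str] = None,
                    key_file: Path = KEY_FILE) -> Optional[str]:
    """Return the first key found: explicit value, env var, then key file."""
    if explicit:
        return explicit
    env_key = os.environ.get("NOIZ_API_KEY")
    if env_key:
        return env_key
    # Assumption: the file contains just the key as plain text.
    if key_file.is_file():
        return key_file.read_text(encoding="utf-8").strip() or None
    return None
```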
Pricing
Billed at $0.0006 per second of audio. A 10-minute file costs ~$0.36. New accounts include 10,000 free TTS characters; STT is billed separately.
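
The per-second rate makes cost easy to estimate up front. The sketch below multiplies duration by the stated rate; whether billing rounds partial seconds up is not stated in the source, so the result is an estimate only. `estimate_cost` is a hypothetical helper.

```python
PRICE_PER_SECOND_USD = 0.0006  # rate stated in the Pricing section


def estimate_cost(duration_seconds: float) -> float:
    """Estimate the transcription cost in USD for a clip of the given length."""
    if duration_seconds < 0:
        raise ValueError("duration must be non-negative")
    return round(duration_seconds * PRICE_PER_SECOND_USD, 4)


print(estimate_cost(600))  # 10-minute file: 0.36
```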
Security & data disclosure
- Credential storage: API key is saved to `~/.config/noiz/api_key` (permissions `0600`). The `NOIZ_API_KEY` env var is also supported.
- Network calls: The audio file is uploaded to `https://noiz.ai/v1/speech-to-text` for transcription. No data is sent until you run the command.
- File limits: Max 50 MB per file, max 10 minutes (600 seconds) of audio.
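
To confirm that a stored key file really is restricted to its owner, the mode bits can be compared against `0600`. This is a minimal sketch for POSIX systems (Windows reports permissions differently); `has_owner_only_permissions` is a hypothetical helper, not part of the skill.

```python
import os
import stat


def has_owner_only_permissions(path: str) -> bool:
    """True when the file mode is exactly 0600 (owner read/write only)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode == 0o600
```

For example, `has_owner_only_permissions(os.path.expanduser("~/.config/noiz/api_key"))` checks the storage path named above.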
Requirements
- `requests` package: `pip install requests`
- Get your API key at developers.noiz.ai