speech-to-text

Transcribe any audio file to text. Supports multilingual auto-detection, timestamps, and speaker labels.

Triggers

transcribe / transcript / transcription
speech to text / STT / audio to text
what does this audio say / convert audio
转录 / 语音转文字 / 识别音频

Quick Start

Transcribe with auto language detection

python3 skills/speech-to-text/scripts/stt.py audio.mp3

Specify language explicitly

python3 skills/speech-to-text/scripts/stt.py interview.wav --language en

Save transcript to file

python3 skills/speech-to-text/scripts/stt.py podcast.m4a -o transcript.txt

Output full JSON (with timestamps and speaker labels)

python3 skills/speech-to-text/scripts/stt.py meeting.wav --json -o result.json
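
The same commands can be driven from Python. The sketch below is illustrative only: `build_stt_command` and `transcribe_to_dict` are hypothetical helper names, and it assumes that with `--json` and no `-o`, the script writes nothing but the JSON document to stdout.

```python
import json
import subprocess
import sys

# Script path as used in the commands above.
STT_SCRIPT = "skills/speech-to-text/scripts/stt.py"


def build_stt_command(audio_path, language=None, output=None, as_json=False):
    """Assemble the argv list for stt.py using only the documented flags."""
    cmd = [sys.executable, STT_SCRIPT, audio_path]
    if language:
        cmd += ["--language", language]
    if as_json:
        cmd.append("--json")
    if output:
        cmd += ["-o", output]
    return cmd


def transcribe_to_dict(audio_path, language=None):
    """Run stt.py with --json and parse its stdout.

    Assumption: stdout contains only the JSON response.
    """
    cmd = build_stt_command(audio_path, language=language, as_json=True)
    completed = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(completed.stdout)
```

Passing the arguments as a list (rather than a shell string) avoids quoting problems with file names that contain spaces.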

Arguments

| Argument | Default | Description |
| --- | --- | --- |
| `file` | required | Audio file to transcribe (mp3, wav, m4a, ogg, flac, aac, webm). Max 50 MB, max 10 min. |
| `--language` / `-l` | auto-detect | BCP-47 language code (e.g. `en`, `zh`, `ja`). Omit to auto-detect. |
| `--output` / `-o` | stdout | Path to save transcript text (or JSON if `--json` is set). |
| `--json` | off | Output full JSON response with timestamps and speaker labels. |
| `--api-key` | from env/config | Noiz API key (overrides stored key). |
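
Because the `file` argument has format and size limits, it can be worth checking a file locally before uploading it. The following is a minimal sketch, not part of the skill: `preflight_problems` is a hypothetical helper, and it assumes "50 MB" means 50 × 1024 × 1024 bytes, which the documentation does not specify. The 10-minute duration limit is not checked here because that would require decoding the audio.

```python
from pathlib import Path

# Extensions listed in the arguments table.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".ogg", ".flac", ".aac", ".webm"}
# Assumption: "50 MB" is interpreted as 50 MiB.
MAX_FILE_BYTES = 50 * 1024 * 1024


def preflight_problems(path):
    """Return a list of human-readable problems; an empty list means the file looks OK."""
    problems = []
    p = Path(path)
    if p.suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {p.suffix or '(none)'}")
    if not p.is_file():
        problems.append("file not found")
    elif p.stat().st_size > MAX_FILE_BYTES:
        problems.append("file larger than 50 MB")
    return problems
```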

Output Format

Without `--json`, only the transcript text is printed:

Hello, welcome to today's podcast. We have a special guest joining us...

With `--json`, the full structured response is printed:

{
  "language": "en",
  "transcript": "Hello, welcome to today's podcast...",
  "duration": 42.5,
  "segments": [
    {"text": "Hello, welcome to today's podcast.", "start": 0.0, "end": 3.2, "spk": 0},
    {"text": "We have a special guest joining us.", "start": 3.5, "end": 6.1, "spk": 0}
  ]
}
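
As an illustration of consuming this response, the sketch below turns the `segments` array into one line per segment. It relies only on the field names shown in the sample (`text`, `start`, `end`, `spk`); `format_segments` is a hypothetical helper, and real responses may carry additional fields.

```python
import json

# Sample response copied from the example above.
SAMPLE = """
{
  "language": "en",
  "transcript": "Hello, welcome to today's podcast...",
  "duration": 42.5,
  "segments": [
    {"text": "Hello, welcome to today's podcast.", "start": 0.0, "end": 3.2, "spk": 0},
    {"text": "We have a special guest joining us.", "start": 3.5, "end": 6.1, "spk": 0}
  ]
}
"""


def format_segments(result):
    """Render each segment as '[start-end] spk N: text'."""
    lines = []
    for seg in result.get("segments", []):
        lines.append(
            f"[{seg['start']:.1f}s-{seg['end']:.1f}s] spk {seg['spk']}: {seg['text']}"
        )
    return lines


for line in format_segments(json.loads(SAMPLE)):
    print(line)
```

To process a saved file instead, load `result.json` with `json.load` and pass the resulting dict to the same function.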

Supported Languages

Common codes: `en` (English), `zh` (Chinese), `ja` (Japanese), `ko` (Korean), `es` (Spanish), `fr` (French), `de` (German), `pt` (Portuguese), `ru` (Russian), `ar` (Arabic). Omit `--language` to auto-detect.

Configuration

Save your API key once

python3 skills/speech-to-text/scripts/stt.py config --set-api-key YOUR_KEY

Or set via environment variable

export NOIZ_API_KEY=YOUR_KEY


Get your API key at [developers.noiz.ai](https://developers.noiz.ai/api-keys).


Pricing

Billed at $0.0006 per second of audio. A 10-minute file costs ~$0.36. New accounts include 10,000 free TTS characters; STT is billed separately.
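
The arithmetic behind the 10-minute figure can be sketched as follows. This assumes billing is strictly linear in duration, with no rounding to whole seconds and no minimum charge, none of which the pricing statement confirms; `estimate_cost_usd` is a hypothetical helper.

```python
PRICE_PER_SECOND_USD = 0.0006  # rate stated above


def estimate_cost_usd(duration_seconds):
    """Estimate the STT charge for a clip, assuming purely linear per-second billing."""
    if duration_seconds < 0:
        raise ValueError("duration must be non-negative")
    return round(duration_seconds * PRICE_PER_SECOND_USD, 4)


# A 10-minute (600-second) file: 600 * 0.0006 = 0.36 USD.
print(estimate_cost_usd(600))
```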

Security & data disclosure

Credential storage: API key is saved to `~/.config/noiz/api_key` (permissions `0600`). The `NOIZ_API_KEY` env var is also supported.
Network calls: The audio file is uploaded to `https://noiz.ai/v1/speech-to-text` for transcription. No data is sent until you run the command.
File limits: Max 50 MB per file, max 10 minutes (600 seconds) of audio.
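
To make the credential handling concrete, here is a minimal sketch of how a key could be stored with `0600` permissions and then looked up. It is not the script's actual implementation: `save_api_key` and `resolve_api_key` are hypothetical names, and the lookup order (explicit key, then `NOIZ_API_KEY`, then the stored file) is an assumption; the documentation only states that `--api-key` overrides the stored key.

```python
import os
from pathlib import Path

# Location stated above.
KEY_FILE = Path.home() / ".config" / "noiz" / "api_key"


def save_api_key(key, key_file=KEY_FILE):
    """Write the key so that only the owner can read or write it (mode 0600)."""
    key_file = Path(key_file)
    key_file.parent.mkdir(parents=True, exist_ok=True)
    fd = os.open(key_file, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(key)
    os.chmod(key_file, 0o600)  # also tightens a file that already existed


def resolve_api_key(cli_key=None, env=None, key_file=KEY_FILE):
    """Return the first key found, or None.

    Assumed precedence: explicit key > NOIZ_API_KEY > stored file.
    """
    env = os.environ if env is None else env
    if cli_key:
        return cli_key
    if env.get("NOIZ_API_KEY"):
        return env["NOIZ_API_KEY"]
    key_file = Path(key_file)
    if key_file.is_file():
        return key_file.read_text(encoding="utf-8").strip() or None
    return None
```

Creating the file with `os.open` and an explicit mode avoids a window in which the key is briefly readable by other users.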

Requirements

`requests` package: `pip install requests`
Get your API key at developers.noiz.ai
