Use this skill whenever the user wants to transcribe audio to text, convert speech to text, or get a transcript from an audio or video file. Triggers include: any mention of 'transcribe', 'transcription', 'speech to text', 'STT', 'convert audio to text', 'what does this audio say', 'get transcript', 'subtitle generation', or requests to extract spoken words from a file. Also use when the user wants speaker identification from audio, timestamps for captions, or multilingual transcription.
```bash
npx skill4agent add noizai/skills speech-to-text
```

```bash
# Transcribe with auto language detection
python3 skills/speech-to-text/scripts/stt.py audio.mp3

# Specify language explicitly
python3 skills/speech-to-text/scripts/stt.py interview.wav --language en

# Save transcript to file
python3 skills/speech-to-text/scripts/stt.py podcast.m4a -o transcript.txt

# Output full JSON (with timestamps and speaker labels)
python3 skills/speech-to-text/scripts/stt.py meeting.wav --json -o result.json
```

| Argument | Default | Description |
|---|---|---|
| audio file (positional) | required | Audio file to transcribe (mp3, wav, m4a, ogg, flac, aac, webm). Max 50 MB, max 10 min. |
| `--language` | auto-detect | BCP-47 language code, such as `en`. |
| `-o` | stdout | Path to save transcript text (or JSON if `--json` is set). |
| `--json` | off | Output full JSON response with timestamps and speaker labels. |
| API key option | from env/config | Noiz API key (overrides stored key). |
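
The format and size limits in the table can be checked locally before uploading. The following is a minimal sketch, not part of the skill itself: `check_audio_file` is a hypothetical helper, and it assumes "50 MB" means 50 × 1024 × 1024 bytes. It does not check the 10-minute duration limit, since that would require decoding the audio.

```python
from pathlib import Path

# Formats and size limit as listed in the argument table.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".ogg", ".flac", ".aac", ".webm"}
MAX_BYTES = 50 * 1024 * 1024  # assumption: "50 MB" is interpreted as 50 MiB


def check_audio_file(path):
    """Return a list of problems; an empty list means the file passes these checks."""
    problems = []
    p = Path(path)
    if p.suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {p.suffix or '(none)'}")
    if not p.is_file():
        problems.append("file not found")
    elif p.stat().st_size > MAX_BYTES:
        problems.append(f"file is {p.stat().st_size} bytes, over the 50 MB limit")
    return problems
```

Calling `check_audio_file("podcast.m4a")` before running `stt.py` would surface an unsupported extension or an oversized file without spending an API request.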
Without `--json`, the output is the plain transcript:

```
Hello, welcome to today's podcast. We have a special guest joining us...
```

With `--json`:

```json
{
  "language": "en",
  "transcript": "Hello, welcome to today's podcast...",
  "duration": 42.5,
  "segments": [
    {"text": "Hello, welcome to today's podcast.", "start": 0.0, "end": 3.2, "spk": 0},
    {"text": "We have a special guest joining us.", "start": 3.5, "end": 6.1, "spk": 0}
  ]
}
```

Language codes for `--language`: `en`, `zh`, `ja`, `ko`, `es`, `fr`, `de`, `pt`, `ru`, `ar`.

```bash
# Save your API key once
python3 skills/speech-to-text/scripts/stt.py config --set-api-key YOUR_KEY

# Or set via environment variable
export NOIZ_API_KEY=YOUR_KEY
```

Key file: `~/.config/noiz/api_key` (mode `0600`). Environment variable: `NOIZ_API_KEY`. API endpoint: `https://noiz.ai/v1/speech-to-text`. Dependency: `requests` (`pip install requests`).
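
Since the skill is also meant for caption timestamps, here is a rough sketch of turning the `segments` array from the JSON output into SRT captions. The helpers `srt_timestamp` and `segments_to_srt` are illustrative names, not part of `stt.py`, and the sketch assumes `start` and `end` are expressed in seconds, which the sample values suggest but the page does not state.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(result):
    """Build SRT text from the `segments` list of a --json result dict."""
    blocks = []
    for index, seg in enumerate(result["segments"], start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text']}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(blocks)


# Sample segments copied from the JSON example on this page.
sample = {
    "segments": [
        {"text": "Hello, welcome to today's podcast.", "start": 0.0, "end": 3.2, "spk": 0},
        {"text": "We have a special guest joining us.", "start": 3.5, "end": 6.1, "spk": 0},
    ]
}
print(segments_to_srt(sample))
```

For a real file, the dict would come from `json.load` on the `result.json` written by the `--json -o result.json` command. The `spk` field is ignored here; prefixing each cue with a speaker label would be a small extension.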