ASR (Automatic Speech Recognition): enhanced speech-to-text built on the Doubao large model, with audio preprocessing, denoising, and extended analysis capabilities. Async API.

Choose this skill when:
- Input is a video file (mp4/mov/mkv); the audio track is extracted automatically
- Audio needs denoising before recognition
- File exceeds 512MB or 5 hours (this skill has no size limit)
- Audio source is a TOS internal path (tos://bucket/key)
- You need structured JSON output with timestamped utterances and metadata
- You need speaker diarization, emotion/gender detection, speech rate, or sensitive word filtering

Supports 99 languages, multiple formats (wav/mp3/m4a/aac/flac/ogg/mp4/mov/mkv), and auto language detection.
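The selection criteria above can be sketched as a small check. This is an illustrative helper only, not part of the skill: the function name, its parameters, and the reason strings are all hypothetical, and the thresholds are simply the 512MB and 5 hour figures quoted in the list.

```python
import os
from urllib.parse import urlparse

# Values copied from the criteria list; everything else here is hypothetical.
VIDEO_EXTENSIONS = {".mp4", ".mov", ".mkv"}
SIZE_THRESHOLD_BYTES = 512 * 1024 * 1024   # 512MB
DURATION_THRESHOLD_SECONDS = 5 * 60 * 60   # 5 hours


def reasons_to_use_enhanced_asr(source, size_bytes=None, duration_seconds=None,
                                needs_denoise=False, needs_analysis=False):
    """Return the criteria from the list that this input matches.

    An empty list means none of the listed criteria applied.
    """
    reasons = []
    parsed = urlparse(source)
    extension = os.path.splitext(parsed.path)[1].lower()

    if extension in VIDEO_EXTENSIONS:
        reasons.append("video input")
    if needs_denoise:
        reasons.append("needs denoising")
    if size_bytes is not None and size_bytes > SIZE_THRESHOLD_BYTES:
        reasons.append("exceeds 512MB")
    if duration_seconds is not None and duration_seconds > DURATION_THRESHOLD_SECONDS:
        reasons.append("exceeds 5 hours")
    if parsed.scheme == "tos":
        reasons.append("TOS internal path")
    if needs_analysis:
        # Diarization, emotion/gender detection, speech rate, word filtering.
        reasons.append("extended analysis")
    return reasons
```

A caller might route a file to this skill whenever the returned list is non-empty, and fall back to a lighter ASR path otherwise.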
npx skill4agent add bytedance/agentkit-samples byted-las-asr-pro

submit/poll
POST https://operator.las.cn-beijing.volces.com/api/v1/submit
POST https://operator.las.cn-beijing.volces.com/api/v1/poll

python3 scripts/skill.py --help

python3 scripts/skill.py submit \
  --audio-url "https://example.com/audio.wav" \
  --audio-format wav \
  --model-name bigmodel \
  --region cn-beijing \
  --out result.json

python3 scripts/skill.py submit \
  --audio-url "https://example.com/audio.wav" \
  --audio-format wav \
  --no-wait

python3 scripts/skill.py poll <task_id>
python3 scripts/skill.py wait <task_id> --timeout 1800 --out result.json

references/api.md
LAS_API_KEY
env.sh
audio_format
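When the submit and wait commands above are driven from another script, it can help to assemble the argument list programmatically. The following is a minimal sketch, assuming only the flags that appear in the commands above (`--audio-url`, `--audio-format`, `--model-name`, `--region`, `--out`, `--no-wait`, `--timeout`). The helper names `build_submit_argv` and `build_wait_argv` are hypothetical and not part of `scripts/skill.py`.

```python
SCRIPT = "scripts/skill.py"  # path as used in the commands above


def build_submit_argv(audio_url, audio_format, model_name=None, region=None,
                      out=None, no_wait=False):
    """Build the argv list for `skill.py submit` (hypothetical helper)."""
    argv = ["python3", SCRIPT, "submit",
            "--audio-url", audio_url,
            "--audio-format", audio_format]
    # Optional flags are appended only when the caller supplies a value.
    if model_name is not None:
        argv += ["--model-name", model_name]
    if region is not None:
        argv += ["--region", region]
    if out is not None:
        argv += ["--out", out]
    if no_wait:
        argv.append("--no-wait")
    return argv


def build_wait_argv(task_id, timeout=None, out=None):
    """Build the argv list for `skill.py wait <task_id>` (hypothetical helper)."""
    argv = ["python3", SCRIPT, "wait", task_id]
    if timeout is not None:
        argv += ["--timeout", str(timeout)]
    if out is not None:
        argv += ["--out", out]
    return argv
```

Either list can be handed to `subprocess.run(argv, check=True)`. Passing a list rather than a shell string avoids quoting problems when the audio URL contains characters such as `&` or `?`.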