Loading...
Loading...
Found 3 Skills
ASR (Automatic Speech Recognition) — enhanced speech-to-text built on Doubao large model, with audio preprocessing, denoising, and extended analysis capabilities. Async API. Choose this skill when: - Input is a video file (mp4/mov/mkv) — auto-extracts audio track - Audio needs denoising before recognition - File exceeds 512MB or 5 hours (no size limit) - Audio source is a TOS internal path (tos://bucket/key) - Need structured JSON output with timestamped utterances and metadata - Need speaker diarization, emotion/gender detection, speech rate, or sensitive word filtering Supports 99 languages, multiple formats (wav/mp3/m4a/aac/flac/ogg/mp4/mov/mkv), and auto language detection.
Use local FunASR service to transcribe audio or video files into timestamped Markdown files, supporting common formats such as mp4, mov, mp3, wav, m4a, etc. This skill should be used when users need speech-to-text conversion, meeting minutes, video subtitles, or podcast transcription.
Extract, transcribe, and translate YouTube video transcripts using the YouTubeTranscript.dev V2 API. Supports captions, ASR audio transcription, batch processing (up to 100 videos), translation to 100+ languages, and multiple output formats. Use when working with YouTube videos, subtitles, captions, or video-to-text conversion.