Search Results: audio-processing

Found 31 Skills

eachlabs-voice-audio

Text-to-speech, speech-to-text, voice conversion, and audio processing using EachLabs AI models. Supports ElevenLabs TTS, Whisper transcription with diarization, and RVC voice conversion. Use when the user needs TTS, transcription, or voice conversion.

🇺🇸 | English | Translated

AI & Machine Learning | bytedance/agentkit-sample...

byted-las-asr-pro

ASR (Automatic Speech Recognition): enhanced speech-to-text built on the Doubao large model, with audio preprocessing, denoising, and extended analysis capabilities. Async API. Choose this skill when:
- Input is a video file (mp4/mov/mkv); the audio track is extracted automatically
- Audio needs denoising before recognition
- File exceeds 512MB or 5 hours (no size limit)
- Audio source is a TOS internal path (tos://bucket/key)
- Need structured JSON output with timestamped utterances and metadata
- Need speaker diarization, emotion/gender detection, speech rate, or sensitive word filtering

Supports 99 languages, multiple formats (wav/mp3/m4a/aac/flac/ogg/mp4/mov/mkv), and auto language detection.

🇺🇸 | English | Translated

1 script / Checked

AI & Machine Learning | openai/skills

transcribe

Transcribe audio files to text with optional diarization and known-speaker hints. Use when a user asks to transcribe speech from audio/video, extract text from recordings, or label speakers in interviews or meetings.

🇺🇸 | English | Translated

1 script / Checked

AI & Machine Learning | steipete/clawdis

openai-whisper

Local speech-to-text with the Whisper CLI (no API key).

🇺🇸 | English | Translated

Tools & Utilities | michaelboeding/skills

media-utils

Internal utility skill for media assembly operations. NOT called directly by users. Used by producer skills (video-producer, podcast-producer, audio-producer, social-producer) to stitch, mix, and assemble final media outputs.

🇺🇸 | English | Translated

7 scripts / Checked

AI & Machine Learning | yonatangross/orchestkit

multimodal-llm

Vision, audio, and multimodal LLM integration patterns. Use when processing images, transcribing audio, generating speech, or building multimodal AI pipelines.

🇺🇸 | English | Translated

AI & Machine Learning | martinholovsky/claude-ski...

speech-to-text

Expert skill for implementing speech-to-text with Faster Whisper. Covers audio processing, transcription optimization, privacy protection, and secure handling of voice data for the JARVIS voice assistant.

🇺🇸 | English | Translated

AI & Machine Learning | infquest/vibe-ops-plugin

audio-transcribe

Convert audio/video to text using Whisper, with support for word-level timestamps. Use this when users need speech-to-text conversion, audio-to-text transcription, video-to-text extraction, subtitle generation, transcribe audio, speech to text, generate subtitles, or speech recognition.

🇨🇳 | Chinese | Translated

1 script / Checked

AI & Machine Learning | zainhas/togetherai-skills

together-audio

Text-to-speech (TTS) and speech-to-text (STT) via Together AI. TTS models include Orpheus, Kokoro, Cartesia Sonic, Rime, and MiniMax, with REST, streaming, and WebSocket support. STT models include Whisper and Voxtral. Use when users need voice synthesis, audio generation, speech recognition, transcription, TTS, STT, or real-time voice applications.

🇺🇸 | English | Translated

2 scripts / Checked

Uncategorized | guia-matthieu/clawfu-skil...

audio-editing

Master the essential audio post-production techniques (normalization, compression, EQ, and noise reduction) using the correct processing order to achieve professional-quality audio. Use when:
- Editing podcast episodes or video soundtracks
- Cleaning up recorded voiceovers
- Improving audio quality for marketing content
- Preparing audio files for distribution
- Troubleshooting common audio issues

🇺🇸 | English | Translated

Frontend Development | daffy0208/ai-dev-standard...

audio-producer

Expert in web audio, audio processing, and interactive sound design

🇺🇸 | English | Translated

Product & Design | benzema216/dreamina-claud...

music-to-storyboard

Generate storyboard from music analysis — shot-by-shot with camera movements

🇺🇸 | English | Translated