Search Results: speech-to-text

Found 60 Skills

bilibili-render-pdf

Generate a professional, detailed, figure-rich LaTeX course note and final PDF from a Bilibili lecture, tutorial, or technical talk. Use when the user provides a Bilibili URL (BV number) and wants structured Chinese teaching notes that combine the video's title, chapters, diagrams, formulas, code, and subtitle explanations, with the original video cover on the front page and a final synthesis chapter. Key frames are extracted from the highest usable video resolution and inserted as figures, and the final deliverable must include a rendered PDF. Falls back to Whisper speech-to-text when no CC subtitles are available.

🇺🇸|English|Translated

AI & Machine Learning|novitalabs/novita-skills

novita-ai

Novita AI: LLM, Image Generation & Editing, Video Generation, Audio (TTS/ASR), and GPU Cloud. Use this skill whenever the user wants to call Novita AI APIs — chat with LLMs (DeepSeek, Llama, Qwen), generate images (FLUX, Stable Diffusion, Seedream, Hunyuan Image), edit images (remove background, upscale, inpainting, img2img, outpainting, reimagine, merge face, replace background, remove text), generate videos (Kling, Wan, Hunyuan, Minimax Hailuo, Vidu, PixVerse, Seedance), do text-to-speech or speech-to-text (MiniMax TTS, GLM TTS, Fish Audio, ASR, voice cloning), run OpenAI-compatible batch jobs, manage GPU cloud instances and serverless endpoints, or check account balance and billing. Also trigger when the user mentions novita.ai, Novita AI, Novita API key, or wants to use any Novita platform service — even if they just say "generate an image" or "run an LLM" and Novita is available as a provider.

🇺🇸|English|Translated

AI & Machine Learning|cinience/alicloud-skills

aliyun-qwen-livetranslate

Use when live speech translation is needed with Alibaba Cloud Model Studio Qwen LiveTranslate models, including bilingual meetings, realtime interpretation, and speech-to-speech or speech-to-text translation flows.

🇺🇸|English|Translated

1 script/Checked

AI & Machine Learning|zainhas/togetherai-skills

together-audio

Text-to-speech (TTS) and speech-to-text (STT) via Together AI. TTS models include Orpheus, Kokoro, Cartesia Sonic, Rime, and MiniMax, with REST, streaming, and WebSocket support. STT models include Whisper and Voxtral. Use when users need voice synthesis, audio generation, speech recognition, transcription, TTS, STT, or real-time voice applications.

🇺🇸|English|Translated

2 scripts/Checked

AI & Machine Learning|cinience/alicloud-skills

alicloud-ai-audio-livetranslate

🇺🇸|English|Translated

1 script/Checked

Uncategorized|garrytan/gstack

cso

Chief Security Officer mode. Infrastructure-first security audit: secrets archaeology, dependency supply chain, CI/CD pipeline security, LLM/AI security, skill supply chain scanning, plus OWASP Top 10, STRIDE threat modeling, and active verification. Two modes: daily (zero-noise, 8/10 confidence gate) and comprehensive (monthly deep scan, 2/10 bar). Trend tracking across audit runs. Use when: "security audit", "threat model", "pentest review", "OWASP", "CSO review". (gstack) Voice triggers (speech-to-text aliases): "see-so", "see so", "security review", "security check", "vulnerability scan", "run security".

🇺🇸|English

Tools & Utilities|yzfly/douyin-mcp-server

douyin-video

Watermark-free Douyin video download and transcript extraction tool. Retrieves watermark-free download links from Douyin share links, downloads the videos, extracts speech transcripts from them, and saves the transcripts to files automatically. Suitable for obtaining Douyin video information, downloading watermark-free videos, and batch-extracting video transcripts. Triggered when users need to process Douyin video links or extract video content.

🇨🇳|Chinese|Translated

1 script/Attention

Backend Development|team-telnyx/skills

telnyx-voice-streaming-javascript

Stream call audio in real-time, fork media to external destinations, and transcribe speech live. Use for real-time analytics and AI integrations. This skill provides JavaScript SDK examples.

🇺🇸|English|Translated

AI & Machine Learning|bytedance/agentkit-sample...

byted-voice-to-text

Automatic Speech Recognition (ASR). Uses Volcano Engine BigModel ASR for speech recognition, with two available modes: Express Edition (≤2h/100MB, synchronous fast response) and Standard Edition (≤5h, asynchronous recognition). It supports Feishu voice messages, local audio files and audio URLs. Use this skill when you receive voice messages or audio attachments (.ogg/.mp3/.wav).

🇨🇳|Chinese|Translated

5 scripts/Attention

AI & Machine Learning|agentiveau/myagentive

deepgram-transcription

Transcribe audio and video files using the Deepgram API. This skill should be used when the user requests transcription of audio files (mp3, wav, m4a, aac) or video files (mp4, mov, avi, etc.). Handles large video files by extracting audio first to reduce upload size and processing time.

🇺🇸|English|Translated

1 script/Checked

AI & Machine Learning|membranedev/application-s...

deepgram

Deepgram integration. Manage Projects. Use when the user wants to interact with Deepgram data.

🇺🇸|English|Translated

AI & Machine Learning|trpc-group/trpc-agent-go

whisper

Transcribe audio files to text using OpenAI Whisper

🇺🇸|English|Translated

1 script/Checked