Loading...
Loading...
Found 59 Skills
Generate a professional, detailed, figure-rich LaTeX course note and final PDF from a Bilibili lecture, tutorial, or technical talk. Use when the user provides a Bilibili URL (BV number) and wants structured Chinese teaching notes that combine the video's title, chapters, diagrams, formulas, code, subtitle explanations, the original video cover on the front page, and a final synthesis chapter, with key frames extracted from the highest usable video resolution and inserted as figures, and where the final deliverable must include a rendered PDF. Falls back to Whisper speech-to-text when no CC subtitles are available.
Novita AI: LLM, Image Generation & Editing, Video Generation, Audio (TTS/ASR), and GPU Cloud. Use this skill whenever the user wants to call Novita AI APIs — chat with LLMs (DeepSeek, Llama, Qwen), generate images (FLUX, Stable Diffusion, Seedream, Hunyuan Image), edit images (remove background, upscale, inpainting, img2img, outpainting, reimagine, merge face, replace background, remove text), generate videos (Kling, Wan, Hunyuan, Minimax Hailuo, Vidu, PixVerse, Seedance), do text-to-speech or speech-to-text (MiniMax TTS, GLM TTS, Fish Audio, ASR, voice cloning), run OpenAI-compatible batch jobs, manage GPU cloud instances and serverless endpoints, or check account balance and billing. Also trigger when the user mentions novita.ai, Novita AI, Novita API key, or wants to use any Novita platform service — even if they just say "generate an image" or "run an LLM" and Novita is available as a provider.
Use when live speech translation is needed with Alibaba Cloud Model Studio Qwen LiveTranslate models, including bilingual meetings, realtime interpretation, and speech-to-speech or speech-to-text translation flows.
Text-to-speech (TTS) and speech-to-text (STT) via Together AI. TTS models include Orpheus, Kokoro, Cartesia Sonic, Rime, MiniMax with REST, streaming, and WebSocket support. STT models include Whisper and Voxtral. Use when users need voice synthesis, audio generation, speech recognition, transcription, TTS, STT, or real-time voice applications.
Use when live speech translation is needed with Alibaba Cloud Model Studio Qwen LiveTranslate models, including bilingual meetings, realtime interpretation, and speech-to-speech or speech-to-text translation flows.
Watermark-free Douyin video download and transcript extraction tool. Retrieve watermark-free video download links from Douyin share links, download videos, extract voice transcripts from videos and automatically save them to files. Applicable scenarios include obtaining Douyin video information, downloading watermark-free videos, and batch extracting video transcripts. Triggered when users need to process Douyin video links or extract video content.
Stream call audio in real-time, fork media to external destinations, and transcribe speech live. Use for real-time analytics and AI integrations. This skill provides JavaScript SDK examples.
Automatic Speech Recognition (ASR). Uses Volcano Engine BigModel ASR for speech recognition, with two available modes: Express Edition (≤2h/100MB, synchronous fast response) and Standard Edition (≤5h, asynchronous recognition). It supports Feishu voice messages, local audio files and audio URLs. Use this skill when you receive voice messages or audio attachments (.ogg/.mp3/.wav).
Transcribe audio and video files using the Deepgram API. This skill should be used when the user requests transcription of audio files (mp3, wav, m4a, aac) or video files (mp4, mov, avi, etc.). Handles large video files by extracting audio first to reduce upload size and processing time.
Deepgram integration. Manage Projects. Use when the user wants to interact with Deepgram data.
Transcribe audio files to text using OpenAI Whisper
Use when transcribing non-realtime speech with Alibaba Cloud Model Studio Qwen ASR models (`qwen3-asr-flash`, `qwen-audio-asr`, `qwen3-asr-flash-filetrans`). Use when converting recorded audio files to text, generating transcripts with timestamps, or documenting DashScope/OpenAI-compatible ASR request and response fields.