Loading...
Loading...
Found 939 Skills
Write correct and idiomatic Typst code for document typesetting. Use when creating or editing Typst (.typ) files, working with Typst markup, or answering questions about Typst syntax and features. Focuses on avoiding common syntax confusion (arrays vs content blocks, proper function definitions, state management).
Build text-to-speech applications using Qwen3-TTS, a powerful speech generation system supporting voice clone, voice design, and custom voice synthesis. Use when creating TTS applications, generating speech from text, cloning voices from audio samples, designing new voices via natural language descriptions, or fine-tuning TTS models. Supports 10 languages (Chinese, English, Japanese, Korean, German, French, Russian, Portuguese, Spanish, Italian).
Process multimodal inputs (images, video, audio, PDFs) with Gemini 3 Pro. Covers image understanding, video analysis, audio processing, document extraction, media resolution control, OCR, and token optimization. Use when analyzing images, processing video, transcribing audio, extracting PDF content, or working with multimodal data.
State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX. Provides thousands of pretrained models to perform tasks on different modalities such as text, vision, and audio. The industry standard for Large Language Models (LLMs) and foundation models in science.
Master router for immersive visual experiences combining React Three Fiber, shaders, particles, post-processing, GSAP animation, and audio. Use when building 3D web experiences, visualizers, creative coding projects, interactive installations, or any project requiring multiple visual technologies. Dispatches to 6 specialized domain routers covering 29 total skills.
Use this skill when building AI voice agents with the ElevenLabs Agents Platform. This skill covers the complete platform including agent configuration (system prompts, turn-taking, workflows), voice & language features (multi-voice, pronunciation, speed control), knowledge base (RAG), tools (client/server/MCP/system), SDKs (React, JavaScript, React Native, Swift, Widget), Scribe (real-time STT), WebRTC/WebSocket connections, testing & evaluation, analytics, privacy/compliance (GDPR/HIPAA/SOC 2), cost optimization, CLI workflows ("agents as code"), and DevOps integration. Prevents 17+ common errors including package deprecation, Android audio cutoff, CSP violations, missing dynamic variables, case-sensitive tool names, webhook authentication failures, and WebRTC configuration issues. Provides production-tested templates for React, Next.js, React Native, Swift, and Cloudflare Workers. Token savings: ~73% (22k → 6k tokens). Production tested. Keywords: ElevenLabs Agents, ElevenLabs voice agents, AI voice agents, conversational AI, @elevenlabs/react, @elevenlabs/client, @elevenlabs/react-native, @elevenlabs/elevenlabs-js, @elevenlabs/agents-cli, elevenlabs SDK, voice AI, TTS, text-to-speech, ASR, speech recognition, turn-taking model, WebRTC voice, WebSocket voice, ElevenLabs conversation, agent system prompt, agent tools, agent knowledge base, RAG voice agents, multi-voice agents, pronunciation dictionary, voice speed control, elevenlabs scribe, @11labs deprecated, Android audio cutoff, CSP violation elevenlabs, dynamic variables elevenlabs, case-sensitive tool names, webhook authentication
Download videos, audio, playlists, and channels from YouTube and 1000+ websites using yt-dlp. Supports quality selection, format conversion, subtitle download, playlist filtering, metadata extraction, thumbnail download, and batch operations. Use when downloading YouTube videos in any quality (4K, 8K, HDR), extracting audio as MP3/M4A/FLAC, downloading entire playlists/channels, getting subtitles in multiple languages, converting to specific formats, downloading live streams, archiving content, or batch processing multiple URLs. Optimized for reliability with automatic retries, rate limiting, and error handling.
Complete guide for Google Gemini API using the CORRECT current SDK (@google/genai v1.27+, NOT the deprecated @google/generative-ai). Covers text generation, multimodal inputs (text + images + video + audio + PDFs), function calling, thinking mode, streaming, and system instructions with accurate 2025 model information (Gemini 2.5 Pro/Flash/Flash-Lite with 1M input tokens, NOT 2M). Use when: integrating Gemini API, implementing multimodal AI applications, using thinking mode for complex reasoning, function calling with parallel execution, streaming responses, deploying to Cloudflare Workers, building chat applications, or encountering SDK deprecation warnings, context window errors, model not found errors, function calling failures, or multimodal format errors. Keywords: gemini api, @google/genai, gemini-2.5-pro, gemini-2.5-flash, gemini-2.5-flash-lite, multimodal gemini, thinking mode, google ai, genai sdk, function calling gemini, streaming gemini, gemini vision, gemini video, gemini audio, gemini pdf, system instructions, multi-turn chat, DEPRECATED @google/generative-ai, gemini context window, gemini models 2025, gemini 1m tokens, gemini tool use, parallel function calling, compositional function calling
Production-tested setup for AutoAnimate (@formkit/auto-animate) - a zero-config, drop-in animation library that automatically adds smooth transitions when DOM elements are added, removed, or moved. This skill should be used when building UIs that need simple, automatic animations for lists, accordions, toasts, or form validation messages without the complexity of full animation libraries. Use when: Adding smooth animations to dynamic lists, building filter/sort interfaces, creating accordion components, implementing toast notifications, animating form validation messages, needing simple transitions without animation code, working with Vite + React + Tailwind, deploying to Cloudflare Workers Static Assets, or encountering SSR errors with animation libraries. Keywords: auto-animate, @formkit/auto-animate, formkit, zero-config animation, automatic animations, drop-in animation, list animations, accordion animation, toast animation, form validation animation, lightweight animation, 2kb animation, prefers-reduced-motion, accessible animations, vite react animation, cloudflare workers animation, ssr safe animation
Complete guide for OpenAI's traditional/stateless APIs: Chat Completions (GPT-5, GPT-4o), Embeddings, Images (DALL-E 3), Audio (Whisper + TTS), and Moderation. Includes both Node.js SDK and fetch-based approaches for maximum compatibility. Use when: integrating OpenAI APIs, implementing chat completions with GPT-5/GPT-4o, generating text with streaming, using function calling/tools, creating structured outputs with JSON schemas, implementing embeddings for RAG, generating images with DALL-E 3, transcribing audio with Whisper, synthesizing speech with TTS, moderating content, deploying to Cloudflare Workers, or encountering errors like rate limits (429), invalid API keys (401), function calling failures, streaming parse errors, embeddings dimension mismatches, or token limit exceeded. Keywords: openai api, chat completions, gpt-5, gpt-5-mini, gpt-5-nano, gpt-4o, gpt-4-turbo, openai sdk, openai streaming, function calling, structured output, json schema, openai embeddings, text-embedding-3, dall-e-3, image generation, whisper api, openai tts, text-to-speech, moderation api, openai fetch, cloudflare workers openai, openai rate limit, openai 429, reasoning_effort, verbosity
Python package for working with DICOM files. It allows you to read, modify, and write DICOM data in a Pythonic way. Essential for medical imaging processing, clinical data extraction, and AI in radiology.
Cross-compile OBS Studio plugins from Linux to Windows using MinGW, CMake presets, and CI/CD workflows. Covers toolchain files, headers-only linking, OBS SDK fetching, and multi-platform artifact packaging. Use when building OBS plugins for Windows from Linux or setting up CI pipelines.