Loading...
Loading...
Found 13 Skills
Azure AI Vision Image Analysis SDK for captions, tags, objects, OCR, people detection, and smart cropping. Use for computer vision and image understanding tasks. Triggers: "image analysis", "computer vision", "OCR", "object detection", "ImageAnalysisClient", "image caption".
Multimodal UI understanding and single-step planning via OpenAI-compatible Responses APIs. Use when you need AIQuery/AIAssert and plan-next to extract UI element coordinates, validate UI assertions, summarize screenshots, or decide the next UI action from an image. External agents handle execution via adb/hdc and multi-step loops. Defaults to Doubao models but can be pointed at other multimodal providers via base URL, API key, and model name.
Azure AI Vision integration. Manage data, records, and automate workflows. Use when the user wants to interact with Azure AI Vision data.
Android device control and UI automation via ADB using a TypeScript helper CLI. Use for device/emulator discovery, USB or Wi-Fi connection, app launch/force-stop, tap/swipe/keyevent/text input, screenshots, APK install handling, device reset for app, and ADB troubleshooting. Use with ai-vision for screenshot-based UI recognition and coordinate decisions.
Expert knowledge for Azure AI Custom Vision development including best practices, decision making, limits & quotas, security, integrations & coding patterns, and deployment. Use when exporting Custom Vision models, calling prediction APIs, using ONNX/TensorFlow, managing CMK/RBAC, or Smart Labeler, and other Azure AI Custom Vision related development tasks. Not for Azure AI Vision (use azure-ai-vision), Azure AI services (use microsoft-foundry-tools), Azure Machine Learning (use azure-machine-learning), Azure AI Foundry Local (use microsoft-foundry-local).
Expert knowledge for Azure AI Video Indexer development including troubleshooting, best practices, decision making, architecture & design patterns, limits & quotas, security, configuration, integrations & coding patterns, and deployment. Use when using Video Indexer APIs/widgets, live camera indexing, custom speech/brand models, or Azure OpenAI integrations, and other Azure AI Video Indexer related development tasks. Not for Azure AI services (use microsoft-foundry-tools), Azure AI Vision (use azure-ai-vision).
Generates images and text via reverse-engineered Gemini Web API. Supports text generation, image generation from prompts, reference images for vision input, and multi-turn conversations. Use when other skills need image generation backend, or when user requests "generate image with Gemini", "Gemini text generation", or needs vision-capable AI generation.
Automated collection process for WeChat Channels search and result traversal (Android), supporting scenarios such as comprehensive page search and personal page search.
Use this skill when analyzing existing video files using FFmpeg and AI vision, extracting frames for design system generation, detecting scene boundaries, analyzing animation timing, extracting color palettes, or understanding audio-visual sync. Triggers on video analysis, frame extraction, scene detection, ffprobe, motion analysis, and AI vision analysis of video content.
Analyzes images using a vision-capable LLM (Optic). Can read workspace images, URLs, base64 data, or previously generated images by ID.
Analyze images using AI — segment objects, detect objects, extract text (OCR), describe images, ask questions about images. Use when the user requests "Segment image", "Detect objects", "OCR", "Extract text from image", "Describe image", "What's in this image", "Image analysis".
Analyze images using OpenAI's Vision API. Use bash command to execute the vision script like 'bash <base_dir>/scripts/vision.sh <image> <question>'. Can understand image content, objects, text, colors, and answer questions about images.