Search Results: ai-vision

Found 13 Skills

AI & Machine Learning | microsoft/agent-skills

azure-ai-vision-imageanalysis-py

Azure AI Vision Image Analysis SDK for captions, tags, objects, OCR, people detection, and smart cropping. Use for computer vision and image understanding tasks. Triggers: "image analysis", "computer vision", "OCR", "object detection", "ImageAnalysisClient", "image caption".

AI & Machine Learning | httprunner/skills

ai-vision

Multimodal UI understanding and single-step planning via OpenAI-compatible Responses APIs. Use when you need AIQuery/AIAssert and plan-next to extract UI element coordinates, validate UI assertions, summarize screenshots, or decide the next UI action from an image. External agents handle execution via adb/hdc and multi-step loops. Defaults to Doubao models but can be pointed at other multimodal providers via base URL, API key, and model name.

1 scripts/Attention

AI & Machine Learning | membranedev/application-s...

azure-ai-vision

Azure AI Vision integration. Manage data and records, and automate workflows. Use when the user wants to interact with Azure AI Vision data.

Mobile Development | httprunner/skills

android-adb

Android device control and UI automation via ADB using a TypeScript helper CLI. Use for device/emulator discovery, USB or Wi-Fi connection, app launch/force-stop, tap/swipe/keyevent/text input, screenshots, APK install handling, device reset for app, and ADB troubleshooting. Use with ai-vision for screenshot-based UI recognition and coordinate decisions.

1 scripts/Checked

AI & Machine Learning | microsoftdocs/agent-skill...

azure-custom-vision

Expert knowledge for Azure AI Custom Vision development, including best practices, decision making, limits & quotas, security, integrations & coding patterns, and deployment. Use when exporting Custom Vision models, calling prediction APIs, using ONNX/TensorFlow, managing CMK/RBAC or Smart Labeler, and for other Azure AI Custom Vision development tasks. Not for Azure AI Vision (use azure-ai-vision), Azure AI services (use microsoft-foundry-tools), Azure Machine Learning (use azure-machine-learning), or Azure AI Foundry Local (use microsoft-foundry-local).

AI & Machine Learning | microsoftdocs/agent-skill...

azure-video-indexer

Expert knowledge for Azure AI Video Indexer development, including troubleshooting, best practices, decision making, architecture & design patterns, limits & quotas, security, configuration, integrations & coding patterns, and deployment. Use when working with Video Indexer APIs/widgets, live camera indexing, custom speech/brand models, or Azure OpenAI integrations, and for other Azure AI Video Indexer development tasks. Not for Azure AI services (use microsoft-foundry-tools) or Azure AI Vision (use azure-ai-vision).

AI & Machine Learning | jimliu/baoyu-skills

baoyu-danger-gemini-web

Generates images and text via a reverse-engineered Gemini Web API. Supports text generation, image generation from prompts, reference images for vision input, and multi-turn conversations. Use when other skills need an image generation backend, or when the user requests "generate image with Gemini" or "Gemini text generation", or needs vision-capable AI generation.

18.5k

24 scripts/Attention

Automation | httprunner/skills

wechat-search-collector

Automated collection workflow for searching WeChat Channels and traversing the results on Android. Supports scenarios such as comprehensive page search and personal page search.

🇨🇳 Chinese (translated)

Tools & Utilities | absolutelyskilled/absolut...

video-analyzer

Use this skill when analyzing existing video files using FFmpeg and AI vision, extracting frames for design system generation, detecting scene boundaries, analyzing animation timing, extracting color palettes, or understanding audio-visual sync. Triggers on video analysis, frame extraction, scene detection, ffprobe, motion analysis, and AI vision analysis of video content.

AI & Machine Learning | rotoslider/choom

image-analysis

Analyzes images using a vision-capable LLM (Optic). Can read workspace images, URLs, base64 data, or previously generated images by ID.

2 scripts/Checked

AI & Machine Learning | fal-ai-community/skills

fal-vision

Analyze images using AI — segment objects, detect objects, extract text (OCR), describe images, ask questions about images. Use when the user requests "Segment image", "Detect objects", "OCR", "Extract text from image", "Describe image", "What's in this image", "Image analysis".

1 scripts/Attention

AI & Machine Learning | zhayujie/chatgpt-on-wecha...

openai-image-vision

Analyze images using OpenAI's Vision API. Run the vision script with bash: 'bash <base_dir>/scripts/vision.sh <image> <question>'. Can understand image content, objects, text, and colors, and answer questions about images.

1 scripts/Attention