Search Results: computer-vision

Found 41 Skills

tao-generate-image-grounding

Two-step image grounding pipeline: extracts referring expressions from (image, caption) pairs and grounds them to pixel-space bounding boxes via a VLM. Use when the user wants to ground captions to bboxes, generate phrase-grounded annotations, auto-label images for grounding, or run the image_grounding pipeline. Triggers include 'image grounding', 'phrase grounding', 'ground captions', 'auto-label image grounding', 'image_grounding'.

🇺🇸 | English | Translated

AI & Machine Learning | letta-ai/skills

video-processing

Guide for video analysis and frame-level event detection tasks using OpenCV and similar libraries. This skill should be used when detecting events in videos (jumps, movements, gestures), extracting frames, analyzing motion patterns, or implementing computer vision algorithms on video data. It provides verification strategies and helps avoid common pitfalls in video processing workflows.

🇺🇸 | English | Translated

Automation | rdmptv/adbautoplayer

moai-domain-adb

Comprehensive ADB (Android Debug Bridge) automation skill for game bot development, device management, computer vision integration, and Tauri-Python orchestration. Provides modular expertise for building intelligent Android automation workflows.

🇺🇸 | English | Translated

61 scripts/Attention

AI & Machine Learning | erichowens/some_claude_sk...

drone-inspection-specialist

Advanced CV for infrastructure inspection including forest fire detection, wildfire precondition assessment, roof inspection, hail damage analysis, thermal imaging, and 3D Gaussian Splatting reconstruction. Expert in multi-modal detection, insurance risk modeling, and reinsurance data pipelines. Activate on "fire detection", "wildfire risk", "roof inspection", "hail damage", "thermal analysis", "Gaussian Splatting", "3DGS", "insurance inspection", "defensible space", "property assessment", "catastrophe modeling", "NDVI", "fuel load". NOT for general drone flight control, SLAM, path planning, or sensor fusion (use drone-cv-expert), GPU shader development (use metal-shader-expert), or generic object detection without inspection context (use clip-aware-embeddings).

🇺🇸 | English | Translated

AI & Machine Learning | davidcastagnetoa/skills

histogram_analysis

Detect overexposure, underexposure, and uneven lighting in captured frames.

🇺🇸 | English | Translated

AI & Machine Learning | davidcastagnetoa/skills

esrgan_super_resolution

Super-resolution of the document photo to improve face-match quality when the photo is low resolution.

🇺🇸 | English | Translated

AI & Machine Learning | thincher/skills

minimax-understand-image

Use MiniMax MCP for image understanding and analysis. Trigger conditions: (1) the user asks to analyze, understand, or describe an image; (2) objects, text, or scenes in an image need to be identified; (3) MiniMax's understand_image feature is to be used.

🇨🇳 | Chinese | Translated

1 scripts/Checked

AI & Machine Learning | nvidia/skills

video-search

Search video archives using natural language — find events, objects, actions, and people across recorded video using fusion search (Cosmos Embed1 semantic search + CV attribute search). Use when asked to search for something in video, find actions and events, locate objects and people, or query video archives. For these types of questions, default to this top-level fusion search unless user specifies otherwise. Requires the search profile to be deployed.

🇺🇸 | English | Translated

AI & Machine Learning | nvidia/skills

tao-train-depth-anything-v2

Monocular depth estimation using Metric Depth Anything v2 or Relative Depth Anything architectures. Predicts per-pixel depth from single RGB images. Use when training, evaluating, exporting, or running inference for a TAO monocular depth model. Trigger phrases include "train monocular depth", "DepthAnything v2", "metric depth from single image", "monocular depth estimation".

🇺🇸 | English | Translated

AI & Machine Learning | lifangda/claude-plugins

transformers

Work with state-of-the-art machine learning models for NLP, computer vision, audio, and multimodal tasks using HuggingFace Transformers. This skill should be used when fine-tuning pre-trained models, performing inference with pipelines, generating text, training sequence models, or working with BERT, GPT, T5, ViT, and other transformer architectures. Covers model loading, tokenization, training with Trainer API, text generation strategies, and task-specific patterns for classification, NER, QA, summarization, translation, and image tasks. (plugin:scientific-packages@claude-scientific-skills)

🇺🇸 | English | Translated

2 scripts/Checked

AI & Machine Learning | aradotso/trending-skills

netryx-street-level-geolocation

Use Netryx to index street-view panoramas and geolocate any street-level photo to precise GPS coordinates using CosPlace, ALIKED/DISK, and LightGlue.

🇺🇸 | English | Translated

AI & Machine Learning | countbot-ai/countbot

image-analysis

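Image analysis and recognition for local images, web images, videos, and files. Suitable for OCR, object recognition, scene understanding, and similar tasks. This skill must be used when the user sends an image or asks for an image to be analyzed.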

🇺🇸 | English | Translated

2 scripts/Checked