Search Results: vlm

Found 35 Skills

perf-moe-vlm-training

Practical guidance for training MoE VLMs in Megatron Bridge. Compares FSDP and 3D-parallel approaches, using rounded lessons from Qwen3-VL, Qwen3-Next, and other multimodal experiments.

🇺🇸|EnglishTranslated

AI & Machine Learningnvidia/skills

adding-model-support

Guide for adding support for new LLM or VLM models in Megatron-Bridge. Covers bridge, provider, recipe, tests, docs, and examples.

🇺🇸|EnglishTranslated

AI & Machine Learningnvidia/skills

tao-analyze-gaps-vlm-bcq

Extract false-positive and false-negative gaps from VLM binary-classification-question (BCQ, yes/no) predictions. Use after running VLM evaluation when you have a predictions JSON and need to identify failure cases for DEFT root cause analysis on a binary-classification VLM workflow.

🇺🇸|EnglishTranslated

AI & Machine Learningnvidia/skills

tao-generate-image-grounding

Two-step image grounding pipeline: extracts referring expressions from (image, caption) pairs and grounds them to pixel-space bounding boxes via a VLM. Use when the user wants to ground captions to bboxes, generate phrase-grounded annotations, auto-label images for grounding, or run the image_grounding pipeline. Triggers include 'image grounding', 'phrase grounding', 'ground captions', 'auto-label image grounding', 'image_grounding'.

🇺🇸|EnglishTranslated

AI & Machine Learningnvidia/skills

tao-generate-video-reasoning-annotations

Multi-step video annotation pipeline that turns raw videos into Chain-of-Thought training data — multi-level captions, structured descriptions, and QA pairs (MCQ, binary, open-ended) with reasoning traces, via VLM/LLM distillation. Use when the user wants to "create video training data", "generate video QA datasets", "build CoT reasoning traces from videos", "auto-label videos", or run the video_reasoning_annotation pipeline. Triggers include "video annotation", "video CoT", "video QA", "chain-of-thought", "video captioning pipeline", "video distillation".

🇺🇸|EnglishTranslated

2 scripts/Checked

AI & Machine Learningnvidia/skills

nemo-mbridge-perf-moe-dispatcher-selection

Choose the right MoE token dispatcher (`alltoall`, DeepEP, or HybridEP) for the hardware, EP degree, and optimization stage. Summarizes patterns from DSV3, Qwen3, Qwen3-Next, and VLM bring-up work.

🇺🇸|EnglishTranslated

AI & Machine Learningnvidia/skills

tao-finetune-huggingface-model

Fine-tune any HuggingFace CV / VLM / LLM model on local NVIDIA GPUs inside an NGC PyTorch container. Use when the user wants to fine-tune a HuggingFace model (full or LoRA), train a vision / VLM / LLM model end-to-end, generate a reproducible HF training pipeline, smoke-test a HuggingFace model locally before scale-up, push a fine-tuned model to the HF Hub with a model card, or emit a self-contained rerun skill for an existing HuggingFace finetune. Supports image classification, object detection, semantic / instance / panoptic segmentation, depth estimation, image-text-to-text VLM (SFT / LoRA), and LLM SFT / DPO / GRPO. Six-step workflow: inspect and qualify, hardware and NGC image, research, generate and smoke, train + eval + infer, push and emit rerun skill.

🇺🇸|EnglishTranslated

17 scripts/Attention

Backend Developmentnvidia/skills

vss-manage-video-io-storage

Use to call the VIOS REST API (sensor list, timelines, clip extraction, snapshots, add/delete sensors and streams). Not for VLM inference or search.

🇺🇸|EnglishTranslated

AI & Machine Learningtogethercomputer/skills

together-fine-tuning

LoRA, full fine-tuning, DPO preference tuning, VLM training, function-calling tuning, reasoning tuning, and BYOM uploads on Together AI. Reach for it whenever the user wants to adapt a model on custom data rather than only run inference, evaluate outputs, or host an existing model.

🇺🇸|EnglishTranslated

5 scripts/Checked

AI & Machine Learningnvidia/skills

vss-deploy-detection-tracking-2d

Use to deploy, run, debug, or tear down the RTVI-CV 2D detection / tracking microservice and call its REST API. Not for VLM, embedding, or analytics — use the matching vss-* skill.

🇺🇸|EnglishTranslated

24 scripts/Attention

Document Processingguglxni/hyperbots-agent-s...

hyperbots-api

Integrate with HyperAPI for financial document processing - OCR text extraction, document classification, PDF splitting, and structured data extraction from invoices, receipts, and financial documents. Use when the user needs to parse PDFs, extract text from documents, classify document types, split multi-document PDFs, or extract structured entities like invoice numbers, vendor names, line items. Keywords: hyperapi, hyperbots, document parsing, OCR, PDF processing, invoice extraction, receipt processing, document classification, VLM, vision language model.

🇺🇸|EnglishTranslated

1 scripts/Checked

AI & Machine Learningwanshuiyin/auto-claude-co...

experiment-plan

Turn a refined research proposal or method idea into a detailed, claim-driven experiment roadmap. Use after `research-refine`, or when the user asks for a detailed experiment plan, ablation matrix, evaluation protocol, run order, compute budget, or paper-ready validation that supports the core problem, novelty, simplicity, and any LLM / VLM / Diffusion / RL-based contribution.

🇺🇸|EnglishTranslated