Total 50,708 skills; AI & Machine Learning has 8,496 skills
Showing 12 of 8,496 skills
Fine-tune any HuggingFace CV / VLM / LLM model on local NVIDIA GPUs inside an NGC PyTorch container. Use when the user wants to fine-tune a HuggingFace model (full or LoRA), train a vision / VLM / LLM model end-to-end, generate a reproducible HF training pipeline, smoke-test a HuggingFace model locally before scale-up, push a fine-tuned model to the HF Hub with a model card, or emit a self-contained rerun skill for an existing HuggingFace finetune. Supports image classification, object detection, semantic / instance / panoptic segmentation, depth estimation, image-text-to-text VLM (SFT / LoRA), and LLM SFT / DPO / GRPO. Six-step workflow: inspect and qualify, hardware and NGC image, research, generate and smoke, train + eval + infer, push and emit rerun skill.
Practical guidance for training MoE VLMs in Megatron Bridge. Compares FSDP and 3D-parallel approaches, using rounded lessons from Qwen3-VL, Qwen3-Next, and other multimodal experiments.
Stereo depth estimation using FoundationStereo. Predicts disparity maps from stereo image pairs for 3D reconstruction. Use when training, evaluating, exporting, or running inference for a TAO FoundationStereo model. Trigger phrases include "train stereo depth", "FoundationStereo", "stereo disparity estimation", "3D reconstruction from stereo".
Plan, configure, and chain repo-native Nemotron customization steps into single-step or multi-step pipelines: curation, translation, SFT/PEFT (AutoModel or Megatron-Bridge), pretraining/CPT, RL alignment (DPO/RLVR/GRPO/RLHF), BYOB/MCQ benchmarks, checkpoint conversion, ModelOpt optimization, env profiles, and evaluation of trained checkpoints or existing/hosted endpoints. Use when a request names a Nemotron step or workflow, or asks to clean, translate, train, fine-tune, align, convert, optimize, evaluate, or compose these into a pipeline. Do NOT use for frontend/dashboard/visualization work, generic ML advice, billing/access, or non-Nemotron coding tasks.
PyTorch-based TAO image classification. Supports a wide range of backbones (FAN, EfficientNet, ResNet, etc.) with distillation and quantization for deployment. Use when training, evaluating, distilling, quantizing, exporting, or running inference for a TAO image-classification (PyT) model. Trigger phrases include "train image classifier", "TAO classification", "ResNet/EfficientNet/FAN backbone classifier", "classification-pyt".
Playbook for launching, monitoring, stopping, and debugging NeMo-RL recipes on a Kubernetes cluster via the nrl-k8s CLI. Covers ephemeral vs long-lived RayCluster modes, iterating on runs, and debugging hung or failed training jobs.
Use when executing implementation plans that contain independent tasks in the current session.
De-slop pass for any text: detects and erases the statistical fingerprints of AI writing (negative parallelism / "not X but Y", em-dash abuse, rule-of-three, false ranges, puffery vocabulary, uniform cadence, hedged both-sidesing) and rewrites the text into its target register — academic article, tweet, reddit post, email, blog, anything between. Use when the user says "fuck slop", "f*ck slop", "deslop", "de-slop this", "remove the AI tells", "humanize this", "make this not sound like AI", or invokes /fuck-slop. Also use before publishing any agent-drafted prose.
Interactively onboard a project to agent-driven development by running a structured interview and generating a complete AGENTS.md (or CLAUDE.md). Use this skill whenever a user mentions "AGENTS.md", "CLAUDE.md", "agent behavior", "agent instructions", "agent config", "set up agent rules", "onboard agent", "configure claude code", "agent guardrails", "agent workflow", or asks how to tell an AI agent how to behave in their project — even if they just say "help me write AGENTS.md" or "what should go in CLAUDE.md". Always prefer this skill over ad-hoc agent instruction generation.
Expert skill for OmniVoice, a massively multilingual zero-shot TTS model supporting 600+ languages with voice cloning and voice design capabilities.
Build and use free-code, the open-source fork of Claude Code CLI with telemetry removed, guardrails stripped, and all experimental features unlocked.
On-device, real-time multimodal AI voice and vision assistant powered by Gemma 4 E2B and Kokoro TTS, running entirely locally via FastAPI WebSocket server.