Total 50,131 skills
Multi-step video annotation pipeline that turns raw videos into Chain-of-Thought training data — multi-level captions, structured descriptions, and QA pairs (MCQ, binary, open-ended) with reasoning traces, via VLM/LLM distillation. Use when the user wants to "create video training data", "generate video QA datasets", "build CoT reasoning traces from videos", "auto-label videos", or run the video_reasoning_annotation pipeline. Triggers include "video annotation", "video CoT", "video QA", "chain-of-thought", "video captioning pipeline", "video distillation".
Converts cuTile Python GPU kernels (@ct.kernel) to cuTile.jl Julia equivalents. Handles kernel syntax translation, 0-indexed to 1-indexed conversion, broadcasting differences, memory layout (row-major to column-major), type system mapping, and launch API differences. Use when converting, porting, or translating cuTile Python kernels to Julia cuTile.jl, or debugging/optimizing existing Julia cuTile translations.
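The index and layout translation can be checked with a small sketch. It assumes one common strategy: the Julia side reinterprets the same buffer with its dimensions reversed. Under that strategy, a row-major (M, N) Python array becomes a column-major (N, M) Julia array, and element `(i, j)` becomes `[j + 1, i + 1]`. The helper names are hypothetical and are not part of cuTile or cuTile.jl. A translation that copies or permutes the data instead would keep the axes in their original order and only add 1.

```python
def python_offset(i: int, j: int, shape: tuple[int, int]) -> int:
    """0-based (i, j) into a row-major (M, N) array -> 0-based buffer offset."""
    _, n = shape
    return i * n + j


def julia_offset(r: int, c: int, shape: tuple[int, int]) -> int:
    """1-based [r, c] into a column-major (R, C) array -> 0-based buffer offset."""
    rows, _ = shape
    return (r - 1) + (c - 1) * rows


def translate_index(i: int, j: int) -> tuple[int, int]:
    """Hypothetical helper: Python (i, j) on an (M, N) row-major buffer ->
    Julia [r, c] on the same buffer viewed as (N, M) column-major.
    Swap the axes, then shift from 0-based to 1-based."""
    return j + 1, i + 1


if __name__ == "__main__":
    M, N = 3, 4
    for i in range(M):
        for j in range(N):
            r, c = translate_index(i, j)
            # Both views must address the same element of the shared buffer.
            assert python_offset(i, j, (M, N)) == julia_offset(r, c, (N, M))
    print("all offsets match")
```

The exhaustive loop is a cheap sanity check worth keeping when porting a kernel by hand, because an axis swap that is missed still compiles but reads the wrong elements.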
Extract false-positive and false-negative gaps from VLM binary-classification-question (BCQ, yes/no) predictions. Use after running VLM evaluation when you have a predictions JSON and need to identify failure cases for DEFT root cause analysis on a binary-classification VLM workflow.
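A minimal sketch of the false-positive/false-negative split. It assumes a hypothetical predictions JSON schema, a list of records with `id`, `label`, and `prediction` fields holding "yes" or "no". The real schema produced by the VLM evaluation step may differ.

```python
import json


def extract_gaps(predictions_json: str) -> dict[str, list[str]]:
    """Split yes/no predictions into false positives and false negatives.

    Assumes (hypothetically) a JSON list of records like
    {"id": "...", "label": "yes" | "no", "prediction": "yes" | "no"}.
    """
    records = json.loads(predictions_json)
    gaps = {"false_positives": [], "false_negatives": []}
    for rec in records:
        # Normalize case and whitespace so "Yes " and "yes" compare equal.
        label = rec["label"].strip().lower()
        pred = rec["prediction"].strip().lower()
        if pred == "yes" and label == "no":
            gaps["false_positives"].append(rec["id"])
        elif pred == "no" and label == "yes":
            gaps["false_negatives"].append(rec["id"])
    return gaps


if __name__ == "__main__":
    sample = json.dumps([
        {"id": "a", "label": "yes", "prediction": "yes"},
        {"id": "b", "label": "no", "prediction": "Yes"},
        {"id": "c", "label": "yes", "prediction": "no"},
    ])
    print(extract_gaps(sample))
    # {'false_positives': ['b'], 'false_negatives': ['c']}
```

Predictions that are neither "yes" nor "no" fall through both branches. In practice those unparseable answers are usually worth collecting as a third failure bucket.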
DGX Cloud Lepton managed GPU compute platform with run/status/cancel interface. Use when submitting TAO jobs to DGX Cloud, dispatching training/eval/inference to Lepton GPU resources, or managing Lepton workspace deployments. Trigger phrases include "run on Lepton", "submit to DGX Cloud", "Lepton job", "managed GPU on DGX Cloud".
Kubernetes execution platform — submits TAO container jobs as single-pod k8s Jobs with NVIDIA GPU scheduling. Use when running on EKS / GKE / AKS / on-prem clusters with the NVIDIA GPU Operator installed, or when integrating TAO into an existing k8s-native ML platform.
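As an illustration of a "single-pod k8s Job with NVIDIA GPU scheduling", the sketch below builds a `batch/v1` Job manifest as a Python dict. The field names follow the standard Kubernetes Job API, and `nvidia.com/gpu` is the resource name exposed by the NVIDIA device plugin that the GPU Operator installs. The function name, the image, and the choice of `backoffLimit: 0` (no automatic retries) are assumptions, not necessarily what the skill submits.

```python
import json


def build_gpu_job(name: str, image: str, command: list[str], num_gpus: int = 1) -> dict:
    """Hypothetical helper: a single-pod Kubernetes Job requesting NVIDIA GPUs."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "backoffLimit": 0,   # assumption: fail fast instead of retrying
            "completions": 1,    # exactly one successful pod
            "parallelism": 1,    # never more than one pod at a time
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [{
                        "name": name,
                        "image": image,
                        "command": command,
                        # GPUs are requested via limits; the scheduler places
                        # the pod only on a node with enough free GPUs.
                        "resources": {"limits": {"nvidia.com/gpu": num_gpus}},
                    }],
                }
            },
        },
    }


if __name__ == "__main__":
    job = build_gpu_job("tao-train-example", "example.registry/tao:tag",
                        ["python", "train.py"], num_gpus=1)
    print(json.dumps(job, indent=2))
```

Because `kubectl apply -f` accepts JSON as well as YAML, the printed manifest can be saved to a file and applied directly.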
Flash the FPGA on an HSB board connected to an NVIDIA devkit. Supports HSB Lattice boards (FPGA versions 2407, 2412, 2507, 2510) and Leopard Imaging VB1940 "all-in-one" cameras (FPGA versions 2507, 2510). Uses release-specific YAML manifests and board-type-specific program commands. Lattice and VB1940 commands must never be mixed.
Masked Auto-Encoder (MAE) for self-supervised pretraining and fine-tuning. Masks random patches and reconstructs them to learn visual representations; supports pretrain and finetune stages. Use when training, evaluating, exporting, or running inference for a TAO MAE backbone. Trigger phrases include "pretrain MAE", "self-supervised vision pretraining", "Masked Autoencoder", "Mask Auto-Encoder", "MAE fine-tune".
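The random patch masking at the core of MAE pretraining can be sketched in a few lines. The default `mask_ratio=0.75` follows the original MAE paper and may not match the ratio configured in TAO. The encoder sees only the visible patches, and the reconstruction loss is computed on the masked ones.

```python
import random
from typing import Optional


def random_patch_mask(num_patches: int, mask_ratio: float = 0.75,
                      seed: Optional[int] = None) -> tuple[list[int], list[int]]:
    """Return (visible_ids, masked_ids) for MAE-style random masking."""
    rng = random.Random(seed)
    ids = list(range(num_patches))
    rng.shuffle(ids)  # uniform random permutation of patch indices
    num_keep = int(num_patches * (1 - mask_ratio))
    visible = sorted(ids[:num_keep])  # fed to the encoder
    masked = sorted(ids[num_keep:])   # reconstructed by the decoder
    return visible, masked


if __name__ == "__main__":
    # A 224x224 image with 16x16 patches yields 14 * 14 = 196 patches.
    visible, masked = random_patch_mask(196, seed=0)
    print(len(visible), len(masked))  # 49 147
```

A high mask ratio is what makes the task hard enough to learn useful representations, and it also makes pretraining cheaper because the encoder only processes a quarter of the patches.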
DINO (DETR with Improved DeNoising Anchor Boxes) for 2D object detection. Transformer-based detector with denoising training, multi-scale features, and optional distillation support. Use when training, evaluating, exporting, distilling, quantizing, or running inference for a TAO DINO detector. Trigger phrases include "train DINO", "DETR object detection", "TAO 2D detection", "DINO with distillation".
PointPillars for 3D object detection from LiDAR point clouds. Encodes point clouds into a pseudo-image via a pillar-based representation, then applies 2D detection — used in autonomous driving and robotics. Use when training, evaluating, exporting, pruning, retraining, or running inference for a TAO PointPillars model. Trigger phrases include "train PointPillars", "LiDAR 3D detection", "point-cloud object detection", "pillar-based 3D detector".
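The "pillar-based representation" can be sketched as two steps: bucket points into vertical columns on a bird's-eye-view grid, then scatter one feature per pillar onto a 2D canvas. In PointPillars the per-pillar feature comes from a learned PointNet-style encoder. The mean height used here is only a placeholder, and the grid parameters are illustrative, not TAO defaults.

```python
import math
from collections import defaultdict


def group_into_pillars(points, pillar_size, x_range, y_range, max_points_per_pillar):
    """Bucket (x, y, z) points into BEV grid cells ("pillars"), capping points per pillar."""
    pillars = defaultdict(list)
    x_min, x_max = x_range
    y_min, y_max = y_range
    for x, y, z in points:
        if not (x_min <= x < x_max and y_min <= y < y_max):
            continue  # drop points outside the BEV grid
        ix = int(math.floor((x - x_min) / pillar_size))
        iy = int(math.floor((y - y_min) / pillar_size))
        cell = pillars[(ix, iy)]
        if len(cell) < max_points_per_pillar:
            cell.append((x, y, z))
    return dict(pillars)


def scatter_to_pseudo_image(pillars, grid_w, grid_h):
    """Place one scalar per pillar on a 2D canvas (placeholder: mean z instead of learned features)."""
    canvas = [[0.0] * grid_w for _ in range(grid_h)]
    for (ix, iy), pts in pillars.items():
        canvas[iy][ix] = sum(p[2] for p in pts) / len(pts)
    return canvas


if __name__ == "__main__":
    pts = [(0.1, 0.1, 1.0), (0.2, 0.3, 3.0), (1.2, 0.1, 5.0), (20.0, 0.0, 0.0)]
    pillars = group_into_pillars(pts, 1.0, (0.0, 4.0), (0.0, 4.0), 32)
    print(scatter_to_pseudo_image(pillars, 4, 4)[0])  # [2.0, 5.0, 0.0, 0.0]
```

Once the canvas exists, any standard 2D convolutional detection backbone can consume it, which is the reason for the pseudo-image step.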
OCRNet for scene text recognition. Recognizes text content from cropped text-region images and supports CTC and attention-based decoders. Use when training, evaluating, exporting, pruning, quantizing, retraining, or running inference for a TAO OCRNet model. Trigger phrases include "train OCRNet", "scene text recognition", "OCR cropped text", "CTC / attention text decoder".
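For the CTC decoder option, the simplest way to turn per-frame character probabilities into text is greedy (best-path) decoding: take the argmax at each frame, collapse consecutive repeats, then drop the blank symbol. This sketch assumes the blank sits at index 0 of a hypothetical `charset`. OCRNet may use a different blank index or beam search, and the attention decoder is not covered here.

```python
def ctc_greedy_decode(frame_probs, charset, blank=0):
    """Best-path CTC decoding: argmax per frame, merge repeats, remove blanks."""
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    out = []
    prev = None
    for idx in best:
        # A repeated symbol counts again only if a blank (or another symbol) separates it.
        if idx != prev and idx != blank:
            out.append(charset[idx])
        prev = idx
    return "".join(out)


if __name__ == "__main__":
    charset = ["-", "a", "b"]  # index 0 is the blank
    probs = [
        [0.1, 0.8, 0.1],   # a
        [0.1, 0.8, 0.1],   # a (repeat, merged)
        [0.9, 0.05, 0.05], # blank
        [0.2, 0.7, 0.1],   # a (new character after blank)
        [0.1, 0.1, 0.8],   # b
        [0.1, 0.2, 0.7],   # b (repeat, merged)
    ]
    print(ctc_greedy_decode(probs, charset))  # aab
```

The blank is what lets CTC emit doubled letters such as "ll" in "hello": without a blank between the two frames, they would collapse into one character.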
Luban - Skill Polishing Workshop. Transforms a "usable Skill" into a public Skill asset that is "understandable, installable, shareable, verifiable, and continuously evolvable". The methodology consists of five craftsman-like steps:
1. Material Inspection: First challenge whether the premise of this Skill is valid; state directly if the "material" is not worth polishing.
2. Peer Research: Search online for similar Skills to clarify this one's position in the ecosystem.
3. Dimension Measurement: Evaluate along three dimensions, structure, hands-on testing, and live verification, where live verification means reconciling with real running outputs (a green CI can be deceptive).
4. Iterative Refinement: Freeze the original version as a baseline; keep only changes that pass the verification gate and revert the rest. Where possible, codify the verification methods as tools and rules in the repository.
5. Post-Release Iteration: Release is not the end; maintain a benchmark watch list and start the next iteration based on real feedback.
Use this skill when users want to upgrade, optimize, polish, productize, or release Skills they developed themselves. The final deliverables include a structured Skill Polishing Report, drop-in rewritten segments, and a shareable "Graduation Certificate" result card suitable for screenshots. Trigger phrases include but are not limited to: "Let Luban take a look at this skill", "Polish at Luban's Workshop", "Polish my skill", "Upgrade my skill", "Optimize this skill", "Skill check-up", "Skill audit", "Productize my skill", "How to release this skill", "Benchmark against similar skills", "Why no one installs my skill", "Help me publish my skill to GitHub/ClawHub", "Improve SKILL.md". Even if the user only provides a Skill directory, a GitHub repository link, or a SKILL.md excerpt with a request like "Help me figure out how to modify it", trigger this skill as long as the context is about making the Skill more usable and shareable.
Do NOT use this for creating a new Skill from scratch (use skill-creator), regular code review (use code-review), or rewriting ordinary prompts unrelated to Skill assets.
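Store inspection workflow: orchestrates store list + store open + page visit + page screenshot to check store status and capture inspection screenshots. Suitable for periodically checking the status of store pages and archiving screenshots in batches.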