Search Results: dpo

Found 25 Skills

AI & Machine Learning · sickn33/antigravity-aweso...

hugging-face-model-trainer

This skill should be used when users want to train or fine-tune language models using TRL (Transformer Reinforcement Learning) on Hugging Face Jobs infrastructure. Covers SFT, DPO, GRPO and reward modeling training methods, plus GGUF conversion for...

Security & Compliance · mukul975/privacy-data-pro...

brazil-lgpd

Guides compliance with Brazil's Lei Geral de Proteção de Dados (LGPD, Lei 13.709/2018). Covers the 10 lawful bases under Art. 7, DPO appointment, ANPD enforcement, data subject rights under Arts. 17-22, and international transfer mechanisms. Keywords: LGPD, Brazil data protection, ANPD, lawful bases, data subject rights, international transfers.

1 script/Checked

AI & Machine Learning · nvidia/skills

tao-finetune-huggingface-model

Fine-tune any HuggingFace CV / VLM / LLM model on local NVIDIA GPUs inside an NGC PyTorch container. Use when the user wants to fine-tune a HuggingFace model (full or LoRA), train a vision / VLM / LLM model end-to-end, generate a reproducible HF training pipeline, smoke-test a HuggingFace model locally before scale-up, push a fine-tuned model to the HF Hub with a model card, or emit a self-contained rerun skill for an existing HuggingFace finetune. Supports image classification, object detection, semantic / instance / panoptic segmentation, depth estimation, image-text-to-text VLM (SFT / LoRA), and LLM SFT / DPO / GRPO. Six-step workflow: inspect and qualify, hardware and NGC image, research, generate and smoke, train + eval + infer, push and emit rerun skill.

17 scripts/Attention

AI & Machine Learning · vuralserhat86/antigravity...

model_finetuning

Fine-tune LLMs using reinforcement learning with TRL - SFT for instruction tuning, DPO for preference alignment, PPO/GRPO for reward optimization, and reward model training. Use when you need RLHF, want to align a model with preferences, or want to train from human feedback. Works with HuggingFace Transformers.

AI & Machine Learning · davila7/claude-code-templ...

openrlhf-training

High-performance RLHF framework with Ray+vLLM acceleration. Use for PPO, GRPO, RLOO, and DPO training of large models (7B-70B+). Built on Ray, vLLM, and ZeRO-3. Described as 2× faster than DeepSpeed-Chat, with a distributed architecture and GPU resource sharing.

AI & Machine Learning · itsmostafa/llm-engineerin...

rlhf

Understanding Reinforcement Learning from Human Feedback (RLHF) for aligning language models. Use when learning about preference data, reward modeling, policy optimization, or direct alignment algorithms like DPO.

AI & Machine Learning · kiterlin/intelligent-dete...

fine-tuning-with-trl

AI & Machine Learning · orchestra-research/ai-res...

axolotl

Expert guidance for fine-tuning LLMs with Axolotl - YAML configs, 100+ models, LoRA/QLoRA, DPO/KTO/ORPO/GRPO, multimodal support

AI & Machine Learning · awslabs/agent-plugins

finetuning-setup

Selects a base model and fine-tuning technique (SFT, DPO, or RLVR) for the user's use case by querying SageMaker Hub. Use when the user asks which model or technique to use, wants to start fine-tuning, or mentions a model name or family (e.g., "Llama", "Mistral") — always activate even for known model names because the exact Hub model ID must be resolved. Queries available models, validates technique compatibility, and confirms selections.

2 scripts/Checked

AI & Machine Learning · awslabs/agent-plugins

dataset-evaluation

Validates dataset formatting and quality for SageMaker model fine-tuning (SFT, DPO, or RLVR). Use when the user says "is my dataset okay", "evaluate my data", "check my training data", "I have my own data", or before starting any fine-tuning job. Detects file format, checks schema compliance against the selected model and technique, and reports whether the data is ready for training or evaluation.

1 script/Checked

AI & Machine Learning · awslabs/agent-plugins

finetuning

Generates a Jupyter notebook that fine-tunes a base model using SageMaker serverless training jobs. Use when the user says "start training", "fine-tune my model", "I'm ready to train", or when the plan reaches the finetuning step. Supports SFT, DPO, and RLVR trainers, including RLVR Lambda reward function creation.

2 scripts/Checked

Backend Development · auth0/agent-skills

auth0-aspnetcore-api

Use when securing ASP.NET Core Web API endpoints with JWT Bearer token validation, scope/permission checks, or stateless auth - integrates Auth0.AspNetCore.Authentication.Api for REST APIs receiving access tokens from frontends or mobile apps. Also handles DPoP proof-of-possession token binding. Triggers on: AddAuth0ApiAuthentication, .NET Web API auth, JWT validation, UseAuthentication, UseAuthorization.
