Search Results: llm-alignment

Found 2 Skills

AI & Machine Learning | davila7/claude-code-templ...

simpo-training

Simple Preference Optimization (SimPO) for LLM alignment. A reference-free alternative to DPO with better reported performance (+6.4 points on AlpacaEval 2.0). Because no reference model is needed, training is more efficient than DPO. Use it for preference alignment when you want simpler, faster training than DPO or PPO.

AI & Machine Learning | omer-metin/skills-for-ant...

reinforcement-learning

Use when implementing RL algorithms, training agents with rewards, or aligning LLMs with human feedback. Covers policy gradients, PPO, Q-learning, RLHF, and GRPO.
