Found 4 Skills
A/B test agent variants, measuring quality and total session token cost across simple and complex benchmarks. Use when creating compact agent versions, validating agent changes, comparing internal vs external agents, or deciding between variants for production. Use for "compare agents", "A/B test", "benchmark agents", or "test agent efficiency". Do NOT use for evaluating single agents, testing skills, or optimizing prompts without variant comparison.
Run cross-framework agent comparisons using evaluatorq from orqkit — compares any combination of agents (orq.ai, LangGraph, CrewAI, OpenAI Agents SDK, Vercel AI SDK) head-to-head on the same dataset with LLM-as-a-judge scoring. Use when comparing agents, benchmarking, or wanting side-by-side evaluation. Do NOT use when comparing only orq.ai configurations with no external agents (use run-experiment instead).
Head-to-head comparison of coding agents (Claude Code, Aider, Codex, etc.) on custom tasks with pass rate, cost, time, and consistency metrics (a rollup sketch of these metrics follows the list).
A module for comparing the AI development and tooling ecosystem (LangChain, LangGraph, CrewAI, coding agents, etc.) and selecting the tool that fits the purpose.
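
These comparison skills all reduce to the same rollup: record each run, then aggregate per agent. Below is a minimal Python sketch of that rollup for the pass rate, cost, time, and consistency metrics named in the coding-agent comparison skill. The record schema, field names (agent, task, passed, cost_usd, seconds), and example data are illustrative assumptions, not the actual output format of any skill above.

```python
# Minimal sketch: roll per-run results up into per-agent pass rate, mean
# cost, mean time, and consistency. Schema and data are hypothetical.
from collections import defaultdict
from statistics import mean

# One record per (agent, task, run); field names are illustrative.
runs = [
    {"agent": "agent-a", "task": "t1", "passed": True,  "cost_usd": 0.04, "seconds": 21.0},
    {"agent": "agent-a", "task": "t1", "passed": True,  "cost_usd": 0.05, "seconds": 24.0},
    {"agent": "agent-a", "task": "t2", "passed": False, "cost_usd": 0.09, "seconds": 63.0},
    {"agent": "agent-b", "task": "t1", "passed": True,  "cost_usd": 0.02, "seconds": 12.0},
    {"agent": "agent-b", "task": "t2", "passed": True,  "cost_usd": 0.03, "seconds": 18.0},
]

def summarize(records):
    by_agent = defaultdict(list)
    for r in records:
        by_agent[r["agent"]].append(r)

    summary = {}
    for agent, rows in by_agent.items():
        # Group this agent's runs by task to measure run-to-run consistency.
        by_task = defaultdict(list)
        for r in rows:
            by_task[r["task"]].append(r["passed"])
        # A task is "consistent" if all of its repeated runs had the same outcome.
        consistent = [len(set(outcomes)) == 1 for outcomes in by_task.values()]

        summary[agent] = {
            "pass_rate": mean(1.0 if r["passed"] else 0.0 for r in rows),
            "mean_cost_usd": mean(r["cost_usd"] for r in rows),
            "mean_seconds": mean(r["seconds"] for r in rows),
            "consistency": mean(1.0 if c else 0.0 for c in consistent),
        }
    return summary

for agent, stats in summarize(runs).items():
    print(agent, stats)
```

The same shape works for the A/B-testing and cross-framework skills by swapping in their metrics (e.g. judge score and session token cost per run) while keeping the per-agent grouping unchanged.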