Found 4 Skills
A/B test agent variants, measuring quality and total session token cost across simple and complex benchmarks. Use when creating compact agent versions, validating agent changes, comparing internal vs external agents, or deciding between variants for production. Use for "compare agents", "A/B test", "benchmark agents", or "test agent efficiency". Do NOT use for evaluating single agents, testing skills, or optimizing prompts without variant comparison.
Run cross-framework agent comparisons using evaluatorq from orqkit — compares any combination of agents (orq.ai, LangGraph, CrewAI, OpenAI Agents SDK, Vercel AI SDK) head-to-head on the same dataset with LLM-as-a-judge scoring. Use when comparing agents, benchmarking, or wanting side-by-side evaluation. Do NOT use when comparing only orq.ai configurations with no external agents (use run-experiment instead).
Head-to-head comparison of coding agents (Claude Code, Aider, Codex, etc.) on custom tasks with pass rate, cost, time, and consistency metrics (a rollup sketch of these metrics follows the list).
A module for comparing the AI development and tooling ecosystem (LangChain, LangGraph, CrewAI, coding agents, etc.) and selecting the tool that fits the purpose.
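
These comparison skills all reduce to the same rollup: record each run, then aggregate per agent. Below is a minimal Python sketch of that rollup for the pass rate, cost, time, and consistency metrics named in the coding-agent comparison skill. The record schema, field names (agent, task, passed, cost_usd, seconds), and example data are illustrative assumptions, not the actual output format of any skill above.

```python
# Minimal sketch: roll per-run results up into per-agent pass rate, mean
# cost, mean time, and consistency. Schema and data are hypothetical.
from collections import defaultdict
from statistics import mean

# One record per (agent, task, run); field names are illustrative.
runs = [
    {"agent": "agent-a", "task": "t1", "passed": True,  "cost_usd": 0.04, "seconds": 21.0},
    {"agent": "agent-a", "task": "t1", "passed": True,  "cost_usd": 0.05, "seconds": 24.0},
    {"agent": "agent-a", "task": "t2", "passed": False, "cost_usd": 0.09, "seconds": 63.0},
    {"agent": "agent-b", "task": "t1", "passed": True,  "cost_usd": 0.02, "seconds": 12.0},
    {"agent": "agent-b", "task": "t2", "passed": True,  "cost_usd": 0.03, "seconds": 18.0},
]

def summarize(records):
    by_agent = defaultdict(list)
    for r in records:
        by_agent[r["agent"]].append(r)

    summary = {}
    for agent, rows in by_agent.items():
        # Group this agent's runs by task to measure run-to-run consistency.
        by_task = defaultdict(list)
        for r in rows:
            by_task[r["task"]].append(r["passed"])
        # A task is "consistent" if all of its repeated runs had the same outcome.
        consistent = [len(set(outcomes)) == 1 for outcomes in by_task.values()]

        summary[agent] = {
            "pass_rate": mean(1.0 if r["passed"] else 0.0 for r in rows),
            "mean_cost_usd": mean(r["cost_usd"] for r in rows),
            "mean_seconds": mean(r["seconds"] for r in rows),
            "consistency": mean(1.0 if c else 0.0 for c in consistent),
        }
    return summary

for agent, stats in summarize(runs).items():
    print(agent, stats)
```

The same shape works for the A/B-testing and cross-framework skills by swapping in their metrics (e.g. judge score and session token cost per run) while keeping the per-agent grouping unchanged.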