Found 10 Skills
Create and run orq.ai experiments — compare configurations against datasets using evaluators, analyze results, and generate prioritized action plans. Use when evaluating LLM agents, deployments, conversations, or RAG pipelines end-to-end. Do NOT use without a dataset and evaluators. Do NOT use for cross-framework comparisons with external agents (use compare-agents).
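For orientation, a conceptual sketch of the three ingredients an experiment needs; every name below is invented, not the platform's actual schema:

```python
# Conceptual sketch only; names are invented, not orq.ai's schema.
experiment = {
    # Dataset of inputs (and optional references) every config runs against.
    "dataset": "support-tickets-v2",
    # Configurations compared head-to-head on that dataset.
    "configurations": [
        {"model": "gpt-4o", "temperature": 0.2},
        {"model": "claude-sonnet-4", "temperature": 0.2},
    ],
    # Evaluators that score each (input, output) pair.
    "evaluators": ["faithfulness-judge", "format-check"],
}

# The skill's hard precondition: no dataset or evaluators, no experiment.
assert experiment["dataset"] and experiment["evaluators"]
```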
Design, create, and configure orq.ai Agents with tools, instructions, knowledge bases, and memory stores. Use when building new agents, attaching KBs or memory, writing system instructions, selecting models, or setting up RAG pipelines. Do NOT use for debugging existing agents (use analyze-trace-failures) or comparing agents across frameworks (use compare-agents).
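A minimal sketch of what an agent configuration brings together, with invented field names rather than the platform's real schema:

```python
# Conceptual sketch; field names are invented, not orq.ai's schema.
agent = {
    "model": "gpt-4o",
    "instructions": (
        "You are a support agent for Acme. Answer only from the attached "
        "knowledge base; if you are unsure, escalate to a human."
    ),
    "tools": ["search_orders", "create_ticket"],  # callable tools
    "knowledge_bases": ["acme-help-center"],      # retrieval source for RAG
    "memory_stores": ["per-customer-history"],    # cross-session memory
}
```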
Read production traces, identify what's failing, and build failure taxonomies using open coding and axial coding methodology. Use when debugging agent or pipeline quality, investigating "why are my outputs bad?", or before building any evaluator — error analysis must come first. Do NOT use when you already have identified failure modes and need evaluators (use build-evaluator) or datasets (use generate-synthetic-dataset).
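A small illustration of the two coding passes, with invented traces and codes:

```python
from collections import Counter

# Open coding: one free-form note per failing trace (examples invented).
open_codes = {
    "trace_01": "cited a policy that does not exist",
    "trace_02": "ignored the user's stated date range",
    "trace_03": "invented a refund amount",
    "trace_04": "dropped the date filter from the search query",
}

# Axial coding: group the open codes into named failure modes.
taxonomy = {
    "hallucinated_facts": ["trace_01", "trace_03"],
    "ignored_constraints": ["trace_02", "trace_04"],
}

# Frequency per mode shows what to fix, or build an evaluator for, first.
counts = Counter({mode: len(ts) for mode, ts in taxonomy.items()})
print(counts.most_common())
```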
Invoke orq.ai deployments, agents, and models via the Python SDK or HTTP API. Use when a user wants to call a deployment with prompt variables, invoke an agent in a conversation, or call a model directly through the AI Router. Do NOT use for creating or editing deployments/agents (use optimize-prompt or build-agent). Do NOT use for running evaluations (use run-experiment).
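The call below follows the shape of the orq.ai Python SDK; treat the deployment key, prompt variables, and response access as assumptions to verify against the SDK docs:

```python
# Shape follows the orq.ai Python SDK as documented; the deployment key,
# variables, and response access are assumptions to verify against the docs.
import os
from orq_ai_sdk import Orq

client = Orq(api_key=os.environ["ORQ_API_KEY"])

generation = client.deployments.invoke(
    key="customer-support",                    # hypothetical deployment key
    inputs={"customer_name": "Ada"},           # prompt variables
    context={"environments": ["production"]},  # routing context
)
print(generation.choices[0].message.content)
```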
Run cross-framework agent comparisons using evaluatorq from orqkit — compares any combination of agents (orq.ai, LangGraph, CrewAI, OpenAI Agents SDK, Vercel AI SDK) head-to-head on the same dataset with LLM-as-a-judge scoring. Use when comparing agents, benchmarking, or wanting side-by-side evaluation. Do NOT use when comparing only orq.ai configurations with no external agents (use run-experiment instead).
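As a framework-agnostic sketch of what head-to-head judging amounts to (this is not evaluatorq's actual API), each agent reduces to a plain callable and one judge scores every output on the same inputs:

```python
# Framework-agnostic sketch of head-to-head judging; NOT evaluatorq's API.
# `judge` stands in for a hypothetical LLM-as-a-judge call returning 0.0-1.0.
from typing import Callable

Agent = Callable[[str], str]
Judge = Callable[[str, str], float]

def compare(agents: dict[str, Agent], dataset: list[str], judge: Judge) -> dict[str, float]:
    """Run every agent on the same inputs and average the judge scores."""
    scores: dict[str, list[float]] = {name: [] for name in agents}
    for item in dataset:
        for name, agent in agents.items():
            scores[name].append(judge(item, agent(item)))
    return {name: sum(s) / len(s) for name, s in scores.items()}
```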
Create validated LLM-as-a-Judge evaluators following best practices — binary Pass/Fail judges with TPR/TNR validation for measuring specific failure modes. Use when you need to automate quality checks, build guardrails, or measure a specific failure mode identified during trace analysis. Do NOT use when failures are fixable with prompt changes (use optimize-prompt) or when failure modes are unknown (use analyze-trace-failures first).
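A minimal validation sketch: `judge` is any callable returning True for Pass, the labeled pairs come from human review, and the 0.9 thresholds are illustrative:

```python
def validate_judge(judge, labeled, tpr_min=0.9, tnr_min=0.9):
    """Validate a binary Pass/Fail judge against human labels.

    `judge` is a callable text -> bool (True = Pass); `labeled` is a
    list of (text, human_pass) pairs.
    """
    tp = fn = tn = fp = 0
    for text, human_pass in labeled:
        predicted = judge(text)
        if human_pass and predicted:
            tp += 1
        elif human_pass:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    tpr = tp / (tp + fn) if tp + fn else 0.0  # recall on true Passes
    tnr = tn / (tn + fp) if tn + fp else 0.0  # recall on true Fails
    return tpr, tnr, tpr >= tpr_min and tnr >= tnr_min
```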
Set up orq.ai observability for LLM applications. Use when setting up tracing, adding the AI Router proxy, integrating OpenTelemetry, auditing existing instrumentation, or enriching traces with metadata.
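A standard OpenTelemetry setup sketch; the OTLP endpoint and auth header are assumptions, so confirm them against the observability docs:

```python
# Standard OpenTelemetry wiring; the orq.ai endpoint and header below are
# assumptions, so check the docs for the real collector URL and auth scheme.
import os
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

provider = TracerProvider()
provider.add_span_processor(
    BatchSpanProcessor(
        OTLPSpanExporter(
            endpoint="https://api.orq.ai/v2/otel/v1/traces",  # hypothetical
            headers={"Authorization": f"Bearer {os.environ['ORQ_API_KEY']}"},
        )
    )
)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("my-llm-app")
with tracer.start_as_current_span("llm-call") as span:
    # Enrich the trace with metadata around the model call.
    span.set_attribute("gen_ai.request.model", "gpt-4o")
```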
Analyze and optimize system prompts using a structured prompting guidelines framework — AI-powered analysis and rewriting. Use when a prompt needs improvement, experiment results show quality gaps, or you want a structured review of an existing system prompt. Do NOT use when production traces show failures (use analyze-trace-failures first to identify patterns). Do NOT use to build evaluators (use build-evaluator).
Generate and curate evaluation datasets: structured generation via the dimensions-tuples-NL pipeline, quick generation from a plain description, expansion from existing data, plus dataset maintenance through deduplication, rebalancing, and gap-filling. Use when creating eval data, expanding test coverage, or cleaning datasets. Do NOT use when sufficient real production data exists (use analyze-trace-failures instead). Do NOT use for evaluator creation (use build-evaluator).
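A small sketch of the dimensions-tuples-NL idea: cross a set of dimensions into structured tuples, then phrase each tuple as a natural-language generation prompt. The dimensions and template below are invented:

```python
import itertools

# Hypothetical dimensions for a support-agent eval set.
dimensions = {
    "intent": ["refund", "shipping_status", "account_access"],
    "tone": ["neutral", "frustrated"],
    "complexity": ["single_issue", "multi_issue"],
}

# Step 1: cross the dimensions into structured tuples.
tuples = [
    dict(zip(dimensions, combo))
    for combo in itertools.product(*dimensions.values())
]

# Step 2: turn each tuple into a natural-language prompt for an LLM
# to realize as a concrete test case (the template is illustrative).
for t in tuples[:2]:
    print(
        f"Write a {t['tone']} customer message about {t['intent']} "
        f"with {t['complexity'].replace('_', ' ')}."
    )
```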
Automatically creates new AI assistant code plugins with proper structure, validation, and marketplace integration when the user mentions creating a plugin, a new plugin, or a plugin from a template. Specific to the AI assistant-code-plugins repository workflow. Use when generating or creating new plugins. Trigger with phrases like 'generate', 'create', or 'scaffold'.