Search Results: ai-assistant

Found 186 Skills

AI & Machine Learningwhitespectre/ai-assistant...

eval-relevance

Score assistant responses for relevance on a strict 1-5 scale, then return strict JSON only with score, rationale, and improvement suggestions. Use when the user asks to evaluate relevance, grade relevance, or critique topical alignment.

🇺🇸|EnglishTranslated

AI & Machine Learningteam-telnyx/skills

telnyx-ai-assistants-javascript

AI voice assistants with custom instructions, knowledge bases, and tool integrations.

🇺🇸|EnglishTranslated

AI & Machine Learningaradotso/ai-agent-skills

cowagent-ai-assistant

Build and deploy autonomous AI agents with CowAgent - planning, memory, knowledge base, skills, and multi-channel support

🇺🇸|EnglishTranslated

AI & Machine Learningorq-ai/assistant-plugins

analyze-trace-failures

Read production traces, identify what's failing, and build failure taxonomies using open coding and axial coding methodology. Use when debugging agent or pipeline quality, investigating "why are my outputs bad?", or before building any evaluator — error analysis must come first. Do NOT use when you already have identified failure modes and need evaluators (use build-evaluator) or datasets (use generate-synthetic-dataset).

🇺🇸|EnglishTranslated

AI & Machine Learningorq-ai/assistant-plugins

invoke-deployment

Invoke orq.ai deployments, agents, and models via the Python SDK or HTTP API. Use when a user wants to call a deployment with prompt variables, invoke an agent in a conversation, or call a model directly through the AI Router. Do NOT use for creating or editing deployments/agents (use optimize-prompt or build-agent). Do NOT use for running evaluations (use run-experiment).

🇺🇸|EnglishTranslated

AI & Machine Learningorq-ai/assistant-plugins

run-experiment

Create and run orq.ai experiments — compare configurations against datasets using evaluators, analyze results, and generate prioritized action plans. Use when evaluating LLM agents, deployments, conversations, or RAG pipelines end-to-end. Do NOT use without a dataset and evaluators. Do NOT use for cross-framework comparisons with external agents (use compare-agents).

🇺🇸|EnglishTranslated

AI & Machine Learningorq-ai/assistant-plugins

build-agent

Design, create, and configure orq.ai Agents with tools, instructions, knowledge bases, and memory stores. Use when building new agents, attaching KBs or memory, writing system instructions, selecting models, or setting up RAG pipelines. Do NOT use for debugging existing agents (use analyze-trace-failures) or comparing agents across frameworks (use compare-agents).

🇺🇸|EnglishTranslated

AI & Machine Learningorq-ai/assistant-plugins

setup-observability

Set up orq.ai observability for LLM applications. Use when setting up tracing, adding the AI Router proxy, integrating OpenTelemetry, auditing existing instrumentation, or enriching traces with metadata.

🇺🇸|EnglishTranslated

AI & Machine Learningorq-ai/assistant-plugins

generate-synthetic-dataset

Generate and curate evaluation datasets — structured generation via dimensions-tuples-NL, quick from description, expansion from existing data, plus dataset maintenance through deduplication, rebalancing, and gap-filling. Use when creating eval data, expanding test coverage, or cleaning datasets. Do NOT use when sufficient real production data exists (use analyze-trace-failures instead). Do NOT use for evaluator creation (use build-evaluator).

🇺🇸|EnglishTranslated

AI & Machine Learningorq-ai/assistant-plugins

compare-agents

Run cross-framework agent comparisons using evaluatorq from orqkit — compares any combination of agents (orq.ai, LangGraph, CrewAI, OpenAI Agents SDK, Vercel AI SDK) head-to-head on the same dataset with LLM-as-a-judge scoring. Use when comparing agents, benchmarking, or wanting side-by-side evaluation. Do NOT use when comparing only orq.ai configurations with no external agents (use run-experiment instead).

🇺🇸|EnglishTranslated

AI & Machine Learningorq-ai/assistant-plugins

build-evaluator

Create validated LLM-as-a-Judge evaluators following best practices — binary Pass/Fail judges with TPR/TNR validation for measuring specific failure modes. Use when you need to automate quality checks, build guardrails, or measure a specific failure mode identified during trace analysis. Do NOT use when failures are fixable with prompt changes (use optimize-prompt) or when failure modes are unknown (use analyze-trace-failures first).

🇺🇸|EnglishTranslated

AI & Machine Learningwhitespectre/ai-assistant...

eval-guidance-actionability

Score assistant responses for guidance & actionability on a strict 1-5 scale, then return strict JSON only with dimension, score, rationale, and improvement suggestions. Use when the user asks to evaluate how actionable, helpful, or step-by-step a response is.

🇺🇸|EnglishTranslated