Search Results: llm

Found 1,572 Skills

AI & Machine Learning | orq-ai/assistant-plugins

build-evaluator

Create validated LLM-as-a-Judge evaluators following best practices — binary Pass/Fail judges with TPR/TNR validation for measuring specific failure modes. Use when you need to automate quality checks, build guardrails, or measure a specific failure mode identified during trace analysis. Do NOT use when failures are fixable with prompt changes (use optimize-prompt) or when failure modes are unknown (use analyze-trace-failures first).

AI & Machine Learning | cosmix/claude-loom

loom-model-evaluation

Evaluates ML models for performance, fairness, and reliability. Use for metric selection, cross-validation strategies, overfitting/underfitting diagnosis, hyperparameter tuning, LLM evaluation, A/B testing, and production monitoring for model drift.

AI & Machine Learning | gustavogutierrez/engineer...

solution-architect

Design real technical solution architectures for scalable, secure, cost-aware systems by selecting patterns, components, integrations, data flows, and tradeoffs; use when asked for senior solution architecture, system architecture, SaaS architecture, LLM architecture, or architecture decisions after a spec.

AI & Machine Learning | garrytan/gstack

benchmark-models

Cross-model benchmark for gstack skills. Runs the same prompt through Claude, GPT (via Codex CLI), and Gemini side-by-side — compares latency, tokens, cost, and optionally quality via LLM judge. Answers "which model is actually best for this skill?" with data instead of vibes. Separate from /benchmark, which measures web page performance. Use when: "benchmark models", "compare models", "which model is best for X", "cross-model comparison", "model shootout". (gstack) Voice triggers (speech-to-text aliases): "compare models", "model shootout", "which model is best".

AI & Machine Learning | jiekouai/jiekou-skills

jiekou-docs

Reference documentation for Jiekou AI model services, covering the LLM API (OpenAI-compatible), image/video/audio APIs, integration solutions, authentication, billing, pricing, rate limiting, and troubleshooting. Use for questions such as "How do I integrate Jiekou AI with tools like the OpenAI SDK or LangChain?" and for issues such as Jiekou AI request failures.

AI & Machine Learning | akillness/oh-my-skills

pydantic-ai

Build typed LLM applications with PydanticAI: schema-constrained outputs, tool integration, validation, retries, and deterministic downstream handoffs. Use when users need reliable structured outputs instead of free-form text generation.

AI & Machine Learning | nvidia/skills

launching-evals

Run, monitor, analyze, and debug LLM evaluations via nemo-evaluator-launcher. Covers running evaluations, checking status and live progress, debugging failed runs, exporting artifacts and logs, and analyzing results. ALWAYS triggers on mentions of running evaluations, checking progress, debugging failed evals, analyzing or analysing runs or results, run directories or artifact paths on clusters, Slurm job issues, invocation IDs, or inspecting logs (client logs, server logs, SSH to cluster, tail logs, grep logs). Do NOT use for creating or modifying evaluation configs.

AI & Machine Learning | sernote/audit-prompt-cach...

audit-prompt-caching

Use whenever the user mentions LLM prompt/prefix cache misses, cached_tokens=0, cache_read_input_tokens/cache_creation_input_tokens, prompt_cache_key, cache_control/cachePoint placement, stable prefixes, tool/schema stability, TTFT/prefill latency, OpenAI/Claude/Bedrock/OpenRouter routing, vLLM/SGLang KV reuse, or LLM cost/speed regressions on repeated long prompts. Use when reviewing LLM request shape changes: prompt text, message order, request builders, tools, schemas, response_format, provider API surface, model/router settings, agent loop structure, context compaction, or inference deployment. Use for speeding up agents only when prompt-cache stability, TTFT, or cache cost is central. Do not use for generic prompt writing, generic RAG design, token counting, or non-LLM performance.

Marketing & Growth | alirezarezvani/claude-ski...

aeo

Answer Engine Optimization (AEO) skill — optimize content to be cited by AI language models (ChatGPT, Perplexity, Claude, Gemini, Mistral) as authoritative sources. Distinct from SEO — AEO optimizes for citation in LLM-generated responses, not search rankings. Use when planning content for AI-first search audiences, auditing existing content for E-E-A-T signals, tracking which pages get cited by which LLMs, or building a citation-friendly content strategy. Triggers — 'AEO audit', 'optimize for ChatGPT', 'get cited by Perplexity', 'LLM citation strategy', 'answer engine optimization', 'content for AI search', 'E-E-A-T audit'. Output is a markdown audit report (default) or JSON for pipeline integration. Stdlib-only Python tools.

DevOps & Cloud Services | astronomer/agents

migrating-ai-sdk-to-common-ai

Migrates Airflow projects from airflow-ai-sdk to apache-airflow-providers-common-ai 0.1.0+. Use this skill when the user wants to replace airflow-ai-sdk with the official Airflow AI provider, migrate LLM decorators (@task.llm, @task.agent, @task.llm_branch, @task.embed), switch from model strings/objects to connection-based LLM configuration, or update imports from airflow_ai_sdk to the new provider. Also trigger when the user mentions common-ai provider, AIP-99, pydanticai connection, or migrating away from airflow-ai-sdk.

AI & Machine Learning | davila7/claude-code-templ...

knowledge-distillation

Compress large language models using knowledge distillation from teacher to student models. Use when deploying smaller models with retained performance, transferring GPT-4 capabilities to open-source models, or reducing inference costs. Covers temperature scaling, soft targets, reverse KLD, logit distillation, and MiniLLM training strategies.

AI & Machine Learning | davila7/claude-code-templ...

sglang

Fast structured generation and serving for LLMs with RadixAttention prefix caching. Use for JSON/regex outputs, constrained decoding, agentic workflows with tool calls, or when you need 5× faster inference than vLLM with prefix sharing. Powers 300,000+ GPUs at xAI, AMD, NVIDIA, and LinkedIn.
