Loading...
Loading...
Found 1,927 Skills
Score assistant responses for guidance & actionability on a strict 1-5 scale, then return strict JSON only with dimension, score, rationale, and improvement suggestions. Use when the user asks to evaluate how actionable, helpful, or step-by-step a response is.
Turn rough ideas into structured, validated idea documents through collaborative dialogue. Explores context, asks clarifying questions one at a time, proposes alternative approaches with feasibility evaluation, and produces documents ready for requirements definition. Use when: "ideation", "brainstorm", "new idea", "explore an idea", "I want to build", "what if we", "let's think about", "propose approaches", "evaluate this idea", "idea document", "アイデア出し", "案出し", "ブレスト", "アイデアを整理", "検討したい".
Score how well a creator fits a brand's niche on a 1-10 scale with detailed written rationale. This skill should be used when evaluating creator-brand fit, scoring niche alignment, checking if an influencer matches a brand, assessing creator relevance, rating a creator's fit for a campaign, vetting a creator for niche match, deciding whether a creator is right for a brand, comparing creators by brand fit, or reviewing an influencer's profile against campaign requirements. For full creator vetting beyond niche fit (brand safety, rates, compliance), see creator-vetting-scorecard. For writing outreach to creators who pass vetting, see outreach-writer.
Evaluates and prevents unnecessary abstractions by analyzing interfaces, layers, and patterns against concrete requirements. Use when evaluating new abstractions, reviewing architecture proposals, detecting over-engineering, or simplifying existing code. Triggers on "is this abstraction necessary", "too many layers", "simplify architecture", "reduce complexity", "over-engineered", "do we need this interface", or when reviewing design patterns.
Evaluate third-party agent skills for security risks before adoption or update. Use when: (1) Installing or updating a skill from skills.sh, ClawHub, or any public registry, (2) Auditing skills for security risks or reviewing PRs that add/update skill dependencies, (3) Building a team/org allowlist of approved skills, (4) Investigating suspicious skill behavior or answering "is this skill safe?" / "should we adopt this skill?"
Code quality gatekeeper and auditor. Enforces strict quality gates, resolves the AI verification gap, and evaluates codebases across 12 critical dimensions with evidence-based scoring. Use when auditing code quality, reviewing AI-generated code, scoring codebases against industry standards, or enforcing pre-commit quality gates. Use for quality audit, code review, codebase evaluation, security assessment, technical debt analysis.
Fetch, organize, and analyze LangSmith traces for debugging and evaluation. Use when you need to: query traces/runs by project, metadata, status, or time window; download traces to JSON; organize outcomes into passed/failed/error buckets; analyze token/message/tool-call patterns; compare passed vs failed behavior; or investigate benchmark and production failures.
AI situational awareness — internal threat detection for hallucination risk, scope creep, and context degradation. Maps Cooper color codes to reasoning states and OODA loop to real-time decisions. Use during any task where reasoning quality matters, when operating in unfamiliar territory, after detecting early warning signs such as an uncertain fact or suspicious tool result, or before high-stakes output like irreversible changes or architectural decisions.
Specialized business logic evaluator for the Evaluate-Loop. Use this for evaluating tracks that implement core product logic — pipelines, dependency resolution, state machines, pricing/tier enforcement, packaging. Checks feature correctness against product rules, edge cases, state transitions, data flow, and user journey completeness. Dispatched by loop-execution-evaluator when track type is 'business-logic', 'generator', or 'core-feature'. Triggered by: 'evaluate logic', 'test business rules', 'verify business rules', 'check feature'.
Design LLM-as-Judge evaluators for subjective criteria that code-based checks cannot handle. Use when a failure mode requires interpretation (tone, faithfulness, relevance, completeness). Do NOT use when the failure mode can be checked with code (regex, schema validation, execution tests). Do NOT use when you need to validate or calibrate the judge — use validate-evaluator instead.
Create diverse synthetic test inputs for LLM pipeline evaluation using dimension-based tuple generation. Use when bootstrapping an eval dataset, when real user data is sparse, or when stress-testing specific failure hypotheses. Do NOT use when you already have 100+ representative real traces (use stratified sampling instead), or when the task is collecting production logs.
INVOKE THIS SKILL when creating evaluation datasets, uploading datasets to LangSmith, or managing existing datasets. Covers dataset types (final_response, single_step, trajectory, RAG), CLI management commands, SDK-based creation, and example management. Uses the langsmith CLI tool.