Found 1,927 Skills
Fetch, organize, and analyze LangSmith traces for debugging and evaluation. Use when you need to: query traces/runs by project, metadata, status, or time window; download traces to JSON; organize outcomes into passed/failed/error buckets; analyze token/message/tool-call patterns; compare passed vs failed behavior; or investigate benchmark and production failures.
AI situational awareness — internal threat detection for hallucination risk, scope creep, and context degradation. Maps Cooper color codes to reasoning states and OODA loop to real-time decisions. Use during any task where reasoning quality matters, when operating in unfamiliar territory, after detecting early warning signs such as an uncertain fact or suspicious tool result, or before high-stakes output like irreversible changes or architectural decisions.
Specialized business logic evaluator for the Evaluate-Loop. Use this for evaluating tracks that implement core product logic — pipelines, dependency resolution, state machines, pricing/tier enforcement, packaging. Checks feature correctness against product rules, edge cases, state transitions, data flow, and user journey completeness. Dispatched by loop-execution-evaluator when track type is 'business-logic', 'generator', or 'core-feature'. Triggered by: 'evaluate logic', 'test business rules', 'verify business rules', 'check feature'.
Design LLM-as-Judge evaluators for subjective criteria that code-based checks cannot handle. Use when a failure mode requires interpretation (tone, faithfulness, relevance, completeness). Do NOT use when the failure mode can be checked with code (regex, schema validation, execution tests). Do NOT use when you need to validate or calibrate the judge — use validate-evaluator instead.
Create diverse synthetic test inputs for LLM pipeline evaluation using dimension-based tuple generation. Use when bootstrapping an eval dataset, when real user data is sparse, or when stress-testing specific failure hypotheses. Do NOT use when you already have 100+ representative real traces (use stratified sampling instead), or when the task is collecting production logs.
INVOKE THIS SKILL when creating evaluation datasets, uploading datasets to LangSmith, or managing existing datasets. Covers dataset types (final_response, single_step, trajectory, RAG), CLI management commands, SDK-based creation, and example management. Uses the langsmith CLI tool.
Use when the user faces brand impersonation, fake websites, phishing sites, or trademark infringement. Also use when the user mentions "fake site," "impersonation," "phishing site," "trademark infringement," "domain squatting," or "brand abuse."
Personal writing style preferences. Reference this skill when writing, translating, or editing content to ensure consistent style, punctuation, and formatting.
Evaluate AI contribution in projects using the AI Assessment Scale (AIAS) 5-level framework. Measure AI involvement from no AI to full AI exploration across development stages.
Connect AI agents to your live Chrome session via CDP for real-time tab interaction, screenshots, and JS evaluation without logging in again.
Execute a comprehensive React Project Health Audit. Analyzes tech stack, architecture, state management, testing, code quality, performance, CI/CD, and documentation. Produces a Google Docs-ready report with section scores and weighted overall score. Use when the user asks to audit a React project, run a health check, evaluate frontend quality, or assess technical debt. Triggers on: 'react audit', 'health audit', 'react health', 'frontend audit', 'next.js audit', 'vite audit', 'project quality check'.
Review healthcare and EHR software interfaces against a comprehensive design style guide grounded in NIST, FDA, IEC 62366, ISO 9241, ISO 14971, WCAG 2.1, ONC SAFER, and HL7 FHIR standards. Produces a report-only assessment without modifying code or designs. Use when an agent needs to evaluate clinical UI screens, data display, forms, alerts, or workflows for patient-safety, usability, accessibility, and data-clarity compliance.