Found 1,928 Skills
Learn how to extend Dart with JavaScript-style "truthy" checks to simplify conditional logic and value evaluation.
Use this skill for ANY question about creating test or evaluation datasets for LangChain agents. Covers generating datasets from traces (final_response, single_step, trajectory, RAG types), uploading to LangSmith, and managing evaluation data.
Systematic LLM prompt engineering: analyzes existing prompts for failure modes, generates structured variants (direct, few-shot, chain-of-thought), designs evaluation rubrics with weighted criteria, and produces test case suites for comparing prompt performance. Triggers on: "prompt engineering", "prompt lab", "generate prompt variants", "A/B test prompts", "evaluate prompt", "optimize prompt", "write a better prompt", "prompt design", "prompt iteration", "few-shot examples", "chain-of-thought prompt", "prompt failure modes", "improve this prompt". Use this skill when designing, improving, or evaluating LLM prompts specifically. NOT for evaluating Claude Code skills or SKILL.md files — use skill-evaluator instead.
Define the design rules (Skill Laws) that all Skills must follow, including core principles such as AI-first, human-centric, and ready-to-use. Use when a user creates a new Skill, optimizes an existing Skill, asks about Skill design specifications, or needs to evaluate Skill quality.
A decision-support framework that evaluates systems, architectures, and strategies through the entropy (decay) vs negentropy (growth) lens, while surfacing tacit knowledge gaps. Use this skill whenever the user is making architecture decisions, evaluating system designs, reviewing technical approaches, choosing between options, auditing existing systems, or planning strategies. Also trigger when the user explicitly asks to "apply the negentropy lens", mentions "entropy", "negentropy", "tacit knowledge", "knowledge engine", or "flip the switch". Nudge activation when you detect the user is at a decision point — even if they haven't asked for this lens — by briefly noting the entropic/negentropic dimension before proceeding.
Systematic design quality evaluation across hierarchy, type, color, space, craft, and system. Use when evaluating whether a design is ready to ship, running quality audits, or setting quality standards.
Use when the workflow needs to self-correct, improve over time, or establish feedback loops and evaluation cycles.
Expert skill for using Future AGI — the open-source end-to-end platform for evaluating, observing, and improving LLM and AI agent applications with tracing, evals, simulations, datasets, gateway, and guardrails.
Research and discovery workflow for document deliverables — competitive analyses, architecture comparisons, ADR scaffolding, literature reviews, vendor evaluations. No TDD requirement. Phases: gathering → synthesizing → completed. Triggers: 'discover', 'research', 'explore topic', or /discover.
Sector-rotation snapshot across A-share, HK, and US markets — point-in-time multi-factor scoring of momentum, capital flow, and valuation to rank sectors by current cycle strength. For ongoing 6–12 month cycle positioning and allocation recommendations use longbridge-sector-monitor. Triggers: "行业轮动", "板块轮动", "行业动量排名", "强势板块", "弱势板块", "行业资金流", "板块涨幅榜", "行業輪動", "板塊輪動", "行業動量排名", "強勢板塊", "弱勢板塊", "行業資金流", "板塊漲幅榜", "sector rotation", "sector momentum ranking", "leading sector", "lagging sector", "sector capital flow", "sector strength ranking".
Comprehensive testing doctrine for software and AI systems — covers positive patterns, anti-patterns, gates for coding agents writing tests, CI discipline, and an LLM/agent evaluation primer. Use when authoring or reviewing tests, adding mocks, deciding test placement, generating tests via agents, debugging flaky CI, designing eval suites for LLM features, or rebuilding a brittle test suite. Contains 12 positive patterns (selector hierarchy, table-driven, builders, real-system gates), 25 anti-patterns across Brittleness, Flakiness, Mock-misuse, Process, and AI-specific families, 7 mandatory gates for agents writing tests, flaky-test taxonomy with quarantine workflow, contract / property / mutation testing patterns, and an oracle-ladder primer for LLM-as-judge and agent eval. Language-agnostic — pseudo-code only. Don't use for general code review, library-specific debugging unrelated to tests, non-testing CI pipeline design, or production observability.
Create custom LLM evaluation benchmarks using the BYOB decorator framework. Use when the user wants to (1) create a new benchmark from a dataset, (2) pick or write a scorer, (3) compile and run a BYOB benchmark, (4) containerize a benchmark, or (5) use LLM-as-Judge evaluation. Triggers on mentions of BYOB, custom benchmark, bring your own benchmark, scorer, or benchmark compilation.