Loading...
Loading...
Found 1,162 Skills
After an agentic task completes, perform a retrospective analysis across 6 dimensions (goal alignment, efficiency, decision quality, error handling, communication, reusability). Score performance, identify inefficiency patterns, evaluate skill usage, and produce actionable improvement recommendations. Triggers on "how did it go", "retrospective", "review performance", "what could be better", or after any long agentic task completes.
Open-source AI observability platform for LLM tracing, evaluation, and monitoring. Use when debugging LLM applications with detailed traces, running evaluations on datasets, or monitoring production AI systems with real-time insights.
LLM observability platform for tracing, evaluation, and monitoring. Use when debugging LLM applications, evaluating model outputs against datasets, monitoring production systems, or building systematic testing pipelines for AI applications.
Analyze A/B test results with statistical significance, sample size validation, confidence intervals, and ship/extend/stop recommendations. Use when evaluating experiment results, checking if a test reached significance, interpreting split test data, or deciding whether to ship a variant.
When the user faces brand impersonation, fake websites, phishing sites, or trademark infringement. Also use when the user mentions "fake site," "impersonation," "phishing site," "trademark infringement," "domain squatting," or "brand abuse."
MUST READ before running any ADK evaluation. ADK evaluation methodology — eval metrics, evalset schema, LLM-as-judge, tool trajectory scoring, and common failure causes. Use when evaluating agent quality, running adk eval, or debugging eval results. Do NOT use for API code patterns (use adk-cheatsheet), deployment (use adk-deploy-guide), or project scaffolding (use adk-scaffold).
A-share Value Investment Analysis Tool, providing stock screening, in-depth individual stock analysis, industry comparison and valuation calculation functions. Based on value investment theory, it uses tushare to obtain public financial data, suitable for ordinary investors with low-frequency trading.
Create or evaluate an architecture decision record (ADR). Use when choosing between technologies (e.g., Kafka vs SQS), documenting a design decision with trade-offs and consequences, reviewing a system design proposal, or designing a new component from requirements and constraints.
Run a single experiment iteration. Edit the target file, evaluate, keep or discard.
A decision-support framework that evaluates systems, architectures, and strategies through the entropy (decay) vs negentropy (growth) lens, while surfacing tacit knowledge gaps. Use this skill whenever the user is making architecture decisions, evaluating system designs, reviewing technical approaches, choosing between options, auditing existing systems, or planning strategies. Also trigger when the user explicitly asks to "apply the negentropy lens", mentions "entropy", "negentropy", "tacit knowledge", "knowledge engine", or "flip the switch". Nudge activation when you detect the user is at a decision point — even if they haven't asked for this lens — by briefly noting the entropic/negentropic dimension before proceeding.
Design click/first-click tests to evaluate navigation and information findability.
Instrument Python LLM apps, build golden datasets, write eval-based tests, run them, and root-cause failures — covering the full eval-driven development cycle. Make sure to use this skill whenever a user is developing, testing, QA-ing, evaluating, or benchmarking a Python project that calls an LLM, even if they don't say "evals" explicitly. Use for making sure an AI app works correctly, catching regressions after prompt changes, debugging why an agent started behaving differently, or validating output quality before shipping.