Use when validating subjective quality criteria that cannot be deterministically tested — applies LLM-based evaluation with structured rubrics for tone, aesthetics, UX feel, documentation quality, and code readability. Triggers: documentation quality check, error message tone review, UX copy evaluation, code readability assessment, design aesthetic review.
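A minimal sketch of the rubric-based LLM evaluation this skill describes, assuming an OpenAI-style chat API; the rubric dimensions, model name, and JSON shape are illustrative, not the skill's actual internals:

```python
from openai import OpenAI

# Illustrative rubric; the skill's real rubrics are not shown here.
RUBRIC = (
    "Score the text 1-5 on each dimension and return JSON "
    '{"tone": n, "clarity": n, "actionability": n, "notes": "..."}:\n'
    "1. Tone: professional yet approachable\n"
    "2. Clarity: a newcomer can follow it\n"
    "3. Actionability: concrete next steps are given"
)

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def judge(text: str) -> str:
    """Ask a judge model to grade `text` against the rubric."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # any capable judge model; assumed here
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user", "content": text},
        ],
    )
    return resp.choices[0].message.content
```

The structured JSON output is what makes the subjective judgment repeatable: the same rubric applied across revisions yields comparable scores.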
Evaluate the text of a README file, score it out of 100, and provide specific, actionable improvement suggestions.
Set up and improve harness engineering (AGENTS.md, docs/, lint rules, eval systems, project-level prompt engineering) for AI-agent-friendly codebases. Triggers on: new/empty project setup for AI agents, AGENTS.md or CLAUDE.md creation, harness engineering questions, making agents work better on a codebase. ALSO triggers when users are frustrated or complaining about agent quality — e.g. 'the agent keeps ignoring conventions', 'it never follows instructions', 'why does it keep doing X', 'the agent is broken' — because poor agent output almost always signals harness gaps, not model problems. Covers: context engineering, architectural constraints, multi-agent coordination, evaluation, harnesses for long-running agents, and diagnosis of agent quality issues.
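As a concrete anchor for what harness engineering produces, here is a minimal AGENTS.md sketch; the section names and commands are hypothetical examples, not a prescribed template:

```markdown
# AGENTS.md (minimal sketch; all sections illustrative)

## Build & test
- Run `npm test` before every commit; it must pass.

## Conventions
- New modules live in `src/`; colocate tests as `*.test.ts`.

## Never
- Do not edit generated files under `dist/`.
```

Short, verifiable rules like these are what agents can reliably follow; vague guidance is where "the agent keeps ignoring conventions" complaints usually originate.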
Instrument, trace, evaluate, and monitor LLM applications and AI agents with LangSmith. Use when setting up observability for LLM pipelines, running offline or online evaluations, managing prompts in the Prompt Hub, creating datasets for regression testing, or deploying agent servers. Triggers on: langsmith, langchain tracing, llm tracing, llm observability, llm evaluation, trace llm calls, @traceable, wrap_openai, langsmith evaluate, langsmith dataset, langsmith feedback, langsmith prompt hub, langsmith project, llm monitoring, llm debugging, llm quality, openevals, langsmith cli, langsmith experiment, annotate llm, llm judge.
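A minimal tracing sketch using the documented langsmith `@traceable` decorator and `wrap_openai` wrapper; it assumes LANGSMITH_TRACING=true and LANGSMITH_API_KEY are set in the environment, and the model name is illustrative:

```python
from langsmith import traceable
from langsmith.wrappers import wrap_openai
from openai import OpenAI

client = wrap_openai(OpenAI())  # every completion call is logged as a run

@traceable  # groups the pipeline under one parent trace
def answer(question: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "user", "content": question}],
    )
    return resp.choices[0].message.content

print(answer("What does @traceable record?"))
```

With only these two touch points, the nested LLM call appears as a child run of `answer` in the LangSmith project, which is the foundation the evaluation and monitoring features build on.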
General-purpose NocoBase reference utilities covering cross-cutting topics such as evaluator engines and expression syntax. Use when you need authoritative reference information that applies across multiple NocoBase features.
Critical analysis of research papers, academic manuscripts, preprints, and technical studies — evaluating methodology, claims-evidence alignment, contribution significance, and intellectual honesty. Produces coherent analytical responses (not checklists) that distinguish genuine weaknesses from standard field limitations. Governs intellectual posture: collegial reader, not adversarial reviewer. Triggers on: "critique this paper", "review this research", "what do you think of this paper", "analyze this study", "evaluate the methodology", "is this paper sound", "assess this research", "strengths and weaknesses of this paper", "does the evidence support the claims". Use this skill when the user provides a research paper, preprint, or technical study and asks for critical evaluation of its scientific merit, methodology, or contribution — not formatting, citation hygiene, or submission readiness (use manuscript-review for those).
Analyze a startup from three perspectives: VC investor, job applicant, and CEO/founder. Use this skill whenever the user wants to evaluate a startup, assess whether to invest in or join a startup, do due diligence, evaluate a job offer from a startup, understand a startup's competitive position, or assess company health and trajectory. Triggers: "analyze this startup", "should I join [company]", "is [company] a good investment", "evaluate [company]", "due diligence on [company]", "what do you think of [startup]", "should I take this startup job offer", "how healthy is [company]", "startup assessment", "company analysis", "is [company] worth joining", "what's the outlook for [company]", "research [company] for me", any mention of evaluating or assessing a startup or tech company from investment, career, or strategic perspectives — provide all three perspectives by default.
Designs structured benchmarks for comparing algorithms, models, or implementations. Selects appropriate metrics (latency, throughput, memory, accuracy), designs representative test cases, captures hardware/software context, produces comparison tables with tradeoff analysis, and includes reproduction instructions. Triggers on: "benchmark", "compare performance", "which is faster", "latency comparison", "memory comparison", "run benchmark", "design benchmark", "compare implementations", "evaluate algorithms", "performance comparison", "throughput test", "speed test". Use this skill when comparing two or more implementations, algorithms, or models.
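A minimal sketch of the timing harness such a benchmark might generate; the repeat count, candidate implementations, and reported statistics are assumptions for illustration:

```python
import statistics
import time

def bench(fn, arg, repeats: int = 30) -> dict:
    """Median/stdev wall-clock latency for fn(arg) over several repeats."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(arg)
        samples.append(time.perf_counter() - start)
    return {"median_s": statistics.median(samples),
            "stdev_s": statistics.stdev(samples)}

# Hypothetical comparison: two ways to sort the same reversed list.
data = list(range(10_000, 0, -1))
print("sorted()   ", bench(lambda d: sorted(d), data))
print("list.sort()", bench(lambda d: list(d).sort(), data))
```

A real benchmark report would add warm-up iterations and record the hardware/software context alongside these numbers, as the skill description notes.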
Design and evaluate compression strategies for long-running sessions.
Analyze multi-round evaluation score data, tally key indicators, and calculate rating levels. Suitable for analyzing score trends and computing S/A/B ratings.
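A sketch of the rating calculation, assuming multi-round averages mapped through letter-grade cut-offs; the 90/80 thresholds are hypothetical, since the description does not define them:

```python
from statistics import mean

def rating(scores: list[float]) -> str:
    """Map a multi-round average to S/A/B (assumed 90/80 cut-offs)."""
    avg = mean(scores)
    if avg >= 90:
        return "S"
    if avg >= 80:
        return "A"
    return "B"

rounds = [92.0, 88.5, 95.0]  # example multi-round scores
print(rating(rounds))        # -> "S" under these assumed thresholds
```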
Market intelligence, competitive analysis, technical evaluations, and technology decisions. Use when researching companies, analyzing competitors, evaluating frameworks, or making tech stack decisions.
This skill should be used when the user wants to invoke the Codex CLI for complex coding tasks requiring high reasoning capabilities. Trigger phrases include "use codex", "ask codex", "run codex", "call codex", "codex cli", "GPT-5 reasoning", "OpenAI reasoning", or when users request complex implementation challenges, advanced reasoning, architecture design, or high-reasoning model assistance. Automatically triggers on codex-related requests and supports session continuation for iterative development.
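A minimal sketch of delegating a one-shot task to the Codex CLI from Python; `codex exec` is the CLI's non-interactive mode in recent releases, but treat the exact subcommand as an assumption to verify against your installed version:

```python
import subprocess

# Run a one-shot, non-interactive Codex task; `codex exec` is assumed
# to be available in the installed CLI version.
result = subprocess.run(
    ["codex", "exec", "Explain the tradeoffs in this module's caching layer"],
    capture_output=True,
    text=True,
    check=False,
)
print(result.stdout or result.stderr)
```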