Found 1,905 Skills
Collect and organize online information about IPs (intellectual properties) and conduct multi-dimensional evaluation and scoring. Suitable for assessing the adaptation value of IPs such as novels and scripts, and for analyzing their market potential and innovative attributes.
Use this when you need to EVALUATE, IMPROVE, or OPTIMIZE an existing LLM agent's output quality, including improving tool selection accuracy or answer quality, reducing costs, or fixing issues where the agent gives wrong or incomplete responses. Evaluates agents systematically using MLflow evaluation with datasets, scorers, and tracing. Covers the end-to-end evaluation workflow or individual components (tracing setup, dataset creation, scorer definition, evaluation execution).
Use when exploring the ai-agent-skills catalog to find, compare, and evaluate skills before installing. Always use --fields to limit output size and --dry-run before committing to an install.
Valuation analysis for a single stock via Longbridge — current PE / PB / PS / EV-EBITDA snapshot, historical percentile (1–3 years), industry median + relative premium, industry rank. Triggers: "估值贵不贵", "是不是被低估", "PE 历史百分位", "PB 分位", "行业溢价", "行业折价", "X 现在适合买不", "估值水平", "估值貴不貴", "是否被低估", "PE 歷史分位", "行業溢價", "行業折價", "is X expensive", "is X undervalued", "PE percentile", "industry valuation premium", "valuation snapshot".
Configures and runs LLM evaluation using Promptfoo framework. Use when setting up prompt testing, creating evaluation configs (promptfooconfig.yaml), writing Python custom assertions, implementing llm-rubric for LLM-as-judge, or managing few-shot examples in prompts. Triggers on keywords like "promptfoo", "eval", "LLM evaluation", "prompt testing", or "model comparison".
Automated reproduction of comprehensive model evaluation benchmarks following the Benchmark Suite V3. Auto-activates for model benchmarking, comparison evaluation, or performance testing between AI models.
Evaluate and score stories against the evaluation criteria for vertical short dramas, covering dimensions such as core appeal points and story types. Suitable for assessing a story's potential for adaptation into a vertical short drama and for analyzing its market competitiveness.
Conduct expert heuristic evaluations using Nielsen's heuristics and domain-specific criteria.
Help users evaluate emerging technologies. Use when someone is assessing new tools, making build vs buy decisions, evaluating AI vendors, or deciding on technical architecture.
Master LLM-as-a-Judge evaluation techniques including direct scoring, pairwise comparison, rubric generation, and bias mitigation. Use when building evaluation systems, comparing model outputs, or establishing quality standards for AI-generated content.
Make an evidence-based hiring decision and produce a Candidate Evaluation Decision Pack (criteria + scorecard, signal log, work sample/trial plan + rubric, reference check script + summary, decision memo). Use for candidate evaluation, hiring decisions, reference checks, work samples/take-homes, and hiring bar calibration. Category: Hiring & Teams.
Evaluate trade-offs and produce a Trade-off Evaluation Pack (trade-off brief, options + criteria matrix, all-in cost/opportunity cost table, impact ranges, recommendation, stop/continue triggers). Use for trade-offs, pros and cons, cost-benefit, opportunity cost, build vs buy, ship fast vs ship better, and continue vs stop (sunk costs) decisions. Category: Leadership.