Found 1,905 Skills
Graduate a workflow insight from learned/<topic>.md into AGENTS.md as a permanent constraint. Use when a lesson is stable enough to apply to every future session.
Industry valuation rank time series for a single stock via Longbridge — tracks how a stock's PE / PB / PS / dividend-yield rank within its sector has changed over time (rank N of total M). Answers "is my stock becoming relatively cheaper or more expensive vs peers?" Complements longbridge-valuation (single-stock percentile history) and longbridge-industry-valuation (current sector snapshot). Triggers: "行业排名变化", "估值排名", "PE排名历史", "行业估值位置", "排名走势", "估值相对同业", "行業排名變化", "估值排名", "PE排名歷史", "行業估值位置", "排名走勢", "valuation rank", "industry rank history", "PE rank trend", "relative valuation rank", "sector ranking over time", "how does AAPL rank in industry PE".
Technology stack evaluation and comparison with TCO analysis, security assessment, and ecosystem health scoring. Use when comparing frameworks, evaluating technology stacks, calculating total cost of ownership, assessing migration paths, or analyzing ecosystem viability.
Use this skill when you need to test or evaluate LangGraph/LangChain agents: writing unit or integration tests, generating test scaffolds, mocking LLM/tool behavior, running trajectory evaluation (match or LLM-as-judge), running LangSmith dataset evaluations, and comparing two agent versions with A/B-style offline analysis. Use it for Python and JavaScript/TypeScript workflows, evaluator design, experiment setup, regression gates, and debugging flaky/incorrect evaluation results.
LLM-as-judge evaluation framework with a 5-dimension rubric (accuracy, groundedness, coherence, completeness, helpfulness) for scoring AI-generated content quality, with weighted composite scores and evidence citations.
Create validated LLM-as-a-Judge evaluators following best practices — binary Pass/Fail judges with TPR/TNR validation for measuring specific failure modes. Use when you need to automate quality checks, build guardrails, or measure a specific failure mode identified during trace analysis. Do NOT use when failures are fixable with prompt changes (use optimize-prompt) or when failure modes are unknown (use analyze-trace-failures first).
Industry valuation comparison and distribution analysis via Longbridge — cross-peer valuation matrix (PE / PB / PS / dividend yield), industry-percentile ranking, and industry premium / discount for a single stock. Triggers: "行业估值", "行业溢价", "行业折价", "行业对比", "行业百分位", "同行业估值", "板块估值", "行业贵不贵", "行業估值", "行業溢價", "行業折價", "行業對比", "行業百分位", "板塊估值", "industry valuation", "sector valuation", "industry premium", "industry percentile", "peer valuation", "sector PE", "TSLA.US industry valuation", "700.HK sector comparison".
Help users make better hiring decisions. Use when someone is evaluating job candidates, making hiring decisions, conducting reference checks, reviewing work samples or take-homes, calibrating their hiring bar, or deciding between finalists.
Use when testing Ralph's hat collection presets, validating preset configurations, or auditing the preset library for bugs and UX issues.
Rigorously judges and scores story texts, analyzing quality along three dimensions: market potential, innovation, and content highlights. Suitable for initial novel screening and for multi-dimensional evaluation and scoring.
Use when evaluating agent performance, building test frameworks, measuring quality, or asking about "agent evaluation", "LLM-as-judge", "agent testing", "quality metrics", "evaluation rubrics", or "agent benchmarks".
Evaluates agent skills against Anthropic's best practices. Use when asked to review, evaluate, assess, or audit a skill for quality. Analyzes SKILL.md structure, naming conventions, description quality, content organization, and identifies anti-patterns. Produces actionable improvement recommendations.