Found 13 Skills
Consult this skill when building evaluation or scoring systems. Use when implementing evaluation systems, creating quality gates, designing scoring rubrics, or building decision frameworks. Do not use when a simple pass/fail check without scoring is sufficient.
Produces a concrete eval suite plan grounded in Microsoft's Eval Scenario Library and MS Learn agent evaluation guidance — scenario types, evaluation methods, quality signals, thresholds, and priority order — before any test cases are generated or evals are run.
Analyzes Copilot Studio evaluation CSV results using Microsoft's Triage & Improvement Playbook. Returns a SHIP / ITERATE / BLOCK verdict with root cause classification, diagnostic triage, prioritized remediation, and pattern analysis.
Build evaluation frameworks for agent systems. Use when testing agent performance, validating context engineering choices, or measuring improvements over time.
Formal evaluation framework for Claude Code sessions implementing eval-driven development (EDD) principles.
Expert in streamlining and enhancing the development of AI Agent Applications, including AI app / agent / workflow code generation, AI model comparison and recommendation, tracing setup, and evaluation planning / setup / execution.
Expert skill for generating GitHub Copilot skills from ING-internal documentation repositories. Use this skill when asked to create a skill from any ING documentation-as-code repo, generate a knowledge base skill for an ING framework, convert ING tool documentation into a Copilot skill, or turn any docs/ folder into an expert skill file. Also trigger when the user mentions "skill from docs", "generate skill", "create skill from repo", or references ING-internal frameworks like Baker, Merak, Kingsroad, or similar. Includes evaluation framework, grading agents, and benchmark tools for testing generated skills.
Iteratively improve any output until measurable criteria are met. Use when the user wants to refine existing work against specific standards — whether it's code, prose, data, config, or any other artifact. Triggers on phrases like "improve this", "make it better", "iterate", "refine", "keep improving", "not good enough yet", "optimize this", "polish this", "tighten this up", or when the user provides criteria and wants repeated improvement until they're satisfied. Also use when the user gives feedback on output and expects you to keep refining, even if they don't say "improve" explicitly.
Compare leading tech stocks to distinguish hype-driven overvaluation from fundamentally justified pricing, and identify undervalued tech names the market is overlooking. Use when the user asks to evaluate tech stock valuations, find overvalued or undervalued tech companies, assess whether a tech stock's growth justifies its multiple, compare tech company fundamentals, analyze revenue growth vs. valuation, or identify mispriced technology stocks.
Use when you need explicit quality criteria and scoring scales to evaluate work consistently, compare alternatives objectively, set acceptance thresholds, or reduce subjective bias, or when the user mentions a rubric, scoring criteria, quality standards, an evaluation framework, inter-rater reliability, or grading/assessing work.
Use when raising startup capital (pre-seed through Series C+): decide raise vs bootstrap, size a round, build a deck + data room, run investor targeting/outreach, negotiate SAFEs/term sheets, manage diligence, and set investor reporting cadence post-close.
Valuation and pricing framework focusing on valuation analysis, pricing logic, and investment decisions. This Skill is mainly applied to answering user questions, writing reports, and creating financial articles. It generates extensive content and is not suitable for simple conversation scenarios. Information and data can be obtained via the wind.financial.data tool using appropriate keywords or keyword combinations. Typical users want to know how to value a company, where its current valuation stands, why the market is willing to assign that valuation, and whether there is room for revaluation.