Search Results: model-testing

Found 2 Skills

AI & Machine Learninglaunchdarkly/agent-skills

aiconfig-variations

Guide for experimenting with AI configurations. Helps you test different models, prompts, and parameters to find what works best through systematic experimentation.

🇺🇸|EnglishTranslated

Testing & QAprobabl-ai/skills

smoke-test-ml-pipeline

Owns the smoke test contract for an ML experiment: a small, diagnostic-by-construction pytest that fits the experiment's learner on a portion of the real `data/` source and predicts on a *disjoint* portion that deliberately carries **no pre-history buffer**. The assertion is structural — the number of predictions must equal the number of rows in the predict grid. A pipeline that loads-then-features-then-splits will silently drop the cold-start rows of the predict slice and the test will fail with a row-count mismatch; a pipeline that marks X early and references upstream history nodes from feature steps will pass trivially. The smoke test is the executable proof of the X-marker placement rule from `build-ml-pipeline`. TRIGGER when: `test-ml-pipeline` has dispatched here to write the smoke test for an approved experiment; `pytest tests/smoke/` is failing on row count; the user asks "why is the smoke test failing?"; a pipeline edit in `build-ml-pipeline` needs an executable proof; an experiment script changes the pipeline shape and the matching smoke test needs revisiting. SKIP when: the design note does not exist or is not yet approved (route to `iterate-ml-experiment`); the user is asking about a regression test or schema invariant (route to `regression-test-ml-pipeline` / `distribution-test-ml-pipeline` once those exist); the question is the *interpretation* of CV metrics, not predict-time correctness (route to `evaluate-ml-pipeline`). HOW TO USE: read the matching experiment's `journal/NN_*.md` and `experiments/NN_*.py` first to understand the pipeline's source binding (what env-dict keys does `build_learner` expect?). Then construct two env-dicts from the **real `data/` source** — a train env and a predict env — such that the predict env carries *only the rows we want predictions for* and *no pre-history buffer*. The hard assertion is that the prediction count matches the predict-env row count exactly. The soft assertion is that the smoke set's MAE is within `3 × CV_mean` (or the task-appropriate analogue). **Do not write the design note or run CV — that's other skills' job.**

🇺🇸|EnglishTranslated