Detect non-deterministic (flaky) tests by reading CI run logs or test result history. Aggregates pass rates per test, identifies intermittent failures, recommends quarantine or fix, and maintains a flaky test registry. Best run during Polish phase or after multiple CI runs.
`npx skill4agent add donchitos/claude-code-game-studios test-flakiness`

`tests/regression-suite.md`, `production/qa/flakiness-report-[date].md`, `/regression-suite`

`/test-flakiness [ci-log-path]`, `/test-flakiness scan`, `.github/`, `/test-flakiness registry`, `scan`, `registry`

`ls -t .github/ 2>/dev/null`
`ls -t test-results/ 2>/dev/null`

`test-results/`, `.xml`, `test-results/`, `Saved/Logs/`, `Result: Success`, `Result: Fail`

"No CI log data found. To detect flaky tests, this skill needs test result history from multiple runs. Options:
- Run the test suite at least 3 times and collect the output logs
- Check CI pipeline output and save a log to `test-results/`
- Run `/test-flakiness registry` to review tests already flagged as flaky in `tests/regression-suite.md`"
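
The per-test history that the skill builds from multiple runs can be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: it assumes JUnit-style XML reports in which each `<testcase>` carries `classname` and `name` attributes and a `<failure>` or `<error>` child marks a failed run. The function names `collect_history` and `is_intermittent` and the two inline sample reports are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def collect_history(xml_reports):
    """Build test_id -> [run1_result, run2_result, ...] from JUnit-style XML strings."""
    history = defaultdict(list)
    for report in xml_reports:  # one XML document per run, oldest first
        root = ET.fromstring(report)
        for case in root.iter("testcase"):
            test_id = f"{case.get('classname', '')}.{case.get('name', '')}"
            # A <failure> or <error> child marks the test as failed in this run.
            failed = case.find("failure") is not None or case.find("error") is not None
            history[test_id].append("FAILED" if failed else "PASSED")
    return dict(history)


def is_intermittent(results):
    """True when the history contains both passes and failures."""
    return "PASSED" in results and "FAILED" in results


# Made-up sample reports standing in for files under test-results/.
RUN_OK = (
    '<testsuite>'
    '<testcase classname="combat" name="test_hit"/>'
    '<testcase classname="ui" name="test_menu"/>'
    '</testsuite>'
)
RUN_BAD = (
    '<testsuite>'
    '<testcase classname="combat" name="test_hit"><failure message="timeout"/></testcase>'
    '<testcase classname="ui" name="test_menu"/>'
    '</testsuite>'
)

history = collect_history([RUN_OK, RUN_BAD, RUN_OK])
flaky = sorted(t for t, r in history.items() if is_intermittent(r))
```

A test that fails in every run is consistently broken rather than flaky, which is why `is_intermittent` requires both outcomes to appear.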
`<testcase name=`, `<failure`, `<error`, `classname`, `name`

`PASSED`, `FAILED`, `Result: Success`, `Result: Fail`, `Test passed`, `Test failed`

`test_id → [run1_result, run2_result, run3_result, ...]`

| Cause | Symptoms | Fix direction |
|---|---|---|
| Timing / async | Fails after awaiting signals or timers; pass rate correlates with system load | Add explicit await/synchronisation; avoid time-based delays |
| Order dependency | Fails when run after specific other tests; passes in isolation | Add proper setup/teardown; ensure test isolation |
| Random seed | Fails intermittently with no pattern; involves RNG | Pass explicit seed; don't use |
| Resource leak | Fails more often later in a test run | Fix cleanup in teardown; check orphan nodes (Godot) or object disposal (Unity) |
| External state | Fails when a file, scene, or global exists from a prior test | Isolate test from file system; use in-memory mocks |
| Floating point | Fails on comparisons like | Use epsilon comparison ( |
| Scene/prefab load race | Fails when scenes are not yet ready | Await one frame after instantiation; use |
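
Two of the fix directions in the table, the epsilon comparison and the explicit seed, can be illustrated in a few lines. This is a sketch in plain Python rather than engine code; `roll_damage` and `rolls` are hypothetical stand-ins for game logic under test, and the seed value and tolerance are arbitrary choices.

```python
import math
import random


def roll_damage(rng, base=10):
    """Hypothetical game function: takes the RNG as a parameter instead of using global state."""
    return base + rng.randint(0, 5)


def rolls(seed, n=5):
    """Produce n damage rolls from a generator seeded explicitly by the test."""
    rng = random.Random(seed)
    return [roll_damage(rng) for _ in range(n)]


# Floating point: exact equality fails because 0.1 + 0.2 is not exactly 0.3 in binary.
total = 0.1 + 0.2
exact_match = total == 0.3
close_match = math.isclose(total, 0.3, rel_tol=1e-9)

# Random seed: the same explicit seed yields the same sequence on every run.
repeatable = rolls(1234) == rolls(1234)
```

Passing the generator in as an argument keeps the test independent of whatever other tests did to the global random state.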
"Quarantine this test immediately. Disable it in CI by adding/@pytest.mark.skip/[Ignore]annotation. Log it inGdUnitSkipquarantine section. The test is now opt-in only. Fix the root cause before removing quarantine."tests/regression-suite.md
"This test is intermittently unreliable. Root cause appears to be [cause]. Suggested fix: [specific fix based on cause classification]. Do not quarantine yet — fix the test directly."
"This test shows suspected flakiness. Collect more run data before quarantining. Note it as 'suspected' in the regression suite."
## Flakiness Detection Results
**Runs analysed**: [N]
**Tests tracked**: [N]
### Flaky Tests Found
| Test | System | Fail Rate | Likely Cause | Recommendation |
|------|--------|-----------|--------------|----------------|
| [test_name] | [system] | [N]% | Timing | Quarantine + fix async |
| [test_name] | [system] | [N]% | Float comparison | Fix: use epsilon compare |
| [test_name] | [system] | [N]% | Order dependency | Investigate teardown |
### Clean Tests (no flakiness detected)
[N] tests ran across [N] runs with consistent results — no flakiness detected.
### Data Limitations
[Note if fewer than 5 runs were available: fewer runs mean less statistical confidence]

`tests/regression-suite.md`, `Edit`, `production/qa/flakiness-report-[date].md`, `is_equal_approx`
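
The "Flaky Tests Found" table in the report template could be filled in programmatically along these lines. This is a sketch under assumptions: `render_flaky_table` and the dictionary keys are hypothetical, and the sample finding is made up to show the output shape.

```python
def render_flaky_table(findings):
    """Render findings as the markdown table used in the report template."""
    header = [
        "| Test | System | Fail Rate | Likely Cause | Recommendation |",
        "|------|--------|-----------|--------------|----------------|",
    ]
    rows = [
        f"| {f['test']} | {f['system']} | {f['fail_rate']:.0%} "
        f"| {f['cause']} | {f['recommendation']} |"
        for f in findings
    ]
    return "\n".join(header + rows)


# Made-up example finding; real values would come from the aggregated run history.
sample = [
    {
        "test": "test_hit",
        "system": "combat",
        "fail_rate": 0.2,
        "cause": "Timing",
        "recommendation": "Quarantine + fix async",
    }
]
table = render_flaky_table(sample)
```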