Internal sub-skill: agentic review of a printed CLI's sampled command output for plausibility issues that rule-based checks can't encode (substring-match relevance, format bugs, silent source drops, ranking failures). Invoked via the Skill tool by the main printing-press SKILL.md (Phase 4.85) and by printing-press-polish SKILL.md during the diagnostic loop. Not for direct user invocation; its actionable wrappers are `/printing-press` and `/printing-press-polish`.
Install: `npx skill4agent add mvanhorn/cli-printing-press printing-press-output-review`

Metadata: `user-invocable: false` (use `/printing-press` or `/printing-press-polish` instead), `context: fork`. Runs against `$CLI_DIR` after `scorecard --live-check`.

First capture the live-check results as JSON:

```sh
printing-press scorecard --dir "$CLI_DIR" --live-check --json > /tmp/output-review-livecheck.json 2>&1 || true
```

Review the sampled outputs from the shipped CLI at `$CLI_DIR`. You have these ground-truth sources:
- Sampled command output: read `/tmp/output-review-livecheck.json` and inspect the `live_check.features[]` array. Each entry has the command, example invocation, actual stdout (in `output_sample`, bounded to ~4 KiB), the pass/fail reason, and a `warnings` array (populated by rule-based checks like the raw-HTML-entity detector).
- Review only `status: pass` entries. Entries with `status: fail` either crashed, timed out, or had placeholder args (`<id>`, `<url>`) that never produced real output — their sample is empty and there's nothing for you to judge. Phase 5 dogfood handles test-coverage and exit-code concerns.
- `$CLI_DIR/research.json`: `novel_features` (planned behavior per feature) and `novel_features_built` (verified built commands).
- The CLI binary at `$CLI_DIR/<cli-name>-pp-cli` — you may invoke additional commands to gather more output when a finding needs verification.

For each of these checks, report findings under 50 words each. Only report issues a human user would notice in 5 minutes of hands-on testing — not every edge case a thorough QA pass might find:
- Output semantically matches query intent. For sampled novel features with a query argument, judge relevance beyond what the mechanical query-token check in live-check already enforced. A feature that passed live-check's `outputMentionsQuery` test still contains some query token somewhere — but "butter" matching only as a substring inside "buttermilk" results, or "brownies" returning a chili recipe because the extractor fell back to adjacent content, both slip past the mechanical check. Only flag when a human user would look at the top results and say "this isn't what I asked for." Skip this check when the example has no query argument.
- No obvious format bugs. Does the output contain raw HTML entities, mojibake (question marks or replacement chars in titles), or malformed URLs (pointing at category index pages, feed endpoints, or random-selector routes rather than canonical content permalinks)? Rule-based live-check catches numeric entities; this layer catches the broader class.
- Aggregation commands show all requested sources. For commands with a `--source`/`--site`/`--region` CSV flag: if the user requested N sources, does output show N, or does stderr explain the missing ones? Silent drops of failed sources are a top failure mode for fan-out commands.
- Result ordering/ranking makes sense. For commands that claim to rank or sort, does the top result look plausibly best given the query? Watch for broken score weights, off-by-one sort bugs, and silent fallback to recency when relevance computation fails.
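A minimal sketch, assuming the JSON layout described above (`live_check.features[]` entries with `status`, `command`, and `output_sample` fields): filter to the reviewable pass entries, then run a mechanical pre-screen for the format bugs a rule-based pass can catch before the judgment-call checks begin.

```python
import json
import re

# Entities like &amp; or &#8217; left raw in stdout are a format bug.
ENTITY = re.compile(r"&(#\d+|#x[0-9a-fA-F]+|[a-zA-Z]+);")

def reviewable(features):
    """Keep only status: pass entries; fail entries have empty samples."""
    return [e for e in features if e.get("status") == "pass"]

def prescreen(entry):
    """Mechanical pre-screen of one entry's sampled stdout."""
    sample = entry.get("output_sample", "")
    flags = []
    if ENTITY.search(sample):
        flags.append("raw HTML entity in output")
    if "\ufffd" in sample:
        flags.append("replacement char (mojibake) in output")
    return flags

if __name__ == "__main__":
    with open("/tmp/output-review-livecheck.json") as f:
        report = json.load(f)
    for entry in reviewable(report["live_check"]["features"]):
        for flag in prescreen(entry):
            print(f"{entry['command']}: {flag}")
```

The semantic checks (query intent, source coverage, ranking) remain judgment calls; a pre-screen like this only narrows where to look.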
Return a list of findings. For each: check name, severity (`warning` in Wave B; `error` reserved for Wave C), one-line description, one-sentence fix suggestion. If the CLI passes all four checks, return "PASS — no findings."
---OUTPUT-REVIEW-RESULT---
status: PASS
findings: []
---END-OUTPUT-REVIEW-RESULT---

---OUTPUT-REVIEW-RESULT---
status: WARN
findings:
  - check: <check-name>
    severity: warning
    description: <one-line>
    suggestion: <one-sentence>
  - ...
---END-OUTPUT-REVIEW-RESULT---

---OUTPUT-REVIEW-RESULT---
status: SKIP
reason: <one-line description>
findings: []
---END-OUTPUT-REVIEW-RESULT---
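A minimal emitter for the result block, assuming only the delimiters and fields shown in the templates above; the helper name `render_result` is hypothetical.

```python
def render_result(status, findings=(), reason=None):
    """Render one delimited result block (PASS, WARN, or SKIP)."""
    lines = ["---OUTPUT-REVIEW-RESULT---", f"status: {status}"]
    if reason:
        lines.append(f"reason: {reason}")
    if findings:
        lines.append("findings:")
        for f in findings:
            lines.append(f"  - check: {f['check']}")
            lines.append(f"    severity: {f.get('severity', 'warning')}")
            lines.append(f"    description: {f['description']}")
            lines.append(f"    suggestion: {f['suggestion']}")
    else:
        lines.append("findings: []")
    lines.append("---END-OUTPUT-REVIEW-RESULT---")
    return "\n".join(lines)
```

For example, `render_result("SKIP", reason="no sampled output")` yields the SKIP template with an empty findings list, matching the third template above.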