skill-tester
Skill Tester
The agent validates skill packages for structure compliance, tests Python scripts for syntax and stdlib-only imports, and scores quality across four dimensions (documentation, code quality, completeness, usability) with letter grades and improvement recommendations. It supports BASIC, STANDARD, and POWERFUL tier classification.
Quick Start
```bash
# Validate skill structure and documentation
python skill_validator.py engineering/my-skill --tier POWERFUL --json

# Test all Python scripts in a skill
python script_tester.py engineering/my-skill --timeout 30

# Score quality with improvement roadmap
python quality_scorer.py engineering/my-skill --detailed --minimum-score 75
```
---
Core Workflows
Workflow 1: Validate a New Skill
- Run `skill_validator.py` with target tier to check structure, frontmatter, required sections, and scripts
- Review errors (blocking) and warnings (non-blocking) in the report
- Fix all errors -- missing SKILL.md, invalid frontmatter, external imports
- Validation checkpoint: Score >= 60; zero errors; all scripts pass `ast.parse()`
```bash
python skill_validator.py engineering/my-skill --tier STANDARD --json
```
Workflow 2: Test Skill Scripts
- Run `script_tester.py` to execute syntax validation, import analysis, and runtime tests
- Review per-script results: argparse detection, `--help` output, sample data execution
- Fix failures: add `if __name__ == "__main__"` guards, replace external imports with stdlib
- Validation checkpoint: All scripts pass syntax; zero external imports; `--help` exits cleanly
```bash
python script_tester.py engineering/my-skill --timeout 60 --json
```
Workflow 3: Score and Improve Quality
- Run `quality_scorer.py` with `--detailed` for component-level breakdowns
- Review the prioritized improvement roadmap (up to 5 items)
- Address HIGH-priority items first (documentation gaps, missing error handling)
- Re-run to verify score improvement
- Validation checkpoint: Overall score >= 75; no dimension below 50%
```bash
python quality_scorer.py engineering/my-skill --detailed --minimum-score 75 --json
```
Tier Requirements
| Requirement | BASIC | STANDARD | POWERFUL |
|---|---|---|---|
| SKILL.md lines | 100+ | 200+ | 300+ |
| Python scripts | 1 (100-300 LOC) | 1-2 (300-500 LOC) | 2-3 (500-800 LOC) |
| Argparse | Basic | Subcommands | Multiple modes |
| Output formats | Single | JSON + text | JSON + text + validation |
| Error handling | Essential | Comprehensive | Advanced recovery |
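The SKILL.md line thresholds in the table can be illustrated with a small sketch. It assumes, as the troubleshooting notes state, that only non-blank lines count; the names `TIER_MIN_LINES`, `count_substantive_lines`, and `meets_tier` are hypothetical and not taken from `skill_validator.py`, and the sketch does not try to model the exclusion of boilerplate.

```python
# Minimum SKILL.md line counts per tier, copied from the table above.
TIER_MIN_LINES = {"BASIC": 100, "STANDARD": 200, "POWERFUL": 300}


def count_substantive_lines(text: str) -> int:
    """Count lines that contain something other than whitespace."""
    return sum(1 for line in text.splitlines() if line.strip())


def meets_tier(text: str, tier: str) -> bool:
    """Return True when the non-blank line count reaches the tier minimum."""
    return count_substantive_lines(text) >= TIER_MIN_LINES[tier]


if __name__ == "__main__":
    # 150 text lines separated by blank lines: the blank lines are ignored.
    sample = "\n\n".join(f"line {i}" for i in range(150))
    print(count_substantive_lines(sample))
    print(meets_tier(sample, "BASIC"), meets_tier(sample, "STANDARD"))
```

Counting this way means that adding blank lines to a short SKILL.md does not move it into a higher tier.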
Quality Scoring Dimensions
| Dimension | Weight | Measures |
|---|---|---|
| Documentation | 25% | SKILL.md depth, README clarity, reference quality |
| Code Quality | 25% | Complexity, error handling, output consistency |
| Completeness | 25% | Required files, sample data, expected outputs |
| Usability | 25% | Argparse help text, example clarity, ease of setup |
Grades: A+ (97+) through F (<40). Exit code 0 for A+ through C-, exit code 2 for D, exit code 1 for F.
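As a rough illustration of how four equally weighted dimensions could combine into one overall score, the sketch below applies the 25% weights from the table and the Workflow 3 checkpoint (overall >= 75, no dimension below 50%). The function names and the 0-100 scale for each dimension are assumptions, not the actual internals of `quality_scorer.py`.

```python
# Equal weights from the dimensions table above.
WEIGHTS = {
    "documentation": 0.25,
    "code_quality": 0.25,
    "completeness": 0.25,
    "usability": 0.25,
}


def overall_score(dimension_scores: dict) -> float:
    """Weighted sum of per-dimension scores (each assumed to be 0-100)."""
    missing = WEIGHTS.keys() - dimension_scores.keys()
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    return sum(WEIGHTS[name] * dimension_scores[name] for name in WEIGHTS)


def passes_checkpoint(dimension_scores: dict, minimum: float = 75.0, floor: float = 50.0) -> bool:
    """Overall score must reach `minimum` and no dimension may fall below `floor`."""
    lowest = min(dimension_scores[name] for name in WEIGHTS)
    return overall_score(dimension_scores) >= minimum and lowest >= floor
```

Under these assumptions a skill with three strong dimensions and one very weak one can still fail the checkpoint, because the per-dimension floor is checked separately from the overall score.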
CI/CD Integration
```yaml
# GitHub Actions example
- name: Validate Changed Skills
  run: |
    for skill in $(git diff --name-only | grep -E '^engineering/[^/]+/' | cut -d'/' -f1-2 | sort -u); do
      python engineering/skill-tester/scripts/skill_validator.py $skill --json
      python engineering/skill-tester/scripts/script_tester.py $skill
      python engineering/skill-tester/scripts/quality_scorer.py $skill --minimum-score 75
    done
```
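For readers less familiar with the `grep | cut | sort -u` pipeline in the workflow step, the following sketch mirrors the same selection in Python. `changed_skills` is a hypothetical helper, not part of skill-tester; it takes the list of paths that `git diff --name-only` would print.

```python
def changed_skills(paths, root="engineering"):
    """Return the unique `<root>/<skill>` directories touched by the given paths."""
    skills = set()
    for path in paths:
        parts = path.split("/")
        # Mirror the regex '^engineering/[^/]+/': the path needs the root,
        # a non-empty skill directory name, and something after that directory.
        if len(parts) >= 3 and parts[0] == root and parts[1]:
            skills.add("/".join(parts[:2]))
    return sorted(skills)


if __name__ == "__main__":
    diff = [
        "engineering/my-skill/SKILL.md",
        "engineering/my-skill/scripts/tool.py",
        "engineering/README.md",
        "docs/index.md",
    ]
    print(changed_skills(diff))
```

Files directly under `engineering/` and files outside it are ignored, so only real skill directories reach the three tools.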
---
Anti-Patterns
- Padding SKILL.md with filler -- line count thresholds measure substantive content; blank lines and boilerplate do not count
- External imports disguised as stdlib -- the import allowlist is manually maintained; if a legit stdlib module is flagged, add it to `stdlib_modules`
- Missing argparse help strings -- usability scoring requires `help=` parameters on every argument; empty help strings score zero
- No `__main__` guard -- scripts without `if __name__ == "__main__"` fail runtime tests when imported
- Relying on SKILL.md for usability -- usability is scored from scripts and README independently; a detailed SKILL.md does not compensate for missing `--help` output
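The allowlist-based import check behind the second anti-pattern might look roughly like the sketch below. The `STDLIB_MODULES` set here is a deliberately tiny, hypothetical stand-in for the hand-maintained `stdlib_modules` list, and `external_imports` is an illustrative name rather than a function from the tool.

```python
import ast

# Hypothetical, deliberately incomplete allowlist for illustration only.
STDLIB_MODULES = {"argparse", "ast", "json", "os", "pathlib", "re", "sys"}


def external_imports(source: str, allowlist=STDLIB_MODULES) -> list:
    """Return top-level module names imported by `source` that are not allowlisted."""
    tree = ast.parse(source)
    found = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # level == 0 skips relative imports such as `from . import helper`.
            found.add(node.module.split(".")[0])
    return sorted(found - set(allowlist))
```

With an allowlist like this, a genuine stdlib module that is missing from the set (for example `csv` here) would be reported as external, which is the false positive described above.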
Troubleshooting
| Problem | Cause | Solution |
|---|---|---|
| SKILL.md line-count check fails despite sufficient content | Validator counts only non-blank lines; blank lines inflate raw line count but are excluded from the tally | Remove excessive blank lines or add more substantive content sections to meet the tier threshold |
| YAML frontmatter parse failure | Frontmatter contains invalid YAML syntax (unquoted colons, tabs instead of spaces, missing closing | Validate frontmatter through |
| External import false positive | The stdlib module allowlist in | Add the missing module name to the |
| Script execution timeout during testing | Script requires interactive input, enters an infinite loop, or performs long-running computation | Increase |
| Tier compliance check fails despite passing individual checks | | Fix the specific critical checks listed in the error message; review the |
| Quality scorer reports low usability despite good documentation | Usability dimension scores help text inside scripts, | Add |
| | Script raised an unhandled exception before reaching the output formatter; errors are written to stderr | Run with |
Success Criteria
- Structure pass rate above 95%: Validated skills pass all required-file and directory-structure checks on first run in at least 95% of cases.
- Script syntax zero-defect: Every Python script in a validated skill compiles without `SyntaxError` via `ast.parse()`.
- Standard library compliance 100%: No external (non-stdlib) imports detected across all validated scripts.
- Quality score consistency within 5 points: Re-running `quality_scorer.py` on an unchanged skill produces scores that vary by no more than 5 points across runs.
- Execution time under 10 seconds per skill: Full validation, testing, and scoring pipeline completes in under 10 seconds for a single skill with up to 3 scripts.
- Actionable recommendation density: Every skill scoring below 75/100 receives at least 3 prioritized improvement suggestions in the roadmap.
- CI/CD gate reliability: When integrated as a GitHub Actions step, the tool exits with non-zero status for every skill that fails critical checks, blocking the merge.
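The syntax criterion above relies on `ast.parse()`, which parses source without executing it. A minimal sketch of such a check follows; `check_syntax` is a hypothetical name and the error format is an assumption, not the tool's actual output.

```python
import ast


def check_syntax(source: str, filename: str = "<script>"):
    """Return None when the source parses, otherwise a short error description."""
    try:
        ast.parse(source, filename=filename)
    except SyntaxError as exc:
        return f"{filename}:{exc.lineno}: {exc.msg}"
    return None


if __name__ == "__main__":
    print(check_syntax("x = 1\n"))
    print(check_syntax("def broken(:\n    pass\n"))
```

Because nothing is executed, a script with side effects at import time can be checked safely this way.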
Scope & Limitations
Covers:
- Structural validation of skill directories against tier-specific requirements (BASIC, STANDARD, POWERFUL)
- Static analysis of Python scripts including syntax checking, import validation, argparse detection, and main guard verification
- Multi-dimensional quality scoring across documentation, code quality, completeness, and usability
- Dual output formatting (JSON for CI/CD pipelines, human-readable for developer consumption)
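Main guard verification and argparse detection can both be done on the parsed syntax tree. The sketch below is one possible approach under that assumption; `has_main_guard` and `uses_argparse` are hypothetical names, and the real checks may be stricter or looser.

```python
import ast


def has_main_guard(source: str) -> bool:
    """Detect a top-level `if __name__ == "__main__":` block."""
    for node in ast.parse(source).body:
        if not isinstance(node, ast.If):
            continue
        test = node.test
        if (
            isinstance(test, ast.Compare)
            and isinstance(test.left, ast.Name)
            and test.left.id == "__name__"
            and len(test.ops) == 1
            and isinstance(test.ops[0], ast.Eq)
            and isinstance(test.comparators[0], ast.Constant)
            and test.comparators[0].value == "__main__"
        ):
            return True
    return False


def uses_argparse(source: str) -> bool:
    """Detect a call to `ArgumentParser(...)`, qualified or imported directly."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            name = getattr(func, "attr", None) or getattr(func, "id", None)
            if name == "ArgumentParser":
                return True
    return False
```

This sketch only recognizes the guard in its usual form with `__name__` on the left-hand side; a reversed comparison would need an extra branch.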
Does NOT cover:
- Functional correctness of script logic or algorithm accuracy -- the tester verifies structure and conventions, not business logic
- Performance benchmarking or memory profiling of scripts -- see `engineering/performance-profiler` for runtime analysis
- Security vulnerability scanning of script code -- see `engineering/skill-security-auditor` for dependency and code security audits
- Cross-skill dependency resolution or integration testing -- skills are validated in isolation without verifying inter-skill compatibility
Integration Points
| Skill | Integration | Data Flow |
|---|---|---|
| | Run security audit after validation passes | |
| | Embed skill-tester as a quality gate stage | Pipeline builder generates workflow YAML that invokes |
| | Feed quality score deltas into changelog entries | Compare |
| | Attach validation report to pull request reviews | |
| | Complement structural testing with runtime profiling | After |
| | Track quality score trends over time | Periodic |
Tool Reference
skill_validator.py
Purpose: Validates a skill directory's structure, documentation, and Python scripts against the claude-skills ecosystem standards. Checks required files, YAML frontmatter, required SKILL.md sections, directory layout, script syntax, import compliance, and tier-specific requirements.
Usage:
```bash
python skill_validator.py <skill_path> [--tier TIER] [--json] [--verbose]
```
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `skill_path` | positional | Yes | - | Path to the skill directory to validate |
| `--tier` | option | No | None | Target tier for validation: `BASIC`, `STANDARD`, or `POWERFUL` |
| `--json` | flag | No | Off | Output results in JSON format instead of human-readable text |
| `--verbose` | flag | No | Off | Enable verbose logging to stderr |
Example:
```bash
python skill_validator.py engineering/my-skill --tier POWERFUL --json
```
Output Formats:
- Human-readable (default): Grouped report with STRUCTURE VALIDATION, SCRIPT VALIDATION, ERRORS, WARNINGS, and SUGGESTIONS sections. Displays overall score out of 100 with compliance level (EXCELLENT, GOOD, ACCEPTABLE, NEEDS_IMPROVEMENT, POOR).
- JSON (`--json`): Object with keys `skill_path`, `timestamp`, `overall_score`, `compliance_level`, `checks` (dict of check name to pass/message/score), `warnings`, `errors`, `suggestions`.
Exit codes: `0` on success (score >= 60 and no errors), `1` on failure.
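A downstream script can apply the same pass rule to the JSON report instead of relying on the exit code. This sketch assumes `overall_score` is numeric and `errors` is a list, which the key list above suggests but does not state; `validation_passed` and the sample report are hypothetical.

```python
import json


def validation_passed(report_json: str, minimum_score: float = 60) -> bool:
    """Apply the documented rule: score >= 60 and no errors."""
    report = json.loads(report_json)
    return report["overall_score"] >= minimum_score and not report["errors"]


if __name__ == "__main__":
    # Hand-written sample, not real tool output.
    sample = json.dumps(
        {
            "skill_path": "engineering/my-skill",
            "overall_score": 82,
            "errors": [],
            "warnings": ["example warning"],
        }
    )
    print(validation_passed(sample))
```

Warnings are non-blocking, so they do not affect the result here.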
script_tester.py
Purpose: Tests all Python scripts within a skill's `scripts/` directory. Performs syntax validation via AST parsing, import analysis for stdlib compliance, argparse implementation verification, main guard detection, runtime execution with timeout protection, `--help` functionality testing, sample data processing against files in `assets/`, and output format compliance checks.
Usage:
```bash
python script_tester.py <skill_path> [--timeout SECONDS] [--json] [--verbose]
```
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `skill_path` | positional | Yes | - | Path to the skill directory containing scripts to test |
| `--timeout` | option | No | | Timeout in seconds for each script execution test |
| `--json` | flag | No | Off | Output results in JSON format instead of human-readable text |
| `--verbose` | flag | No | Off | Enable verbose logging to stderr |
Example:
```bash
python script_tester.py engineering/my-skill --timeout 60 --json
```
Output Formats:
- Human-readable (default): Report with SUMMARY (total/passed/partial/failed counts), GLOBAL ERRORS, and per-script sections showing status, execution time, individual test results, errors, and warnings.
- JSON (`--json`): Object with keys `skill_path`, `timestamp`, `summary` (counts and overall status), `global_errors`, `script_results` (dict per script with `overall_status`, `execution_time`, `tests`, `errors`, `warnings`).
Exit codes: `0` on full success, `1` on failure or global errors, `2` on partial success.
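Runtime execution with timeout protection is commonly built on `subprocess.run` and its `timeout` argument. The sketch below shows a `--help` check in that style; `run_help`, its return shape, and the 30-second default are assumptions for illustration, not the behavior of `script_tester.py`.

```python
import subprocess
import sys


def run_help(script_path, timeout: float = 30) -> dict:
    """Run `<script> --help` in a child interpreter and classify the outcome."""
    try:
        result = subprocess.run(
            [sys.executable, str(script_path), "--help"],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing is left running.
        return {"status": "timeout", "output": ""}
    status = "pass" if result.returncode == 0 else "fail"
    return {"status": status, "output": result.stdout}
```

A script that blocks on interactive input or loops forever is reported as a timeout instead of hanging the whole test run.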
quality_scorer.py
Purpose: Provides a comprehensive multi-dimensional quality assessment for a skill. Evaluates four equally weighted dimensions — Documentation (25%), Code Quality (25%), Completeness (25%), and Usability (25%) — and produces an overall score, letter grade (A+ through F), tier recommendation, and a prioritized improvement roadmap.
Usage:
```bash
python quality_scorer.py <skill_path> [--detailed] [--minimum-score SCORE] [--json] [--verbose]
```
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `skill_path` | positional | Yes | - | Path to the skill directory to assess |
| `--detailed` | flag | No | Off | Show detailed component scores within each dimension |
| `--minimum-score` | option | No | | Minimum acceptable overall score; exits with error code `1` if the score is below it |
| `--json` | flag | No | Off | Output results in JSON format instead of human-readable text |
| `--verbose` | flag | No | Off | Enable verbose logging to stderr |
Example:
```bash
python quality_scorer.py engineering/my-skill --detailed --minimum-score 75 --json
```
Output Formats:
- Human-readable (default): Report with overall score and letter grade, per-dimension scores with weights, summary statistics (highest/lowest dimension, dimensions above 70%, dimensions below 50%), and a prioritized improvement roadmap (up to 5 items with HIGH/MEDIUM/LOW priority). When `--detailed` is used, component-level breakdowns appear under each dimension.
- JSON (`--json`): Object with keys `skill_path`, `timestamp`, `overall_score`, `letter_grade`, `tier_recommendation`, `summary_stats`, `dimensions` (per-dimension name/weight/score/details/suggestions), `improvement_roadmap` (list of priority/dimension/suggestion/current_score objects).
Exit codes: `0` for grades A+ through C-, `1` for grade F or when score is below `--minimum-score`, `2` for grade D.
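The exit-code rule above can be written as a small function, which may help when wiring the scorer into a gate. This is a sketch under two assumptions that the documentation does not settle: the minimum-score check takes precedence over the grade, and any grade beginning with `D` counts as "grade D". `exit_code` is a hypothetical name.

```python
from typing import Optional


def exit_code(letter_grade: str, score: float, minimum_score: Optional[float] = None) -> int:
    """Map a letter grade and score to the documented exit codes."""
    # Assumption: falling below --minimum-score wins over any grade-based code.
    if minimum_score is not None and score < minimum_score:
        return 1
    if letter_grade == "F":
        return 1
    # Assumption: D+, D, and D- (if they exist) are all treated as grade D.
    if letter_grade.startswith("D"):
        return 2
    return 0
```

A CI step that should block only on F-level results can treat exit code `2` as a warning, while a stricter gate can fail on any non-zero value.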