testability-scoring

Testability Scoring

<default_to_action>
When assessing testability:
1. RUN assessment against target URL
2. ANALYZE all 10 principles automatically
3. GENERATE HTML report with radar chart
4. PRIORITIZE improvements by impact/effort
5. INTEGRATE with QX Partner for holistic view

**Quick Assessment:**
```bash
# Run assessment on any URL
TEST_URL='https://example.com/' npx playwright test tests/testability-scoring/testability-scoring.spec.js --project=chromium --workers=1

# Or use shell script wrapper
.claude/skills/testability-scoring/scripts/run-assessment.sh https://example.com/
```

**The 10 Principles at a Glance:**
| Principle | Weight | Key Question |
|-----------|--------|--------------|
| **Observability** | 15% | Can we see what's happening? |
| **Controllability** | 15% | Can we control the application? |
| **Algorithmic Simplicity** | 10% | Are behaviors predictable? |
| **Algorithmic Transparency** | 10% | Can we understand what it does? |
| **Algorithmic Stability** | 10% | Does behavior remain consistent? |
| **Explainability** | 10% | Is the interface understandable? |
| **Unbugginess** | 10% | How error-free is it? |
| **Smallness** | 10% | Are components appropriately sized? |
| **Decomposability** | 5% | Can we test parts in isolation? |
| **Similarity** | 5% | Is the tech stack familiar? |

**Grade Scale:**
- **A (90-100)**: Excellent testability
- **B (80-89)**: Good testability
- **C (70-79)**: Adequate testability
- **D (60-69)**: Below average
- **F (0-59)**: Poor testability
</default_to_action>
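
The weights and grade bands above imply a simple weighted average. The following is a minimal sketch of that arithmetic, not the skill's actual implementation: the key names, `overallScore`, and `gradeFor` are hypothetical, and it assumes each principle has already been scored on a 0-100 scale.

```typescript
// Weights in percent, copied from the table above (they sum to 100).
// The camelCase key names are assumptions made for this sketch.
const WEIGHTS_PCT = {
  observability: 15,
  controllability: 15,
  algorithmicSimplicity: 10,
  algorithmicTransparency: 10,
  algorithmicStability: 10,
  explainability: 10,
  unbugginess: 10,
  smallness: 10,
  decomposability: 5,
  similarity: 5,
} as const;

type Principle = keyof typeof WEIGHTS_PCT;
type PrincipleScores = Record<Principle, number>; // each value in 0-100

// Weighted average of the ten principle scores, rounded to an integer.
function overallScore(scores: PrincipleScores): number {
  let weighted = 0;
  for (const p of Object.keys(WEIGHTS_PCT) as Principle[]) {
    weighted += scores[p] * WEIGHTS_PCT[p];
  }
  return Math.round(weighted / 100);
}

// Maps a 0-100 score onto the grade scale listed above.
function gradeFor(score: number): "A" | "B" | "C" | "D" | "F" {
  if (score >= 90) return "A";
  if (score >= 80) return "B";
  if (score >= 70) return "C";
  if (score >= 60) return "D";
  return "F";
}
```

Under these assumptions, a site scoring 90 on every principle except Observability at 72 would land at 87, a B: because Observability carries 15% of the weight, an 18-point gap there costs roughly 3 points overall.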

Quick Reference Card

Running Assessments

| Method | Command | When to Use |
|--------|---------|-------------|
| Shell Script | `./scripts/run-assessment.sh URL` | One-time assessment |
| ENV Override | `TEST_URL='URL' npx playwright test...` | CI/CD integration |
| Config File | Update `tests/testability-scoring/config.js` | Repeated runs |

Principle Details

High Weight (15% each)

| Principle | Measures | Indicators |
|-----------|----------|------------|
| Observability | State visibility, logging, monitoring | Console output, network tracking, error visibility |
| Controllability | Input control, state manipulation | API access, test data injection, determinism |

Medium Weight (10% each)

| Principle | Measures | Indicators |
|-----------|----------|------------|
| Simplicity | Predictable behavior | Clear I/O relationships, low complexity |
| Transparency | Understanding what system does | Visible processes, readable code |
| Stability | Consistent behavior | Change resilience, maintainability |
| Explainability | Interface understanding | Good docs, semantic structure, help text |
| Unbugginess | Error-free operation | Console errors, warnings, runtime issues |
| Smallness | Component size | Element count, script bloat, page complexity |

Low Weight (5% each)

| Principle | Measures | Indicators |
|-----------|----------|------------|
| Decomposability | Isolation testing | Component separation, modular design |
| Similarity | Technology familiarity | Standard frameworks, known patterns |

Assessment Workflow

1. Navigate to URL
2. Collect metrics
3. Score principles
4. Apply weights
5. Calculate grades
6. Generate JSON
7. Generate HTML report with radar chart
8. Open in browser (auto-opens)

Output Files

tests/reports/
├── testability-results-<timestamp>.json  # Raw data
├── testability-report-<timestamp>.html   # Visual report
└── latest.json                           # Symlink

Integration Examples

CI/CD Integration

```yaml
# GitHub Actions
- name: Testability Assessment
  run: |
    timeout 180 .claude/skills/testability-scoring/scripts/run-assessment.sh ${{ env.APP_URL }}

- name: Upload Reports
  uses: actions/upload-artifact@v3
  with:
    name: testability-reports
    path: tests/reports/testability-*.html
```

QX Partner Integration

```typescript
// Combine testability with QX analysis
const qxAnalysis = await Task("QX Analysis", {
  target: 'https://example.com',
  integrateTestability: true
}, "qx-partner");

// Returns combined insights:
// - QX Score: 78/100
// - Testability Integration: Observability 72/100
// - Combined Insight: Low observability may mask UX issues
```

Programmatic Usage

```typescript
import { runTestabilityAssessment } from './testability';

const results = await runTestabilityAssessment('https://example.com');
console.log(`Overall: ${results.overallScore}/100 (${results.grade})`);
console.log('Recommendations:', results.recommendations);
```
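
When the programmatic result feeds a CI job, a small gate can turn the score into a pass/fail decision. The sketch below is one way to do that under stated assumptions: `checkThreshold` and the default minimum of 70 (the bottom of grade C) are hypothetical, and the result shape includes only the `overallScore` and `grade` fields used in the snippet above.

```typescript
// Subset of the result object used above; any other fields are unknown here.
interface TestabilitySummary {
  overallScore: number; // 0-100
  grade: string;        // "A" to "F"
}

interface GateOutcome {
  passed: boolean;
  message: string;
}

// Hypothetical helper: passes when the overall score meets the minimum.
function checkThreshold(summary: TestabilitySummary, minScore = 70): GateOutcome {
  const passed = summary.overallScore >= minScore;
  const verdict = passed ? "PASS" : "FAIL";
  return {
    passed,
    message: `Testability ${summary.overallScore}/100 (${summary.grade}), minimum ${minScore}: ${verdict}`,
  };
}
```

A CI script could log `message` and exit non-zero when `passed` is false; the threshold should be tuned per project rather than taken from this sketch.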

Agent Integration

```typescript
// Run testability assessment
const assessment = await Task("Testability Assessment", {
  url: 'https://example.com',
  generateReport: true,
  openBrowser: true
}, "qe-quality-analyzer");

// Use with QX Partner for holistic analysis
const qxReport = await Task("Full QX Analysis", {
  target: 'https://example.com',
  integrateTestability: true,
  detectOracleProblems: true
}, "qx-partner");
```

Vibium Integration (Optional)

Overview

Vibium browser automation can be used alongside Playwright for enhanced testability assessment. While Playwright remains the primary engine, Vibium offers complementary capabilities for certain metrics.

Installation:
```bash
claude mcp add vibium -- npx -y vibium
```

Vibium-Enhanced Metrics

| Principle | Vibium Enhancement | Benefit |
|-----------|--------------------|---------|
| Observability | Auto-wait duration tracking | Measures DOM stability (30s timeout, 100ms polling) |
| Controllability | Element interaction success rate | Validates automation readiness via MCP |
| Stability | Screenshot consistency | Visual regression detection for layout stability |
| Explainability | Element attribute extraction | ARIA labels, semantic HTML validation |

When to Use Vibium

USE Vibium for:
  • Element stability metrics (auto-wait duration analysis)
  • Visual consistency checks (screenshot comparison)
  • MCP-native AI agent integration
  • Lightweight Docker images (400MB vs 1.2GB)
USE Playwright for:
  • Console error detection (Vibium V1 lacks console API)
  • Network performance metrics (BiDi network APIs coming in V2)
  • Comprehensive browser coverage (Firefox, Safari)
  • Production-proven stability (Vibium V1 released Dec 2024)

Hybrid Assessment Example

```typescript
// Testability assessment using both engines
const assessment = {
  // Playwright: Comprehensive metrics
  playwright: await runPlaywrightAssessment(url),

  // Vibium: Stability metrics
  vibium: {
    elementStability: await measureAutoWaitDuration(url),
    visualConsistency: await compareScreenshots(url),
    accessibilityAttributes: await extractARIALabels(url)
  }
};

// Enhanced Observability Score
const observability =
  (assessment.playwright.consoleErrors * 0.6) +
  (assessment.vibium.elementStability * 0.4);
```
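
The 60/40 blend above only yields a 0-100 result if both inputs are already 0-100 scores rather than raw counts or durations. The sketch below shows one possible normalization. All of it is assumption: the 10-point penalty per console error, the linear mapping of auto-wait time against the 30s timeout, and the function names are illustrative, not the skill's actual scoring rules. It also falls back to the Playwright figure alone when no Vibium measurement exists, since Vibium is optional.

```typescript
// Keeps a value inside the 0-100 score range.
const clampScore = (value: number): number => Math.min(100, Math.max(0, value));

// Assumption: every console error costs 10 points.
function consoleErrorScore(errorCount: number): number {
  return clampScore(100 - errorCount * 10);
}

// Assumption: 0 ms of auto-wait scores 100, hitting the timeout scores 0.
function stabilityScore(autoWaitMs: number, timeoutMs = 30000): number {
  return clampScore(100 * (1 - autoWaitMs / timeoutMs));
}

// 60/40 blend from the example above; Playwright-only when Vibium is absent.
function hybridObservability(errorCount: number, autoWaitMs?: number): number {
  const playwrightScore = consoleErrorScore(errorCount);
  if (autoWaitMs === undefined) {
    return playwrightScore;
  }
  return Math.round(playwrightScore * 0.6 + stabilityScore(autoWaitMs) * 0.4);
}
```

With these assumed rules, two console errors and a 3-second auto-wait give `hybridObservability(2, 3000)` a value of 84 (80 × 0.6 + 90 × 0.4).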

Vibium MCP Tools for Testability

```typescript
// 1. Element Stability Measurement
const browser = await browser_launch();
await browser_navigate({ url });
const startTime = Date.now();
const element = await browser_find({ selector: ".critical-element" });
const autoWaitDuration = Date.now() - startTime;
// Lower duration = better stability

// 2. Visual Consistency Check
const screenshot1 = await browser_screenshot();
await browser_navigate({ url }); // Reload
const screenshot2 = await browser_screenshot();
const visualDiff = compareImages(screenshot1.png, screenshot2.png);
// Lower diff = better stability

// 3. Accessibility Attribute Extraction
const elements = await browser_find({ selector: "button, a, input" });
const ariaLabels = elements.map(el => el.attributes["aria-label"]);
const semanticScore = (ariaLabels.filter(Boolean).length / elements.length) * 100;
```
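
The ARIA coverage calculation in step 3 divides by `elements.length`, which produces `NaN` on a page with no matching elements. A guarded version might look like the sketch below. The `FoundElement` shape is an assumption about what the element lookup returns, and scoring an empty page as 0 is a design choice for this sketch, not documented behavior.

```typescript
// Assumed shape of an element returned by the lookup in step 3.
interface FoundElement {
  attributes: Record<string, string | undefined>;
}

// Percentage of elements carrying a non-empty aria-label, rounded to an integer.
function ariaLabelCoverage(elements: FoundElement[]): number {
  if (elements.length === 0) {
    return 0; // assumption: no interactive elements means nothing to credit
  }
  const labelled = elements.filter((el) => {
    const label = el.attributes["aria-label"];
    return label !== undefined && label.trim() !== "";
  }).length;
  return Math.round((labelled / elements.length) * 100);
}
```

Whitespace-only labels are treated as missing here, which is slightly stricter than the `filter(Boolean)` check in the snippet above.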

Migration Strategy

Current (V2.2): Hybrid approach
  • Playwright: Primary engine for all 10 principles
  • Vibium: Optional enhancement for stability metrics
Future (V3.0): When Vibium V2 ships
  • Evaluate Vibium as primary engine if:
    • Console/Network APIs available
    • Production stability proven
    • Community adoption increases

Agent Coordination Hints

Memory Namespace

aqe/testability/
├── assessments/*       - Assessment results by URL
├── historical/*        - Historical scores for trend analysis
├── recommendations/*   - Improvement recommendations
├── integration/*       - QX integration data
└── vibium/*           - Vibium-specific metrics (optional)

Fleet Coordination

```typescript
const testabilityFleet = await FleetManager.coordinate({
  strategy: 'testability-assessment',
  agents: [
    'qe-quality-analyzer',  // Primary assessment
    'qx-partner',           // UX integration
    'qe-visual-tester'      // Visual validation
  ],
  topology: 'sequential'
});
```

Common Issues & Solutions

| Issue | Solution |
|-------|----------|
| Tests timing out | Increase timeout: `timeout 300 ./scripts/run-assessment.sh URL` |
| Partial results | Check console errors, increase network timeout |
| Report not opening | Use `AUTO_OPEN=false`, open manually |
| Config not updating | Use `TEST_URL` env var instead |
| Vibium not available | Install via `claude mcp add vibium -- npx -y vibium` (optional) |
| Hybrid mode errors | Vibium is optional; assessments work without it |

Related Skills

  • accessibility-testing - WCAG compliance (overlaps with Explainability)
  • visual-testing-advanced - UI consistency
  • performance-testing - Load time metrics

Credits & References

Framework Origin

Implementation

Vibium Resources

Remember

Testability is an investment, not an afterthought.
Good testability:
  • Reduces debugging time
  • Enables faster feedback loops
  • Makes defects easier to find
  • Supports continuous testing
Low scores = High risk. Prioritize improvements by weight × impact.
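
One way to read "weight × impact" is to rank each principle by its weight multiplied by how far its score falls short of 100. The sketch below implements that reading. It is an interpretation, not the skill's documented ranking: treating impact as the gap to 100, and the names `PrincipleResult` and `prioritize`, are assumptions.

```typescript
interface PrincipleResult {
  name: string;
  weightPct: number; // weight from the principles table, e.g. 15
  score: number;     // 0-100
}

// Assumption: impact is the shortfall from a perfect score.
function priorityOf(result: PrincipleResult): number {
  return result.weightPct * (100 - result.score);
}

// Returns a new array ordered from highest to lowest priority.
function prioritize(results: PrincipleResult[]): PrincipleResult[] {
  return [...results].sort((a, b) => priorityOf(b) - priorityOf(a));
}
```

Under this reading, a 10%-weight principle scoring 55 (priority 450) outranks a 15%-weight principle scoring 72 (priority 420), so a heavier weight does not automatically win.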