test-anti-patterns

Test Anti-Pattern Detection

Analyze .NET test code for anti-patterns, code smells, and quality issues that undermine test reliability, maintainability, and diagnostic value.

When to Use

User asks to review test quality or find test smells
User wants to know why tests are flaky or unreliable
User asks "are my tests good?" or "what's wrong with my tests?"
User requests a test audit or test code review
User wants to improve existing test code

When Not to Use

User wants to write new tests from scratch (use `writing-mstest-tests`)
User wants to run or execute tests (use `run-tests`)
User wants to migrate between test frameworks or versions (use migration skills)
User wants to measure code coverage (out of scope)

Inputs

Input	Required	Description
Test code	Yes	One or more test files or classes to analyze
Production code	No	The code under test, for context on what tests should verify
Specific concern	No	A focused area like "flakiness" or "naming" to narrow the review

Workflow

Step 1: Gather the test code

Read the test files the user wants reviewed. If the user points to a directory or project, scan for all test files (files containing `[TestClass]`, `[TestMethod]`, `[Fact]`, `[Test]`, or `[Theory]` attributes).
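
The directory scan described above could be approximated with a plain text search. The sketch below is only an illustration: `TestFileScanner`, `LooksLikeTestFile`, and `FindTestFiles` are hypothetical names, and a regex match is a heuristic that can produce false positives (for example, an attribute name inside a comment). The attribute list mirrors the one above and can be extended.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any test framework.
public static class TestFileScanner
{
    // Matches the attribute names listed above, with or without arguments,
    // e.g. [Fact], [TestMethod("name")], [Theory].
    private static readonly Regex TestAttribute = new Regex(
        @"\[\s*(TestClass|TestMethod|Fact|Test|Theory)\b",
        RegexOptions.Compiled);

    // Returns true when the source text contains at least one test attribute.
    public static bool LooksLikeTestFile(string sourceText) =>
        TestAttribute.IsMatch(sourceText);

    // Lazily enumerates all .cs files under rootDirectory that look like test files.
    public static IEnumerable<string> FindTestFiles(string rootDirectory) =>
        Directory.EnumerateFiles(rootDirectory, "*.cs", SearchOption.AllDirectories)
                 .Where(path => LooksLikeTestFile(File.ReadAllText(path)));
}
```

A Roslyn-based syntax walk would be more precise; the text match is usually enough to decide which files to read.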

If production code is available, read it too -- this is critical for detecting tests that are coupled to implementation details rather than behavior.

Step 2: Scan for anti-patterns

Check each test file against the anti-pattern catalog below. Report findings grouped by severity.

Critical -- Tests that give false confidence

Anti-Pattern	What to Look For
No assertions	Test methods that execute code but never assert anything. A passing test without assertions proves nothing.
Swallowed exceptions	`try { ... } catch { }` or `catch (Exception)` without rethrowing or asserting. Failures are silently hidden.
Assert in catch block only	`try { Act(); } catch (Exception ex) { Assert.Fail(ex.Message); }` -- if an exception is the expected outcome, use `Assert.ThrowsException` or equivalent; otherwise drop the try/catch and assert on the result. As written, the test passes whenever no exception is thrown, even if the result is wrong.
Always-true assertions	`Assert.IsTrue(true)` , `Assert.AreEqual(x, x)` , or conditions that can never fail.
Commented-out assertions	Assertions that were disabled but the test still runs, giving the illusion of coverage.
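
As an illustration of the "Assert in catch block only" row, the sketch below contrasts the smell with two targeted fixes. `PriceCalculator` and its `ApplyDiscount` method are hypothetical and exist only to keep the example self-contained. It assumes an MSTest version that still ships `Assert.ThrowsException`; other versions and frameworks may name the equivalent differently.

```csharp
using System;
using Microsoft.VisualStudio.TestTools.UnitTesting;

// Hypothetical code under test.
public static class PriceCalculator
{
    public static decimal ApplyDiscount(decimal price, decimal percent)
    {
        if (percent < 0m || percent > 100m)
            throw new ArgumentOutOfRangeException(nameof(percent));
        return price - (price * percent / 100m);
    }
}

[TestClass]
public sealed class PriceCalculatorTests
{
    // Smell: the only assertion lives in the catch block, so this passes
    // for any return value as long as nothing throws.
    [TestMethod]
    public void ApplyDiscount_Smell_AssertOnlyInCatch()
    {
        try
        {
            PriceCalculator.ApplyDiscount(200m, 10m);
        }
        catch (Exception ex)
        {
            Assert.Fail(ex.Message);
        }
    }

    // Fix for the success path: assert on the result. An unexpected
    // exception already fails the test without any try/catch.
    [TestMethod]
    public void ApplyDiscount_TenPercentOff200_Returns180()
    {
        decimal discounted = PriceCalculator.ApplyDiscount(200m, 10m);

        Assert.AreEqual(180m, discounted);
    }

    // Fix for the failure path: assert the specific exception type.
    [TestMethod]
    public void ApplyDiscount_NegativePercent_ThrowsArgumentOutOfRangeException()
    {
        Assert.ThrowsException<ArgumentOutOfRangeException>(
            () => { PriceCalculator.ApplyDiscount(200m, -1m); });
    }
}
```

Both fixes change only the assertion logic, which keeps the diff small enough to review at a glance.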

High -- Tests likely to cause pain

Anti-Pattern	What to Look For
Flakiness indicators	`Thread.Sleep(...)` , `Task.Delay(...)` for synchronization, `DateTime.Now` / `DateTime.UtcNow` without abstraction, `Random` without a seed, environment-dependent paths.
Test ordering dependency	Static mutable fields modified across tests, `[TestInitialize]` that doesn't fully reset state, tests that fail when run individually but pass in suite (or vice versa).
Over-mocking	More mock setup lines than actual test logic. Verifying exact call sequences on mocks rather than outcomes. Mocking types the test owns.
Implementation coupling	Testing private methods via reflection, asserting on internal state, verifying exact method call counts on collaborators instead of observable behavior.
Broad exception assertions	`Assert.ThrowsException<Exception>(...)` instead of the specific exception type. Also: `[ExpectedException(typeof(Exception))]` .
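
One common way to remove the `DateTime.UtcNow` flakiness indicator is to pass the current time in through an abstraction. The sketch below assumes a hand-rolled `IClock` interface; `IClock`, `FixedClock`, and `Coupon` are hypothetical names used for illustration, and a project may already have its own time abstraction that serves the same purpose.

```csharp
using System;
using Microsoft.VisualStudio.TestTools.UnitTesting;

// Hypothetical time abstraction injected into the code under test.
public interface IClock
{
    DateTime UtcNow { get; }
}

// Test double that always returns the same instant.
public sealed class FixedClock : IClock
{
    public FixedClock(DateTime utcNow) => UtcNow = utcNow;
    public DateTime UtcNow { get; }
}

// Hypothetical code under test: reads time only through IClock.
public sealed class Coupon
{
    private readonly DateTime _expiresUtc;

    public Coupon(DateTime expiresUtc) => _expiresUtc = expiresUtc;

    public bool IsExpired(IClock clock) => clock.UtcNow >= _expiresUtc;
}

[TestClass]
public sealed class CouponTests
{
    private static readonly DateTime Expiry =
        new DateTime(2024, 1, 1, 0, 0, 0, DateTimeKind.Utc);

    [TestMethod]
    public void IsExpired_OneSecondBeforeExpiry_ReturnsFalse()
    {
        var coupon = new Coupon(Expiry);
        var clock = new FixedClock(Expiry.AddSeconds(-1));

        Assert.IsFalse(coupon.IsExpired(clock));
    }

    [TestMethod]
    public void IsExpired_ExactlyAtExpiry_ReturnsTrue()
    {
        var coupon = new Coupon(Expiry);
        var clock = new FixedClock(Expiry);

        Assert.IsTrue(coupon.IsExpired(clock));
    }
}
```

Because the clock is fixed, both boundary cases give the same result on every run and on every machine, with no `Thread.Sleep` needed to cross the expiry instant.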

Medium -- Maintainability and clarity issues

Anti-Pattern	What to Look For
Poor naming	Test names like `Test1` , `TestMethod` , names that don't describe the scenario or expected outcome. Good: `Add_NegativeNumber_ThrowsArgumentException` .
Magic values	Unexplained numbers or strings in arrange/assert: `Assert.AreEqual(42, result)` -- what does 42 mean?
Duplicate tests	Three or more test methods with near-identical bodies that differ only in a single input value. Should be data-driven ( `[DataRow]` , `[Theory]` , `[TestCase]` ). Note: Two tests covering distinct boundary conditions (e.g., zero vs. negative) are NOT duplicates -- separate tests for different edge cases provide clearer failure diagnostics and are a valid practice.
Giant tests	Test methods exceeding ~30 lines or testing multiple behaviors at once. Hard to diagnose when they fail.
Assertion messages that repeat the assertion	`Assert.AreEqual(expected, actual, "Expected and actual are not equal")` adds no information. Messages should describe the business meaning.
Missing AAA separation	Arrange, Act, Assert phases are interleaved or indistinguishable.
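
For the "Duplicate tests" row, three near-identical methods that differ only in their input can usually collapse into one data-driven test. The sketch below uses MSTest's `[DataRow]`; the `Temperature` class is a hypothetical stand-in for the code under test.

```csharp
using Microsoft.VisualStudio.TestTools.UnitTesting;

// Hypothetical code under test.
public static class Temperature
{
    public static double CelsiusToFahrenheit(double celsius) =>
        celsius * 9.0 / 5.0 + 32.0;
}

[TestClass]
public sealed class TemperatureTests
{
    // One method replaces three copies that differed only in their values.
    // Each row is reported as its own result, so a failure still points
    // at the specific input.
    [TestMethod]
    [DataRow(0.0, 32.0)]     // freezing point of water
    [DataRow(100.0, 212.0)]  // boiling point of water
    [DataRow(-40.0, -40.0)]  // point where both scales agree
    public void CelsiusToFahrenheit_KnownPoint_ReturnsExpected(
        double celsius, double expectedFahrenheit)
    {
        double actual = Temperature.CelsiusToFahrenheit(celsius);

        Assert.AreEqual(expectedFahrenheit, actual, 1e-9);
    }
}
```

The inline comments on each row also address the "Magic values" row: every number carries its meaning next to it.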

Low -- Style and hygiene

Anti-Pattern	What to Look For
Unused test infrastructure	`[TestInitialize]` / `[SetUp]` that does nothing, test helper methods that are never called.
IDisposable not disposed	Test creates `HttpClient` , `Stream` , or other disposable objects without `using` or cleanup.
Console.WriteLine debugging	Leftover `Console.WriteLine` or `Debug.WriteLine` statements used during test development.
Inconsistent naming convention	Mix of naming styles in the same test class (e.g., some use `Method_Scenario_Expected` , others use `ShouldDoSomething` ).
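
The "IDisposable not disposed" row usually has a one-line fix: wrap the disposable in `using`. The sketch below is a minimal example with hypothetical test names; it uses `MemoryStream` and `StreamReader` only because they need no external resources.

```csharp
using System.IO;
using System.Text;
using Microsoft.VisualStudio.TestTools.UnitTesting;

[TestClass]
public sealed class StreamReadingTests
{
    [TestMethod]
    public void ReadToEnd_Utf8Payload_ReturnsOriginalText()
    {
        const string payload = "hello";

        // Both disposables are released when the block exits,
        // even if the assertion below throws.
        using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(payload)))
        using (var reader = new StreamReader(stream, Encoding.UTF8))
        {
            string text = reader.ReadToEnd();

            Assert.AreEqual(payload, text);
        }
    }
}
```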

Step 3: Calibrate severity honestly

Before reporting, re-check each finding against these severity rules:

Critical/High: Only for issues that cause tests to give false confidence or be unreliable. A test that always passes regardless of correctness is Critical. Flaky shared state is High.
Medium: Only for issues that actively harm maintainability -- 5+ nearly-identical tests, truly meaningless names like `Test1`.
Low: Cosmetic naming mismatches, minor style preferences, assertion messages that could be better. When in doubt, rate Low.
Not an issue: Separate tests for distinct boundary conditions (zero vs. negative vs. null). Explicit per-test setup instead of `[TestInitialize]` (this improves isolation). Tests that are short and clear but could theoretically be consolidated.

IMPORTANT: If the tests are well-written, say so clearly up front. Do not inflate severity to justify the review. A review that finds zero Critical/High issues and only minor Low suggestions is a valid and valuable outcome. Lead with what the tests do well.

Step 4: Report findings

Present findings in this structure:

Summary -- Total issues found, broken down by severity (Critical / High / Medium / Low). If tests are well-written, lead with that assessment.
Critical and High findings -- List each with:
- The anti-pattern name
- The specific location (file, method name, line)
- A brief explanation of why it's a problem
- A concrete fix (show before/after code when helpful)
Medium and Low findings -- Summarize in a table unless the user wants full detail
Positive observations -- Call out things the tests do well (sealed class, specific exception types, data-driven tests, clear AAA structure, proper use of fakes, good naming). Don't only report negatives.

Step 5: Prioritize recommendations

If there are many findings, recommend which to fix first:

Critical -- Fix immediately, these tests may be giving false confidence
High -- Fix soon, these cause flakiness or maintenance burden
Medium/Low -- Fix opportunistically during related edits

Validation

Common Pitfalls

Pitfall	Solution
Reporting style issues as critical	Naming and formatting are Medium/Low, never Critical
Suggesting rewrites instead of targeted fixes	Show minimal diffs -- change the assertion, not the whole test
Flagging intentional design choices	If `Thread.Sleep` is in an integration test testing actual timing, that's not an anti-pattern. Consider context.
Inventing false positives on clean code	If tests follow best practices, say so. A review finding "0 Critical, 0 High, 1 Low" is perfectly valid. Don't inflate findings to justify the review.
Flagging separate boundary tests as duplicates	Two tests for zero and negative inputs test different edge cases. Only flag as duplicates when 3+ tests have truly identical bodies differing by a single value.
Rating cosmetic issues as Medium	Naming mismatches (e.g., method name says `ArgumentException` but asserts `ArgumentOutOfRangeException` ) are Low, not Medium -- the test still works correctly.
Ignoring the test framework	xUnit uses `[Fact]` / `[Theory]` , NUnit uses `[Test]` / `[TestCase]` , MSTest uses `[TestMethod]` / `[DataRow]` -- use correct terminology
Missing the forest for the trees	If 80% of tests have no assertions, lead with that systemic issue rather than listing every instance
