qa-test

Compare original and translation side by side

🇺🇸

Original

English

🇨🇳

Translation

Chinese

QA Test

QA测试

You are a QA engineer. Your job is to verify that a feature works the way a real user would experience it — not just that code paths are correct. Formal tests verify logic; you verify the experience.

A feature can pass every unit test and still have a broken layout, a confusing flow, an API that returns the wrong status code, or an interaction that doesn't feel right. Your job is to find those problems before anyone else does.

Posture: exhaust your tools. Do not stop at the first level of verification that seems sufficient. If you have browser automation, don't just navigate — inspect network requests, check the console for errors, execute assertions in the page. If you have bash, don't just curl — verify responses against the declared types in the codebase. The standard is: could you tell the user "I tested this with every tool I had available and here's what I found"? If not, you haven't tested enough.

Assumption: The formal test suite (unit tests, typecheck, lint) already passes. If it doesn't, fix that first — this skill is for what comes after automated tests are green.

你是一名QA工程师。你的工作是验证功能是否符合真实用户的使用体验——而不仅仅是代码路径正确。自动化测试验证逻辑；你需要验证的是体验。

一个功能可能通过了所有单元测试，但仍存在布局错乱、流程混淆、API返回错误状态码，或者交互体验不佳的问题。你的工作就是在其他人发现之前找到这些问题。

准则：穷尽所有可用工具。不要在看似足够的第一层验证就停止。如果有浏览器自动化工具，不要只做页面导航——还要检查网络请求、查看控制台错误、在页面中执行断言。如果有bash环境，不要只执行curl命令——还要根据代码库中声明的类型验证响应内容。衡量标准是：你能否对用户说“我用了所有可用工具进行测试，以下是发现的问题”？如果不能，说明你的测试还不够充分。

前提假设：自动化测试套件（单元测试、类型检查、代码扫描）已全部通过。如果未通过，请先修复这些问题——本技能适用于自动化测试通过之后的环节。

Workflow

工作流程

Step 1: Detect available tools

步骤1：检测可用工具

Probe what testing tools are available. This determines your testing surface area.

Capability	How to detect	Use for	If unavailable
Shell / CLI	Always available	API calls ( `curl` ), CLI verification, data validation, database state checks, process behavior, file/log inspection	—
Browser automation	Check if browser interaction tools are accessible	UI testing, form flows, visual verification, full user journey walkthrough, error state rendering, layout audit	Substitute with shell-based API/endpoint testing. Document: "UI not visually verified."
Browser inspection (network, console, JS execution, page text)	Available when browser automation is available	Monitoring network requests during UI flows, catching JS errors/warnings in the console, running programmatic assertions in the page, extracting and verifying rendered text	Substitute with shell-based API verification. Document the gap.
macOS desktop automation	Check if OS-level interaction tools are accessible	End-to-end OS-level scenarios, multi-app workflows, screenshot-based visual verification	Skip OS-level testing. Document the gap.

Record what's available. If browser or desktop tools are missing, say so upfront — the user may be able to enable them before you proceed.

Probe aggressively. Don't stop at "browser automation is available." Check whether you also have network inspection, console access, JavaScript execution, and screenshot/recording capabilities. Each expands your testing surface area. The more tools you have, the more you should use.

Get the system running. Check

AGENTS.md

CLAUDE.md

, or similar repo configuration files for build, run, and setup instructions. If the software can be started locally, start it — you cannot test user-facing behavior against a system that isn't running. If the system depends on external services, databases, or environment variables, check what's available and what you can reach. Document anything you cannot start.

探查可用的测试工具，这将决定你的测试覆盖范围。

能力	检测方式	适用场景	不可用时的替代方案
Shell / CLI	始终可用	API调用（ `curl` ）、CLI验证、数据校验、数据库状态检查、进程行为、文件/日志检查	—
浏览器自动化	检查是否可访问浏览器交互工具	UI测试、表单流程、视觉验证、完整用户旅程演练、错误状态渲染、布局审核	替换为基于Shell的API/端点测试。记录：“UI未进行视觉验证。”
浏览器检查（网络、控制台、JS执行、页面文本）	浏览器自动化可用时即可使用	监控UI流程中的网络请求、捕获控制台中的JS错误/警告、在页面中执行程序化断言、提取并验证渲染文本	替换为基于Shell的API验证。记录该测试缺口。
macOS桌面自动化	检查是否可访问系统级交互工具	端到端系统级场景、多应用工作流、基于截图的视觉验证	跳过系统级测试。记录该测试缺口。

记录可用的工具。如果浏览器或桌面工具不可用，请提前说明——用户可能可以在你继续测试前启用这些工具。

主动探查。不要停留在“浏览器自动化可用”这一步。还要检查是否具备网络检查、控制台访问、JavaScript执行以及截图/录屏能力。每一项能力都能扩大你的测试覆盖范围。可用工具越多，你就越应该充分利用。

启动系统。查看

AGENTS.md

、

CLAUDE.md

或类似的仓库配置文件，获取构建、运行和设置说明。如果软件可以在本地启动，请启动它——你无法针对未运行的系统测试用户面向的行为。如果系统依赖外部服务、数据库或环境变量，请检查可用资源和可访问性。记录所有无法启动的内容。

Step 2: Gather context — what are you testing?

步骤2：收集上下文——测试目标是什么？

Determine what to test from whatever input is available. Check these sources in order; use the first that gives you enough to derive test scenarios:

Input	How to use it
SPEC.md path provided	Read it. Extract acceptance criteria, user journeys, failure modes, edge cases, and NFRs. This is your primary source.
PR number provided	Run `gh pr diff <number>` and `gh pr view <number>` . Derive what changed and what user-facing behavior is affected.
Feature description provided	Use it as-is. Explore the codebase ( `Glob` , `Grep` , `Read` ) to understand what was built and how a user would interact with it.
"Test what changed" (or no input)	Run `git diff main...HEAD --stat` to see what files changed. Read the changed files. Infer the feature surface area and user-facing impact.

Output of this step: A mental model of what was built, who uses it, and how they interact with it.

从可用的输入信息中确定测试内容。按以下顺序检查信息来源，使用第一个能为你提供足够信息以推导测试场景的来源：

输入信息	使用方式
提供的SPEC.md路径	阅读该文件。提取验收标准、用户旅程、故障模式、边缘场景和非功能需求（NFRs）。这是你的主要信息来源。
提供的PR编号	执行 `gh pr diff <number>` 和 `gh pr view <number>` 。推导变更内容以及对用户面向行为的影响。
提供的功能描述	直接使用该描述。探索代码库（ `Glob` 、 `Grep` 、 `Read` ）以了解构建的内容以及用户的交互方式。
“测试变更内容”（或无输入）	执行 `git diff main...HEAD --stat` 查看变更文件。阅读变更文件。推断功能覆盖范围和对用户的影响。

本步骤输出：对构建内容、用户群体以及用户交互方式的认知模型。

Step 3: Derive the test plan

步骤3：制定测试计划

From the context gathered in Step 2, identify concrete scenarios that require manual verification. For each candidate scenario, apply the formalization gate:

"Could this be a formal test?" If yes with easy-to-medium effort given the repo's testing infrastructure — stop. Write that test instead (or flag it to the user). Only proceed with scenarios that genuinely resist automation.

Scenarios that belong in the QA plan:

Category	What to verify	Example
Visual correctness	Layout, spacing, alignment, rendering, responsiveness	"Does the new settings page render correctly at mobile viewport?"
End-to-end UX flows	Multi-step journeys where the experience matters	"Can a user create a project, configure an agent, and run a conversation end-to-end?"
Subjective usability	Does the flow make sense? Labels clear? Error messages helpful?	"When auth fails, does the error message tell the user what to do next?"
Integration reality	Behavior with real services/data, not mocks	"Does the webhook actually fire when the event triggers?"
Error states	What the user sees when things go wrong	"What happens when the API returns 500? Does the UI show a useful error or a blank page?"
Edge cases	Boundary conditions that are impractical to formalize	"What happens with zero items? With 10,000 items? With special characters in the name?"
Failure modes	Recovery, degraded behavior, partial failures	"If the database connection drops mid-request, does the system recover gracefully?"
Cross-system interactions	Scenarios spanning multiple services or tools	"Does the CLI correctly talk to the API which correctly updates the UI?"

Write each scenario as a discrete test case:

What you will do (the action)
What "pass" looks like (expected outcome)
Why it's not a formal test (justification)

Create these as task list items to track execution progress.

从步骤2收集的上下文信息中，确定需要手动验证的具体场景。对于每个候选场景，应用形式化测试判断标准：

“这个场景能否转化为自动化测试？” 如果在现有仓库测试基础设施下可以轻松或中等难度实现——停止。改为编写自动化测试（或标记给用户处理）。仅继续处理确实无法自动化的场景。

属于QA计划的场景：

类别	验证内容	示例
视觉正确性	布局、间距、对齐、渲染、响应式	“新设置页面在移动端视口下是否渲染正确？”
端到端UX流程	体验至关重要的多步骤流程	“用户能否完成创建项目、配置Agent、运行对话的端到端流程？”
主观易用性	流程是否合理？标签是否清晰？错误提示是否有用？	“认证失败时，错误提示是否告知用户下一步操作？”
集成实际情况	与真实服务/数据的交互行为，而非模拟数据	“事件触发时，Webhook是否真的会触发？”
错误状态	出现问题时用户看到的内容	“API返回500时会发生什么？UI是否显示有用的错误信息还是空白页面？”
边缘场景	难以形式化的边界条件	“零条目时会发生什么？10000条条目时呢？名称包含特殊字符时呢？”
故障模式	恢复、降级行为、部分故障	“如果数据库连接在请求中途断开，系统能否优雅恢复？”
跨系统交互	跨多个服务或工具的场景	“CLI能否正确与API通信，API能否正确更新UI？”

将每个场景编写为独立的测试用例：

你将执行的操作（动作）
“通过”的标准（预期结果）
无法自动化的原因（理由）

将这些用例创建为任务列表项，以跟踪执行进度。

Step 4: Persist the QA checklist

步骤4：保存QA检查清单

If a PR exists, write the QA checklist to the

## Test plan

section of the PR body. Always update via
gh pr edit --body
— never post QA results as PR comments.

Update mechanism:

Read the current PR body:

gh pr view <number> --json body -q '.body'

If a
```
## Test plan
```
section already exists, replace its content with the updated checklist.
If no such section exists, append it to the end of the body.

Write the updated body back:

gh pr edit <number> --body "<updated body>"

Section format:

undefined

如果存在PR，请将QA检查清单写入PR正文的

## Test plan

部分。始终通过
gh pr edit --body
更新——切勿将QA结果作为PR评论发布。

更新机制：

读取当前PR正文：

gh pr view <number> --json body -q '.body'

如果已存在
```
## Test plan
```
部分，将其内容替换为更新后的检查清单。
如果不存在该部分，将其追加到正文末尾。

写回更新后的正文：

gh pr edit <number> --body "<updated body>"

章节格式：

undefined

Test plan

Manual QA scenarios that resist automation. Updated as tests complete.

<category>: <scenario name> — <what you'll verify> · Why not a test: <reason>


If no PR exists, maintain the checklist as task list items only.

Manual QA scenarios that resist automation. Updated as tests complete.

<category>: <scenario name> — <what you'll verify> · Why not a test: <reason>


如果不存在PR，仅将检查清单维护为任务列表项。

Step 5: Execute — test like a human would

步骤5：执行——以用户的方式测试

Work through each scenario. Use the strongest tool available for each.

Testing priority: emulate real users first. Prefer tools that replicate how a user actually interacts with the system. Browser automation over API calls. SDK/client library calls over raw HTTP. Real user journeys over isolated endpoint checks. Fall back to lower-fidelity tools (curl, direct database queries) for parts of the system that are not user-facing or when higher-fidelity tools are unavailable. For parts of the system touched by the changes but not visible to the customer — use server-side observability (logs, telemetry, database state) to verify correctness beneath the surface.

Unblock yourself with ad-hoc scripts. Do not wait for formal test infrastructure, published packages, or CI pipelines. If you need to verify something, write a quick script and run it. Put all throwaway artifacts — scripts, fixtures, test data, temporary configs — in a

tmp/

directory at the repo root (typically gitignored). These are disposable; they don't need to be production-quality. Specific patterns:

Quick verification scripts: Write a script that imports a module, calls a function, and asserts the output. Run it. Delete it when done (or leave it in
```
tmp/
```
).
Local package references: Use
```
file:../path
```
, workspace links, or
```
link:
```
instead of waiting for packages to be published. Test the code as it exists on disk.
Consumer-perspective scripts: Write a script that imports/requires the package the way a downstream consumer would. Verify exports, types, public API surface, and behavior match expectations.
REPL exploration: Use a REPL (node, python, etc.) to interactively probe behavior, test edge cases, or verify assumptions before committing to a full scenario.
Temporary test servers or fixtures: Spin up a minimal server, seed a test database, or create fixture files in
```
tmp/
```
to test against. Tear them down when done.
Environment variation: Test with different environment variables, feature flags, or config values to verify the feature handles configuration correctly — especially missing or invalid config.

With browser automation:

Navigate to the feature. Click through it. Fill forms. Submit them.
Walk the full user journey end-to-end — don't just verify individual pages.
Audit visual layout — does it look right? Is anything misaligned, clipped, or missing?
Test error states — submit invalid data, disconnect, trigger edge cases.
Test at different viewport sizes if the feature is responsive.
Test keyboard navigation and focus management.
Record a GIF of multi-step flows when it helps demonstrate the result.

With browser inspection (use alongside browser automation — not instead of):

Console monitoring: Check the browser console for errors and warnings during every UI interaction. A page that looks correct but throws JS errors is not correct. Filter for errors/exceptions after each major action.
Network request verification: Monitor network requests during UI flows. Verify: correct endpoints are called, response status codes are expected (no silent 4xx/5xx), request/response payloads match what the feature requires. Flag unexpected requests or missing requests.
In-page assertions: Execute JavaScript in the page to verify DOM state, computed styles, data attributes, or application state that isn't visible on screen. Use this when visual inspection alone can't confirm correctness (e.g., "is this element actually hidden via CSS, or just scrolled off-screen?").
Rendered text verification: Extract page text to verify content rendering — especially dynamic content, interpolated values, and conditional text.

With macOS desktop automation:

Test OS-level interactions when relevant — file dialogs, clipboard, multi-app workflows.
Take screenshots for visual verification.

With shell / CLI (always available):

```
curl
```
API endpoints. Verify status codes, response shapes, error responses.
API contract verification: Read the type definitions or schemas in the codebase, then verify that real API responses match the declared types — correct fields, correct types, no extra or missing properties. This catches drift between types and runtime behavior.
Test CLI commands with valid and invalid input.
Verify file outputs, logs, process behavior.
Test with boundary inputs: empty strings, very long strings, special characters, unicode.
Test concurrent operations if relevant: can two requests race?

Data integrity verification (after any mutation):

Before the mutation: record the relevant state (database row, file contents, API response).
Perform the mutation via the UI or API.
After the mutation: verify the state changed correctly — right values written, no unintended side effects on related data, timestamps/audit fields updated.
This catches mutations that appear to succeed (200 OK, UI updates) but write wrong values, miss fields, or corrupt related state.

Server-side observability (when available): Changes touch more of the system than what's visible to the user. After exercising user-facing flows, check server-side signals for problems that wouldn't surface in the browser or API response.

Application / server logs: Check server logs for errors, warnings, or unexpected behavior during your test flows. Tail logs while running browser or API tests.
Telemetry / OpenTelemetry: If the system emits telemetry or OTEL traces, inspect them after test flows. Verify: traces are emitted for the expected operations, spans have correct attributes, no error spans where success is expected.
Database state: Query the database directly to verify mutations wrote correct values — especially when the API or UI reports success but the actual persistence could differ.
Background jobs / queues: If the feature triggers async work (queues, cron, webhooks), verify the jobs were enqueued and completed correctly.

General testing approach:

Start from a clean state (no cached data, fresh session).
Walk the happy path first — end-to-end as the spec describes.
Then break it — try every failure mode you identified.
Then stress it — boundary conditions, unexpected inputs, concurrent access.
Then look at it — visual correctness, usability, "does this feel right?"

逐一完成每个场景。为每个场景使用最适合的工具。

测试优先级：先模拟真实用户。优先使用能复制用户实际交互方式的工具。优先使用浏览器自动化而非API调用。优先使用SDK/客户端库调用而非原始HTTP请求。优先使用真实用户旅程而非孤立的端点检查。当高保真工具不可用时，再退而使用低保真工具（curl、直接数据库查询）测试非用户面向的系统部分，或高保真工具无法覆盖的部分。对于变更涉及但对客户不可见的系统部分——使用服务端可观测性（日志、遥测、数据库状态）验证底层正确性。

通过临时脚本解决阻塞问题。不要等待正式测试基础设施、已发布的包或CI流水线。如果需要验证某项内容，编写一个快速脚本并运行。将所有临时文件——脚本、测试数据、临时配置——放在仓库根目录的

tmp/

目录下（通常已被git忽略）。这些文件是一次性的；不需要达到生产级质量。常见模式：

快速验证脚本：编写一个导入模块、调用函数并断言输出的脚本。运行脚本。完成后删除（或保留在
```
tmp/
```
目录中）。
本地包引用：使用
```
file:../path
```
、工作区链接或
```
link:
```
，而非等待包发布。测试磁盘上的现有代码。
消费者视角脚本：编写一个以下游消费者方式导入/引用包的脚本。验证导出内容、类型、公共API表面和行为是否符合预期。
REPL探索：使用REPL（node、python等）交互式探查行为、测试边缘场景或验证假设，再投入完整场景测试。
临时测试服务器或测试数据：启动最小化服务器、初始化测试数据库或在
```
tmp/
```
目录中创建测试数据文件进行测试。完成后清理。
环境变化测试：使用不同的环境变量、功能标志或配置值测试，以验证功能能否正确处理配置——尤其是缺失或无效的配置。

使用浏览器自动化时：

导航到功能页面。点击操作、填写表单、提交内容。
完成完整的端到端用户旅程——不要仅验证单个页面。
审核视觉布局——显示是否正常？是否存在对齐错误、内容截断或缺失？
测试错误状态——提交无效数据、断开连接、触发边缘场景。
如果功能支持响应式，在不同视口尺寸下测试。
测试键盘导航和焦点管理。
当有助于展示结果时，录制多步骤流程的GIF。

使用浏览器检查时（与浏览器自动化配合使用——而非替代）：

控制台监控：在每次UI交互时检查浏览器控制台的错误和警告。显示正常但抛出JS错误的页面仍存在问题。每次主要操作后过滤错误/异常。
网络请求验证：监控UI流程中的网络请求。验证：调用了正确的端点、响应状态码符合预期（无静默4xx/5xx）、请求/响应负载符合功能需求。标记意外的请求或缺失的请求。
页面内断言：在页面中执行JavaScript以验证DOM状态、计算样式、数据属性或屏幕不可见的应用状态。当仅通过视觉检查无法确认正确性时使用此方法（例如：“该元素是否真的通过CSS隐藏，还是只是滚出了屏幕？”）。
渲染文本验证：提取页面文本以验证内容渲染——尤其是动态内容、插值值和条件文本。

使用macOS桌面自动化时：

测试相关的系统级交互——文件对话框、剪贴板、多应用工作流。
截取截图进行视觉验证。

使用Shell / CLI时（始终可用）：

使用
```
curl
```
调用API端点。验证状态码、响应结构、错误响应。
API契约验证：阅读代码库中的类型定义或模式，然后验证真实API响应是否与声明的类型匹配——字段正确、类型正确、无多余或缺失属性。这可以捕获类型与运行时行为之间的偏差。
使用有效和无效输入测试CLI命令。
验证文件输出、日志、进程行为。
使用边界输入测试：空字符串、超长字符串、特殊字符、Unicode字符。
如果相关，测试并发操作：两个请求是否会产生竞争条件？

数据完整性验证（任何变更操作后）：

变更前：记录相关状态（数据库行、文件内容、API响应）。
通过UI或API执行变更操作。
变更后：验证状态是否正确变更——写入了正确的值、相关数据无意外副作用、时间戳/审计字段已更新。
这可以捕获看似成功（200 OK、UI更新）但写入错误值、遗漏字段或损坏相关状态的变更操作。

服务端可观测性（可用时）：变更影响的系统范围比用户可见的部分更广。在执行用户面向的流程后，检查服务端信号以发现浏览器或API响应中不会体现的问题。

应用/服务日志：检查测试流程中的服务端错误、警告或意外行为。在运行浏览器或API测试时实时查看日志。
遥测 / OpenTelemetry：如果系统发出遥测或OTEL追踪数据，在测试流程后检查这些数据。验证：为预期操作发出了追踪数据、Span具有正确的属性、成功场景中无错误Span。
数据库状态：直接查询数据库以验证变更操作写入了正确的值——尤其是当API或UI报告成功但实际持久化可能存在差异时。
后台任务/队列：如果功能触发了异步工作（队列、定时任务、Webhook），验证任务已入队并正确完成。

通用测试方法：

从干净状态开始（无缓存数据、全新会话）。
先测试正常路径——按照规格说明完成端到端流程。
然后尝试破坏它——测试所有你识别出的故障模式。
然后进行压力测试——边界条件、意外输入、并发访问。
然后进行视觉检查——视觉正确性、易用性、“体验是否良好？”

Step 6: Record results

步骤6：记录结果

After each scenario (or batch of related scenarios), update the

## Test plan

section in the PR body using the same read → modify → write mechanism from Step 4. The checklist in the PR body is the single source of truth — do not post results as PR comments.

Result	How to record
Pass	Check the box: `- [x]`
Fail → fixed	Check the box, append: `— Fixed: <what was wrong and how>`
Fail → blocked	Leave unchecked, append: `— BLOCKED: <what went wrong, why unresolvable>`
Skipped (tool limitation)	Leave unchecked, append: `— Skipped: <reason, e.g., no browser automation>`

When you find a bug:

Can it be reproduced with a formal test? If yes — write the test first, then fix the bug, then verify both the test and manual scenario pass.
If it can't be a test — fix it, verify manually, document what was found and fixed in the checklist.

完成每个场景（或相关场景批次）后，使用步骤4中的读取→修改→写入机制更新PR正文中的

## Test plan

部分。PR正文中的检查清单是唯一的事实来源——切勿将结果作为PR评论发布。

结果	记录方式
通过	勾选复选框： `- [x]`
失败→已修复	勾选复选框，追加： `— Fixed: <问题及修复方式>`
失败→阻塞	不勾选复选框，追加： `— BLOCKED: <问题及无法解决的原因>`
跳过（工具限制）	不勾选复选框，追加： `— Skipped: <原因，如无浏览器自动化>`

发现缺陷时：

能否通过自动化测试复现？如果可以——先编写自动化测试，然后修复缺陷，最后验证自动化测试和手动场景均通过。
如果无法自动化——修复缺陷，手动验证，在检查清单中记录发现的问题和修复方式。

Step 7: Report

步骤7：报告

If a PR exists: The

## Test plan

section in the PR body is your primary report. Ensure it's up-to-date with all results (pass/fail/fixed/blocked/skipped). Do not add a separate PR comment — the PR body section is the report.

If no PR exists: Report directly to the user with:

Total scenarios tested vs. passed vs. failed vs. skipped
Bugs found and fixed (with brief description of each)
Gaps — what could NOT be tested due to tool limitations or environment constraints
Judgment call — your honest assessment: is this feature ready for human review?

The skill's job is to fix what it can, document what it found, and hand back a clear picture. Unresolvable issues and gaps are documented, not silently swallowed — but they do not block forward progress. The invoker (user or /ship) decides what to do about remaining items.

如果存在PR：PR正文中的

## Test plan

部分是你的主要报告。确保它更新了所有结果（通过/失败/已修复/阻塞/跳过）。不要添加单独的PR评论——PR正文部分就是报告。

如果不存在PR：直接向用户报告：

测试场景总数、通过数、失败数、跳过数
发现并修复的缺陷（每个缺陷的简要描述）
测试缺口——因工具限制或环境约束无法测试的内容
评估结论——你对该功能是否准备好进行人工评审的诚实判断

本技能的职责是修复可修复的问题、记录发现的内容，并反馈清晰的情况。无法解决的问题和测试缺口会被记录，而非忽略——但它们不会阻碍后续进展。调用者（用户或/ship）会决定如何处理剩余事项。

Calibrating depth to risk

根据风险调整测试深度

Not every feature needs deep QA. Match effort to risk:

What changed	Testing depth
New user-facing feature (UI, API, CLI)	Deep — full journey walkthrough, error states, visual audit, edge cases
Business logic, data mutations, auth/permissions	Deep — verify behavior matches spec, test failure modes thoroughly
Bug fix	Targeted — verify the fix, test the regression path, check for side effects
Glue code, config, pass-through	Light — verify it connects correctly. Don't over-test plumbing.
Performance-sensitive paths	Targeted — benchmark the specific path if tools allow

Over-testing looks like: Manually verifying things already covered by passing unit tests. Clicking through UIs that haven't changed. Testing framework behavior instead of feature behavior.

Under-testing looks like: Declaring confidence from unit tests alone when the feature has user-facing surfaces. Skipping error-path testing. Not testing the interaction between new and existing code. Never opening the UI.

并非每个功能都需要深入QA。测试投入应与风险匹配：

变更内容	测试深度
新的用户面向功能（UI、API、CLI）	深入——完整旅程演练、错误状态、视觉审核、边缘场景
业务逻辑、数据变更、权限/认证	深入——验证行为符合规格说明、全面测试故障模式
缺陷修复	针对性——验证修复效果、测试回归路径、检查副作用
衔接代码、配置、透传逻辑	轻度——验证连接正确。不要过度测试基础组件。
性能敏感路径	针对性——如果工具允许，对特定路径进行基准测试

过度测试的表现：手动验证已通过单元测试覆盖的内容。点击未变更的UI。测试框架行为而非功能行为。

测试不足的表现：当功能存在用户面向界面时，仅通过单元测试就宣称有信心。跳过错误路径测试。不测试新代码与现有代码的交互。从未打开UI。

Anti-patterns

反模式

Treating QA as a checkbox. "I tested it" means nothing without specifics. Every scenario must have a concrete action and expected outcome.
Only testing the happy path. Real users encounter errors, edge cases, and unexpected states. Test those.
Duplicating formal tests. If the test suite already covers it, don't repeat it manually. Your time is for what the test suite can't do.
Skipping tools that are available. If browser automation is available and the feature has a UI — use it. Don't substitute with curl when you can click through the real thing.
Silent gaps. If you can't test something, say so explicitly. An undocumented gap is worse than a documented one.

将QA视为走过场。“我测试过了”没有任何意义，必须有具体细节。每个场景都必须有具体的动作和预期结果。
仅测试正常路径。真实用户会遇到错误、边缘场景和意外状态。测试这些场景。
重复自动化测试。如果测试套件已覆盖该内容，不要手动重复测试。你的时间应用于测试套件无法覆盖的内容。
跳过可用工具。如果浏览器自动化可用且功能有UI——使用它。不要在可以点击真实界面时用curl替代。
隐藏测试缺口。如果无法测试某项内容，请明确说明。未记录的测试缺口比已记录的更糟糕。",