# AI Regression Testing

Testing patterns specifically designed for AI-assisted development, where the same model writes code and reviews it — creating systematic blind spots that only automated tests can catch.
## When to Activate

- AI agent (Claude Code, Cursor, Codex) has modified API routes or backend logic
- A bug was found and fixed — need to prevent re-introduction
- Project has a sandbox/mock mode that can be leveraged for DB-free testing
- Running `/bug-check` or similar review commands after code changes
- Multiple code paths exist (sandbox vs production, feature flags, etc.)
## The Core Problem

When an AI writes code and then reviews its own work, it carries the same assumptions into both steps. This creates a predictable failure pattern:

```
AI writes fix → AI reviews fix → AI says "looks correct" → Bug still exists
```

Real-world example (observed in production):

```
Fix 1: Added notification_settings to API response
  → Forgot to add it to the SELECT query
  → AI reviewed and missed it (same blind spot)
Fix 2: Added it to SELECT query
  → TypeScript build error (column not in generated types)
  → AI reviewed Fix 1 but didn't catch the SELECT issue
Fix 3: Changed to SELECT *
  → Fixed production path, forgot sandbox path
  → AI reviewed and missed it AGAIN (4th occurrence)
Fix 4: Test caught it instantly on first run ✅
```

The pattern: sandbox/production path inconsistency is the #1 AI-introduced regression.
## Sandbox-Mode API Testing

Most projects with AI-friendly architecture have a sandbox/mock mode. This is the key to fast, DB-free API testing.
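Concretely, the sandbox branch is usually gated by one env-flag helper. A minimal sketch, assuming the file path and flag name (the snippets under "Common AI Regression Patterns" below call it `isSandboxMode()`; this project's actual helper may differ):

```typescript
// lib/sandbox.ts (hypothetical path)
// Single source of truth for the sandbox/production branch.
// SANDBOX_MODE is the same env flag the test setup below forces on.
export function isSandboxMode(): boolean {
  return process.env.SANDBOX_MODE === "true";
}
```

Keeping the check in one helper, rather than reading `process.env` at every call site, means tests can flip a single flag to exercise the sandbox path.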
### Setup (Vitest + Next.js App Router)
```typescript
// vitest.config.ts
import { defineConfig } from "vitest/config";
import path from "path";

export default defineConfig({
  test: {
    environment: "node",
    globals: true,
    include: ["__tests__/**/*.test.ts"],
    setupFiles: ["__tests__/setup.ts"],
  },
  resolve: {
    alias: {
      "@": path.resolve(__dirname, "."),
    },
  },
});
```

```typescript
// __tests__/setup.ts
// Force sandbox mode — no database needed
process.env.SANDBOX_MODE = "true";
process.env.NEXT_PUBLIC_SUPABASE_URL = "";
process.env.NEXT_PUBLIC_SUPABASE_ANON_KEY = "";
```

### Test Helper for Next.js API Routes
```typescript
// __tests__/helpers.ts
import { NextRequest } from "next/server";

export function createTestRequest(
  url: string,
  options?: {
    method?: string;
    body?: Record<string, unknown>;
    headers?: Record<string, string>;
    sandboxUserId?: string;
  },
): NextRequest {
  const { method = "GET", body, headers = {}, sandboxUserId } = options || {};
  const fullUrl = url.startsWith("http") ? url : `http://localhost:3000${url}`;
  const reqHeaders: Record<string, string> = { ...headers };
  if (sandboxUserId) {
    reqHeaders["x-sandbox-user-id"] = sandboxUserId;
  }
  const init: { method: string; headers: Record<string, string>; body?: string } = {
    method,
    headers: reqHeaders,
  };
  if (body) {
    init.body = JSON.stringify(body);
    // init.headers references this same object, so the mutation is visible
    reqHeaders["content-type"] = "application/json";
  }
  return new NextRequest(fullUrl, init);
}

export async function parseResponse(response: Response) {
  const json = await response.json();
  return { status: response.status, json };
}
```

### Writing Regression Tests
The key principle: write tests for bugs that were found, not for code that works.
```typescript
// __tests__/api/user/profile.test.ts
import { describe, it, expect } from "vitest";
import { createTestRequest, parseResponse } from "../../helpers";
import { GET, PATCH } from "@/app/api/user/profile/route";

// Define the contract — what fields MUST be in the response
const REQUIRED_FIELDS = [
  "id",
  "email",
  "full_name",
  "phone",
  "role",
  "created_at",
  "avatar_url",
  "notification_settings", // ← Added after bug found it missing
];

describe("GET /api/user/profile", () => {
  it("returns all required fields", async () => {
    const req = createTestRequest("/api/user/profile");
    const res = await GET(req);
    const { status, json } = await parseResponse(res);
    expect(status).toBe(200);
    for (const field of REQUIRED_FIELDS) {
      expect(json.data).toHaveProperty(field);
    }
  });

  // Regression test — this exact bug was introduced by AI 4 times
  it("notification_settings is not undefined (BUG-R1 regression)", async () => {
    const req = createTestRequest("/api/user/profile");
    const res = await GET(req);
    const { json } = await parseResponse(res);
    expect("notification_settings" in json.data).toBe(true);
    const ns = json.data.notification_settings;
    expect(ns === null || typeof ns === "object").toBe(true);
  });
});
```

## Testing Sandbox/Production Parity
The most common AI regression: fixing production path but forgetting sandbox path (or vice versa).
```typescript
// Test that sandbox responses match the expected contract
describe("GET /api/user/messages (conversation list)", () => {
  it("includes partner_name in sandbox mode", async () => {
    const req = createTestRequest("/api/user/messages", {
      sandboxUserId: "user-001",
    });
    const res = await GET(req);
    const { json } = await parseResponse(res);
    // This caught a bug where partner_name was added
    // to production path but not sandbox path
    if (json.data.length > 0) {
      for (const conv of json.data) {
        expect("partner_name" in conv).toBe(true);
      }
    }
  });
});
```

## Integrating Tests into Bug-Check Workflow
### Custom Command Definition

```markdown
<!-- .claude/commands/bug-check.md -->

# Bug Check

## Step 1: Automated Tests (mandatory, cannot skip)

Run these commands FIRST before any code review:

    npm run test     # Vitest test suite
    npm run build    # TypeScript type check + build

- If tests fail → report as highest priority bug
- If build fails → report type errors as highest priority
- Only proceed to Step 2 if both pass

## Step 2: Code Review (AI review)

- Sandbox / production path consistency
- API response shape matches frontend expectations
- SELECT clause completeness
- Error handling with rollback
- Optimistic update race conditions

## Step 3: For each bug fixed, propose a regression test
```
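One operational detail worth making explicit for Step 1: `npm run test` must exit after a single pass. Vitest's bare `vitest` command starts watch mode in a local terminal (it only auto-detects single-run in CI), which would hang an agent-driven check, so the script should use `vitest run`. A sketch of the assumed `package.json` scripts (inferred from the commands above, not taken from a specific project):

```json
{
  "scripts": {
    "test": "vitest run",
    "build": "next build"
  }
}
```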
### The Workflow
```
User: "バグチェックして" (or "/bug-check")
  │
  ├─ Step 1: npm run test
  │    ├─ FAIL → Bug found mechanically (no AI judgment needed)
  │    └─ PASS → Continue
  │
  ├─ Step 2: npm run build
  │    ├─ FAIL → Type error found mechanically
  │    └─ PASS → Continue
  │
  ├─ Step 3: AI code review (with known blind spots in mind)
  │    └─ Findings reported
  │
  └─ Step 4: For each fix, write a regression test
       └─ Next bug-check catches if fix breaks
```

## Common AI Regression Patterns
### Pattern 1: Sandbox/Production Path Mismatch
Frequency: Most common (observed in 3 out of 4 regressions)
```typescript
// ❌ AI adds field to production path only
if (isSandboxMode()) {
  return { data: { id, email, name } }; // Missing new field
}
// Production path
return { data: { id, email, name, notification_settings } };

// ✅ Both paths must return the same shape
if (isSandboxMode()) {
  return { data: { id, email, name, notification_settings: null } };
}
return { data: { id, email, name, notification_settings } };
```

Test to catch it:

```typescript
it("sandbox and production return same fields", async () => {
  // In test env, sandbox mode is forced ON
  const res = await GET(createTestRequest("/api/user/profile"));
  const { json } = await parseResponse(res);
  for (const field of REQUIRED_FIELDS) {
    expect(json.data).toHaveProperty(field);
  }
});
```

### Pattern 2: SELECT Clause Omission
Frequency: Common with Supabase/Prisma when adding new columns
```typescript
// ❌ New column added to response but not to SELECT
const { data } = await supabase
  .from("users")
  .select("id, email, name") // notification_settings not here
  .single();
return { data: { ...data, notification_settings: data.notification_settings } };
// → notification_settings is always undefined

// ✅ Use SELECT * or explicitly include new columns
const { data } = await supabase
  .from("users")
  .select("*")
  .single();
```

### Pattern 3: Error State Leakage
Frequency: Moderate — when adding error handling to existing components
```typescript
// ❌ Error state set but old data not cleared
catch (err) {
  setError("Failed to load");
  // reservations still shows data from previous tab!
}

// ✅ Clear related state on error
catch (err) {
  setReservations([]); // Clear stale data
  setError("Failed to load");
}
```

### Pattern 4: Optimistic Update Without Proper Rollback
```typescript
// ❌ No rollback on failure
const handleRemove = async (id: string) => {
  setItems(prev => prev.filter(i => i.id !== id));
  await fetch(`/api/items/${id}`, { method: "DELETE" });
  // If API fails, item is gone from UI but still in DB
};

// ✅ Capture previous state and rollback on failure
const handleRemove = async (id: string) => {
  const prevItems = [...items];
  setItems(prev => prev.filter(i => i.id !== id));
  try {
    const res = await fetch(`/api/items/${id}`, { method: "DELETE" });
    if (!res.ok) throw new Error("API error");
  } catch {
    setItems(prevItems); // Rollback
    alert("削除に失敗しました"); // "Deletion failed" (the app's UI copy is Japanese)
  }
};
```

## Strategy: Test Where Bugs Were Found
Don't aim for 100% coverage. Instead:

```
Bug found in /api/user/profile     → Write test for profile API
Bug found in /api/user/messages    → Write test for messages API
Bug found in /api/user/favorites   → Write test for favorites API
No bug in /api/user/notifications  → Don't write test (yet)
```

Why this works with AI development:

- AI tends to make the same category of mistake repeatedly
- Bugs cluster in complex areas (auth, multi-path logic, state management)
- Once tested, that exact regression cannot happen again
- Test count grows organically with bug fixes — no wasted effort
## Quick Reference
| AI Regression Pattern | Test Strategy | Priority |
|---|---|---|
| Sandbox/production mismatch | Assert same response shape in sandbox mode | 🔴 High |
| SELECT clause omission | Assert all required fields in response | 🔴 High |
| Error state leakage | Assert state cleanup on error | 🟡 Medium |
| Missing rollback | Assert state restored on API failure | 🟡 Medium |
| Type cast masking null | Assert field is not undefined | 🟡 Medium |
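The first two rows of the table reduce to the same reusable assertion. A sketch of a hypothetical helper (not from the test files above) that reports which contract fields are absent from, or `undefined` in, a response body:

```typescript
// Returns the contract fields missing from (or undefined in) an API
// response body. Empty array = contract satisfied.
export function missingFields(
  data: Record<string, unknown>,
  required: string[],
): string[] {
  // `f in data` alone is not enough: spreading a row that lacks the
  // column leaves the key absent, while `field: data.field` leaves the
  // key present with value undefined. Catch both symptoms.
  return required.filter((f) => !(f in data) || data[f] === undefined);
}
```

Asserting `expect(missingFields(json.data, REQUIRED_FIELDS)).toEqual([])` fails with the offending field names in the diff, which is more actionable than a per-field loop.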
## DO / DON'T

DO:

- Write tests immediately after finding a bug (before fixing it if possible)
- Test the API response shape, not the implementation
- Run tests as the first step of every bug-check
- Keep tests fast (< 1 second total with sandbox mode)
- Name tests after the bug they prevent (e.g., "BUG-R1 regression")

DON'T:

- Write tests for code that has never had a bug
- Trust AI self-review as a substitute for automated tests
- Skip sandbox path testing because "it's just mock data"
- Write integration tests when unit tests suffice
- Aim for coverage percentage — aim for regression prevention