# AI Regression Testing

Testing patterns specifically designed for AI-assisted development, where the same model writes code and reviews it — creating systematic blind spots that only automated tests can catch.
## When to Activate

- AI agent (Claude Code, Cursor, Codex) has modified API routes or backend logic
- A bug was found and fixed — need to prevent re-introduction
- Project has a sandbox/mock mode that can be leveraged for DB-free testing
- Running `/bug-check` or similar review commands after code changes
- Multiple code paths exist (sandbox vs production, feature flags, etc.)
## The Core Problem

When an AI writes code and then reviews its own work, it carries the same assumptions into both steps. This creates a predictable failure pattern:

```
AI writes fix → AI reviews fix → AI says "looks correct" → Bug still exists
```

Real-world example (observed in production):

```
Fix 1: Added notification_settings to API response
  → Forgot to add it to the SELECT query
  → AI reviewed and missed it (same blind spot)
Fix 2: Added it to SELECT query
  → TypeScript build error (column not in generated types)
  → AI reviewed Fix 1 but didn't catch the SELECT issue
Fix 3: Changed to SELECT *
  → Fixed production path, forgot sandbox path
  → AI reviewed and missed it AGAIN (4th occurrence)
Fix 4: Test caught it instantly on first run ✅
```

The pattern: sandbox/production path inconsistency is the #1 AI-introduced regression.
## Sandbox-Mode API Testing

Most projects with AI-friendly architecture have a sandbox/mock mode. This is the key to fast, DB-free API testing.
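Concretely, the sandbox branch is usually gated by one env-flag helper. A minimal sketch, assuming the file path and flag name (the snippets under "Common AI Regression Patterns" below call it `isSandboxMode()`; this project's actual helper may differ):

```typescript
// lib/sandbox.ts (hypothetical path)
// Single source of truth for the sandbox/production branch.
// SANDBOX_MODE is the same env flag the test setup below forces on.
export function isSandboxMode(): boolean {
  return process.env.SANDBOX_MODE === "true";
}
```

Keeping the check in one helper, rather than reading `process.env` at every call site, means tests can flip a single flag to exercise the sandbox path.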
### Setup (Vitest + Next.js App Router)
```typescript
// vitest.config.ts
import { defineConfig } from "vitest/config";
import path from "path";

export default defineConfig({
  test: {
    environment: "node",
    globals: true,
    include: ["__tests__/**/*.test.ts"],
    setupFiles: ["__tests__/setup.ts"],
  },
  resolve: {
    alias: {
      "@": path.resolve(__dirname, "."),
    },
  },
});
```

```typescript
// __tests__/setup.ts
// Force sandbox mode — no database needed
process.env.SANDBOX_MODE = "true";
process.env.NEXT_PUBLIC_SUPABASE_URL = "";
process.env.NEXT_PUBLIC_SUPABASE_ANON_KEY = "";
```

### Test Helper for Next.js API Routes
```typescript
// __tests__/helpers.ts
import { NextRequest } from "next/server";

export function createTestRequest(
  url: string,
  options?: {
    method?: string;
    body?: Record<string, unknown>;
    headers?: Record<string, string>;
    sandboxUserId?: string;
  },
): NextRequest {
  const { method = "GET", body, headers = {}, sandboxUserId } = options || {};
  const fullUrl = url.startsWith("http") ? url : `http://localhost:3000${url}`;
  const reqHeaders: Record<string, string> = { ...headers };
  if (sandboxUserId) {
    reqHeaders["x-sandbox-user-id"] = sandboxUserId;
  }
  const init: { method: string; headers: Record<string, string>; body?: string } = {
    method,
    headers: reqHeaders,
  };
  if (body) {
    init.body = JSON.stringify(body);
    // init.headers references this same object, so the mutation is visible
    reqHeaders["content-type"] = "application/json";
  }
  return new NextRequest(fullUrl, init);
}

export async function parseResponse(response: Response) {
  const json = await response.json();
  return { status: response.status, json };
}
```

### Writing Regression Tests
The key principle: write tests for bugs that were found, not for code that works.
```typescript
// __tests__/api/user/profile.test.ts
import { describe, it, expect } from "vitest";
import { createTestRequest, parseResponse } from "../../helpers";
import { GET, PATCH } from "@/app/api/user/profile/route";

// Define the contract — what fields MUST be in the response
const REQUIRED_FIELDS = [
  "id",
  "email",
  "full_name",
  "phone",
  "role",
  "created_at",
  "avatar_url",
  "notification_settings", // ← Added after bug found it missing
];

describe("GET /api/user/profile", () => {
  it("returns all required fields", async () => {
    const req = createTestRequest("/api/user/profile");
    const res = await GET(req);
    const { status, json } = await parseResponse(res);
    expect(status).toBe(200);
    for (const field of REQUIRED_FIELDS) {
      expect(json.data).toHaveProperty(field);
    }
  });

  // Regression test — this exact bug was introduced by AI 4 times
  it("notification_settings is not undefined (BUG-R1 regression)", async () => {
    const req = createTestRequest("/api/user/profile");
    const res = await GET(req);
    const { json } = await parseResponse(res);
    expect("notification_settings" in json.data).toBe(true);
    const ns = json.data.notification_settings;
    expect(ns === null || typeof ns === "object").toBe(true);
  });
});
```

## Testing Sandbox/Production Parity
The most common AI regression: fixing production path but forgetting sandbox path (or vice versa).
```typescript
// Test that sandbox responses match the expected contract
describe("GET /api/user/messages (conversation list)", () => {
  it("includes partner_name in sandbox mode", async () => {
    const req = createTestRequest("/api/user/messages", {
      sandboxUserId: "user-001",
    });
    const res = await GET(req);
    const { json } = await parseResponse(res);
    // This caught a bug where partner_name was added
    // to production path but not sandbox path
    if (json.data.length > 0) {
      for (const conv of json.data) {
        expect("partner_name" in conv).toBe(true);
      }
    }
  });
});
```

## Integrating Tests into Bug-Check Workflow
### Custom Command Definition

```markdown
<!-- .claude/commands/bug-check.md -->

# Bug Check

## Step 1: Automated Tests (mandatory, cannot skip)

Run these commands FIRST before any code review:

    npm run test     # Vitest test suite
    npm run build    # TypeScript type check + build

- If tests fail → report as highest priority bug
- If build fails → report type errors as highest priority
- Only proceed to Step 2 if both pass

## Step 2: Code Review (AI review)

- Sandbox / production path consistency
- API response shape matches frontend expectations
- SELECT clause completeness
- Error handling with rollback
- Optimistic update race conditions

## Step 3: For each bug fixed, propose a regression test
```
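One operational detail worth making explicit for Step 1: `npm run test` must exit after a single pass. Vitest's bare `vitest` command starts watch mode in a local terminal (it only auto-detects single-run in CI), which would hang an agent-driven check, so the script should use `vitest run`. A sketch of the assumed `package.json` scripts (inferred from the commands above, not taken from a specific project):

```json
{
  "scripts": {
    "test": "vitest run",
    "build": "next build"
  }
}
```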
### The Workflow
```
User: "バグチェックして" (or "/bug-check")
  │
  ├─ Step 1: npm run test
  │    ├─ FAIL → Bug found mechanically (no AI judgment needed)
  │    └─ PASS → Continue
  │
  ├─ Step 2: npm run build
  │    ├─ FAIL → Type error found mechanically
  │    └─ PASS → Continue
  │
  ├─ Step 3: AI code review (with known blind spots in mind)
  │    └─ Findings reported
  │
  └─ Step 4: For each fix, write a regression test
       └─ Next bug-check catches if fix breaks
```

## Common AI Regression Patterns
### Pattern 1: Sandbox/Production Path Mismatch
Frequency: Most common (observed in 3 out of 4 regressions)
```typescript
// ❌ AI adds field to production path only
if (isSandboxMode()) {
  return { data: { id, email, name } }; // Missing new field
}
// Production path
return { data: { id, email, name, notification_settings } };

// ✅ Both paths must return the same shape
if (isSandboxMode()) {
  return { data: { id, email, name, notification_settings: null } };
}
return { data: { id, email, name, notification_settings } };
```

Test to catch it:

```typescript
it("sandbox and production return same fields", async () => {
  // In test env, sandbox mode is forced ON
  const res = await GET(createTestRequest("/api/user/profile"));
  const { json } = await parseResponse(res);
  for (const field of REQUIRED_FIELDS) {
    expect(json.data).toHaveProperty(field);
  }
});
```

### Pattern 2: SELECT Clause Omission
Frequency: Common with Supabase/Prisma when adding new columns
```typescript
// ❌ New column added to response but not to SELECT
const { data } = await supabase
  .from("users")
  .select("id, email, name") // notification_settings not here
  .single();
return { data: { ...data, notification_settings: data.notification_settings } };
// → notification_settings is always undefined

// ✅ Use SELECT * or explicitly include new columns
const { data } = await supabase
  .from("users")
  .select("*")
  .single();
```

### Pattern 3: Error State Leakage
Frequency: Moderate — when adding error handling to existing components
```typescript
// ❌ Error state set but old data not cleared
catch (err) {
  setError("Failed to load");
  // reservations still shows data from previous tab!
}

// ✅ Clear related state on error
catch (err) {
  setReservations([]); // Clear stale data
  setError("Failed to load");
}
```

### Pattern 4: Optimistic Update Without Proper Rollback
```typescript
// ❌ No rollback on failure
const handleRemove = async (id: string) => {
  setItems(prev => prev.filter(i => i.id !== id));
  await fetch(`/api/items/${id}`, { method: "DELETE" });
  // If API fails, item is gone from UI but still in DB
};

// ✅ Capture previous state and rollback on failure
const handleRemove = async (id: string) => {
  const prevItems = [...items];
  setItems(prev => prev.filter(i => i.id !== id));
  try {
    const res = await fetch(`/api/items/${id}`, { method: "DELETE" });
    if (!res.ok) throw new Error("API error");
  } catch {
    setItems(prevItems); // Rollback
    alert("削除に失敗しました"); // "Deletion failed" (the app's UI copy is Japanese)
  }
};
```

## Strategy: Test Where Bugs Were Found
Don't aim for 100% coverage. Instead:

```
Bug found in /api/user/profile     → Write test for profile API
Bug found in /api/user/messages    → Write test for messages API
Bug found in /api/user/favorites   → Write test for favorites API
No bug in /api/user/notifications  → Don't write test (yet)
```

Why this works with AI development:

- AI tends to make the same category of mistake repeatedly
- Bugs cluster in complex areas (auth, multi-path logic, state management)
- Once tested, that exact regression cannot happen again
- Test count grows organically with bug fixes — no wasted effort
## Quick Reference
| AI Regression Pattern | Test Strategy | Priority |
|---|---|---|
| Sandbox/production mismatch | Assert same response shape in sandbox mode | 🔴 High |
| SELECT clause omission | Assert all required fields in response | 🔴 High |
| Error state leakage | Assert state cleanup on error | 🟡 Medium |
| Missing rollback | Assert state restored on API failure | 🟡 Medium |
| Type cast masking null | Assert field is not undefined | 🟡 Medium |
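The first two rows of the table reduce to the same reusable assertion. A sketch of a hypothetical helper (not from the test files above) that reports which contract fields are absent from, or `undefined` in, a response body:

```typescript
// Returns the contract fields missing from (or undefined in) an API
// response body. Empty array = contract satisfied.
export function missingFields(
  data: Record<string, unknown>,
  required: string[],
): string[] {
  // `f in data` alone is not enough: spreading a row that lacks the
  // column leaves the key absent, while `field: data.field` leaves the
  // key present with value undefined. Catch both symptoms.
  return required.filter((f) => !(f in data) || data[f] === undefined);
}
```

Asserting `expect(missingFields(json.data, REQUIRED_FIELDS)).toEqual([])` fails with the offending field names in the diff, which is more actionable than a per-field loop.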
## DO / DON'T

DO:

- Write tests immediately after finding a bug (before fixing it if possible)
- Test the API response shape, not the implementation
- Run tests as the first step of every bug-check
- Keep tests fast (< 1 second total with sandbox mode)
- Name tests after the bug they prevent (e.g., "BUG-R1 regression")

DON'T:

- Write tests for code that has never had a bug
- Trust AI self-review as a substitute for automated tests
- Skip sandbox path testing because "it's just mock data"
- Write integration tests when unit tests suffice
- Aim for coverage percentage — aim for regression prevention