Test Data Management

<default_to_action> When creating or managing test data:

NEVER use production PII directly
GENERATE synthetic data with faker libraries
ANONYMIZE production data if used (mask, hash)
ISOLATE test data (transactions, per-test cleanup)
SCALE with batch generation (10k+ records/sec)

Quick Data Strategy:

Unit tests: Minimal data (just enough)
Integration: Realistic data (full complexity)
Performance: Volume data (10k+ records)

Critical Success Factors:

40% of test failures stem from inadequate data
GDPR fines of up to €20M (or 4% of global annual turnover) for PII violations
Never store production PII in test environments </default_to_action>

Quick Reference Card

When to Use

Creating test datasets
Handling sensitive data
Performance testing with volume
GDPR/CCPA compliance

Data Strategies

Type	When	Size
Minimal	Unit tests	1-10 records
Realistic	Integration	100-1000 records
Volume	Performance	10k+ records
Edge cases	Boundary testing	Targeted
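
The "Edge cases" row can be made concrete with boundary-value generation. The sketch below is illustrative only: `boundaryValues` and `boundaryCases` are hypothetical helper names, and the 18 to 90 range is borrowed from the age constraint used later in this document.

```javascript
// Hypothetical helper: boundary values for an inclusive integer range.
function boundaryValues(min, max) {
  if (!Number.isInteger(min) || !Number.isInteger(max) || min > max) {
    throw new RangeError('expected integers with min <= max');
  }
  // Just outside, on, and just inside each boundary. The Set removes
  // duplicates when the range is very narrow.
  const candidates = [min - 1, min, min + 1, max - 1, max, max + 1];
  return [...new Set(candidates)].sort((a, b) => a - b);
}

// Label each value so a parameterized test knows the expected outcome.
function boundaryCases(min, max) {
  return boundaryValues(min, max).map((value) => ({
    value,
    valid: value >= min && value <= max,
  }));
}

// Example: age must be between 18 and 90 inclusive.
console.log(boundaryValues(18, 90)); // [ 17, 18, 19, 89, 90, 91 ]
```

Feeding `boundaryCases(18, 90)` into a table-driven test keeps the targeted dataset small while still covering both sides of each limit.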

Privacy Techniques

Technique	Use Case / Example
Synthetic	Generate fake data (preferred)
Masking	Hide part of a value, e.g. j***@example.com
Hashing	One-way pseudonymization (use a salted or keyed hash)
Tokenization	Reversible with key
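
Tokenization is the only reversible technique in the table, so a small sketch may help show how it differs from masking and hashing. This is a minimal in-memory illustration, not a production design: `TokenVault` is a hypothetical class, and here access to the vault plays the role of the key.

```javascript
// Hypothetical in-memory token vault. A real vault would live in a
// separate, access-controlled store, never alongside the test data.
class TokenVault {
  #byToken = new Map();
  #byValue = new Map();
  #next = 1;

  // Return the same token for the same input so joins across tables still work.
  tokenize(value) {
    if (this.#byValue.has(value)) return this.#byValue.get(value);
    const token = `tok_${String(this.#next++).padStart(6, '0')}`;
    this.#byValue.set(value, token);
    this.#byToken.set(token, value);
    return token;
  }

  // Reversal is only possible with access to the vault itself.
  detokenize(token) {
    if (!this.#byToken.has(token)) throw new Error(`unknown token: ${token}`);
    return this.#byToken.get(token);
  }
}

const vault = new TokenVault();
const token = vault.tokenize('jane@corp.example');
console.log(token);                   // tok_000001
console.log(vault.detokenize(token)); // jane@corp.example
```

Because the mapping is stable, the same source value always yields the same token, which preserves relationships between tables. Test environments should receive only the tokens, never the vault.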

Synthetic Data Generation

javascript

import { faker } from '@faker-js/faker';

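// Seed for reproducibility. Pin the reference date too, because
// faker.date.past() is otherwise relative to the current time
// (the date below is an arbitrary example value).
faker.seed(123);
faker.setDefaultRefDate('2024-01-01T00:00:00.000Z');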

function generateUser() {
  return {
    id: faker.string.uuid(),
    email: faker.internet.email(),
    firstName: faker.person.firstName(),
    lastName: faker.person.lastName(),
    phone: faker.phone.number(),
    address: {
      street: faker.location.streetAddress(),
      city: faker.location.city(),
      zip: faker.location.zipCode()
    },
    createdAt: faker.date.past()
  };
}

// Generate 1000 users
const users = Array.from({ length: 1000 }, generateUser);

Test Data Builder Pattern

typescript

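import { faker } from '@faker-js/faker';

// Minimal shape assumed for this example; adapt it to your own User model.
interface User {
  id: string;
  email: string;
  role: 'admin' | 'customer';
  permissions: string[];
}

class UserBuilder {
  private user: Partial<User> = {};

  asAdmin() {
    this.user.role = 'admin';
    this.user.permissions = ['read', 'write', 'delete'];
    return this;
  }

  asCustomer() {
    this.user.role = 'customer';
    this.user.permissions = ['read'];
    return this;
  }

  withEmail(email: string) {
    this.user.email = email;
    return this;
  }

  build(): User {
    // Defaults fill in whatever the fluent calls left unset.
    return {
      id: this.user.id ?? faker.string.uuid(),
      email: this.user.email ?? faker.internet.email(),
      role: this.user.role ?? 'customer',
      permissions: this.user.permissions ?? ['read']
    };
  }
}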

// Usage
const admin = new UserBuilder().asAdmin().withEmail('admin@test.com').build();
const customer = new UserBuilder().asCustomer().build();

Data Anonymization

javascript

// Masking
function maskEmail(email) {
  const [user, domain] = email.split('@');
  return `${user[0]}***@${domain}`;
}
// john@example.com → j***@example.com

function maskCreditCard(cc) {
  return `****-****-****-${cc.slice(-4)}`;
}
// 4242424242424242 → ****-****-****-4242

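// Anonymize production data
// (prodUsers is assumed to be an array loaded from a production snapshot;
// faker is imported as in the generation example)
const anonymizedUsers = prodUsers.map(user => ({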
  id: user.id, // Keep ID for relationships
  email: `user-${user.id}@example.com`, // Fake email
  firstName: faker.person.firstName(), // Generated
  phone: null, // Remove PII
  createdAt: user.createdAt // Keep non-PII
}));

Database Transaction Isolation

javascript

// Best practice: use transactions for cleanup
beforeEach(async () => {
  await db.beginTransaction();
});

afterEach(async () => {
  await db.rollbackTransaction(); // Auto cleanup!
});

test('user registration', async () => {
  const user = await userService.register({
    email: 'test@example.com'
  });
  expect(user.id).toBeDefined();
  // Automatic rollback after test - no cleanup needed
});
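
Rollback only helps when the code under test runs on the same transaction. Where that is not the case, per-test cleanup is the other isolation option listed at the top of this document. The following is a minimal sketch of that idea: `createCleanupTracker` is a hypothetical helper, and the `Map` stands in for a real table.

```javascript
// Hypothetical helper: remembers undo steps and runs them in reverse order.
function createCleanupTracker() {
  const tasks = [];
  return {
    // Register an undo step right after creating a record.
    track(undo) {
      tasks.push(undo);
    },
    // Run undo steps last-in, first-out so child rows go before their parents.
    async cleanup() {
      while (tasks.length > 0) {
        const undo = tasks.pop();
        await undo();
      }
    },
  };
}

// Stand-in for a database table, used only to keep the sketch runnable.
const usersTable = new Map();
const tracker = createCleanupTracker();

usersTable.set('u1', { id: 'u1', email: 'test@example.com' });
tracker.track(async () => { usersTable.delete('u1'); });

tracker.cleanup().then(() => {
  console.log(usersTable.size); // 0
});
```

In a test suite, `tracker.cleanup()` would typically be awaited in `afterEach`, mirroring the rollback hook above.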

Volume Data Generation

javascript

// Generate 10,000 users in batches
// (assumes faker is imported and db.users.insertMany is your bulk-insert API)
async function generateLargeDataset(count = 10000) {
  const batchSize = 1000;
  const batches = Math.ceil(count / batchSize);

  for (let i = 0; i < batches; i++) {
    const start = i * batchSize;
    // The last batch may be smaller when count is not a multiple of batchSize
    const size = Math.min(batchSize, count - start);
    const users = Array.from({ length: size }, (_, index) => ({
      id: start + index,
      email: `user${start + index}@example.com`,
      firstName: faker.person.firstName()
    }));

    await db.users.insertMany(users); // Batch insert
    console.log(`Batch ${i + 1}/${batches}`);
  }
}
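
Batch boundaries are easy to get wrong when the requested count is not a multiple of the batch size. One way to keep the total exact is to compute the ranges separately from the insert loop. This is a sketch under that assumption; `batchRanges` and `buildUserBatch` are hypothetical names, and no database is involved.

```javascript
// Yield { start, size } pairs that cover exactly `count` records.
function* batchRanges(count, batchSize) {
  if (!Number.isInteger(count) || count < 0) {
    throw new RangeError('count must be a non-negative integer');
  }
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError('batchSize must be a positive integer');
  }
  for (let start = 0; start < count; start += batchSize) {
    yield { start, size: Math.min(batchSize, count - start) };
  }
}

// Build one batch of lightweight user rows with sequential, unique ids.
function buildUserBatch({ start, size }) {
  return Array.from({ length: size }, (_, offset) => {
    const id = start + offset;
    return { id, email: `user${id}@example.com` };
  });
}

const sizes = [...batchRanges(2500, 1000)].map((range) => range.size);
console.log(sizes); // [ 1000, 1000, 500 ]
```

Each range can then be passed to `buildUserBatch` and handed to whatever bulk-insert call the project uses, so only one batch is held in memory at a time.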

Agent-Driven Data Generation

typescript

// High-speed generation with constraints
await Task("Generate Test Data", {
  schema: 'ecommerce',
  count: { users: 10000, products: 500, orders: 5000 },
  preserveReferentialIntegrity: true,
  constraints: {
    age: { min: 18, max: 90 },
    roles: ['customer', 'admin']
  }
}, "qe-test-data-architect");

// GDPR-compliant anonymization
await Task("Anonymize Production Data", {
  source: 'production-snapshot',
  piiFields: ['email', 'phone', 'ssn'],
  method: 'pseudonymization',
  retainStructure: true
}, "qe-test-data-architect");
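
The agent's internals are not documented here, so the following is not its implementation. It is only a rough sketch of what `preserveReferentialIntegrity` and the `constraints` above could mean in practice: parent rows are generated first, and every order draws its foreign keys from rows that already exist. `generateEcommerce` and the row shapes are assumptions, and a small seeded PRNG (mulberry32) is used in place of faker to keep the block dependency-free.

```javascript
// Small deterministic PRNG (mulberry32) so the dataset is reproducible.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical generator: counts mirror the `count` option shown above.
function generateEcommerce({ users, products, orders }, seed = 123) {
  if (orders > 0 && (users < 1 || products < 1)) {
    throw new RangeError('orders need at least one user and one product');
  }
  const rand = mulberry32(seed);
  const pick = (items) => items[Math.floor(rand() * items.length)];
  const intBetween = (min, max) => min + Math.floor(rand() * (max - min + 1));

  const userRows = Array.from({ length: users }, (_, i) => ({
    id: i + 1,
    age: intBetween(18, 90), // constraint: 18 <= age <= 90
    role: i === 0 ? 'admin' : 'customer', // constraint: allowed roles only
  }));
  const productRows = Array.from({ length: products }, (_, i) => ({
    id: i + 1,
    priceCents: intBetween(100, 50000),
  }));
  // Parents exist already, so every foreign key points at a real row.
  const orderRows = Array.from({ length: orders }, (_, i) => ({
    id: i + 1,
    userId: pick(userRows).id,
    productId: pick(productRows).id,
  }));
  return { users: userRows, products: productRows, orders: orderRows };
}

const data = generateEcommerce({ users: 100, products: 20, orders: 300 });
console.log(data.orders.length); // 300
```

Generating parents before children is the simplest way to avoid orphaned rows; schemas with circular references would need a different ordering strategy.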

Agent Coordination Hints

Memory Namespace

aqe/test-data-management/
├── schemas/*            - Data schemas
├── generators/*         - Generator configs
├── anonymization/*      - PII handling rules
└── fixtures/*           - Reusable fixtures

Fleet Coordination

typescript

const dataFleet = await FleetManager.coordinate({
  strategy: 'test-data-generation',
  agents: [
    'qe-test-data-architect',  // Generate data
    'qe-test-executor',        // Execute with data
    'qe-security-scanner'      // Validate no PII exposure
  ],
  topology: 'sequential'
});
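
The last step in the fleet, validating that no PII is exposed, can be approximated with a simple pattern scan. This sketch is not how `qe-security-scanner` works (that is not documented here). `findPii`, `PII_PATTERNS`, and the reserved-domain allowlist are all assumptions, and the patterns are deliberately narrow.

```javascript
// Illustrative patterns only; real scanners need broader, locale-aware rules.
const PII_PATTERNS = {
  email: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/,
  ssn: /\b\d{3}-\d{2}-\d{4}\b/,
  cardNumber: /\b(?:\d[ -]?){13,16}\b/,
};

// Domains reserved for documentation and testing are treated as safe.
const RESERVED_DOMAINS = ['example.com', 'example.org', 'example.net'];

function isReservedDomain(email) {
  const domain = email.slice(email.lastIndexOf('@') + 1).toLowerCase();
  return RESERVED_DOMAINS.some((d) => domain === d || domain.endsWith(`.${d}`));
}

// Scan string fields and report where a PII-like value was found.
// Only the first match per field and pattern is inspected.
function findPii(records) {
  const findings = [];
  records.forEach((record, index) => {
    for (const [field, value] of Object.entries(record)) {
      if (typeof value !== 'string') continue;
      for (const [kind, pattern] of Object.entries(PII_PATTERNS)) {
        const match = value.match(pattern);
        if (!match) continue;
        if (kind === 'email' && isReservedDomain(match[0])) continue;
        findings.push({ index, field, kind });
      }
    }
  });
  return findings;
}

const findings = findPii([
  { id: '1', email: 'user-1@example.com', note: 'ok' },
  { id: '2', email: 'jane.doe@corp.io', note: 'SSN 123-45-6789' },
]);
console.log(findings.length); // 2
```

A check like this can fail a pipeline run when `findings` is non-empty, but it only catches values that look like PII. It does not prove a dataset is anonymized.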

Related Skills

Remember

Test data is infrastructure, not an afterthought. 40% of test failures are caused by inadequate test data. Poor data = poor tests.

Never use production PII directly. GDPR fines can reach €20M or 4% of global annual turnover, whichever is higher. Always use synthetic data or properly anonymized production snapshots.

With Agents: qe-test-data-architect generates 10k+ records/sec with realistic patterns, relationships, and constraints. Agents ensure GDPR/CCPA compliance automatically and eliminate test data bottlenecks.
