test-data-management

Test Data Management

<default_to_action> When creating or managing test data:
  1. NEVER use production PII directly
  2. GENERATE synthetic data with faker libraries
  3. ANONYMIZE production data if used (mask, hash)
  4. ISOLATE test data (transactions, per-test cleanup)
  5. SCALE with batch generation (10k+ records/sec)
Quick Data Strategy:
  • Unit tests: Minimal data (just enough)
  • Integration: Realistic data (full complexity)
  • Performance: Volume data (10k+ records)
Critical Success Factors:
  • 40% of test failures from inadequate data
  • GDPR fines up to €20M or 4% of global annual turnover for PII violations
  • Never store production PII in test environments </default_to_action>

Quick Reference Card

When to Use

  • Creating test datasets
  • Handling sensitive data
  • Performance testing with volume
  • GDPR/CCPA compliance

Data Strategies

| Type | When | Size |
|------|------|------|
| Minimal | Unit tests | 1-10 records |
| Realistic | Integration | 100-1000 records |
| Volume | Performance | 10k+ records |
| Edge cases | Boundary testing | Targeted |
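
The "Edge cases" row can be made concrete with a small helper. The sketch below assumes an inclusive integer constraint, such as the age range 18 to 90 used in the agent example in this document. `boundaryValues` is a hypothetical name, not part of any library.

```javascript
// Hypothetical helper: values just outside, on, and just inside an inclusive integer range.
function boundaryValues(min, max) {
  if (!Number.isInteger(min) || !Number.isInteger(max) || min > max) {
    throw new RangeError('min and max must be integers with min <= max');
  }
  const candidates = [min - 1, min, min + 1, max - 1, max, max + 1];
  // Deduplicate for narrow ranges (e.g. min === max) and keep ascending order
  return [...new Set(candidates)].sort((a, b) => a - b);
}

const ageCases = boundaryValues(18, 90); // [17, 18, 19, 89, 90, 91]
```

Values outside the range (17 and 91 here) are meant to exercise the rejection path, while the rest should be accepted.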

Privacy Techniques

| Technique | Use Case |
|-----------|----------|
| Synthetic | Generate fake data (preferred) |
| Masking | j***@example.com |
| Hashing | Irreversible pseudonymization |
| Tokenization | Reversible with key |
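
To contrast the last two rows, here is a minimal sketch using only Node's built-in `node:crypto` module. It makes two assumptions that are not in the original: hashing is done with a keyed HMAC-SHA256, and tokenization is modelled as a lookup vault (an in-memory `Map`) rather than key-based encryption. `SECRET`, `hashValue`, and `TokenVault` are illustrative names.

```javascript
const { createHmac, randomUUID } = require('node:crypto');

// Hashing: a keyed HMAC gives a stable pseudonym with no built-in way back to the input.
// SECRET is a placeholder; a real key would come from a secret store.
const SECRET = 'test-only-secret';

function hashValue(value) {
  return createHmac('sha256', SECRET).update(value).digest('hex');
}

// Tokenization: swap the value for a random token and keep the mapping in a vault.
// The in-memory Maps stand in for a secured token store (an assumption of this sketch).
class TokenVault {
  constructor() {
    this.tokenToValue = new Map();
    this.valueToToken = new Map();
  }

  tokenize(value) {
    // Reuse the existing token so the same value always maps to the same token
    if (this.valueToToken.has(value)) return this.valueToToken.get(value);
    const token = `tok_${randomUUID()}`;
    this.tokenToValue.set(token, value);
    this.valueToToken.set(value, token);
    return token;
  }

  detokenize(token) {
    return this.tokenToValue.get(token);
  }
}

const vault = new TokenVault();
const token = vault.tokenize('john@example.com');
const original = vault.detokenize(token);        // recovered through the vault
const pseudonym = hashValue('john@example.com'); // same input always gives the same digest
```

Both outputs are deterministic per input, so joins across tables keep working. Only the vault offers a way back to the original value.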

Synthetic Data Generation

javascript
import { faker } from '@faker-js/faker';

// Seed for reproducibility
faker.seed(123);

function generateUser() {
  return {
    id: faker.string.uuid(),
    email: faker.internet.email(),
    firstName: faker.person.firstName(),
    lastName: faker.person.lastName(),
    phone: faker.phone.number(),
    address: {
      street: faker.location.streetAddress(),
      city: faker.location.city(),
      zip: faker.location.zipCode()
    },
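    // Fixed reference date (arbitrary example value) so seeded output does not drift with the current date
    createdAt: faker.date.past({ refDate: '2024-01-01T00:00:00.000Z' })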
  };
}

// Generate 1000 users
const users = Array.from({ length: 1000 }, generateUser);
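
To show why seeding gives reproducible data, here is a dependency-free sketch. It does not use faker. It swaps in the small, widely used `mulberry32` PRNG instead, and `FIRST_NAMES` and `makeUserGenerator` are hypothetical names chosen for the example.

```javascript
// mulberry32: a small seeded PRNG returning floats in [0, 1)
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

const FIRST_NAMES = ['Ada', 'Grace', 'Linus', 'Margaret'];

// Each generator owns its PRNG state, so two generators with the same seed
// produce the same sequence of users.
function makeUserGenerator(seed) {
  const rand = mulberry32(seed);
  let nextId = 1;
  return function generateUser() {
    const firstName = FIRST_NAMES[Math.floor(rand() * FIRST_NAMES.length)];
    const id = nextId++;
    return { id, firstName, email: `${firstName.toLowerCase()}.${id}@example.com` };
  };
}

const runA = Array.from({ length: 5 }, makeUserGenerator(123));
const runB = Array.from({ length: 5 }, makeUserGenerator(123));
// runA and runB contain identical records because the seed is the same
```

A failing test can then be replayed with exactly the same data by reusing its seed.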

Test Data Builder Pattern

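typescript
import { faker } from '@faker-js/faker';

// Assumed shape of User for this example; adapt it to your own model
interface User {
  id: string;
  email: string;
  role: 'admin' | 'customer';
  permissions: string[];
}

class UserBuilder {
  private user: Partial<User> = {};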

  asAdmin() {
    this.user.role = 'admin';
    this.user.permissions = ['read', 'write', 'delete'];
    return this;
  }

  asCustomer() {
    this.user.role = 'customer';
    this.user.permissions = ['read'];
    return this;
  }

  withEmail(email: string) {
    this.user.email = email;
    return this;
  }

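  build(): User {
    // Fill in defaults for anything the caller did not set explicitly
    return {
      id: this.user.id ?? faker.string.uuid(),
      email: this.user.email ?? faker.internet.email(),
      role: this.user.role ?? 'customer',
      permissions: this.user.permissions ?? ['read'],
      ...this.user
    } as User;
  }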
}

// Usage
const admin = new UserBuilder().asAdmin().withEmail('admin@test.com').build();
const customer = new UserBuilder().asCustomer().build();

Data Anonymization

javascript
// Masking
function maskEmail(email) {
  const [user, domain] = email.split('@');
  return `${user[0]}***@${domain}`;
}
// john@example.com → j***@example.com

function maskCreditCard(cc) {
  return `****-****-****-${cc.slice(-4)}`;
}
// 4242424242424242 → ****-****-****-4242

// Anonymize production data
const anonymizedUsers = prodUsers.map(user => ({
  id: user.id, // Keep ID for relationships
  email: `user-${user.id}@example.com`, // Fake email
  firstName: faker.person.firstName(), // Generated
  phone: null, // Remove PII
  createdAt: user.createdAt // Keep non-PII
}));
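
A quick sanity check after anonymization is to confirm that no original PII value survived. The sketch below assumes the original and anonymized arrays are in the same order. `findPiiLeaks` and the sample records are hypothetical.

```javascript
// Hypothetical check: list PII fields whose original value survived anonymization.
function findPiiLeaks(originalRecords, anonymizedRecords, piiFields) {
  const leaks = [];
  originalRecords.forEach((original, i) => {
    const anonymized = anonymizedRecords[i] ?? {};
    for (const field of piiFields) {
      const before = original[field];
      // Ignore fields that were empty to begin with
      if (before != null && anonymized[field] === before) {
        leaks.push({ index: i, field });
      }
    }
  });
  return leaks;
}

const prodSample = [{ id: 1, email: 'john@corp.test', phone: '555-0100' }];
const safe = [{ id: 1, email: 'user-1@example.com', phone: null }];
const unsafe = [{ id: 1, email: 'john@corp.test', phone: null }];

const noLeaks = findPiiLeaks(prodSample, safe, ['email', 'phone']);   // []
const oneLeak = findPiiLeaks(prodSample, unsafe, ['email', 'phone']); // [{ index: 0, field: 'email' }]
```

This only catches values copied through unchanged. It does not detect PII embedded in free-text fields.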

Database Transaction Isolation

javascript
// Best practice: use transactions for cleanup
beforeEach(async () => {
  await db.beginTransaction();
});

afterEach(async () => {
  await db.rollbackTransaction(); // Auto cleanup!
});

test('user registration', async () => {
  const user = await userService.register({
    email: 'test@example.com'
  });
  expect(user.id).toBeDefined();
  // Automatic rollback after test - no cleanup needed
});
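
The begin/rollback idea can be illustrated without a real database. The following is a sketch only: `InMemoryDb` is a hypothetical class that takes a snapshot on begin and restores it on rollback, whereas a real database does this through its own transaction support.

```javascript
// Minimal in-memory store with snapshot-based begin/rollback (illustration only).
class InMemoryDb {
  constructor() {
    this.rows = [];
    this.snapshot = null;
  }

  beginTransaction() {
    // Copy each row so later mutations cannot touch the snapshot
    this.snapshot = this.rows.map((row) => ({ ...row }));
  }

  rollbackTransaction() {
    if (this.snapshot === null) throw new Error('no active transaction');
    this.rows = this.snapshot;
    this.snapshot = null;
  }

  insert(row) {
    this.rows.push({ ...row });
  }
}

const memDb = new InMemoryDb();
memDb.insert({ id: 1, email: 'seed@example.com' });  // baseline data

memDb.beginTransaction();                            // what beforeEach would do
memDb.insert({ id: 2, email: 'test@example.com' });  // what the test writes
const duringTest = memDb.rows.length;                // 2
memDb.rollbackTransaction();                         // what afterEach would do
const afterTest = memDb.rows.length;                 // 1
```

With a real database the rollback only cleans up if the code under test writes through the same connection that opened the transaction.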

Volume Data Generation

javascript
// Generate `count` users in batches to keep memory use and insert size bounded
async function generateLargeDataset(count = 10000) {
  const batchSize = 1000;
  const batches = Math.ceil(count / batchSize);

  for (let i = 0; i < batches; i++) {
    const offset = i * batchSize;
    // The final batch may be smaller when count is not a multiple of batchSize
    const size = Math.min(batchSize, count - offset);
    const users = Array.from({ length: size }, (_, index) => ({
      id: offset + index,
      email: `user${offset + index}@example.com`,
      firstName: faker.person.firstName()
    }));

    await db.users.insertMany(users); // Batch insert
    console.log(`Batch ${i + 1}/${batches}`);
  }
}

Agent-Driven Data Generation

typescript
// High-speed generation with constraints
await Task("Generate Test Data", {
  schema: 'ecommerce',
  count: { users: 10000, products: 500, orders: 5000 },
  preserveReferentialIntegrity: true,
  constraints: {
    age: { min: 18, max: 90 },
    roles: ['customer', 'admin']
  }
}, "qe-test-data-architect");

// GDPR-compliant anonymization
await Task("Anonymize Production Data", {
  source: 'production-snapshot',
  piiFields: ['email', 'phone', 'ssn'],
  method: 'pseudonymization',
  retainStructure: true
}, "qe-test-data-architect");

Agent Coordination Hints

Memory Namespace

aqe/test-data-management/
├── schemas/*            - Data schemas
├── generators/*         - Generator configs
├── anonymization/*      - PII handling rules
└── fixtures/*           - Reusable fixtures

Fleet Coordination

typescript
const dataFleet = await FleetManager.coordinate({
  strategy: 'test-data-generation',
  agents: [
    'qe-test-data-architect',  // Generate data
    'qe-test-executor',        // Execute with data
    'qe-security-scanner'      // Validate no PII exposure
  ],
  topology: 'sequential'
});

Related Skills

  • database-testing - Schema and integrity testing
  • compliance-testing - GDPR/CCPA compliance
  • performance-testing - Volume data for perf tests

Remember

Test data is infrastructure, not an afterthought. 40% of test failures are caused by inadequate test data. Poor data = poor tests.

Never use production PII directly. GDPR fines can reach €20M or 4% of global annual turnover, whichever is higher. Always use synthetic data or properly anonymized production snapshots.

With Agents: qe-test-data-architect generates 10k+ records/sec with realistic patterns, relationships, and constraints. Agents help enforce GDPR/CCPA compliance and remove test data bottlenecks.