test-data-management
Test Data Management
<default_to_action>
When creating or managing test data:
- NEVER use production PII directly
- GENERATE synthetic data with faker libraries
- ANONYMIZE production data if used (mask, hash)
- ISOLATE test data (transactions, per-test cleanup)
- SCALE with batch generation (10k+ records/sec)
Quick Data Strategy:
- Unit tests: Minimal data (just enough)
- Integration: Realistic data (full complexity)
- Performance: Volume data (10k+ records)
Critical Success Factors:
- 40% of test failures from inadequate data
- GDPR fines of up to €20M or 4% of global annual turnover, whichever is higher, for PII violations
- Never store production PII in test environments
</default_to_action>
Quick Reference Card
When to Use
- Creating test datasets
- Handling sensitive data
- Performance testing with volume
- GDPR/CCPA compliance
Data Strategies
| Type | When | Size |
|---|---|---|
| Minimal | Unit tests | 1-10 records |
| Realistic | Integration | 100-1000 records |
| Volume | Performance | 10k+ records |
| Edge cases | Boundary testing | Targeted |
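The "Edge cases" row is the least self-explanatory of the four, so here is a minimal sketch of what targeted boundary data could look like for a numeric field. The `boundaryValues` and `ageCases` helpers and the 18 to 90 range are illustrative assumptions, not part of any library.

```javascript
// Hypothetical helper: returns the values just outside, on, and just inside
// each end of an inclusive integer range [min, max].
function boundaryValues(min, max) {
  if (!Number.isInteger(min) || !Number.isInteger(max) || min > max) {
    throw new RangeError('min and max must be integers with min <= max');
  }
  const candidates = [min - 1, min, min + 1, max - 1, max, max + 1];
  // Drop duplicates that appear when the range is very narrow
  return [...new Set(candidates)].sort((a, b) => a - b);
}

// Pair each boundary value with the outcome a validator should produce
function ageCases(min, max) {
  return boundaryValues(min, max).map(age => ({
    age,
    expectedValid: age >= min && age <= max
  }));
}

console.log(ageCases(18, 90));
```

Six records are enough to cover both ends of the range, which is why edge-case data is sized as "Targeted" rather than by record count.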
Privacy Techniques
| Technique | Use Case |
|---|---|
| Synthetic | Generate fake data (preferred) |
| Masking | Partially hide values, e.g. j***@example.com |
| Hashing | Irreversible pseudonymization |
| Tokenization | Reversible with key |
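Hashing and tokenization differ mainly in whether the original value can be recovered. The sketch below illustrates that difference using only Node's built-in `crypto` module. The `pseudonymize` function, the `TokenVault` class, and the hard-coded key are assumptions made for illustration; a keyed hash (HMAC) is used here rather than a bare hash because low-entropy values such as emails are otherwise easy to guess by brute force.

```javascript
const crypto = require('node:crypto');

// Hypothetical key for the sketch; in practice load it from a secret store.
const PSEUDONYM_KEY = 'test-only-secret';

// Hashing: keyed and one-way. The same input always yields the same
// pseudonym, so joins across tables still line up, but the original
// value cannot be read back from the output.
function pseudonymize(value) {
  return crypto.createHmac('sha256', PSEUDONYM_KEY).update(value).digest('hex');
}

// Tokenization: the original is kept in a lookup table (the "vault"),
// and callers only ever see an opaque random token.
class TokenVault {
  constructor() {
    this.tokenToValue = new Map();
    this.valueToToken = new Map();
  }

  tokenize(value) {
    if (this.valueToToken.has(value)) {
      return this.valueToToken.get(value);
    }
    const token = `tok_${crypto.randomUUID()}`;
    this.tokenToValue.set(token, value);
    this.valueToToken.set(value, token);
    return token;
  }

  // Only code with access to the vault can reverse a token
  detokenize(token) {
    return this.tokenToValue.get(token);
  }
}
```

In this sketch the vault plays the role of the "key" in the table: whoever holds it can reverse tokens, so it would need the same protection as the production data itself.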
Synthetic Data Generation
```javascript
import { faker } from '@faker-js/faker';

// Seed for reproducibility
faker.seed(123);

function generateUser() {
  return {
    id: faker.string.uuid(),
    email: faker.internet.email(),
    firstName: faker.person.firstName(),
    lastName: faker.person.lastName(),
    phone: faker.phone.number(),
    address: {
      street: faker.location.streetAddress(),
      city: faker.location.city(),
      zip: faker.location.zipCode()
    },
    createdAt: faker.date.past()
  };
}

// Generate 1000 users
const users = Array.from({ length: 1000 }, generateUser);
```
Test Data Builder Pattern
```typescript
import { faker } from '@faker-js/faker';

// Assumed shape of User; adjust to match the real model
interface User {
  id: string;
  email: string;
  role: 'admin' | 'customer';
  permissions: string[];
}

class UserBuilder {
  private user: Partial<User> = {};

  asAdmin() {
    this.user.role = 'admin';
    this.user.permissions = ['read', 'write', 'delete'];
    return this;
  }

  asCustomer() {
    this.user.role = 'customer';
    this.user.permissions = ['read'];
    return this;
  }

  withEmail(email: string) {
    this.user.email = email;
    return this;
  }

  build(): User {
    // Defaults come first; any field set through the builder overrides them
    return {
      id: faker.string.uuid(),
      email: faker.internet.email(),
      role: 'customer',
      permissions: ['read'],
      ...this.user
    } as User;
  }
}

// Usage (example.com is a reserved domain, so no real mailbox is hit)
const admin = new UserBuilder().asAdmin().withEmail('admin@example.com').build();
const customer = new UserBuilder().asCustomer().build();
```
Data Anonymization
```javascript
import { faker } from '@faker-js/faker';

// Masking
function maskEmail(email) {
  const [user, domain] = email.split('@');
  return `${user[0]}***@${domain}`;
}
// john@example.com → j***@example.com

function maskCreditCard(cc) {
  return `****-****-****-${cc.slice(-4)}`;
}
// 4242424242424242 → ****-****-****-4242

// Anonymize production data
// (prodUsers is assumed to be an array of user records loaded elsewhere)
const anonymizedUsers = prodUsers.map(user => ({
  id: user.id,                           // Keep ID for relationships
  email: `user-${user.id}@example.com`,  // Fake email
  firstName: faker.person.firstName(),   // Generated
  phone: null,                           // Remove PII
  createdAt: user.createdAt              // Keep non-PII
}));
```
Database Transaction Isolation
```javascript
// Best practice: use transactions for cleanup
beforeEach(async () => {
  await db.beginTransaction();
});

afterEach(async () => {
  await db.rollbackTransaction(); // Auto cleanup!
});

test('user registration', async () => {
  const user = await userService.register({
    email: 'test@example.com'
  });
  expect(user.id).toBeDefined();
  // Automatic rollback after test - no cleanup needed
});
```
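Rollback covers only work done on the connection that opened the transaction. Where the code under test commits on its own connection, the "per-test cleanup" option from the checklist is the usual fallback. The sketch below shows one way to structure it. `CleanupRegistry`, `usersTable`, and `demo` are hypothetical names, and the in-memory `Map` stands in for a real table so the example runs on its own.

```javascript
// Hypothetical cleanup registry: each test registers an undo step for every
// record it creates, and the steps run in reverse order afterwards.
class CleanupRegistry {
  constructor() {
    this.steps = [];
  }

  register(step) {
    this.steps.push(step);
  }

  async runAll() {
    const errors = [];
    while (this.steps.length > 0) {
      const step = this.steps.pop(); // last created, first removed
      try {
        await step();
      } catch (err) {
        errors.push(err); // keep going so one failure does not leak the rest
      }
    }
    if (errors.length > 0) {
      throw new AggregateError(errors, 'cleanup failed');
    }
  }
}

// In-memory stand-in for a real table, used only to keep the sketch runnable
const usersTable = new Map();

async function demo() {
  const cleanup = new CleanupRegistry();

  usersTable.set('u1', { email: 'test@example.com' });
  cleanup.register(async () => usersTable.delete('u1'));

  // ... test assertions would go here ...

  await cleanup.runAll(); // in a real suite this call belongs in afterEach
  return usersTable.size;
}

demo().then(size => console.log('rows left after cleanup:', size));
```

Running the steps in reverse order means child rows are removed before the parents they reference.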
Volume Data Generation
```javascript
import { faker } from '@faker-js/faker';

// Generate 10,000 users efficiently
async function generateLargeDataset(count = 10000) {
  const batchSize = 1000;
  const batches = Math.ceil(count / batchSize);

  for (let i = 0; i < batches; i++) {
    const offset = i * batchSize;
    // The last batch is smaller when count is not a multiple of batchSize
    const size = Math.min(batchSize, count - offset);

    const users = Array.from({ length: size }, (_, index) => ({
      id: offset + index,
      email: `user${offset + index}@example.com`,
      firstName: faker.person.firstName()
    }));

    await db.users.insertMany(users); // Batch insert
    console.log(`Batch ${i + 1}/${batches}`);
  }
}
```
Agent-Driven Data Generation
```typescript
// High-speed generation with constraints
await Task("Generate Test Data", {
  schema: 'ecommerce',
  count: { users: 10000, products: 500, orders: 5000 },
  preserveReferentialIntegrity: true,
  constraints: {
    age: { min: 18, max: 90 },
    roles: ['customer', 'admin']
  }
}, "qe-test-data-architect");

// GDPR-compliant anonymization
await Task("Anonymize Production Data", {
  source: 'production-snapshot',
  piiFields: ['email', 'phone', 'ssn'],
  method: 'pseudonymization',
  retainStructure: true
}, "qe-test-data-architect");
```
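How the agent honours `preserveReferentialIntegrity` and `constraints` is not shown here. As a rough illustration of what those two options imply, the sketch below generates parent rows first and lets child rows reference only existing parents, while keeping ages inside the configured range. `createRng`, `intBetween`, and `generateDataset` are hypothetical helpers, and the small linear congruential generator is a dependency-free stand-in for a seeded faker instance.

```javascript
// Small linear congruential generator so the sketch is reproducible
// without any dependency. Always returns a number in [0, 1).
function createRng(seed) {
  let state = seed >>> 0;
  return () => {
    state = (1664525 * state + 1013904223) % 4294967296;
    return state / 4294967296;
  };
}

// Random integer in the inclusive range [min, max]
function intBetween(rng, min, max) {
  return min + Math.floor(rng() * (max - min + 1));
}

// Generate parents first, then children that only reference existing parents
function generateDataset({ users, orders, age, roles }, seed = 123) {
  if (users === 0 && orders > 0) {
    throw new Error('orders need at least one user to reference');
  }
  const rng = createRng(seed);

  const userRows = Array.from({ length: users }, (_, i) => ({
    id: i + 1,
    age: intBetween(rng, age.min, age.max),
    role: roles[intBetween(rng, 0, roles.length - 1)]
  }));

  const orderRows = Array.from({ length: orders }, (_, i) => ({
    id: i + 1,
    userId: userRows[intBetween(rng, 0, userRows.length - 1)].id
  }));

  return { userRows, orderRows };
}

const data = generateDataset({
  users: 100,
  orders: 50,
  age: { min: 18, max: 90 },
  roles: ['customer', 'admin']
});
console.log(data.userRows.length, data.orderRows.length);
```

Because the generator is seeded, two runs with the same seed produce identical rows, which keeps failures reproducible.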
Agent Coordination Hints
Memory Namespace
```
aqe/test-data-management/
├── schemas/*        - Data schemas
├── generators/*     - Generator configs
├── anonymization/*  - PII handling rules
└── fixtures/*       - Reusable fixtures
```
Fleet Coordination
```typescript
const dataFleet = await FleetManager.coordinate({
  strategy: 'test-data-generation',
  agents: [
    'qe-test-data-architect',  // Generate data
    'qe-test-executor',        // Execute with data
    'qe-security-scanner'      // Validate no PII exposure
  ],
  topology: 'sequential'
});
```
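The last agent in the sequence validates that no PII is exposed. How `qe-security-scanner` performs that check is not described here, so the sketch below only illustrates the kind of rule involved. The patterns, the `findPii` function, and the rule that synthetic emails must use a reserved example domain are all assumptions.

```javascript
// Hypothetical detection rules; a real scanner would use a far richer set.
const SSN_PATTERN = /\b\d{3}-\d{2}-\d{4}\b/;
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@([A-Za-z0-9.-]+\.[A-Za-z]{2,})/g;

// Domains reserved for documentation and testing, so never a real mailbox
const SAFE_EMAIL_DOMAINS = new Set(['example.com', 'example.org', 'example.net']);

// Scans the top-level string fields of each record (nested objects are
// not inspected in this sketch) and reports anything that looks like PII.
function findPii(records) {
  const findings = [];
  records.forEach((record, index) => {
    for (const [field, value] of Object.entries(record)) {
      if (typeof value !== 'string') continue;

      if (SSN_PATTERN.test(value)) {
        findings.push({ index, field, kind: 'ssn' });
      }
      for (const match of value.matchAll(EMAIL_PATTERN)) {
        if (!SAFE_EMAIL_DOMAINS.has(match[1].toLowerCase())) {
          findings.push({ index, field, kind: 'email' });
        }
      }
    }
  });
  return findings;
}

const fixtures = [
  { email: 'user-1@example.com', note: 'synthetic' },
  { email: 'jane@corp.io', note: 'SSN 123-45-6789' }
];
console.log(findPii(fixtures));
```

A check like this could run as a gate before fixtures are committed, failing the build when the findings list is not empty.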
Related Skills
- database-testing - Schema and integrity testing
- compliance-testing - GDPR/CCPA compliance
- performance-testing - Volume data for perf tests
Remember
Test data is infrastructure, not an afterthought. 40% of test failures are caused by inadequate test data. Poor data = poor tests.
Never use production PII directly. GDPR fines can reach €20M or 4% of global annual turnover, whichever is higher. Always use synthetic data or properly anonymized production snapshots.
With Agents: qe-test-data-architect generates 10k+ records/sec with realistic patterns, relationships, and constraints. Agents ensure GDPR/CCPA compliance automatically and eliminate test data bottlenecks.