nosql-database-design

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

NoSQL Database Design

NoSQL数据库设计

Overview

概述

Design scalable NoSQL schemas for MongoDB (document) and DynamoDB (key-value). Covers data modeling patterns, denormalization strategies, and query optimization for NoSQL systems.
为MongoDB(文档型)和DynamoDB(键值型)设计可扩展的NoSQL模式。涵盖NoSQL系统的数据建模模式、反规范化策略和查询优化。

When to Use

适用场景

  • MongoDB collection design
  • DynamoDB table and index design
  • Document structure modeling
  • Embedding vs. referencing decisions
  • Query pattern optimization
  • NoSQL indexing strategies
  • Data denormalization planning
  • MongoDB集合设计
  • DynamoDB表与索引设计
  • 文档结构建模
  • 嵌入与引用决策
  • 查询模式优化
  • NoSQL索引策略
  • 数据反规范化规划

MongoDB Schema Design

MongoDB模式设计

Document Structure Design

文档结构设计

MongoDB - Embedded Documents:
javascript
// Single document with embedded arrays
db.createCollection("users")

db.users.insertOne({
  _id: ObjectId("..."),
  email: "john@example.com",
  name: "John Doe",
  createdAt: new Date(),

  // Embedded address
  address: {
    street: "123 Main St",
    city: "New York",
    state: "NY",
    zipCode: "10001"
  },

  // Embedded array of items
  orders: [
    {
      orderId: ObjectId("..."),
      date: new Date(),
      total: 149.99
    },
    {
      orderId: ObjectId("..."),
      date: new Date(),
      total: 89.99
    }
  ]
})
MongoDB - Referenced Documents:
javascript
// Separate collections with references
db.createCollection("users")
db.createCollection("orders")

db.users.insertOne({
  _id: ObjectId("..."),
  email: "john@example.com",
  name: "John Doe"
})

db.orders.insertMany([
  {
    _id: ObjectId("..."),
    userId: ObjectId("..."),  // Reference to user
    orderDate: new Date(),
    total: 149.99
  },
  {
    _id: ObjectId("..."),
    userId: ObjectId("..."),
    orderDate: new Date(),
    total: 89.99
  }
])

// Query with $lookup for JOINs
db.orders.aggregate([
  {
    $match: { userId: ObjectId("...") }
  },
  {
    $lookup: {
      from: "users",
      localField: "userId",
      foreignField: "_id",
      as: "user"
    }
  }
])
MongoDB - Embedded Documents:
javascript
// Single document with embedded arrays
db.createCollection("users")

db.users.insertOne({
  _id: ObjectId("..."),
  email: "john@example.com",
  name: "John Doe",
  createdAt: new Date(),

  // Embedded address
  address: {
    street: "123 Main St",
    city: "New York",
    state: "NY",
    zipCode: "10001"
  },

  // Embedded array of items
  orders: [
    {
      orderId: ObjectId("..."),
      date: new Date(),
      total: 149.99
    },
    {
      orderId: ObjectId("..."),
      date: new Date(),
      total: 89.99
    }
  ]
})
MongoDB - Referenced Documents:
javascript
// Separate collections with references
db.createCollection("users")
db.createCollection("orders")

db.users.insertOne({
  _id: ObjectId("..."),
  email: "john@example.com",
  name: "John Doe"
})

db.orders.insertMany([
  {
    _id: ObjectId("..."),
    userId: ObjectId("..."),  // Reference to user
    orderDate: new Date(),
    total: 149.99
  },
  {
    _id: ObjectId("..."),
    userId: ObjectId("..."),
    orderDate: new Date(),
    total: 89.99
  }
])

// Query with $lookup for JOINs
db.orders.aggregate([
  {
    $match: { userId: ObjectId("...") }
  },
  {
    $lookup: {
      from: "users",
      localField: "userId",
      foreignField: "_id",
      as: "user"
    }
  }
])

Indexing in MongoDB

MongoDB中的索引

javascript
// Single field index
db.users.createIndex({ email: 1 })
db.orders.createIndex({ createdAt: -1 })

// Compound index
db.orders.createIndex({ userId: 1, createdAt: -1 })

// Text index for search
db.products.createIndex({ name: "text", description: "text" })

// Geospatial index
db.stores.createIndex({ location: "2dsphere" })

// TTL index for auto-expiration
db.sessions.createIndex({ createdAt: 1 }, { expireAfterSeconds: 3600 })

// Sparse index (only documents with field)
db.users.createIndex({ phone: 1 }, { sparse: true })

// Check index usage
db.users.aggregate([{ $indexStats: {} }])
javascript
// Single field index
db.users.createIndex({ email: 1 })
db.orders.createIndex({ createdAt: -1 })

// Compound index
db.orders.createIndex({ userId: 1, createdAt: -1 })

// Text index for search
db.products.createIndex({ name: "text", description: "text" })

// Geospatial index
db.stores.createIndex({ location: "2dsphere" })

// TTL index for auto-expiration
db.sessions.createIndex({ createdAt: 1 }, { expireAfterSeconds: 3600 })

// Sparse index (only documents with field)
db.users.createIndex({ phone: 1 }, { sparse: true })

// Check index usage
db.users.aggregate([{ $indexStats: {} }])

Schema Validation

模式验证

javascript
// Define collection validation schema
db.createCollection("products", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["name", "price", "category"],
      properties: {
        _id: { bsonType: "objectId" },
        name: {
          bsonType: "string",
          description: "Product name (required)"
        },
        price: {
          bsonType: "decimal",
          minimum: 0,
          description: "Price must be positive"
        },
        category: {
          enum: ["electronics", "clothing", "food"],
          description: "Category must be one of listed values"
        },
        tags: {
          bsonType: "array",
          items: { bsonType: "string" }
        },
        createdAt: {
          bsonType: "date"
        }
      }
    }
  }
})
javascript
// Define collection validation schema
db.createCollection("products", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["name", "price", "category"],
      properties: {
        _id: { bsonType: "objectId" },
        name: {
          bsonType: "string",
          description: "Product name (required)"
        },
        price: {
          bsonType: "decimal",
          minimum: 0,
          description: "Price must be positive"
        },
        category: {
          enum: ["electronics", "clothing", "food"],
          description: "Category must be one of listed values"
        },
        tags: {
          bsonType: "array",
          items: { bsonType: "string" }
        },
        createdAt: {
          bsonType: "date"
        }
      }
    }
  }
})

DynamoDB Schema Design

DynamoDB模式设计

Table Structure

表结构

javascript
// DynamoDB table with single primary key
const TableName = "users"
const params = {
  TableName,
  KeySchema: [
    { AttributeName: "userId", KeyType: "HASH" }  // Partition key
  ],
  AttributeDefinitions: [
    { AttributeName: "userId", AttributeType: "S" }  // String
  ],
  BillingMode: "PAY_PER_REQUEST"  // On-demand
}

// DynamoDB table with composite primary key
const ordersParams = {
  TableName: "orders",
  KeySchema: [
    { AttributeName: "userId", KeyType: "HASH" },      // Partition key
    { AttributeName: "orderId", KeyType: "RANGE" }    // Sort key
  ],
  AttributeDefinitions: [
    { AttributeName: "userId", AttributeType: "S" },
    { AttributeName: "orderId", AttributeType: "S" }
  ],
  BillingMode: "PAY_PER_REQUEST"
}
javascript
// DynamoDB table with single primary key
const TableName = "users"
const params = {
  TableName,
  KeySchema: [
    { AttributeName: "userId", KeyType: "HASH" }  // Partition key
  ],
  AttributeDefinitions: [
    { AttributeName: "userId", AttributeType: "S" }  // String
  ],
  BillingMode: "PAY_PER_REQUEST"  // On-demand
}

// DynamoDB table with composite primary key
const ordersParams = {
  TableName: "orders",
  KeySchema: [
    { AttributeName: "userId", KeyType: "HASH" },      // Partition key
    { AttributeName: "orderId", KeyType: "RANGE" }    // Sort key
  ],
  AttributeDefinitions: [
    { AttributeName: "userId", AttributeType: "S" },
    { AttributeName: "orderId", AttributeType: "S" }
  ],
  BillingMode: "PAY_PER_REQUEST"
}

Global Secondary Indexes (GSI)

全局二级索引(GSI)

javascript
// Add GSI for querying by email
const gsiParams = {
  TableName: "users",
  AttributeDefinitions: [
    { AttributeName: "email", AttributeType: "S" }
  ],
  GlobalSecondaryIndexes: [
    {
      IndexName: "emailIndex",
      KeySchema: [
        { AttributeName: "email", KeyType: "HASH" }
      ],
      Projection: {
        ProjectionType: "ALL"  // Return all attributes
      },
      BillingMode: "PAY_PER_REQUEST"
    }
  ]
}

// GSI with composite key for time-based queries
const timeIndexParams = {
  GlobalSecondaryIndexes: [
    {
      IndexName: "userCreatedIndex",
      KeySchema: [
        { AttributeName: "userId", KeyType: "HASH" },
        { AttributeName: "createdAt", KeyType: "RANGE" }
      ],
      Projection: { ProjectionType: "ALL" },
      BillingMode: "PAY_PER_REQUEST"
    }
  ]
}
javascript
// Add GSI for querying by email
const gsiParams = {
  TableName: "users",
  AttributeDefinitions: [
    { AttributeName: "email", AttributeType: "S" }
  ],
  GlobalSecondaryIndexes: [
    {
      IndexName: "emailIndex",
      KeySchema: [
        { AttributeName: "email", KeyType: "HASH" }
      ],
      Projection: {
        ProjectionType: "ALL"  // Return all attributes
      },
      BillingMode: "PAY_PER_REQUEST"
    }
  ]
}

// GSI with composite key for time-based queries
const timeIndexParams = {
  GlobalSecondaryIndexes: [
    {
      IndexName: "userCreatedIndex",
      KeySchema: [
        { AttributeName: "userId", KeyType: "HASH" },
        { AttributeName: "createdAt", KeyType: "RANGE" }
      ],
      Projection: { ProjectionType: "ALL" },
      BillingMode: "PAY_PER_REQUEST"
    }
  ]
}

DynamoDB Item Operations

DynamoDB条目操作

javascript
// Put item (insert/update)
const putParams = {
  TableName: "users",
  Item: {
    userId: { S: "user-123" },
    email: { S: "john@example.com" },
    name: { S: "John Doe" },
    createdAt: { N: Date.now().toString() },
    metadata: {
      M: {
        joinDate: { N: Date.now().toString() },
        source: { S: "web" }
      }
    }
  }
}

// Query using GSI
const queryParams = {
  TableName: "users",
  IndexName: "emailIndex",
  KeyConditionExpression: "email = :email",
  ExpressionAttributeValues: {
    ":email": { S: "john@example.com" }
  }
}

// Batch get items
const batchGetParams = {
  RequestItems: {
    "users": {
      Keys: [
        { userId: { S: "user-123" } },
        { userId: { S: "user-456" } }
      ]
    }
  }
}
javascript
// Put item (insert/update)
const putParams = {
  TableName: "users",
  Item: {
    userId: { S: "user-123" },
    email: { S: "john@example.com" },
    name: { S: "John Doe" },
    createdAt: { N: Date.now().toString() },
    metadata: {
      M: {
        joinDate: { N: Date.now().toString() },
        source: { S: "web" }
      }
    }
  }
}

// Query using GSI
const queryParams = {
  TableName: "users",
  IndexName: "emailIndex",
  KeyConditionExpression: "email = :email",
  ExpressionAttributeValues: {
    ":email": { S: "john@example.com" }
  }
}

// Batch get items
const batchGetParams = {
  RequestItems: {
    "users": {
      Keys: [
        { userId: { S: "user-123" } },
        { userId: { S: "user-456" } }
      ]
    }
  }
}

Denormalization Patterns

反规范化模式

MongoDB - Embedding for Performance:
javascript
// Embed frequently accessed data to avoid lookups
db.orders.insertOne({
  _id: ObjectId("..."),
  userId: ObjectId("..."),
  userEmail: "john@example.com",      // Denormalized
  userName: "John Doe",                // Denormalized
  createdAt: new Date(),
  items: [
    {
      productId: ObjectId("..."),
      productName: "Laptop",            // Denormalized
      productPrice: 999.99,             // Denormalized
      quantity: 1
    }
  ]
})
DynamoDB - Denormalization with Consistency:
javascript
// Store related data in same item to ensure consistency
const params = {
  TableName: "orders",
  Item: {
    userId: { S: "user-123" },
    orderId: { S: "order-456" },
    orderDate: { N: Date.now().toString() },

    // User data snapshot at order time
    userSnapshot: {
      M: {
        email: { S: "john@example.com" },
        address: { S: "123 Main St" }
      }
    },

    // Items with product information
    items: {
      L: [
        {
          M: {
            productId: { S: "prod-789" },
            name: { S: "Laptop" },
            price: { N: "999.99" },
            quantity: { N: "1" }
          }
        }
      ]
    }
  }
}
MongoDB - 为性能嵌入数据:
javascript
// Embed frequently accessed data to avoid lookups
db.orders.insertOne({
  _id: ObjectId("..."),
  userId: ObjectId("..."),
  userEmail: "john@example.com",      // Denormalized
  userName: "John Doe",                // Denormalized
  createdAt: new Date(),
  items: [
    {
      productId: ObjectId("..."),
      productName: "Laptop",            // Denormalized
      productPrice: 999.99,             // Denormalized
      quantity: 1
    }
  ]
})
DynamoDB - 带一致性的反规范化:
javascript
// Store related data in same item to ensure consistency
const params = {
  TableName: "orders",
  Item: {
    userId: { S: "user-123" },
    orderId: { S: "order-456" },
    orderDate: { N: Date.now().toString() },

    // User data snapshot at order time
    userSnapshot: {
      M: {
        email: { S: "john@example.com" },
        address: { S: "123 Main St" }
      }
    },

    // Items with product information
    items: {
      L: [
        {
          M: {
            productId: { S: "prod-789" },
            name: { S: "Laptop" },
            price: { N: "999.99" },
            quantity: { N: "1" }
          }
        }
      ]
    }
  }
}

Design Patterns

设计模式

MongoDB - Time-Series Pattern:
javascript
// Efficient time-series data storage
db.sensor_data.insertOne({
  _id: ObjectId("..."),
  sensorId: "sensor-123",
  date: ISODate("2024-01-15"),
  measurements: [
    { time: "12:00", temperature: 72.5, humidity: 45 },
    { time: "12:01", temperature: 72.6, humidity: 45.2 },
    { time: "12:02", temperature: 72.4, humidity: 44.8 }
  ]
})

// Index for efficient queries
db.sensor_data.createIndex({ sensorId: 1, date: -1 })
DynamoDB - One-to-Many Relationship:
javascript
// Store one-to-many relationships efficiently
// User comments using userId as partition key, commentId as sort key
const commentParams = {
  TableName: "comments",
  Item: {
    userId: { S: "user-123" },           // Partition key
    commentId: { S: "comment-789" },     // Sort key
    postId: { S: "post-456" },
    content: { S: "Great article!" },
    createdAt: { N: Date.now().toString() }
  }
}
MongoDB - 时间序列模式:
javascript
// Efficient time-series data storage
db.sensor_data.insertOne({
  _id: ObjectId("..."),
  sensorId: "sensor-123",
  date: ISODate("2024-01-15"),
  measurements: [
    { time: "12:00", temperature: 72.5, humidity: 45 },
    { time: "12:01", temperature: 72.6, humidity: 45.2 },
    { time: "12:02", temperature: 72.4, humidity: 44.8 }
  ]
})

// Index for efficient queries
db.sensor_data.createIndex({ sensorId: 1, date: -1 })
DynamoDB - 一对多关系:
javascript
// Store one-to-many relationships efficiently
// User comments using userId as partition key, commentId as sort key
const commentParams = {
  TableName: "comments",
  Item: {
    userId: { S: "user-123" },           // Partition key
    commentId: { S: "comment-789" },     // Sort key
    postId: { S: "post-456" },
    content: { S: "Great article!" },
    createdAt: { N: Date.now().toString() }
  }
}

Capacity Planning

容量规划

MongoDB - Horizontal Scaling:
javascript
// Sharding for large collections
sh.shardCollection("ecommerce.orders", { userId: "hashed" })

// Monitor shard distribution
db.orders.aggregate([
  { $group: { _id: "$userId", count: { $sum: 1 } } },
  { $sort: { count: -1 } }
])
DynamoDB - Partition Key Design:
javascript
// Good: Distribute across many keys
// Partition key: "USER#123" (spreads across partitions)
// Sort key: "ORDER#2024-01"

// Bad: Hot partition
// Partition key: "ADMIN" (all admin operations hit same partition)

// Solution: Add timestamp or random suffix
// Partition key: "ADMIN#20240115#random"
MongoDB - 水平扩展:
javascript
// Sharding for large collections
sh.shardCollection("ecommerce.orders", { userId: "hashed" })

// Monitor shard distribution
db.orders.aggregate([
  { $group: { _id: "$userId", count: { $sum: 1 } } },
  { $sort: { count: -1 } }
])
DynamoDB - 分区键设计:
javascript
// Good: Distribute across many keys
// Partition key: "USER#123" (spreads across partitions)
// Sort key: "ORDER#2024-01"

// Bad: Hot partition
// Partition key: "ADMIN" (all admin operations hit same partition)

// Solution: Add timestamp or random suffix
// Partition key: "ADMIN#20240115#random"

Migration Considerations

迁移注意事项

  • Plan data migration strategy
  • Consider consistency requirements
  • Test query patterns before finalizing schema
  • Monitor performance after deployment
  • Document relationships and access patterns
  • Plan for schema evolution
  • 规划数据迁移策略
  • 考虑一致性要求
  • 最终确定模式前测试查询模式
  • 部署后监控性能
  • 记录关系与访问模式
  • 规划模式演进

Resources

资源