data-modeling

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

Data Modeling

数据建模

When to Use This Skill

何时使用此技能

Use this skill when:
  • Data Modeling tasks - Working on data modeling with entity-relationship diagrams (erds), data dictionaries, and conceptual/logical/physical models. documents data structures, relationships, and attributes
  • Planning or design - Need guidance on Data Modeling approaches
  • Best practices - Want to follow established patterns and standards
在以下场景中使用此技能:
  • 数据建模任务 - 开展使用实体关系图(ERDs)、数据字典以及概念/逻辑/物理模型的数据建模工作,记录数据结构、关系和属性
  • 规划或设计 - 需要数据建模方法的指导
  • 最佳实践 - 希望遵循既定的模式和标准

Overview

概述

Create and document data structures using Entity-Relationship Diagrams (ERDs), data dictionaries, and structured data models. Supports conceptual, logical, and physical modeling levels for database design and data architecture.
使用实体关系图(ERDs)、数据字典和结构化数据模型创建并记录数据结构。支持数据库设计和数据架构的概念、逻辑和物理建模层级。

What is Data Modeling?

什么是数据建模?

Data modeling creates visual and structured representations of data elements and their relationships. It documents:
  • Entities: Things about which data is stored
  • Attributes: Properties of entities
  • Relationships: How entities connect
  • Constraints: Rules governing data
数据建模 创建数据元素及其关系的可视化和结构化表示。它记录:
  • 实体:需要存储数据的事物
  • 属性:实体的特性
  • 关系:实体之间的关联方式
  • 约束:管理数据的规则

Modeling Levels

建模层级

LevelPurposeAudienceDetail
ConceptualBusiness conceptsBusiness usersEntities, high-level relationships
LogicalData structureAnalysts, designersEntities, attributes, all relationships
PhysicalImplementationDevelopers, DBAsTables, columns, types, indexes
层级用途受众细节程度
概念层业务概念业务用户实体、高层级关系
逻辑层数据结构分析师、设计师实体、属性、所有关系
物理层实现落地开发人员、数据库管理员(DBAs)表、列、类型、索引

Conceptual Model

概念模型

High-level view of business concepts:
  • Major entities only
  • Key relationships
  • No attributes (or minimal)
  • No technical details
业务概念的高层级视图:
  • 仅包含主要实体
  • 关键关系
  • 无属性(或极少)
  • 无技术细节

Logical Model

逻辑模型

Technology-independent data structure:
  • All entities and attributes
  • Primary and foreign keys
  • All relationships with cardinality
  • Normalization applied
  • No physical implementation details
与技术无关的数据结构:
  • 所有实体和属性
  • 主键和外键
  • 带基数的所有关系
  • 已应用规范化
  • 无物理实现细节

Physical Model

物理模型

Database-specific implementation:
  • Table names (physical naming)
  • Column names and data types
  • Indexes and constraints
  • Views and stored procedures
  • Database-specific features
特定数据库的实现方案:
  • 表名(物理命名)
  • 列名和数据类型
  • 索引和约束
  • 视图和存储过程
  • 数据库特定特性

ERD Notation

ERD 符号

Entity (Rectangle)

实体(矩形)

An entity represents a thing about which data is stored.
text
┌─────────────────┐
│    CUSTOMER     │
├─────────────────┤
│ customer_id PK  │
│ name            │
│ email           │
│ created_at      │
└─────────────────┘
Entity Types:
TypeDescriptionExample
StrongIndependent existenceCustomer, Product
WeakDepends on another entityOrder Line (depends on Order)
AssociativeResolves M:N relationshipsEnrollment (Student-Course)
实体代表需要存储数据的事物。
text
┌─────────────────┐
│    CUSTOMER     │
├─────────────────┤
│ customer_id PK  │
│ name            │
│ email           │
│ created_at      │
└─────────────────┘
实体类型:
类型描述示例
强实体可独立存在Customer、Product
弱实体依赖其他实体Order Line(依赖Order)
关联实体解决多对多关系Enrollment(Student-Course)

Attributes

属性

TypeSymbolDescription
Primary Key (PK)Underlined/PKUnique identifier
Foreign Key (FK)FKReference to another entity
Required* or NOT NULLMust have value
Optional○ or NULLMay be empty
Derived/Calculated from other attributes
Composite{attrs}Made of sub-attributes
Multi-valued[attr]Can have multiple values
类型符号描述
主键(PK)下划线/PK唯一标识符
外键(FK)FK引用其他实体
必填* 或 NOT NULL必须有值
可选○ 或 NULL可为空
派生/由其他属性计算得出
复合{attrs}由子属性组成
多值[attr]可拥有多个值

Relationships (Lines)

关系(连线)

Notation Styles:
StyleUsed In
ChenAcademic, conceptual
Crow's FootIndustry standard
UMLSoftware design
IDEF1XGovernment, structured
Crow's Foot Notation:
SymbolMeaning
──
One (mandatory)
──○
Zero or one (optional)
──<
Many
──○<
Zero or many
符号风格:
风格应用场景
Chen学术、概念建模
乌鸦脚行业标准
UML软件设计
IDEF1X政府、结构化建模
乌鸦脚符号:
符号含义
──
一个(强制)
──○
零或一个(可选)
──<
多个
──○<
零或多个

Cardinality

基数

NotationMeaningExample
1:1One to oneEmployee → Workstation
1:MOne to manyCustomer → Orders
M:NMany to manyStudents ↔ Courses
Reading Cardinality:
"One [Entity A] has [min]..[max] [Entity B]"
Example: "One Customer has 0..many Orders"
符号含义示例
1:1一对一Employee → Workstation
1:M一对多Customer → Orders
M:N多对多Students ↔ Courses
基数读法:
“一个[实体A]拥有[最小]..[最大]个[实体B]”
示例:“一个客户拥有0..多个订单”

Workflow

工作流程

Phase 1: Identify Entities

阶段1:识别实体

Step 1: Extract Nouns from Requirements

步骤1:从需求中提取名词

From business requirements, identify:
  • Things the business tracks
  • Subjects of business rules
  • Sources and targets of data
从业务需求中识别:
  • 业务需要跟踪的事物
  • 业务规则的主体
  • 数据的来源和目标

Step 2: Filter Candidates

步骤2:筛选候选实体

KeepExclude
Independent conceptsAttributes (properties of entities)
Things with multiple instancesSynonyms (same concept, different name)
Things requiring data storageActions (verbs, not nouns)
保留排除
独立概念属性(实体的特性)
有多个实例的事物同义词(同一概念的不同名称)
需要存储数据的事物动作(动词,非名词)

Step 3: Document Entities

步骤3:记录实体

markdown
undefined
markdown
undefined

Entities

实体

EntityDescriptionExample
CustomerPerson or organization that purchasesJohn Smith, Acme Corp
OrderPurchase transactionOrder #12345
ProductItem available for saleWidget, Gadget
undefined
实体描述示例
Customer进行购买的个人或组织John Smith、Acme Corp
Order购买交易订单#12345
Product可售卖的商品Widget、Gadget
undefined

Phase 2: Define Attributes

阶段2:定义属性

Step 1: List Attributes for Each Entity

步骤1:为每个实体列出属性

For each entity, identify:
  • What do we need to know about this entity?
  • What uniquely identifies it?
  • What data does the business reference?
针对每个实体,确定:
  • 我们需要了解该实体的哪些信息?
  • 什么能唯一标识它?
  • 业务会引用哪些数据?

Step 2: Classify Attributes

步骤2:分类属性

AttributeTypeRequiredNotes
customer_idPKYesSurrogate key
emailUniqueYesBusiness key
nameStringYes
phoneStringNoOptional
属性类型必填说明
customer_idPK代理键
email唯一业务键
name字符串
phone字符串可选

Step 3: Identify Keys

步骤3:识别键

  • Primary Key (PK): Unique identifier
  • Natural Key: Business-meaningful identifier
  • Surrogate Key: System-generated identifier
  • Composite Key: Multiple attributes combined
  • 主键(PK):唯一标识符
  • 自然键:具有业务含义的标识符
  • 代理键:系统生成的标识符
  • 复合键:多个属性组合而成

Phase 3: Define Relationships

阶段3:定义关系

Step 1: Identify Connections

步骤1:识别关联

For each pair of entities:
  • Is there a business connection?
  • What is the nature of the relationship?
  • What is the cardinality?
针对每对实体:
  • 它们之间是否存在业务关联?
  • 关系的性质是什么?
  • 基数是多少?

Step 2: Document Relationships

步骤2:记录关系

markdown
undefined
markdown
undefined

Relationships

关系

RelationshipFromToCardinalityDescription
placesCustomerOrder1:MCustomer places orders
containsOrderProductM:NOrder contains products
undefined
关系来源目标基数描述
placesCustomerOrder1:M客户下订单
containsOrderProductM:N订单包含商品
undefined

Step 3: Resolve Many-to-Many

步骤3:解决多对多关系

M:N relationships require associative entities:
text
Student ──M:N── Course

Becomes:

Student ──1:M── Enrollment ──M:1── Course
多对多关系需要关联实体:
text
Student ──M:N── Course

转换为:

Student ──1:M── Enrollment ──M:1── Course

Phase 4: Normalize (Logical Model)

阶段4:规范化(逻辑模型)

Normal Forms:
FormRuleViolation Example
1NFAtomic values, no repeating groupsPhone1, Phone2, Phone3
2NFNo partial dependenciesNon-key depends on part of composite key
3NFNo transitive dependenciesNon-key depends on non-key
BCNFEvery determinant is a candidate keyOverlap in candidate keys
When to Denormalize:
  • Read performance critical
  • Reporting/analytics use cases
  • Data warehouse design
  • Justified with clear trade-off analysis
范式:
范式规则违规示例
1NF值为原子性,无重复组Phone1、Phone2、Phone3
2NF无部分依赖非键属性依赖于复合键的一部分
3NF无传递依赖非键属性依赖于其他非键属性
BCNF每个决定因素都是候选键候选键存在重叠
何时反规范化:
  • 读取性能至关重要
  • 报表/分析场景
  • 数据仓库设计
  • 经过清晰的权衡分析后证明合理

Phase 5: Create Physical Model

阶段5:创建物理模型

Step 1: Map to Physical Types

步骤1:映射到物理类型

Logical TypePhysical (PostgreSQL)Physical (SQL Server)
String(50)VARCHAR(50)NVARCHAR(50)
IntegerINTEGERINT
Decimal(10,2)NUMERIC(10,2)DECIMAL(10,2)
DateDATEDATE
TimestampTIMESTAMPDATETIME2
BooleanBOOLEANBIT
逻辑类型PostgreSQL 物理类型SQL Server 物理类型
String(50)VARCHAR(50)NVARCHAR(50)
IntegerINTEGERINT
Decimal(10,2)NUMERIC(10,2)DECIMAL(10,2)
DateDATEDATE
TimestampTIMESTAMPDATETIME2
BooleanBOOLEANBIT

Step 2: Define Constraints

步骤2:定义约束

  • Primary key constraints
  • Foreign key constraints
  • Unique constraints
  • Check constraints
  • Default values
  • 主键约束
  • 外键约束
  • 唯一约束
  • 检查约束
  • 默认值

Step 3: Plan Indexes

步骤3:规划索引

  • Primary key (automatic)
  • Foreign keys (for joins)
  • Frequently queried columns
  • Covering indexes for performance
  • 主键(自动创建)
  • 外键(用于连接)
  • 频繁查询的列
  • 用于性能优化的覆盖索引

Output Formats

输出格式

Mermaid ERD

Mermaid ERD

mermaid
erDiagram
    CUSTOMER ||--o{ ORDER : places
    ORDER ||--|{ ORDER_LINE : contains
    PRODUCT ||--o{ ORDER_LINE : includes

    CUSTOMER {
        int customer_id PK
        string name
        string email UK
        date created_at
    }

    ORDER {
        int order_id PK
        int customer_id FK
        date order_date
        decimal total
        string status
    }

    ORDER_LINE {
        int order_id PK,FK
        int product_id PK,FK
        int quantity
        decimal unit_price
    }

    PRODUCT {
        int product_id PK
        string name
        string sku UK
        decimal price
        int stock_qty
    }
mermaid
erDiagram
    CUSTOMER ||--o{ ORDER : places
    ORDER ||--|{ ORDER_LINE : contains
    PRODUCT ||--o{ ORDER_LINE : includes

    CUSTOMER {
        int customer_id PK
        string name
        string email UK
        date created_at
    }

    ORDER {
        int order_id PK
        int customer_id FK
        date order_date
        decimal total
        string status
    }

    ORDER_LINE {
        int order_id PK,FK
        int product_id PK,FK
        int quantity
        decimal unit_price
    }

    PRODUCT {
        int product_id PK
        string name
        string sku UK
        decimal price
        int stock_qty
    }

Data Dictionary

数据字典

markdown
undefined
markdown
undefined

Data Dictionary

数据字典

CUSTOMER

CUSTOMER

ColumnTypeNullKeyDefaultDescription
customer_idINTNoPKAUTOUnique identifier
nameVARCHAR(100)NoCustomer full name
emailVARCHAR(255)NoUKContact email
phoneVARCHAR(20)YesNULLContact phone
created_atTIMESTAMPNoNOW()Record creation
Indexes:
  • pk_customer
    (customer_id) - Primary
  • uk_customer_email
    (email) - Unique
  • ix_customer_name
    (name) - Search
Constraints:
  • Email format validation (CHECK)
  • Name length minimum 2 characters
undefined
类型可为空默认值描述
customer_idINTPKAUTO唯一标识符
nameVARCHAR(100)客户全名
emailVARCHAR(255)UK联系邮箱
phoneVARCHAR(20)NULL联系电话
created_atTIMESTAMPNOW()记录创建时间
索引:
  • pk_customer
    (customer_id) - 主键
  • uk_customer_email
    (email) - 唯一键
  • ix_customer_name
    (name) - 搜索用
约束:
  • 邮箱格式验证(CHECK)
  • 名称长度至少2个字符
undefined

Structured Data (YAML)

结构化数据(YAML)

yaml
data_model:
  name: "E-Commerce"
  version: "1.0"
  date: "2025-01-15"
  level: "logical"  # conceptual, logical, physical
  analyst: "data-modeler"

  entities:
    - name: "Customer"
      type: "strong"
      description: "Person or organization that makes purchases"
      attributes:
        - name: "customer_id"
          type: "integer"
          key: "primary"
          required: true
          generated: true

        - name: "email"
          type: "string"
          length: 255
          key: "unique"
          required: true

        - name: "name"
          type: "string"
          length: 100
          required: true

    - name: "Order"
      type: "strong"
      description: "Purchase transaction"
      attributes:
        - name: "order_id"
          type: "integer"
          key: "primary"
          required: true

        - name: "customer_id"
          type: "integer"
          key: "foreign"
          references: "Customer.customer_id"
          required: true

  relationships:
    - name: "places"
      from: "Customer"
      to: "Order"
      cardinality: "1:M"
      from_participation: "optional"  # 0..1
      to_participation: "mandatory"   # 1..M
      description: "Customer places orders"

  constraints:
    - entity: "Customer"
      type: "check"
      expression: "LENGTH(name) >= 2"
      description: "Name minimum length"

  indexes:
    - entity: "Order"
      name: "ix_order_date"
      columns: ["order_date"]
      purpose: "Date range queries"
yaml
data_model:
  name: "E-Commerce"
  version: "1.0"
  date: "2025-01-15"
  level: "logical"  # conceptual, logical, physical
  analyst: "data-modeler"

  entities:
    - name: "Customer"
      type: "strong"
      description: "Person or organization that makes purchases"
      attributes:
        - name: "customer_id"
          type: "integer"
          key: "primary"
          required: true
          generated: true

        - name: "email"
          type: "string"
          length: 255
          key: "unique"
          required: true

        - name: "name"
          type: "string"
          length: 100
          required: true

    - name: "Order"
      type: "strong"
      description: "Purchase transaction"
      attributes:
        - name: "order_id"
          type: "integer"
          key: "primary"
          required: true

        - name: "customer_id"
          type: "integer"
          key: "foreign"
          references: "Customer.customer_id"
          required: true

  relationships:
    - name: "places"
      from: "Customer"
      to: "Order"
      cardinality: "1:M"
      from_participation: "optional"  # 0..1
      to_participation: "mandatory"   # 1..M
      description: "Customer places orders"

  constraints:
    - entity: "Customer"
      type: "check"
      expression: "LENGTH(name) >= 2"
      description: "Name minimum length"

  indexes:
    - entity: "Order"
      name: "ix_order_date"
      columns: ["order_date"]
      purpose: "Date range queries"

Narrative Summary

叙述性摘要

markdown
undefined
markdown
undefined

Data Model: E-Commerce

数据模型:电商系统

Version: 1.0 Date: [ISO Date] Level: Logical
版本: 1.0 日期: [ISO 日期] 层级: 逻辑层

Entity Summary

实体摘要

EntityDescriptionKey Relationships
CustomerPurchasersPlaces Orders
OrderTransactionsBelongs to Customer, Contains Products
ProductItems for saleIncluded in Orders
Order LineOrder detailsLinks Order to Product
实体描述关键关系
Customer购买者下订单
Order交易记录属于客户,包含商品
Product在售商品被包含在订单中
Order Line订单详情关联订单与商品

Key Relationships

关键关系

  1. Customer → Order (1:M)
    • One customer can place many orders
    • Each order belongs to exactly one customer
  2. Order ↔ Product (M:N via Order Line)
    • An order can contain many products
    • A product can appear in many orders
  1. Customer → Order (1:M)
    • 一个客户可下多个订单
    • 每个订单仅属于一个客户
  2. Order ↔ Product (M:N via Order Line)
    • 一个订单可包含多个商品
    • 一个商品可出现在多个订单中

Data Integrity Rules

数据完整性规则

  1. Orders cannot exist without a customer
  2. Order lines must reference valid order and product
  3. Stock quantity cannot be negative
  4. Email must be unique per customer
  1. 订单不能脱离客户存在
  2. 订单行必须引用有效的订单和商品
  3. 库存数量不能为负数
  4. 每个客户的邮箱必须唯一

Notes

说明

  • Consider partitioning Orders by date for large volumes
  • Product price stored in Order Line for historical accuracy
undefined
  • 对于大数据量,考虑按日期对订单进行分区
  • 订单行中存储商品价格以保证历史准确性
undefined

Common Patterns

常见模式

Inheritance (Subtype/Supertype)

继承(子类型/超类型)

mermaid
erDiagram
    PERSON ||--o| EMPLOYEE : "is a"
    PERSON ||--o| CUSTOMER : "is a"

    PERSON {
        int person_id PK
        string name
        string email
    }

    EMPLOYEE {
        int person_id PK,FK
        date hire_date
        decimal salary
    }

    CUSTOMER {
        int person_id PK,FK
        string company
        decimal credit_limit
    }
mermaid
erDiagram
    PERSON ||--o| EMPLOYEE : "is a"
    PERSON ||--o| CUSTOMER : "is a"

    PERSON {
        int person_id PK
        string name
        string email
    }

    EMPLOYEE {
        int person_id PK,FK
        date hire_date
        decimal salary
    }

    CUSTOMER {
        int person_id PK,FK
        string company
        decimal credit_limit
    }

Self-Referencing

自引用

mermaid
erDiagram
    EMPLOYEE ||--o{ EMPLOYEE : "manages"

    EMPLOYEE {
        int employee_id PK
        string name
        int manager_id FK
    }
mermaid
erDiagram
    EMPLOYEE ||--o{ EMPLOYEE : "manages"

    EMPLOYEE {
        int employee_id PK
        string name
        int manager_id FK
    }

Audit Trail

审计追踪

mermaid
erDiagram
    ENTITY ||--o{ ENTITY_HISTORY : "has history"

    ENTITY {
        int id PK
        string data
        timestamp updated_at
    }

    ENTITY_HISTORY {
        int history_id PK
        int entity_id FK
        string data
        timestamp valid_from
        timestamp valid_to
        string changed_by
    }
mermaid
erDiagram
    ENTITY ||--o{ ENTITY_HISTORY : "has history"

    ENTITY {
        int id PK
        string data
        timestamp updated_at
    }

    ENTITY_HISTORY {
        int history_id PK
        int entity_id FK
        string data
        timestamp valid_from
        timestamp valid_to
        string changed_by
    }

Integration

集成

Upstream

上游

  • Requirements - Data requirements source
  • domain-storytelling - Domain concepts
  • process-modeling - Data in processes
  • 需求 - 数据需求来源
  • domain-storytelling - 领域概念
  • process-modeling - 流程中的数据

Downstream

下游

  • Database design - Physical implementation
  • API design - Data contracts
  • Integration - Data exchange
  • Database design - 物理实现
  • API design - 数据契约
  • Integration - 数据交换

Related Skills

相关技能

  • process-modeling
    - Process context for data
  • journey-mapping
    - Customer data touchpoints
  • decision-analysis
    - Data-driven decisions
  • capability-mapping
    - Data supporting capabilities
  • process-modeling
    - 数据的流程上下文
  • journey-mapping
    - 客户数据接触点
  • decision-analysis
    - 数据驱动的决策
  • capability-mapping
    - 数据支持的能力

Version History

版本历史

  • v1.0.0 (2025-12-26): Initial release
  • v1.0.0 (2025-12-26): 初始版本