data-modeling
Compare original and translation side by side
🇺🇸
Original
English🇨🇳
Translation
ChineseData Modeling
数据建模
When to Use This Skill
何时使用此技能
Use this skill when:
- Data Modeling tasks - Working on data modeling with entity-relationship diagrams (erds), data dictionaries, and conceptual/logical/physical models. documents data structures, relationships, and attributes
- Planning or design - Need guidance on Data Modeling approaches
- Best practices - Want to follow established patterns and standards
在以下场景中使用此技能:
- 数据建模任务 - 开展使用实体关系图(ERDs)、数据字典以及概念/逻辑/物理模型的数据建模工作,记录数据结构、关系和属性
- 规划或设计 - 需要数据建模方法的指导
- 最佳实践 - 希望遵循既定的模式和标准
Overview
概述
Create and document data structures using Entity-Relationship Diagrams (ERDs), data dictionaries, and structured data models. Supports conceptual, logical, and physical modeling levels for database design and data architecture.
使用实体关系图(ERDs)、数据字典和结构化数据模型创建并记录数据结构。支持数据库设计和数据架构的概念、逻辑和物理建模层级。
What is Data Modeling?
什么是数据建模?
Data modeling creates visual and structured representations of data elements and their relationships. It documents:
- Entities: Things about which data is stored
- Attributes: Properties of entities
- Relationships: How entities connect
- Constraints: Rules governing data
数据建模 创建数据元素及其关系的可视化和结构化表示。它记录:
- 实体:需要存储数据的事物
- 属性:实体的特性
- 关系:实体之间的关联方式
- 约束:管理数据的规则
Modeling Levels
建模层级
| Level | Purpose | Audience | Detail |
|---|---|---|---|
| Conceptual | Business concepts | Business users | Entities, high-level relationships |
| Logical | Data structure | Analysts, designers | Entities, attributes, all relationships |
| Physical | Implementation | Developers, DBAs | Tables, columns, types, indexes |
| 层级 | 用途 | 受众 | 细节程度 |
|---|---|---|---|
| 概念层 | 业务概念 | 业务用户 | 实体、高层级关系 |
| 逻辑层 | 数据结构 | 分析师、设计师 | 实体、属性、所有关系 |
| 物理层 | 实现落地 | 开发人员、数据库管理员(DBAs) | 表、列、类型、索引 |
Conceptual Model
概念模型
High-level view of business concepts:
- Major entities only
- Key relationships
- No attributes (or minimal)
- No technical details
业务概念的高层级视图:
- 仅包含主要实体
- 关键关系
- 无属性(或极少)
- 无技术细节
Logical Model
逻辑模型
Technology-independent data structure:
- All entities and attributes
- Primary and foreign keys
- All relationships with cardinality
- Normalization applied
- No physical implementation details
与技术无关的数据结构:
- 所有实体和属性
- 主键和外键
- 带基数的所有关系
- 已应用规范化
- 无物理实现细节
Physical Model
物理模型
Database-specific implementation:
- Table names (physical naming)
- Column names and data types
- Indexes and constraints
- Views and stored procedures
- Database-specific features
特定数据库的实现方案:
- 表名(物理命名)
- 列名和数据类型
- 索引和约束
- 视图和存储过程
- 数据库特定特性
ERD Notation
ERD 符号
Entity (Rectangle)
实体(矩形)
An entity represents a thing about which data is stored.
text
┌─────────────────┐
│ CUSTOMER │
├─────────────────┤
│ customer_id PK │
│ name │
│ email │
│ created_at │
└─────────────────┘Entity Types:
| Type | Description | Example |
|---|---|---|
| Strong | Independent existence | Customer, Product |
| Weak | Depends on another entity | Order Line (depends on Order) |
| Associative | Resolves M:N relationships | Enrollment (Student-Course) |
实体代表需要存储数据的事物。
text
┌─────────────────┐
│ CUSTOMER │
├─────────────────┤
│ customer_id PK │
│ name │
│ email │
│ created_at │
└─────────────────┘实体类型:
| 类型 | 描述 | 示例 |
|---|---|---|
| 强实体 | 可独立存在 | Customer、Product |
| 弱实体 | 依赖其他实体 | Order Line(依赖Order) |
| 关联实体 | 解决多对多关系 | Enrollment(Student-Course) |
Attributes
属性
| Type | Symbol | Description |
|---|---|---|
| Primary Key (PK) | Underlined/PK | Unique identifier |
| Foreign Key (FK) | FK | Reference to another entity |
| Required | * or NOT NULL | Must have value |
| Optional | ○ or NULL | May be empty |
| Derived | / | Calculated from other attributes |
| Composite | {attrs} | Made of sub-attributes |
| Multi-valued | [attr] | Can have multiple values |
| 类型 | 符号 | 描述 |
|---|---|---|
| 主键(PK) | 下划线/PK | 唯一标识符 |
| 外键(FK) | FK | 引用其他实体 |
| 必填 | * 或 NOT NULL | 必须有值 |
| 可选 | ○ 或 NULL | 可为空 |
| 派生 | / | 由其他属性计算得出 |
| 复合 | {attrs} | 由子属性组成 |
| 多值 | [attr] | 可拥有多个值 |
Relationships (Lines)
关系(连线)
Notation Styles:
| Style | Used In |
|---|---|
| Chen | Academic, conceptual |
| Crow's Foot | Industry standard |
| UML | Software design |
| IDEF1X | Government, structured |
Crow's Foot Notation:
| Symbol | Meaning |
|---|---|
| One (mandatory) |
| Zero or one (optional) |
| Many |
| Zero or many |
符号风格:
| 风格 | 应用场景 |
|---|---|
| Chen | 学术、概念建模 |
| 乌鸦脚 | 行业标准 |
| UML | 软件设计 |
| IDEF1X | 政府、结构化建模 |
乌鸦脚符号:
| 符号 | 含义 |
|---|---|
| 一个(强制) |
| 零或一个(可选) |
| 多个 |
| 零或多个 |
Cardinality
基数
| Notation | Meaning | Example |
|---|---|---|
| 1:1 | One to one | Employee → Workstation |
| 1:M | One to many | Customer → Orders |
| M:N | Many to many | Students ↔ Courses |
Reading Cardinality:
"One [Entity A] has [min]..[max] [Entity B]"
Example: "One Customer has 0..many Orders"
| 符号 | 含义 | 示例 |
|---|---|---|
| 1:1 | 一对一 | Employee → Workstation |
| 1:M | 一对多 | Customer → Orders |
| M:N | 多对多 | Students ↔ Courses |
基数读法:
“一个[实体A]拥有[最小]..[最大]个[实体B]”
示例:“一个客户拥有0..多个订单”
Workflow
工作流程
Phase 1: Identify Entities
阶段1:识别实体
Step 1: Extract Nouns from Requirements
步骤1:从需求中提取名词
From business requirements, identify:
- Things the business tracks
- Subjects of business rules
- Sources and targets of data
从业务需求中识别:
- 业务需要跟踪的事物
- 业务规则的主体
- 数据的来源和目标
Step 2: Filter Candidates
步骤2:筛选候选实体
| Keep | Exclude |
|---|---|
| Independent concepts | Attributes (properties of entities) |
| Things with multiple instances | Synonyms (same concept, different name) |
| Things requiring data storage | Actions (verbs, not nouns) |
| 保留 | 排除 |
|---|---|
| 独立概念 | 属性(实体的特性) |
| 有多个实例的事物 | 同义词(同一概念的不同名称) |
| 需要存储数据的事物 | 动作(动词,非名词) |
Step 3: Document Entities
步骤3:记录实体
markdown
undefinedmarkdown
undefinedEntities
实体
| Entity | Description | Example |
|---|---|---|
| Customer | Person or organization that purchases | John Smith, Acme Corp |
| Order | Purchase transaction | Order #12345 |
| Product | Item available for sale | Widget, Gadget |
undefined| 实体 | 描述 | 示例 |
|---|---|---|
| Customer | 进行购买的个人或组织 | John Smith、Acme Corp |
| Order | 购买交易 | 订单#12345 |
| Product | 可售卖的商品 | Widget、Gadget |
undefinedPhase 2: Define Attributes
阶段2:定义属性
Step 1: List Attributes for Each Entity
步骤1:为每个实体列出属性
For each entity, identify:
- What do we need to know about this entity?
- What uniquely identifies it?
- What data does the business reference?
针对每个实体,确定:
- 我们需要了解该实体的哪些信息?
- 什么能唯一标识它?
- 业务会引用哪些数据?
Step 2: Classify Attributes
步骤2:分类属性
| Attribute | Type | Required | Notes |
|---|---|---|---|
| customer_id | PK | Yes | Surrogate key |
| Unique | Yes | Business key | |
| name | String | Yes | |
| phone | String | No | Optional |
| 属性 | 类型 | 必填 | 说明 |
|---|---|---|---|
| customer_id | PK | 是 | 代理键 |
| 唯一 | 是 | 业务键 | |
| name | 字符串 | 是 | |
| phone | 字符串 | 否 | 可选 |
Step 3: Identify Keys
步骤3:识别键
- Primary Key (PK): Unique identifier
- Natural Key: Business-meaningful identifier
- Surrogate Key: System-generated identifier
- Composite Key: Multiple attributes combined
- 主键(PK):唯一标识符
- 自然键:具有业务含义的标识符
- 代理键:系统生成的标识符
- 复合键:多个属性组合而成
Phase 3: Define Relationships
阶段3:定义关系
Step 1: Identify Connections
步骤1:识别关联
For each pair of entities:
- Is there a business connection?
- What is the nature of the relationship?
- What is the cardinality?
针对每对实体:
- 它们之间是否存在业务关联?
- 关系的性质是什么?
- 基数是多少?
Step 2: Document Relationships
步骤2:记录关系
markdown
undefinedmarkdown
undefinedRelationships
关系
| Relationship | From | To | Cardinality | Description |
|---|---|---|---|---|
| places | Customer | Order | 1:M | Customer places orders |
| contains | Order | Product | M:N | Order contains products |
undefined| 关系 | 来源 | 目标 | 基数 | 描述 |
|---|---|---|---|---|
| places | Customer | Order | 1:M | 客户下订单 |
| contains | Order | Product | M:N | 订单包含商品 |
undefinedStep 3: Resolve Many-to-Many
步骤3:解决多对多关系
M:N relationships require associative entities:
text
Student ──M:N── Course
Becomes:
Student ──1:M── Enrollment ──M:1── Course多对多关系需要关联实体:
text
Student ──M:N── Course
转换为:
Student ──1:M── Enrollment ──M:1── CoursePhase 4: Normalize (Logical Model)
阶段4:规范化(逻辑模型)
Normal Forms:
| Form | Rule | Violation Example |
|---|---|---|
| 1NF | Atomic values, no repeating groups | Phone1, Phone2, Phone3 |
| 2NF | No partial dependencies | Non-key depends on part of composite key |
| 3NF | No transitive dependencies | Non-key depends on non-key |
| BCNF | Every determinant is a candidate key | Overlap in candidate keys |
When to Denormalize:
- Read performance critical
- Reporting/analytics use cases
- Data warehouse design
- Justified with clear trade-off analysis
范式:
| 范式 | 规则 | 违规示例 |
|---|---|---|
| 1NF | 值为原子性,无重复组 | Phone1、Phone2、Phone3 |
| 2NF | 无部分依赖 | 非键属性依赖于复合键的一部分 |
| 3NF | 无传递依赖 | 非键属性依赖于其他非键属性 |
| BCNF | 每个决定因素都是候选键 | 候选键存在重叠 |
何时反规范化:
- 读取性能至关重要
- 报表/分析场景
- 数据仓库设计
- 经过清晰的权衡分析后证明合理
Phase 5: Create Physical Model
阶段5:创建物理模型
Step 1: Map to Physical Types
步骤1:映射到物理类型
| Logical Type | Physical (PostgreSQL) | Physical (SQL Server) |
|---|---|---|
| String(50) | VARCHAR(50) | NVARCHAR(50) |
| Integer | INTEGER | INT |
| Decimal(10,2) | NUMERIC(10,2) | DECIMAL(10,2) |
| Date | DATE | DATE |
| Timestamp | TIMESTAMP | DATETIME2 |
| Boolean | BOOLEAN | BIT |
| 逻辑类型 | PostgreSQL 物理类型 | SQL Server 物理类型 |
|---|---|---|
| String(50) | VARCHAR(50) | NVARCHAR(50) |
| Integer | INTEGER | INT |
| Decimal(10,2) | NUMERIC(10,2) | DECIMAL(10,2) |
| Date | DATE | DATE |
| Timestamp | TIMESTAMP | DATETIME2 |
| Boolean | BOOLEAN | BIT |
Step 2: Define Constraints
步骤2:定义约束
- Primary key constraints
- Foreign key constraints
- Unique constraints
- Check constraints
- Default values
- 主键约束
- 外键约束
- 唯一约束
- 检查约束
- 默认值
Step 3: Plan Indexes
步骤3:规划索引
- Primary key (automatic)
- Foreign keys (for joins)
- Frequently queried columns
- Covering indexes for performance
- 主键(自动创建)
- 外键(用于连接)
- 频繁查询的列
- 用于性能优化的覆盖索引
Output Formats
输出格式
Mermaid ERD
Mermaid ERD
mermaid
erDiagram
CUSTOMER ||--o{ ORDER : places
ORDER ||--|{ ORDER_LINE : contains
PRODUCT ||--o{ ORDER_LINE : includes
CUSTOMER {
int customer_id PK
string name
string email UK
date created_at
}
ORDER {
int order_id PK
int customer_id FK
date order_date
decimal total
string status
}
ORDER_LINE {
int order_id PK,FK
int product_id PK,FK
int quantity
decimal unit_price
}
PRODUCT {
int product_id PK
string name
string sku UK
decimal price
int stock_qty
}mermaid
erDiagram
CUSTOMER ||--o{ ORDER : places
ORDER ||--|{ ORDER_LINE : contains
PRODUCT ||--o{ ORDER_LINE : includes
CUSTOMER {
int customer_id PK
string name
string email UK
date created_at
}
ORDER {
int order_id PK
int customer_id FK
date order_date
decimal total
string status
}
ORDER_LINE {
int order_id PK,FK
int product_id PK,FK
int quantity
decimal unit_price
}
PRODUCT {
int product_id PK
string name
string sku UK
decimal price
int stock_qty
}Data Dictionary
数据字典
markdown
undefinedmarkdown
undefinedData Dictionary
数据字典
CUSTOMER
CUSTOMER
| Column | Type | Null | Key | Default | Description |
|---|---|---|---|---|---|
| customer_id | INT | No | PK | AUTO | Unique identifier |
| name | VARCHAR(100) | No | Customer full name | ||
| VARCHAR(255) | No | UK | Contact email | ||
| phone | VARCHAR(20) | Yes | NULL | Contact phone | |
| created_at | TIMESTAMP | No | NOW() | Record creation |
Indexes:
- (customer_id) - Primary
pk_customer - (email) - Unique
uk_customer_email - (name) - Search
ix_customer_name
Constraints:
- Email format validation (CHECK)
- Name length minimum 2 characters
undefined| 列 | 类型 | 可为空 | 键 | 默认值 | 描述 |
|---|---|---|---|---|---|
| customer_id | INT | 否 | PK | AUTO | 唯一标识符 |
| name | VARCHAR(100) | 否 | 客户全名 | ||
| VARCHAR(255) | 否 | UK | 联系邮箱 | ||
| phone | VARCHAR(20) | 是 | NULL | 联系电话 | |
| created_at | TIMESTAMP | 否 | NOW() | 记录创建时间 |
索引:
- (customer_id) - 主键
pk_customer - (email) - 唯一键
uk_customer_email - (name) - 搜索用
ix_customer_name
约束:
- 邮箱格式验证(CHECK)
- 名称长度至少2个字符
undefinedStructured Data (YAML)
结构化数据(YAML)
yaml
data_model:
name: "E-Commerce"
version: "1.0"
date: "2025-01-15"
level: "logical" # conceptual, logical, physical
analyst: "data-modeler"
entities:
- name: "Customer"
type: "strong"
description: "Person or organization that makes purchases"
attributes:
- name: "customer_id"
type: "integer"
key: "primary"
required: true
generated: true
- name: "email"
type: "string"
length: 255
key: "unique"
required: true
- name: "name"
type: "string"
length: 100
required: true
- name: "Order"
type: "strong"
description: "Purchase transaction"
attributes:
- name: "order_id"
type: "integer"
key: "primary"
required: true
- name: "customer_id"
type: "integer"
key: "foreign"
references: "Customer.customer_id"
required: true
relationships:
- name: "places"
from: "Customer"
to: "Order"
cardinality: "1:M"
from_participation: "optional" # 0..1
to_participation: "mandatory" # 1..M
description: "Customer places orders"
constraints:
- entity: "Customer"
type: "check"
expression: "LENGTH(name) >= 2"
description: "Name minimum length"
indexes:
- entity: "Order"
name: "ix_order_date"
columns: ["order_date"]
purpose: "Date range queries"yaml
data_model:
name: "E-Commerce"
version: "1.0"
date: "2025-01-15"
level: "logical" # conceptual, logical, physical
analyst: "data-modeler"
entities:
- name: "Customer"
type: "strong"
description: "Person or organization that makes purchases"
attributes:
- name: "customer_id"
type: "integer"
key: "primary"
required: true
generated: true
- name: "email"
type: "string"
length: 255
key: "unique"
required: true
- name: "name"
type: "string"
length: 100
required: true
- name: "Order"
type: "strong"
description: "Purchase transaction"
attributes:
- name: "order_id"
type: "integer"
key: "primary"
required: true
- name: "customer_id"
type: "integer"
key: "foreign"
references: "Customer.customer_id"
required: true
relationships:
- name: "places"
from: "Customer"
to: "Order"
cardinality: "1:M"
from_participation: "optional" # 0..1
to_participation: "mandatory" # 1..M
description: "Customer places orders"
constraints:
- entity: "Customer"
type: "check"
expression: "LENGTH(name) >= 2"
description: "Name minimum length"
indexes:
- entity: "Order"
name: "ix_order_date"
columns: ["order_date"]
purpose: "Date range queries"Narrative Summary
叙述性摘要
markdown
undefinedmarkdown
undefinedData Model: E-Commerce
数据模型:电商系统
Version: 1.0
Date: [ISO Date]
Level: Logical
版本: 1.0
日期: [ISO 日期]
层级: 逻辑层
Entity Summary
实体摘要
| Entity | Description | Key Relationships |
|---|---|---|
| Customer | Purchasers | Places Orders |
| Order | Transactions | Belongs to Customer, Contains Products |
| Product | Items for sale | Included in Orders |
| Order Line | Order details | Links Order to Product |
| 实体 | 描述 | 关键关系 |
|---|---|---|
| Customer | 购买者 | 下订单 |
| Order | 交易记录 | 属于客户,包含商品 |
| Product | 在售商品 | 被包含在订单中 |
| Order Line | 订单详情 | 关联订单与商品 |
Key Relationships
关键关系
-
Customer → Order (1:M)
- One customer can place many orders
- Each order belongs to exactly one customer
-
Order ↔ Product (M:N via Order Line)
- An order can contain many products
- A product can appear in many orders
-
Customer → Order (1:M)
- 一个客户可下多个订单
- 每个订单仅属于一个客户
-
Order ↔ Product (M:N via Order Line)
- 一个订单可包含多个商品
- 一个商品可出现在多个订单中
Data Integrity Rules
数据完整性规则
- Orders cannot exist without a customer
- Order lines must reference valid order and product
- Stock quantity cannot be negative
- Email must be unique per customer
- 订单不能脱离客户存在
- 订单行必须引用有效的订单和商品
- 库存数量不能为负数
- 每个客户的邮箱必须唯一
Notes
说明
- Consider partitioning Orders by date for large volumes
- Product price stored in Order Line for historical accuracy
undefined- 对于大数据量,考虑按日期对订单进行分区
- 订单行中存储商品价格以保证历史准确性
undefinedCommon Patterns
常见模式
Inheritance (Subtype/Supertype)
继承(子类型/超类型)
mermaid
erDiagram
PERSON ||--o| EMPLOYEE : "is a"
PERSON ||--o| CUSTOMER : "is a"
PERSON {
int person_id PK
string name
string email
}
EMPLOYEE {
int person_id PK,FK
date hire_date
decimal salary
}
CUSTOMER {
int person_id PK,FK
string company
decimal credit_limit
}mermaid
erDiagram
PERSON ||--o| EMPLOYEE : "is a"
PERSON ||--o| CUSTOMER : "is a"
PERSON {
int person_id PK
string name
string email
}
EMPLOYEE {
int person_id PK,FK
date hire_date
decimal salary
}
CUSTOMER {
int person_id PK,FK
string company
decimal credit_limit
}Self-Referencing
自引用
mermaid
erDiagram
EMPLOYEE ||--o{ EMPLOYEE : "manages"
EMPLOYEE {
int employee_id PK
string name
int manager_id FK
}mermaid
erDiagram
EMPLOYEE ||--o{ EMPLOYEE : "manages"
EMPLOYEE {
int employee_id PK
string name
int manager_id FK
}Audit Trail
审计追踪
mermaid
erDiagram
ENTITY ||--o{ ENTITY_HISTORY : "has history"
ENTITY {
int id PK
string data
timestamp updated_at
}
ENTITY_HISTORY {
int history_id PK
int entity_id FK
string data
timestamp valid_from
timestamp valid_to
string changed_by
}mermaid
erDiagram
ENTITY ||--o{ ENTITY_HISTORY : "has history"
ENTITY {
int id PK
string data
timestamp updated_at
}
ENTITY_HISTORY {
int history_id PK
int entity_id FK
string data
timestamp valid_from
timestamp valid_to
string changed_by
}Integration
集成
Upstream
上游
- Requirements - Data requirements source
- domain-storytelling - Domain concepts
- process-modeling - Data in processes
- 需求 - 数据需求来源
- domain-storytelling - 领域概念
- process-modeling - 流程中的数据
Downstream
下游
- Database design - Physical implementation
- API design - Data contracts
- Integration - Data exchange
- Database design - 物理实现
- API design - 数据契约
- Integration - 数据交换
Related Skills
相关技能
- - Process context for data
process-modeling - - Customer data touchpoints
journey-mapping - - Data-driven decisions
decision-analysis - - Data supporting capabilities
capability-mapping
- - 数据的流程上下文
process-modeling - - 客户数据接触点
journey-mapping - - 数据驱动的决策
decision-analysis - - 数据支持的能力
capability-mapping
Version History
版本历史
- v1.0.0 (2025-12-26): Initial release
- v1.0.0 (2025-12-26): 初始版本