postgresql-table-design

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

PostgreSQL Table Design

PostgreSQL 表设计

Core Rules

核心规则

  • Define a PRIMARY KEY for reference tables (users, orders, etc.). Not always needed for time-series/event/log data. When used, prefer
    BIGINT GENERATED ALWAYS AS IDENTITY
    ; use
    UUID
    only when global uniqueness/opacity is needed.
  • Normalize first (to 3NF) to eliminate data redundancy and update anomalies; denormalize only for measured, high-ROI reads where join performance is proven problematic. Premature denormalization creates maintenance burden.
  • Add NOT NULL everywhere it’s semantically required; use DEFAULTs for common values.
  • Create indexes for access paths you actually query: PK/unique (auto), FK columns (manual!), frequent filters/sorts, and join keys.
  • Prefer TIMESTAMPTZ for event time; NUMERIC for money; TEXT for strings; BIGINT for integer values, DOUBLE PRECISION for floats (or
    NUMERIC
    for exact decimal arithmetic).
  • 为参考表(如用户表、订单表等)定义PRIMARY KEY。时间序列/事件/日志类数据并非总是需要主键。使用主键时,优先选择
    BIGINT GENERATED ALWAYS AS IDENTITY
    ;仅当需要全局唯一性或不透明性时才使用
    UUID
  • **先规范化(至3NF)**以消除数据冗余和更新异常;仅在连接性能被证实存在问题、且读取操作能获得高投资回报率的情况下,才进行反规范化。过早反规范化会增加维护负担。
  • 在所有语义上必填的字段添加NOT NULL约束;为常用值设置DEFAULT默认值。
  • 为实际查询的访问路径创建索引:主键/唯一索引(自动创建)、外键列(需手动创建!)、频繁过滤/排序的字段,以及连接键。
  • 事件时间优先使用TIMESTAMPTZ;金额使用NUMERIC;字符串使用TEXT;整数值使用BIGINT;浮点数使用DOUBLE PRECISION(如需精确十进制运算则使用
    NUMERIC
    )。

PostgreSQL “Gotchas”

PostgreSQL 常见陷阱

  • Identifiers: unquoted → lowercased. Avoid quoted/mixed-case names. Convention: use
    snake_case
    for table/column names.
  • Unique + NULLs: UNIQUE allows multiple NULLs. Use
    UNIQUE (...) NULLS NOT DISTINCT
    (PG15+) to restrict to one NULL.
  • FK indexes: PostgreSQL does not auto-index FK columns. Add them.
  • No silent coercions: length/precision overflows error out (no truncation). Example: inserting 999 into
    NUMERIC(2,0)
    fails with error, unlike some databases that silently truncate or round.
  • Sequences/identity have gaps (normal; don't "fix"). Rollbacks, crashes, and concurrent transactions create gaps in ID sequences (1, 2, 5, 6...). This is expected behavior—don't try to make IDs consecutive.
  • Heap storage: no clustered PK by default (unlike SQL Server/MySQL InnoDB);
    CLUSTER
    is one-off reorganization, not maintained on subsequent inserts. Row order on disk is insertion order unless explicitly clustered.
  • MVCC: updates/deletes leave dead tuples; vacuum handles them—design to avoid hot wide-row churn.
  • 标识符:未加引号的标识符会被转换为小写。避免使用带引号/大小写混合的名称。命名规范:表/列名使用
    snake_case
  • 唯一约束与NULL值:UNIQUE约束允许多个NULL值。使用
    UNIQUE (...) NULLS NOT DISTINCT
    (PostgreSQL 15+版本支持)来限制仅允许一个NULL值。
  • 外键索引:PostgreSQL不会自动为外键列创建索引,需要手动添加。
  • 无隐式类型转换:长度/精度溢出会直接报错(不会自动截断)。例如:将999插入
    NUMERIC(2,0)
    字段会报错,而部分数据库会自动截断或四舍五入。
  • 序列/自增字段存在间隙(属于正常现象;无需“修复”)。回滚、崩溃和并发事务会导致ID序列出现间隙(如1,2,5,6...)。这是预期行为——不要尝试让ID连续。
  • 堆存储:默认情况下不会对主键进行聚簇(与SQL Server/MySQL InnoDB不同);
    CLUSTER
    命令是一次性的重组操作,后续插入不会维持聚簇顺序。磁盘上的行顺序为插入顺序,除非显式执行聚簇操作。
  • MVCC(多版本并发控制):更新/删除操作会留下死元组;由vacuum进程清理——设计时应避免频繁更新宽行数据导致的膨胀。

Data Types

数据类型

  • IDs:
    BIGINT GENERATED ALWAYS AS IDENTITY
    preferred (
    GENERATED BY DEFAULT
    also fine);
    UUID
    when merging/federating/used in a distributed system or for opaque IDs. Generate with
    uuidv7()
    (preferred if using PG18+) or
    gen_random_uuid()
    (if using an older PG version).
  • Integers: prefer
    BIGINT
    unless storage space is critical;
    INTEGER
    for smaller ranges; avoid
    SMALLINT
    unless constrained.
  • Floats: prefer
    DOUBLE PRECISION
    over
    REAL
    unless storage space is critical. Use
    NUMERIC
    for exact decimal arithmetic.
  • Strings: prefer
    TEXT
    ; if length limits needed, use
    CHECK (LENGTH(col) <= n)
    instead of
    VARCHAR(n)
    ; avoid
    CHAR(n)
    . Use
    BYTEA
    for binary data. Large strings/binary (>2KB default threshold) automatically stored in TOAST with compression. TOAST storage:
    PLAIN
    (no TOAST),
    EXTENDED
    (compress + out-of-line),
    EXTERNAL
    (out-of-line, no compress),
    MAIN
    (compress, keep in-line if possible). Default
    EXTENDED
    usually optimal. Control with
    ALTER TABLE tbl ALTER COLUMN col SET STORAGE strategy
    and
    ALTER TABLE tbl SET (toast_tuple_target = 4096)
    for threshold. Case-insensitive: for locale/accent handling use non-deterministic collations; for plain ASCII use expression indexes on
    LOWER(col)
    (preferred unless column needs case-insensitive PK/FK/UNIQUE) or
    CITEXT
    .
  • Money:
    NUMERIC(p,s)
    (never float).
  • Time:
    TIMESTAMPTZ
    for timestamps;
    DATE
    for date-only;
    INTERVAL
    for durations. Avoid
    TIMESTAMP
    (without timezone). Use
    now()
    for transaction start time,
    clock_timestamp()
    for current wall-clock time.
  • Booleans:
    BOOLEAN
    with
    NOT NULL
    constraint unless tri-state values are required.
  • Enums:
    CREATE TYPE ... AS ENUM
    for small, stable sets (e.g. US states, days of week). For business-logic-driven and evolving values (e.g. order statuses) → use TEXT (or INT) + CHECK or lookup table.
  • Arrays:
    TEXT[]
    ,
    INTEGER[]
    , etc. Use for ordered lists where you query elements. Index with GIN for containment (
    @>
    ,
    <@
    ) and overlap (
    &&
    ) queries. Access:
    arr[1]
    (1-indexed),
    arr[1:3]
    (slicing). Good for tags, categories; avoid for relations—use junction tables instead. Literal syntax:
    '{val1,val2}'
    or
    ARRAY[val1,val2]
    .
  • Range types:
    daterange
    ,
    numrange
    ,
    tstzrange
    for intervals. Support overlap (
    &&
    ), containment (
    @>
    ), operators. Index with GiST. Good for scheduling, versioning, numeric ranges. Pick a bounds scheme and use it consistently; prefer
    [)
    (inclusive/exclusive) by default.
  • Network types:
    INET
    for IP addresses,
    CIDR
    for network ranges,
    MACADDR
    for MAC addresses. Support network operators (
    <<
    ,
    >>
    ,
    &&
    ).
  • Geometric types:
    POINT
    ,
    LINE
    ,
    POLYGON
    ,
    CIRCLE
    for 2D spatial data. Index with GiST. Consider PostGIS for advanced spatial features.
  • Text search:
    TSVECTOR
    for full-text search documents,
    TSQUERY
    for search queries. Index
    tsvector
    with GIN. Always specify language:
    to_tsvector('english', col)
    and
    to_tsquery('english', 'query')
    . Never use single-argument versions. This applies to both index expressions and queries.
  • Domain types:
    CREATE DOMAIN email AS TEXT CHECK (VALUE ~ '^[^@]+@[^@]+$')
    for reusable custom types with validation. Enforces constraints across tables.
  • Composite types:
    CREATE TYPE address AS (street TEXT, city TEXT, zip TEXT)
    for structured data within columns. Access with
    (col).field
    syntax.
  • JSONB: preferred over JSON; index with GIN. Use only for optional/semi-structured attrs. ONLY use JSON if the original ordering of the contents MUST be preserved.
  • Vector types:
    vector
    type by
    pgvector
    for vector similarity search for embeddings.
  • ID字段:优先选择
    BIGINT GENERATED ALWAYS AS IDENTITY
    GENERATED BY DEFAULT
    也可);在合并/联邦/分布式系统中使用或需要不透明ID时使用
    UUID
    。使用
    uuidv7()
    (PostgreSQL 18+版本优先)或
    gen_random_uuid()
    (旧版本PostgreSQL)生成UUID。
  • 整数类型:除非存储空间受限,否则优先使用
    BIGINT
    ;小范围数值使用
    INTEGER
    ;避免使用
    SMALLINT
    ,除非有明确限制。
  • 浮点类型:除非存储空间受限,否则优先使用
    DOUBLE PRECISION
    而非
    REAL
    。如需精确十进制运算,使用
    NUMERIC
  • 字符串类型:优先使用
    TEXT
    ;如需限制长度,使用
    CHECK (LENGTH(col) <= n)
    而非
    VARCHAR(n)
    ;避免使用
    CHAR(n)
    。二进制数据使用
    BYTEA
    。大字符串/二进制数据(超过默认阈值2KB)会自动存储在TOAST中并压缩。TOAST存储策略:
    PLAIN
    (不使用TOAST)、
    EXTENDED
    (压缩+离线存储)、
    EXTERNAL
    (离线存储、不压缩)、
    MAIN
    (压缩、尽可能在线存储)。默认的
    EXTENDED
    通常是最优选择。可通过
    ALTER TABLE tbl ALTER COLUMN col SET STORAGE strategy
    ALTER TABLE tbl SET (toast_tuple_target = 4096)
    调整阈值。大小写不敏感处理:针对区域/重音处理使用非确定性排序规则;针对纯ASCII,优先使用基于
    LOWER(col)
    的表达式索引(除非列需要大小写不敏感的主键/外键/唯一约束),或使用
    CITEXT
    类型。
  • 金额类型:使用
    NUMERIC(p,s)
    (绝不要用浮点类型)。
  • 时间类型:时间戳使用
    TIMESTAMPTZ
    ;仅日期使用
    DATE
    ;时长使用
    INTERVAL
    。避免使用
    TIMESTAMP
    (无时区)。事务开始时间使用
    now()
    ,当前系统时间使用
    clock_timestamp()
  • 布尔类型:使用
    BOOLEAN
    并添加
    NOT NULL
    约束,除非需要三态值。
  • 枚举类型:对于小型、稳定的集合(如美国各州、星期几),使用
    CREATE TYPE ... AS ENUM
    创建枚举。对于业务逻辑驱动且不断变化的值(如订单状态),使用TEXT(或INT)+ CHECK约束或查找表。
  • 数组类型
    TEXT[]
    INTEGER[]
    等。用于需要查询元素的有序列表。使用GIN索引支持包含(
    @>
    <@
    )和重叠(
    &&
    )查询。访问方式:
    arr[1]
    (从1开始索引)、
    arr[1:3]
    (切片)。适用于标签、分类;避免用于关系型数据——改用关联表。字面量语法:
    '{val1,val2}'
    ARRAY[val1,val2]
  • 范围类型
    daterange
    numrange
    tstzrange
    用于表示区间。支持重叠(
    &&
    )、包含(
    @>
    )等操作符。使用GiST索引。适用于调度、版本控制、数值范围。选择一种边界方案并保持一致;默认优先使用
    [)
    (左闭右开)。
  • 网络类型
    INET
    用于IP地址,
    CIDR
    用于网络范围,
    MACADDR
    用于MAC地址。支持网络操作符(
    <<
    >>
    &&
    )。
  • 几何类型
    POINT
    LINE
    POLYGON
    CIRCLE
    用于二维空间数据。使用GiST索引。如需高级空间功能,考虑使用PostGIS扩展。
  • 全文搜索
    TSVECTOR
    用于全文搜索文档,
    TSQUERY
    用于搜索查询。使用GIN索引
    tsvector
    字段。必须指定语言:
    to_tsvector('english', col)
    to_tsquery('english', 'query')
    。绝不要使用单参数版本。此规则适用于索引表达式和查询。
  • 域类型
    CREATE DOMAIN email AS TEXT CHECK (VALUE ~ '^[^@]+@[^@]+$')
    用于创建可复用的带验证规则的自定义类型,可跨表强制执行约束。
  • 复合类型
    CREATE TYPE address AS (street TEXT, city TEXT, zip TEXT)
    用于在列中存储结构化数据。通过
    (col).field
    语法访问字段。
  • JSONB:优先于JSON使用;使用GIN索引。仅用于可选/半结构化属性。仅当必须保留内容原始顺序时才使用JSON。
  • 向量类型:由
    pgvector
    扩展提供的
    vector
    类型,用于嵌入向量的相似度搜索。

Do not use the following data types

以下数据类型请勿使用

  • DO NOT use
    timestamp
    (without time zone); DO use
    timestamptz
    instead.
  • DO NOT use
    char(n)
    or
    varchar(n)
    ; DO use
    text
    instead.
  • DO NOT use
    money
    type; DO use
    numeric
    instead.
  • DO NOT use
    timetz
    type; DO use
    timestamptz
    instead.
  • DO NOT use
    timestamptz(0)
    or any other precision specification; DO use
    timestamptz
    instead
  • DO NOT use
    serial
    type; DO use
    generated always as identity
    instead.
  • 请勿使用
    timestamp
    (无时区);请改用
    timestamptz
  • 请勿使用
    char(n)
    varchar(n)
    ;请改用
    text
  • 请勿使用
    money
    类型;请改用
    numeric
  • 请勿使用
    timetz
    类型;请改用
    timestamptz
  • 请勿使用
    timestamptz(0)
    或任何其他精度指定;请直接使用
    timestamptz
  • 请勿使用
    serial
    类型;请改用
    generated always as identity

Table Types

表类型

  • Regular: default; fully durable, logged.
  • TEMPORARY: session-scoped, auto-dropped, not logged. Faster for scratch work.
  • UNLOGGED: persistent but not crash-safe. Faster writes; good for caches/staging.
  • 常规表:默认类型;完全持久化、已记录日志。
  • 临时表:会话级作用域,自动删除,不记录日志。适用于临时数据处理,速度更快。
  • 无日志表(UNLOGGED):持久化但不保证崩溃安全。写入速度更快;适用于缓存/临时 staging 数据。

Row-Level Security

行级安全

Enable with
ALTER TABLE tbl ENABLE ROW LEVEL SECURITY
. Create policies:
CREATE POLICY user_access ON orders FOR SELECT TO app_users USING (user_id = current_user_id())
. Built-in user-based access control at the row level.
通过
ALTER TABLE tbl ENABLE ROW LEVEL SECURITY
启用。创建策略:
CREATE POLICY user_access ON orders FOR SELECT TO app_users USING (user_id = current_user_id())
。实现基于用户的内置行级访问控制。

Constraints

约束

  • PK: implicit UNIQUE + NOT NULL; creates a B-tree index.
  • FK: specify
    ON DELETE/UPDATE
    action (
    CASCADE
    ,
    RESTRICT
    ,
    SET NULL
    ,
    SET DEFAULT
    ). Add explicit index on referencing column—speeds up joins and prevents locking issues on parent deletes/updates. Use
    DEFERRABLE INITIALLY DEFERRED
    for circular FK dependencies checked at transaction end.
  • UNIQUE: creates a B-tree index; allows multiple NULLs unless
    NULLS NOT DISTINCT
    (PG15+). Standard behavior:
    (1, NULL)
    and
    (1, NULL)
    are allowed. With
    NULLS NOT DISTINCT
    : only one
    (1, NULL)
    allowed. Prefer
    NULLS NOT DISTINCT
    unless you specifically need duplicate NULLs.
  • CHECK: row-local constraints; NULL values pass the check (three-valued logic). Example:
    CHECK (price > 0)
    allows NULL prices. Combine with
    NOT NULL
    to enforce:
    price NUMERIC NOT NULL CHECK (price > 0)
    .
  • EXCLUDE: prevents overlapping values using operators.
    EXCLUDE USING gist (room_id WITH =, booking_period WITH &&)
    prevents double-booking rooms. Requires appropriate index type (often GiST).
  • 主键(PK):隐式包含UNIQUE + NOT NULL约束;会创建B树索引。
  • 外键(FK):指定
    ON DELETE/UPDATE
    动作(
    CASCADE
    RESTRICT
    SET NULL
    SET DEFAULT
    )。为引用列添加显式索引——可加速连接并避免父表删除/更新时的锁定问题。对于循环外键依赖,使用
    DEFERRABLE INITIALLY DEFERRED
    在事务结束时检查约束。
  • 唯一约束(UNIQUE):创建B树索引;允许多个NULL值,除非使用
    NULLS NOT DISTINCT
    (PostgreSQL 15+)。标准行为:
    (1, NULL)
    (1, NULL)
    是允许的。使用
    NULLS NOT DISTINCT
    时:仅允许一个
    (1, NULL)
    。除非明确需要重复NULL值,否则优先使用
    NULLS NOT DISTINCT
  • 检查约束(CHECK):行级本地约束;NULL值会通过检查(三值逻辑)。示例:
    CHECK (price > 0)
    允许price为NULL。如需强制非空,需结合
    NOT NULL
    price NUMERIC NOT NULL CHECK (price > 0)
  • 排除约束(EXCLUDE):使用操作符防止值重叠。
    EXCLUDE USING gist (room_id WITH =, booking_period WITH &&)
    可防止房间重复预订。需要合适的索引类型(通常是GiST)。

Indexing

索引

  • B-tree: default for equality/range queries (
    =
    ,
    <
    ,
    >
    ,
    BETWEEN
    ,
    ORDER BY
    )
  • Composite: order matters—index used if equality on leftmost prefix (
    WHERE a = ? AND b > ?
    uses index on
    (a,b)
    , but
    WHERE b = ?
    does not). Put most selective/frequently filtered columns first.
  • Covering:
    CREATE INDEX ON tbl (id) INCLUDE (name, email)
    - includes non-key columns for index-only scans without visiting table.
  • Partial: for hot subsets (
    WHERE status = 'active'
    CREATE INDEX ON tbl (user_id) WHERE status = 'active'
    ). Any query with
    status = 'active'
    can use this index.
  • Expression: for computed search keys (
    CREATE INDEX ON tbl (LOWER(email))
    ). Expression must match exactly in WHERE clause:
    WHERE LOWER(email) = 'user@example.com'
    .
  • GIN: JSONB containment/existence, arrays (
    @>
    ,
    ?
    ), full-text search (
    @@
    )
  • GiST: ranges, geometry, exclusion constraints
  • BRIN: very large, naturally ordered data (time-series)—minimal storage overhead. Effective when row order on disk correlates with indexed column (insertion order or after
    CLUSTER
    ).
  • B树索引:默认用于等值/范围查询(
    =
    <
    >
    BETWEEN
    ORDER BY
  • 复合索引:列的顺序很重要——如果查询对最左侧前缀列使用等值条件(如
    WHERE a = ? AND b > ?
    ),则会使用
    (a,b)
    索引,但
    WHERE b = ?
    不会使用该索引。将选择性最高/过滤最频繁的列放在前面。
  • 覆盖索引
    CREATE INDEX ON tbl (id) INCLUDE (name, email)
    - 包含非键列,支持仅索引扫描,无需访问表数据。
  • 部分索引:针对热点子集(如
    WHERE status = 'active'
    CREATE INDEX ON tbl (user_id) WHERE status = 'active'
    )。任何包含
    status = 'active'
    的查询都可使用该索引。
  • 表达式索引:用于计算后的搜索键(如
    CREATE INDEX ON tbl (LOWER(email))
    )。WHERE子句中的表达式必须与索引完全匹配:
    WHERE LOWER(email) = 'user@example.com'
  • GIN索引:支持JSONB包含/存在查询、数组查询(
    @>
    ?
    )、全文搜索(
    @@
  • GiST索引:支持范围类型、几何类型、排除约束
  • BRIN索引:适用于超大规模、自然有序的数据(如时间序列)——存储开销极小。当磁盘上的行顺序与索引列相关时(插入顺序或执行
    CLUSTER
    后)效果最佳。

Partitioning

分区

  • Use for very large tables (>100M rows) where queries consistently filter on partition key (often time/date).
  • Alternate use: use for tables where data maintenance tasks dictates e.g. data pruned or bulk replaced periodically
  • RANGE: common for time-series (
    PARTITION BY RANGE (created_at)
    ). Create partitions:
    CREATE TABLE logs_2024_01 PARTITION OF logs FOR VALUES FROM ('2024-01-01') TO ('2024-02-01')
    . TimescaleDB automates time-based or ID-based partitioning with retention policies and compression.
  • LIST: for discrete values (
    PARTITION BY LIST (region)
    ). Example:
    FOR VALUES IN ('us-east', 'us-west')
    .
  • HASH: for even distribution when no natural key (
    PARTITION BY HASH (user_id)
    ). Creates N partitions with modulus.
  • Constraint exclusion: requires
    CHECK
    constraints on partitions for query planner to prune. Auto-created for declarative partitioning (PG10+).
  • Prefer declarative partitioning or hypertables. Do NOT use table inheritance.
  • Limitations: no global UNIQUE constraints—include partition key in PK/UNIQUE. FKs from partitioned tables not supported; use triggers.
  • 适用于超大型表(超过1亿行),且查询始终按分区键过滤(通常是时间/日期)。
  • 另一种场景:数据维护需求决定,例如定期清理或批量替换数据
  • 范围分区:常用于时间序列数据(
    PARTITION BY RANGE (created_at)
    )。创建分区:
    CREATE TABLE logs_2024_01 PARTITION OF logs FOR VALUES FROM ('2024-01-01') TO ('2024-02-01')
    TimescaleDB可自动完成基于时间或ID的分区,并提供保留策略和压缩功能。
  • 列表分区:用于离散值(
    PARTITION BY LIST (region)
    )。示例:
    FOR VALUES IN ('us-east', 'us-west')
  • 哈希分区:无自然键时用于均匀分布数据(
    PARTITION BY HASH (user_id)
    )。通过取模创建N个分区。
  • 约束排除:需要为分区添加
    CHECK
    约束,以便查询规划器进行分区裁剪。声明式分区(PostgreSQL 10+)会自动创建该约束。
  • 优先使用声明式分区或 hypertables。请勿使用表继承。
  • 局限性:无全局唯一约束——需将分区键包含在主键/唯一约束中。分区表的外键不被支持;需使用触发器实现。

Special Considerations

特殊场景考量

Update-Heavy Tables

更新频繁的表

  • Separate hot/cold columns—put frequently updated columns in separate table to minimize bloat.
  • Use
    fillfactor=90
    to leave space for HOT updates that avoid index maintenance.
  • Avoid updating indexed columns—prevents beneficial HOT updates.
  • Partition by update patterns—separate frequently updated rows in a different partition from stable data.
  • 分离冷热列——将频繁更新的列放在单独的表中,以减少数据膨胀。
  • 使用
    fillfactor=90
    ——为HOT更新预留空间,避免索引维护。
  • 避免更新带索引的列——会阻止有效的HOT更新。
  • 按更新模式分区——将频繁更新的行与稳定数据放在不同分区。

Insert-Heavy Workloads

插入密集型工作负载

  • Minimize indexes—only create what you query; every index slows inserts.
  • Use
    COPY
    or multi-row
    INSERT
    instead of single-row inserts.
  • UNLOGGED tables for rebuildable staging data—much faster writes.
  • Defer index creation for bulk loads—>drop index, load data, recreate indexes.
  • Partition by time/hash to distribute load. TimescaleDB automates partitioning and compression of insert-heavy data.
  • Use a natural key for primary key such as a (timestamp, device_id) if enforcing global uniqueness is important many insert-heavy tables don't need a primary key at all.
  • If you do need a surrogate key, Prefer
    BIGINT GENERATED ALWAYS AS IDENTITY
    over
    UUID
    .
  • 最小化索引数量——仅创建查询所需的索引;每个索引都会减慢插入速度。
  • **使用
    COPY
    或多行
    INSERT
    **替代单行插入。
  • **无日志表(UNLOGGED)**用于可重建的staging数据——写入速度更快。
  • 批量加载时延迟创建索引——先删除索引,加载数据,再重新创建索引。
  • 按时间/哈希分区以分散负载。TimescaleDB可自动完成插入密集型数据的分区和压缩。
  • 使用自然键作为主键,如(timestamp, device_id),如果需要强制全局唯一性;许多插入密集型表根本不需要主键。
  • 如果确实需要代理键,优先选择
    BIGINT GENERATED ALWAYS AS IDENTITY
    而非
    UUID

Upsert-Friendly Design

友好的Upsert设计

  • Requires UNIQUE index on conflict target columns—
    ON CONFLICT (col1, col2)
    needs exact matching unique index (partial indexes don't work).
  • Use
    EXCLUDED.column
    to reference would-be-inserted values; only update columns that actually changed to reduce write overhead.
  • DO NOTHING
    faster
    than
    DO UPDATE
    when no actual update needed.
  • 需要唯一索引在冲突目标列上——
    ON CONFLICT (col1, col2)
    需要完全匹配的唯一索引(部分索引无效)。
  • **使用
    EXCLUDED.column
    **引用待插入的值;仅更新实际变化的列,以减少写入开销。
  • DO NOTHING
    DO UPDATE
    更快
    当无需实际更新时。

Safe Schema Evolution

安全的架构演进

  • Transactional DDL: most DDL operations can run in transactions and be rolled back—
    BEGIN; ALTER TABLE...; ROLLBACK;
    for safe testing.
  • Concurrent index creation:
    CREATE INDEX CONCURRENTLY
    avoids blocking writes but can't run in transactions.
  • Volatile defaults cause rewrites: adding
    NOT NULL
    columns with volatile defaults (e.g.,
    now()
    ,
    gen_random_uuid()
    ) rewrites entire table. Non-volatile defaults are fast.
  • Drop constraints before columns:
    ALTER TABLE DROP CONSTRAINT
    then
    DROP COLUMN
    to avoid dependency issues.
  • Function signature changes:
    CREATE OR REPLACE
    with different arguments creates overloads, not replacements. DROP old version if no overload desired.
  • 事务性DDL:大多数DDL操作可在事务中执行并回滚——
    BEGIN; ALTER TABLE...; ROLLBACK;
    可用于安全测试。
  • 并发创建索引
    CREATE INDEX CONCURRENTLY
    不会阻塞写入,但不能在事务中执行。
  • 易变默认值会导致重写:添加带易变默认值(如
    now()
    gen_random_uuid()
    )的
    NOT NULL
    列会重写整个表。非易变默认值的操作速度更快。
  • 先删除约束再删除列
    ALTER TABLE DROP CONSTRAINT
    然后
    DROP COLUMN
    以避免依赖问题。
  • 函数签名变更
    CREATE OR REPLACE
    使用不同参数会创建重载,而非替换。如果不需要重载,请删除旧版本。

Generated Columns

生成列

  • ... GENERATED ALWAYS AS (<expr>) STORED
    for computed, indexable fields. PG18+ adds
    VIRTUAL
    columns (computed on read, not stored).
  • ... GENERATED ALWAYS AS (<expr>) STORED
    用于可索引的计算字段。PostgreSQL 18+新增
    VIRTUAL
    列(读取时计算,不存储)。

Extensions

扩展

  • pgcrypto
    :
    crypt()
    for password hashing.
  • uuid-ossp
    : alternative UUID functions; prefer
    pgcrypto
    for new projects.
  • pg_trgm
    : fuzzy text search with
    %
    operator,
    similarity()
    function. Index with GIN for
    LIKE '%pattern%'
    acceleration.
  • citext
    : case-insensitive text type. Prefer expression indexes on
    LOWER(col)
    unless you need case-insensitive constraints.
  • btree_gin
    /
    btree_gist
    : enable mixed-type indexes (e.g., GIN index on both JSONB and text columns).
  • hstore
    : key-value pairs; mostly superseded by JSONB but useful for simple string mappings.
  • timescaledb
    : essential for time-series—automated partitioning, retention, compression, continuous aggregates.
  • postgis
    : comprehensive geospatial support beyond basic geometric types—essential for location-based applications.
  • pgvector
    : vector similarity search for embeddings.
  • pgaudit
    : audit logging for all database activity.
  • pgcrypto
    crypt()
    用于密码哈希。
  • uuid-ossp
    :替代UUID函数;新项目优先使用
    pgcrypto
  • pg_trgm
    :支持模糊文本搜索,使用
    %
    操作符、
    similarity()
    函数。使用GIN索引加速
    LIKE '%pattern%'
    查询。
  • citext
    :大小写不敏感的文本类型。除非需要大小写不敏感的约束,否则优先使用基于
    LOWER(col)
    的表达式索引。
  • btree_gin
    /
    btree_gist
    :支持混合类型索引(如同时包含JSONB和文本列的GIN索引)。
  • hstore
    :键值对;大部分场景已被JSONB取代,但在简单字符串映射时仍有用。
  • timescaledb
    :时间序列数据必备——自动分区、数据保留、压缩、连续聚合。
  • postgis
    :提供超越基础几何类型的全面地理空间支持——位置应用必备。
  • pgvector
    :用于嵌入向量的相似度搜索。
  • pgaudit
    :记录所有数据库活动的审计日志。

JSONB Guidance

JSONB 指南

  • Prefer
    JSONB
    with GIN index.
  • Default:
    CREATE INDEX ON tbl USING GIN (jsonb_col);
    → accelerates:
    • Containment
      jsonb_col @> '{"k":"v"}'
    • Key existence
      jsonb_col ? 'k'
      , any/all keys
      ?\|
      ,
      ?&
    • Path containment on nested docs
    • Disjunction
      jsonb_col @> ANY(ARRAY['{"status":"active"}', '{"status":"pending"}'])
  • Heavy
    @>
    workloads: consider opclass
    jsonb_path_ops
    for smaller/faster containment-only indexes:
    • CREATE INDEX ON tbl USING GIN (jsonb_col jsonb_path_ops);
    • Trade-off: loses support for key existence (
      ?
      ,
      ?|
      ,
      ?&
      ) queries—only supports containment (
      @>
      )
  • Equality/range on a specific scalar field: extract and index with B-tree (generated column or expression):
    • ALTER TABLE tbl ADD COLUMN price INT GENERATED ALWAYS AS ((jsonb_col->>'price')::INT) STORED;
    • CREATE INDEX ON tbl (price);
    • Prefer queries like
      WHERE price BETWEEN 100 AND 500
      (uses B-tree) over
      WHERE (jsonb_col->>'price')::INT BETWEEN 100 AND 500
      without index.
  • Arrays inside JSONB: use GIN +
    @>
    for containment (e.g., tags). Consider
    jsonb_path_ops
    if only doing containment.
  • Keep core relations in tables; use JSONB for optional/variable attributes.
  • Use constraints to limit allowed JSONB values in a column e.g.
    config JSONB NOT NULL CHECK(jsonb_typeof(config) = 'object')
  • 优先使用
    JSONB
    并搭配GIN索引。
  • 默认索引:
    CREATE INDEX ON tbl USING GIN (jsonb_col);
    → 加速以下操作:
    • 包含查询
      jsonb_col @> '{"k":"v"}'
    • 键存在查询
      jsonb_col ? 'k'
      任意/所有键存在
      ?|
      ?&
    • 嵌套文档的路径包含查询
    • 析取查询
      jsonb_col @> ANY(ARRAY['{"status":"active"}', '{"status":"pending"}'])
  • 密集
    @>
    工作负载:考虑使用
    jsonb_path_ops
    操作符类创建更小/更快的仅包含查询索引:
    • CREATE INDEX ON tbl USING GIN (jsonb_col jsonb_path_ops);
    • 权衡:失去键存在(
      ?
      ?|
      ?&
      )查询支持——仅支持包含查询(
      @>
  • 特定标量字段的等值/范围查询:提取字段并使用B树索引(生成列或表达式):
    • ALTER TABLE tbl ADD COLUMN price INT GENERATED ALWAYS AS ((jsonb_col->>'price')::INT) STORED;
    • CREATE INDEX ON tbl (price);
    • 优先使用
      WHERE price BETWEEN 100 AND 500
      (使用B树索引)而非无索引的
      WHERE (jsonb_col->>'price')::INT BETWEEN 100 AND 500
  • JSONB内的数组:使用GIN +
    @>
    进行包含查询(如标签)。如果仅进行包含查询,考虑
    jsonb_path_ops
  • 核心关系数据放在表中;JSONB用于可选/可变属性。
  • 使用约束限制列中允许的JSONB值,例如
    config JSONB NOT NULL CHECK(jsonb_typeof(config) = 'object')

Examples

示例

Users

用户表

sql
CREATE TABLE users (
  user_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  email TEXT NOT NULL UNIQUE,
  name TEXT NOT NULL,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE UNIQUE INDEX ON users (LOWER(email));
CREATE INDEX ON users (created_at);
sql
CREATE TABLE users (
  user_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  email TEXT NOT NULL UNIQUE,
  name TEXT NOT NULL,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE UNIQUE INDEX ON users (LOWER(email));
CREATE INDEX ON users (created_at);

Orders

订单表

sql
CREATE TABLE orders (
  order_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  user_id BIGINT NOT NULL REFERENCES users(user_id),
  status TEXT NOT NULL DEFAULT 'PENDING' CHECK (status IN ('PENDING','PAID','CANCELED')),
  total NUMERIC(10,2) NOT NULL CHECK (total > 0),
  created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX ON orders (user_id);
CREATE INDEX ON orders (created_at);
sql
CREATE TABLE orders (
  order_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  user_id BIGINT NOT NULL REFERENCES users(user_id),
  status TEXT NOT NULL DEFAULT 'PENDING' CHECK (status IN ('PENDING','PAID','CANCELED')),
  total NUMERIC(10,2) NOT NULL CHECK (total > 0),
  created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX ON orders (user_id);
CREATE INDEX ON orders (created_at);

JSONB

JSONB示例

sql
CREATE TABLE profiles (
  user_id BIGINT PRIMARY KEY REFERENCES users(user_id),
  attrs JSONB NOT NULL DEFAULT '{}',
  theme TEXT GENERATED ALWAYS AS (attrs->>'theme') STORED
);
CREATE INDEX profiles_attrs_gin ON profiles USING GIN (attrs);
sql
CREATE TABLE profiles (
  user_id BIGINT PRIMARY KEY REFERENCES users(user_id),
  attrs JSONB NOT NULL DEFAULT '{}',
  theme TEXT GENERATED ALWAYS AS (attrs->>'theme') STORED
);
CREATE INDEX profiles_attrs_gin ON profiles USING GIN (attrs);