PostgreSQL Best Practices

Core Principles

  • Leverage PostgreSQL's advanced features for robust data modeling
  • Optimize queries using EXPLAIN ANALYZE and proper indexing strategies
  • Use native PostgreSQL data types appropriately
  • Implement proper connection pooling and resource management
  • Follow PostgreSQL-specific security best practices

Schema Design

Data Types

  • Use appropriate native types: `UUID`, `JSONB`, `ARRAY`, `INET`, `CIDR`
  • Prefer `TIMESTAMPTZ` over `TIMESTAMP` for timezone-aware applications
  • Use `TEXT` instead of `VARCHAR` when no length limit is needed
  • Consider `NUMERIC` for precise decimal calculations (financial data)
  • Use `SERIAL` or `BIGSERIAL` for auto-incrementing IDs, or `UUID` for distributed systems
```sql
CREATE TABLE orders (
    order_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    customer_id UUID NOT NULL REFERENCES customers(customer_id),
    order_data JSONB NOT NULL DEFAULT '{}',
    tags TEXT[] DEFAULT '{}',
    total_amount NUMERIC(12, 2) NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
```

Table Design

  • Always define primary keys
  • Use foreign keys with appropriate ON DELETE/UPDATE actions
  • Add NOT NULL constraints where appropriate
  • Use CHECK constraints for data validation
  • Consider partitioning for large tables
```sql
CREATE TABLE products (
    product_id SERIAL PRIMARY KEY,
    sku VARCHAR(50) NOT NULL UNIQUE,
    name TEXT NOT NULL,
    price NUMERIC(10, 2) NOT NULL CHECK (price >= 0),
    status VARCHAR(20) NOT NULL DEFAULT 'active'
        CHECK (status IN ('active', 'inactive', 'discontinued')),
    metadata JSONB DEFAULT '{}'
);
```

Partitioning

  • Use declarative partitioning for large tables (millions of rows)
  • Choose appropriate partition strategy: RANGE, LIST, or HASH
  • Create indexes on partitioned tables after partitioning
```sql
CREATE TABLE events (
    event_id BIGSERIAL,
    event_type VARCHAR(50) NOT NULL,
    payload JSONB,
    created_at TIMESTAMPTZ NOT NULL
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2024_q1 PARTITION OF events
    FOR VALUES FROM ('2024-01-01') TO ('2024-04-01');
```
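Following the third bullet, indexes are then created on the partitioned parent; a minimal sketch (index and partition names are illustrative):

```sql
-- An index on the parent cascades to all partitions (PostgreSQL 11+)
CREATE INDEX idx_events_created_at ON events (created_at);

-- A default partition catches rows that match no defined range
CREATE TABLE events_default PARTITION OF events DEFAULT;
```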

Indexing Strategies

Index Types

  • Use B-tree indexes (default) for equality and range queries
  • Use GIN indexes for JSONB, arrays, and full-text search
  • Use GiST indexes for geometric data and range types
  • Use BRIN indexes for large, naturally ordered data
  • Consider partial indexes for filtered queries
```sql
-- B-tree index for common lookups
CREATE INDEX idx_orders_customer ON orders(customer_id);

-- GIN index for JSONB queries
CREATE INDEX idx_orders_data ON orders USING GIN (order_data);

-- Partial index for active records only
CREATE INDEX idx_active_products ON products(name) WHERE status = 'active';

-- Covering index to avoid table lookup
CREATE INDEX idx_orders_covering ON orders(customer_id)
    INCLUDE (order_date, total_amount);
```

Index Maintenance

  • Regularly run ANALYZE to update statistics
  • Use REINDEX for bloated indexes
  • Monitor index usage with `pg_stat_user_indexes`
  • Remove unused indexes to reduce write overhead
```sql
-- Check index usage
SELECT schemaname, relname, indexrelname, idx_scan, idx_tup_read
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC;
```
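A sketch of the REINDEX bullet (the index name is illustrative; `CONCURRENTLY` requires PostgreSQL 12+):

```sql
-- Rebuild a bloated index without blocking concurrent writes (PostgreSQL 12+)
REINDEX INDEX CONCURRENTLY idx_orders_customer;

-- Or rebuild every index on a table
REINDEX TABLE CONCURRENTLY orders;
```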

Query Optimization

EXPLAIN ANALYZE

  • Always analyze query plans for slow queries
  • Look for sequential scans on large tables
  • Identify missing indexes from query plans
  • Watch for high row estimates vs actual rows
```sql
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT c.name, COUNT(o.order_id)
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
WHERE c.created_at > '2024-01-01'
GROUP BY c.customer_id, c.name;
```

Common Table Expressions (CTEs)

  • Use CTEs for complex query organization
  • Note: CTEs are optimization fences in older PostgreSQL versions
  • Use `MATERIALIZED` / `NOT MATERIALIZED` hints in PostgreSQL 12+
```sql
WITH recent_orders AS MATERIALIZED (
    SELECT customer_id, COUNT(*) AS order_count, SUM(total) AS total_spent
    FROM orders
    WHERE order_date > CURRENT_DATE - INTERVAL '30 days'
    GROUP BY customer_id
)
SELECT c.name, ro.order_count, ro.total_spent
FROM customers c
JOIN recent_orders ro ON c.customer_id = ro.customer_id
WHERE ro.total_spent > 1000;
```

Window Functions

  • Use window functions for analytics queries
  • Leverage PARTITION BY and ORDER BY for complex calculations
```sql
SELECT
    order_id,
    customer_id,
    total_amount,
    SUM(total_amount) OVER (PARTITION BY customer_id ORDER BY order_date) AS running_total,
    ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date DESC) AS order_rank
FROM orders;
```

JSONB Best Practices

  • Use JSONB over JSON for better performance and indexing
  • Create GIN indexes for JSONB columns you query
  • Use containment operators (@>, <@) for efficient queries
  • Extract frequently queried fields to regular columns
```sql
-- Efficient JSONB query with GIN index
SELECT * FROM products
WHERE metadata @> '{"category": "electronics"}';

-- Extract specific fields
SELECT
    product_id,
    metadata->>'brand' AS brand,
    (metadata->>'rating')::numeric AS rating
FROM products
WHERE metadata ? 'rating';
```

Connection Management

Connection Pooling

  • Use PgBouncer or pgpool-II for connection pooling
  • Set appropriate pool sizes based on workload
  • Use transaction pooling mode for short-lived connections
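Pool sizing is easier with visibility into what connections are actually doing; a minimal server-side sketch:

```sql
-- Count connections by state to inform pool sizing
SELECT state, count(*)
FROM pg_stat_activity
WHERE datname = current_database()
GROUP BY state;

-- The hard ceiling the pool must stay under
SHOW max_connections;
```

Many idle connections relative to active ones usually indicate the application-side pool is oversized.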

Connection Settings

```sql
-- Recommended session settings
SET statement_timeout = '30s';
SET lock_timeout = '10s';
SET idle_in_transaction_session_timeout = '60s';
```

Transactions and Locking

  • Use appropriate transaction isolation levels
  • Keep transactions short to reduce lock contention
  • Use advisory locks for application-level locking
  • Monitor and resolve lock conflicts
```sql
-- Use advisory locks for application coordination
SELECT pg_advisory_lock(hashtext('resource_name'));
-- Do work
SELECT pg_advisory_unlock(hashtext('resource_name'));

-- Check for blocking queries
SELECT blocked_locks.pid AS blocked_pid,
       blocking_locks.pid AS blocking_pid,
       blocked_activity.query AS blocked_query
FROM pg_catalog.pg_locks blocked_locks
JOIN pg_catalog.pg_locks blocking_locks
    ON blocking_locks.locktype = blocked_locks.locktype
    AND blocking_locks.relation = blocked_locks.relation
    AND blocking_locks.pid != blocked_locks.pid
JOIN pg_catalog.pg_stat_activity blocked_activity
    ON blocked_activity.pid = blocked_locks.pid
WHERE NOT blocked_locks.granted;
```
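For the first bullet, the isolation level can be raised per transaction; a sketch using `REPEATABLE READ` (one choice among `READ COMMITTED`, `REPEATABLE READ`, and `SERIALIZABLE`):

```sql
BEGIN ISOLATION LEVEL REPEATABLE READ;
-- All reads in this transaction see one consistent snapshot
SELECT count(*) FROM orders;
COMMIT;
```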

Maintenance

Vacuum and Analyze

  • Enable autovacuum and tune for your workload
  • Run manual VACUUM ANALYZE after bulk operations
  • Monitor table bloat
```sql
-- Check table bloat
SELECT schemaname, relname,
       n_live_tup, n_dead_tup,
       round(n_dead_tup * 100.0 / nullif(n_live_tup + n_dead_tup, 0), 2) AS dead_pct
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000
ORDER BY n_dead_tup DESC;
```
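Autovacuum can also be tuned per table for hot tables; the thresholds below are illustrative starting points, not universal values:

```sql
-- Vacuum the orders table more aggressively than the global default
ALTER TABLE orders SET (
    autovacuum_vacuum_scale_factor = 0.05,
    autovacuum_analyze_scale_factor = 0.02
);

-- Manual pass after a bulk load
VACUUM (ANALYZE, VERBOSE) orders;
```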

Backup Strategies

  • Use pg_dump for logical backups
  • Use pg_basebackup for physical backups
  • Implement point-in-time recovery (PITR) with WAL archiving
  • Test backup restoration regularly
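When PITR is configured, archiving health can be verified from SQL; a sketch using the `pg_stat_archiver` view (PostgreSQL 9.4+):

```sql
-- Confirm WAL archiving is succeeding and not accumulating failures
SELECT archived_count, failed_count, last_archived_wal, last_archived_time
FROM pg_stat_archiver;
```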

Security

  • Use SSL/TLS for connections
  • Implement row-level security (RLS) for multi-tenant applications
  • Use roles and GRANT/REVOKE for access control
  • Audit sensitive operations with pgAudit extension
```sql
-- Enable row-level security
ALTER TABLE documents ENABLE ROW LEVEL SECURITY;

CREATE POLICY documents_tenant_policy ON documents
    FOR ALL
    USING (tenant_id = current_setting('app.tenant_id')::uuid);

-- Grant minimal privileges
GRANT SELECT, INSERT, UPDATE ON orders TO app_user;
GRANT USAGE ON SEQUENCE orders_order_id_seq TO app_user;
```

Monitoring

  • Monitor with pg_stat_statements extension
  • Track slow queries and optimize regularly
  • Set up alerts for replication lag, connection count, and disk usage
  • Use pg_stat_activity to monitor active queries
```sql
-- Enable pg_stat_statements
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Find slow queries
SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
```
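The `pg_stat_activity` bullet can be put into practice with a query like the following (the five-minute threshold is an arbitrary example):

```sql
-- Active queries running longer than five minutes
SELECT pid, now() - query_start AS runtime, state, left(query, 80) AS query
FROM pg_stat_activity
WHERE state = 'active'
  AND now() - query_start > interval '5 minutes'
ORDER BY runtime DESC;
```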