powerbi-core
Compare original and translation side by side
🇺🇸
Original
English🇨🇳
Translation
ChinesePower BI Core Concepts and Data Modeling
Power BI核心概念与数据建模
Overview
概述
Core Power BI knowledge covering data modeling best practices, connectivity modes, source types, relationships, and common pitfalls. This skill provides the foundational architecture guidance every Power BI developer needs.
涵盖Power BI数据建模最佳实践、连接模式、数据源类型、关系配置及常见陷阱的核心知识。本内容为每位Power BI开发者提供必备的架构指导。
Data Model Design - Star Schema
数据模型设计 - 星型架构
Always design data models using star schema topology:
| Component | Purpose | Example |
|---|---|---|
| Fact table | Numeric events/transactions | Sales, Orders, WebVisits |
| Dimension table | Descriptive attributes | Date, Product, Customer, Geography |
| Bridge table | Many-to-many resolution | StudentCourse, OrderProduct |
Mandatory rules:
- One fact table at the center, dimensions radiating out
- Relationships flow from dimension (one side) to fact (many side)
- Use surrogate integer keys, not natural/business keys
- Keep fact tables narrow (keys + measures only)
- Denormalize dimensions (flatten snowflake into star)
- Create a dedicated Date dimension table (disable auto date/time)
- Never use bidirectional cross-filtering unless absolutely required and contained
始终采用星型架构拓扑设计数据模型:
| 组件 | 用途 | 示例 |
|---|---|---|
| 事实表 | 存储数值型事件/交易数据 | 销售、订单、网站访问量 |
| 维度表 | 存储描述性属性数据 | 日期、产品、客户、地域 |
| 桥接表 | 解决多对多关系问题 | 学生课程、订单产品 |
强制规则:
- 中心为一张事实表,维度表向外辐射
- 关系从维度表(一方)流向事实表(多方)
- 使用代理整数键,而非自然/业务键
- 保持事实表精简(仅包含键与度量值)
- 对维度表进行反规范化(将雪花架构扁平化为星型架构)
- 创建专用的日期维度表(禁用自动日期/时间功能)
- 除非绝对必要且可控,否则切勿使用双向交叉筛选
Storage Modes
存储模式
| Mode | Data Location | Refresh | Performance | Use When |
|---|---|---|---|---|
| Import | In-memory VertiPaq | Scheduled/on-demand | Fastest queries | Default choice, data under 1GB compressed |
| DirectQuery | Source database | Real-time | Depends on source | Real-time needed, data too large for import |
| Dual | Both | Scheduled + real-time | Best of both | Dimension tables in composite models |
| Direct Lake | OneLake delta tables | Framing (seconds) | Near-import speed | Fabric lakehouse/warehouse scenarios |
Import mode considerations:
- 1GB PBIX file size limit (10GB for Premium/PPU in service)
- Data is a snapshot at refresh time; not real-time
- Scheduled refresh limit: 8/day (Pro), 48/day (Premium/PPU)
DirectQuery limitations:
- No Power Query transformations applied at query time
- Single source per model (unless composite)
- Performance depends entirely on source query speed
- Many DAX functions unavailable or degraded
- No calculated columns on DirectQuery tables
- Row limit of 1 million rows per visual query
Direct Lake key considerations (2025-2026 GA):
- Two variants: Direct Lake on OneLake (DL/OL) and Direct Lake on SQL endpoints (DL/SQL)
- DL/OL does NOT fall back to DirectQuery -- queries fail if data cannot be served
- DL/SQL CAN fall back to DirectQuery via SQL analytics endpoint
- Guardrails vary by capacity: F32 allows up to 1,000 files/row groups per table; F64/P1 allows up to 5,000
- Max Memory is a soft limit for paging, not a hard guardrail -- excess paging hurts performance
- Max model size on disk/OneLake IS a hard guardrail -- exceeding causes DQ fallback (DL/SQL) or failure (DL/OL)
- Full DAX support including calculated columns
- Framing (metadata-only refresh) completes in seconds
- Power BI Embedded with Direct Lake mode is GA since March 2025
Choosing storage mode decision tree:
- Data in Fabric OneLake delta tables? Use Direct Lake
- Need real-time data, source is fast? Use DirectQuery
- Data under 1GB, can tolerate refresh lag? Use Import (best performance)
- Large data + need fast queries? Use composite model (Import dimensions + DQ facts + aggregation tables)
| 模式 | 数据位置 | 刷新方式 | 性能表现 | 适用场景 |
|---|---|---|---|---|
| Import | 内存中的VertiPaq引擎 | 定时/按需刷新 | 查询速度最快 | 默认选择,压缩后数据量小于1GB |
| DirectQuery | 源数据库 | 实时 | 取决于数据源性能 | 需要实时数据,或数据量过大无法导入 |
| Dual | 同时存储于内存和源数据库 | 定时+实时 | 兼顾两者优势 | 复合模型中的维度表 |
| Direct Lake | OneLake中的delta表 | 框架刷新(秒级) | 接近Import模式的速度 | Fabric湖仓场景 |
Import模式注意事项:
- PBIX文件大小限制为1GB(Premium/PPU服务中为10GB)
- 数据为刷新时的快照,非实时数据
- 定时刷新次数限制:Pro版每天8次,Premium/PPU版每天48次
DirectQuery限制:
- 查询时不应用Power Query转换
- 每个模型仅支持单一数据源(复合模型除外)
- 性能完全取决于源查询速度
- 许多DAX函数不可用或性能下降
- 无法在DirectQuery表上创建计算列
- 每个可视化查询的行限制为100万条
Direct Lake关键注意事项(2025-2026正式发布):
- 两种变体:基于OneLake的Direct Lake(DL/OL)和基于SQL端点的Direct Lake(DL/SQL)
- DL/OL不会回退到DirectQuery——若无法提供数据,查询将失败
- DL/SQL可通过SQL分析端点回退到DirectQuery
- 限制因容量而异:F32允许每张表最多1000个文件/行组;F64/P1允许最多5000个
- 最大内存为分页软限制,而非硬性约束——过度分页会影响性能
- 磁盘/OneLake上的最大模型大小为硬性约束——超出限制将导致DL/SQL回退到DirectQuery,或DL/OL查询失败
- 支持完整DAX功能,包括计算列
- 框架刷新(仅元数据刷新)可在数秒内完成
- 支持Direct Lake模式的Power BI Embedded自2025年3月起正式发布
存储模式选择决策树:
- 数据存储在Fabric OneLake的delta表中?使用Direct Lake
- 需要实时数据且数据源速度较快?使用DirectQuery
- 数据量小于1GB,可接受刷新延迟?使用Import(性能最佳)
- 大数据量+需要快速查询?使用复合模型(Import维度表 + DQ事实表 + 聚合表)
Relationships
关系配置
| Property | Options | Default |
|---|---|---|
| Cardinality | One-to-many, Many-to-one, One-to-one, Many-to-many | One-to-many |
| Cross-filter direction | Single, Both | Single |
| Active | Yes/No | Yes (only one active per path) |
Relationship rules:
- Only one active relationship between any two tables
- Use USERELATIONSHIP() in DAX to activate inactive relationships
- Avoid bidirectional filtering -- it causes ambiguous filter paths, performance degradation, and unexpected results
- Many-to-many requires a bridge table or composite model many-to-many cardinality
- Referential integrity: set "Assume Referential Integrity" for DirectQuery performance
| 属性 | 选项 | 默认值 |
|---|---|---|
| 基数 | 一对多、多对一、一对一、多对多 | 一对多 |
| 交叉筛选方向 | 单一、双向 | 单一 |
| 激活状态 | 是/否 | 是(每条路径仅一个激活关系) |
关系规则:
- 任意两张表之间仅能有一个激活关系
- 在DAX中使用USERELATIONSHIP()函数激活非激活关系
- 避免双向筛选——会导致筛选路径模糊、性能下降及结果异常
- 多对多关系需要桥接表或复合模型中的多对多基数设置
- 引用完整性:为DirectQuery设置“假设引用完整性”以提升性能
Data Sources Quick Reference
数据源速查
| Category | Sources |
|---|---|
| Microsoft SQL | SQL Server, Azure SQL, Azure Synapse, SQL Server Analysis Services |
| Azure | Cosmos DB, Data Explorer (Kusto), Blob Storage, Data Lake, Fabric Lakehouse/Warehouse |
| Cloud Databases | Snowflake, Databricks, Google BigQuery, Amazon Redshift, Amazon Athena |
| Files | Excel, CSV/TSV, JSON, XML, Parquet, PDF |
| Services | SharePoint, Dynamics 365, Salesforce, Google Analytics, Azure DevOps |
| Protocols | OData, REST API, ODBC, OLEDB |
| Streaming | Azure Stream Analytics, PubNub, REST API push |
| 类别 | 数据源 |
|---|---|
| Microsoft SQL | SQL Server、Azure SQL、Azure Synapse、SQL Server Analysis Services |
| Azure | Cosmos DB、Data Explorer(Kusto)、Blob存储、数据湖、Fabric湖仓 |
| 云数据库 | Snowflake、Databricks、Google BigQuery、Amazon Redshift、Amazon Athena |
| 文件 | Excel、CSV/TSV、JSON、XML、Parquet、PDF |
| 服务 | SharePoint、Dynamics 365、Salesforce、Google Analytics、Azure DevOps |
| 协议 | OData、REST API、ODBC、OLEDB |
| 流式数据 | Azure Stream Analytics、PubNub、REST API推送 |
Incremental Refresh
增量刷新
Configure incremental refresh for large Import tables to avoid full refresh:
- Create and
RangeStartparameters (type DateTime) in Power QueryRangeEnd - Apply filter on the date column using these parameters
- Configure refresh policy: archive period (e.g., 3 years), incremental period (e.g., 30 days)
- Optionally enable "detect data changes" with a last-modified column
- Optionally enable real-time data with DirectQuery for the latest partition
Requirements: Premium, PPU, or Fabric capacity for more than basic incremental refresh. Pro workspaces support incremental refresh but with limitations.
2025-2026 improvements:
- Semantic models with incremental refresh can now be edited directly in Power BI Service (change calculated columns, rename tables, adjust hierarchies) without reopening Desktop
- Enhanced refresh API supports selective partition refresh for finer control
- Improved performance for terabyte-scale datasets with faster partition processing
为大型Import表配置增量刷新以避免全量刷新:
- 在Power Query中创建和
RangeStart参数(类型为DateTime)RangeEnd - 使用这些参数对日期列应用筛选
- 配置刷新策略:归档周期(如3年)、增量周期(如30天)
- 可选:启用“检测数据变化”并指定最后修改列
- 可选:为最新分区启用DirectQuery以支持实时数据
要求: 高级版(Premium)、PPU或Fabric容量可支持更复杂的增量刷新。Pro版工作区支持增量刷新但存在限制。
2025-2026年改进:
- 支持增量刷新的语义模型现在可直接在Power BI Service中编辑(修改计算列、重命名表、调整层次结构),无需重新打开Desktop
- 增强的刷新API支持选择性分区刷新,实现更精细的控制
- 优化了TB级数据集的性能,分区处理速度更快
Gateway Configuration
网关配置
On-premises data gateway bridges on-premises sources to Power BI Service:
| Gateway Type | Use Case |
|---|---|
| Standard (enterprise) | Shared by multiple users, centrally managed |
| Personal | Single user, development/testing only |
| Virtual Network (VNet) | Azure VNet-connected sources, no on-prem hardware |
VNet data gateway (2025-2026):
- Connects to Azure data sources within a VNet without on-premises hardware
- Managed by Fabric/Power BI Service, no gateway machine maintenance
- Supports Azure SQL, Synapse, Azure Data Explorer, and other VNet-bound services
- Enable in Fabric Admin portal under gateway management
Gateway releases (2025-2026):
- Monthly releases throughout 2025-2026 with enhanced caching and query folding
- Improved query performance through optimized connection pooling
- 64-bit only for Power BI Desktop for Report Server starting September 2025
Common gateway failures:
- Credentials expired -- update in gateway settings
- Source unreachable -- check firewall, VPN, DNS
- Memory exhaustion -- monitor gateway machine resources
- Mashup engine crash -- check Power Query complexity
本地数据网关用于将本地数据源连接到Power BI Service:
| 网关类型 | 适用场景 |
|---|---|
| 标准版(企业级) | 多用户共享,集中管理 |
| 个人版 | 单用户使用,仅用于开发/测试 |
| 虚拟网络(VNet) | 连接Azure VNet内的数据源,无需本地硬件 |
VNet数据网关(2025-2026):
- 无需本地硬件即可连接Azure VNet内的数据源
- 由Fabric/Power BI Service管理,无需维护网关机器
- 支持Azure SQL、Synapse、Azure Data Explorer及其他VNet绑定服务
- 在Fabric管理门户的网关管理中启用
网关更新(2025-2026):
- 2025-2026年每月发布更新,增强缓存和查询折叠功能
- 通过优化连接池提升查询性能
- 自2025年9月起,Report Server版Power BI Desktop仅支持64位
常见网关故障:
- 凭据过期——在网关设置中更新
- 数据源不可达——检查防火墙、VPN、DNS
- 内存耗尽——监控网关机器资源
- Mashup引擎崩溃——检查Power Query复杂度
Data Source Authentication
数据源认证
| Method | Use Case | Best For |
|---|---|---|
| OAuth2 | Cloud sources (Azure SQL, Snowflake, Databricks) | Interactive use, SSO |
| Service Principal | Automated refresh, CI/CD pipelines | Unattended operations |
| Workspace Identity | Fabric workspaces (no secret to manage) | Fabric-native models |
| Managed Identity | Dataflows Gen2 to Azure sources | Zero-secret PaaS access |
| Username/Password | Legacy on-prem sources | Gateway-bound sources |
Workspace Identity (2025-2026):
- Tied to a Fabric workspace, similar to Azure Managed Identity
- No expiration, no secret or password to manage
- Configure in workspace settings, assign to semantic model data sources
- Preferred over service principal for Fabric-native scenarios
OAuth2 token limitation: When set via REST API (not UI), OAuth2 credentials lack a refresh token and expire after 1 hour. Use service principal for long-running automation.
Connection pooling best practices:
- Gateway reuses connections where possible -- minimize distinct credential sets
- Set query timeout in data source settings (default 5 min, increase for complex queries)
- Implement retry logic in Power Query for transient source failures using
try/otherwise
| 方法 | 适用场景 | 最佳用途 |
|---|---|---|
| OAuth2 | 云数据源(Azure SQL、Snowflake、Databricks) | 交互式使用、单点登录(SSO) |
| 服务主体 | 自动刷新、CI/CD流水线 | 无人值守操作 |
| 工作区身份 | Fabric工作区(无需管理密钥) | Fabric原生模型 |
| 托管身份 | Dataflow Gen2连接Azure数据源 | 零密钥PaaS访问 |
| 用户名/密码 | 传统本地数据源 | 网关绑定的数据源 |
工作区身份(2025-2026):
- 与Fabric工作区绑定,类似Azure托管身份
- 无过期时间,无需管理密钥或密码
- 在工作区设置中配置,分配给语义模型数据源
- 对于Fabric原生场景,优先于服务主体使用
OAuth2令牌限制: 通过REST API(而非UI)设置时,OAuth2凭据缺少刷新令牌,1小时后过期。长期自动化任务请使用服务主体。
连接池最佳实践:
- 网关会尽可能重用连接——尽量减少不同凭据集的数量
- 在数据源设置中配置查询超时时间(默认5分钟,复杂查询可延长)
- 在Power Query中使用实现重试逻辑,处理数据源临时故障
try/otherwise
Common Gotchas and Anti-Patterns
常见问题与反模式
| Pitfall | Impact | Fix |
|---|---|---|
| Auto date/time enabled | Hidden date tables bloat model (one per date column) | Disable in Options > Data Load |
| Implicit measures (drag numeric to visual) | No control over aggregation, no reuse | Create explicit DAX measures |
| Bidirectional cross-filter | Ambiguity, performance degradation, wrong results | Use single-direction, handle in DAX |
| Too many columns in fact tables | Bloated model, slow refresh, wasted memory | Keep facts narrow: keys + numeric values |
| BLANK vs 0 vs null confusion | DAX treats BLANK differently from 0; visuals hide BLANK rows | Use IF/COALESCE to handle explicitly |
| Circular dependency errors | Usually from calculated columns referencing each other or bidirectional filters | Restructure model, break the cycle |
| 1GB PBIX limit | Cannot save file locally | Remove unused columns, optimize cardinality |
| Power BI Service vs Desktop gap | Some features only available in one or the other | Check feature matrix before designing |
| Calculated columns vs measures | Calculated columns consume memory, stored per row | Prefer measures (computed at query time) |
| String columns in fact tables | High cardinality strings destroy VertiPaq compression | Move to dimension table, use key reference |
| 陷阱 | 影响 | 解决方案 |
|---|---|---|
| 启用自动日期/时间 | 隐藏的日期表会膨胀模型(每个日期列对应一个) | 在选项>数据加载中禁用 |
| 隐式度量值(将数值拖到可视化组件) | 无法控制聚合方式,无法重用 | 创建显式DAX度量值 |
| 双向交叉筛选 | 模糊性、性能下降、结果错误 | 使用单向筛选,在DAX中处理需求 |
| 事实表包含过多列 | 模型膨胀、刷新缓慢、内存浪费 | 精简事实表:仅保留键与数值型字段 |
| BLANK、0与null混淆 | DAX对BLANK的处理与0不同;可视化组件会隐藏BLANK行 | 使用IF/COALESCE显式处理 |
| 循环依赖错误 | 通常由计算列相互引用或双向筛选导致 | 重构模型,打破循环 |
| 1GB PBIX限制 | 无法本地保存文件 | 删除未使用的列,优化基数 |
| Power BI Service与Desktop功能差异 | 部分功能仅在其中一端可用 | 设计前查看功能矩阵 |
| 计算列vs度量值 | 计算列占用内存,按行存储 | 优先使用度量值(查询时计算) |
| 事实表包含字符串列 | 高基数字符串会破坏VertiPaq压缩 | 移至维度表,使用键引用 |
Additional Resources
额外资源
Reference Files
参考文件
- -- Detailed connector configuration for all source types
references/data-sources-detail.md - -- Extended pitfall analysis with examples and resolution patterns
references/gotchas-deep-dive.md
- -- 所有数据源类型的详细连接器配置说明
references/data-sources-detail.md - -- 扩展的陷阱分析,包含示例和解决模式
references/gotchas-deep-dive.md