data-analytics
Data Analytics Diagram Generator
Quick Start: Define data sources → Declare ingestion/ETL icons → Connect to storage/warehouse → Add BI/visualization → Wrap in a `` ```plantuml `` fence.

⚠️ IMPORTANT: Always use a `` ```plantuml `` or `` ```puml `` code fence. NEVER use `` ```text ``; it will NOT render as a diagram.
Critical Rules
- Every diagram starts with `@startuml` and ends with `@enduml`
- Use `left to right direction` for data pipelines (Source → Ingest → Transform → Store → Visualize)
- Use `mxgraph.aws4.*` stencil syntax for analytics, database, and storage icons
- Default colors are applied automatically; you do NOT need to specify `fillColor` or `strokeColor`
- Use `rectangle "Zone" { ... }` or `package "Layer" { ... }` for grouping pipeline stages
- Directed flows use `-->`; async/streaming flows use `..>` (dashed)
Full stencil reference: See stencils/README.md for 9500+ available icons.
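
The grouping and arrow rules can be combined in one diagram. The following is a minimal sketch rather than one of the original examples: it reuses only the four stencils that appear in the Quick Example, and the zone names, labels, and aliases are hypothetical. It also assumes the renderer accepts stencil declarations nested inside `rectangle` and `package` blocks, as the grouping rule suggests.

```plantuml
@startuml
left to right direction

' Group the early pipeline stages in one zone
rectangle "Ingest & Transform" {
  mxgraph.aws4.s3 "Raw Zone\n(S3)" as raw
  mxgraph.aws4.glue "Glue\nETL" as etl
}

' Group the serving stages in a separate layer
package "Serving Layer" {
  mxgraph.aws4.redshift "Redshift" as dw
  mxgraph.aws4.quicksight "QuickSight" as bi
}

' Dashed arrow: asynchronous, event-driven trigger
raw ..> etl

' Solid arrows: directed batch flows
etl --> dw
dw --> bi
@enduml
```

No `fillColor` or `strokeColor` is set anywhere, since default colors are applied automatically.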
Mxgraph Stencil Syntax
```
mxgraph.aws4.<icon> "Label" as <alias>
```

Analytics & ETL Stencils
| Category | Stencils | Purpose |
|---|---|---|
| Query Engine | | Serverless SQL on S3 data |
| ETL | | Data integration & cataloging |
| Streaming | | Real-time data streaming |
| MapReduce | | Big data processing (Spark, Hive) |
| Data Warehouse | | Columnar analytics warehouse |
| Search | | Full-text search & log analytics |
| BI | | Dashboards & visualizations |
| Data Lake | | Governed data lake storage |
| Catalog | | Data governance & sharing |
| Streaming Kafka | | Managed Kafka streaming |
Database Stencils
| Category | Stencils | Purpose |
|---|---|---|
| Relational | | Transactional databases |
| NoSQL | | Key-value & document store |
| Graph | | Graph database |
| In-Memory | | Cache & session store |
| Document | | Document database |
| Ledger | | Immutable transaction log |
| Wide-Column | | Cassandra-compatible |
Connection Types
| Syntax | Meaning | Use Case |
|---|---|---|
| `A --> B` | Solid arrow | Batch data flow / API call |
| `A ..> B` | Dashed arrow | Streaming / async / CDC |
| `A -- B` | Solid line | Bidirectional sync |
| `A --> B : label` | Labeled connection | Describe data format or volume |
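
The four connection types in the table above can appear together in a single pipeline. This is a hedged sketch using standard PlantUML link syntax (`-->`, `..>`, `--`, and `: label`) with the stencils from the Quick Example; the bucket names, labels, and the replication scenario are hypothetical and only illustrate when each style might apply.

```plantuml
@startuml
left to right direction

mxgraph.aws4.s3 "Landing\n(S3)" as landing
mxgraph.aws4.s3 "Replica\n(S3)" as replica
mxgraph.aws4.glue "Glue\nETL" as glue
mxgraph.aws4.redshift "Redshift" as rs
mxgraph.aws4.quicksight "QuickSight" as qs

' Solid line: bidirectional sync between the two buckets
landing -- replica

' Dashed arrow with a label: asynchronous trigger on new objects
landing ..> glue : object-created events

' Labeled solid arrow: batch load, label describes the data format
glue --> rs : Parquet, nightly batch

' Plain solid arrow: direct query path
rs --> qs
@enduml
```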
Quick Example
```plantuml
@startuml
left to right direction
mxgraph.aws4.s3 "Data Lake\n(S3)" as s3
mxgraph.aws4.glue "Glue\nETL" as glue
mxgraph.aws4.redshift "Redshift" as rs
mxgraph.aws4.quicksight "QuickSight" as qs
s3 --> glue
glue --> rs
rs --> qs
@enduml
```

Data Analytics Architecture Types
| Type | Purpose | Key Stencils | Example |
|---|---|---|---|
| Data Lake | Centralized raw data store | | data-lake.md |
| Real-time Streaming | Event stream processing | | real-time-streaming.md |
| Data Warehouse | Star-schema analytics | | data-warehouse.md |
| ETL Pipeline | Extract-transform-load | | etl-pipeline.md |
| Log Analytics | Centralized logging | | log-analytics.md |
| ML Feature Store | Feature engineering pipeline | | ml-feature-pipeline.md |
| CDC Pipeline | Database change capture | | cdc-pipeline.md |
| Multi-source BI | Cross-database reporting | | multi-source-bi.md |
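
As an illustration of how one of these architecture types might be laid out, here is a rough sketch of the Multi-source BI row. It does not reproduce multi-source-bi.md. The stencil names `mxgraph.aws4.rds`, `mxgraph.aws4.dynamodb`, and `mxgraph.aws4.athena` are assumptions that this document does not confirm, so check them against stencils/README.md before use; the labels and aliases are hypothetical.

```plantuml
@startuml
left to right direction

' Two independent source databases (stencil names are assumptions)
rectangle "Sources" {
  mxgraph.aws4.rds "Orders\n(RDS)" as orders
  mxgraph.aws4.dynamodb "Sessions\n(DynamoDB)" as sessions
}

' Query engine that joins across both sources (assumed stencil name)
mxgraph.aws4.athena "Athena" as athena
mxgraph.aws4.quicksight "QuickSight" as qs

orders --> athena : federated query
sessions --> athena : federated query
athena --> qs
@enduml
```

Grouping the sources in one `rectangle` keeps the cross-database fan-in visible at a glance, while the reporting path stays a single left-to-right flow.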