data-analytics

Data Analytics Diagram Generator

Quick Start: Define data sources → Declare ingestion/ETL icons → Connect to storage/warehouse → Add BI/visualization → Wrap in a `` ```plantuml `` fence.
⚠️ IMPORTANT: Always use a `` ```plantuml `` or `` ```puml `` code fence. NEVER use `` ```text ``; it will NOT render as a diagram.

Critical Rules

  • Every diagram starts with `@startuml` and ends with `@enduml`.
  • Use `left to right direction` for data pipelines (Source → Ingest → Transform → Store → Visualize).
  • Use the `mxgraph.aws4.*` stencil syntax for analytics, database, and storage icons.
  • Default colors are applied automatically; you do NOT need to specify `fillColor` or `strokeColor`.
  • Use `rectangle "Zone" { ... }` or `package "Layer" { ... }` to group pipeline stages.
  • Directed flows use `-->`; async/streaming flows use `..>` (dashed).

Full stencil reference: see `stencils/README.md` for 9500+ available icons.
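
The rules above can be combined into a skeleton like the one below. This is a sketch and not a canonical template. The zone names, labels, and aliases are illustrative. It also assumes the generator accepts stencil declarations nested inside `rectangle` and `package` blocks, as the grouping rule implies. No colors are set because the defaults apply automatically.

```plantuml
@startuml
left to right direction

' Zones group pipeline stages (Ingest -> Store -> Visualize)
rectangle "Ingest" {
  mxgraph.aws4.kinesis_data_firehose "Firehose" as fh
}
rectangle "Store" {
  mxgraph.aws4.s3 "Raw Zone\n(S3)" as raw
  mxgraph.aws4.glue_data_catalog "Data Catalog" as cat
}
package "Visualize" {
  mxgraph.aws4.athena "Athena" as ath
  mxgraph.aws4.quicksight "QuickSight" as qs
}

' Streaming delivery uses the dashed arrow
fh ..> raw
' Batch/query flows use the solid arrow
raw --> cat
cat --> ath
ath --> qs
@enduml
```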
  • 所有图表都以
    @startuml
    开头,以
    @enduml
    结尾
  • 数据管道请使用
    left to right direction
    (从左到右的布局方向):数据源 → 摄取 → 转换 → 存储 → 可视化
  • 分析、数据库和存储图标请使用
    mxgraph.aws4.*
    模板语法
  • 默认颜色会自动应用 —— 你无需指定
    fillColor
    strokeColor
  • 使用
    rectangle "Zone" { ... }
    package "Layer" { ... }
    对管道阶段进行分组
  • 定向数据流使用
    -->
    , 异步/流数据流使用
    ..>
    (虚线)
完整模板参考: 查看 stencils/README.md 获取9500+可用图标。

Mxgraph Stencil Syntax

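`mxgraph.aws4.<icon> "Label" as <alias>`

Here `<icon>` is a stencil name from the tables below, and `"Label"` is the text displayed with the icon (`\n` inserts a line break, as in the Quick Example). `<alias>` is the identifier used in connections.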

Analytics & ETL Stencils

| Category | Stencils | Purpose |
|---|---|---|
| Query Engine | `athena`, `athena_data_source_connectors` | Serverless SQL on S3 data |
| ETL | `glue`, `glue_crawlers`, `glue_data_catalog`, `aws_glue_data_quality`, `aws_glue_for_ray` | Data integration & cataloging |
| Streaming | `kinesis`, `kinesis_data_streams`, `kinesis_data_firehose`, `kinesis_data_analytics`, `kinesis_video_streams` | Real-time data streaming |
| MapReduce | `emr`, `emr_engine`, `emr_engine_mapr_m3`, `emr_engine_mapr_m5` | Big data processing (Spark, Hive) |
| Data Warehouse | `redshift`, `redshift_ra3`, `redshift_streaming_ingestion`, `redshift_ml` | Columnar analytics warehouse |
| Search | `opensearch_service_data_node`, `opensearch_ingestion`, `cloudsearch` | Full-text search & log analytics |
| BI | `quicksight` | Dashboards & visualizations |
| Data Lake | `lake_formation`, `s3`, `glacier`, `glacier_deep_archive` | Governed data lake storage |
| Catalog | `datazone_custom_asset_type`, `data_exchange` | Data governance & sharing |
| Streaming Kafka | `msk`, `msk_connect` | Managed Kafka streaming |
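
The following sketch shows how several stencils from this table fit together in a hypothetical real-time pipeline. Events land in Kinesis Data Streams and are processed by Kinesis Data Analytics. Firehose then delivers them to S3 for ad hoc SQL in Athena and dashboards in QuickSight. Streaming hops use the dashed `..>` arrow per the rules above. The labels and aliases are illustrative.

```plantuml
@startuml
left to right direction
mxgraph.aws4.kinesis_data_streams "Event\nStream" as kds
mxgraph.aws4.kinesis_data_analytics "Stream\nProcessing" as kda
mxgraph.aws4.kinesis_data_firehose "Firehose\nDelivery" as fh
mxgraph.aws4.s3 "Data Lake\n(S3)" as s3
mxgraph.aws4.athena "Athena" as ath
mxgraph.aws4.quicksight "QuickSight" as qs

' Streaming hops are dashed
kds ..> kda
kda ..> fh
fh ..> s3
' Batch/query hops are solid
s3 --> ath
ath --> qs
@enduml
```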

Database Stencils

| Category | Stencils | Purpose |
|---|---|---|
| Relational | `aurora`, `aurora_instance`, `rds`, `rds_instance`, `rds_mysql_instance`, `rds_postgresql_instance` | Transactional databases |
| NoSQL | `dynamodb`, `dynamodb_table`, `dynamodb_global_secondary_index`, `dynamodb_stream` | Key-value & document store |
| Graph | `neptune` | Graph database |
| In-Memory | `elasticache`, `elasticache_for_redis`, `elasticache_for_memcached` | Cache & session store |
| Document | `documentdb`, `documentdb_with_mongodb_compatibility` | Document database |
| Ledger | `quantum_ledger_database` | Immutable transaction log |
| Wide-Column | `keyspaces` | Cassandra-compatible |
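
Database stencils usually appear as sources feeding the analytics stencils. In the hypothetical reporting setup below, Aurora, RDS for PostgreSQL, and DynamoDB sit in a "sources" zone. A Glue ETL job extracts from them in batch and loads Redshift for QuickSight. The system names, labels, and aliases are illustrative. The sketch assumes declarations may be nested inside `rectangle`.

```plantuml
@startuml
left to right direction
rectangle "Operational Sources" {
  mxgraph.aws4.aurora "Orders\n(Aurora)" as orders
  mxgraph.aws4.rds_postgresql_instance "Billing\n(PostgreSQL)" as billing
  mxgraph.aws4.dynamodb "Sessions\n(DynamoDB)" as sessions
}
mxgraph.aws4.glue "Glue\nETL" as glue
mxgraph.aws4.redshift "Redshift" as rs
mxgraph.aws4.quicksight "QuickSight" as qs

' Batch extracts from each source (solid arrows)
orders --> glue
billing --> glue
sessions --> glue
' Load the warehouse, then report on it
glue --> rs
rs --> qs
@enduml
```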

Connection Types

| Syntax | Meaning | Use Case |
|---|---|---|
| `A --> B` | Solid arrow | Batch data flow / API call |
| `A ..> B` | Dashed arrow | Streaming / async / CDC |
| `A -- B` | Solid line | Bidirectional sync |
| `A --> B : "label"` | Labeled connection | Describe data format or volume |
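
All four connection styles can appear in one diagram. In this hypothetical sketch, solid arrows are batch flows and dashed arrows carry DynamoDB change data. The plain line links the Glue Data Catalog and Lake Formation as components kept in sync. Labels name the data format, following the `A --> B : "label"` form from the table. The specific services, formats, and labels are illustrative assumptions.

```plantuml
@startuml
left to right direction
mxgraph.aws4.rds_mysql_instance "MySQL" as mysql
mxgraph.aws4.dynamodb "DynamoDB" as ddb
mxgraph.aws4.kinesis_data_streams "Change\nStream" as kds
mxgraph.aws4.glue "Glue\nETL" as glue
mxgraph.aws4.s3 "Data Lake\n(S3)" as s3
mxgraph.aws4.glue_data_catalog "Data Catalog" as cat
mxgraph.aws4.lake_formation "Lake Formation" as lf

' Solid arrows with labels: batch flows annotated with their format
mysql --> glue : "CSV export"
glue --> s3 : "Parquet"
' Dashed arrows: streaming change data capture
ddb ..> kds : "CDC"
kds ..> s3
' Solid arrow: the lake is registered in the catalog
s3 --> cat
' Plain line: components kept in sync
cat -- lf
@enduml
```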

Quick Example

```plantuml
@startuml
left to right direction
mxgraph.aws4.s3 "Data Lake\n(S3)" as s3
mxgraph.aws4.glue "Glue\nETL" as glue
mxgraph.aws4.redshift "Redshift" as rs
mxgraph.aws4.quicksight "QuickSight" as qs

s3 --> glue
glue --> rs
rs --> qs
@enduml
```
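
The same pattern extends to the other architecture types. The second sketch below shows a log analytics flow built from the stencils listed for Log Analytics in the architecture table (`kinesis_data_firehose`, `lambda_function`, `opensearch_service`). It adds `s3` as a raw-log backup. The labels and aliases are illustrative.

```plantuml
@startuml
left to right direction
mxgraph.aws4.kinesis_data_firehose "Log Delivery\n(Firehose)" as fh
mxgraph.aws4.lambda_function "Record\nTransform" as fn
mxgraph.aws4.opensearch_service "OpenSearch" as os
mxgraph.aws4.s3 "Log Archive\n(S3)" as arch

' Streaming path: records are transformed, then indexed for search
fh ..> fn
fn ..> os
' Raw copies are kept in S3
fh --> arch : "raw backup"
@enduml
```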

Data Analytics Architecture Types

| Type | Purpose | Key Stencils | Example |
|---|---|---|---|
| Data Lake | Centralized raw data store | `s3`, `lake_formation`, `glue`, `athena` | `data-lake.md` |
| Real-time Streaming | Event stream processing | `kinesis`, `msk`, `lambda_function`, `opensearch_service` | `real-time-streaming.md` |
| Data Warehouse | Star-schema analytics | `redshift`, `glue`, `quicksight` | `data-warehouse.md` |
| ETL Pipeline | Extract-transform-load | `glue`, `glue_crawlers`, `glue_data_catalog`, `s3` | `etl-pipeline.md` |
| Log Analytics | Centralized logging | `kinesis_data_firehose`, `opensearch_service`, `lambda_function` | `log-analytics.md` |
| ML Feature Store | Feature engineering pipeline | `glue`, `s3`, `athena`, `emr` | `ml-feature-pipeline.md` |
| CDC Pipeline | Database change capture | `dynamodb_stream`, `kinesis`, `lambda_function`, `redshift` | `cdc-pipeline.md` |
| Multi-source BI | Cross-database reporting | `aurora`, `dynamodb`, `redshift`, `quicksight` | `multi-source-bi.md` |
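
The sketch below lays out the stencils for one row, the CDC Pipeline. It is not a copy of `cdc-pipeline.md`, which is not reproduced here. It uses the `dynamodb_stream` name from the Database Stencils table and adds `dynamodb_table` as a hypothetical upstream source. Change events travel on dashed arrows until Lambda loads them into Redshift. The zone names and labels are illustrative.

```plantuml
@startuml
left to right direction
rectangle "Capture" {
  mxgraph.aws4.dynamodb_table "Orders\nTable" as tbl
  mxgraph.aws4.dynamodb_stream "DynamoDB\nStream" as ddbs
}
rectangle "Transport" {
  mxgraph.aws4.kinesis "Kinesis" as kin
  mxgraph.aws4.lambda_function "Transform\n(Lambda)" as fn
}
mxgraph.aws4.redshift "Redshift" as rs

' Change events are asynchronous, so they use dashed arrows
tbl ..> ddbs
ddbs ..> kin
kin ..> fn
' The load into the warehouse is a directed flow
fn --> rs
@enduml
```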