Loading...
Loading...
Found 6 Skills
Data engineering skill for building scalable data pipelines, ETL/ELT systems, and data infrastructure. Expertise in Python, SQL, Spark, Airflow, dbt, Kafka, and modern data stack. Includes data modeling, pipeline orchestration, data quality, and DataOps. Use when designing data architectures, building data pipelines, optimizing data workflows, implementing data governance, or troubleshooting data issues.
Workload-aware architecture design for Apache Doris. MUST USE when designing data architectures, choosing between data models, planning ingestion strategies, sizing clusters, or translating business requirements into Apache Doris system designs. Complements doris-best-practices with decision frameworks and sizing-first workflow. Use when user describes a workload involving: IoT, sensor data, telemetry, real-time analytics, dashboard, log analysis, log search, CDC sync, time-series, device monitoring, point query service, ad-hoc analytics, lakehouse federation, ETL/ELT pipeline, report analytics, clickstream, user behavior, observability, metrics, fleet tracking, or any OLAP workload requiring table design from scratch. Also triggers on prompts like: "design a table for...", "how should I store...", "build an architecture for...", "we have X devices sending data every Y seconds", "recommend a cluster size for...", "what data model should I use for...", "we need to ingest X GB/day", "migrate from MySQL/PostgreSQL to Apache Doris". Also use for legacy analytics/search/serving stack consolidation prompts even when Apache Doris is not named explicitly, including replacing or migrating from Impala, Kudu, Elasticsearch/ES, Greenplum, Presto, HBase, Hive, Hadoop, Redis, or Lambda-style multi-engine data platforms.
Use when planning system architecture to ensure nothing is missed. Provides structured questions covering scalability, security, data, and operational dimensions before implementation.
/cs:cdo-review <plan> — Decision-driven Chief Data Officer interrogation of any plan that touches training data, data architecture, data productization, or data team hiring.
Deliver repeatable MotherDuck architectures across multiple clients. Use when standardizing isolation, provisioning, regional deployment, sharing boundaries, and client-specific exceptions for a consultancy or partner delivery model.
Decide when and how to index Solana data vs direct RPC reads. Covers event design, backfill, storage, and performance. Use for data architecture decisions.