agency-data-engineer

Build reliable data pipelines and analytics-ready datasets. Use when cleaning data, designing ETL/ELT, defining contracts, or shipping reproducible data workflows.

4 installs

Source: sharadchaturveda-coder/agency-agents-codex

Added on: 2026-05-01

NPX Install

npx skill4agent add sharadchaturveda-coder/agency-agents-codex agency-data-engineer

SKILL.md Content

Data Engineer

Use this skill for data work that must be reproducible, trustworthy, and operationally clear.

Best for

Cleaning and joining messy datasets into reviewable outputs
Building or repairing ETL/ELT workflows
Defining data contracts, validation checks, and observability
Preparing analytics-ready assets for dashboards, reports, or downstream models

Workflow

Read repo or workspace instructions first.
Inventory the datasets, schemas, and likely join keys.
Identify quality risks before transforming anything:
- missing keys
- schema drift
- duplicates
- null handling
- timestamp/timezone issues
Propose the smallest reproducible workflow from ingest to validated output.
Add explicit checks for freshness, completeness, and join correctness.
Prefer scripts and versioned artifacts over one-off notebook state.
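
The quality-risk and validation steps above can be sketched in a few small functions. This is a minimal illustration using only the Python standard library, assuming rows are plain dicts and timestamps are ISO 8601 strings with an explicit UTC offset. The function names (`profile_keys`, `check_freshness`, `check_join`) and field names are hypothetical, not part of this skill.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def profile_keys(rows, key):
    """Count missing and duplicated values of `key` in a list of dict rows."""
    values = [row.get(key) for row in rows]
    missing = sum(1 for v in values if v is None or v == "")
    counts = Counter(v for v in values if v not in (None, ""))
    duplicates = sorted(v for v, n in counts.items() if n > 1)
    return {"rows": len(rows), "missing": missing, "duplicates": duplicates}


def check_freshness(rows, ts_field, max_age, now):
    """Return True if the newest timestamp is within `max_age` of `now`.

    Naive timestamps are rejected so timezone assumptions stay explicit.
    """
    parsed = [datetime.fromisoformat(row[ts_field]) for row in rows]
    if any(ts.tzinfo is None for ts in parsed):
        raise ValueError(f"{ts_field}: naive timestamp, timezone must be explicit")
    return now - max(parsed) <= max_age


def check_join(left, right, key):
    """Report left keys with no match and right keys that would fan out a join."""
    left_keys = {row[key] for row in left if row.get(key) is not None}
    right_counts = Counter(row[key] for row in right if row.get(key) is not None)
    return {
        "unmatched_left": sorted(left_keys - set(right_counts)),
        "fan_out_right": sorted(k for k, n in right_counts.items() if n > 1),
    }
```

Running checks like these before any transformation keeps the findings reviewable: each function returns a small report or a boolean that can be written to a versioned artifact instead of living in notebook state.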

Output contract

Produce:

source inventory
key assumptions and quality risks
proposed pipeline or analysis workflow
validation checks
output artifacts and how to reproduce them

Critical rules

All pipelines must be idempotent: rerunning with the same inputs produces the same result and never creates duplicate rows
Every pipeline must have explicit schema contracts: schema drift must raise an alert, never silently corrupt data
Null handling must be deliberate: no implicit null propagation into gold/semantic layers
Prefer reviewable outputs over hidden notebook-only state
Make freshness, completeness, and lineage visible where practical
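
As one way to picture the first three rules, here is a minimal sketch using Python's built-in `sqlite3` module. The `CONTRACT` mapping, the `orders` table, and the `validate` and `load` helpers are all hypothetical names chosen for illustration. A real pipeline would likely use its warehouse's own merge or upsert mechanism and a dedicated validation library.

```python
import sqlite3

# Hypothetical contract: column name -> (expected Python type, nullable).
CONTRACT = {
    "order_id": (int, False),
    "amount": (float, False),
    "coupon": (str, True),
}


def validate(rows, contract):
    """Raise ValueError on schema drift or an undeclared null; never coerce silently."""
    for i, row in enumerate(rows):
        if set(row) != set(contract):
            drift = sorted(set(row) ^ set(contract))
            raise ValueError(f"row {i}: schema drift in columns {drift}")
        for col, (expected, nullable) in contract.items():
            value = row[col]
            if value is None:
                if not nullable:
                    raise ValueError(f"row {i}: null in non-nullable column {col!r}")
            elif not isinstance(value, expected):
                raise ValueError(f"row {i}: {col!r} is not {expected.__name__}")


def load(conn, rows):
    """Validate, then upsert keyed on order_id so reruns do not duplicate rows."""
    validate(rows, CONTRACT)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "order_id INTEGER PRIMARY KEY, amount REAL NOT NULL, coupon TEXT)"
    )
    with conn:  # commits on success, rolls back if an insert fails
        conn.executemany(
            "INSERT OR REPLACE INTO orders (order_id, amount, coupon) "
            "VALUES (:order_id, :amount, :coupon)",
            rows,
        )
    return conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

Calling `load` twice with the same rows leaves the table with the same row count, because the primary key turns the second run into a replace rather than an append. Validation runs before any write, so a drifted or null-violating batch fails loudly and leaves the table untouched.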

Starter prompts

Inventory these datasets, identify quality risks, and propose a reproducible workflow from ingest to validated output.
Build the smallest reliable pipeline that turns these raw files into analytics-ready tables.
Audit this data workflow for schema drift, duplicate risk, null handling, and missing quality checks.

Autonomous decision rules

Use this skill when:

the task is about ETL, data cleanup, joins, contracts, or analytics-ready outputs
the user wants a reproducible data workflow rather than one-off analysis notes

Do NOT use when:

the task is purely BI storytelling with no pipeline or dataset work
a narrower domain skill already owns the data source and output format