Automated data quality and transformation capabilities for Dataform/dbt/BigQuery pipelines. Processes data sourced from BigQuery or Cloud Storage (GCS), applying best practices for data ingestion, movement, schema mapping, and comprehensive data cleaning.
`npx skill4agent add gemini-cli-extensions/data-agent-kit-starter-pack data-autocleaning`

> [!IMPORTANT]
> You MUST use this skill for ANY task where the source is BigQuery or GCS, including seemingly simple operations like "move data" or "copy table".
`implementation_plan.md`, `scripts/dataplex_scanner.py`, `python3 scripts/dataplex_scanner.py ...`, `SKILL.md`, `us-central1`, `us`, `--tables`

    python3 scripts/dataplex_scanner.py \
      --tables <project.dataset.table> <project.catalog.namespace.table> \
      --location <location> \
      --output-dir <output_dir>

`project.dataset.table`, `project.catalog.namespace.table`, `bq`, `implementation_plan.md`

## Profiling Evidence
- [ ] Dataplex Data Profile Job ID: <JOB_ID>
- [ ] Profile Result Summary: <Brief summary of key findings, e.g., % nulls, distinct values>

`implementation_plan.md`

> [!CAUTION]
> Do not proceed to implementation until both sections are completed. You MUST ensure that the verification phase only validates that your transformations successfully addressed the anomalies found in Step 1.
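The scanner invocation shown earlier can be assembled programmatically when several tables need profiling. The following is a minimal sketch, not part of the skill itself: the helper name `build_scanner_command` and the example table name are hypothetical, and only the three flags that appear in the command above (`--tables`, `--location`, `--output-dir`) are used.

```python
import shlex


def build_scanner_command(tables, location, output_dir,
                          script="scripts/dataplex_scanner.py"):
    """Return the argv list for one run of the profiling script.

    Assumption: the script accepts exactly the flags shown in the
    documented command; no other options are inferred here.
    """
    if not tables:
        raise ValueError("at least one table is required")
    return [
        "python3", script,
        "--tables", *tables,
        "--location", location,
        "--output-dir", output_dir,
    ]


if __name__ == "__main__":
    # Hypothetical table name, for illustration only.
    argv = build_scanner_command(
        ["my-project.my_dataset.orders"], "us-central1", "profiles"
    )
    # Print a shell-safe rendering; pass `argv` to subprocess.run to execute.
    print(shlex.join(argv))
```

Building the command as a list rather than a string avoids quoting problems if a table or directory name contains unusual characters.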
`NULL`, `'C'`, `'F'`, `mg`, `liter`, `COALESCE`, `SAFE.PARSE_*`, `SAFE.PARSE_JSON`, `JSON`, `JSON_EXTRACT_*`, `JSON_VALUE`, `JSON_QUERY`, `JSON_QUERY_ARRAY`, `JSON_VALUE_ARRAY`, `SAFE.`, `SAFE.PARSE_JSON`, `NULL`, `SAFE_CAST`, `ARRAY`, `NULL`, `LOWER()`, `UPPER()`, `NULL`, `ARRAY_FILTER(array_column, e -> e IS NOT NULL)`, `ARRAY(SELECT DISTINCT x FROM UNNEST(array_column))`, `ARRAY_TRANSFORM`, `UNNEST`, `ARRAY_AGG`, `UNNEST`, `ARRAY_AGG`, `SAFE_CAST`, `LOWER()`, `UPPER()`, `struct.field`, `STRUCT()`, `NULL`

> [!IMPORTANT]
> You MUST verify transformations strictly using the protocol below before completing the task. Never skip this step. Use Dataplex profiling only (unless the scan was denied by the user), not ad-hoc SQL queries.
`SELECT`, `bq`, `scripts/dataplex_scanner.py`

| Anomaly Type | Threshold |
| --- | --- |
| **NULL increase** | >1% increase compared to source (unless expected) |
| **Value range shift** | Unexpected ranges or formats |

`NOT NULL`, `IS NULL`, `walkthrough.md`

## Quality Review Profiling Evidence
- [ ] Post-Transformation Dataplex Profile Job ID: <JOB_ID>
- [ ] Profile Comparison Summary: <Detailed comparison between initial and final profiles per column>

> [!CAUTION]
> Do not conclude the task or ask for user review until this section is filled and the profile comparison is documented.
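The NULL increase threshold from the anomaly table above can be checked mechanically once the per-column NULL percentages have been read off the two profiles. This is a sketch under stated assumptions: the inputs are plain dictionaries mapping column name to NULL percentage (0 to 100), filled in by hand or by your own parsing, since the scanner's output format is not described here. It also assumes ">1% increase" means more than one percentage point; adjust if a relative increase is intended. The function name is hypothetical.

```python
def null_increase_flags(source_null_pct, target_null_pct,
                        threshold=1.0, expected=()):
    """Return {column: increase} for columns whose NULL share grew too much.

    source_null_pct / target_null_pct: dicts of column -> NULL percentage.
    threshold: allowed increase in percentage points (assumed reading of
               the ">1% increase" rule).
    expected:  columns where a NULL increase is intended and is skipped.
    """
    flagged = {}
    for column, after in target_null_pct.items():
        if column in expected:
            continue
        # Columns absent from the source profile are treated as 0% NULL.
        before = source_null_pct.get(column, 0.0)
        increase = after - before
        if increase > threshold:
            flagged[column] = round(increase, 4)
    return flagged


if __name__ == "__main__":
    # Hypothetical profile figures, for illustration only.
    source = {"price": 0.5, "country": 2.0, "notes": 0.0}
    target = {"price": 0.9, "country": 4.5, "notes": 10.0}
    print(null_increase_flags(source, target, expected=("notes",)))
```

Any column returned here would need an explanation in the Profile Comparison Summary before the task is concluded.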
`walkthrough.md`

| Field | Description |
| --- | --- |
| **Destination schema considered** | The target column/type being matched |
| **Issue Detected** | What data quality problem was found |
| **Transformation Applied** | The SQL logic used to fix it |
| **Benefit** | Why this transformation improves the data |
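One way to keep `walkthrough.md` entries consistent is to hold the four fields above in a small record type and render each entry as a table row. This is an illustrative sketch only: the class name, the row layout, and the sample values are assumptions, and the skill does not prescribe this format.

```python
from dataclasses import dataclass


@dataclass
class WalkthroughEntry:
    """One documented transformation, using the four fields listed above."""
    destination_schema: str
    issue_detected: str
    transformation_applied: str
    benefit: str

    def to_markdown_row(self):
        cells = [
            self.destination_schema,
            self.issue_detected,
            self.transformation_applied,
            self.benefit,
        ]
        # Escape pipes so SQL such as `a || b` does not break the table.
        escaped = [cell.replace("|", "\\|") for cell in cells]
        return "| " + " | ".join(escaped) + " |"


# Header for the assumed one-row-per-entry layout.
HEADER = (
    "| Destination schema considered | Issue Detected "
    "| Transformation Applied | Benefit |\n"
    "| --- | --- | --- | --- |"
)

if __name__ == "__main__":
    # Hypothetical entry, for illustration only.
    entry = WalkthroughEntry(
        "price FLOAT64",
        "Non-numeric strings",
        "SAFE_CAST(price AS FLOAT64)",
        "Bad values become NULL instead of failing the load",
    )
    print(HEADER)
    print(entry.to_markdown_row())
```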