Search Results: data-quality

Found 54 Skills

Data Processing | daemon-blockint-tech/agen...

data-scrubbing

Guides cleaning and standardizing tabular datasets before analysis, modeling, or reporting—profiling, quality rules, missing values, duplicates, outliers, type coercion, encoding fixes, record linkage, deduplication, high-level PII handling (not legal advice), actuarial/insurance field scrubbing, reproducible scrub pipelines, validation checks, and sign-off. Distinct from warehouse ETL or statistical modeling. Use when the user asks for "data scrubbing", "clean this dataset", "scrub the data", "data cleaning", "dedupe records", "handle missing values", "outlier treatment", "standardize columns", "data quality rules", "profile this table", or "prepare data for modeling". Not warehouse pipelines (data-warehouse-engineer), ML modeling (data-scientist, actuary), privacy programs (compliance-engineer), FinOps only (finops-analyst), or assumption governance (assumption-setting).

Data Processing | aradotso/data-skills

data-engineering-medallion-pipeline

End-to-end data engineering pipeline using MinIO, Airbyte, PostgreSQL, DBT, and Airflow with medallion architecture (Bronze/Silver/Gold layers)

Data Processing | mims-harvard/tooluniverse

tooluniverse-expression-data-retrieval

Retrieves gene expression and omics datasets from ArrayExpress and BioStudies with gene disambiguation, experiment quality assessment, and structured reports. Creates comprehensive dataset profiles with metadata, sample information, and download links. Use when users need expression data, omics datasets, or mention ArrayExpress (E-MTAB, E-GEOD) or BioStudies (S-BSST) accessions.

AI & Machine Learning | prakharmnnit/skills-and-p...

backend-principle-eng-python-ml-pro-max

Principal backend engineering intelligence for Python AI/ML systems. Actions: plan, design, build, implement, review, fix, optimize, refactor, debug, secure, scale ML services and pipelines. Focus: data quality, reproducibility, reliability, performance, security, observability, model evaluation, MLOps.

AI & Machine Learning | sundial-org/skills

training-data-curation

Guidelines for creating high-quality datasets for LLM post-training (SFT/DPO/RLHF). Use when preparing data for fine-tuning, evaluating data quality, or designing data collection strategies.

Data Processing | borghei/claude-skills

senior-data-engineer

Expert data engineering covering data pipelines, ETL/ELT, data warehousing, streaming, and data quality.

Data Processing | sickn33/antigravity-aweso...

database

Database development and operations workflow covering SQL, NoSQL, database design, migrations, optimization, and data engineering.

Data Processing | the-perfect-developer/the...

pandera

This skill should be used when the user asks to "validate a DataFrame with pandera", "write a pandera schema", "use pandera DataFrameModel", "add data validation to a pipeline", or needs guidance on pandera best practices for data quality.

Data Processing | github/awesome-copilot

geofeed-tuner

Use this skill whenever the user mentions IP geolocation feeds, RFC 8805, geofeeds, or wants help creating, tuning, validating, or publishing a self-published IP geolocation feed in CSV format. Intended user audience is a network operator, ISP, mobile carrier, cloud provider, hosting company, IXP, or satellite provider asking about IP geolocation accuracy, or geofeed authoring best practices. Helps create, refine, and improve CSV-format IP geolocation feeds with opinionated recommendations beyond RFC 8805 compliance. Do NOT use for private or internal IP address management — applies only to publicly routable IP addresses.

Data Processing | majesticlabs-dev/majestic...

great-expectations

Data validation using Great Expectations. Expectation suites, checkpoints, and data docs for pipeline monitoring.

Data Processing | datahub-project/datahub-s...

datahub-quality

Use this skill when the user wants to manage data quality in DataHub: create or run assertions, check assertion outcomes, raise or resolve incidents, create notification subscriptions, or diagnose health problems across their estate. Triggers on: "create assertion", "run assertion", "check quality", "data quality", "health check", "raise incident", "resolve incident", "subscribe to", "failing assertions", "active incidents", or any request involving data quality, assertions, incidents, or quality notifications.

Data Processing | jackspace/claudeskillz

exploratory-data-analysis

Analyze datasets to discover patterns, anomalies, and relationships. Use when exploring data files, generating statistical summaries, checking data quality, or creating visualizations. Supports CSV, Excel, JSON, Parquet, and more.
