Search Results: dataset

Found 329 Skills

Data Processing | gemini-cli-extensions/dat...

gcp-pipeline-resource-provisioning

Automates declarative resource creation and provisioning for data pipelines, supporting BigQuery, Dataform, Dataproc, BigQuery Data Transfer Service (DTS), and other resources. It manages environment-specific configurations (dev, staging, prod) through a deployment.yaml file.

Use when:
- Modifying or creating deployment.yaml for deployment settings.
- Resolving environment-specific variables (e.g., Project IDs, Regions) for deployment.
- Provisioning supported infrastructure like BigQuery datasets/tables, Dataform resources, or DTS resources via deployment.yaml.

Do not use when:
- Resources already exist.
- Managing resources not supported by `gcloud beta orchestration-pipelines resource-types list`.
- Managing general cloud infrastructure (VMs, networks, Kubernetes, IAM policies), which is better suited to Terraform.
- Infrastructure spans multiple cloud providers (AWS, Azure, etc.).
- The project already uses Terraform for the target resources.

Data Processing | postplusai/postplus-skill...

xiaohongshu-tools

Local execution tools for Xiaohongshu/Rednote hosted collection workflows, including actor runs, dataset normalization, account and post ranking, comment clustering, product-pool ranking, and topic-map building.

20 scripts/Attention

Data Processing | neo4j-contrib/neo4j-skill...

neo4j-import-skill

Import structured data into Neo4j — LOAD CSV, CALL IN TRANSACTIONS, neo4j-admin database import full (offline bulk), apoc.load.csv/json, apoc.periodic.iterate, driver batch writes. Covers method selection, header file format, type coercion, null handling, ON ERROR modes, CONCURRENT TRANSACTIONS, pre-import constraint setup, and post-import validation. Use when importing CSV/JSON/Parquet files, migrating relational data to graph, or bulk-loading large datasets. Does NOT handle unstructured document/PDF/vector chunking pipelines — use neo4j-document-import-skill. Does NOT handle live app write patterns (MERGE/CREATE) — use neo4j-cypher-skill. Does NOT handle neo4j-admin backup/restore/config — use neo4j-cli-tools-skill.

Data Processing | alirezarezvani/claude-ski...

data-quality-auditor

Audit datasets for completeness, consistency, accuracy, and validity. Profile data distributions, detect anomalies and outliers, surface structural issues, and produce an actionable remediation plan.

3 scripts/Checked

AI & Machine Learning | getcompanion-ai/feynman

ml-training-recipe

Find implementable ML training recipes from papers, datasets, docs, and code. Use when the user wants to fine-tune, train, reproduce, or choose a practical ML method, dataset, hyperparameter setup, or benchmark recipe.

Data Processing | steadfastasart/geoscience...

pooch

Data file fetching and caching for geoscience applications. Download sample datasets with automatic caching, checksum verification, and multiple download sources. Use when Claude needs to: (1) Download datasets from URLs or DOIs, (2) Cache files locally with automatic verification, (3) Verify file integrity with SHA256/MD5 hashes, (4) Extract compressed archives (ZIP, TAR, GZIP), (5) Create data registries for reproducible workflows, (6) Fetch from Zenodo or other repositories.

1 script/Checked

Document Processing | vincenzoimp/academic-rese...

source-ingestion

Use when adding, reading, registering, or organizing research sources such as PDFs, arXiv papers, Zotero items, proposals, datasets, reports, archives, web pages, BibTeX, or source metadata.

AI & Machine Learning | nvidia/skills

brev-etiquette

Brev instance operating guidance for NeMo-RL agents working in /home/ubuntu/RL with limited workspace disk, a larger /ephemeral volume, and optional /home/ubuntu/RL/.env secrets. Use when running auto-research campaigns, experiments, training jobs, model or dataset downloads, shared cache-heavy commands, log-producing runs, checkpoint generation, W&B or Hugging Face authenticated workflows, or any workflow that may create large files on Brev.

Data Processing | mcvickerlab/genvarloader

genvarloader

Use when writing or reading GenVarLoader (gvl) datasets — preparing VCF/PGEN/SVAR variant sources with bcftools/plink2, calling gvl.write, configuring gvl.Dataset for haplotype/reference/annotated/variants output modes, attaching BigWig or Table tracks, setting up spliced haplotypes from a GTF, choosing track insertion-fill strategies for indels, or filtering variants by allele frequency.

Data Processing | davila7/claude-code-templ...

geo-database

Access NCBI GEO for gene expression and genomics data. Search and download microarray and RNA-seq datasets (GSE, GSM, GPL), and retrieve SOFT/Matrix files for transcriptomics and expression analysis.

Tools & Utilities | davila7/claude-code-templ...

get-available-resources

This skill should be used at the start of any computationally intensive scientific task to detect and report available system resources (CPU cores, GPUs, memory, disk space). It creates a JSON file with resource information and strategic recommendations that inform computational approach decisions such as whether to use parallel processing (joblib, multiprocessing), out-of-core computing (Dask, Zarr), GPU acceleration (PyTorch, JAX), or memory-efficient strategies. Use this skill before running analyses, training models, processing large datasets, or any task where resource constraints matter.

1 script/Attention

Data Processing | davila7/claude-code-templ...

omero-integration

Microscopy data management platform. Access images via Python, retrieve datasets, analyze pixels, manage ROIs and annotations, and run batch processing for high-content screening and microscopy workflows.
