Search Results: dataset

Found 329 Skills

Data Processing | asgard-ai-platform/skills

stat-eda

Conduct Exploratory Data Analysis (EDA) using descriptive statistics, visualizations, and data quality checks. Use this skill when the user has a dataset and needs to understand its structure, find patterns, detect anomalies, or prepare data for further analysis — even if they say 'what does this data look like', 'find interesting patterns', 'clean this data', or 'summarize this dataset'.

Data Processing | delphine-l/claude_global

bioinformatics-visualization

Publication-quality bioinformatics figures - phylogenetic trees, genome browsers, iTOL datasets, and data presentation

AI & Machine Learning | vllm-project/vllm-skills

vllm-bench-random-synthetic

Run vLLM performance benchmark using synthetic random data to measure throughput, TTFT (Time to First Token), TPOT (Time per Output Token), and other key performance metrics. Use when the user wants to quickly test vLLM serving performance without downloading external datasets.

AI & Machine Learning | vllm-project/vllm-skills

vllm-prefix-cache-bench

Benchmark the efficiency of automatic prefix caching in vLLM using fixed prompts, real-world datasets, or synthetic prefix/suffix patterns. Use when the user asks to benchmark prefix caching hit rate, caching efficiency, or repeated-prompt performance in vLLM.

Data Processing | gemini-cli-extensions/dat...

gcp-pipeline-resource-provisioning

Automates declarative resource creation and provisioning for data pipelines, supporting BigQuery, Dataform, Dataproc, BigQuery Data Transfer Service (DTS), and other resources. It manages environment-specific configurations (dev, staging, prod) through a deployment.yaml file.

Use when:
- Modifying or creating deployment.yaml for deployment settings.
- Resolving environment-specific variables (e.g., Project IDs, Regions) for deployment.
- Provisioning supported infrastructure like BigQuery datasets/tables, Dataform resources, or DTS resources via deployment.yaml.

Do not use when:
- Resources already exist.
- Managing resources not supported by `gcloud beta orchestration-pipelines resource-types list`.
- Managing general cloud infrastructure (VMs, networks, Kubernetes, IAM policies), which are better suited for Terraform.
- Infrastructure spans multiple cloud providers (AWS, Azure, etc.).
- Already uses Terraform for the target resources.

Data Processing | postplusai/postplus-skill...

xiaohongshu-tools

Local execution tools for Xiaohongshu/Rednote hosted collection workflows, including actor runs, dataset normalization, account and post ranking, comment clustering, product-pool ranking, and topic-map building.

20 scripts | Attention

Data Processing | neo4j-contrib/neo4j-skill...

neo4j-import-skill

Import structured data into Neo4j — LOAD CSV, CALL IN TRANSACTIONS, neo4j-admin database import full (offline bulk), apoc.load.csv/json, apoc.periodic.iterate, driver batch writes. Covers method selection, header file format, type coercion, null handling, ON ERROR modes, CONCURRENT TRANSACTIONS, pre-import constraint setup, and post-import validation. Use when importing CSV/JSON/Parquet files, migrating relational data to graph, or bulk-loading large datasets. Does NOT handle unstructured document/PDF/vector chunking pipelines — use neo4j-document-import-skill. Does NOT handle live app write patterns (MERGE/CREATE) — use neo4j-cypher-skill. Does NOT handle neo4j-admin backup/restore/config — use neo4j-cli-tools-skill.

Data Processing | alirezarezvani/claude-ski...

data-quality-auditor

Audit datasets for completeness, consistency, accuracy, and validity. Profile data distributions, detect anomalies and outliers, surface structural issues, and produce an actionable remediation plan.

3 scripts | Checked

AI & Machine Learning | getcompanion-ai/feynman

ml-training-recipe

Find implementable ML training recipes from papers, datasets, docs, and code. Use when the user wants to fine-tune, train, reproduce, or choose a practical ML method, dataset, hyperparameter setup, or benchmark recipe.

Data Processing | steadfastasart/geoscience...

pooch

Data file fetching and caching for geoscience applications. Download sample datasets with automatic caching, checksum verification, and multiple download sources. Use when Claude needs to: (1) Download datasets from URLs or DOIs, (2) Cache files locally with automatic verification, (3) Verify file integrity with SHA256/MD5 hashes, (4) Extract compressed archives (ZIP, TAR, GZIP), (5) Create data registries for reproducible workflows, (6) Fetch from Zenodo or other repositories.

1 script | Checked

Document Processing | vincenzoimp/academic-rese...

source-ingestion

Use when adding, reading, registering, or organizing research sources such as PDFs, arXiv papers, Zotero items, proposals, datasets, reports, archives, web pages, BibTeX, or source metadata.

AI & Machine Learning | nvidia/skills

brev-etiquette

Brev instance operating guidance for NeMo-RL agents working in /home/ubuntu/RL with limited workspace disk, a larger /ephemeral volume, and optional /home/ubuntu/RL/.env secrets. Use when running auto-research campaigns, experiments, training jobs, model or dataset downloads, shared cache-heavy commands, log-producing runs, checkpoint generation, W&B or Hugging Face authenticated workflows, or any workflow that may create large files on Brev.
