Search Results: dataset

Found 207 Skills

data-sleuth

Identify non-obvious signals, hidden patterns, and clever correlations in datasets using investigative data analysis techniques. Use when analyzing social media exports, user data, behavioral datasets, or any structured data where deeper insights are desired. Pairs with personality-profiler for enhanced signal extraction. Triggers on requests like "what patterns do you see", "find hidden signals", "correlate these datasets", "what am I missing in this data", "analyze across datasets", "find non-obvious insights", or when users want to go beyond surface-level analysis. Also use proactively when you notice interesting anomalies or correlations during any data analysis task.

🇺🇸|EnglishTranslated

Data Processingtondevrel/scientific-agen...

duckdb

An analytical in-process SQL database management system. Designed for fast analytical queries (OLAP). Highly interoperable with Python's data ecosystem (Pandas, NumPy, Arrow, Polars). Supports querying files (CSV, Parquet, JSON) directly without an ingestion step. Use for complex SQL queries on Pandas/Polars data, querying large Parquet/CSV files directly, joining data from different sources, analytical pipelines, local datasets too big for Excel, intermediate data storage and feature engineering for ML.

🇺🇸|EnglishTranslated

Data Processingdkyazzentwatwa/chatgpt-sk...

dataset-comparer

Compare two datasets to find differences, added/removed rows, changed values. Use for data validation, ETL verification, or tracking changes.

🇺🇸|EnglishTranslated

1 scripts/Checked

Data Processingletta-ai/skills

reshard-c4-data

Guidance for data resharding tasks that involve reorganizing files across directory structures with constraints on file sizes and directory contents. This skill applies when redistributing datasets, splitting large files, or reorganizing data into shards while maintaining constraints like maximum files per directory or maximum file sizes. Use when tasks involve resharding, data partitioning, or directory-constrained file reorganization.

🇺🇸|EnglishTranslated

Data Processingmims-harvard/tooluniverse

tooluniverse-statistical-modeling

Perform statistical modeling and regression analysis on biomedical datasets. Supports linear regression, logistic regression (binary/ordinal/multinomial), mixed-effects models, Cox proportional hazards survival analysis, Kaplan-Meier estimation, and comprehensive model diagnostics. Extracts odds ratios, hazard ratios, confidence intervals, p-values, and effect sizes. Designed to solve BixBench statistical reasoning questions involving clinical/experimental data. Use when asked to fit regression models, compute odds ratios, perform survival analysis, run statistical tests, or interpret model coefficients from provided data.

🇺🇸|EnglishTranslated

3 scripts/Checked

AI & Machine Learningorchestra-research/ai-res...

phoenix-observability

Open-source AI observability platform for LLM tracing, evaluation, and monitoring. Use when debugging LLM applications with detailed traces, running evaluations on datasets, or monitoring production AI systems with real-time insights.

🇺🇸|EnglishTranslated

AI & Machine Learningorchestra-research/ai-res...

langsmith-observability

LLM observability platform for tracing, evaluation, and monitoring. Use when debugging LLM applications, evaluating model outputs against datasets, monitoring production systems, or building systematic testing pipelines for AI applications.

🇺🇸|EnglishTranslated

Data Processingphuryn/pm-skills

dummy-dataset

Generate realistic dummy datasets for testing with customizable columns, constraints, and output formats (CSV, JSON, SQL, Python script). Use when creating test data, building mock datasets, or generating sample data for development and demos.

🇺🇸|EnglishTranslated

AI & Machine Learninggithub/awesome-copilot

eval-driven-dev

Instrument Python LLM apps, build golden datasets, write eval-based tests, run them, and root-cause failures — covering the full eval-driven development cycle. Make sure to use this skill whenever a user is developing, testing, QA-ing, evaluating, or benchmarking a Python project that calls an LLM, even if they don't say "evals" explicitly. Use for making sure an AI app works correctly, catching regressions after prompt changes, debugging why an agent started behaving differently, or validating output quality before shipping.

🇺🇸|EnglishTranslated

AI & Machine Learningdavila7/claude-code-templ...

torchdrug

Graph-based drug discovery toolkit. Molecular property prediction (ADMET), protein modeling, knowledge graph reasoning, molecular generation, retrosynthesis, GNNs (GIN, GAT, SchNet), 40+ datasets, for PyTorch-based ML on molecules, proteins, and biomedical graphs.

🇺🇸|EnglishTranslated

Data Processingdavila7/claude-code-templ...

lamindb

This skill should be used when working with LaminDB, an open-source data framework for biology that makes data queryable, traceable, reproducible, and FAIR. Use when managing biological datasets (scRNA-seq, spatial, flow cytometry, etc.), tracking computational workflows, curating and validating data with biological ontologies, building data lakehouses, or ensuring data lineage and reproducibility in biological research. Covers data management, annotation, ontologies (genes, cell types, diseases, tissues), schema validation, integrations with workflow managers (Nextflow, Snakemake) and MLOps platforms (W&B, MLflow), and deployment strategies.

🇺🇸|EnglishTranslated

Data Processingletta-ai/skills

multi-source-data-merger

This skill provides guidance for merging data from multiple heterogeneous sources (JSON, CSV, Parquet, XML, etc.) into a unified dataset. Use this skill when tasks involve combining records from different file formats, applying field mappings, resolving conflicts based on priority rules, or generating merged outputs with conflict reports. Applicable to ETL pipelines, data consolidation, and record deduplication scenarios.

🇺🇸|EnglishTranslated