Loading...
Loading...
Found 335 Skills
Master data engineering, ETL/ELT, data warehousing, SQL optimization, and analytics. Use when building data pipelines, designing data systems, or working with large datasets.
Use ONLY when creating NEW registrable components in ML projects that require Factory/Registry patterns. ✅ USE when: - Creating a new Dataset class (needs @register_dataset) - Creating a new Model class (needs @register_model) - Creating a new module directory with __init__.py factory - Initializing a new ML project structure from scratch - Adding new component types (Augmentation, CollateFunction, Metrics) ❌ DO NOT USE when: - Modifying existing functions or methods - Fixing bugs in existing code - Adding helper functions or utilities - Refactoring without adding new registrable components - Simple code changes to a single file - Modifying configuration files - Reading or understanding existing code Key indicator: Does the task require @register_* decorator or Factory pattern? If no, skip this skill.
Use when the user needs Excel file manipulation — reading, writing, formulas, charts, conditional formatting, data validation, pivot tables, or large file handling. Trigger conditions: create Excel reports programmatically, read spreadsheet data, add formulas or charts, apply conditional formatting, perform data validation, generate pivot tables, handle CSV import/export, process large datasets in Excel format.
Points to the BlockchainSpider open-source Python/Scrapy toolkit for collecting on-chain data—transfer subgraphs around an address or tx, EVM and Solana block/transaction ingestion, receipts/logs, and optional label plugins. Use when the user wants to build datasets, offline traces, or research pipelines alongside blockchain-analytics-operations and solana-tracing-specialist—not as a substitute for RPC provider ToS, rate limits, or legal review of sensitive crawls.
Use Parallel's parallel-cli to do live web search, URL extraction (clean markdown), deep research reports, bulk data enrichment (CSV/JSON), FindAll entity discovery, and web monitoring. Use when the user asks to look something up online, needs current sources/citations, provides URLs to read or summarise, requests deep/exhaustive research, wants to enrich a dataset with web-sourced fields, wants a list of entities (companies/people/places), or wants to monitor the web for changes over time.
Python library for working with DICOM (Digital Imaging and Communications in Medicine) files. Use this skill when reading, writing, or modifying medical imaging data in DICOM format, extracting pixel data from medical images (CT, MRI, X-ray, ultrasound), anonymizing DICOM files, working with DICOM metadata and tags, converting DICOM images to other formats, handling compressed DICOM data, or processing medical imaging datasets. Applies to tasks involving medical image analysis, PACS systems, radiology workflows, and healthcare imaging applications.
This skill should be used when working with annotated data matrices in Python, particularly for single-cell genomics analysis, managing experimental measurements with metadata, or handling large-scale biological datasets. Use when tasks involve AnnData objects, h5ad files, single-cell RNA-seq data, or integration with scanpy/scverse tools.
Automated hypothesis generation and testing using large language models. Use this skill when generating scientific hypotheses from datasets, combining literature insights with empirical data, testing hypotheses against observational data, or conducting systematic hypothesis exploration for research discovery in domains like deception detection, AI content detection, mental health analysis, or other empirical research tasks.
This skill should be used when users want to train or fine-tune language models using TRL (Transformer Reinforcement Learning) on Hugging Face Jobs infrastructure. Covers SFT, DPO, GRPO and reward modeling training methods, plus GGUF conversion for local deployment. Includes guidance on the TRL Jobs package, UV scripts with PEP 723 format, dataset preparation and validation, hardware selection, cost estimation, Trackio monitoring, Hub authentication, and model persistence. Should be invoked for tasks involving cloud GPU training, GGUF conversion, or when users mention training on Hugging Face Jobs without local GPU setup.
Expert in high-performance CSV processing, parsing, and data cleaning using Python, DuckDB, and command-line tools. Use when working with CSV files, cleaning data, transforming datasets, or processing large tabular data files.
This skill provides guidance for merging data from multiple heterogeneous sources (JSON, CSV, Parquet, XML, etc.) into a unified dataset. Use this skill when tasks involve combining records from different file formats, applying field mappings, resolving conflicts based on priority rules, or generating merged outputs with conflict reports. Applicable to ETL pipelines, data consolidation, and record deduplication scenarios.
Use this skill when working with scientific research tools and workflows across bioinformatics, cheminformatics, genomics, structural biology, proteomics, and drug discovery. This skill provides access to 600+ scientific tools including machine learning models, datasets, APIs, and analysis packages. Use when searching for scientific tools, executing computational biology workflows, composing multi-step research pipelines, accessing databases like OpenTargets/PubChem/UniProt/PDB/ChEMBL, performing tool discovery for research tasks, or integrating scientific computational resources into LLM workflows.