Total 50,524 skills, Data Processing has 2561 skills
Showing 12 of 2561 skills
Guidance for counting tokens in datasets, particularly from HuggingFace or similar sources. This skill should be used when tasks involve counting tokens in datasets, understanding dataset schemas, filtering by categories/domains, or working with tokenizers. It helps avoid common pitfalls like incomplete field identification and ambiguous terminology interpretation.
Run nf-core bioinformatics pipelines (rnaseq, sarek, atacseq) on sequencing data. Use when analyzing RNA-seq, WGS/WES, or ATAC-seq data—either local FASTQs or public datasets from GEO/SRA. Triggers on nf-core, Nextflow, FASTQ analysis, variant calling, gene expression, differential expression, GEO reanalysis, GSE/GSM/SRR accessions, or samplesheet creation.
Design ETL/ELT pipelines with proper orchestration, error handling, and monitoring. Use when building data pipelines, designing data workflows, or implementing data transformations.
Run regression analyses in Stata with publication-ready output tables.
Process, analyze, and visualize geospatial data at scale. Handles drone imagery, GPS tracks, GeoJSON optimization, coordinate transformations, and tile generation. Use for mapping apps, drone data processing, location-based services. Activate on "geospatial", "GIS", "PostGIS", "GeoJSON", "map tiles", "coordinate systems". NOT for simple address validation, basic distance calculations, or static map embeds.
Access USPTO APIs for patent/trademark searches, examination history (PEDS), assignments, citations, office actions, TSDR, for IP analysis and prior art searches.
Analyze mental health data, identify psychological patterns, assess mental health status, and provide personalized mental health recommendations. Supports correlation analysis with other health data such as sleep, exercise, and nutrition.
Fetches web pages, parses HTML with CSS selectors, calls REST APIs, and scrapes dynamic content. Use when extracting data from websites, querying JSON APIs, or automating browser interactions.
Provides comprehensive guidance for Elasticsearch including indexing, searching, aggregations, mappings, and cluster management. Use when the user asks about Elasticsearch, needs to implement search functionality, work with Elasticsearch queries, or manage Elasticsearch clusters.
R 4.4+ development specialist covering tidyverse, ggplot2, Shiny, and data science patterns. Use when developing data analysis pipelines, visualizations, or Shiny applications.
Creates and maintains dlt (data load tool) pipelines from APIs, databases, and other sources. Use when the user wants to build or debug pipelines; use verified sources (e.g. Salesforce, GitHub, Stripe) or declarative REST API or custom Python; configure destinations (e.g. DuckDB, BigQuery, Snowflake); implement incremental loading; or edit .dlt config and secrets. Use when the user mentions data ingestion, dlt pipeline, dlt init, rest_api_source, incremental load, or pipeline dashboard.
以全球鎳供給結構為核心,量化各國的主導程度(例如印尼)、主要礦區供給量、以及政策配額/減產情境對全球供需平衡與價格非對稱的影響。