Total: 50,503 skills. Data Processing has 2,560 skills.
Compare document similarity using TF-IDF, cosine similarity, and Jaccard index. Use for plagiarism detection, duplicate finding, or content matching.
Auto-generate features with encodings, scaling, polynomial features, and interaction terms for ML pipelines.
Install ADBC (Arrow Database Connectivity) drivers with dbc. Use when the user wants to install database drivers and connect to databases.
Use this skill for AIRR-seq (Adaptive Immune Receptor Repertoire / VDJ-seq) data analysis with immunarch + immundata in R, including ingestion, receptor schema design, immutable transformations, clonality/diversity/public overlap metrics, and Seurat/AnnData integration.
Perl text processing and scripting with regular expressions. Use for .pl files.
TimescaleDB, the PostgreSQL extension for time-series data. Use for time-series workloads on Postgres.
SQL database queries, joins, aggregations, subqueries, and optimization. Use for .sql files and database operations.
Adds visual descriptions to transcripts by extracting and analyzing video frames with ffmpeg. Creates a visual transcript with periodic visual descriptions of the video clip. Use when all files already have audio transcripts (transcript) but do not yet have visual transcripts (visual_transcript).
Visualization Best Practices - Auto-activating skill for Data Analytics. Triggers on: visualization best practices. Part of the Data Analytics skill category.
Alpha Vantage API documentation reference - provides comprehensive information about stock data, forex, crypto, technical indicators, and fundamental data APIs.
MANDATORY when working with geographic data, spatial queries, geometry operations, or location-based features - enforces PostGIS 3.6.1 best practices including ST_CoverageClean, SFCGAL 3D functions, and bigint topology.
Tinybird TypeScript SDK for defining datasources, pipes, and queries with full type inference. Use when working with @tinybirdco/sdk, TypeScript Tinybird projects, or type-safe data ingestion and queries.
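The document-similarity entry in this listing names three techniques: TF-IDF, cosine similarity, and the Jaccard index. As a rough illustration of how they fit together, here is a minimal standard-library sketch. It is not the skill's actual implementation; the function names, the whitespace tokenizer, and the smoothed IDF formula are all assumptions chosen for brevity.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase + whitespace split; real pipelines usually
    # also strip punctuation and stop words.
    return text.lower().split()


def jaccard(a, b):
    """Jaccard index: |A ∩ B| / |A ∪ B| over the sets of tokens."""
    sa, sb = set(tokenize(a)), set(tokenize(b))
    if not sa and not sb:
        return 1.0  # two empty documents are treated as identical
    return len(sa & sb) / len(sa | sb)


def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Assumption: smoothed IDF, log((1 + n) / (1 + df)) + 1.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: (c / len(toks)) * idf[t] for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    docs = ["the cat sat on the mat", "the cat lay on the mat", "stock prices fell"]
    vecs = tfidf_vectors(docs)
    print("cosine(0, 1):", cosine(vecs[0], vecs[1]))
    print("cosine(0, 2):", cosine(vecs[0], vecs[2]))
    print("jaccard(0, 1):", jaccard(docs[0], docs[1]))
```

Jaccard only looks at which tokens are shared, so it suits quick duplicate checks. TF-IDF with cosine similarity weights rare terms more heavily, which tends to matter more for plagiarism detection and content matching.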