Total: 50,473 skills. Data Processing: 2,559 skills (showing 12).
PostgreSQL-based semantic and hybrid search with pgvector and ParadeDB. Use when implementing vector search, semantic search, hybrid search, or full-text search in PostgreSQL. Covers pgvector setup, indexing (HNSW, IVFFlat), hybrid search (FTS + BM25 + RRF), ParadeDB as an Elasticsearch alternative, and re-ranking with Cohere or cross-encoders. Supports the vector(1536) and halfvec(3072) types for OpenAI embeddings. Triggers: pgvector, vector search, semantic search, hybrid search, embedding search, PostgreSQL RAG, BM25, RRF, HNSW index, similarity search, ParadeDB, pg_search, reranking, Cohere rerank, pg_trgm, trigram, fuzzy search, LIKE, ILIKE, autocomplete, typo tolerance, fuzzystrmatch
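The entry above names RRF (Reciprocal Rank Fusion) as the step that merges full-text/BM25 results with vector results. As a rough illustration, here is a minimal Python sketch of the standard RRF formula, where each document scores the sum of 1 / (k + rank) over every result list it appears in. It assumes the two result lists have already been fetched (the list names are hypothetical) and uses k = 60, the commonly cited default; the skill itself may perform this fusion in SQL rather than in application code.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over every list it appears in,
    where rank is 1-based. Documents missing from a list get no contribution
    from that list.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result lists, best match first.
vector_hits = ["doc3", "doc1", "doc7"]  # e.g. ordered by pgvector distance
bm25_hits = ["doc1", "doc9", "doc3"]    # e.g. ordered by BM25 / full-text rank
fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
print([doc for doc, _ in fused])
```

Because RRF uses only ranks, it needs no normalization between cosine distances and BM25 scores, which live on unrelated scales; a document ranked well by both retrievers (here `doc1`) rises to the top.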
Use when asked to visualize sales territories, coverage areas, service regions, or geographic boundaries on interactive maps.
Transform CSV/Excel data into narrative reports with auto-generated insights, visualizations, and PDF export. Auto-detects patterns and creates plain-English summaries.
Standardize and format phone numbers with international support, validation, and multiple output formats.
Senior SaaS CFO / Financial Analyst (15+ years) specializing in financial modeling, projections, and exit strategy for bootstrapped and VC-backed SaaS companies. Activate when the user needs: (1) Revenue projections (1-5 years), (2) Exit valuation and multiples, (3) Unit economics analysis (CAC, LTV, payback), (4) Scenario modeling (conservative/base/optimistic), (5) Fundraising narratives backed by financials, (6) M&A due diligence financials, (7) SaaS metrics benchmarking, (8) Cohort analysis and churn modeling. Triggers: "proyecciones", "projections", "exit", "valuation", "ARR", "MRR", "multiples", "revenue forecast", "financial model", "exit strategy", "CAC", "LTV", "unit economics", "churn", "fundraising", "M&A", "acquisition", "5 year plan".
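The unit-economics item (CAC, LTV, payback) rests on a few simple ratios. The sketch below uses one common simplified formulation: LTV as gross-margin-adjusted monthly revenue per account divided by monthly churn, and CAC payback as the months of gross-margin contribution needed to recover CAC. The field names and the sample figures are hypothetical, and a real model (including whatever this skill produces) may instead use discounting, cohort-based churn, or expansion revenue.

```python
from dataclasses import dataclass


@dataclass
class UnitEconomics:
    arpa_monthly: float   # average revenue per account per month
    gross_margin: float   # fraction, e.g. 0.8 for 80%
    monthly_churn: float  # fraction of accounts lost per month
    cac: float            # customer acquisition cost per account

    def monthly_contribution(self) -> float:
        # Gross profit generated by one account each month.
        return self.arpa_monthly * self.gross_margin

    def ltv(self) -> float:
        # Simplified LTV: expected lifetime (1 / churn months) times
        # monthly gross profit. Assumes constant churn, no discounting.
        return self.monthly_contribution() / self.monthly_churn

    def ltv_to_cac(self) -> float:
        return self.ltv() / self.cac

    def cac_payback_months(self) -> float:
        # Months of gross profit needed to recover the acquisition cost.
        return self.cac / self.monthly_contribution()


# Illustrative inputs only, not benchmarks.
m = UnitEconomics(arpa_monthly=100.0, gross_margin=0.8, monthly_churn=0.02, cac=1200.0)
print(round(m.ltv(), 2), round(m.ltv_to_cac(), 2), round(m.cac_payback_months(), 2))
```

Keeping the inputs in one small record makes scenario modeling straightforward: conservative, base, and optimistic cases are just three instances with different churn or margin assumptions.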
Diagnose ClickHouse Kafka engine health, consumer status, thread pool capacity, and consumption issues. Use for Kafka lag, consumer errors, and thread starvation.
Read and write FASTA, GenBank, and FASTQ files. Supports sequence manipulation (complement, translation) and indexed random access via faidx. For NGS formats (SAM/BAM/VCF), use pysam; for BLAST, use gget or blat-integration.
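To make the "complement, translate" operations concrete, here is a dependency-free Python sketch of reverse complement and translation with the standard genetic code (NCBI table 1). It is an illustration only: it assumes DNA input, maps any codon containing an ambiguous base such as `N` to `X`, and ignores alternative codon tables. The skill itself most likely relies on a library such as Biopython, which is an assumption not stated in the entry.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons enumerated in TCAG order,
# '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

# Base-pairing complement, preserving case; N stays N.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse (the opposite strand read 5' to 3')."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str, to_stop: bool = False) -> str:
    """Translate DNA codon by codon; a trailing partial codon is ignored."""
    seq = seq.upper()
    protein = []
    for i in range(0, len(seq) - len(seq) % 3, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # X: codon with ambiguous bases
        if to_stop and aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(reverse_complement("ATGCGT"))       # ACGCAT
print(translate("ATGGCCTAA"))             # MA*
print(translate("ATGGCCTAA", to_stop=True))  # MA
```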
Use when the user mentions "Polars", "fast dataframe", "lazy evaluation", or "Arrow backend", or asks about a "pandas alternative", "parallel dataframe", "large CSV processing", "ETL pipeline", or the "expression API".
Comprehensive data validation using Pydantic v2 with data quality monitoring and schema alignment for PlanetScale PostgreSQL. Use when implementing API validation, database schema alignment, or data quality assurance. Triggers: 'validation', 'Pydantic', 'schema', 'data quality'.
Execute read-only SQL queries against PostgreSQL databases. Use when: (1) querying PostgreSQL data, (2) exploring schemas/tables, (3) running SELECT queries for analysis, (4) checking database contents. Supports multiple database connections, each with a description used to select the right one automatically. Blocks all write operations (INSERT, UPDATE, DELETE, DROP, etc.) for safety.
Optimize trading strategy parameters using VectorBT. Tests parameter combinations and generates heatmaps of the results.
Split an Excel workbook into separate Excel files, one per worksheet. Use cases: (1) splitting multi-worksheet Excel files into separate files, (2) extracting specific worksheets as standalone files, (3) distributing worksheets from merged workbooks, (4) creating copies of worksheets for separate processing or distribution.