Found 68 Skills
Chapter 2: Data collection quality standards and validation methods
Parse and explain HL7 v2.5 IHE PAM (Patient Administration Management) messages. Identifies message type, extracts segments (MSH, EVN, PID, PV1, PV2), validates structure, and provides detailed explanations of ADT messages for patient administration workflows.
Plan a migration onto MotherDuck. Use when moving from Snowflake, Redshift, PostgreSQL, dbt-heavy stacks, or lakehouse tooling and the key decisions are target pattern, cutover slices, validation, rollback, and native-versus-DuckLake posture.
Guidance for counting tokens in datasets, particularly from HuggingFace or similar sources. This skill should be used when tasks involve counting tokens in datasets, understanding dataset schemas, filtering by categories/domains, or working with tokenizers. It helps avoid common pitfalls like incomplete field identification and ambiguous terminology interpretation.
Probability, distributions, hypothesis testing, and statistical inference. Use for A/B testing, experimental design, or statistical validation.
Digital archiving workflows with AI enrichment, entity extraction, and knowledge graph construction. Use when building content archives, implementing AI-powered categorization, extracting entities and relationships, or integrating multiple data sources. Covers patterns from the Jay Rosen Digital Archive project.
The drum sounds. Bear and Bloodhound gather for safe data movement. Use when migrating data that requires both careful movement and codebase understanding.
Standards and best practices for writing LookML tests to ensure data integrity, accuracy, and logic validation.
Validate data at every layer it passes through to make bugs impossible. Use when invalid data causes failures deep in execution, requiring validation at multiple system layers.
pytest, data validation, Great Expectations, and quality assurance for data systems
Search and extract contact information for people or companies including names, phone numbers, emails, job titles, and LinkedIn profiles. Aggregates data from multiple sources and provides enriched contact details. Use when users need to find contact information, build prospect lists, or enrich existing contact data.
Use to define schemas, topic tags, and lineage metadata for enriched signals.