Found 23 Skills
Expert data scientist for advanced analytics, machine learning, and statistical modeling. Handles complex data analysis, predictive modeling, and business intelligence. Use PROACTIVELY for data analysis tasks, ML modeling, statistical analysis, and data-driven insights.
Neo4j Graph Data Science (GDS) plugin — graph projection, algorithm execution, execution modes (stream/stats/mutate/write), memory estimation, and the GDS Python client (graphdatascience v1.21). Use when running gds.pageRank, gds.louvain, gds.wcc, gds.fastRP, gds.knn, gds.betweenness, gds.nodeSimilarity, or any gds.* procedure; projecting named in-memory graphs with gds.graph.project or graph.project; chaining algorithms with mutate mode; computing node embeddings for ML; building recommendation systems with FastRP + KNN. Also triggers on GraphDataScience, GdsSessions, graph catalog operations, ML pipelines, node classification, link prediction. Does NOT cover Aura Graph Analytics serverless sessions — use neo4j-aura-graph-analytics-skill. Does NOT handle Cypher authoring — use neo4j-cypher-skill. Does NOT cover driver setup — use neo4j-driver-python-skill or other driver skill.
Read, modify, execute, and convert Jupyter notebooks programmatically. Use when working with .ipynb files for data science workflows, including editing cells, clearing outputs, or converting to other formats.
This skill should be used when the user asks to "learn from Kaggle", "study Kaggle solutions", "analyze Kaggle competitions", or mentions Kaggle competition URLs. Provides access to extracted knowledge from winning Kaggle solutions across NLP, CV, time series, tabular, and multimodal domains.
Use when the user needs ML pipelines, statistical analysis, data preprocessing, feature engineering, model selection, experiment tracking, or data visualization. Triggers: dataset exploration, model training, feature engineering, hyperparameter tuning, experiment tracking setup, statistical hypothesis testing, visualization creation.
Adaptive multi-agent framework for automated data science tasks, with planning, execution, and validation.
Decide where files live in an ML experimentation project: reusable code in `src/<pkg>/`, one `# %%` script per experiment in `experiments/`, design notes + index in `journal/`, reports in `reports/`, agent-only probes in `scratch/`, narrative digest in `overview/summary.md`. Owns the layout, the file-creation rules (one file per experiment, ask before editing), and the jupytext `# %%` script convention. Never imposes `data/` — the user owns that. TRIGGER — any of: - Starting a new ML project / scaffolding a workspace. - About to create the first experiment file in a project. - About to create `src/<pkg>/data.py` / `features.py` / `pipeline.py` / `evaluate.py` for the first time. - About to write a `.ipynb` for experimentation — redirect to a `# %%` script under `experiments/`. - User asks where something should live, how to organize the project, or how to set up the workspace. - About to add a new experiment iteration — decide new file vs edit existing (ask the user). SKIP when: the file is clearly part of an already-populated module (e.g., adding a function to existing `features.py`); pure refactor inside a single existing file; pipeline declaration mechanics (`build-ml-pipeline`); evaluation mechanics (`evaluate-ml-pipeline`); skore symbol lookup (`python-api`). HOW TO USE: **first run the Detection table** below — if any signal matches, glue to existing conventions (do not rename or move folders). If no signal matches, scaffold the default layout. **Emit the Pre-flight checklist as visible text and read the Stop conditions before any file is created or edited.** Use templates in `templates/`; copy and adapt, do not rewrite from scratch.
R 4.4+ development specialist covering tidyverse, ggplot2, Shiny, and data science patterns. Use when developing data analysis pipelines, visualizations, or Shiny applications.
Neo4j graph database with Cypher query language. Use for graph-based data.
Comprehensive statistical analysis for research, experiments, and data science. Covers hypothesis testing, effect sizes, confidence intervals, Bayesian methods, regression, and advanced techniques. Emphasizes correct interpretation and avoiding common statistical mistakes. Use when ", " mentioned.
Iterative Python via live Jupyter kernel (hamelnb).