Build clinical/healthcare deep-learning pipelines with PyHealth — loading EHR/signal/imaging datasets (MIMIC-III/IV, eICU, OMOP, SleepEDF, ChestXray14, EHRShot), defining tasks (mortality, readmission, length-of-stay, drug recommendation, sleep staging, ICD coding, EEG events), instantiating models (Transformer, RETAIN, GAMENet, SafeDrug, MICRON, StageNet, AdaCare, CNN/RNN/MLP), training with the PyHealth Trainer, computing clinical metrics, and using medical code utilities (ICD/ATC/NDC/RxNorm lookup and cross-mapping). Use this skill whenever the user mentions PyHealth, MIMIC, eICU, OMOP, EHR modeling, clinical prediction, drug recommendation, sleep staging, medical code mapping, ICD/ATC codes, or any healthcare ML pipeline that fits the dataset → task → model → trainer → metrics pattern, even if "PyHealth" isn't named explicitly.
npx skill4agent add crazymsn/academic-skills pyhealth
Dataset → Task → Model → Trainer → Metrics
uv
# Create a project with the right Python
uv init my-pyhealth-project
cd my-pyhealth-project
uv python pin 3.12
# Add PyHealth (this also pulls in PyTorch and friends)
uv add pyhealth
# Run scripts inside the env
uv run python train.py
uv run --with pyhealth python script.py
uv add pyhealth==1.16
references/installation.md
from pyhealth.datasets import MIMIC3Dataset, split_by_patient, get_dataloader
from pyhealth.tasks import MortalityPredictionMIMIC3
from pyhealth.models import Transformer
from pyhealth.trainer import Trainer
from pyhealth.metrics.binary import binary_metrics_fn
# 1. Dataset — raw patient registry
base = MIMIC3Dataset(
    root="https://storage.googleapis.com/pyhealth/Synthetic_MIMIC-III/",
    tables=["DIAGNOSES_ICD", "PROCEDURES_ICD", "PRESCRIPTIONS"],
)
# 2. Task — converts patients into supervised samples
samples = base.set_task(MortalityPredictionMIMIC3())
# 3. Split + DataLoaders (split by patient to avoid leakage)
train_ds, val_ds, test_ds = split_by_patient(samples, [0.8, 0.1, 0.1])
train_loader = get_dataloader(train_ds, batch_size=32, shuffle=True)
val_loader = get_dataloader(val_ds, batch_size=32, shuffle=False)
test_loader = get_dataloader(test_ds, batch_size=32, shuffle=False)
# 4. Model — must be passed the SampleDataset, not the BaseDataset
model = Transformer(dataset=samples)
# 5. Train + evaluate
trainer = Trainer(model=model)
trainer.train(
    train_dataloader=train_loader,
    val_dataloader=val_loader,
    epochs=50,
    monitor="pr_auc",
)
y_true, y_prob, _ = trainer.inference(test_loader)
print(binary_metrics_fn(y_true, y_prob, metrics=["pr_auc", "roc_auc"]))
assets/starter_pipeline.py
SampleDataset, BaseDataset
MIMIC3Dataset(...), BaseDataset.set_task(task), SampleDataset, base
split_by_patient, split_by_visit
MortalityPredictionMIMIC3, MortalityPredictionMIMIC4, InHospitalMortalityMIMIC4, references/tasks.md
monitor, "pr_auc", "roc_auc", "pr_auc_samples", "jaccard_samples", "accuracy", "f1_macro"
ehr_root=, root=, cache_dir=, cache_dir

| If the user is asking about… | Read |
|---|---|
| Installing, env setup, MIMIC access, GPU | |
| Which dataset class to use, loading patterns, splitting | |
| What prediction task to choose (mortality, readmission, drug rec, sleep…) | |
| Picking a model architecture, model-specific arguments | |
| Looking up or cross-mapping ICD/ATC/NDC/RxNorm/CCS codes, tokenizers | |
| End-to-end recipes for common scenarios | |
tasks.md, models.md, examples.md, Trainer, https://storage.googleapis.com/pyhealth/Synthetic_MIMIC-III/
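
The quick-start pipeline splits by patient "to avoid leakage". As a rough illustration of that idea, here is a standard-library sketch that assigns whole patients, rather than individual visits, to each split. It is not PyHealth's implementation: the helper name `split_samples_by_patient`, the `patient_id` field, and the toy data are all assumptions made for this example.

```python
import random
from collections import defaultdict


def split_samples_by_patient(samples, ratios, seed=0):
    """Hypothetical helper: put every sample of a patient in the same split."""
    # Group sample indices by patient so one patient never spans two splits.
    by_patient = defaultdict(list)
    for idx, sample in enumerate(samples):
        by_patient[sample["patient_id"]].append(idx)

    # Shuffle patients (not samples) with a fixed seed for reproducibility.
    patient_ids = sorted(by_patient)
    random.Random(seed).shuffle(patient_ids)

    # Cut the shuffled patient list at the train and validation boundaries;
    # whatever remains becomes the test split.
    n_train = int(len(patient_ids) * ratios[0])
    n_val = int(len(patient_ids) * ratios[1])
    groups = (
        patient_ids[:n_train],
        patient_ids[n_train:n_train + n_val],
        patient_ids[n_train + n_val:],
    )
    return tuple(
        [samples[i] for pid in group for i in by_patient[pid]]
        for group in groups
    )


def patients_in(split):
    """Return the set of patient IDs present in a split."""
    return {sample["patient_id"] for sample in split}


# Toy data (made up): 10 patients with 3 visits each.
toy_samples = [
    {"patient_id": f"p{p}", "visit_id": f"p{p}-v{v}", "label": (p + v) % 2}
    for p in range(10)
    for v in range(3)
]

train, val, test = split_samples_by_patient(toy_samples, [0.8, 0.1, 0.1])
print(len(train), len(val), len(test))
print(patients_in(train) & patients_in(test))
```

Because the cut is made over patients, no patient's visits can appear in both the training and the evaluation data. A visit-level split makes no such guarantee, and a model could then score well simply by recognizing patients it has already seen.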