BIDS-BCHSI Research Xchange Forum
March 1, 2021 @ 12:30 pm - 1:30 pm
Research Talk — Haley Hunter-Zinck, 2019-2021 I4H Fellow.
TITLE: Comparison of synthetic electronic health record data generation techniques for training predictive clinical models
ABSTRACT: Synthetic data is gaining attention as a way to facilitate access to electronic health record (EHR) data for building predictive clinical models. Several methodologies currently exist for generating synthetic data. Some rely on access to real, patient-level EHR data, such as methods based on generative adversarial networks or other machine learning or statistical techniques. Others, such as Synthea, do not depend on record-level EHR access and instead use publicly available, aggregate data resources. Here, we perform quantitative and qualitative comparisons of different synthetic data generation methodologies for the purpose of building clinical predictive models using EHR data. We formulate comparable synthetic datasets with CorGAN and Synthea, using the Veterans Health Administration’s COVID-19 Shared Data Resource as a template and a benchmark. Using each synthetic dataset, we train predictive models to predict COVID-19 outcomes, such as transfer to the intensive care unit or mortality, and validate the synthetically trained models on a real test dataset to measure and compare model utility. We also qualitatively compare the synthetic data generators on privacy risks, required data inputs, manual effort, and the computational requirements for training the generators.
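The evaluation design described in the abstract (train on synthetic data, validate on real data) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the "real" records are simulated toy data, the generator is a simple per-outcome Gaussian stand-in rather than CorGAN or Synthea, and all function names (`make_real`, `fit_generator`, `sample_synthetic`, `train_logistic`, `auc`) are hypothetical. It is not the method or the data used in the talk.

```python
import math
import random

def make_real(n, rng):
    """Toy stand-in for real records: two numeric features, one binary outcome."""
    rows = []
    for _ in range(n):
        y = 1 if rng.random() < 0.3 else 0
        # Outcome-positive records are shifted upward in both features.
        x = [rng.gauss(1.5 * y, 1.0), rng.gauss(1.0 * y, 1.0)]
        rows.append((x, y))
    return rows

def fit_generator(rows):
    """Toy generator (assumption): per-outcome prevalence, feature means and SDs."""
    params = {}
    for label in (0, 1):
        xs = [x for x, y in rows if y == label]
        stats = []
        for j in range(len(xs[0])):
            col = [x[j] for x in xs]
            mean = sum(col) / len(col)
            var = sum((v - mean) ** 2 for v in col) / len(col)
            stats.append((mean, math.sqrt(var)))
        params[label] = (len(xs) / len(rows), stats)
    return params

def sample_synthetic(params, n, rng):
    """Draw synthetic records from the fitted per-outcome Gaussians."""
    rows = []
    p_positive = params[1][0]
    for _ in range(n):
        y = 1 if rng.random() < p_positive else 0
        x = [rng.gauss(m, s) for m, s in params[y][1]]
        rows.append((x, y))
    return rows

def train_logistic(rows, epochs=200, lr=0.1):
    """Logistic regression fitted with plain batch gradient descent."""
    d = len(rows[0][0])
    w, b, n = [0.0] * d, 0.0, len(rows)
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for x, y in rows:
            z = b + sum(wi * xi for wi, xi in zip(w, x))
            err = 1.0 / (1.0 + math.exp(-z)) - y
            for j in range(d):
                gw[j] += err * x[j]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def auc(model, rows):
    """AUC: chance that a random positive scores above a random negative."""
    w, b = model
    scored = [(b + sum(wi * xi for wi, xi in zip(w, x)), y) for x, y in rows]
    pos = [s for s, y in scored if y == 1]
    neg = [s for s, y in scored if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

rng = random.Random(0)
real_train = make_real(1000, rng)
real_test = make_real(500, rng)   # held-out real data, never seen by the generator
synthetic_train = sample_synthetic(fit_generator(real_train), 1000, rng)

auc_real = auc(train_logistic(real_train), real_test)
auc_synth = auc(train_logistic(synthetic_train), real_test)
print(f"trained on real data:      AUC = {auc_real:.3f}")
print(f"trained on synthetic data: AUC = {auc_synth:.3f}")
```

The gap between the two AUC values is one simple way to express model utility: the smaller the gap, the less predictive signal was lost by training on synthetic rather than real records.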
Register to receive the virtual access link.