Journal article · Open Access · Nov 01, 2022

Synthetic data as an enabler for machine learning applications in medicine

iScience Vol. 25 No. 11 pp. 105331 · Elsevier BV
DOI: 10.1016/j.isci.2022.105331

References (60)
[1]
Abadi "Deep learning with differential privacy" (2016)
[2]
Alaa "How faithful is your synthetic data? Sample-level metrics for evaluating and auditing generative models" (2022)
[3]
Ali "Classification with class imbalance problem: a review" (2015)
[4]
Bauchner "Data sharing: an ethical and scientific imperative" JAMA (2016) 10.1001/jama.2016.2420
[5]
Beaulieu-Jones "Privacy-preserving generative deep neural networks support clinical data sharing" Circ. Cardiovasc. Qual. Outcomes (2019) 10.1161/circoutcomes.118.005122
[6]
Bellovin "Privacy and synthetic datasets" Stanford Technol. Law Rev. (2018)
[7]
Bentzen "Remove obstacles to sharing health data with researchers outside of the European Union" Nat. Med. (2021) 10.1038/s41591-021-01460-0
[8]
Bergen "3D PET image generation with tumour masks using TGAN" (2022)
[9]
Boedihardjo "Private measures, random walks, and synthetic data" CoRR (2022)
[10]
Boenisch "When the curious abandon honesty: federated learning is not private" ArXiv (2021)
[11]
Bowen "The philosophy of differential privacy" Not. Am. Math. Soc. (2021)
[12]
Mateusz Buda, Atsuto Maki, Maciej A. Mazurowski "A systematic study of the class imbalance problem in convolutional neural networks" Neural Networks (2018) 10.1016/j.neunet.2018.07.011
[13]
Carlini "Membership inference attacks from first principles" (2022)
[14]
Carlini "Extracting training data from large language models" (2021)
[15]
Richard J. Chen, Ming Y. Lu, Tiffany Y. Chen et al. "Synthetic data in machine learning for medicine and healthcare" Nature Biomedical Engineering (2021) 10.1038/s41551-021-00751-8
[16]
Choi "Generating multi-label discrete patient records using generative adversarial networks" (2017)
[17]
Dhariwal "Diffusion models beat GANs on image synthesis" (2021)
[18]
Domingo-Ferrer "The limits of differential privacy (and its misuse in data release and machine learning)" Commun. ACM (2021) 10.1145/3433638
[19]
Dwork "Calibrating noise to sensitivity in private data analysis" (2006)
[20]
El Emam "Evaluating identity disclosure risk in fully synthetic health data: model development and validation" J. Med. Internet Res. (2020) 10.2196/23139
[21]
El Emam "Utility metrics for evaluating synthetic health data generation methods: validation study" JMIR Med. Inform. (2022) 10.2196/35734
[22]
El Emam (2020)
[23]
Fredrikson "Model inversion attacks that exploit confidence information and basic countermeasures" (2015)
[24]
Garfinkel "Issues encountered deploying differential privacy" (2018)
[25]
Han "GAN-based synthetic brain MR image generation" (2018)
[26]
Heusel "GANs trained by a two time-scale update rule converge to a local Nash equilibrium" (2017)
[27]
Hutson "Robo-writers: the rise and risks of language-generating AI" Nature (2021) 10.1038/d41586-021-00530-0
[28]
James "Synthetic data use: exploring use cases to optimise data utility" Discov. Artif. Intell. (2021) 10.1007/s44163-021-00016-y
[29]
Jo "Lessons from archives: strategies for collecting sociocultural data in machine learning" (2020)
[30]
Jordon "Synthetic Data - what, why and how?" CoRR (2022)
[31]
Kalkman "Responsible data sharing in international health research: a systematic review of principles and norms" BMC Med. Ethics (2019) 10.1186/s12910-019-0359-9
[32]
Karras "Progressive growing of GANs for improved quality, stability, and variation" (2018)
[33]
Lander
[34]
Levine "Synthesis of diagnostic quality cancer pathology images by generative adversarial networks" J. Pathol. (2020) 10.1002/path.5509
[35]
Liu "MACE: a flexible framework for membership privacy estimation in generative models" ArXiv (2020)
[36]
Mandl "HIPAA and the leak of 'deidentified' EHR data" N. Engl. J. Med. (2021)
[37]
Melis "Exploiting unintended feature leakage in collaborative learning" (2019)
[38]
Mukherjee "privGAN: protecting GANs from membership inference attacks at low cost to utility" Proc. Priv. Enhanc. Technol. (2021)
[39]
Murakonda "ML privacy meter: aiding regulatory compliance by quantifying the privacy risks of machine learning" CoRR (2020)
[40]
Nalepa "Data augmentation for brain-tumor segmentation: a review" Front. Comput. Neurosci. (2019) 10.3389/fncom.2019.00083
[41]
Naudet "Data sharing and reanalysis of randomized controlled trials in leading biomedical journals with a full data sharing policy: survey of studies published in the BMJ and PLOS Medicine" BMJ (2018) 10.1136/bmj.k400
[42]
Oprisanu "On utility and privacy in synthetic genomic data" (2022)
[43]
Oreiller "Head and neck tumor segmentation in PET/CT: the HECKTOR challenge" Med. Image Anal. (2022) 10.1016/j.media.2021.102336
[44]
Polanin "Efforts to retrieve individual participant data sets for use in a meta-analysis result in moderate data sharing but many data sets remain missing" J. Clin. Epidemiol. (2018) 10.1016/j.jclinepi.2017.12.014
[45]
Rabesandratana "European data law is impeding studies on diabetes and Alzheimer’s, researchers warn" Science (2019) 10.1126/science.366.6468.936
[46]
Rajotte "Reducing bias and increasing utility by federated generative modeling of medical images using a centralized adversary" (2021)
[47]
Read "Data-sharing practices in publications funded by the Canadian Institutes of Health Research: a descriptive analysis" CMAJ Open (2021) 10.9778/cmajo.20200303
[48]
Nicola Rieke, Jonny Hancox, Wenqi Li et al. "The future of digital health with federated learning" npj Digital Medicine (2020) 10.1038/s41746-020-00323-1
[49]
Rocher "Estimating the success of re-identifications in incomplete datasets using generative models" Nat. Commun. (2019) 10.1038/s41467-019-10933-3
[50]
Salim "Synthetic patient generation: a deep learning approach using variational autoencoders" ArXiv (2018)

Showing 50 of 60 references

Metrics
Citations: 102
References: 60
Details
Published: Nov 01, 2022
Vol/Issue: 25(11)
Pages: 105331
Funding
Canadian Institute for Advanced Research
Institut de Valorisation des Données
Cite This Article
Jean-Francois Rajotte, Robert Bergen, David L. Buckeridge, et al. (2022). Synthetic data as an enabler for machine learning applications in medicine. iScience, 25(11), 105331. https://doi.org/10.1016/j.isci.2022.105331