Journal article · Open Access · Jan 18, 2022

The Secondary Use of Electronic Health Records for Data Mining: Data Characteristics and Challenges

Abstract
The primary objective of implementing Electronic Health Records (EHRs) is to improve the management of patients’ health-related information. These records have also been used extensively for the secondary purposes of clinical research and improving healthcare practice. EHRs provide a rich set of information that includes demographics, medical history, medications, laboratory test results, and diagnoses. Data mining and analytics techniques have exploited EHR information to study patient cohorts for a range of clinical and research applications, such as phenotype extraction, precision medicine, intervention evaluation, and disease prediction, detection, and progression. However, the diversity of data types in EHRs, each with its own characteristics, poses many challenges to the use of EHR data. In this article, we provide an overview of the information found in EHR systems and of the characteristics of that information that can be exploited for secondary applications. We first discuss the different types of data stored in EHRs, followed by the data transformations necessary for data analysis and mining. We then discuss the data quality issues and characteristics of EHRs, along with the relevant methods used to address them. Finally, this survey highlights how the various data types are used in different applications. This article can therefore serve as a primer for researchers seeking to understand the use of EHRs for data mining and analytics.
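The abstract refers to the data transformations needed before EHR data can be mined. As a minimal sketch, assuming a hypothetical event-level record format (patient ID, event kind, code, optional numeric value) rather than any real EHR schema or terminology, the Python below pivots diagnoses, medications, and laboratory results into one feature row per patient: coded events become 0/1 presence flags and repeated lab results are averaged. Labs never recorded for a patient are left as `None` instead of being filled in, because how to treat such missing values is itself a data quality decision. The `EHREvent` type, the `to_feature_matrix` function, and the codes used are illustrative only.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class EHREvent:
    """One hypothetical event-level EHR record (not a real EHR schema)."""
    patient_id: str
    kind: str                      # "diagnosis", "medication", or "lab"
    code: str                      # illustrative code or test name
    value: Optional[float] = None  # numeric result for lab events only


def to_feature_matrix(
    events: Iterable[EHREvent],
) -> Tuple[List[str], Dict[str, List[Optional[float]]]]:
    """Pivot event-level records into one feature row per patient.

    Diagnoses and medications become 0/1 presence flags; repeated lab
    results are averaged. A lab never recorded for a patient is left as
    None so that a later step can decide how to treat the missing value.
    """
    flags: Dict[str, set] = defaultdict(set)
    labs: Dict[str, Dict[str, List[float]]] = defaultdict(lambda: defaultdict(list))
    feature_names = set()

    for e in events:
        if e.kind in ("diagnosis", "medication"):
            name = f"{e.kind}:{e.code}"
            flags[e.patient_id].add(name)
        elif e.kind == "lab" and e.value is not None:
            name = f"lab:{e.code}:mean"
            labs[e.patient_id][name].append(e.value)
        else:
            # Unknown kinds and lab events without a value are skipped.
            continue
        feature_names.add(name)

    columns = sorted(feature_names)
    rows: Dict[str, List[Optional[float]]] = {}
    for pid in sorted(set(flags) | set(labs)):
        row: List[Optional[float]] = []
        for col in columns:
            if col.startswith("lab:"):
                values = labs[pid].get(col)
                row.append(mean(values) if values else None)
            else:
                row.append(1 if col in flags[pid] else 0)
        rows[pid] = row
    return columns, rows


# Usage with made-up records for two patients.
events = [
    EHREvent("p1", "diagnosis", "E11"),
    EHREvent("p1", "lab", "HbA1c", 7.2),
    EHREvent("p1", "lab", "HbA1c", 6.8),
    EHREvent("p2", "medication", "metformin"),
]
columns, rows = to_feature_matrix(events)
print(columns)     # ['diagnosis:E11', 'lab:HbA1c:mean', 'medication:metformin']
print(rows["p2"])  # [0, None, 1]
```

Presence flags and mean aggregation are only one simple choice; counts, the most recent value, or time-windowed summaries may suit a given application better.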

Metrics
Citations: 93
References: 284
Details
Published: Jan 18, 2022
Vol/Issue: 55(2)
Pages: 1-40
Funding: Telstra Health and the Digital Health Cooperative Research Centre; Australian Government’s Department of Industry, Science, Energy and Resources
Cite This Article
Tabinda Sarwar, Sattar Seifollahi, Jeffrey Chan, et al. (2022). The Secondary Use of Electronic Health Records for Data Mining: Data Characteristics and Challenges. ACM Computing Surveys, 55(2), 1-40. https://doi.org/10.1145/3490234
Related: You May Also Like
Data clustering · A. K. Jain, M. N. Murty · 1999 · 9,568 citations
Anomaly detection · Varun Chandola, Arindam Banerjee · 2009 · 8,799 citations
Machine learning in automated text categorization · Fabrizio Sebastiani · 2002 · 5,027 citations
Object tracking · Alper Yilmaz, Omar Javed · 2006 · 3,632 citations
A Survey on Bias and Fairness in Machine Learning · Ninareh Mehrabi, Fred Morstatter · 2021 · 3,466 citations