journal article Open Access Jul 05, 2023

Exploring Heterogeneity with Category and Cluster Analyses for Mixed Data

Stats Vol. 6 No. 3 pp. 747-762 · MDPI AG
DOI: 10.3390/stats6030048
Abstract
Precision medicine aims to overcome the traditional one-model-fits-the-whole-population approach, which is unable to detect heterogeneous disease patterns or make accurate personalized predictions. Heterogeneity is particularly relevant for patients with complications of type 2 diabetes, including diabetic kidney disease (DKD). We focus on a longitudinal DKD dataset, aiming to find specific subgroups of patients whose characteristics are associated with a similar response to therapeutic treatment. We develop an approach based on selected concepts from category theory and cluster analysis to explore individualized modeling and gain insight into disease evolution. This paper exploits the visualization tools provided by category theory and bridges abstract category-based work and real datasets. We build subgroups by deriving clusters of patients at different time points, considering a set of variables that characterize the patients' state. We analyze how specific variables affect disease progression and which drug combinations are more effective for each cluster of patients. The retrieved information can foster individualized strategies for DKD treatment.
Funding
European Union’s Horizon 2020 research and innovation program Award: 848011
Cite This Article
Veronica Distefano, Maria Mannone, Irene Poli (2023). Exploring Heterogeneity with Category and Cluster Analyses for Mixed Data. Stats, 6(3), 747-762. https://doi.org/10.3390/stats6030048