journal article Open Access Apr 12, 2022

Mitigating the Multicollinearity Problem and Its Machine Learning Approach: A Review

Mathematics Vol. 10 No. 8 pp. 1283 · MDPI AG
DOI: 10.3390/math10081283
Abstract
Technological advances have driven large-scale data collection across many fields, such as genomics and business intelligence, significantly increasing the number of variables and data points (observations) collected and stored. Although this presents opportunities to better model the relationship between predictors and response variables, it also causes serious problems during data analysis, one of which is multicollinearity. The two main approaches used to mitigate multicollinearity are variable selection methods and modified estimator methods. However, variable selection methods may negate efforts to collect more data, because new data may eventually be dropped from modeling, while recent studies suggest that optimization approaches via machine learning handle data with multicollinearity better than statistical estimators. Therefore, this study reviews the chronological development of methods for mitigating the effects of multicollinearity and gives up-to-date recommendations.
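As a rough illustration of the problem and of the two families of remedies named in the abstract, the sketch below diagnoses collinearity with the variance inflation factor (VIF), a common basis for variable selection, and compares ordinary least squares with ridge regression, one example of a modified estimator. Nothing here is taken from the article: the synthetic data, the helper names `vif` and `ridge`, and the penalty value are all assumptions chosen for illustration.

```python
import numpy as np

def vif(X):
    """VIF of each column: 1 / (1 - R^2) from regressing it on the other columns."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # intercept + remaining predictors
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        centered = y - y.mean()
        r2 = 1.0 - (resid @ resid) / (centered @ centered)
        out[j] = 1.0 / (1.0 - r2)
    return out

def ridge(X, y, lam):
    """Closed-form ridge estimate on centered data; lam = 0 gives ordinary least squares."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    p = X.shape[1]
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)

# Synthetic data (hypothetical): x2 is nearly a copy of x1, x3 is unrelated.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = 1.0 * x1 + 1.0 * x2 + 0.5 * x3 + rng.normal(size=n)

vifs = vif(X)                    # large for x1 and x2, close to 1 for x3
beta_ols = ridge(X, y, 0.0)      # unpenalized fit, unstable split between x1 and x2
beta_ridge = ridge(X, y, 10.0)   # penalized fit, coefficients shrunk toward zero

print("VIF:  ", np.round(vifs, 1))
print("OLS:  ", np.round(beta_ols, 2))
print("Ridge:", np.round(beta_ridge, 2))
```

A VIF-based selection rule would drop one of `x1` or `x2`, which discards a collected variable; ridge keeps both and instead trades a small bias for lower variance. This is the trade-off the abstract alludes to.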


Metrics
Citations: 463
References: 82
Details
Published: Apr 12, 2022
Vol/Issue: 10(8)
Pages: 1283
Funding
Ministry of Higher Education Award: FRGS/1/2019/STG06/UTAR/03/1
Ministry of Science and Technology of Taiwan Award: 109-2628-E-027-004–MY3
Ministry of Education of Taiwan Award: Official Document No. 1100156712
Cite This Article
Jireh Yi-Le Chan, Steven Mun Hong Leow, Khean Thye Bea, et al. (2022). Mitigating the Multicollinearity Problem and Its Machine Learning Approach: A Review. Mathematics, 10(8), 1283. https://doi.org/10.3390/math10081283