Assessing the Validity of k-Fold Cross-Validation for Model Selection: Evidence from Bankruptcy Prediction Using Random Forest and XGBoost

Vlad Teodorescu; Laura Obreja Brașoveanu

doi:10.3390/computation13050127

Journal article · Open Access · May 21, 2025

Computation Vol. 13 No. 5 pp. 127 · MDPI AG

Abstract

Predicting corporate bankruptcy is a key task in financial risk management, and selecting a machine learning model with superior generalization performance is crucial for prediction accuracy. This study evaluates the effectiveness of k-fold cross-validation as a model selection strategy for random forest and XGBoost classifiers, using a publicly available dataset of Taiwanese listed companies. We employ a nested cross-validation framework to assess the relationship between cross-validation (CV) and out-of-sample (OOS) performance across 40 different train/test partitions of the data. On average, k-fold cross-validation proves to be a valid selection technique when applied within a model class; however, it may fail for specific train/test splits. We find that 67% of the variability in model selection regret is explained by the particular train/test split, highlighting an irreducible uncertainty that real-world practitioners must contend with. The study also extensively explores hyperparameter tuning for both classifiers and reports the key insights. Additionally, we investigate practical implementation choices in k-fold cross-validation, such as the value of k and the prediction strategy. We conclude that k-fold cross-validation is effective for model selection on average and within a model class, but it can be unreliable for specific train/test splits or when comparing models from different classes; the latter issue warrants further investigation.
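The two quantities at the center of the abstract, model selection regret and the share of its variability attributable to the train/test split, can be made concrete with a minimal sketch. It assumes regret is defined as the gap between the best OOS score among the candidate models and the OOS score of the model ranked first by k-fold CV, with higher scores being better (e.g. AUC). The "share explained by the split" is computed here as a simple between-group variance ratio (eta-squared); the paper's exact regret definition and its statistical model for the 67% figure may differ, and a linear mixed-effects model with a random effect per split would be the more formal tool. The function names and the toy scores below are hypothetical and for illustration only.

```python
from statistics import mean


def selection_regret(candidates):
    """Regret of trusting k-fold CV to pick a model on one train/test split.

    candidates: list of (cv_score, oos_score) pairs, one per candidate model
    (e.g. hyperparameter settings), where higher scores are better.
    Returns best achievable OOS score minus OOS score of the CV winner.
    """
    # Model that k-fold CV would select (first one wins ties)
    chosen_oos = max(candidates, key=lambda c: c[0])[1]
    # Best OOS score any candidate actually achieved
    best_oos = max(oos for _, oos in candidates)
    return best_oos - chosen_oos


def between_split_share(regrets_by_split):
    """Share of total regret variance explained by the split (eta-squared).

    regrets_by_split: dict mapping a split id to the regrets observed on
    that split (e.g. under different CV settings or model classes).
    """
    all_r = [r for rs in regrets_by_split.values() for r in rs]
    grand = mean(all_r)
    ss_total = sum((r - grand) ** 2 for r in all_r)
    # Variability of per-split mean regrets around the grand mean
    ss_between = sum(
        len(rs) * (mean(rs) - grand) ** 2 for rs in regrets_by_split.values()
    )
    return ss_between / ss_total if ss_total > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical (cv_score, oos_score) pairs for three candidates on one split
    split_scores = [(0.90, 0.88), (0.92, 0.85), (0.89, 0.91)]
    print(selection_regret(split_scores))  # ≈ 0.06

    # Hypothetical regrets from two splits, two CV settings each
    toy = {"split_1": [0.00, 0.01], "split_2": [0.05, 0.06]}
    print(between_split_share(toy))  # ≈ 0.96
```

In the toy data, CV ranks the second candidate first even though the third generalizes best, so the regret is positive; and because most of the spread in regret lies between the two splits rather than within them, the split accounts for most of the variance.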

Cited By

36

Breaking the Regional Barriers: Identifying Determinants of Antenatal Care Access in Bangladesh for Improved Maternal Health Policy

Md. Salman, Mou Rani Sarker · 2025

Sustainable Development

Deep learning decodes species-specific codon usage signatures in Brassica from coding sequences

Anjum Shahzad, Muhammad Arfan · 2025

Scientific Reports

Details

Published: May 21, 2025
Vol/Issue: 13(5)
Pages: 127

Authors

Vlad Teodorescu

Finance Department, The Bucharest University of Economic Studies, Piața Romană 6, 010374 București, Romania

Laura Obreja Brașoveanu

Center of Financial and Monetary Research (CEFIMO), Finance Department, The Bucharest University of Economic Studies, Piața Romană 6, 010374 București, Romania

Funding

Bucharest University of Economic Studies

Cite This Article

Vlad Teodorescu, Laura Obreja Brașoveanu (2025). Assessing the Validity of k-Fold Cross-Validation for Model Selection: Evidence from Bankruptcy Prediction Using Random Forest and XGBoost. Computation, 13(5), 127. https://doi.org/10.3390/computation13050127