journal article Open Access Feb 21, 2022

Enhanced Credit Card Fraud Detection Model Using Machine Learning

Electronics Vol. 11 No. 4 pp. 662 · MDPI AG
DOI: 10.3390/electronics11040662
Abstract
The COVID-19 pandemic has limited people’s mobility to a certain extent, making it difficult to purchase goods and services offline, which has led to the creation of a culture of increased dependence on online services. One of the crucial issues with using credit cards is fraud, which is a serious challenge in the realm of online transactions. Consequently, there is a huge need to develop the best approach possible to using machine learning in order to prevent almost all fraudulent credit card transactions. This paper studies a total of 66 machine learning models based on two stages of evaluation. A real-world credit card fraud detection dataset of European cardholders is used in each model along with stratified K-fold cross-validation. In the first stage, nine machine learning algorithms are tested to detect fraudulent transactions. The best three algorithms are nominated to be used again in the second stage, with 19 resampling techniques used with each one of the best three algorithms. Out of 330 evaluation metric values that took nearly one month to obtain, the All K-Nearest Neighbors (AllKNN) undersampling technique along with CatBoost (AllKNN-CatBoost) is considered to be the best proposed model. Accordingly, the AllKNN-CatBoost model is compared with related works. The results indicate that the proposed model outperforms previous models with an AUC value of 97.94%, a Recall value of 95.91%, and an F1-Score value of 87.40%.
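The evaluation ideas named in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' pipeline: the function names (`stratified_folds`, `recall_f1`, `auc`) are hypothetical, the toy labels and scores are invented for illustration, and no resampling or CatBoost training is performed. It shows how stratified folds keep the fraud ratio similar across folds, how Recall, F1-Score and AUC are computed, and how the count of 66 models follows from the two stages (9 + 3 × 19). Dividing 330 metric values by 66 models gives five values per model, which is an inference from the abstract's numbers rather than something it states.

```python
from collections import defaultdict


def stratified_folds(labels, k):
    """Assign sample indices to k folds so each fold keeps a similar class ratio."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    # Deal the indices of each class round-robin across the folds.
    for indices in by_class.values():
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def recall_f1(y_true, y_pred):
    """Return (recall, F1) for the positive class, labelled 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return recall, f1


def auc(y_true, scores):
    """AUC as the chance a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Model count stated in the abstract: nine algorithms in stage one,
# then the best three combined with 19 resampling techniques in stage two.
stage_one_models = 9
stage_two_models = 3 * 19
total_models = stage_one_models + stage_two_models  # 66

# Inferred, not stated: 330 metric values over 66 models is 5 per model.
metrics_per_model = 330 // total_models
```

The pairwise AUC above is quadratic in the number of samples, so it suits only small illustrations; a rank-based computation would be the usual choice on a dataset of real size.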

Metrics
168
Citations
57
References
Details
Published
Feb 21, 2022
Vol/Issue
11(4)
Pages
662
Cite This Article
Noor Saleh Alfaiz, Suliman Mohamed Fati (2022). Enhanced Credit Card Fraud Detection Model Using Machine Learning. Electronics, 11(4), 662. https://doi.org/10.3390/electronics11040662