journal article Jan 01, 2022

Two-stage feature selection for classification of gene expression data based on an improved Salp Swarm Algorithm

DOI: 10.3934/mbe.2022641
Abstract
Microarray technology has developed rapidly in recent years, producing large volumes of ultra-high-dimensional gene expression data. However, because the number of genes far exceeds the number of samples, screening important genes from such data is very challenging. For small-sample, high-dimensional biomedical data, this paper proposes a two-stage feature selection framework that combines wrapper, embedded and filter methods to avoid the curse of dimensionality. In the first stage, the framework uses weighted gene co-expression network analysis (WGCNA), random forest and minimal redundancy maximal relevance (mRMR) for feature selection. In the second stage, a new gene selection method based on an improved binary Salp Swarm Algorithm is proposed, which is combined with machine learning methods to adaptively select feature subsets suited to the classification algorithm. Finally, classification accuracy is evaluated using six methods: LightGBM, RF, SVM, XGBoost, MLP and KNN. To verify the performance of the framework and the effectiveness of the proposed algorithm, the number of genes selected and the classification accuracy were compared with those of five other intelligent optimization algorithms. The results show that the proposed framework achieves an accuracy equal to or higher than that of other advanced intelligent algorithms on 10 datasets, exceeding 97.6% on all of them. This indicates that the proposed method can address feature selection for high-dimensional data; the framework is not restricted to a particular dataset and can be applied to other fields involving feature selection.
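The two-stage idea in the abstract can be illustrated with a minimal sketch. This is not the authors' method: the first stage here is a plain correlation filter standing in for WGCNA, random forest and mRMR, and the second stage is a basic binary Salp Swarm search with a sigmoid transfer function rather than the improved variant the paper proposes. All function names, the 1-nearest-neighbour wrapper, the fitness weighting `alpha` and the toy data are assumptions made for illustration only.

```python
import math
import random

def relevance(column, labels):
    """Absolute Pearson correlation between one feature and the 0/1 labels."""
    n = len(labels)
    mx, my = sum(column) / n, sum(labels) / n
    cov = sum((x - mx) * (t - my) for x, t in zip(column, labels))
    vx = sum((x - mx) ** 2 for x in column)
    vy = sum((t - my) ** 2 for t in labels)
    return abs(cov) / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0

def filter_stage(X, y, keep):
    """Stage 1 (stand-in filter): indices of the `keep` most relevant features."""
    scores = [relevance([row[j] for row in X], y) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:keep]

def loo_accuracy(X, y, feats):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on `feats`."""
    if not feats:
        return 0.0
    hits = 0
    for i, xi in enumerate(X):
        nearest = min((k for k in range(len(X)) if k != i),
                      key=lambda k: sum((xi[j] - X[k][j]) ** 2 for j in feats))
        hits += y[nearest] == y[i]
    return hits / len(X)

def fitness(mask, X, y, cand, alpha=0.99):
    """Weighted sum of accuracy and subset sparsity (higher is better)."""
    feats = [cand[j] for j, bit in enumerate(mask) if bit]
    return alpha * loo_accuracy(X, y, feats) + (1 - alpha) * (1 - len(feats) / len(cand))

def binarize(position, rng):
    """Sigmoid transfer function: map continuous positions to a 0/1 mask."""
    return [1 if rng.random() < 1 / (1 + math.exp(-p)) else 0 for p in position]

def binary_ssa(X, y, cand, n_salps=10, n_iter=20, lb=-4.0, ub=4.0, seed=0):
    """Stage 2 (wrapper): basic binary Salp Swarm search over the candidates."""
    rng = random.Random(seed)
    dim = len(cand)
    swarm = [[rng.uniform(lb, ub) for _ in range(dim)] for _ in range(n_salps)]
    best_mask, best_fit, food = None, -1.0, swarm[0][:]
    for it in range(1, n_iter + 1):
        # Evaluate every salp and remember the best position as the food source.
        for pos in swarm:
            mask = binarize(pos, rng)
            fit = fitness(mask, X, y, cand)
            if fit > best_fit:
                best_mask, best_fit, food = mask, fit, pos[:]
        c1 = 2 * math.exp(-((4 * it / n_iter) ** 2))  # exploration decays over time
        for i in range(n_salps):
            for j in range(dim):
                if i == 0:  # the leader moves around the food source
                    step = c1 * ((ub - lb) * rng.random() + lb)
                    swarm[0][j] = food[j] + step if rng.random() >= 0.5 else food[j] - step
                else:       # followers move halfway toward the salp ahead of them
                    swarm[i][j] = (swarm[i][j] + swarm[i - 1][j]) / 2
                swarm[i][j] = max(lb, min(ub, swarm[i][j]))
    return [cand[j] for j, bit in enumerate(best_mask) if bit], best_fit

def make_toy_data(n=30, p=40, seed=1):
    """Hypothetical data: few samples, many features, only the first two informative."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n)]
    X = [[(4.0 * label if j < 2 else 0.0) + rng.gauss(0, 1) for j in range(p)] for label in y]
    return X, y

X, y = make_toy_data()
cand = filter_stage(X, y, keep=10)       # stage 1: shrink 40 features to 10
selected, score = binary_ssa(X, y, cand)  # stage 2: search subsets of those 10
print(sorted(cand), sorted(selected), round(score, 3))
```

The filter stage is cheap and shrinks the search space, so the wrapper stage, which retrains a classifier for every candidate subset, only has to explore a small number of features. The fitness function rewards accuracy first and, with a small weight, smaller subsets, which mirrors the abstract's joint interest in classification accuracy and the number of genes selected.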
Details
Published
Jan 01, 2022
Vol/Issue
19(12)
Pages
13747-13781
Cite This Article
Xiwen Qin, Shuang Zhang, Dongmei Yin, et al. (2022). Two-stage feature selection for classification of gene expression data based on an improved Salp Swarm Algorithm. Mathematical Biosciences and Engineering, 19(12), 13747-13781. https://doi.org/10.3934/mbe.2022641