Providing a Feature Selection Method for Lung Cancer Prediction Using Neural Network

Reza Sheibani, Mohammad Reza Mazaheri Habibi, Hojjat Azadravesh



Introduction: The marked rise in lung cancer, together with its consequences and the substantial costs it imposes on society, has driven the medical community to pursue programs for further study, prevention, early detection, and diagnosis. In medical science, timely detection and diagnosis of disease can prevent many life-threatening conditions and save lives.

Material and Methods: This study aims to predict lung cancer using a novel feature selection method combined with a classifier. The approach has four stages. First, we compute feature similarities in a lung cancer dataset as the absolute value of the Pearson correlation coefficient. Second, we cluster the initial features with the Louvain community detection algorithm. Third, we determine the optimal subset of features using the concept of node centrality. Finally, lung cancer is diagnosed from the selected features using a classifier.
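The four stages above can be sketched in plain Python (standard library only). This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not specify how the correlation matrix is turned into a graph or which centrality measure is used, so the `threshold` cut-off for keeping an edge, and the choice of within-community strength (sum of |r| to the other members) as the centrality score with one feature kept per community, are hypothetical. The Louvain step follows the standard two-phase scheme: greedy local moving by modularity gain, then collapsing each community into a single node, repeated until no node moves. Stage 4 is left out; the returned column indices would simply be passed to a classifier such as SVM, DT, NB or KNN.

```python
import math


def abs_pearson(X):
    """Stage 1: |Pearson r| between every pair of feature columns (rows of X are samples)."""
    n, d = len(X), len(X[0])
    cols = [[row[j] for row in X] for j in range(d)]
    means = [sum(c) / n for c in cols]
    centered = [[x - mu for x in c] for c, mu in zip(cols, means)]
    norms = [math.sqrt(sum(x * x for x in c)) for c in centered]
    R = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1, d):
            if norms[i] == 0.0 or norms[j] == 0.0:
                r = 0.0  # constant feature: r is undefined, treated as 0 here
            else:
                r = sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
            R[i][j] = R[j][i] = abs(r)
    return R


def build_graph(R, threshold):
    """Feature graph: edge i-j weighted by |r| when |r| >= threshold (hypothetical cut-off)."""
    adj = {i: {} for i in range(len(R))}
    for i in range(len(R)):
        for j in range(i + 1, len(R)):
            if R[i][j] >= threshold:
                adj[i][j] = adj[j][i] = R[i][j]
    return adj


def local_moving(adj, eps=1e-12):
    """Louvain phase 1: move each node to the neighbouring community with the best modularity gain."""
    two_m = sum(w for u in adj for w in adj[u].values())
    comm = {u: u for u in adj}
    if two_m == 0:
        return comm
    k = {u: sum(adj[u].values()) for u in adj}
    tot = dict(k)  # total degree of each community (labels start as node ids)
    improved = True
    while improved:
        improved = False
        for u in adj:
            links = {}  # edge weight from u to each neighbouring community, self-loops excluded
            for v, w in adj[u].items():
                if v != u:
                    links[comm[v]] = links.get(comm[v], 0.0) + w
            cu = comm[u]
            tot[cu] -= k[u]  # temporarily remove u from its community
            best, best_gain = cu, links.get(cu, 0.0) - tot[cu] * k[u] / two_m
            for c, w_uc in links.items():
                gain = w_uc - tot[c] * k[u] / two_m  # proportional to the modularity gain
                if gain > best_gain + eps:
                    best, best_gain = c, gain
            tot[best] += k[u]
            if best != cu:
                comm[u], improved = best, True
    return comm


def aggregate(adj, comm):
    """Louvain phase 2: collapse each community into one node (internal weight becomes a self-loop)."""
    new = {c: {} for c in set(comm.values())}
    for u in adj:
        for v, w in adj[u].items():
            new[comm[u]][comm[v]] = new[comm[u]].get(comm[v], 0.0) + w
    return new


def louvain(adj):
    """Stage 2: repeat both phases until no node moves; returns {feature: community}."""
    membership = {u: u for u in adj}
    while True:
        comm = local_moving(adj)
        if len(set(comm.values())) == len(adj):
            return membership
        membership = {u: comm[c] for u, c in membership.items()}
        adj = aggregate(adj, comm)


def select_by_centrality(R, membership):
    """Stage 3 (assumed centrality): keep, per community, the feature with the largest within-community strength."""
    clusters = {}
    for f, c in membership.items():
        clusters.setdefault(c, []).append(f)
    chosen = [max(m, key=lambda f: sum(R[f][g] for g in m if g != f)) for m in clusters.values()]
    return sorted(chosen)


if __name__ == "__main__":
    s, t = [1, 2, 3, 4, 5, 6], [1, -1, -1, 1, 1, -1]
    # toy data: features 0-2 are linear in s, features 3-4 are linear in t
    X = [[a, 2 * a + 1, 3 - a, b, 3 * b - 2] for a, b in zip(s, t)]
    R = abs_pearson(X)
    print(select_by_centrality(R, louvain(build_graph(R, threshold=0.5))))
```

On the toy data the printed list holds one representative of the correlated group {0, 1, 2} and one of {3, 4}. Keeping only edges above a cut-off keeps the feature graph sparse; at the scale reported here, 12,600 features give 79,373,700 distinct feature pairs, so a practical version would compute the correlation matrix with a vectorised routine such as NumPy's `corrcoef` rather than the nested loops used above for readability.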

Results: Comparative analysis showed that the proposed method outperformed existing techniques in execution time, number of selected features, and classification accuracy. The method reduced the feature set from 12,600 to 118 features, and its accuracy was 95.28%, 95.49%, 95.23%, and 95.32% with the Support Vector Machine (SVM), Decision Tree (DT), Naive Bayes (NB), and K-Nearest Neighbor (KNN) classifiers, respectively. Its runtime of 2.146 seconds was significantly shorter than that of the compared methods.
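As a quick check on the figures reported in the Results (treating the accuracies as percentages), the fraction of features retained, the mean accuracy over the four classifiers, and the spread between the best and worst classifier work out as follows:

```latex
\begin{aligned}
\frac{118}{12\,600} &\approx 0.0094 \quad (\approx 0.94\%\ \text{of features retained},\ \approx 99.06\%\ \text{removed}) \\
\bar{A} &= \frac{95.28 + 95.49 + 95.23 + 95.32}{4} = \frac{381.32}{4} = 95.33\% \\
A_{\max} - A_{\min} &= 95.49 - 95.23 = 0.26\ \text{percentage points}
\end{aligned}
```

The narrow 0.26-point spread is what underlies the claim that accuracy stays consistently high regardless of which classifier is used.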

Conclusion: The proposed feature selection method substantially reduced the initial feature set and significantly decreased computation time while maintaining consistently high prediction accuracy across all four classifiers. This combination supports the effectiveness and practical applicability of the method for lung cancer prediction and suggests potential for real-world use.


Keywords: Lung Cancer Prediction; Feature Selection; Artificial Neural Network

