Improved Hybrid Feature Selection Approach for Sentiment Classification: Integrating Chi-Square and Recursive Feature Elimination
Abstract
Feature selection (FS) selects the important features that help decide the sentiment of a text and enhances classification accuracy. It reduces dimensionality, overfitting, and underfitting, and also improves precision, recall, and F1-score, while lowering complexity, storage, and computing time. In this paper, a combination of Chi-square (Chi2) and Recursive Feature Elimination (RFE) is used as a hybrid feature selection method on an Amazon review dataset. Three other state-of-the-art feature selection methods, Genetic Algorithm (GA), Mutual Information (MI), and Principal Component Analysis (PCA), are used in this study with six classifiers: Random Forest Classifier (RFC), Logistic Regression (LR), K-Nearest Neighbor (KNN), Linear Support Vector Classifier (Linear SVC), Naïve Bayes (NB), and Decision Tree (DT). Chi2+RFE and MI with 50 percent feature selection (10,936 features) give improved accuracy, precision, recall, and F1-score relative to the base condition, in which all features (21,873 features) are used with the above-mentioned classifiers. Chi2+RFE achieves a maximum accuracy of 0.821 with the LR classifier, a maximum recall of 0.821 with the LR classifier, and a maximum F1-score of 0.742 with the DT classifier. Chi2+RFE performs better than the other FS techniques in terms of accuracy, precision, recall, and F1-score.
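To illustrate the kind of two-stage pipeline the abstract describes, the following is a minimal sketch in Python using scikit-learn. The placeholder corpus, the 50 percent retention at each stage, and the use of Logistic Regression as both the RFE estimator and the final classifier are assumptions for demonstration, not details taken from the paper.

```python
# Hedged sketch of a Chi2 + RFE hybrid feature selection pipeline.
# The tiny corpus below is a stand-in for the Amazon review dataset.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectPercentile, chi2, RFE
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

texts = ["great product, works well", "terrible quality, broke fast",
         "absolutely love it", "waste of money"]      # placeholder reviews
labels = [1, 0, 1, 0]                                 # 1 = positive, 0 = negative

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=42)

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),                     # turn reviews into a sparse feature matrix
    ("chi2", SelectPercentile(chi2, percentile=50)),  # stage 1: keep top 50% of features by Chi-square score
    ("rfe", RFE(LogisticRegression(max_iter=1000),    # stage 2: recursively eliminate the weakest
                n_features_to_select=0.5)),           #          half of the remaining features
    ("clf", LogisticRegression(max_iter=1000)),       # final classifier (LR shown; RFC, KNN, etc. could be swapped in)
])

pipeline.fit(X_train, y_train)
print(classification_report(y_test, pipeline.predict(X_test)))
```

In this sketch the Chi-square filter first discards statistically weak terms cheaply, and RFE then refines the surviving set using the classifier's own coefficients; the same pipeline object can be refit with any of the six classifiers mentioned above as the final step.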