Feature selection methods for text classification

Index Terms

Information systems

Software and its engineering

Software notations and tools

General programming languages

Language features

Data types and structures

Recommendations

High-performing feature selection for text classification.

This paper reports a controlled study on a large number of filter feature selection methods for text classification. Over 100 variants of five major feature selection criteria were examined using four well-known classification algorithms: a Naive ...

Comparison on Feature Selection Methods for Text Classification

The high-dimensional text data always contains a large quantity of noisy terms which bring negative effects on the performance of text classification. Feature selection is the common solution for dimension reduction in text classification. The choices of ...

Feature selection for text classification with Naïve Bayes

As an important preprocessing technology in text classification, feature selection can improve the scalability, efficiency and accuracy of a text classifier. In general, a good feature selection method should consider domain and algorithm ...

Information

Published in.

  • General Chair:

Yahoo!, USA

  • Program Chairs:

Cornell University, USA

University of Vermont, USA

  • SIGMOD: ACM Special Interest Group on Management of Data
  • SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
  • ACM: Association for Computing Machinery

Association for Computing Machinery

New York, NY, United States

Author Tags

  • feature selection
  • random sampling
  • regularized least squares classification
  • text classification

  • DOI: 10.1007/s11042-018-6083-5
  • Corpus ID: 13684595

Feature selection for text classification: A review

  • Xuelian Deng, Yuqing Li, +1 author, Jilian Zhang
  • Published in Multimedia tools and… 8 May 2018
  • Computer Science

254 Citations

  • A novel feature and class-based globalization technique for text classification
  • A new big data feature selection approach for text classification
  • A review of semi-supervised learning for text classification
  • Does a hybrid neural network based feature selection model improve text classification
  • Feature selection methods for text classification: a systematic literature review
  • Tktc: a framework for top-k text classification of multimedia computing in wireless networks
  • A feature selection method for multi-label text based on feature importance
  • Text classification using naïve Bayes classifier
  • A novel class-center vector model for text classification using dependencies and a semantic dictionary
  • A new approach for text documents classification with invasive weed optimization and naive Bayes classifier

126 References

  • A new feature selection based on comprehensive measurement both in inter-category and intra-category for text categorization
  • An extensive empirical study of feature selection metrics for text classification
  • Feature selection for text classification with naïve Bayes
  • Best terms: an efficient feature-selection algorithm for text categorization
  • Feature selection in SVM text categorization
  • OCFS: optimal orthogonal centroid feature selection for text categorization
  • Hybrid feature selection for text classification
  • Feature selection for text categorization on imbalanced data
  • Feature selection for ordinal text classification
  • Some effective techniques for naive Bayes text classification

Feature selection methods for text classification: a systematic literature review

  • Published: 24 February 2021
  • Volume 54, pages 6149–6200 (2021)

  • Julliano Trindade Pintas (ORCID: orcid.org/0000-0001-5416-8982)
  • Leandro A. F. Fernandes (ORCID: orcid.org/0000-0001-8491-793X)
  • Ana Cristina Bicharra Garcia (ORCID: orcid.org/0000-0002-3797-5157)

Feature Selection (FS) methods alleviate key problems in classification procedures: they are used to improve classification accuracy, reduce data dimensionality, and remove irrelevant data. FS methods have received a great deal of attention from the text classification community. However, only a few literature surveys cover them with a focus on text classification, and those available either offer a superficial analysis or present a very small set of works on the subject. For this reason, we conducted a Systematic Literature Review (SLR) that assesses 1376 unique papers from journals and conferences published in the past eight years (2013–2020). After abstract screening and full-text eligibility analysis, 175 studies were included in our SLR. Our contribution is twofold. First, we considered several aspects of each proposed method and mapped them into a new categorization schema. Second, we mapped the main characteristics of the experiments, identifying which datasets, languages, machine learning algorithms, and validation methods have been used to evaluate new and existing techniques. By following the SLR protocol, we allow the replication of our review process and minimize the chances of bias while classifying the included studies. By mapping issues and experiment settings, our SLR helps researchers develop and position new studies with respect to the existing literature.
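As a rough illustration of what a filter-style FS method does (this sketch is not taken from the review itself), the snippet below ranks terms by a chi-square statistic computed from document frequencies and keeps the top k. The function names `chi2_scores` and `select_top_k`, the toy corpus, and the binary spam/ham setup are all assumptions made for the example.

```python
from collections import Counter

def chi2_scores(docs, labels, positive):
    """Chi-square score of each term from its 2x2 presence-by-class table."""
    n = len(docs)
    n_pos = sum(1 for y in labels if y == positive)
    n_neg = n - n_pos
    df_pos, df_neg = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        target = df_pos if y == positive else df_neg
        target.update(set(tokens))  # count each term once per document
    scores = {}
    for term in set(df_pos) | set(df_neg):
        a = df_pos[term]   # positive documents containing the term
        b = df_neg[term]   # negative documents containing the term
        c = n_pos - a      # positive documents without the term
        d = n_neg - b      # negative documents without the term
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores

def select_top_k(docs, labels, positive, k):
    """Keep the k highest-scoring terms; ties are broken alphabetically."""
    scores = chi2_scores(docs, labels, positive)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]

# Hypothetical toy corpus: two "spam" and two "ham" documents.
docs = [
    ["cheap", "pills", "buy"],
    ["buy", "cheap", "now"],
    ["meeting", "agenda", "now"],
    ["project", "meeting", "notes"],
]
labels = ["spam", "spam", "ham", "ham"]
selected = select_top_k(docs, labels, "spam", 3)
```

Here a term such as "now", which appears equally often in both classes, scores zero and is dropped, while terms confined to one class rank highest. Real studies typically work with far larger vocabularies and may prefer other criteria, such as information gain or mutual information.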

Similar content being viewed by others

  • Filter feature selection methods for text classification: a review
  • Modified Pointwise Mutual Information-Based Feature Selection for Text Classification
  • Feature Selection in Text Mining

Abdollahi M, Gao X, Mei Y, Ghosh S, Li J (2019) An ontology-based two-stage approach to medical text classification with feature selection by particle swarm optimisation. In: Proceedings of the IEEE congress on evolutionary computation, pp 119–126

Agnihotri D, Verma K, Tripathi P (2017) Variable global feature selection scheme for automatic classification of text documents. Expert Syst Appl 81:268–281. https://doi.org/10.1016/j.eswa.2017.03.057

Article   Google Scholar  

Agnihotri D, Verma K, Tripathi P (2016) Computing correlative association of terms for automatic classification of text documents. Proceedings of the international symposium on computer vision and the internet, https://doi.org/10.1145/2983402.2983424

Agnihotri D, Verma K, Tripathi P (2017a) Mutual information using sample variance for text feature selection. In: Proceedings of the international conference on communication and information processing, pp 39–44, https://doi.org/10.1145/3162957.3163054

Agnihotri D, Verma K, Tripathi P, Singh B (2018) Soft voting technique to improve the performance of global filter based feature selection in text corpus. Appl Intell 49. https://doi.org/10.1007/s10489-018-1349-1

Agun HV, Yilmazel O (2019) Incorporating topic information in a global feature selection schema for authorship attribution. IEEE Access 7:98522–98529

Al-Salemi B, Ayob M, Noah SAM (2018) Feature ranking for enhancing boosting-based multi-label text categorization. Expert Syst Appl 113:531–543. https://doi.org/10.1016/j.eswa.2018.07.024

Al-Salemi B, Ayob M, Noah SAM, Aziz MJA (2017) Feature selection based on supervised topic modeling for boosting-based multi-label text categorization. In: Proceedings of the international conference on electrical engineering and informatics, pp 1–6, https://doi.org/10.1109/ICEEI.2017.8312411

Alshalabi H, Tiun S, Omar N, Albared M (2013) Experiments on the use of feature selection and machine learning methods in automatic Malay text categorization. Procedia Technol 11:748–754. https://doi.org/10.1016/J.PROTCY.2013.12.254

Arani SHS, Mozaffari S (2013) Genetic-based feature selection for spam detection. In: Proceedings of the Iranian conference on electrical engineering, https://doi.org/10.1109/IranianCEE.2013.6599551

Baccianella S, Esuli A, Sebastiani F (2013) Using micro-documents for feature selection: the case of ordinal text classification. Expert Syst Appl. https://doi.org/10.1016/j.eswa.2013.02.010

Baccianella S, Esuli A, Sebastiani F (2014) Feature selection for ordinal text classification. Neural Comput

Badawi D, Altincay H (2014) A novel framework for termset selection and weighting in binary text classification. Eng Appl Artif Intell 35:38–53. https://doi.org/10.1016/j.engappai.2014.06.012

Baggenstoss PM (2003) The PDF projection theorem and the class-specific method. IEEE Trans Sig Process 51(3):672–685. https://doi.org/10.1109/TSP.2002.808109

Article   MathSciNet   MATH   Google Scholar  

Bagheri A, Saraee M, De Jong F (2013) Sentiment classification in Persian: introducing a mutual information-based method for feature selection. In: Proceedings of the Iranian conference on electrical engineering, https://doi.org/10.1109/IranianCEE.2013.6599671

Bahassine S, Madani A, Al-Sarem M, Kissi M (2018) Feature selection using an improved Chi-square for Arabic text classification. J King Saud Univ—Comput Inf Sci. https://doi.org/10.1016/j.jksuci.2018.05.010

Bahassine S, Madani A, Kissi M (2016) An improved Chi-square feature selection for Arabic text classification using decision tree. In: Proceedings of the international conference on intelligent systems: theories and applications, pp 1–5, https://doi.org/10.1109/SITA.2016.7772289

Bai X, Gao X, Xue B (2018) Particle swarm optimization based two-stage feature selection in text mining. In: Proceedings of the IEEE congress on evolutionary computation, pp 1–8

Belazzoug M, Touahria M, Nouioua F, Brahimi M (2020) An improved sine cosine algorithm to select features for text categorization. J King Saud Univ—Comput Inf Sci 32(4):454–464. https://doi.org/10.1016/j.jksuci.2019.07.003

Benitez IP, Sison AM, Medina RP (2018) An improved genetic algorithm for feature selection in the classification of disaster-related Twitter messages. In: Proceedings of the IEEE symposium on computer applications and industrial electronics, https://doi.org/10.1109/ISCAIE.2018.8405477

BenSaid F, Alimi AM (2021) Online feature selection system for big data classification based on multi-objective automated negotiation. Pattern Recognit 110:107629. https://doi.org/10.1016/j.patcog.2020.107629

Bergstra J, Bengio Y (2013) Random search for hyper-parameter optimization. J Mach Learn Res

Bojanowski P, Grave E, Joulin A, Mikolov T (2017) Enriching word vectors with subword information. Trans Assoc Comput Linguist 5:135–146

Braytee A, Liu W, Catchpoole D, Kennedy P (2017) Multi-label feature selection using correlation information. In: Proceedings of the ACM on conference on information and knowledge management, pp 1649–1656, https://doi.org/10.1145/3132847.3132858

Canuto S, Sousa DX, Gonçalves MA, Rosa TC (2018) A thorough evaluation of distance-based meta-features for automated text classification. IEEE Trans Knowl Data Eng 11(10):346–347. https://doi.org/10.1007/s10489-018-1349-1 0

Cekik R, Uysal AK (2020) A novel filter feature selection method using rough set for short text data. Expert Syst Appl 160:113691. https://doi.org/10.1007/s10489-018-1349-1 1

Chandrashekar G, Sahin F (2014) A survey on feature selection methods. Comput Electr Eng 40(1):16–28. https://doi.org/10.1007/s10489-018-1349-1 2

Chen H, Hou Q, Han L, Hu Z, Ye Z, Zeng J, Yuan J (2019) Distributed text feature selection based on bat algorithm optimization. Proc IEEE Int Conf Intell Data Acquis Adv Comput Syst Technol Appl 1:75–80

Google Scholar  

Chen Y, Han B, Hou P (2014) New feature selection methods based on context similarity for text categorization. In: Proceedings of the international conference on fuzzy systems and knowledge discovery, https://doi.org/10.1109/FSKD.2014.6980902

Chen H, Hou Y, Luo Q, Hu Z, Yan L (2018) Text feature selection based on water wave optimization algorithm. In: Proceedings of the international conference on advanced computational intelligence, https://doi.org/10.1109/ICACI.2018.8377518

Chen L, Li J, Zhang L (2017) A method of text categorization based on genetic algorithm and LDA. In: Proceedings of the chinese control conference, https://doi.org/10.23919/ChiCC.2017.8029089

Chen X, Ma J, Lu Y (2013) Feature selection for Chinese online reviews sentiment classification. In: Proceedings of the joint conference of international conference on computational problem-solving and international high speed intelligent communication forum, https://doi.org/10.1109/ICCPS.2013.6893490

Chopard B, Tomassini M (2018) An introduction to metaheuristics for optimization. Springer Int Publ. https://doi.org/10.1007/978-3-319-93073-2

Article   MATH   Google Scholar  

Chormunge S, Jena S (2018) Correlation based feature selection with clustering for high dimensional data. J Electl Syst Inf Technol 5(3):542–549. https://doi.org/10.1016/J.JESIT.2017.06.004

Demsar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30

MathSciNet   MATH   Google Scholar  

Deng X, Li Y, Weng J, Zhang J (2019) Feature selection for text classification: a review. Multimed Tools Appl 78(3):3797–3816. https://doi.org/10.1007/s11042-018-6083-5

Ekbal A, Saha S (2015) Joint model for feature selection and parameter optimization coupled with classifier ensemble in chemical mention recognition. Knowl-Based Syst. https://doi.org/10.1016/j.knosys.2015.04.015

Feng G, Guo J, Jing BY, Sun T (2015a) Feature subset selection using naive Bayes for text classification. Pattern Recognit Lett. https://doi.org/10.1016/j.patrec.2015.07.028

Feng L, Zuo W, Wang Y (2015b) Improved comprehensive measurement feature selection method for text categorization. In: Proceedings of the international conference on network and information systems for computers, https://doi.org/10.1109/ICNISC.2015.34

Ferilli S, De Carolis B, Esposito F, Redavid D (2015) Sentiment analysis as a text categorization task: a study on feature and algorithm selection for Italian language. In: Proceedings of the IEEE international conference on data science and advanced analytics, https://doi.org/10.1109/DSAA.2015.7344882

Ferreira CHP, De Medeiros DMR, Santana F (2016) FCFilter: feature selection based on clustering and genetic algorithms. In: Proceedings of the IEEE congress on evolutionary computation, https://doi.org/10.1109/CEC.2016.7744048

Fong S, Gao E, Wong R (2016) Optimized swarm search-based feature selection for text mining in sentiment analysis. In: Proceedings of the IEEE international conference on data mining workshop, pp 1153–1162, https://doi.org/10.1109/ICDMW.2015.231

Forman G (2004) A pitfall and solution in multi-class feature selection for text classification. Proceed Int Conf Mach Learn 10(1145/1015330):1015356

Fragoso RCP, Pinheiro RHW, Cavalcanti GDC (2016) Class-dependent feature selection algorithm for text categorization. In: Proceedings of the international joint conference on neural networks, vol 2016-Octob, https://doi.org/10.1109/IJCNN.2016.7727649

Fragoso RCP, Pinheiro RHW, Cavalcanti GDC (2017) A method for automatic determination of the feature vector size for text categorization. In: Proceedings of the Brazilian conference on intelligent systems, https://doi.org/10.1109/BRACIS.2016.055

Fukumoto F, Suzuki Y (2015) Temporal-based feature selection and transfer learning for text categorization. In: Proceedings of the international joint conference on knowledge discovery, knowledge engineering and knowledge management, http://socrates.acadiau.ca/courses/comp/dsilver/

Gao Z, Xu Y, Meng F, Qi F, Lin Z (2014) Improved information gain-based feature selection for text categorization. In: Proceedings of the international conference on wireless communications, vehicular technology, information theory and aerospace and electronic systems

Ghareb AS, Bakar AA, Hamdan AR (2016) Hybrid feature selection based on enhanced genetic algorithm for text categorization. Expert Syst Appl. https://doi.org/10.1016/j.eswa.2015.12.004

Ghareb AS, Abu Bakara A, Al-Radaideh QA, Hamdan AR (2018) Enhanced filter feature selection methods for Arabic text categorization. Int J Inf Retr Res. https://doi.org/10.4018/IJIRR.2018040101

Gökalp O, Tasci E, Ugur A (2020) A novel wrapper feature selection algorithm based on iterated greedy metaheuristic for sentiment classification. Expert Syst Appl 146:113176. https://doi.org/10.1016/j.eswa.2020.113176

Gunduz H, Cataltepe Z (2015) Borsa Istanbul (BIST) daily prediction using financial news and balanced feature selection. Expert Syst Appl. https://doi.org/10.1016/j.eswa.2015.07.058

Guo Y, Chung F, Li G, Zhang L (2019) Multi-label bioinformatics data classification with ensemble embedded feature selection. IEEE Access 7:103863–103875

Guo Y, Chung F, Li G (2017) An ensemble embedded feature selection method for multi-label clinical text classification. In: Proceedings of the IEEE international conference on bioinformatics and biomedicine, https://doi.org/10.1109/BIBM.2016.7822631

Guru DS, Ali M, Suhil M (2018) A novel term weighting scheme and an approach for classification of agricultural arabic text complaints. In: Proceedings of the IEEE international workshop on arabic and derived script analysis and recognition, pp 24–28

Guru DS, Suhil M, Raju LN, Kumar NV (2018) An alternative framework for univariate filter based feature selection for text categorization. Pattern Recog Lett 103(2018):23–31. https://doi.org/10.1016/j.patrec.2017.12.025

Guru D, Swarnalatha K, Kumar VN, Anami B (2020) Effective technique to reduce the dimension of text data. Int J Comput Vis Image Process 10:67–85. https://doi.org/10.4018/IJCVIP.2020010104

Hagenau M, Liebmann M, Neumann D (2013) Automated news reading: stock price prediction based on financial news using context-capturing features. Decis Support Syst 55(3):685–697. https://doi.org/10.1016/j.dss.2013.02.006

Hai NT, Nghia NH, Le TD, Nguyen VT (2015) A hybrid feature selection method for Vietnamese text classification. In: Proceedings of the IEEE international conference on knowledge and systems engineering, https://doi.org/10.1109/KSE.2015.25

Han J, Zuo W, Liu L, Xu Y, Peng T (2016) Building text classifiers using positive, unlabeled and ‘outdated’ examples. Concurr Comput. https://doi.org/10.1002/cpe.3879

Higgins JPT, Green S (2008) Cochrane handbook for systematic reviews of interventions: cochrane book series. Wiley, New York. https://doi.org/10.1002/9780470712184

Book   Google Scholar  

Hussain S, Keung J, Khan AA (2017) Software design patterns classification and selection using text categorization approach. Appl Soft Comput 58:225–244. https://doi.org/10.1016/J.ASOC.2017.04.043

Hussain SF, Babar HZUD, Khalil A, Jillani RM, Hanif M, Khurshid K (2020) A fast non-redundant feature selection technique for text data. IEEE Access 8:181763–181781. https://doi.org/10.1109/ACCESS.2020.3028469

Imani MB, Keyvanpour MR (2013) Azmi R (2013) A novel embedded feature selection method: a comparative study in the application of text categorization. Appl Artif Intell 10(1080/08839514):774211

Islam M, Anjum A, Ahsan T, Wang L (2019) Dimensionality reduction for sentiment classification using machine learning classifiers. In: Proceedings of the IEEE symposium series on computational intelligence, pp 3097–3103

Japkowicz N (2000) The class imbalance problem: significance and strategies. In: Proceedings of the international conference on artificial intelligence

Javed K, Maruf S, Babri HA (2015) A two-stage Markov blanket based feature selection algorithm for text classification. Neurocomputing. https://doi.org/10.1016/j.neucom.2015.01.031

Jiang XY, Jin S (2013) An improved mutual information-based feature selection algorithm for text classification. In: Proceedings of the international conference on intelligent human-machine systems and cybernetics, https://doi.org/10.1109/IHMSC.2013.37

Jiang T, Yu H (2015) A novel feature selection based on Tibetan grammar for Tibetan text classification. In: Proceedings of the IEEE international conference on software engineering and service sciences, https://doi.org/10.1109/ICSESS.2015.7339093

Jie Y, Keping L (2019) The fault diagnosis model for railway system based on an improved feature selection method. In: Proceedings of the IEEE international conference on electronics information and emergency communication, pp 1–4

Karabulut M (2013) Fuzzy unordered rule induction algorithm in text categorization on top of geometric particle swarm optimization term selection. Knowl Based Syst 54:288–297. https://doi.org/10.1016/J.KNOSYS.2013.09.020

Kermani FZ, Eslami E, Sadeghi F (2019) Global filter-wrapper method based on class-dependent correlation for text classification. Eng Appl Artif Intell 85:619–633. https://doi.org/10.1016/j.engappai.2019.07.003

Kim K, Zzang S (2018) Trigonometric comparison measure: a feature selection method for text categorization. Data Knowl Eng 119. https://doi.org/10.1016/j.datak.2018.10.003

Kitchenham B (2004) Procedures for performing systematic reviews. Tech. Rep. TR/SE-0401, Department of Computer Science, Keele University and National ICT

Kowsari K, Jafari Meimandi K, Heidarysafa M, Mendu S, Barnes L, Brown D (2019) Text classification algorithms: a survey. Inf Switz 10. https://doi.org/10.3390/info10040150

Kumar HMK, Harish BS (2018) Sarcasm classification: a novel approach by using content based feature selection method. Procedia computer science 143:378–386. https://doi.org/10.1016/j.procs.2018.10.409 , 8th international conference on advances in computing and communications (ICACC-2018)

Kumar V (2014) Feature selection a literature review. Smart Comput Rev. https://doi.org/10.6029/smartcr.2014.03.007

Kumbhar P, Mali M (2013) A survey on feature selection techniques and classification algorithms for efficient text slassification. Int J Sci Res 14(5):2319–7064

Kumbhar P, Mali M, Atique M (2017) A genetic-fuzzy approach for automatic text categorization. In: Proceedings of the international advance computing conference, https://doi.org/10.1109/IACC.2017.114

Kun YJ, Lei Z (2014) Sentiment feature selection algorithm for Chinese micro-blog. In: Proceedings of the international conference on management of e-commerce and e-government, pp 114–118, https://doi.org/10.1109/ICMeCG.2014.32

Kyaw KS, Limsiroratana S (2019) Towards nature-inspired intelligence search for optimization of multi-dimensional feature selection. In: Proceedings of the international computer science and engineering conference, pp 379–384

Labani M, Moradi P, Ahmadizar F, Jalili M (2018) A novel multivariate filter method for feature selection in text classification problems. Eng Appl Artif Intell 70(November 2016):25–37. https://doi.org/10.1016/j.engappai.2017.12.014

Labani M, Moradi P, Jalili M (2020) A multi-objective genetic algorithm for text feature selection using the relative discriminative criterion. Expert Syst Appl 149:113276. https://doi.org/10.1016/j.eswa.2020.113276

Lampos V, Zou B, Cox IJ (2017) Enhancing feature selection using word embeddings. Proc Int Conf World Wide Web 10(1145/3038912):3052622

Lan Y, Hao Y, Xia K, Qian B, Li C (2020) Stacked residual recurrent neural networks with cross-layer attention for text classification. IEEE Access 8:70401–70410

Larabi Marie-Sainte S, Alalyani N (2018) Firefly algorithm based feature selection for Arabic text classification. J King Saud Univ Comput Inf Sci. https://doi.org/10.1016/J.JKSUCI.2018.06.004

Lazar C, Taminau J, Meganck S, Steenhoff D, Coletta A, Molter C, De Schaetzen V, Duque R, Bersini H, Nowé A (2012) A survey on filter techniques for feature selection in gene expression microarray analysis. IEEE/ACM Trans Comput Biol Bioinf. https://doi.org/10.1109/TCBB.2012.33

Lee J, Kim DW (2013) Feature selection for multi-label classification using multivariate mutual information. Pattern Recognit Lett. https://doi.org/10.1016/j.patrec.2012.10.005

Lee J, Kim DW (2015) Mutual information-based multi-label feature selection using interaction information. Expert Syst Appl 42(4):2013–2025. https://doi.org/10.1016/j.eswa.2014.09.063

Lee J, Yu I, Park J, Kim DW (2019) Memetic feature selection for multilabel text categorization using label frequency difference. Inf Sci 485:263–280. https://doi.org/10.1016/j.ins.2019.02.021

Lewis DD (2019) Reuters-21578 text categorization collection data set. https://archive.ics.uci.edu/ml/datasets/reuters-21578+text+categorization+collection

Lewis DD, Yang Y, Rose TG, Li F (2004) RCV1: a new benchmark collection for text categorization research. J Mach Learn Res 5:361–397

Li B (2016a) Importance weighted feature selection strategy for text classification. In: Proceedings of the international conference on Asian language processing

Li B (2016b) Selecting features with class based and importance weighted document frequency in text classification. In: Proceedings of the ACM symposium on document engineering, pp 139–142, https://doi.org/10.1145/2960811.2967164

Li J (2013) An approach to meta feature selection. In: Proceedings of the Canadian conference on electrical and computer engineering, https://doi.org/10.1109/CCECE.2013.6567849

Li Z, Lu W, Sun Z, Xing W (2016) A parallel feature selection method study for text classification. Neural Comput Appl 28:1–12. https://doi.org/10.6029/smartcr.2014.03.007 0

Liang J, Zhou X, Guo L, Bai S (2015) Feature selection for sentiment classification using matrix factorization. In: Proceedings of the international conference on world wide web, pp 63–64, https://doi.org/10.1145/2740908.2742741

Lifang Y, Sijun Q, Huan Z (2017) Feature selection algorithm for hierarchical text classification using Kullback-Leibler divergence. In: Proceedings of the ieee international conference on cloud computing and big data analysis, https://doi.org/10.1109/ICCCBDA.2017.7951950

Li Q, He L, Lin X (2013a) Categorical term frequency probability based feature selection for document categorization. In: Proceedings of the international conference on soft computing and pattern recognition, https://doi.org/10.1109/SOCPAR.2013.7054103

Li Q, He L, Lin X (2013b) Dimension reduction based on categorical fuzzy correlation degree for document categorization. In: Proceedings of the IEEE international conference on granular computing, https://doi.org/10.1109/GrC.2013.6740405

Li Q, He L, Lin X (2014) Improved categorical distribution difference feature selection for Chinese document categorization. In: Proceedings of the international conference on ubiquitous information management and communication

Li L, Li C (2015) Research and improvement of a spam filter based on naive Bayes. In: Proceedings of the international conference on intelligent human-machine systems and cybernetics, https://doi.org/10.1109/IHMSC.2015.208

Lin KC, Zhang KY, Huang YH, Hung JC, Yen N (2016) Feature selection based on an improved cat swarm optimization algorithm for big data classification. J Supercomput

Liu Y, Wang Y, Feng L, Zhu X (2016) Term frequency combined hybrid feature selection method for spam filtering. Pattern Anal Appl

Li B, Yan Q, Xu Z, Wang G (2015) Weighted document frequency for feature selection in text classification. In: Proceedings of international conference on Asian language processing, https://doi.org/10.1109/IALP.2015.7451549

Li J, Zhao J, Lu K (2016a) Joint feature selection and structure preservation for domain adaptation. In: Proceedings of the IJCAI international joint conference on artificial intelligence

Lu Y, Chen Y (2017) A text feature selection method based on the small world algorithm. Procedia Comput Sci 107:276–284

Lu Y, Liang M, Ye Z, Cao L (2015) Improved particle swarm optimization algorithm and its application in text feature selection. Appl Soft Comput J

Malji P, Sakhare S (2017) Significance of entropy correlation coefficient over symmetric uncertainty on FAST clustering feature selection algorithm. In: Proceedings of international conference on intelligent systems and control, https://doi.org/10.1109/ISCO.2017.7856035

Manning CD, Schutze H, Raghavan P (2008) Introduction to information retrieval. Cambridge University Press, Cambridge

Manochandar S, Punniyamoorthy M (2018) Scaling feature selection method for enhancing the classification performance of support vector machines in text mining. Comput Ind Eng 124:139–156. https://doi.org/10.1016/j.cie.2018.07.008

Mendez JR, Cotos-Yañez TR, Ruano-Ordas D (2019) A new semantic-based feature selection method for spam filtering. Appl Soft Comput 76:89–104. https://doi.org/10.1016/j.asoc.2018.12.008

Mikolov T, Chen K, Corrado G, Dean J (2013a) Efficient estimation of word representations in vector space. In: Proceedings of the international conference on learning representations - workshop track proceedings

Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J (2013b) Distributed representations of words and phrases and their compositionality. In: Proceedings of the advances in neural information processing systems, pp 3111–3119

Mirończuk MM, Protasiewicz J (2018) A recent overview of the state-of-the-art elements of text classification. Expert Syst Appl 106:36–54. https://doi.org/10.1016/j.eswa.2018.03.058

Mladenović M, Mitrović J, Krstev C, Vitas D (2016) Hybrid sentiment analysis framework for a morphologically rich language. J Intell Inf Syst. https://doi.org/10.1007/s10844-015-0372-5

Moher D, Liberati A, Tetzlaff J, Altman DG (2009) Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. J Clin Epidemiol. https://doi.org/10.1016/j.jclinepi.2009.06.005

Nag K, Pal NR (2016) A multiobjective genetic programming-based ensemble for simultaneous feature selection and classification. IEEE Trans Cybernet. https://doi.org/10.1109/TCYB.2015.2404806

Naik A, Rangwala H (2016) Embedding feature selection for large-scale hierarchical classification. In: Proceedings of the IEEE international conference on big data, https://doi.org/10.1109/BigData.2016.7840725

Nam LNH, Quoc HB (2016) A combined approach for filter feature selection in document classification. In: Proceedings of the international conference on tools with artificial intelligence, https://doi.org/10.1109/ICTAI.2015.56

Nogueira Rios T, Gama Bispo BV (2018) Statera: a balanced feature selection method for text classification. In: Proceedings of the Brazilian conference on intelligent systems, pp 260–265

Onan A, Korukoglu S (2017) A feature selection model based on genetic rank aggregation for text sentiment classification. J Inf Sci 43(1):25–38. https://doi.org/10.1177/0165551515613226

Ong BY, Goh SW, Xu C (2015) Sparsity adjusted information gain for feature selection in sentiment analysis. In: Proceedings of the IEEE international conference on big data, pp 2122–2128, https://doi.org/10.1109/BigData.2015.7363995

Ortega-Mendoza RM, López-Monroy AP, Franco-Arcega A, Montes-y Gómez M (2018) Emphasizing personal information for author profiling: new approaches for term selection and weighting. Knowl Based Syst 145:169–181. https://doi.org/10.1016/J.KNOSYS.2018.01.014

Ouhbi B, Kamoune M, Frikh B, Zemmouri EM, Behja H (2016) A hybrid feature selection rule measure and its application to systematic review. In: Proceedings of the international conference on information integration and web-based applications and services, pp 106–114, https://doi.org/10.1145/3011141.3011177

Parlar T, Ozel SA, Song F (2016) A new feature selection method for sentiment analysis of Turkish reviews. In: Proceedings of the international symposium on innovations in intelligent systems and applications, pp 1–6, https://doi.org/10.1109/INISTA.2016.7571833

Pashaei E, Aydin N (2017) Binary black hole algorithm for feature selection and classification on biological data. Appl Soft Comput J 56:94–106. https://doi.org/10.1016/j.asoc.2017.03.002

Patil LH, Atique M (2013) A novel feature selection based on information gain using WordNet. In: Proceedings of the science and information conference

Pennington J, Socher R, Manning CD (2014) GloVe: global vectors for word representation. In: Proceedings of the conference on empirical methods in natural language processing, pp 1532–1543

Pereira RB, Plastino A, Zadrozny B, Merschmann LHC (2018) Categorizing feature selection methods for multi-label classification. Artif Intell Rev 49(1):57–78. https://doi.org/10.1007/s10462-016-9516-4

Pinheiro RHW, Cavalcanti GDC, Ren TI (2015) Data-driven global-ranking local feature selection methods for text categorization. Expert Syst Appl

Pintas JT, Correia L, Bicharra Garcia AC (2017) Crowd-based feature selection for document retrieval in highly demanding decision-making scenarios. Procedia Comput Sci 112:822–832

Pramokchon P, Piamsa-Nga P (2014) A feature score for classifying class-imbalanced data. In: Proceedings of the international computer science and engineering conference, https://doi.org/10.1109/ICSEC.2014.6978232

Qazi A, Goudar RH (2018) An ontology-based term weighting technique for web document categorization. Procedia Comput Sci 133:75–81

Qin S, Song J, Zhang P, Tan Y (2016) Feature selection for text classification based on part of speech filter and synonym merge. In: Proceedings of the international conference on fuzzy systems and knowledge discovery, https://doi.org/10.1109/FSKD.2015.7382024

Rajamohana SP, Umamaheswari K, Keerthana SV (2017) An effective hybrid cuckoo search with harmony search for review spam detection. In: Proceedings of the IEEE international conference on advances in electrical and electronics, information, communication and bio-informatics, https://doi.org/10.1109/AEEICB.2017.7972369

Rasool A, Tao R, Kamyab A (2020) GAWA - a feature selection method for hybrid sentiment classification. IEEE Access 8:191850–191861

Rastogi S (2018) Improving classification accuracy of automated text classifiers. In: Proceedings of the international conference on reliability, infocom technologies and optimization (Trends and Future Directions), pp 1–7

Ravi K, Ravi V (2016) Sentiment classification of Hinglish text. In: Proceedings of the international conference on recent advances in information technology, https://doi.org/10.1109/RAIT.2016.7507974

Rehman A, Javed K, Babri HA, Saeed M (2015) Relative discrimination criterion—a novel feature ranking method for text data. Expert Syst Appl

Rehman A, Javed K, Babri HA (2017) Feature selection based on a normalized difference measure for text classification. Inf Process Manag 53(2):473–489

Rehman A, Javed K, Babri HA, Asim N (2018) Selection of the most relevant terms based on a max-min ratio metric for text classification. Expert Syst Appl 114:78–96

Ren JS, Wang W, Wang J, Liao SS (2013) Exploring the contribution of unlabeled data in financial sentiment analysis. arXiv preprint, pp 1149–1155

Rennie J (2019) The 20 newsgroups data set

Roul RK, Gugnani S, Kalpeshbhai SM (2016b) Clustering based feature selection using extreme learning machines for text classification. In: Proceedings of the IEEE international conference electronics, energy, environment, communication, computer, control, https://doi.org/10.1109/INDICON.2015.7443788

Roul RK, Bhalla A, Srivastava A (2016a) Commonality-rarity score computation. Proc Annu Meet Forum Inf Retr Eval. https://doi.org/10.1145/3015157.3015165

Rui W, Liu J, Jia Y (2016) Unsupervised feature selection for text classification via word embedding. In: Proceedings of the IEEE international conference on big data analysis, pp 1–5, https://doi.org/10.1109/ICBDA.2016.7509787

Ruta D (2014) Robust method of sparse feature selection for multi-label classification with naive Bayes. In: Proceedings of the federated conference on computer science and information systems, pp 375–380, https://doi.org/10.15439/2014F502

Rzeniewicz J, Szymanski JS (2013) Selecting features with SVM. In: Proceedings of the iberoamerican congress on pattern recognition

Sabbah T, Selamat A, Selamat MH, Ibrahim R, Fujita H (2016) Hybridized term-weighting method for dark web classification. Neurocomputing

Sammut C, Webb GI (2010) Encyclopedia of machine learning. Springer, US

Sarhan AM, Hamissa GM, Elbehiry HE (2016) Proposed document frequency technique for minimizing dataset in web crawler. In: Proceedings of the international conference on computer engineering and systems, https://doi.org/10.1109/ICCES.2015.7393008

Shah FP, Patel V (2016) A review on feature selection and feature extraction for text classification. In: Proceedings of the IEEE international conference on wireless communications, signal processing and networking, https://doi.org/10.1109/WiSPNET.2016.7566545

Shahid R, Javed ST, Zafar K (2017) Feature selection based classification of sentiment analysis using biogeography optimization algorithm. In: Proceedings of the international conference on innovations in electrical engineering and computational technologies, https://doi.org/10.1109/ICIEECT.2017.7916549

Shang C, Li M, Feng S, Jiang Q, Fan J (2013) Feature selection via maximizing global information gain for text classification. Knowl Based Syst. https://doi.org/10.1016/j.knosys.2013.09.019

Shang L, Zhou Z, Liu X (2016) Particle swarm optimization-based feature selection in sentiment classification. Soft Comput. https://doi.org/10.1007/s00500-016-2093-2

Shen K, Chen X, Ke L, Lu Y, Zhang K (2013) A blended feature selection method in text. In: Proceedings of the conference on cyberspace technology, pp 573–576

Sheydaei N, Saraee M, Shahgholian A (2015) A novel feature selection method for text classification using association rules and clustering. J Inf Sci. https://doi.org/10.1177/0165551514550143

Somantri O, Kurnia DA, Sudrajat D, Rahaningsih N, Nurdiawan O, Perdana Wanti L (2019) A hybrid method based on particle swarm optimization for restaurant culinary food reviews. In: Proceedings of the international conference on informatics and computing, pp 1–5

Song Q, Ni J, Wang G (2013) A fast clustering-based feature subset selection algorithm for high-dimensional data. IEEE Trans Knowl Data Eng. https://doi.org/10.1109/TKDE.2011.181

Song J, Zhang P, Qin S, Gong J (2016) A method of the feature selection in hierarchical text classification based on the category discrimination and position information. In: Proceedings of the international conference on industrial informatics - computing technology, intelligent technology, industrial information integration, https://doi.org/10.1109/ICIICII.2015.116

Stambaugh C, Yang H, Breuer F (2013) Analytic feature selection for support vector machines. In: Proceedings of the machine learning and data mining in pattern recognition, pp 219–233

Sundararajan K, Palanisamy A, Versaci M (2020) Multi-rule based ensemble feature selection model for sarcasm type detection in Twitter. Comput Intell Neurosci 2020:2860479. https://doi.org/10.1155/2020/2860479

Sun J, Zhang X, Liao D, Chang V (2017) Efficient method for feature selection in text classification. In: Proceedings of international conference on engineering and technology, vol 2018-Janua, pp 1–6, https://doi.org/10.1109/ICEngTechnol.2017.8308201

Su Z, Xu H, Zhang D, Xu Y (2014) Chinese sentiment classification using a neural network tool - Word2vec. In: Proceedings of the international conference on multisensor fusion and information integration for intelligent systems, https://doi.org/10.1109/MFI.2014.6997687

Tang B, He H, Baggenstoss PM, Kay S (2016a) A Bayesian classification approach using class-specific features for text categorization. IEEE Trans Knowl Data Eng 28(6):1602–1606. https://doi.org/10.1109/TKDE.2016.2522427

Tang B, Kay S, He H (2016b) Toward optimal feature selection in naive Bayes for text categorization. IEEE Trans Knowl Data Eng. https://doi.org/10.1109/TKDE.2016.2563436

Tang B, Kay S, He H, Baggenstoss PM (2016c) EEF: exponentially embedded families with class-specific features for classification. IEEE Sig Process Lett. https://doi.org/10.1109/LSP.2016.2574327

Tang X, Dai Y, Xiang Y (2019) Feature selection based on feature interactions with application to text categorization. Expert Syst Appl 120:207–216. https://doi.org/10.1016/j.eswa.2018.11.018

Tang J, Alelyani S, Liu H (2014) Feature selection for classification: a review. Data classification: algorithms and applications

Tang B, He H (2016) FSMJ: feature selection with maximum Jensen-Shannon divergence for text categorization. In: Proceedings of the world congress on intelligent control and automation, vol 2016-Septe, pp 3143–3148, https://doi.org/10.1109/WCICA.2016.7578786

Tian W, Li J, Li H (2018) A method of feature selection based on Word2Vec in text categorization. In: Proceedings of the Chinese control conference, pp 9452–9455

Tommasel A (2016) Integrating social network structure into online feature selection. In: Proceedings of the IJCAI international joint conference on artificial intelligence, vol 2016-Janua, pp 4032–4033

Tripathy A, Anand A, Rath SK (2017) Document-level sentiment classification using hybrid machine learning approach. Knowl Inf Syst 53(3):805–831

Trivedi SK, Tripathi A (2017) Sentiment analysis of Indian movie review with various feature selection techniques. In: Proceedings of the IEEE international conference on advances in computer applications, https://doi.org/10.1109/ICACA.2016.7887947

Tutkan M, Ganiz MC, Akyokuş S (2016) Helmholtz principle based supervised and unsupervised feature selection methods for text mining. Inf Process Manag

Uysal AK (2016) An improved global feature selection scheme for text classification. Expert Syst Appl

Uysal AK, Gunal S (2012) A novel probabilistic feature selection method for text classification. Knowl Based Syst 36:226–235

Vani K, Gupta D (2017) Text plagiarism classification using syntax based linguistic features. Expert Syst Appl 88:448–464

Vychegzhanin SV, Razova EV, Kotelnikov EV (2019) What number of features is optimal: a new method based on approximation function for stance detection task. Proc Int Conf Inf Commun Manag ICICM 2019:43–47

W3Techs (2019) Historical trends in the usage of content languages for websites, September 2019

Wang H, Hong M (2019) Supervised Hebb rule based feature selection for text classification. Inf Process Manag 56(1):167–191

Wang J, Wu L, Kong J, Li Y, Zhang B (2013) Maximum weight and minimum redundancy: a novel framework for feature subset selection. Pattern Recog

Wang Y, Liu Y, Feng L, Zhu X (2014) Novel feature selection method based on harmony search for email classification. Knowl Based Syst

Wang Y, Liu Y, Zhu X (2014) Two-step based hybrid feature selection method for spam filtering. J Intell Fuzzy Syst 27:2785–2796

Wang D, Zhang H, Liu R, Lv W, Wang D (2014a) T-test feature selection approach based on term frequency for text categorization. Pattern Recog Lett

Wang D, Zhang H, Liu R, Liu X, Wang J (2016) Unsupervised feature selection through Gram-Schmidt orthogonalization - a word co-occurrence perspective. Neurocomputing

Wang Q, Liu L, Jiang J, Jiang M, Lu Y, Pei Z (2017) Feature selection method based on multiple centrifuge models. Cluster Comput 20(2):1425–1435

Wang Y, Wang J, Liao H, Chen H (2017) An efficient semi-supervised representatives feature selection algorithm based on information theory. Pattern Recog

Webkb (2019) The 4 universities data set

Wu L, Wang Y, Zhang S, Zhang Y (2017) Fusing gini index and term frequency for text feature selection. In: Proceedings of the IEEE international conference on multimedia big data, https://doi.org/10.1109/BigMM.2017.65

Wu G, Wang L, Zhao N, Lin H (2016) Improved expected cross entropy method for text feature selection. In: Proceedings of the international conference on computer science and mechanical automation, https://doi.org/10.1109/CSMA.2015.17

Wu G, Xu J (2016) Optimized approach of feature selection based on information gain. In: Proceedings of the international conference on computer science and mechanical automation, https://doi.org/10.1109/CSMA.2015.38

Xiaoming D, Tang Y (2013) Improved mutual information method for text feature selection. In: Proceedings of the international conference on computer science and education

Xu Z, King I, Lyu M, Jin R (2010) Discriminative semi-supervised feature selection via manifold regularization. IEEE Trans Neural Netw 21:1033–1047

Xu J, Jiang H (2015) An improved information gain feature selection algorithm for SVM text classifier. In: Proceedings of the international conference on cyber-enabled distributed computing and knowledge discovery, https://doi.org/10.1109/CyberC.2015.53

Xu H, Xu L (2017) Multi-label feature selection algorithm based on label pairwise ranking comparison transformation. In: Proceedings of the international joint conference on neural networks

Yang ZT, Zheng J (2016) Research on Chinese text classification based on Word2vec. In: Proceedings of the IEEE international conference on computer and communications research

Yang J, Liu Z, Qu Z, Wang J (2014) Feature selection method based on crossed centroid for text categorization. In: Proceedings of the IEEE/ACIS international conference on software engineering, artificial intelligence, networking and parallel/distributed computing, https://doi.org/10.1109/SNPD.2014.6888675

Yang J, Lu Y, Liu Z (2019) An improved strategy of the feature selection algorithm for the text categorization. In: Proceedings of the IEEE/ACIS international conference on software engineering, artificial intelligence, networking and parallel/distributed computing, pp 3–7

Yang J, Wang J, Liu Z, Qu Z (2015) A term weighting scheme based on the measure of relevance and distinction for text categorization. In: Proceedings of the IEEE/ACIS international conference on software engineering, artificial intelligence, networking and parallel/distributed computing, https://doi.org/10.1109/SNPD.2015.7176178

Yigit F, Baykan OK (2014) A new feature selection method for text categorization based on information gain and particle swarm optimization. In: Proceedings of IEEE international conference on cloud computing and intelligence systems, https://doi.org/10.1109/CCIS.2014.7175792

Yousefpour A, Ibrahim R, Hamed HNA (2017) Ordinal-based and frequency-based integration of feature selection methods for sentiment analysis. Expert Syst Appl 75:80–93

Zainuddin N, Selamat A, Ibrahim R (2018) Hybrid sentiment classification on Twitter aspect-based sentiment analysis. Appl Intell 48(5):1218–1232

Zhang Z, Ke T, Deng N, Tan J (2014) Biased p-norm support vector machine for PU learning. Neurocomputing 136:256–261

Zhang J, Hu X, Li P, He W, Zhang Y, Li H (2014a) A hybrid feature selection approach by correlation-based filters and SVM-RFE. In: Proceedings of the international conference on pattern recognition, pp 3684–3689, https://doi.org/10.1109/ICPR.2014.633

Zhang H, Ren YG, Yang X (2013) Research on text feature selection algorithm based on information gain and feature relation tree. In: Proceedings of the web information system and application conference, pp 446–449, https://doi.org/10.1109/WISA.2013.90

Zhen Z, Wang H, Xing Y, Han L (2016) Text feature selection approach by means of class difference. In: Proceedings of the international conference on natural computation, fuzzy systems and knowledge discovery, https://doi.org/10.1109/FSKD.2016.7603412

Zhou X, Hu Y, Guo L (2014) Text categorization based on clustering feature selection. Procedia Comput Sci 31:398–405

Zhou H, Han S, Liu Y (2018) A novel feature selection approach based on document frequency of segmented term frequency. IEEE Access 6:53811–53821

Zhou H, Guo J, Wang Y, Zhao M (2016) A feature selection approach based on interclass and intraclass relative contributions of terms. Comput Intell Neurosci. https://doi.org/10.1155/2016/1715780

Zhu L, Wang G, Zou X (2017) Improved information gain feature selection method for Chinese text classification based on word embedding. Proc Int Conf Softw Comput Appl. https://doi.org/10.1145/3056662.3056671

Zhuang Y, Wang H, Xiao J, Wu F, Yang Y, Lu W, Zhang Z (2017) Bag-of-discriminative-words (BoDW) representation via topic modeling. IEEE Trans Knowl Data Eng 29(5):977–990. https://doi.org/10.1109/TKDE.2017.2658571

Zong W, Wu F, Chu LK, Sculli D (2015) A discriminative and semantic feature selection method for text categorization. Int J Prod Econ 165:215–222. https://doi.org/10.1016/j.ijpe.2014.12.035

Zuo Z, Li J, Anderson P, Yang L, Naik N (2018) Grooming detection using fuzzy-rough feature selection and text classification. In: Proceedings of the IEEE international conference on fuzzy systems, pp 1–8

Author information

Authors and affiliations.

Instituto de Computação - Universidade Federal Fluminense (UFF), Niterói, RJ, Brazil

Julliano Trindade Pintas & Leandro A. F. Fernandes

Departamento de Informática Aplicada, Universidade Federal do Estado do Rio de Janeiro (UNIRIO), Rio de Janeiro, RJ, Brazil

Ana Cristina Bicharra Garcia

Corresponding author

Correspondence to Julliano Trindade Pintas.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The research for this work was partially sponsored by FAPERJ (grant E-26/202.718/2018) and CNPq-Brazil (grants 311.037/2017-8 and 305.853/2018-0).

Appendix A. List of acronyms

Accuracy Measure

Balanced Accuracy Measure

At Least One FeaTure

Analysis of Variance

Bit-priori Association Classification Algorithm

Binary Black Hole Algorithm

Blended Feature Selection Method

Binary Gravitational Search Algorithm

Balanced Mutual Information

Bag of Discriminative Words

Bag of Words

Binary Particle Swarm Optimization

Correlative Association Score

Class Discriminating Measure

Comprehensively Measure Feature Selection

Convolutional Neural Network

Crowd-based Feature Selection

Cat Swarm Optimization

Deep Belief Network

Document Frequency

Discriminative Features Selection

Distinguishing Feature Selector

Diversified Greedy Backward-Forward Search

Discriminative Personal Purity

Decision Tree

Ensemble Embedded Feature Selection

Fuzzy Rough Feature Selection

Feature Selection

Genetic Algorithm and Wrapper Approaches

Global Filter-based Feature Selection Scheme

Geometric Particle Swarm Optimization

Hierarchical Attention Network

Hebb Rule Based Feature Selection

Inverse Document Frequency

Information Gain

Improved Particle Swarm Optimization

Improved Sine Cosine Algorithm

k-Nearest Neighbors

Latent Dirichlet Allocation

Latent Selection Augmented Naive Bayes

Markov Blanket Filter

Meta Feature Selection

Memetic Feature Selection based on Label Frequency Difference

Mutual Information

Multivariate Mutual Information

Max-Min Ratio

Multi-Objective Automated Negotiation based Online Feature Selection

Multi-Objective Relative Discriminative Criterion

Multivariate Relative Discrimination Criterion

Naive Bayes

Normalized Difference Measure

Optimized Swarm Search-based Feature Selection

Pairwise Comparison Transformation

Part of Speech

Part of Speech Filter

Particle Swarm Optimization

Reuters Corpus Volume I

Relative Discrimination Criterion

Random Forest

Recursive Feature Elimination

Random Projection and Gram-Schmidt Orthogonalization

Sparsity Adjusted Information Gain

Spark BAT Feature Selection

Square of Information Gain and Chi-square

Systematic Literature Review

Synthetic Minority Oversampling Technique

Support Vector Machines

Support Vector Machine-Recursive Feature Elimination

Small World Algorithm

Student’s t-Test

Term Frequency

Term Frequency-Inverse Document Frequency

Wrapper Feature Selection Algorithm based on Iterated Greedy

Wolf Intelligence Based Optimization of Multi-Dimensional Feature Selection Approach

About this article

Pintas, J.T., Fernandes, L.A.F. & Garcia, A.C.B. Feature selection methods for text classification: a systematic literature review. Artif Intell Rev 54, 6149–6200 (2021). https://doi.org/10.1007/s10462-021-09970-6

Accepted: 29 January 2021

Published: 24 February 2021

Issue Date: December 2021

DOI: https://doi.org/10.1007/s10462-021-09970-6

  • Feature selection
  • Dimensionality reduction
  • Text classification
  • Systematic literature review