Hanan Q. Jaleel; Jane J. Stephan; Sinan A. Naji
Abstract
Text classification has become a significant area of research because of the growing volume of text datasets and documents available in digital format. It is one of the main approaches to organizing digital information: dataset records or documents are automatically assigned to predefined classes according to their content. This paper proposes a technique that applies supervised machine learning classifiers, namely K-Nearest Neighbors (KNN), Decision Tree, Random Forest, Bernoulli Naive Bayes, and Multinomial Naive Bayes, to classify a textual dataset into distinct classes. The technique combines these classifiers with TF-IDF feature extraction, used as a vector space model, to achieve more precise classification results. It yields high accuracy, precision, recall, and F1-measure values for all implemented classifiers. A comparison of the results shows that the Random Forest classifier performs best, classifying the textual dataset records with the highest accuracy, 0.9995930.
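As a rough illustration of the pipeline the abstract describes (TF-IDF vectors fed to a supervised classifier), the following minimal sketch pairs a hand-written TF-IDF vector space model with a KNN classifier in plain Python. It is not the authors' implementation: the toy corpus, the labels, the whitespace tokenizer, the smoothed IDF variant, the use of cosine similarity, and all function names are assumptions made for the example.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: lowercase whitespace tokenization; a real pipeline may differ.
    return text.lower().split()

def fit_idf(docs):
    """Compute a smoothed inverse document frequency per training term."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))  # count each term once per document
    # One common smoothed variant: ln((1 + n) / (1 + df)) + 1
    return {term: math.log((1 + n) / (1 + count)) + 1.0
            for term, count in df.items()}

def tfidf_vector(doc, idf):
    """Map a document to an L2-normalized sparse TF-IDF vector (term -> weight)."""
    tf = Counter(t for t in tokenize(doc) if t in idf)  # unseen terms are dropped
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}

def cosine(a, b):
    # Both vectors are unit length, so the dot product is the cosine similarity.
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def knn_predict(query, train_vecs, train_labels, idf, k=3):
    """Majority vote among the k training documents most similar to the query."""
    q = tfidf_vector(query, idf)
    scored = sorted(
        zip((cosine(q, v) for v in train_vecs), train_labels),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy corpus with two predefined classes.
train_docs = [
    "the team won the football match",
    "the striker scored a late goal",
    "the coach praised the goalkeeper",
    "the new phone has a faster processor",
    "the laptop ships with more memory",
    "software update improves battery life",
]
train_labels = ["sport", "sport", "sport", "tech", "tech", "tech"]

idf = fit_idf(train_docs)
train_vecs = [tfidf_vector(d, idf) for d in train_docs]

print(knn_predict("goal scored by the striker in the match",
                  train_vecs, train_labels, idf))
print(knn_predict("battery life of the new laptop",
                  train_vecs, train_labels, idf))
```

The same TF-IDF vectors could be passed to any of the other classifiers named in the abstract; KNN is used here only because it is short to write without external libraries.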