Print ISSN: 1681-6900

Online ISSN: 2412-0758

Keywords: Text Classification

Textual Dataset Classification Using Supervised Machine Learning Techniques

Hanan Q. Jaleel; Jane J. Stephan; Sinan A. Naji

Engineering and Technology Journal, 2022, Volume 40, Issue 4, Pages 527-538
DOI: 10.30684/etj.v40i4.1970

Text classification has become a significant domain of study and research because of the growing volume of text datasets and documents available in digital format. It is one of the major approaches used to organize digital information by automatically assigning text dataset records or documents to predetermined classes according to their contents. This paper proposes a technique that implements supervised machine learning algorithms such as KNN, Decision Tree, Random Forest, Bernoulli Naive Bayes, and Multinomial Naive Bayes classifiers to classify a dataset into distinct classes. The proposed technique combines these classifiers with the TF-IDF feature extraction method as a vector space model to achieve more precise classification results. It yields high accuracy, precision, recall, and F1-measure values for all the implemented classifiers. A comparison of the results obtained from the different classifiers shows that the Random Forest classifier performs best on the textual dataset records, with the highest accuracy value of 0.9995930.
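The pairing of TF-IDF features with a supervised classifier can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes whitespace tokenization, a plain tf * log(N / df) weighting, cosine similarity, and a KNN vote with k = 3, and it runs on a four-sentence toy corpus invented for this example. All function and variable names are hypothetical.

```python
import math
from collections import Counter


def fit_tfidf(docs):
    """Return (vectors, idf): one {term: weight} dict per document plus the IDF table."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: (c / len(tokens)) * idf[t] for t, c in tf.items()})
    return vectors, idf


def vectorize(text, idf):
    """Weight an unseen text with the IDF table learned from the training documents."""
    tokens = [t for t in text.lower().split() if t in idf]
    tf = Counter(tokens)
    return {t: (c / len(tokens)) * idf[t] for t, c in tf.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def knn_predict(text, train_vectors, train_labels, idf, k=3):
    """Majority vote over the k training documents most similar to the text."""
    query = vectorize(text, idf)
    ranked = sorted(
        zip(train_vectors, train_labels),
        key=lambda pair: cosine(query, pair[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy training data (invented for illustration only).
train_docs = [
    "the team won the football match",
    "the striker scored a late goal",
    "the new phone has a fast processor",
    "the laptop ships with more memory",
]
train_labels = ["sport", "sport", "tech", "tech"]

train_vectors, idf = fit_tfidf(train_docs)
prediction = knn_predict("the goal in the match", train_vectors, train_labels, idf)
```

Because "the" occurs in every training document, its IDF weight is zero and it does not influence the similarity scores, which is the effect TF-IDF is meant to have on uninformative terms. Libraries such as scikit-learn provide production-grade versions of both steps.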

Arabic Texts Classification Based on Keywords Extraction Technique

S.M. Kadhem; Almeer; A.Q.Abd

Engineering and Technology Journal, 2017, Volume 35, Issue 2, Pages 96-104

Keywords are useful for various purposes, including labeling, summarizing, indexing, categorization, searching, and clustering. In this paper, we extract keywords from Arabic text in order to classify it. The proposed system classifies any Arabic text through simple statistical and linguistic approaches by extracting the keywords of the text (with the frequency at which they appear in the text) using a database for a particular field (in this work, the computer science field). This database is represented using one B+ tree for keywords and another database for non-keywords. The proposed system was implemented using Visual Prolog 5.1, and testing showed it to be valuable for Arabic text classification in terms of accuracy and search time.
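The keyword-lookup idea can be sketched as follows. This is a rough illustration, not the system described in the paper, which was written in Visual Prolog 5.1 and used B+ trees. Here a sorted Python list searched with `bisect` stands in for the ordered index, tokens are matched exactly with no morphological analysis, and the word lists, the `min_hits` threshold, and the decision rule are all assumptions made for this example.

```python
from bisect import bisect_left
from collections import Counter

# Hypothetical domain keywords (programming, computer, algorithm, network),
# kept sorted as a stand-in for the keyword B+ tree.
KEYWORDS = sorted(["برمجة", "حاسوب", "خوارزمية", "شبكة"])
# Hypothetical non-keywords (common prepositions), also kept sorted.
NON_KEYWORDS = sorted(["على", "في", "من"])


def contains(sorted_terms, term):
    """Binary search for an exact match in a sorted list of terms."""
    i = bisect_left(sorted_terms, term)
    return i < len(sorted_terms) and sorted_terms[i] == term


def extract_keywords(text):
    """Count how often each known domain keyword appears in the text."""
    counts = Counter()
    for token in text.split():
        if contains(NON_KEYWORDS, token):
            continue  # skip words known to carry no topical information
        if contains(KEYWORDS, token):
            counts[token] += 1
    return counts


def classify(text, min_hits=2):
    """Label the text as in-domain when enough keyword occurrences are found."""
    counts = extract_keywords(text)
    label = "computer science" if sum(counts.values()) >= min_hits else "other"
    return label, counts


sample = "خوارزمية في شبكة حاسوب"
label, counts = classify(sample)
```

An ordered index keeps lookups logarithmic in the number of stored terms, which is presumably why search time is reported alongside accuracy.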