Print ISSN: 1681-6900

Online ISSN: 2412-0758

Keywords: data mining

Improving Machine Learning Performance by Eliminating the Influence of Unclean Data

Murtadha B. Ressan; Rehab F. Hassan

Engineering and Technology Journal, 2022, Volume 40, Issue 4, Pages 546-539
DOI: 10.30684/etj.v40i4.2010

Regardless of the data source and type (text, numeric, image collections, etc.), real-world data are usually unclean. Here, "unclean" means that the data contain errors and inconsistencies that can strongly affect machine learning. Several factors influence machine learning results in a given task, but the nature and characteristics of the input data are the most important reason for a learning algorithm's success. This paper examines the pre-processing of data before they are fed to a machine learning algorithm and explains the operations at each pre-processing stage needed to get the best results from a data set. Four machine learning models are used, including SVM, Multinomial NB, and Bernoulli NB. The best accuracy, 89%, is obtained by the Bernoulli NB model. The pre-processing algorithm applied to the dirty data set is then developed further, and the results are compared with those obtained before the development: the Bernoulli NB model reaches 91% accuracy, and the accuracy of the other models also improves.

Determination Efficient Classification Algorithm for Credit Card Owners: Comparative Study

Raghad A. Azeez

Engineering and Technology Journal, 2021, Volume 39, Issue 1B, Pages 21-29
DOI: 10.30684/etj.v39i1B.1577

In today's business world, significant losses can occur when borrowers fail to repay their loans. Sound credit-risk management is therefore a necessity for lending institutions. Often, some borrowers delay their monthly payments or have difficulty repaying the loan to the financial institution. Most financial organizations therefore rely on managed, refined client classification systems that distinguish valid clients from invalid ones. This paper presents the data mining approach, specifically the classification technique, and builds a system structured around the data mining process. The credit scoring problem is studied using the Taiwan bank dataset. Three classification methods are adopted: Naïve Bayesian, Decision Tree (C5.0), and Artificial Neural Network. These classifiers are implemented in the WEKA machine learning application. The results show that the C5.0 algorithm performs best: it achieves an accuracy of 0.93, a detection rate of 0.94, a precision of 0.96, and an F-Measure of 0.95, all higher than those of Naïve Bayesian and Artificial Neural Network. Its False Positive Rate of 0.1 is also lower than that of Artificial Neural Network and Naïve Bayesian.
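
The measures reported above are all derived from a binary confusion matrix. As a minimal sketch (not the authors' WEKA workflow), the Python snippet below computes them from confusion-matrix counts. The function names and the example counts are hypothetical and chosen only for illustration, and "detection rate" is assumed to mean recall. The last lines check that the reported F-Measure is consistent with the reported precision and detection rate.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp, fp, tn, fn):
    """Compute common binary-classification measures from confusion-matrix counts."""
    precision = tp / (tp + fp)
    detection_rate = tp / (tp + fn)  # assumed here to be recall (true positive rate)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "detection_rate": detection_rate,
        "precision": precision,
        "false_positive_rate": fp / (fp + tn),
        "f_measure": f_measure(precision, detection_rate),
    }


# Hypothetical counts, NOT taken from the paper.
example = classification_metrics(tp=90, fp=10, tn=80, fn=20)
print(example["accuracy"])  # 0.85

# Consistency check on the reported figures: precision 0.96, detection rate 0.94.
reported_f = f_measure(0.96, 0.94)
print(round(reported_f, 2))  # 0.95
```

With precision 0.96 and recall 0.94, the harmonic mean rounds to 0.95, which matches the F-Measure stated in the abstract.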

A Proposal to Detect Computer Worms (Malicious Codes) Using Data Mining Classification Algorithms

Soukaena Hassan Hashim; Inas Ali Abdulmunem

Engineering and Technology Journal, 2013, Volume 31, Issue 2, Pages 142-155

Malicious software (malware) performs malicious functions that compromise a
computer system's security. Many methods have been developed to improve the security
of computer system resources, among them firewalls, encryption, and
Intrusion Detection Systems (IDS). An IDS can detect new, previously unrecognized attack
attempts and raise an early alarm to inform the system of the suspicious intrusion
attempt. This paper proposes a hybrid IDS for intrusion detection, especially of malware,
that considers both network packet and host features. The hybrid IDS is designed using
Data Mining (DM) classification methods because of their ability to detect new, previously
unseen intrusions accurately and automatically. It uses both anomaly and misuse detection
techniques with two DM classifiers, the Iterative Dichotomiser 3 (ID3) classifier and the
Naïve Bayesian (NB) classifier, to verify the validity of the proposed system in terms of
accuracy rate. A proposed dataset, HybD, is used to train and test the hybrid IDS.
Feature selection is used so that only the intrinsic features are considered in the
classification decision; it is carried out with three different measures: the Association
Rules (AR) method, the ReliefF measure, and the Gain Ratio (GR) measure. The NB classifier
with the AR method gives the most accurate classification results (99%), with a false
positive (FP) rate of 0% and a false negative (FN) rate of 1%.
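
Of the three feature selection measures named above, Gain Ratio has a compact standard definition: the information gain of a feature divided by its split information. The snippet below is a minimal sketch of that textbook definition for categorical features, not the authors' implementation; the function names and the toy records are hypothetical and do not come from the HybD dataset.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of categorical values."""
    total = len(values)
    return -sum((c / total) * log2(c / total) for c in Counter(values).values())


def gain_ratio(feature_values, labels):
    """Gain ratio = information gain / split information for one categorical feature."""
    total = len(labels)
    # Group the class labels by the value the feature takes in each record.
    partitions = {}
    for value, label in zip(feature_values, labels):
        partitions.setdefault(value, []).append(label)
    conditional_entropy = sum(
        len(part) / total * entropy(part) for part in partitions.values()
    )
    info_gain = entropy(labels) - conditional_entropy
    split_info = entropy(feature_values)
    # A feature with a single value carries no split information.
    return info_gain / split_info if split_info > 0 else 0.0


# Toy records (hypothetical): one feature separates the classes, the other does not.
labels = ["worm", "worm", "normal", "normal"]
informative = ["tcp", "tcp", "udp", "udp"]
uninformative = ["a", "b", "a", "b"]

print(gain_ratio(informative, labels))    # 1.0
print(gain_ratio(uninformative, labels))  # 0.0
```

Features would then be ranked by this score, and only the highest-ranked ones passed to the classifier.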

Privacy Preserving for Data Mining Applications

Soukaena Hassan Hashem; Ala'a H. AL-Hamami

Engineering and Technology Journal, 2008, Volume 26, Issue 5, Pages 552-564

The results of data mining (DM), such as association rules, classes, and clusters,
are readily available to the working team. Mining can therefore violate the
privacy of sensitive data and make it much easier to steal the resulting
knowledge. The main objective of the proposed system is to preserve privacy
in data mining. This is done by developing algorithms for modifying, encrypting,
and distributing the original data in the database to be mined. We thus ensure the
privacy of the data (the original data in the database to be mined) and the privacy of
the knowledge (the association rules extracted from the mined database) even after the
mining process has taken place. This solves the problem that arises when unauthorized
users can derive confidential information from released data.