Intrusion Detection System for NSL-KDD Dataset Based on Deep Learning and Recursive Feature Elimination

An intrusion detection system (IDS) monitors hosts and networks to detect attacks that could damage them, and alerts the administrator before serious harm occurs, thereby helping to prevent unauthorized access. Intrusion detection is a reasonable complement to a firewall, but IDSs suffer from high false positive rates, which lower their accuracy. This work therefore implements an IDS that uses Recursive Feature Elimination (RFE) to select features and a Deep Neural Network (DNN) and a Recurrent Neural Network (RNN) for classification. The DNN performs binary classification (attack or normal), while the RNN performs five-class classification (Normal, DoS, Probe, R2L, U2R). The proposed model achieves an accuracy of up to 94%. The system was implemented on the NSL-KDD dataset, which is well suited to offline analysis of IDSs.

INTRODUCTION
An intrusion detection system is a security technique that analyses networks and computers in real time to detect intrusions and manage responsive actions [1]. Signature-based and anomaly-based detection are the two major models used in intrusion detection systems. Anomaly-based detection relies on a statistical description of programs or users, that is, it flags any activity that deviates from the profile of normal behavior. Signature-based IDSs rely on gathering and storing the signatures of known attacks in a database [2,3]. With the rapid growth of networks, devices and users, computers suffer from attacks and security vulnerabilities that are difficult and expensive to resolve, so an intrusion detection system that controls and monitors network traffic is a practical solution. This paper addresses anomaly detection based entirely on deep learning methods, using separate training and testing datasets. The suggested system was implemented on the NSL-KDD dataset because of the problems in the KDD99 dataset [4]. Evaluating a Network Intrusion Detection System (NIDS) requires a comprehensive dataset that contains both known and unknown behaviors [5]. This work proposes a network intrusion detection system that uses a deep neural network and a recurrent neural network to classify traffic as normal or attack. The paper is arranged as follows: section two reviews related work; section three describes the dataset; sections four and five describe recurrent and deep neural networks; section six explains the evaluation metrics; section seven explains the steps of the suggested system; section eight presents the experimental results and discussion; and the final section gives conclusions and future work.

RELATED WORK
Many recent systems depend on conventional machine learning techniques. One common way to build intrusion detection systems is to use an Artificial Neural Network trained, for example, with the back-propagation algorithm [6]. Other common methods used to detect intrusions include support vector machines [7,8], K-nearest neighbor (KNN) [9], and random forest (RF) [10]. The authors of [11] proposed an IDS on the NSL-KDD dataset using support vector machine and decision tree algorithms [12]. Anomaly detection with Naive Bayes (NB) [13] was examined on KDD99 [14]. Other work examined the DARPA dataset using support vector machines, and the KDD dataset using C4 and the Self-Organizing Map [15]. Nevertheless, improvements are still needed, such as raising accuracy and reducing the number of false alarms [16]. In [17], the Long Short-Term Memory algorithm was used on the KDD dataset for feature selection and attack classification. Machine learning learns the particulars of TCP/IP attributes, whereas deep learning, a subfield of machine learning, is more complex because it passes the TCP/IP attributes through many layers. In [18], a model was designed that combines discretization with the HNB classifier; this approach is based on a hidden layer in the NB model for multiple classes and obtains better accuracy and a high attack detection rate. More recently, intrusion detection systems have come to rely on deep learning methods. In [19], a Self-taught Learning (STL) approach was proposed for intrusion detection. According to [20], LSTM assembles multiple features from various feature sources, such as numeric, nominal, and binary features. In [21], the authors showed how to build an IDS with an RNN.

DATASET DESCRIPTION
Since 1999, the Knowledge Discovery and Data Mining (KDD'99) dataset has been widely used for the evaluation of anomaly detection techniques [22]. Some investigators examined KDD99 and found that anomaly detection approaches performed poorly on it, so a new dataset, NSL-KDD, was adopted as a better alternative [23]. NSL-KDD consists of selected records from the complete KDD dataset. The training set contains 125,973 samples and the test set contains 22,544 samples; each sample has 41 attributes and is labeled as either an attack or normal. Attacks in this dataset are split into four categories: Denial of Service (DoS), Remote to Local (R2L), User to Root (U2R), and Probe. These attack types are explained below:
1. DoS: the attacker exhausts the processing time of resources so that legitimate users cannot obtain them.
2. R2L: an attacker who has no account on the target machine attempts to gain access from a remote system. R2L attacks are relevant to both network-based and host-based intrusion detection.
3. U2R: the attacker tries to obtain a user's password, logs into the system as a legitimate user, and retrieves data.
4. Probe: the attacker scans the network to collect information that can be used for a later penetration.
In addition to these types, there is a class describing normal traffic. Table I shows the four types of attack in NSL-KDD [23].

RECURRENT NEURAL NETWORKS
Recurrent neural networks are a type of supervised deep learning model made of artificial neurons with one or more feedback loops. The feedback loops are recurrent cycles over time or over a sequence [24]. Recurrent neural networks have been used successfully for text data, speech data, classification, regression, natural language processing and generative models [25]. They are generally considered less suitable for image data and tabular data.

DEEP NEURAL NETWORK
A deep neural network (DNN) is a neural network with a certain level of complexity, namely more than two layers. Deep neural networks use sophisticated mathematical modeling to process data in complex ways. DNNs have been used successfully for a number of classification and regression tasks, including image classification and natural language processing. They are also used for speech recognition [26] and Bayesian speech and language processing [27].

EVALUATION METRICS
The following are the most commonly used evaluation metrics, where TP, TN, FP and FN denote the numbers of true positives, true negatives, false positives and false negatives:
1) Accuracy: the proportion of samples correctly classified as normal or attack over the total number of samples.
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
2) Precision (P): the proportion of positive predictions made by the classifier that are true.
$$P = \frac{TP}{TP + FP}$$
3) Recall (R): the proportion of actual positives that are correctly detected by the classifier, also called the detection rate (DR) or true positive rate (TPR).
$$R = \frac{TP}{TP + FN}$$
4) F-measure: the harmonic mean of the precision and the recall.
$$F = \frac{2 \cdot P \cdot R}{P + R}$$
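The metrics above can be sketched as a small Python function. This is a minimal illustration rather than the authors' code: the function name `classification_metrics` and the counts passed to it are hypothetical, and a metric whose denominator is zero is reported as 0.0 by assumption.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return accuracy, precision, recall and F-measure from confusion counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # The F-measure is the harmonic mean of precision and recall.
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Hypothetical counts, chosen only to exercise the formulas.
metrics = classification_metrics(tp=90, tn=80, fp=20, fn=10)
print(metrics)
```

With these made-up counts the accuracy is 170/200 = 0.85 and the recall is 90/100 = 0.9.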

THE PROPOSED SYSTEM
The suggested system designs a Network Intrusion Detection System based on the NSL-KDD dataset. First, preprocessing is applied to the raw dataset, and normalization scales the values of all features to lie between 0 and 1. The training set is used to train the model and the testing set is used to evaluate the trained model. To select important features, the RFE technique is applied to the training set, and the same selected features are then used with the testing set at prediction time. After feature selection, classification is performed on the training set using the Deep Neural Network and Recurrent Neural Network algorithms. Finally, the model is evaluated by predicting on the test set and comparing the results. The proposed system is shown in Figure 1.

• The nominal attributes in the dataset are converted to numerical values. These are features 2, 3 and 4: the protocol type, service and flag.
• The attack types at the end of each record are converted into their numeric categories.
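The two conversion steps can be sketched in plain Python as follows. The sample records, the partial attack-to-class mapping, the numeric class ids and the helper name `fit_encoder` are all illustrative assumptions; the paper does not state which encoding scheme or class numbering was actually used.

```python
# Hypothetical mini-sample of NSL-KDD-style records:
# (protocol_type, service, flag, attack label).
records = [
    ("tcp", "http", "SF", "normal"),
    ("udp", "domain_u", "SF", "normal"),
    ("tcp", "private", "S0", "neptune"),
    ("icmp", "eco_i", "SF", "ipsweep"),
    ("tcp", "ftp_data", "SF", "warezclient"),
    ("tcp", "telnet", "SF", "buffer_overflow"),
]

# Partial mapping from attack names to the five classes (illustrative subset).
ATTACK_CLASS = {
    "normal": "Normal",
    "neptune": "DoS",
    "smurf": "DoS",
    "ipsweep": "Probe",
    "portsweep": "Probe",
    "warezclient": "R2L",
    "guess_passwd": "R2L",
    "buffer_overflow": "U2R",
    "rootkit": "U2R",
}
# Assumed numeric ids for the classes.
CLASS_ID = {"Normal": 0, "DoS": 1, "Probe": 2, "R2L": 3, "U2R": 4}


def fit_encoder(values):
    """Map each distinct nominal value to an integer, in sorted order."""
    return {value: index for index, value in enumerate(sorted(set(values)))}


# One encoder per nominal column: protocol type, service, flag.
encoders = [fit_encoder([r[col] for r in records]) for col in range(3)]

encoded = [
    [encoders[col][r[col]] for col in range(3)] + [CLASS_ID[ATTACK_CLASS[r[3]]]]
    for r in records
]
print(encoded)
```

In practice the encoders would be fitted on the training set and reused on the test set; a value that appears only in the test set would raise a `KeyError` in this sketch and needs an explicit policy.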

II. Normalization: -
The features in the NSL-KDD dataset take continuous or discrete values. Their ranges differ widely, which makes them incomparable. The features were standardized by subtracting the mean of each feature and dividing by its standard deviation; the test features were transformed using the mean and standard deviation of each feature from the training dataset. Min-Max normalization, a linear transformation, is used to scale the data to the range (0, 1). The new value is computed as follows [28]:
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
where $x$ is the original value of a feature and $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of that feature.
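A minimal sketch of Min-Max scaling in plain Python is given below, assuming that the per-feature minimum and maximum are taken from the training set and reused for the test set, in the same spirit as the train-set statistics mentioned above. The function names and the toy rows are hypothetical.

```python
def fit_min_max(train_rows):
    """Return per-feature (min, max) pairs computed from the training rows."""
    columns = list(zip(*train_rows))
    return [(min(col), max(col)) for col in columns]


def apply_min_max(rows, ranges):
    """Scale every feature with the (min, max) pairs learned on the training set."""
    scaled = []
    for row in rows:
        new_row = []
        for value, (low, high) in zip(row, ranges):
            if high == low:
                new_row.append(0.0)  # constant feature: avoid division by zero
            else:
                new_row.append((value - low) / (high - low))
        scaled.append(new_row)
    return scaled


# Toy data with two features; not taken from NSL-KDD.
train = [[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]]
test = [[2.5, 25.0]]

ranges = fit_min_max(train)
train_scaled = apply_min_max(train, ranges)
test_scaled = apply_min_max(test, ranges)
print(train_scaled, test_scaled)
```

A test value outside the range seen during training falls outside [0, 1] with this approach; clipping is one possible remedy, though the paper does not say how such values were handled.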

III. Feature Selection: -
Feature selection is one of the essential phases of data preprocessing for an intrusion detection system. It decreases the number of features, removes redundant, irrelevant, or noisy data, and retains the essential features that affect intrusion detection. The wrapper-based Recursive Feature Elimination with a Random Forest Classifier (RFC-RFE) method is used as the feature selection technique. Recursive Feature Elimination (RFE) provides feature weights that account for multivariate interactions between features. Here RFE is applied with a Random Forest Classifier to obtain the best subset of features. To find the best attribute subset without an exhaustive search over all feature sets, RFE wraps a particular model (Random Forest), rejects the weakest feature (by absolute classifier weight or attribute ranking), and repeats the operation over increasingly smaller attribute subsets until the best model is obtained. The weights of this model are used to rank the features. The scikit-learn implementation of RFE with random forest was used to produce a feature ranking for the dataset. For NSL-KDD, the 25 most important features were selected. This sequence of feature selection operations achieved a score of more than 98%. Figure II shows the important features resulting from RFE.
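The selection step can be sketched with scikit-learn's `RFE` and `RandomForestClassifier`. The data here is a synthetic stand-in produced by `make_classification`, not NSL-KDD, and the number of trees, the elimination step and the random seed are assumptions; only the 41 input attributes and the 25 retained features come from the text.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

# Synthetic stand-in for a 41-attribute training set (not NSL-KDD data).
X_train, y_train = make_classification(
    n_samples=300, n_features=41, n_informative=10, random_state=0
)

# Random forest supplies the feature importances that RFE uses for ranking.
forest = RandomForestClassifier(n_estimators=20, random_state=0)

# Drop the weakest feature one at a time until 25 features remain.
selector = RFE(estimator=forest, n_features_to_select=25, step=1)
selector.fit(X_train, y_train)

selected_columns = [i for i, keep in enumerate(selector.support_) if keep]
X_train_selected = selector.transform(X_train)
print(selected_columns)
print(X_train_selected.shape)
```

The fitted `selector` would then be applied to the test set with `selector.transform`, so that both sets keep the same 25 columns.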

A. Building the Deep Neural Network for binary classification
A fully-connected network structure with three kinds of layers (input, hidden and output) is used. Fully connected layers are defined using the Dense class, which takes the number of neurons (nodes) in the layer as its first argument and the activation function through the activation argument. The rectified linear unit (ReLU) activation function is applied in the hidden layers, and the sigmoid function is used in the output layer to ensure that the network output is between 0 and 1. Dropout regularization is used to avoid overfitting during training.

B. Building the Recurrent Neural Network for multi-class classification
The system uses a network structure with three kinds of layers (input, hidden and output). Each recurrent layer takes the number of neurons (nodes) as its first argument. An important argument in the RNN is return_sequences, which is False by default; it is set to True on a layer so that further recurrent layers can be stacked on top of it. The system also uses dropout regularization to avoid overfitting during training. Finally, the sigmoid function is used in the output layer to ensure that the network output is between 0 and 1.
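The effect of return_sequences can be illustrated with a small NumPy forward pass. This is a sketch with random, untrained weights and is not the Keras model used in the paper: the helper `simple_rnn`, the tanh recurrence, the weight scale, and the choice to present each 25-feature record as a sequence of one timestep are all assumptions, and dropout is omitted because it is only active during training. The sizes (25 inputs, four recurrent layers of 64 units, 5 outputs) follow the description of the RNN model in the paper.

```python
import numpy as np


def simple_rnn(inputs, units, return_sequences, rng):
    """Run a tanh recurrent layer over inputs of shape (timesteps, features)."""
    timesteps, features = inputs.shape
    w_in = rng.normal(scale=0.1, size=(features, units))
    w_rec = rng.normal(scale=0.1, size=(units, units))
    bias = np.zeros(units)
    state = np.zeros(units)
    outputs = []
    for t in range(timesteps):
        # The new state depends on the current input and the previous state.
        state = np.tanh(inputs[t] @ w_in + state @ w_rec + bias)
        outputs.append(state)
    # return_sequences=True keeps one output per timestep (needed for stacking);
    # return_sequences=False keeps only the final state.
    return np.stack(outputs) if return_sequences else state


rng = np.random.default_rng(0)
x = rng.random((1, 25))  # one record, assumed to be a single timestep

h = simple_rnn(x, 64, return_sequences=True, rng=rng)
h = simple_rnn(h, 64, return_sequences=True, rng=rng)
h = simple_rnn(h, 64, return_sequences=True, rng=rng)
last = simple_rnn(h, 64, return_sequences=False, rng=rng)

# Dense output layer with 5 nodes and a sigmoid activation.
w_out = rng.normal(scale=0.1, size=(64, 5))
scores = 1.0 / (1.0 + np.exp(-(last @ w_out)))
print(h.shape, last.shape, scores.shape)
```

The first three layers return a 2-D array that the next recurrent layer can consume, while the last one returns a single vector that feeds the output layer.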

Table III. Test outcomes of DNN
Test results of the RNN for multi-class classification are given in Table IV, where the performance of the RNN is evaluated on the test set for all classes. The system obtained its best accuracy with four added hidden layers compared with other numbers of layers. In terms of accuracy, the RNN gives good results in multi-class classification. Receiver operating characteristic curves are used to illustrate the results graphically, as shown in Figure 5.

CONCLUSIONS AND FUTURE WORK
An intrusion detection system is a technique for discovering known and unknown intrusions before the attacker harms the devices of the network. In this paper, an effective and flexible network intrusion detection system is implemented on the NSL-KDD dataset using Recurrent Neural Network and Deep Neural Network algorithms. First, a Deep Neural Network for binary classification was implemented; it reached an accuracy of 94%, a false positive rate of 0.08 and a true positive rate of 92%. Then a Recurrent Neural Network was used for multi-class classification (DoS, Normal, Probe, R2L and U2R) on the testing dataset, with an overall accuracy of 94%. Per class, accuracy was 96% and the true positive rate 77% for Normal; accuracy was 96% and the true positive rate 94% for DoS; accuracy was 87% and the true positive rate 87% for Probe; accuracy was 70% and the true positive rate 87% for R2L; and accuracy was 94% and the true positive rate (0.99)% for U2R. The false alarm rate for all classes was between 0.1 and 0.8. Overall model performance was good, especially in anomaly detection. This work can be extended in three directions: first, the system can be applied to other intrusion datasets such as Kyoto, WSN-DS and CICIDS2017; second, other feature selection techniques such as LDA and rough sets can be used; third, the suggested system can be implemented online.
The performance of an intrusion detection system is evaluated using the confusion matrix, which compares the actual and predicted classes, as shown in Table II.
• True Positive (TP): the model correctly identifies malicious traffic as an attack.
• True Negative (TN): the model correctly identifies normal activity as normal.
• False Positive (FP): the model incorrectly identifies normal activity as malicious.
• False Negative (FN): the model incorrectly identifies malicious traffic as normal.
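The four confusion-matrix counts can be tallied from actual and predicted labels as in the sketch below. It treats the attack class as the positive class, which is consistent with the FP and FN definitions; the function name and the label lists are hypothetical.

```python
def confusion_counts(actual, predicted, positive="attack"):
    """Count TP, TN, FP and FN, treating `positive` as the positive class."""
    tp = tn = fp = fn = 0
    for truth, guess in zip(actual, predicted):
        if guess == positive:
            if truth == positive:
                tp += 1  # attack flagged as attack
            else:
                fp += 1  # normal activity flagged as attack
        else:
            if truth == positive:
                fn += 1  # attack missed
            else:
                tn += 1  # normal activity left alone
    return tp, tn, fp, fn


# Hypothetical labels, chosen only for illustration.
actual = ["attack", "attack", "normal", "normal", "attack", "normal"]
predicted = ["attack", "normal", "normal", "attack", "attack", "normal"]

tp, tn, fp, fn = confusion_counts(actual, predicted)
print(tp, tn, fp, fn)
```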

Receiver Operating Characteristic (ROC) curve: a graph of the True Positive Rate (TPR) against the False Positive Rate (FPR). The higher the area under the curve (AUC), the better the machine learning model.
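How an ROC curve and its AUC are obtained can be sketched in plain Python: the decision threshold is swept over the predicted scores, each threshold gives one (FPR, TPR) point, and the area is computed with the trapezoidal rule. The labels and scores are made up, the function names are hypothetical, and the sketch assumes that both classes are present.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points obtained by sweeping the decision threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        # Everything scored at or above the threshold is predicted positive.
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical data: 1 marks an attack, 0 marks normal traffic.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

points = roc_points(labels, scores)
auc_value = area_under_curve(points)
print(points)
print(auc_value)
```

For this toy data the AUC is 8/9: of the nine attack/normal pairs, eight have the attack scored higher than the normal record.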
Figure 1. The proposed system.
Figure III shows the DNN model built for the binary class. The model is pieced together by adding each layer in turn:
• The input layer receives data with 25 features, the important features selected by the RFE technique (the input_dim=25 argument).
• The first hidden layer has 32 nodes and uses the relu activation function.
• The second hidden layer has 32 nodes and uses the relu activation function.
• The third hidden layer has 16 nodes and uses the relu activation function.
• The fourth hidden layer has 16 nodes and uses the relu activation function.
• The output layer is a Dense layer with 2 nodes for binary classification and uses the sigmoid activation function.
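The layer sizes listed above can be checked with a small NumPy forward pass. This sketch uses random, untrained weights and is not the Keras model from the paper: the helper names, the weight scale and the batch of random inputs are assumptions, and dropout is left out because it only acts during training.

```python
import numpy as np

# Input, four hidden layers and output, as listed in the text.
LAYER_SIZES = [25, 32, 32, 16, 16, 2]


def init_weights(sizes, rng):
    """Create one (weights, bias) pair per fully connected layer."""
    return [
        (rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(sizes, sizes[1:])
    ]


def forward(x, weights):
    """Apply ReLU in the hidden layers and sigmoid in the output layer."""
    activation = x
    for index, (w, b) in enumerate(weights):
        z = activation @ w + b
        if index == len(weights) - 1:
            activation = 1.0 / (1.0 + np.exp(-z))  # sigmoid
        else:
            activation = np.maximum(z, 0.0)  # ReLU
    return activation


rng = np.random.default_rng(0)
weights = init_weights(LAYER_SIZES, rng)
batch = rng.random((4, 25))  # four hypothetical records with 25 features
outputs = forward(batch, weights)
print(outputs.shape)
```

Each row of `outputs` holds two values between 0 and 1, one per output node.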
Figure 4 shows the model of the RNN for multi-class classification. The model is built by adding each layer in turn:
• The input layer receives data with 25 variables, the important features selected by the RFE technique (the input_dim=25 argument).
• The first hidden layer has 64 nodes and uses return_sequences=True.
• The second hidden layer has 64 nodes and uses return_sequences=True.
• The third hidden layer has 64 nodes and uses return_sequences=True.
• The fourth hidden layer has 64 nodes and uses return_sequences=False.
• The output layer is a Dense layer with 5 nodes for multi-class classification and uses the sigmoid activation function.

RESULTS AND DISCUSSIONS
Keras, available since early 2015, provides reusable open-source Python implementations for developing and evaluating deep learning models. This work was implemented in the Python language on the NSL-KDD dataset, which is composed of a training set of 125,973 records and a test set of 22,544 records labeled with five classes. NSL-KDD is divided into training and test sets. This section gives the detailed results of the proposed system for binary as well as multi-class classification. The system uses two classification algorithms: a deep neural network for the binary case and a Recurrent Neural Network for the multi-class case. On the training set, most of the DNN and RNN network topologies reached a training accuracy of 99%. The classification results on the test set are as follows.
During the testing phase, the DNN was evaluated for binary classification. The DNN obtained its best accuracy with four hidden layers compared with other numbers of layers. In terms of accuracy, the DNN gives good results in binary classification, as reflected in the true positive rate (TPR) and false positive rate (FPR) shown in the table of DNN test outcomes and in Figure 6. In Figure 6, blue indicates evaluation on the training set and green indicates evaluation on the testing set. The ROC curves for NSL-KDD show that, in most cases, the DNN and RNN performed well.
ROC curve for DNN. ROC curve for RNN.