Combining two KSVM classifiers based on true pixel values and discrete wavelet transform for MRI-based brain tumor detection and classification

Handling editor: Ivan I. Hashim


Highlights
1) We have proposed an integrated algorithm to classify brain tumors in two stages.
2) We have used the Kernel Support Vector Machine (KSVM) classifier.
3) The linear kernel achieved 97.5% and 98.57% accuracy in the first and second classifiers, respectively.

Abstract
Studies on brain tumor detection and classification continue to improve specialists' diagnostic ability. Magnetic Resonance Imaging (MRI) is one of the most common techniques used to evaluate and diagnose brain tumors. However, brain tumor diagnosis is a difficult process because of congenital malformations and possible errors in distinguishing benign from malignant tumors. Therefore, this research proposes an integrated algorithm that classifies brain tumors in two stages using the Kernel Support Vector Machine (KSVM) classifier. The first stage classifies brains as normal or abnormal, and the second classifies abnormal cases as benign or malignant tumors. The first KSVM extracts features from the pixel values to classify images by shape. In contrast, the second KSVM uses the Discrete Wavelet Transform (DWT), followed by the Principal Component Analysis (PCA) technique, to extract and reduce features and improve the model performance. In addition, the K-means clustering algorithm is used to segment and isolate the tumor and to calculate its area. The KSVM classifiers use two kernels: linear and Radial Basis Function (RBF). The results show that the linear kernel achieved 97.5% and 98.57% accuracy in the first and second classifiers, respectively. Both linear classifiers achieved 100% sensitivity. The proposed model is validated using the K-fold strategy.

Introduction
Brain tumors are a complex health problem studied by many cancer research centers around the world. In 2016, the World Health Organization (WHO) encouraged neuro-oncological specialists worldwide to develop a verified classification system for brain tumor diagnosis [1]. Accordingly, different neuro-oncological techniques have been adopted to support valid detection and classification of brain tumors. One of the most common is the Magnetic Resonance Imaging (MRI) scan [2], which provides essential information about brain anatomy and gives the specialist important information about the tumor [2].
Medical image analysis proceeds through several stages to optimize brain tumour detection and classification and to reduce error rates. Image pre-processing is the earliest stage, because varying levels of image noise may be present and reduce the contrast [2,3]. MRI images frequently include undesirable components that degrade visualization. De-noising of medical images is therefore critical to improve visual quality and support better diagnosis. It is achieved with filters such as the median filter, the Gaussian filter, and the adaptive filter [3,4]. The performance of a noise-removal algorithm can be evaluated using parameters such as Peak Signal-to-Noise Ratio (PSNR), Mean Square Error (MSE), and Root Mean Square Error (RMSE) [4].
The literature on medical image visualization varies with the purpose, aim, and objectives of each study. Brain tumour classification, which decides whether a brain is affected, is the subject of many studies and is central to any research on class classification. Several classifiers can be used for this purpose, such as the Kernel Support Vector Machine (KSVM) [3,5]. The KSVM classifier is an efficient technique that provides accurate prediction and classification.
The KSVM classifier can classify images into normal and abnormal [5,6] or benign and malignant tumours [7,8]. It requires specific training on the extracted features, which are necessary for any image-based application [9]. Feature extraction can be achieved by a Discrete Wavelet Transform (DWT) [10] or by a shape analysis method that retrieves and recognizes the objects represented in the images [11]. In addition, abnormal cases can be segmented by clustering techniques [12] such as K-means, which is one of the simplest unsupervised machine learning algorithms [7]. This method supports isolating the tumour and calculating its area.
To evaluate a proposed machine learning model, cross-validation is an important method for estimating the performance of the designed model. Several techniques can be used for this assessment, such as K-fold cross-validation [13,14], which is efficient and simple.
This paper aims to improve the precision of MRI brain tumor classification and to help the specialist make an accurate decision. It makes the following contributions:
1) Proposes an integrated algorithm that combines two KSVM classifiers to detect normal and abnormal brains, and benign and malignant tumors.
2) Develops a simple feature extraction method to classify normal and abnormal brains.
3) Performs K-fold cross-validation to evaluate the proposed algorithm.
The rest of this paper is organized as follows. Section 2 presents a complete explanation of the proposed algorithm. Section 3 explains the K-fold cross-validation method used to evaluate the proposed model. Section 4 presents the results and the evaluation metrics of the proposed model. Section 5 cross-validates our classifiers using the K-fold technique. Section 6 discusses the paper's results in comparison with previous studies. Section 7 concludes the work.

Proposed method
This work uses data collected from The Cancer Imaging Archive (TCIA) and Kaggle [15,16]. The proposed method is an integrated algorithm that classifies MRI images and detects brain tumours. It isolates the tumour, finds its boundary, and calculates its area. In addition, the method classifies tumours as malignant or benign. Figure 1 illustrates the flowchart of the entire proposed method. This section describes the steps of the methodology in detail.

Pre-processing
The initial stage of the proposed method is MRI image pre-processing. This stage produces cleaner images by removing or reducing noise using linear spatial filtering [17]. We do not compare different noise-filtering techniques in this step. The output g(x, y) of linear spatial filtering of an image of size m × n with a filter of size M × N can be expressed as in Eq. (1).
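Linear spatial filtering is commonly written as a weighted sum over a neighbourhood, g(x, y) = Σ_s Σ_t w(s, t) f(x + s, y + t). The following is a minimal sketch of that operation, not the authors' MATLAB implementation. The function name `linear_spatial_filter`, the zero-padded border, and the 3 × 3 averaging kernel are assumptions made for illustration.

```python
def linear_spatial_filter(image, kernel):
    """Correlate a 2-D image (list of rows) with an odd-sized kernel.

    Pixels outside the image are treated as zero (an assumed border rule).
    """
    rows, cols = len(image), len(image[0])
    a, b = len(kernel) // 2, len(kernel[0]) // 2  # half-sizes of the kernel
    out = [[0.0] * cols for _ in range(rows)]
    for x in range(rows):
        for y in range(cols):
            acc = 0.0
            for s in range(-a, a + 1):
                for t in range(-b, b + 1):
                    xs, yt = x + s, y + t
                    if 0 <= xs < rows and 0 <= yt < cols:
                        acc += kernel[s + a][t + b] * image[xs][yt]
            out[x][y] = acc
    return out


# Illustrative 3 x 3 averaging kernel and a tiny image with one noisy pixel.
mean_kernel = [[1 / 9] * 3 for _ in range(3)]
noisy = [[10, 10, 10], [10, 100, 10], [10, 10, 10]]
smoothed = linear_spatial_filter(noisy, mean_kernel)
```

In this toy example the isolated bright pixel is pulled toward the value of its neighbours, which is the noise-reduction effect the pre-processing stage relies on.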

Support vector machine (SVM) classifier
The SVM classifier is a popular algorithm that performs well in classification tasks. It is a linear classifier that finds the optimal hyperplane separating the classes with the maximum margin. However, the feature vectors might not be linearly separable. In this case, the SVM introduces a kernel function, such as the linear, polynomial, or Radial Basis Function (RBF) kernel [18,19].
The linear kernel function $K(x, x_i)$ of the vectors $x$ and $x_i$ is usually described as in Eq. (2), and the RBF kernel as in Eq. (3), where $\|x - x_i\|$ is the Euclidean distance between $x$ and $x_i$, and the kernel function parameter is greater than zero. Figure 3 shows a graphical representation of the RBF kernel function.
In this work, the classifier is used twice: first to classify the brain as normal or abnormal, and then to classify the brain tumor as benign or malignant. As shown in Figure 1, the classifiers in the proposed algorithm are denoted KSVM-1 and KSVM-2. The results achieved by the KSVM can be assessed with the evaluation metrics accuracy, sensitivity, and specificity [5,20], which are probability measures of classification performance. Accuracy measures how well the whole dataset is categorized, sensitivity measures the proportion of abnormal cases that are correctly identified, and specificity measures the proportion of normal cases that are correctly identified. Table 1 explains the calculations of the evaluation metrics.
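The two kernels used by KSVM-1 and KSVM-2 can be sketched in a few lines. This is an illustration under stated assumptions rather than the paper's exact formulation: the RBF kernel is written in the common form exp(-gamma * ||x - x_i||^2), and the parameter name `gamma` and its default value are assumed, because the source's parameter symbol did not survive extraction.

```python
import math


def linear_kernel(x, xi):
    """Linear kernel: the dot product of the two feature vectors."""
    return sum(p * q for p, q in zip(x, xi))


def rbf_kernel(x, xi, gamma=0.5):
    """RBF kernel in the assumed form exp(-gamma * squared Euclidean distance)."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    squared_distance = sum((p - q) ** 2 for p, q in zip(x, xi))
    return math.exp(-gamma * squared_distance)


same_point = rbf_kernel([1.0, 2.0], [1.0, 2.0])   # identical vectors
far_apart = rbf_kernel([0.0, 0.0], [10.0, 10.0])  # distant vectors
```

The RBF value is largest for identical vectors and decays toward zero as the vectors move apart, whereas the linear kernel grows with the alignment and magnitude of the vectors.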

The confusion matrix, also known as the error matrix, defines the terms used in the evaluation metric calculations by comparing the actual results (ground truth) with the expected (predicted) results, as shown in Table 2. The terms are True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN). TP means correctly diagnosed as diseased, TN means correctly diagnosed as not diseased, FP means incorrectly diagnosed as diseased, and FN means incorrectly diagnosed as not diseased.
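The three metrics follow directly from the four confusion-matrix terms. The sketch below uses the standard definitions of accuracy, sensitivity, and specificity. The function name `evaluation_metrics` and the counts in the example are hypothetical and chosen only for illustration; they are not taken from the paper's tables.

```python
def evaluation_metrics(tp, tn, fp, fn):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)  # share of diseased cases that are detected
    specificity = tn / (tn + fp)  # share of healthy cases that are cleared
    return accuracy, sensitivity, specificity


# Hypothetical counts: 20 abnormal cases all detected, 1 of 20 normal cases missed.
accuracy, sensitivity, specificity = evaluation_metrics(tp=20, tn=19, fp=1, fn=0)
```

A sensitivity of 1.0 in this example corresponds to FN = 0, which is the situation in which no diseased case is labelled as healthy.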

K-means clustering
Clustering helps to target the affected area. The brain image is segmented using the K-means clustering algorithm, and the tumour region is isolated based on its area characteristics [12]. We calculate the number of image pixels NP and the size of the tumor region as in Eq. (4), where the image pixels are divided into black pixels and white pixels and each pixel corresponds to a fixed physical size. The number of white pixels WP, which is the target of the study, is then counted, and the tumour area A is calculated from it.
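The pixel counting described above can be sketched as follows. The sketch assumes that the segmented tumour is available as a binary mask (1 for white, 0 for black) and that the area is the white-pixel count multiplied by the physical area of one pixel. That area formula and the parameter `pixel_area` are assumptions; the paper's own equations and pixel size are not recoverable from the text, so the value used in the example is purely illustrative.

```python
def count_pixels(mask):
    """Return (NP, WP): total pixel count and white (value 1) pixel count."""
    total = sum(len(row) for row in mask)
    white = sum(1 for row in mask for value in row if value == 1)
    return total, white


def tumour_area(mask, pixel_area):
    """Assumed area rule: number of white pixels times the area of one pixel."""
    _, white = count_pixels(mask)
    return white * pixel_area


# Tiny hypothetical mask; pixel_area=0.25 is an illustrative value only.
mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
]
np_total, wp = count_pixels(mask)
area = tumour_area(mask, pixel_area=0.25)
```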

Discrete wavelet transform (DWT)
DWT is an effective numerical analysis method that is useful in MRI image classification and in detecting pathological and abnormal brains [21]. It is usually used to extract the global features needed to support classification techniques [22]. The fundamentals of DWT can be explained mathematically as follows. Assume that $x(t)$ is a square-integrable function. The Continuous Wavelet Transform (CWT) of $x(t)$ relative to a given wavelet $\psi(t)$ is defined as follows [14]:
$$W_{\psi}(a, b) = \int_{-\infty}^{\infty} x(t)\, \psi_{a,b}(t)\, dt \tag{8}$$
where the wavelet $\psi_{a,b}(t)$ is obtained from the mother wavelet $\psi(t)$ by translation and dilation, and it is calculated as follows [14]:
$$\psi_{a,b}(t) = \frac{1}{\sqrt{a}}\, \psi\!\left(\frac{t - b}{a}\right) \tag{9}$$
Here $a$ and $b$ are the dilation factor and the translation parameter, respectively, and both are real positive numbers. One of the most familiar wavelets is the Haar wavelet, which is often the preferred wavelet in several applications. Eq. (8) can be discretized to give the DWT by restricting $a$ and $b$ to a discrete lattice ($a = 2^{b}$ and $a > 0$). Thus, the DWT can be expressed as follows [14]:
$$ca_{j,k}(n) = DS\Big[\sum_{n} x(n)\, g_{j}^{*}(n - 2^{j}k)\Big] \tag{10}$$
$$cd_{j,k}(n) = DS\Big[\sum_{n} x(n)\, h_{j}^{*}(n - 2^{j}k)\Big] \tag{11}$$
where $ca_{j,k}$ denotes the coefficients of the approximation components and $cd_{j,k}$ denotes the coefficients of the detail components. $g(n)$ refers to a low-pass filter and $h(n)$ to a high-pass filter; $j$ is the wavelet scale, $k$ is the translation factor, and DS is the down-sampling operator. Eqs. (10) and (11) decompose the signal $x(n)$ into two signals, and this procedure is called one-level decomposition [14,21]. Figure 4 demonstrates the DWT process.
The Two-Dimensional (2-D) DWT for images is similar to the one-dimensional case. The 2-D DWT is applied to each dimension separately. It decomposes the approximation $ca_{j}$ into four sub-band components, as shown in Figure 5: the approximation $ca_{j+1}$ and the detail sub-bands ($ch_{j+1}$ is the horizontal sub-band, $cv_{j+1}$ is the vertical sub-band, and $cd_{j+1}$ is the diagonal sub-band), where $j$ is the scaling factor [23].
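A one-level decomposition with the Haar wavelet can be sketched in plain Python to make the low-pass/high-pass and down-sampling steps concrete. This is an illustrative sketch, not the authors' implementation. It assumes even image dimensions and orthonormal Haar filters, and it names the four sub-bands by the filters applied (LL, LH, HL, HH) because the mapping to horizontal and vertical detail names depends on the convention in use. All function names are hypothetical.

```python
import math


def haar_dwt_1d(signal):
    """One-level Haar DWT of an even-length sequence: (approximation, detail)."""
    s = 1.0 / math.sqrt(2.0)
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal), 2)]
    approx = [(p + q) * s for p, q in pairs]  # low-pass, down-sampled by 2
    detail = [(p - q) * s for p, q in pairs]  # high-pass, down-sampled by 2
    return approx, detail


def _transform_columns(matrix):
    """Apply the 1-D transform to every column; return (low, high) matrices."""
    low_cols, high_cols = [], []
    for column in zip(*matrix):
        approx, detail = haar_dwt_1d(column)
        low_cols.append(approx)
        high_cols.append(detail)

    def transpose(cols):
        return [list(row) for row in zip(*cols)]

    return transpose(low_cols), transpose(high_cols)


def haar_dwt_2d(image):
    """One-level 2-D Haar DWT: filter the rows first, then the columns."""
    row_low, row_high = [], []
    for row in image:
        approx, detail = haar_dwt_1d(row)
        row_low.append(approx)
        row_high.append(detail)
    ll, lh = _transform_columns(row_low)   # LL is the approximation sub-band
    hl, hh = _transform_columns(row_high)  # HH is the diagonal detail sub-band
    return ll, lh, hl, hh


ll, lh, hl, hh = haar_dwt_2d([[1, 2], [3, 4]])
```

Each sub-band has half the height and half the width of the input, so repeating the decomposition on LL yields the next level.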

Features extraction
To improve the classification metrics, two feature extraction methods are implemented. For KSVM-1, we train the classifier on the pixel values of brain images of size 200 × 200 pixels. In other words, we use the pixel values as features that reflect the diversity of shapes in the images. For KSVM-2, we use the popular first-order and second-order statistical tools. The first-order tools describe the gray-level distribution of the image pixels regardless of their spatial arrangement, and we use them to calculate the features Mean ($M_e$), Standard Deviation (SD), Skewness ($S_K$), Kurtosis ($K_{urt}$), and Entropy (E) [24]. The second-order tool is the Gray Level Co-occurrence Matrix (GLCM) [24], which considers two pixels simultaneously, and we use it to calculate the features Energy ($E_n$), Correlation ($C_{orr}$), Contrast ($C_{on}$), and Homogeneity ($H_{omo}$). For an image of size m × n with pixel value function f(x, y), the extracted features are calculated as explained in the sections below.

Standard deviation
The standard deviation is a measure of the amount of variation of the image pixels.

Skewness
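Skewness is a measure of the symmetry, or lack of symmetry, of a distribution. The skewness of a random variable $X$ with mean $\mu$ and standard deviation $\sigma$ is $E[(X - \mu)^3] / \sigma^3$.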

Kurtosis
Kurtosis is a parameter that describes the shape of the probability distribution of a random variable. The kurtosis of a random variable $X$ with mean $\mu$ and standard deviation $\sigma$ is $E[(X - \mu)^4] / \sigma^4$.

Entropy
Entropy is a measure of the randomness of the image texture.
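The five first-order features can be computed directly from the list of pixel values. The sketch below is illustrative and rests on several assumptions: it uses population (1/N) normalisation, the non-excess form of kurtosis, and Shannon entropy in bits over the observed gray levels. It also assumes a non-constant image so that the standard deviation is not zero. The function name `first_order_features` is hypothetical.

```python
import math


def first_order_features(image):
    """Mean, standard deviation, skewness, kurtosis and entropy of pixel values."""
    pixels = [value for row in image for value in row]
    n = len(pixels)
    mean = sum(pixels) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in pixels) / n)
    skewness = sum((v - mean) ** 3 for v in pixels) / (n * sd ** 3)
    kurtosis = sum((v - mean) ** 4 for v in pixels) / (n * sd ** 4)
    # Shannon entropy of the empirical gray-level distribution, in bits.
    counts = {}
    for v in pixels:
        counts[v] = counts.get(v, 0) + 1
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return {
        "mean": mean,
        "sd": sd,
        "skewness": skewness,
        "kurtosis": kurtosis,
        "entropy": entropy,
    }


features = first_order_features([[0, 0], [2, 2]])
```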

Energy
Energy is a measure of the uniformity of an image. It quantifies how often pixel pairs repeat.

Correlation
The correlation describes the spatial dependencies between the pixels.

Contrast
Contrast measures the intensity difference between a pixel and its neighbour over the whole image.
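The second-order features can be sketched by building a normalised GLCM and summing over its cells. The definitions below follow widely used conventions (energy as the sum of squared entries, contrast weighted by (i - j)^2, homogeneity weighted by 1 / (1 + |i - j|)), which may differ in detail from the paper's equations. The horizontal neighbour offset, the non-symmetric matrix, and the function names are assumptions made for illustration, and the correlation requires both marginal standard deviations to be non-zero.

```python
import math


def glcm(image, levels, d_row=0, d_col=1):
    """Normalised co-occurrence matrix for one pixel offset (default: right neighbour)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + d_row, c + d_col
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
                total += 1
    return [[v / total for v in row] for row in counts]


def glcm_features(p):
    """Energy, contrast, homogeneity and correlation of a normalised GLCM."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    mu_i = sum(i * v for i, _, v in cells)
    mu_j = sum(j * v for _, j, v in cells)
    sd_i = math.sqrt(sum((i - mu_i) ** 2 * v for i, _, v in cells))
    sd_j = math.sqrt(sum((j - mu_j) ** 2 * v for _, j, v in cells))
    covariance = sum((i - mu_i) * (j - mu_j) * v for i, j, v in cells)
    return {
        "energy": sum(v * v for _, _, v in cells),
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        "homogeneity": sum(v / (1 + abs(i - j)) for i, j, v in cells),
        "correlation": covariance / (sd_i * sd_j),
    }


# Two gray levels, each row has a single 0 -> 1 transition.
texture = glcm_features(glcm([[0, 0, 1, 1], [0, 0, 1, 1]], levels=2))
```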

Principal component analysis (PCA)
Extra features degrade processing performance by increasing computation time and memory use. Thus, PCA is used to reduce the number of extracted features [25]. PCA is an effective technique that uses a linear transformation to map the data from a high-dimensional to a low-dimensional space. The method can be specified by the eigenvectors of the covariance matrix as follows:

1) Find the mean value of the specified dataset S.
2) Obtain a new matrix A by subtracting the mean value from S.
3) Any vector S, or its mean-centred version, can be written as a linear combination of the eigenvectors.
4) Obtain the lower-dimensional dataset from the largest eigenvalues.
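The steps above can be sketched for the leading principal component only. To keep the example dependency-free, power iteration is swapped in for a full eigendecomposition, so this is not the routine the authors used. The function name `pca_first_component`, the iteration count, and the all-ones starting vector are assumptions; the starting vector must not be orthogonal to the leading eigenvector, and the data must have non-zero variance.

```python
import math


def pca_first_component(data, iterations=200):
    """Return (direction, scores) for the leading principal component."""
    n, d = len(data), len(data[0])
    # Steps 1-2: compute the mean and subtract it from every sample.
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    centred = [[row[j] - mean[j] for j in range(d)] for row in data]
    # Sample covariance matrix of the centred data.
    cov = [
        [sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Project each centred sample onto the component (lower-dimensional data).
    scores = [sum(r[j] * v[j] for j in range(d)) for r in centred]
    return v, scores


# Points lying exactly on the line y = 2x: one component captures all variance.
direction, scores = pca_first_component([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
```

Keeping more than one component would require deflation or a library eigen-solver, which is outside the scope of this sketch.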

K-fold cross-validation
Cross-validation is a process used to assess the skill of a machine learning model on a finite data sample. In general, it estimates how the model is expected to perform on data that are not used during training. Cross-validation does not increase the final classification accuracy, but it gives reliability to the classifier and indicates how well it generalizes to another independent dataset. K-fold cross-validation is a popular method that is simple to understand and generally gives a less optimistic (less biased) estimate of the model skill than other techniques [13]. Figure 6 shows a schematic diagram of 5-fold cross-validation, which is described as follows [14]:
1) Shuffle the dataset randomly and split it into K groups (folds) of approximately equal size.
2) For each fold, take that fold as the test dataset and the remaining K − 1 folds as the training dataset.
3) Fit a model on the training dataset, evaluate it on the test dataset, and record the error.
4) Repeat the process K times, so that each fold serves once as the test dataset, and average the error rates to obtain a comprehensive model validation error.

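$$\mathrm{Err} = \frac{1}{K} \sum_{i=1}^{K} \mathrm{Err}_i \tag{23}$$
where $\mathrm{Err}_i$ is the error rate of the i-th fold, expressed as a fraction. The percentage of accuracy can then be calculated as:
$$\mathrm{Accuracy} = (1 - \mathrm{Err}) \times 100\% \tag{24}$$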
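The K-fold procedure and the error averaging can be sketched as follows. This is a minimal illustration, not the authors' code: `fold_error` is a hypothetical callback that would train a classifier on the training indices and return its error rate (as a fraction) on the test indices, and the accuracy is taken as (1 - mean error) × 100, which is an assumed reading of the averaging step.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(n_samples, k, fold_error, seed=0):
    """Return (mean error rate, accuracy in percent) over k folds."""
    folds = k_fold_indices(n_samples, k, seed)
    errors = []
    for i, test_idx in enumerate(folds):
        # The remaining k - 1 folds form the training set for this round.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        errors.append(fold_error(train_idx, test_idx))
    mean_error = sum(errors) / k
    return mean_error, (1.0 - mean_error) * 100.0


def dummy_fold_error(train_idx, test_idx):
    """Placeholder for 'fit on train, evaluate on test'; returns a fixed error."""
    return 0.05


mean_error, accuracy_percent = cross_validate(220, 5, dummy_fold_error)
```

With 220 samples and K = 5, each round in this sketch trains on 176 samples and tests on 44.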

Results and Evaluation
This section presents the results of each process in the proposed algorithm. We do not focus on comparing image pre-processing methods; we take a straightforward approach and use linear spatial filtering. The model is implemented in MATLAB R2012b, a high-performance language for technical computing.

Proposed algorithm results
The KSVM-1 classifier uses 220 MRI images as a dataset. Table 3 lists the number of normal and abnormal images used for training and testing. The model is trained and then tested to verify its accuracy. Performance metrics are used to assess how well the first model (KSVM-1) separates normal brains from abnormal brains. In this first stage of classification, we use two kernels. As shown in Figure 7, the linear kernel is the best in terms of accuracy, which increases steadily as the training data increase. In contrast, the accuracy and sensitivity of the RBF kernel decrease when the training data increase substantially. The linear kernel achieves 97.5%, 100%, and 95% in accuracy, sensitivity, and specificity, respectively. Besides accuracy, an interesting indicator in our results is sensitivity, which is calculated from the TP and FN terms as explained in Table 1. A sensitivity of 100% means that all abnormal cases are classified as TP and FN is equal to zero (no abnormal images are incorrectly classified as normal).
One of the essential objectives of this work is to isolate the tumor and calculate its area. Using four clusters with the K-means technique gives good results in separating the tumor area. We calculate the tumor area through binary coding: the extracted image contains the two binary values 1 and 0 for white and black, respectively. To calculate the number of white pixels (logic 1) and the brain tumor area, we follow the method described in the K-means clustering section. Figure 8 shows five samples with their four clusters, tumor segment, and tumor boundary. Table 4 summarizes the calculations for the selected samples and reports the number of pixels in the tumor segment, the tumor area, and the tumor boundary.
It is essential to extract texture features from the abnormal images classified by KSVM-1. For this purpose, we use the 2D-DWT technique, which helps to extract and calculate the first-order and second-order features. The extracted features represent the most relevant information available for a complete description of the images. Table 5 shows the calculated statistical features for five samples. Subsequently, we use the PCA algorithm to reduce the extracted features and increase the processing speed. This technique eliminates unnecessary features and improves the performance of KSVM-2. We measure several characteristics of the categorized image area by constructing a structure array whose fields indicate different region features. Then, we return the measurements of the pixel values in the gray-scale image. Accordingly, we accurately identify the affected part. For KSVM-2, we used a set of MRI images of brains affected by various tumors (both malignant and benign), in the form of "train set. Mat", a ready-made structure of MRI images that have been carefully selected and pre-tested.

Two kernels (linear and RBF) and the set of test images shown in Table 6 were used. The best kernel was the linear function, which achieved 98.57%, 100%, and 97.14% for accuracy, sensitivity, and specificity, respectively, as shown in the chart in Figure 9. An interesting indicator in our results is sensitivity: a ratio of 100% means that all cases of malignancy are classified correctly.

Cross-Validation Analysis
Cross-validation is necessary for evaluating the designed model. Because the classifier is trained on a specific set of data, it may achieve high classification accuracy on the training data only. Therefore, we need a way to validate the method used. Cross-validation does not increase the final classification accuracy, but it gives reliability to the classifier and indicates how well it generalizes to other independent datasets. The dataset is randomly divided into K separate folds of approximately equal size, and each fold is used in turn to test the induced model. The classifier is evaluated by the average accuracy over the K folds. In our work, the K-fold technique was used to validate KSVM-1. The prediction errors from all K folds were collected, and Eqs. (23) and (24) were used to calculate the mean cross-validation error rate. For KSVM-1, five folds are applied to the 220 images. The number of training and test images per fold for KSVM-1 is tabulated in Table 7. K-fold cross-validation gives our model an average accuracy of 96.88% and 94.6% for the linear kernel and the RBF kernel, respectively, as shown in Table 8.

Discussions
We achieve excellent results in all evaluation metrics compared with the results of previous studies. Some earlier studies investigate several approaches and classifiers to pick the best results. In Table 9, we summarize, for each study, the method that achieved the best results. We mark any information that is not specified, such as the number of images in the dataset, as Not Specified (NS), and we do the same for evaluation metrics that previous works do not report. The majority of studies do not evaluate their classifiers or algorithms by cross-validation. We used The Cancer Imaging Archive (TCIA) and Kaggle datasets [15,16], and we used K-fold cross-validation to evaluate our classifier and validate the proposed algorithm.

Conclusions
This paper proposes a robust model that uses reliable MRI-based techniques to detect abnormal brains. It classifies 220 MRI images by combining two classifiers to improve classification accuracy. The first Kernel Support Vector Machine (KSVM) classifier detects abnormal brains, using the pixel values as features for training and testing. The abnormal cases are then collected as the dataset for the second KSVM classifier, which classifies them as benign or malignant tumors, using the Discrete Wavelet Transform (DWT) technique to extract global features. Feature extraction is followed by the Principal Component Analysis (PCA) algorithm to reduce the features and improve classifier performance. The two classifiers achieve accuracies of 97.5% and 98.57%, respectively, and 100% sensitivity in both. The sensitivity indicates that all abnormal cases are detected correctly, which suggests that future work could use this method or similar classifiers to improve the classification of other diseases. This study uses K-fold cross-validation to assess the proposed classifier, which achieves an accuracy of 96.88% for the first classifier. This study also reinforces the proposed model by segmenting the brain, isolating the tumor, and calculating the tumor area using the K-means method.