Classification of Images Using Decision Tree

In this paper, the proposed system classifies multi-object images by their texture features using the decision tree (ID3) algorithm. The proposed system uses tile-based image segmentation to reduce the block effect and uses the low-low (LL) subband of the Haar wavelet to reduce image size without losing any important information. Texture features (Entropy, Homogeneity, Energy, Inverse Difference Moment (IDM), Contrast, and Mean) are extracted from each image to build a feature database. All the texture features extracted from the training images are coded into a database of feature codes. The ID3 algorithm uses these feature codes to classify images into different classes. The splitting rules for growing the ID3 tree are entropy and information gain, which are used to build a database of rules in if-then format. The proposed algorithm is evaluated on a test image database of 375 images belonging to 5 classes, using the accuracy measure. In the experimental tests, 88% of the images are correctly classified, and the design of the proposed system is general enough to allow additional classes and extension of the set of classification classes.


INTRODUCTION
Images are produced by a variety of physical devices, including still and video cameras, X-ray machines, electron microscopes, and radar, and are used for a variety of purposes, including entertainment, medical, business, industrial, military, civil, traffic, security, and scientific applications. Image processing allows one to enhance image features of interest while attenuating details irrelevant to a given application, and then to extract useful information about the scene from the enhanced image. This information is used to extract knowledge for decision making [1]. Image mining deals with the extraction of implicit knowledge, image data relationships, or other patterns not explicitly stored in image databases. The images from an image database are first preprocessed to improve their quality. These images then undergo various transformations and feature extraction to generate the important features from the images. With the generated features, mining can be carried out using data mining techniques to discover significant patterns. The resulting patterns are evaluated and interpreted to obtain the final knowledge, which can be applied to applications [2]. Decision tree induction is a well-known methodology used widely in various domains, such as artificial intelligence, machine learning, data mining, and pattern recognition. It is a predictive model which usually operates as a classifier. The construction of a decision tree, also called the decision tree learning process, uses a training dataset consisting of records with their corresponding features and a label attribute to learn the classifier. Once the tree is built, it can be used to predict the label attribute of unidentified records from the same domain [3].

TEXTURE FEATURE EXTRACTION
Texture, the pattern of information or arrangement of the structure found in an image, is an important feature of many image types. In a general sense, texture refers to the surface characteristics and appearance of an object given by the size, shape, density, arrangement, and proportion of its elementary parts. Due to the significance of texture information, texture feature extraction is a key function in various image processing applications, remote sensing, and content-based image retrieval. Texture features can be extracted by several methods, using statistical, structural, model-based, and transform information, of which the most common is the Gray Level Co-occurrence Matrix (GLCM). The GLCM contains second-order statistical information about the spatial relationship of the pixels of an image [4]. In general, the GLCM is computed as follows. First, an original texture image D is re-quantized into an image G with a reduced number of gray levels, N_g; a typical value of N_g is 16. Then, the GLCM is computed from G by scanning the intensity of each pixel and its neighbor, defined by a displacement d and an angle ø.
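As an illustration of this computation, the following is a minimal sketch that builds a GLCM for one displacement d and one angle from an already re-quantized gray image; the function name glcm, the symmetrization step, and the normalization to probabilities are common choices for illustration, not details taken from the paper.

import numpy as np

def glcm(image, ng=16, d=1, angle=0):
    # Build a gray-level co-occurrence matrix for one displacement/angle.
    # `image` is a 2-D array of integer gray levels in the range [0, ng).
    offsets = {0: (0, d), 45: (-d, d), 90: (-d, 0), 135: (-d, -d)}
    dr, dc = offsets[angle]
    p = np.zeros((ng, ng), dtype=np.float64)
    rows, cols = image.shape
    for r in range(rows):
        for c in range(cols):
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols:
                p[image[r, c], image[rr, cc]] += 1
    p = p + p.T          # count each pixel pair in both directions (symmetric GLCM)
    return p / p.sum()   # normalize so the entries are probabilities P(i, j)

# Example: quantize an 8-bit gray image to 16 levels and build the 0-degree GLCM.
gray = np.random.randint(0, 256, (64, 64))
quantized = (gray // 16).astype(int)    # 256 gray levels -> 16 levels
P = glcm(quantized, ng=16, d=1, angle=0)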

Finally, scalar secondary features are extracted from this co-occurrence matrix, such as the following [5]:

• Entropy [6]: measures the disorder of an image; it achieves its largest value when all elements in the P matrix are equal. When the image is not texturally uniform, many GLCM elements have very small values, which implies that the entropy is very large. Therefore, entropy is inversely proportional to GLCM energy.

Entropy = −∑_i ∑_j P(i, j) log P(i, j)
• Contrast [7]: a measure of the degree of spread of the gray levels, or the average gray-level difference between neighboring pixels. Contrast values are higher for regions exhibiting large local variations. The GLCM associated with such regions will have more elements distant from the main diagonal than the GLCM of regions with low contrast. Local statistical contrast and GLCM contrast are strongly correlated.
• Homogeneity [6]: measures image homogeneity, as it assumes larger values for smaller gray-tone differences in pair elements. It is more sensitive to the presence of near-diagonal elements in the GLCM. GLCM contrast and homogeneity are strongly, but inversely, correlated in terms of equivalent distribution in the pixel-pair population; homogeneity decreases if contrast increases while energy is kept constant.
• IDM [6]: measures image homogeneity. This parameter achieves its largest value when most of the occurrences in the GLCM are concentrated near the main diagonal.

IDM = ∑_i ∑_j P(i, j) / (1 + (i − j)²)   …(5)
• Energy [7]: also called uniformity or angular second moment. It measures textural uniformity, that is, pixel-pair repetitions, and detects disorder in textures. Energy reaches a maximum value equal to one.

• Mean [7]: the GLCM mean is expressed in terms of the gray-level co-occurrence matrix. Consequently, a pixel value is weighted not by its frequency of occurrence by itself (as in the common mean expression), but by the frequency of its occurrence in combination with a certain neighboring pixel value.
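The following sketch shows how these six scalar features could be computed from a normalized GLCM; the expressions follow the standard GLCM definitions, and small details (for example, the logarithm base or the use of |i − j| for homogeneity) may differ from the exact formulas used in the paper's references.

import numpy as np

def texture_features(p):
    # `p` is a normalized GLCM; returns the six scalar texture features.
    ng = p.shape[0]
    i, j = np.meshgrid(np.arange(ng), np.arange(ng), indexing="ij")
    nz = p[p > 0]                                    # avoid log(0)
    return {
        "entropy": -np.sum(nz * np.log2(nz)),
        "energy": np.sum(p ** 2),                    # angular second moment
        "contrast": np.sum(((i - j) ** 2) * p),
        "homogeneity": np.sum(p / (1.0 + np.abs(i - j))),
        "idm": np.sum(p / (1.0 + (i - j) ** 2)),     # inverse difference moment
        "mean": np.sum(i * p),                       # gray levels weighted by P(i, j)
    }

One such feature record would be computed per GLCM (one per angle and distance), yielding the per-block feature vectors used later in the training database.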

DATA MINING
Data mining is a term that describes different techniques from machine learning, statistical analysis, modeling, and database technologies that can be used in different industries. With a combination of these techniques, it is possible to find different kinds of structures and relations in the data, as well as to derive rules and models that enable prediction and decision making in new situations.
It is possible to perform classification, estimation, prediction, affinity grouping, clustering, description, and visualization [8]. Classification of data objects based on predefined knowledge of the objects is a data mining and knowledge management technique used to group similar data objects together. It can be defined as a supervised learning task, as it assigns class labels to data objects based on the relationship between the data items and a predefined class label. Many classification algorithms are available in the literature, but decision trees are the most commonly used because they are easy to implement and easier to understand than other classification algorithms [9].

DECISION TREE
A decision tree is a tree in which each branch node represents a choice between a number of alternatives, and each leaf node represents a decision. Decision trees are commonly used for gathering information for the purpose of decision making. A decision tree starts with a root node on which users take action; from this node, users split each node recursively according to the decision tree learning algorithm. The final result is a decision tree in which each branch represents a possible scenario of a decision and its outcome [10].

ID3 Decision Tree Induction Algorithm
ID3 is a simple decision tree learning algorithm developed by Ross Quinlan (1983). ID3 produces a decision tree which can classify the outcome value based on the values of the given attributes. The ID3 algorithm is a supervised learning method with the ability to generate rules through a decision tree. To construct the decision tree, the ID3 algorithm calculates the entropy of each feature of the training images, measures the information gained for each feature, and takes the feature with the maximum gain as the root [11].
• Entropy: The idea of the entropy of random variables was proposed by Claude Shannon. There are several ways to introduce the notion of entropy. Quantities of the form R = −∑ p_i log p_i will be recognized as the entropy defined in certain formulations of statistical mechanics, where p_i is the probability of a system being in cell i of its phase space [12]. We shall call R = −∑ p_i log p_i the entropy of the set of probabilities p_1, …, p_n. If x is a chance variable, we will write R(x) for its entropy; thus x is not the argument of a function but a label for a number, to differentiate it from, say, R(y), the entropy of the chance variable y [12].
• Information Gain: Given entropy as a measure of the impurity in a collection of training examples, we can now define a measure of the effectiveness of an attribute in classifying the training data. The measure we will use, called information gain, is simply the expected reduction in entropy caused by partitioning the examples according to this attribute. More precisely, the information gain Gain(S, A) of an attribute A, relative to a collection of examples S, is defined as [10]:

Gain(S, A) = Entropy(S) − ∑_{V ∈ Values(A)} (|S_V| / |S|) × Entropy(S_V)
where the sum is over each value V of all the possible values of the attribute A, S_V is the subset of S for which attribute A has value V, |S_V| is the number of elements in S_V, and |S| is the number of elements in S. The proposed ID3 algorithm is as follows: ID3(Learning Set S, Attribute Set A, Attribute Values V) returns a decision tree.
• Begin.
• Load the learning set first, create the decision tree root node 'rootNode', and add the learning set S into the root node as its subset.
• For rootNode, compute Entropy(rootNode.subset) first.
• If Entropy(rootNode.subset) == 0, then rootNode.subset consists of records all with the same value for the categorical attribute; return a leaf node with decision attribute: attribute value.
• If Entropy(rootNode.subset) != 0, then compute the information gain for each attribute left (that has not yet been used in splitting), and find the attribute A with Maximum(Gain(S, A)). Create child nodes of this rootNode and add them to rootNode in the decision tree.
• For each child of the rootNode, apply ID3(S, A, V) recursively until reaching a node that has entropy = 0 or a leaf node.
• End ID3.
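A compact sketch of this procedure is given below, assuming the training records have already been coded into discrete feature values; the function names (entropy, gain, build_id3) and the dictionary-based tree representation are illustrative choices, not the paper's implementation.

import math
from collections import Counter

def entropy(labels):
    # Shannon entropy -sum(p_i * log2 p_i) of a list of class labels.
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def gain(records, labels, attr):
    # Information gain of splitting `records` (dicts of coded features) on `attr`.
    total = len(labels)
    remainder = 0.0
    for value in set(r[attr] for r in records):
        subset = [lab for r, lab in zip(records, labels) if r[attr] == value]
        remainder += (len(subset) / total) * entropy(subset)
    return entropy(labels) - remainder

def build_id3(records, labels, attributes):
    # Recursively grow an ID3 tree over coded feature values.
    if entropy(labels) == 0:                 # pure node -> leaf with that class
        return labels[0]
    if not attributes:                       # no attribute left -> majority class leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: gain(records, labels, a))
    node = {"attribute": best, "children": {}}
    for value in set(r[best] for r in records):
        idx = [k for k, r in enumerate(records) if r[best] == value]
        node["children"][value] = build_id3(
            [records[k] for k in idx], [labels[k] for k in idx],
            [a for a in attributes if a != best])
    return node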

PROPOSED ALGORITHM
The main idea of the proposed algorithm depends on the fact that any image has multiple unique features. These features differ from one image to another depending on the color and texture of their objects. In this paper, an algorithm is proposed to classify images by image mining, extracting texture features from the image such as Entropy, Energy, Contrast, Mean, Inverse Difference Moment, and Homogeneity. All the texture features extracted from the training images are classified into a feature database, which is then used to choose the class closest to the requested image. To specify these features, gray-level single-channel images are used to extract the features and construct feature vectors by using a co-occurrence matrix for each textured image; the feature vectors are then classified into groups by using hierarchical classification techniques so that they can be used with the decision tree. The structure of the proposed approach consists of two phases (a training phase and a testing phase), as shown in Figure (1). Each phase has specific functions.

Training Phase
This phase consists of several main steps:

Step 1: Image preparation. In this step the images are loaded and the image size is reduced to N×N, as shown in Figure (2); this reduces the computing time and storage space.

Step 2: Image segmentation (tile based). In this step tile-based segmentation is used to get the important parts of the image by dividing the image into N×N parts and neglecting any part whose entropy is less than a threshold.
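A minimal sketch of these two steps follows, assuming the size reduction of Step 1 uses the low-low (LL) Haar subband mentioned in the abstract (approximated here by 2×2 block averaging) and that Step 2 keeps only tiles whose histogram entropy exceeds a threshold; the function names, tile grid size, and threshold value are illustrative, not values from the paper.

import numpy as np

def haar_ll(gray):
    # Approximate the low-low (LL) Haar subband by averaging 2x2 blocks,
    # halving each image dimension while keeping the average content.
    h, w = gray.shape[0] // 2 * 2, gray.shape[1] // 2 * 2
    g = gray[:h, :w].astype(np.float64)
    return (g[0::2, 0::2] + g[0::2, 1::2] + g[1::2, 0::2] + g[1::2, 1::2]) / 4.0

def tile_entropy(tile, bins=16):
    # Histogram entropy of one tile, used to decide whether to keep it.
    hist, _ = np.histogram(tile, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def useful_tiles(gray, n=4, threshold=2.0):
    # Split the reduced image into an n x n grid and keep only 'busy' tiles.
    ll = haar_ll(gray)
    th, tw = ll.shape[0] // n, ll.shape[1] // n
    kept = []
    for r in range(n):
        for c in range(n):
            tile = ll[r * th:(r + 1) * th, c * tw:(c + 1) * tw]
            if tile_entropy(tile) >= threshold:   # neglect low-entropy tiles
                kept.append(tile)
    return kept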
Step 3: Feature extraction. In this step the low-level texture features are extracted from each image. Each RGB color block is converted to gray level and quantized into 16 levels to be used with the co-occurrence matrix to extract texture features. For each gray block (N blocks), co-occurrence matrices are computed for the angles 0°, 45°, 90°, and 135° at distance 1. For each GLCM, a set of 6 features (Entropy, Energy, Mean, Contrast, IDM, and Homogeneity) is computed and saved in a database called TRDBF, as shown in Figure (3).

Step 6: Create database rules. As a result, many rules are generated and stored in the database of rules (DBR) in if-then format, which aids the test phase in classifying images and making decisions.

Testing Phase
All the steps in the test phase are the same as in the training phase except for one additional procedure, the classification step, which proceeds as follows:
1. Read the input data from TEDBC.
2. Locate the rules in DBR that satisfy the test image features.
3. If a rule is found, then add 1 to the counter of its class (kind k).
4. Repeat from step 1 until the end of the file.
5. Find the maximum counter among (class kind 1, class kind 2, …, class kind k).
6. Make the decision.
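A minimal sketch of this voting step is shown below, assuming TEDBC provides the coded features of the test image and DBR stores each rule as a set of feature conditions paired with a class label; the exact-match criterion and the data structures are assumptions made for illustration.

from collections import Counter

def classify(test_features, rules):
    # Vote over if-then rules: each rule is (conditions dict, class label).
    # A rule fires when every coded feature it mentions matches the test
    # image's coded value; the class with the most fired rules wins.
    votes = Counter()
    for conditions, label in rules:
        if all(test_features.get(attr) == value
               for attr, value in conditions.items()):
            votes[label] += 1                 # add 1 to the counter of this class
    if not votes:
        return None                           # no rule matched
    return votes.most_common(1)[0][0]         # class with the maximum counter

# Illustrative usage with hypothetical coded features and rules.
rules = [({"entropy": 2, "contrast": 1}, "horses"),
         ({"energy": 3}, "flowers")]
print(classify({"entropy": 2, "contrast": 1, "energy": 0}, rules))  # -> horses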

EXPERIMENTS AND RESULTS
The training images in this paper belong to a subset of the WANG database. The subset used consists of 5 classes (dinosaurs, flowers, horses, food, and cars); each class contains 75 images, so the total number of images used is 375. The accuracy measure is the most widely used method to evaluate image classification. In our experiment on 375 images, classification with the decision tree and texture features algorithm gives many different results, with good accuracy when the images have no complex background and no block effect, especially since tile segmentation is used to remove unwanted features of the image. To compute the accuracy of the decision, the following equation is used, which gives the accuracy ratio of the classified images:

Accuracy = (number of correctly classified images / total number of images) × 100%

Table (1) shows the accuracy for each kind of image.

Comparison of CIUDT with Other Systems
This subsection evaluates the classification accuracy of the proposed system and compares it with some existing systems, as shown in Figure (6). The results of this paper are compared with the performance of MUFIN [13], Metode Proposta [14], CBIC [15], LNBNN [16], and DTC [17]. As Table (2) shows, the proposed system has higher accuracy. From Figure (6) it is noted that the proposed system has higher accuracy than the others because tile segmentation of the images is used, which aids in removing unwanted data, and because texture features are used with the decision tree.

Contributions
In this paper, the main contributions are as follows:
1- Use of texture features with a decision tree to classify multi-object images, whereas other research uses texture to classify single-object images.
2- Use of tile segmentation, combining the data obtained from each segment to increase the number of quantifiable features and neglecting any part that has little entropy; this aids in preventing any block effect.
3- Use of the low-low (LL) subband of the Haar discrete wavelet transform, since Haar can decompose a signal into different components in the frequency domain (low-high, high-low, high-high, and low-low); the low-low subband represents the average component and the other three are detail components.

CONCLUSIONS
1- The process of classifying a single-object image is easier than classifying a multi-object image.
2- To prevent any block effect, the image can be divided into many sub-images, as happens when segment tiles are used.
3- The coding method for feature values is a suitable way to reduce data ranges and reduce the execution time of tree building.
4- The quantization method for pixel color values aids in reducing data size without losing any important data.
5- Classification of simple-background images is easier than that of complex-background images, and there are no general classification systems.

Figure (2): The user interface of the training image.