A Proposal of an Efficient Feature Extracting Method for Content-Based Image Retrieval

Searching for a required image on the World Wide Web (WWW) is very difficult because the WWW contains a huge number of images. To solve this problem, an efficient system is needed to retrieve the images required by the user. Content-based image retrieval (CBIR) systems have been used for this purpose. In this paper, a new combination of three techniques is used for visual feature extraction. The color histogram is used to extract the color feature from the image, the multiwavelet transform is chosen to represent the texture information, and the edge histogram is used to represent the shape feature. The combination of these techniques is robust to object scaling and translation in an image. Furthermore, to speed up retrieval and similarity computation in the proposed system, the dataset images are clustered using the k-means clustering algorithm according to the weighted feature vectors. The system was evaluated experimentally on an 800-image Wang color image dataset, and the results showed that, using the proposed features, the proposed system performed significantly better and faster than other existing systems.


INTRODUCTION
With the rapid evolution of the internet, the size of digital image collections has increased rapidly because of the availability of cheap image capture tools such as image scanners and digital cameras. Efficient tools for image searching, browsing, and retrieval from large datasets are therefore required. Image retrieval has been a very active research area since the 1970s, and many systems have been introduced to retrieve images. Images are retrieved by extracting information from them using computer-assisted or automated image analysis. In text-based image retrieval (TBIR), the images are annotated with keywords or subject headings, which are used as retrieval keys during search and retrieval. TBIR is not appropriate for handling such rich media because it suffers from major problems: it requires a human to describe each image, and the visual content of an image cannot be fully described by keywords. CBIR instead describes the image by its visual features (such as color, texture, and shape) and searches and retrieves images based on these features. The main objective of CBIR is to perform image indexing, browsing, and retrieval effectively, and also to reduce human intervention in the indexing process [1].
The remainder of this paper is arranged as follows: section 2 describes the current scenario; section 3 discusses the low-level features of images; sections 4 and 5 present the proposed approach and its dataset; section 6 contains the results; and section 7 includes the conclusion.

Literature Review
Several systems for query by image content have been developed and made available via the web, such as QBIC, NETRA, ImageMiner, and Photobook. In 2008, N. Chung, W. Han and C. Ming [2] proposed a novel method for image retrieval using the Dominant Color Descriptor (DCD) for the color feature, together with a modified dissimilarity measure. According to the authors, their work achieved a performance improvement in feature extraction and similar-image retrieval. In 2010, H. Narasimhan and P. Ramraj [3] proposed a new clustering algorithm and applied it to CBIR. They claimed that their algorithm optimizes both intra-cluster and inter-cluster similarity measures and improves recall. However, this system used only the RGB color histogram as the visual feature to describe images, and the improvement in recall came at the cost of precision. In 2011, Lidiya, Thrusnavis and Newton [4] proposed a novel method for CBIR using the pyramidal wavelet transform as a texture feature. The method was used as a diagnostic aid in the medical field and was evaluated on the Diabetic Retinopathy Database (DRD). According to the authors, the retrieval rate for DRD images was improved.

Low Level Image Features
Low-level image features such as color, shape, and texture are introduced in this section. A color descriptor, a texture descriptor, and a shape descriptor are used to represent these features.

Texture and its Representation
Texture is a general concept that is very difficult to define in words but easy to recognize. There is no straightforward definition of texture, because the available definitions are based on particular texture analysis methods. Texture can be regarded as a repeated pattern of pixels over a spatial domain. One of the properties of texture is homogeneity, which results from the occurrence of different colors or intensities [5]. Texture properties perceived by the human eye include, for example, regularity, directionality, smoothness, and coarseness.
Texture contains valuable information about the structural arrangement of surfaces and their relationship to the surrounding environment. Texture analysis tasks include texture classification, which distinguishes image regions using texture properties, and texture segmentation, which identifies texture boundaries using texture properties. There are different approaches to representing texture: transform-domain, structural, and statistical. Among the most popular methods is the Fourier transform, which was long the state of the art in signal analysis. The Fourier transform converts a time-domain signal into the frequency domain in order to measure the frequency components of the signal. The texture feature of an image is represented by the frequency components at a specific location of that image. The high-frequency components computed from the texture feature are the main distinguishing factors between images in CBIR. However, the Fourier transform does not provide any information about the exact location of these components, so it fails to provide localized texture information [6].
The best solution to this problem was achieved by the wavelet transform, shown in figure 1(a). The multiwavelet transform has been presented as an efficient and more powerful multi-resolution analysis tool. Multiwavelets are similar to wavelets but differ in that they use several scaling functions φ(t) and several wavelet functions ψ(t) [7]. These functions give the design of multiwavelets several degrees of freedom and make it possible to have several useful properties at the same time (the multiwavelet decomposition is shown in figure 1(b)). These properties can be summarized as follows: symmetry, which allows symmetric extension when the transform deals with image boundaries, so that discontinuities at the boundaries are prevented and the information at these points is not lost; orthogonality, which generates independent sub-images; short support; and a large number of vanishing moments, which gives the system the ability to represent high-degree polynomials with a small number of terms [8]. These properties are very important in signal and image processing. Compared with scalar wavelets, multiwavelets offer the possibility of excellent performance and more degrees of freedom for image processing applications.
Figure (1): One-level image decomposition: (a) wavelet transform, (b) multiwavelet transform. Sub-band labels: LL, LH, HL, HH.
Multiwavelet transforms have two or more scaling and wavelet functions. The set of scaling functions is represented in vector notation as $\Phi(t) = [\phi_1(t), \phi_2(t), \ldots, \phi_r(t)]^T$, and the set of wavelet functions as $\Psi(t) = [\psi_1(t), \psi_2(t), \ldots, \psi_r(t)]^T$. When r = 1, $\Psi(t)$ is called a scalar wavelet. While in principle r can be arbitrarily large, the multiwavelets studied to date are primarily for r = 2 [9]. The multiwavelet two-scale equations look like those for scalar wavelets:
$$\Phi(t) = \sqrt{2} \sum_{k} H_k \, \Phi(2t - k)$$
$$\Psi(t) = \sqrt{2} \sum_{k} G_k \, \Phi(2t - k)$$
Here $H_k$ and $G_k$ are matrix filters, i.e., r × r matrices, and the elements of these matrices provide more degrees of freedom than a traditional wavelet transform. These additional degrees of freedom have been used to build useful properties into the multiwavelet filters, such as orthogonality, high order of approximation, and symmetry. As a result, a better level of performance can be achieved with multiwavelets than with scalar wavelets at similar computational complexity. Multiwavelets are valuable tools for signal processing applications such as denoising and image retrieval [10].

Color and its Representation
Color is one of the most widely used features compared with shape and texture, and it is easy to extract from an image. The color histogram is a low-level feature that represents the color content of the image: it shows the frequency of appearance of each color in an image. Color moments are extracted from the RGB color space using the mean, standard deviation, and skewness of each channel, giving a 9-field feature vector [11].
Eng. & Tech. Journal, Vol. 34, Part (B), No. 6, 2016

Shape and its Representation
Shape plays a powerful role in many computer vision and image processing applications for distinguishing and understanding object identity and physical structure. Humans can recognize an object from its shape. Shape feature representations can be broadly classified into two types:
1. Boundary-based: only the outer boundary of the shape is used to represent the shape. The region is described by its external characteristics, i.e., the pixels located along the object boundary.
2. Region-based: the entire shape region is used to represent the shape. The region is described by its internal characteristics, i.e., the pixels contained in that region.
Approaches used to represent the shape feature include moment invariants, aspect ratio, circularity, and sets of consecutive boundary segments [1]. In this paper, the edge histogram is used to represent the shape feature [12].

The Proposed System
The proposed system consists of two main parts: the offline part, and the online part as shown in figure 2.

Offline Part
Data gathering and preprocessing must be performed offline. In this section, the main modules participating in the offline part are described.

Feature Extraction

Color Feature
In the proposed system, the color histogram was used as the color descriptor. The color histogram extraction algorithm is as follows:
Input: RGB color image.
Output: Multi-dimensional color feature vector.
Step 1: Convert the RGB image to HSV.
Step 2: Quantize the HSV color space, with hue in the range 0-360 and saturation and value in the range 0-1.
Step 3: Compute the histograms of hue, saturation, and value for the local image and the global image.
Step 4: Compute the number of signatures for hue, saturation, and value for the local and global image.
Step 5: Save the color vector, consisting of the 112 attributes calculated in step 4, in the dataset.
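The color extraction steps can be illustrated with a minimal Python sketch. It is not the authors' implementation: the bin counts (8 for hue, 4 each for saturation and value), the 2 × 2 grid used for the local histograms, and the function names are assumptions made for illustration, so the resulting vector has 80 entries, not the 112 attributes reported in the paper. Note that colorsys returns hue scaled to 0-1, not 0-360.

```python
import colorsys

def hsv_histogram(pixels, h_bins=8, s_bins=4, v_bins=4):
    """Normalized H, S and V histograms, concatenated into one list.

    pixels: iterable of (r, g, b) tuples with components in 0..255.
    The bin counts are illustrative assumptions, not the paper's values.
    """
    hist_h, hist_s, hist_v = [0] * h_bins, [0] * s_bins, [0] * v_bins
    count = 0
    for r, g, b in pixels:
        # colorsys returns h, s and v each scaled to the range [0, 1].
        h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        hist_h[min(int(h * h_bins), h_bins - 1)] += 1
        hist_s[min(int(s * s_bins), s_bins - 1)] += 1
        hist_v[min(int(v * v_bins), v_bins - 1)] += 1
        count += 1
    if count == 0:
        return [0.0] * (h_bins + s_bins + v_bins)
    return [c / count for c in hist_h + hist_s + hist_v]

def color_feature(image, grid=2):
    """Global histogram followed by one local histogram per grid cell.

    image: list of rows, each row a list of (r, g, b) tuples.
    """
    rows, cols = len(image), len(image[0])
    feature = hsv_histogram(p for row in image for p in row)
    for gi in range(grid):
        for gj in range(grid):
            r0, r1 = gi * rows // grid, (gi + 1) * rows // grid
            c0, c1 = gj * cols // grid, (gj + 1) * cols // grid
            block = (image[r][c] for r in range(r0, r1) for c in range(c0, c1))
            feature.extend(hsv_histogram(block))
    return feature
```

With these assumed settings, each histogram contributes 16 values, and the global histogram plus four local ones gives 80 values per image.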

Texture Feature
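In the proposed system, the multiwavelet transform was used to represent the texture feature. To represent the global texture feature, the standard deviation was computed on each sub-image X:
$$\mu = \frac{1}{NM} \sum_{i=1}^{N} \sum_{j=1}^{M} X(i,j) \qquad (5)$$
$$\sigma = \sqrt{\frac{1}{NM} \sum_{i=1}^{N} \sum_{j=1}^{M} \left( X(i,j) - \mu \right)^{2}} \qquad (6)$$
where N and M represent the number of rows and columns of X. The local binary pattern is used to describe the local texture feature because it is invariant to monotonic gray-level changes and quick to compute.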
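A minimal Python sketch of the two texture descriptors is given here. Several points are assumptions and not taken from the paper: a one-level Haar decomposition is used as a scalar-wavelet stand-in for the multiwavelet transform, the standard deviation is normalized by NM, and the local binary pattern uses the basic 8-neighbor form with a 256-bin histogram. All function names are hypothetical.

```python
import math

def haar_level1(img):
    """One-level 2-D Haar decomposition (scalar-wavelet stand-in, an assumption).

    img: list of equal-length rows of numbers with even height and width.
    Sub-band naming conventions vary between references.
    """
    bands = {"LL": [], "LH": [], "HL": [], "HH": []}
    for i in range(0, len(img), 2):
        rows = {name: [] for name in bands}
        for j in range(0, len(img[0]), 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            rows["LL"].append((a + b + c + d) / 2.0)  # low-pass in both directions
            rows["LH"].append((a + b - c - d) / 2.0)  # top row minus bottom row
            rows["HL"].append((a - b + c - d) / 2.0)  # left column minus right column
            rows["HH"].append((a - b - c + d) / 2.0)  # diagonal difference
        for name in bands:
            bands[name].append(rows[name])
    return bands

def subband_std(x):
    """Standard deviation of a sub-image x with N rows and M columns."""
    n, m = len(x), len(x[0])
    mu = sum(sum(row) for row in x) / (n * m)
    var = sum((v - mu) ** 2 for row in x for v in row) / (n * m)
    return math.sqrt(var)

def lbp_histogram(img):
    """Normalized 256-bin histogram of basic 8-neighbor LBP codes."""
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    hist, n = [0] * 256, 0
    for i in range(1, len(img) - 1):
        for j in range(1, len(img[0]) - 1):
            code = 0
            for bit, (di, dj) in enumerate(offsets):
                if img[i + di][j + dj] >= img[i][j]:
                    code |= 1 << bit
            hist[code] += 1
            n += 1
    return [h / n for h in hist] if n else [0.0] * 256

def texture_feature(img):
    """Global part (one std per sub-band) followed by the local LBP histogram."""
    bands = haar_level1(img)
    return [subband_std(bands[name]) for name in ("LL", "LH", "HL", "HH")] + lbp_histogram(img)
```

The global part contributes one value per sub-band; a multiwavelet decomposition would produce more sub-images and therefore more values.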

Shape Feature
In the proposed system, edge histogram features covering five categories were used as the shape descriptor. The edge histogram represents the total number of occurrences of each edge type. Each histogram contains five bins, where each bin corresponds to one of the five edge types. Edges in the image are classified into five types: vertical, horizontal, 45-degree diagonal, 135-degree diagonal, and non-directional.
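The five-bin edge histogram can be sketched in Python as follows. The sketch assumes the 2 × 2 filter coefficients commonly used for the MPEG-7 edge histogram descriptor and an arbitrary strength threshold; the paper does not state which filters or threshold were used, and the function name is hypothetical.

```python
import math

# 2x2 filter coefficients in the order (top-left, top-right, bottom-left,
# bottom-right). These follow the common MPEG-7 edge histogram formulation
# and are an assumption, not values taken from the paper.
EDGE_FILTERS = {
    "vertical": (1, -1, 1, -1),
    "horizontal": (1, 1, -1, -1),
    "diagonal_45": (math.sqrt(2), 0, 0, -math.sqrt(2)),
    "diagonal_135": (0, math.sqrt(2), -math.sqrt(2), 0),
    "non_directional": (2, -2, -2, 2),
}

def edge_histogram(gray, threshold=11.0):
    """Count the dominant edge type of every non-overlapping 2x2 block.

    gray: list of rows of gray levels. Blocks whose strongest filter
    response does not exceed the threshold are treated as edge-free.
    Returns counts in the key order of EDGE_FILTERS.
    """
    counts = {name: 0 for name in EDGE_FILTERS}
    for i in range(0, len(gray) - 1, 2):
        for j in range(0, len(gray[0]) - 1, 2):
            block = (gray[i][j], gray[i][j + 1], gray[i + 1][j], gray[i + 1][j + 1])
            strengths = {
                name: abs(sum(f * p for f, p in zip(filt, block)))
                for name, filt in EDGE_FILTERS.items()
            }
            best = max(strengths, key=strengths.get)
            if strengths[best] > threshold:
                counts[best] += 1
    return [counts[name] for name in EDGE_FILTERS]
```

For example, an image made of alternating dark and bright columns places every block in the vertical bin, while a uniform image leaves all five bins empty.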

Clustering Dataset
Before a query is processed, a priori information can be used to organize the images in the dataset, so that only a part of the dataset needs to be searched when the query is received. This saves query processing time without sacrificing retrieval precision. The k-means clustering algorithm is used to achieve this goal: after the dataset features are constructed, the k-means clustering algorithm is applied to group all the images in the dataset.
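The clustering step can be sketched with a plain k-means implementation in Python. The initialization (random sampling with a fixed seed), the squared Euclidean distance, and the iteration limit are assumptions; the paper does not specify these details, nor how the feature weights are applied.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iterations=50, seed=0):
    """Plain k-means. Returns (centroids, labels).

    vectors: list of equal-length feature vectors (lists of numbers).
    Initial centroids are k vectors sampled with a fixed seed (an assumption).
    """
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids will not move
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels
```

The returned centroids are what the online part compares against first, so that only one cluster has to be searched for each query.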

Online Part
This part receives the query image from the client. The online part performs the following steps:
Input: Color image.
Output: N images similar to the input image.
Step 1: Receive the input color image.
Step 2: Extract the feature vector of the input image using the same techniques as in the offline part.
Step 3: Calculate the distance between the input image and the centroid of each cluster, and find the smallest distance, which determines the cluster to which the image belongs.
Step 4: Calculate the distance between the input image and each image in the cluster that has the smallest distance to the input image.
Step 5: If several images have the same distance value, use link-based ranking to determine which image has the higher priority and is returned first.
Step 6: Retrieve the first N images that are most similar to the input image.
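Steps 3 to 6 of the online part can be sketched in Python as follows. The Euclidean distance is an assumption, since the paper does not name the distance measure. The link-based ranking of step 5 is not specified in enough detail to reproduce, so the sketch simply breaks ties by image identifier as a placeholder. The function and parameter names are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def retrieve(query_vector, centroids, clusters, n=10):
    """Return the identifiers of the n images closest to the query.

    centroids: list of centroid vectors, one per cluster.
    clusters:  list (same order as centroids) of lists of
               (image_id, feature_vector) pairs.
    """
    # Step 3: pick the cluster whose centroid is nearest to the query.
    nearest = min(range(len(centroids)),
                  key=lambda c: euclidean(query_vector, centroids[c]))
    # Steps 4 and 5: rank only that cluster's images by distance.
    # Ties are broken by image_id, a placeholder for link-based ranking.
    ranked = sorted(clusters[nearest],
                    key=lambda item: (euclidean(query_vector, item[1]), item[0]))
    # Step 6: keep the first n images.
    return [image_id for image_id, _ in ranked[:n]]
```

Because only one cluster is scanned, the cost of a query depends on the size of that cluster and not on the size of the whole dataset.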

Image Dataset
The Wang dataset is used in our evaluation. It consists of 800 images, classified into 8 classes of 100 images each. The 800-image dataset was passed through our implemented system to extract the features. Color, shape, and texture are used to represent each image, and 591 values are extracted from these features. A small part of the proposed system's dataset is shown in table 1. These values are used for classification with the k-means clustering algorithm. This process is performed offline for all images in the dataset. The dataset is then ready for evaluating and testing the proposed CBIR system.

Experiment and Result
The user enters a query image to retrieve images from the various classes of images. The experimental results for different query images entered by the user, together with the corresponding retrieved images, are shown in Figures 3 to 5.
Figures (3) to (5): The query image and the results.

Comparison between the Proposed System and Other Systems
In this section, the retrieval accuracy of the proposed system is evaluated and compared with the results of other CBIR systems. The comparison takes into account the algorithms used in feature extraction, the details of the algorithm, and the size of the database of the other systems. The performance of the proposed system is compared with the performance of Majid, Mehdi and Tohid's system [13] and Shrirom, Priyadorsini and Kaushik's system [14]. The average precision used in the comparison is based on the top 10 returned images. The comparison is recorded in Table (2).
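The top-10 average precision used in the comparison can be computed as in the following minimal Python sketch. It assumes that a retrieved image is relevant when its class label equals the class label of the query; the function names and the example labels are hypothetical.

```python
def precision_at_k(retrieved_labels, query_label, k=10):
    """Fraction of the first k retrieved images that share the query's class."""
    top = retrieved_labels[:k]
    if not top:
        return 0.0
    return sum(1 for label in top if label == query_label) / len(top)

def average_precision_at_k(runs, k=10):
    """Mean precision at k over several queries.

    runs: list of (retrieved_labels, query_label) pairs, one per query.
    """
    if not runs:
        return 0.0
    return sum(precision_at_k(labels, q, k) for labels, q in runs) / len(runs)
```

For instance, a query whose top 10 results contain 7 images of its own class has a precision of 0.7 for that query.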