A New Method Using Naive Bayes and RGBD Facial Identification Based on Features Extracted from Image Pixels

Nowadays, technology has made life more manageable, particularly for those with physical disabilities. Recognition of audio-visual (AV) letters is one of the critical and notoriously difficult tasks in this area. This research exploits the potential of extracted image features, rather than purely statistical properties, in some applications: to resolve lip movement for AV letter recognition, a Naive Bayesian classifier and a Red, Green, Blue, and Diameter (RGBD) feature have been adopted for visual letter identification. The Naive Bayesian classifier achieves 73.33% recognition for three letters, each with ten frames, while the RGBD classifier achieves 100%. Two scenarios were then tested with different levels of noise placed on the face: normal, normal + 10%, normal + 25%, and normal + 75%. The first scenario trains on and recognizes all classes, one after another, yielding 95% for RGBD and 83.3% for the Naive Bayesian classifier. The second scenario recognizes one of the noise-inflicted forms; here RGBD identification is 100%, while the Naive Bayesian classifier achieves 49.99%.


INTRODUCTION
Human ability to discriminate and recognize, particularly in noise, is not yet matched by any intelligent system [1]. Speech recognition has been notoriously hard; however, with the present development of technologies, it has become practically efficient. Besides, the standard of living for some people with particular needs has improved through such applications [2]. One of these technologies incorporates visual information into a speech recognition system to enhance its accuracy, especially in a noisy environment [3]. In this context, lip-reading has been an important part of this incorporation [4], especially for recognition without speech, which is called silent or lip language. Accordingly, the motion of the lips must be recognized, although this task is difficult because lip motion has a nonlinear model and the region of interest is noisy [5]. Despite these difficulties, recognition can be achieved by tracking the changing lip shape and then extracting its features. Many algorithms serve this purpose, including facial image processing with morphological operations [2, 6]. In this context, boundary and color information has improved the robustness and efficiency of lip features [7].
In recent years, several significant studies of lip-reading systems have been presented. These studies concentrate on three tasks: the first focuses on mouth-area detection, as in [3, 4, 5, 7]; the second concentrates on extracting speech features, as in [1, 7]; and the third addresses the matching between visual features and a particular vocabulary, as in [1, 2, 8, 9, 10].
A time-delay neural network (TDNN) was presented by [1]; the recognition rate improved from 51% without the acoustic signal up to 91% with it. Petajan et al. [11] used thresholded images and the Hamming distance for utterance discrimination, whilst Pentland [12] used an optical flow technique for estimating lip points. In this context, [4, 8] proposed a segmentation technique for lip contour extraction to perform lip reading, while [3] worked on the same task but with more lip points, with the aid of fuzzy clustering. [9, 10] estimated a lip contour and then extracted color features. [5] used a hybrid of the Discrete Cosine Transform and the Dual-Tree Complex Wavelet Transform for estimating the lip position and shape.
The work of this paper concentrates on the recognition of AV letters from a video frame sequence using a snapshot database. Recognition is performed on a segment of the mouth area. A preprocessing operation first prepares the lip area for feature extraction. The probability of occurrence of each letter is then calculated using a Bayesian SVM to recognize the spoken letter. Color mean intensity, computed via the RGBD feature, provides accurate segmentation, which in turn leads to accurate recognition.

COMBINING THE BAYESIAN AND SVM CLASSIFIERS
A combination of the Bayesian and SVM classifiers is used in this algorithm. A direct Bayesian SVM handles the multi-class classification: the Bayesian formulation converts the multi-class problem into a two-class problem over intra-class variation and extra-class variation. An SVM is then trained on these two classes of difference features. In the training phase, the image difference between images of the same class is first computed to construct the intra-class variation set, as in [13]:

Δ_I = { x_i − x_j : label(x_i) = label(x_j) },

where Δ_I is the intra-class difference vector set. In this context, the image difference between images of different classes is computed to construct the extra-class variation set:

Δ_E = { x_i − x_j : label(x_i) ≠ label(x_j) },

where Δ_E is the extra-class difference vector set. Then, the eigenvalue matrix Λ_I and the eigenvector matrix U_I of the intra-personal subspace are computed from the intra-class variation set. Finally, all image difference vectors are projected and whitened in the intra-class subspace:

Δ′ = Λ_I^(−1/2) U_I^T Δ.

The decision function d(Δ) is then generated by training the SVM with these two vector sets (Δ_I′ and Δ_E′). In the testing phase, the difference vector Δ_i between the probe vector x and each gallery vector x_i^g is computed, projected, and whitened in the intra-class subspace, and the final classification decision is

identity(x) = arg max_{i = 1…c} d(Δ_i′),

where c is the number of classes in the gallery.
The larger the value of d, the more reliable the result.
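The training and testing phases above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and a nearest-to-intra-mean rule in the whitened difference space stands in for the trained SVM decision function d(Δ).

```python
import numpy as np

def intra_extra_differences(images, labels):
    """Pairwise image differences: intra-class (same letter) vs extra-class."""
    intra, extra = [], []
    for i in range(len(images)):
        for j in range(i + 1, len(images)):
            d = images[i] - images[j]
            (intra if labels[i] == labels[j] else extra).append(d)
    return np.asarray(intra), np.asarray(extra)

def whitening_matrix(intra):
    """Eigen-decompose the intra-class covariance; multiplying a difference
    vector by the returned matrix projects and whitens it in the
    intra-class subspace (i.e. computes Lambda^{-1/2} U^T delta)."""
    lam, U = np.linalg.eigh(np.cov(intra, rowvar=False))
    lam = np.clip(lam, 1e-8, None)        # guard tiny/negative eigenvalues
    return U / np.sqrt(lam)               # columns scaled by 1/sqrt(lambda)

def classify(probe, gallery, gallery_labels, W, intra_mean):
    """Assign the probe to the gallery class whose whitened difference
    vector looks most 'intra-class' (closest to the intra-class mean);
    a stand-in for maximizing the SVM decision value d."""
    scores = {}
    for x_g, c in zip(gallery, gallery_labels):
        d = (probe - x_g) @ W             # project + whiten the difference
        dist = np.linalg.norm(d - intra_mean)
        scores[c] = min(scores.get(c, np.inf), dist)
    return min(scores, key=scores.get)
```

A usage pattern would be to build the difference sets from the training frames, compute W and the whitened intra-class mean once, and then call `classify` per probe frame.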

DATABASE
The database consists of recorded audio-visual (AV) isolated letters. In this work, three letters were taken (A, B, and E), repeated by five, eight, and ten talkers respectively. Each video letter has ten frames covering the lip movement that pronounces the letter. In this context, each class is considered as all lip-pronunciation frames of a letter across all talkers.
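A hypothetical in-memory layout of such a database might look like the following sketch; the 64 × 64 frame size and the dictionary structure are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Letter -> number of talkers, as described above (A: 5, B: 8, E: 10).
letters = {"A": 5, "B": 8, "E": 10}
FRAMES_PER_LETTER = 10                    # frames per pronounced letter

# One array per class: (talkers, frames, height, width, RGB channels).
database = {
    letter: np.zeros((n_talkers, FRAMES_PER_LETTER, 64, 64, 3), dtype=np.uint8)
    for letter, n_talkers in letters.items()
}
```

Each class is then the full set of lip-pronunciation frames for all talkers of one letter, e.g. `database["B"]` holds 8 × 10 frames.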

PROPOSED ALGORITHMS
The first algorithm is the Bayesian SVM described above: the Bayes rule separates hypotheses h in class D from hypotheses not in class D, and the SVM then produces the final classification decision. The second algorithm is the Red, Green, Blue, and Diameter (RGBD) feature, as in Figure (3). This algorithm is based on the effect of the luminance of the image: this effect is captured by calculating the color intensity (the summed red, green, and blue pixel values of the image) for each frame, and then calculating the diameter of the cropped lip-region image; hence the algorithm is called the RGBD feature. The anatomy of the face, as in Figure (4-a), is based on the size of the image (X_max, Y_max). The image is divided at the vertical half-line from (X_max/2, Y_max) to (X_max/2, 0), and at the horizontal half-line from (X_max/2, Y_max/2) to (X_max, Y_max/2). In this context, vertical lines are taken from the pupils with a distance (£) between them, and the distance from the hidden lip line center of the upper lip to the lower lip is (€). Then, the lip area is (£ × €).
1. The intensities of the colors (RGB) and the diameter (D) of the lip area are calculated as the RGBD features according to Eqs. (10-13).
2. Convert the RGBD features to binary for each image frame of the AV letter (ten frames per AV letter).
3. Save the resulting ten vectors for each AV letter in RAM, as in Figure (3).
4. Repeat the above steps for each training AV letter and save it in a dedicated RAM block.
5. In the testing phase, apply the above steps to the new AV letter, then compare the resultant vectors with those in storage. Apply the Hamming distance (HD) and decide on the letter with the smallest HD, as in Figure (3).

Two scenarios were then devised to evaluate the robustness of the proposed algorithms. The training classes are: normal, normal + 10%, normal + 25%, and normal + 75%, where these increments over the normal case represent the lighting inflicted on the face of the classes, as shown in Figure (5).
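The RGBD steps above can be sketched as follows, assuming each frame is an RGB NumPy array. This is a hedged sketch, not the paper's code: the crop coordinates, the 16-bit quantisation in `to_binary`, and the per-vector normalisation are illustrative assumptions, since Eqs. (10-13) are not reproduced here.

```python
import numpy as np

def rgbd_features(frame, lip_box):
    """RGBD feature of one frame: summed R, G, B intensities of the lip
    crop plus its diameter (taken here as the crop diagonal)."""
    x0, y0, x1, y1 = lip_box                 # crop bounds of the lip region
    crop = frame[y0:y1, x0:x1]               # H x W x 3 uint8 image
    r, g, b = crop[..., 0].sum(), crop[..., 1].sum(), crop[..., 2].sum()
    d = np.hypot(x1 - x0, y1 - y0)           # "diameter" of the crop
    return np.array([r, g, b, d], dtype=float)

def to_binary(features, n_bits=16):
    """Quantise each feature to n_bits and concatenate into a bit vector
    (an assumed binarisation scheme; the paper's exact one is not given)."""
    scale = (2 ** n_bits) - 1
    f = features / (features.max() + 1e-12)  # normalise to [0, 1]
    codes = np.round(f * scale).astype(np.uint32)
    bits = [(c >> k) & 1 for c in codes for k in range(n_bits)]
    return np.array(bits, dtype=np.uint8)

def hamming_distance(a, b):
    """Number of differing bits between two binary feature vectors."""
    return int(np.count_nonzero(a != b))

def recognise(test_bits, templates):
    """Step 5: pick the training letter whose stored vector has the
    smallest Hamming distance to the test vector."""
    return min(templates, key=lambda letter: hamming_distance(test_bits, templates[letter]))
```

In use, each training AV letter would contribute ten such bit vectors (one per frame), and a test frame is matched against all stored vectors by smallest HD.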

DISCUSSION OF THE RESULTS
From the results in Tables (I-VI), especially Tables (III and VI), it is clear that the RGBD algorithm achieves a better classification rate than the Bayesian SVM algorithm in recognizing the lip movements of AV letters. The main reason for the reduced success rate of the Bayesian SVM algorithm is that it depends on facial feature regions, not only on the image pixels, so there is interference between these features in the selected images. Another important reason is that this type of application has many patterns in each class, and some of these patterns are similar to patterns in other classes. Such similarities mean the patterns share the same statistical properties, which tends to cause misrecognition.

CONCLUSION
Several points can be concluded from this work. For the proposed application, the RGBD algorithm is more accurate in classifying the pronounced letter than the Bayesian SVM algorithm, while the Bayesian SVM algorithm is faster than the RGBD algorithm. On the other hand, the proposed application (identifying the pronounced letter based on image pixels) requires many visual features extracted from the sets of isolated images representing the pronounced letter for more accurate recognition. Overall, the RGBD algorithm outperforms the Bayesian SVM algorithm by a factor of 1.42 with the same type of visual feature.

Figure (1) represents the process steps, while Figure (2) represents the flowchart of this algorithm.

Figure 3: Flowchart of the Proposed RGBD Algorithm for AV Letters Recognition

Figure 5: Different Lighting Inflicted on the Face of the Classes

Figure (6) shows a snapshot of the GUI program built to test an unknown visual lip image after training on the pronunciations of the training letters, while Figure (7) represents the output of the two proposed classifiers throughout the testing phase.

Figure 6: Graphical User Interface Program for the Visual Classifier
Figure 7: Output of the Two Proposed Classifiers during the Testing Phase

TABLE I: Bayesian SVM Algorithm Results
The results of the RGBD algorithm are shown in Table (II) below.

TABLE II: The Results of the RGBD Algorithm
The comparison between these two algorithms is shown in Table (III).