Visual Depression Diagnosis From Face Based on Various Classification Algorithms

Most psychologists hold that facial behavior during depression differs from facial behavior in its absence, so facial behavior can serve as a dependable indicator for detecting depression. A visual depression diagnosis (VDD) system relies on facial expressions, which makes it cost-effective and portable. In this work, the VDD system is designed according to the Facial Action Coding System (FACS) to extract facial features. The key concept of FACS is to describe all facial behavior using Action Units (AUs), where every AU is linked to the motion of one or more facial muscles. Six AUs are used as depression features: AUs 4, 5, 6, 7, 10, and 12. The dataset used to evaluate the proposed system was gathered from 125 participants (30 males, 95 females), many of whom are between 17 and 60 years of age. In the final stage of the system, four techniques were applied separately as classifiers: KNN, SVM, PCA, and LDA. The results indicate that the best performance is achieved with the KNN and LDA classifiers, each with a success rate of 85%. The key contributions of this research are the application of new classification methods in a VDD system, the collection of a real dataset that can be used to evaluate other VDD systems based on facial expressions, and the selection of appropriate facial features.

INTRODUCTION
A reliable visual depression diagnosis (VDD) system is becoming an important means of reducing suicide and other social problems around the world, particularly given recent advances in computer vision.
Depression is a mood disorder that weakens mental health and affects the sufferer's daily living skills and relationships, as well as the healthcare system. In its latest survey, the World Health Organization (WHO) estimated that about 350 million people worldwide are affected by depression [1]. Furthermore, depression is now the fourth-largest cause of illness in the world and is estimated to become the leading cause in 2020, the current year [1]. Additionally, the WHO estimates that depression will become the primary cause of suicide within the next 15 years. According to the American Academy of Sociology, about two-thirds of people who commit suicide are depressed at the time of their death, and the risk of suicide is about 20 times higher for people with major depression [32]. Figure 1, which was built from the data in Twenge's work [31], shows the rates of major depressive episodes (MDE) among 100k people in the United States of America for the years 2009-2017. Suicide-related outcomes (suicidal ideation, plans, attempts, and deaths by suicide) increased among young adults aged 18 to 25 between 2009 and 2017, with smaller and less consistent increases among adults aged 26 and over. Depressive disorders can be treated successfully in most cases [2]. However, a shortage of funding and of skilled health providers is a significant barrier to the successful detection of depression [1]. In addition, current assessment approaches for identifying depression rely on patient self-reports and on clinicians' judgments formed during an interview; such assessments vary with the clinician's expertise and the diagnostic procedures used, and these instruments do not reliably capture the visible symptoms that are good indicators of depression. Therefore, the limited knowledge of healthcare providers may hinder the detection of depression.
With the growing incidence of depression, current advances in computer vision and signal processing may play a crucial part in helping to overcome these barriers [3]. It must be pointed out that sadness is not depression: everybody experiences a variety of emotions over days and weeks, typically in response to events and circumstances. When disappointed or suffering a loss, a person usually feels sad. These feelings usually ebb and flow, responding to input and change. By contrast, depression tends to feel heavy and constant. People who are depressed are less likely to be cheered, comforted, or consoled. Those who recover from depression sometimes welcome the prospect of experiencing ordinary sadness again, of having a "bad day" rather than a heavy weight on their mind and spirit every day.
Most depression detection systems in previous work used ineffective classification procedures and datasets gathered under controlled lighting conditions and in restricted environments. Moreover, most earlier systems could not operate on unseen data. Therefore, the key objectives of this research are to construct a system that diagnoses depression at minimum cost and with high accuracy, and to assist specialists in detecting depression before the patient becomes suicidal. The system works with a dataset collected in an unconstrained setting and with no physical contact with the human body.
The remainder of the paper is organized as follows. Section 2 explains the mechanism of a VDD system. Section 3 describes the collected dataset. Section 4 presents the steps used to build the proposed VDD system. Section 5 compares the performance of the VDD system's classifiers. Section 6 compares the performance of the proposed system with previous work in detail. Section 7 offers the conclusions drawn from this work.

MECHANISM-BASED VISUAL DEPRESSION DIAGNOSIS SYSTEM
A VDD system is intended to detect depressed people without a clinician. Anyone can be screened for depression by an automated facial analysis system that has learned the signs of depression. Constructing a computer vision scheme for depression recognition based on facial expressions can be split into three main steps [4]. The first step is to capture and preprocess the video, that is, to prepare the recorded videos for the subsequent feature extraction stage. This stage includes recording a video of the participant's face during an interview, editing the captured video, detecting the face, and locating the significant facial landmarks needed to extract the features. Next, the feature extraction stage obtains the depression features from a sequence of face images; these features represent the possible signs of depression. As a final step, the decision maker is an appropriate classifier that outputs the subject's status. The block diagram of a VDD system that uses video databases is shown in Figure 2; the steps are described in the following subsections.

I. Capturing of the Video and Pre-processing
A digital camera records the subject's facial expressions during an interview in which the interviewer asks a list of questions; the subject's face should be in front of the camera so that all changes in facial expression are recorded. The edited video contains only the periods in which the participant is thinking before answering a question. To prepare the recorded video for the next step, it is necessary to detect the subject's face and to locate the facial landmarks, which are needed to track the parts from which the features are extracted. The Viola-Jones face detector is the most common face detector to date [5]. It can process images very rapidly, detect faces with various skin colors, and detect faces with eyeglasses. This face detector has been used in various facial expression detection approaches. The facial landmarks are tracked using the face bounding box returned by the face detector. The Constrained Local Neural Field (CLNF) method is used in the current work, as presented in [6]. This method can detect and track facial landmarks in poor lighting conditions and in the presence of occlusion, and it can detect landmarks in unseen data. The full shape model is described by four kinds of parameters: $s$, the scaling factor; $R$, the rotation of the object; $t$, the translation term; and $q$, the non-rigid shape term. The position of the $i$-th feature, $x_i$, is controlled by these four parameters through a Point Distribution Model (PDM), defined in the equation below:

$$x_i = s \cdot R\,(\bar{x}_i + \Phi_i q) + t \qquad (1)$$

where $\bar{x}_i$ is the mean value of the $i$-th feature and $\Phi_i$ is a $3 \times m$ principal component matrix.
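As a minimal sketch of how the PDM places one landmark, the snippet below evaluates Eq. (1) with NumPy. It assumes, as in the usual CLNF formulation, that the rotation is given as the first two rows of a 3D rotation matrix so that the result is a 2D image position; the function name `pdm_landmark` and all numeric values are illustrative and do not come from the paper.

```python
import numpy as np

def pdm_landmark(mean_point, basis, q, scale, rotation_2d, translation):
    """Place one landmark with the PDM: x_i = s * R * (mean_i + Phi_i q) + t."""
    # Non-rigid part: deform the mean 3D position along the principal components.
    shape_3d = mean_point + basis @ q
    # Rigid part: rotate and project to 2D, scale, then translate.
    return scale * (rotation_2d @ shape_3d) + translation

# Illustrative values only (assumptions, not taken from the paper).
mean_point = np.array([1.0, 2.0, 0.5])      # mean 3D position of the feature
basis = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [0.0, 0.0]])              # 3 x m matrix with m = 2
q = np.array([0.5, -1.0])                   # non-rigid shape parameters
rotation_2d = np.eye(3)[:2]                 # no rotation; keeps x and y only
translation = np.array([10.0, 20.0])

landmark = pdm_landmark(mean_point, basis, q, 2.0, rotation_2d, translation)
print(landmark)
```

In a real tracker the parameters $s$, $R$, $t$, and $q$ would be fitted to the image rather than set by hand.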

II. Extraction of the Features
Feature extraction consists of extracting several facial features to serve as dependable indicators for identifying depression. Several methods are used to extract features from the face; the most common automated techniques that can obtain features from unseen data are based on FACS and can be found in [7], [4], [8]. FACS describes facial behavior using AUs, which represent the movements of the facial muscles. FACS-based systems recognize facial expressions by detecting AUs, and the set of detected AU numbers determines the expression. A dependable and robust technique for detecting AUs uses two kinds of features, geometry and appearance features, as presented in [9].

III. Decision Maker
The classification stage of every VDD system categorizes the extracted features into two groups, namely depressed and non-depressed. The extracted features are the AUs, which depend on FACS. Four classification techniques are proposed in the current research: Support Vector Machine (SVM), K-Nearest Neighbor (KNN), Principal Component Analysis (PCA), and Linear Discriminant Analysis (LDA).
The K-Nearest Neighbor (KNN) algorithm is one of the most common methods used for classification and pattern recognition problems. It is a basic classifier and often achieves good accuracy compared with other classification methods. To understand how the algorithm works, consider a two-class problem in which every class contains similar samples. For an unknown test sample, the KNN classifier makes its decision from the distance between the unknown input sample and the nearest sample of each class, using a common distance measure such as the Euclidean or Manhattan distance. The unknown sample is then assigned to the class of the sample that gives the smallest distance, as shown in Figure 3. The Euclidean distance is given by Eq. (2) [10]:

$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \qquad (2)$$

where $x = (x_1, \dots, x_n)$ stands for the unknown test sample, $y = (y_1, \dots, y_n)$ stands for a sample in the training database, and $n$ is the size of the feature vector.
The SVM is a learning algorithm for regression prediction and pattern recognition problems. The concept behind the SVM is to find the optimal margin between the training samples of the two classes and to keep the classes separated by that margin. A significant first step in training a linear SVM is to define the maximum-margin hyperplane [12]. Consider a set of training patterns $\{(x_i, y_i)\}$, where $x_i$ is the input pattern and $y_i \in \{1, -1\}$ is the corresponding target output. This set is linearly separable if there exist a vector $w$ and a scalar $b$ such that

$$w \cdot x_i + b \ge 1 \quad \text{if } y_i = 1 \qquad (3)$$

$$w \cdot x_i + b \le -1 \quad \text{if } y_i = -1 \qquad (4)$$

The equation that defines the optimal separating hyperplane is

$$w \cdot x + b = 0 \qquad (5)$$

Equation (5) gives the best decision boundary, as indicated in Figure 4.

Figure 4: Optimal separating hyperplane between two classes [13]
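The two decision rules described above can be sketched in a few lines of NumPy. The first part applies the nearest-sample rule with the Euclidean distance of Eq. (2) (that is, KNN with k = 1, as in the description above); the second part checks whether a given pair $(w, b)$ satisfies the separability conditions of Eqs. (3) and (4). The toy samples, the candidate $w$ and $b$, and the function names are assumptions made for illustration; the code does not train an SVM.

```python
import numpy as np

def euclidean_distance(x, y):
    """Eq. (2): square root of the summed squared coordinate differences."""
    return float(np.sqrt(np.sum((x - y) ** 2)))

def nearest_neighbor_label(test_sample, train_samples, train_labels):
    """Assign the label of the closest training sample (KNN with k = 1)."""
    distances = [euclidean_distance(test_sample, s) for s in train_samples]
    return train_labels[int(np.argmin(distances))]

def satisfies_margin(w, b, samples, labels):
    """Eqs. (3) and (4) combined: y_i * (w . x_i + b) >= 1 for every sample."""
    margins = labels * (samples @ w + b)
    return bool(np.all(margins >= 1.0))

# Toy two-class data (illustrative only): label -1 and label +1.
train_samples = np.array([[0.0, 0.0], [1.0, 0.0], [4.0, 4.0], [5.0, 4.0]])
train_labels = np.array([-1, -1, 1, 1])

predicted = nearest_neighbor_label(np.array([4.5, 3.0]), train_samples, train_labels)
separable = satisfies_margin(np.array([0.5, 0.5]), -2.5, train_samples, train_labels)
print(predicted, separable)
```

A full SVM would search for the $w$ and $b$ that satisfy these constraints with the largest margin; here the pair is simply supplied by hand to show what the constraints mean.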
PCA (principal component analysis) is the oldest and best-known technique of multivariate data analysis. In general terms, PCA applies a vector space transform to reduce the dimensionality of large datasets. Through mathematical projection, the original data, which may involve many variables, can often be described by just a few variables (the principal components). The main idea of principal component analysis is to reduce the dimensionality of a dataset containing a large number of interrelated variables while retaining as much of the variation in the data as possible. This reduction is achieved by transforming to a new set of variables, the principal components, which are uncorrelated and ordered so that the first few retain most of the variation present in all the original variables. Computing the principal components reduces to solving an eigenvalue-eigenvector problem for a positive semidefinite symmetric matrix. Eigenvectors and eigenvalues quantify the direction and the magnitude of the variation captured by each axis: an eigenvector defines the direction of an axis in the data space, and its eigenvalue quantifies the magnitude of the variance of the data along that axis, as shown in Figure 5. They satisfy

$$A v = \lambda v$$

where $A$ is an $n \times n$ matrix, $\lambda$ is the eigenvalue, and $v$ is the eigenvector; the number of eigenvalue-eigenvector pairs equals the number of dimensions of the dataset.

Figure 5: Eigenvectors and eigenvalues [14]
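A minimal sketch of the eigenvalue-eigenvector computation described above, assuming the data are stored with one sample per row. It uses `numpy.cov` and `numpy.linalg.eigh` (suitable for symmetric matrices); the function name and the toy data are illustrative only.

```python
import numpy as np

def principal_components(data):
    """Return eigenvalues (descending) and matching eigenvectors of the covariance.

    Rows of `data` are samples, columns are variables.
    """
    centered = data - data.mean(axis=0)
    covariance = np.cov(centered, rowvar=False)
    # eigh is meant for symmetric matrices and returns ascending eigenvalues.
    eigenvalues, eigenvectors = np.linalg.eigh(covariance)
    order = np.argsort(eigenvalues)[::-1]
    return eigenvalues[order], eigenvectors[:, order]

# Toy data: almost all of the variation lies along the first variable.
data = np.array([[2.0, 0.1], [4.0, -0.1], [6.0, 0.1], [8.0, -0.1]])
eigenvalues, eigenvectors = principal_components(data)

# Fraction of the total variance captured by each principal axis.
explained = eigenvalues / eigenvalues.sum()
print(explained)
```

Each column of `eigenvectors` is one principal axis, and the corresponding entry of `explained` indicates how much of the variation that axis retains.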
PCA searches for the best representation of the data in a least-squares sense [15]. This is achieved by decomposing the covariance matrix of the data into its eigenvectors and selecting the most significant of them to form a projection matrix. The training data in this case is a matrix $X$ formed from column vectors $x_i$ containing the selected signal statistics. Suppose that there is a total of $N$ of these statistical profiles. Let $C$ be the number of classes (e.g., modulation types) represented in the training set, and let $N_c$ be the number of training profiles in each class, so that $N = C N_c$. Define the mean profile as

$$\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i$$

The first step is to subtract the mean profile from each of the profiles, since it is common to all of them and therefore carries no useful classification information:

$$\tilde{x}_i = x_i - \bar{x}$$

These centered training profiles form the new training matrix $\tilde{X}$. Define the covariance matrix as

$$\Sigma = \tilde{X} \tilde{X}^{T}$$

and decompose it into its eigenvectors. The eigenvectors corresponding to the $k$ largest eigenvalues form the projection matrix $W$. In this implementation, all significant eigenvectors were included in the projection matrix. The columns of $W$ span the feature space onto which the statistical profiles are projected. First, project the centered training matrix into this space to determine the centroid of each class:

$$Y = W^{T} \tilde{X}$$

The centroid of class $c$ is

$$m_c = \frac{1}{N_c} \sum_{i \in c} W^{T} \tilde{x}_i$$

To apply the classifier, compute the projection of a test profile into the feature space and select the class whose centroid has the smallest Euclidean distance to that point [16].
Linear discriminant analysis (LDA), or discriminant function analysis, is a generalization of Fisher's linear discriminant, a method used in statistics, pattern recognition, and machine learning to find a linear combination of features that separates two or more classes of objects. The resulting combination may be used as a linear classifier or for dimensionality reduction before final classification, as shown in Figure 6 [17].
LDA seeks a different projection matrix $W$, chosen to maximize the separation between classes [15].
To clarify this, it is necessary to define the within-class scatter matrix $S_w$ and the between-class scatter matrix $S_b$. For $C$ classes with $N_c$ profiles in class $c$, they are given by

$$S_w = \sum_{c=1}^{C} \Sigma_c$$

where $\Sigma_c$ is the covariance matrix of class $c$, and

$$S_b = \sum_{c=1}^{C} N_c\,(\mu_c - \bar{x})(\mu_c - \bar{x})^{T}$$

where $\mu_c$ is the mean profile of class $c$ and $\bar{x}$ is the mean of all profiles. If these matrices were computed in the feature space formed by projecting the profiles with the projection matrix $W$, the results would be $W^{T} S_w W$ and $W^{T} S_b W$, respectively. The task of LDA is to find the matrix $W$ that maximizes the ratio of the determinants of these two matrices:

$$W^{*} = \arg\max_{W} \frac{\left| W^{T} S_b W \right|}{\left| W^{T} S_w W \right|}$$

From [18], finding the matrix $W$ in the expression above is equivalent to solving the generalized eigenvalue problem

$$S_b w_i = \lambda_i S_w w_i$$

where $w_i$ is the $i$-th column of $W$. As with PCA, the columns are arranged in order of decreasing eigenvalue and an arbitrary number of them is retained.
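The LDA steps above, followed by the nearest-centroid decision described for the projected feature space, can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the within-class matrix is accumulated as an unnormalized scatter (which differs from a sum of class covariances only by a constant factor when the classes have equal size), the eigenproblem is solved through `numpy.linalg.solve` and `numpy.linalg.eig`, and the function names and toy data are hypothetical.

```python
import numpy as np

def scatter_matrices(samples, labels):
    """Within-class (S_w) and between-class (S_b) scatter; rows are samples."""
    overall_mean = samples.mean(axis=0)
    n_features = samples.shape[1]
    s_w = np.zeros((n_features, n_features))
    s_b = np.zeros((n_features, n_features))
    for c in np.unique(labels):
        group = samples[labels == c]
        mean_c = group.mean(axis=0)
        centered = group - mean_c
        s_w += centered.T @ centered                  # unnormalized class scatter
        diff = (mean_c - overall_mean).reshape(-1, 1)
        s_b += len(group) * (diff @ diff.T)
    return s_w, s_b

def lda_projection(samples, labels, n_components=1):
    """Solve S_b w = lambda S_w w and keep the leading eigenvectors as columns."""
    s_w, s_b = scatter_matrices(samples, labels)
    eigenvalues, eigenvectors = np.linalg.eig(np.linalg.solve(s_w, s_b))
    order = np.argsort(eigenvalues.real)[::-1][:n_components]
    return eigenvectors[:, order].real

def nearest_centroid(projected_train, labels, projected_test):
    """Pick the class whose projected centroid is closest (Euclidean distance)."""
    classes = np.unique(labels)
    centroids = np.array([projected_train[labels == c].mean(axis=0) for c in classes])
    distances = np.linalg.norm(centroids - projected_test, axis=1)
    return classes[int(np.argmin(distances))]

# Toy two-class data (illustrative only).
samples = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 3.0], [2.0, 1.0],
                    [6.0, 5.0], [7.0, 7.0], [8.0, 6.0], [7.0, 5.0]])
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])

W = lda_projection(samples, labels)
pred_high = nearest_centroid(samples @ W, labels, np.array([6.5, 6.0]) @ W)
pred_low = nearest_centroid(samples @ W, labels, np.array([2.0, 2.0]) @ W)
print(pred_high, pred_low)
```

The same `nearest_centroid` function could be applied after a PCA projection instead; only the way $W$ is obtained changes.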

COLLECTED DATASETS
Only a few datasets used in existing computer vision research on depression diagnosis include, in particular, people's facial expressions while they are depressed, and none of these datasets has been made publicly available. It is therefore essential to gather a dataset when designing a depression diagnosis system in order to check the system's efficiency and the robustness of the procedures used to construct it. The dataset was gathered from 125 people (30 males, 95 females) aged 17-60 years. Before the interview, each participant was given a questionnaire (a depression rating instrument) called the Beck Depression Inventory (BDI) [19], a self-report inventory that is one of the most widely used psychometric tests for measuring the severity of depression. Although the BDI was initially designed to be administered by trained interviewers, it is most often self-administered. It usually takes 5-10 minutes to complete, and the score is calculated by summing the points for each of the 21 questions. Scores of 0-9 are considered no depression, 10-15 mild depression, 16-23 moderate depression, 24-36 severe depression, and over 37 extreme depression. The summary of the 125 participants' check-in data is shown below. In addition, about 17 relevant questions were put to the subjects during the interview. The questions were asked and repeated while the camera (Canon) captured the facial behavior throughout all sessions.
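The BDI scoring rule described above can be written as a small function. This is a sketch under two assumptions that are not stated in the text: each of the 21 answers is taken to be an integer number of points, and a total of exactly 37 (which the stated ranges leave unassigned) is placed in the highest category. The function name `bdi_severity` is illustrative.

```python
def bdi_severity(item_scores):
    """Sum the 21 item scores and map the total to a severity band."""
    if len(item_scores) != 21:
        raise ValueError("the BDI has 21 questions")
    total = sum(item_scores)
    if total <= 9:
        label = "no depression"
    elif total <= 15:
        label = "mild depression"
    elif total <= 23:
        label = "moderate depression"
    elif total <= 36:
        label = "severe depression"
    else:
        # Assumption: 37 and above is treated as the highest band.
        label = "extreme depression"
    return total, label

print(bdi_severity([1] * 21))
```

For example, a participant who scores one point on every question has a total of 21 and falls in the moderate band.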
The dataset was gathered under ordinary lighting conditions, with widely varying backgrounds and no constraints on the position of the subject's head, as seen in Figure 7; that is, subjects turned their heads freely. The gathered dataset consists of natural facial expressions rather than acted or posed ones. This was one of the difficulties faced in this work during video recording, since it is difficult to elicit facial expressions from a depressed person. Depressed people are also hard to reach, because one of the symptoms of depression is a lack of contact with others compared with other people. The dataset was collected from several educational institutions: Al-Mamoun University College, Al-Karrada private high school for girls, Al-Sayab high school, and Gori kindergarten and nursery.

THE PROPOSED VISUAL DEPRESSION DIAGNOSIS SYSTEM
Three steps are used to build the proposed visual depression diagnosis system. The first step is video capture and pre-processing, which is responsible for filming the video, detecting the face, and locating multiple points of interest on the subject's face to prepare the video for the next stage. The next step is feature extraction, which operates on the detected face to obtain the AUs described in this research on the basis of two types of features, geometry and appearance, obtained from the face. The final step is decision making, which detects depressed participants from the AUs extracted in the previous step. Several classification methods are proposed to distinguish between depressed and non-depressed subjects: KNN, SVM, PCA, and LDA. Figure 8 shows the design stages of the proposed system.

I. Capturing of the Video and Pre-Processing
Video is filmed with a Canon 1300D camera. The output format is MOV at a resolution of 1920x1080 pixels per frame. In the next step, the videos are edited and features are then extracted from them. The videos are edited in order to identify and keep only their relevant sections. The outcome of editing is a set of video clips (each clip lasts approximately 0.35 to 1.6 seconds). Each clip covers the period in which the subject thinks before answering a question. This editing is needed to remove the unwanted sections of the original videos and to separate the depression responses from the non-depression responses. According to psychoanalysis, depression can be detected in the moment before the answer to a question, which is a moment of thinking about the reply.
After editing the gathered dataset, 280 video clips were obtained; fifty percent of the clips correspond to depression responses and fifty percent to non-depression responses. The mean duration of a clip is one second, and each participant has around 2-10 clips for both depression and non-depression responses. As a pre-processing stage, the participant's face is detected with the Viola-Jones algorithm (VJA). This scheme works on the grayscale image and selects pixel regions using Haar-like rectangular features, which are combined through the AdaBoost procedure. Four Haar-like rectangular features are considered. Individual Haar features are evaluated over the candidate region to decide whether it shows a face. AdaBoost has 31 cascade layers with a threshold of 3, as described in the VJA. Figure 8 demonstrates the detection results of the VJA for one subject; as shown in that figure, the face passed through the 31 AdaBoost layers, at which point the skin of the cheek was applied to the cascaded AdaBoost. Finally, the common bounding border is chosen using the final AdaBoost layer to detect the face area. The facial landmarks are located using the face bounding box obtained from the face detector; facial landmarks are necessary for face tracking and for extracting geometry features. The Constrained Local Neural Field (CLNF) is used to locate sixty-eight facial landmarks, as seen in Figure 9 below.
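To illustrate what a Haar-like rectangular feature measures, the sketch below evaluates one two-rectangle feature on a tiny grayscale image using an integral image, the device the Viola-Jones detector uses to sum pixels in constant time. The image, the window position, and the function names are illustrative assumptions; the AdaBoost cascade itself is not implemented here.

```python
import numpy as np

def integral_image(gray):
    """Cumulative sums over rows and columns, padded with a leading zero row/column."""
    sums = gray.astype(np.int64).cumsum(axis=0).cumsum(axis=1)
    return np.pad(sums, ((1, 0), (1, 0)))

def rect_sum(ii, top, left, height, width):
    """Sum of the pixels in a rectangle, read from four integral-image corners."""
    bottom, right = top + height, left + width
    return ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left]

def two_rect_feature(ii, top, left, height, width):
    """Left half minus right half: responds to a vertical intensity edge."""
    half = width // 2
    left_sum = rect_sum(ii, top, left, height, half)
    right_sum = rect_sum(ii, top, left + half, height, half)
    return left_sum - right_sum

# Tiny 4x4 test image: bright left half, dark right half.
gray = np.zeros((4, 4), dtype=np.uint8)
gray[:, :2] = 10
ii = integral_image(gray)
feature = two_rect_feature(ii, 0, 0, 4, 4)
print(feature)
```

A detector evaluates many such features at many positions and scales, and the boosted cascade decides from their values whether a window contains a face.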

II. Feature Extraction
The proposed system detects 18 facial AUs (AU 1, 2, 4, 5, 6, 7, 9, 10, 12, 14, 15, 17, 20, 23, 25, 26, 28, and 45), but some of these AUs do not affect the detection of depression. The results of the AU detection system on the collected dataset show that some AUs do not change between depressive and non-depressive responses and remain in roughly the same state across all interviews. Only six AUs have an important effect during the participants' interviews: AUs 4, 5, 6, 7, 10, and 12. Table 2 summarizes these six AUs, which are used in this work as potential markers for detecting depression.

TABLE II: The AUs that are effective in the VDD system, including their names and areas
The AU detection system is shown in Figure 10. The geometry features are defined by the non-rigid shape parameters and the landmark positions. Before the appearance features are extracted, the face image must be transformed into a common reference frame (to remove individual differences between faces due to scaling and rotation), and the non-facial information must be removed from the image. Alignment and masking perform these tasks. After all non-facial information has been removed, the face image is encoded with Histograms of Oriented Gradients (HOG) to extract the appearance features.

Figure 9: Flow diagram of the system for detecting Action Units
The open-source CLNF algorithm is used for facial landmark detection and tracking, followed by extraction of the appearance and geometry features. Masking and alignment are used to align the face shape to the standard frame and to remove non-facial details from the face image. This is achieved by applying a similarity transform between the detected landmark positions and a frontal representation of the face.
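As a rough illustration of the appearance features, the sketch below computes the orientation histogram of a single HOG cell. It is deliberately simplified and rests on assumptions not given in the text: nine unsigned orientation bins over 0 to 180 degrees, hard assignment of each pixel to one bin, and no block normalization. The function name and the toy cell are illustrative.

```python
import numpy as np

def cell_orientation_histogram(cell, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations for one cell."""
    # np.gradient returns the derivative along rows (y) first, then columns (x).
    gy, gx = np.gradient(cell.astype(float))
    magnitude = np.hypot(gx, gy)
    angle = np.degrees(np.arctan2(gy, gx)) % 180.0
    bin_width = 180.0 / n_bins
    # Clamp so that rounding at the upper edge cannot produce an invalid bin.
    bins = np.minimum((angle // bin_width).astype(int), n_bins - 1)
    hist = np.zeros(n_bins)
    np.add.at(hist, bins, magnitude)
    return hist

# Toy 8x8 cell whose intensity increases from left to right only.
cell = np.tile(np.arange(8, dtype=float), (8, 1))
hist = cell_orientation_histogram(cell)
print(hist)
```

A full HOG descriptor concatenates such histograms over all cells of the aligned, masked face image after normalizing them in overlapping blocks.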

III. Decision Maker
Four classifiers are proposed for the decision-making step to distinguish depressed from non-depressed subjects: KNN, SVM, PCA, and LDA. The detection rate varies from one classifier to another and depends on the training data. The training and testing sets were selected randomly from the gathered dataset of 280 video clips (140 clips of depression responses and 140 clips of non-depression responses). The dataset is divided equally into training and testing sets (50 percent each): after feature extraction and AU detection, 70 clips are chosen randomly to train the classifiers on depressive responses and 70 clips are chosen randomly to train them on non-depressive responses. The remaining clips (70 depression and 70 non-depression responses) are used for the testing phase.
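The split described above (a random half of each class for training, the remainder for testing) can be sketched with the standard library alone. The clip identifiers, the seed, and the function name are placeholders chosen for illustration; the paper does not state how the random selection was implemented.

```python
import random

def split_half_per_class(clips_by_class, seed=0):
    """Randomly send half of each class to training and the rest to testing."""
    rng = random.Random(seed)
    train, test = [], []
    for label, clips in clips_by_class.items():
        shuffled = clips[:]          # copy so the caller's list is untouched
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        train += [(clip, label) for clip in shuffled[:half]]
        test += [(clip, label) for clip in shuffled[half:]]
    return train, test

# Placeholder clip names: 140 per class, as in the collected dataset.
clips_by_class = {
    "depression": [f"dep_{i:03d}" for i in range(140)],
    "non-depression": [f"non_{i:03d}" for i in range(140)],
}
train_set, test_set = split_half_per_class(clips_by_class)
print(len(train_set), len(test_set))
```

Splitting within each class keeps both sets balanced at 70 depression and 70 non-depression clips.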

THE PROPOSED CLASSIFIERS PERFORMANCE COMPARISON
The proposed classifiers KNN, SVM, PCA, and LDA are compared on the basis of the diagnosis rate obtained on the collected dataset, as shown in Table 3. The KNN and LDA classifiers achieved the highest accuracy, both 85%. PCA achieved an accuracy of 83.58%, and SVM a success rate of 82.14% on the same dataset. These results were obtained when the proposed system was tested on 50 percent of the gathered dataset, with the other 50 percent used to train the classifiers on depressed and non-depressed facial features.
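The reported percentages can be related to clip counts with a short calculation. The counts below are not reported in the paper; they are inferred under the assumptions that the test set holds 140 clips and that accuracy means correctly classified clips divided by the total.

```python
TEST_CLIPS = 140  # assumption: 70 depression + 70 non-depression test clips

# Accuracies as reported in the text, in percent.
reported = {"KNN": 85.0, "LDA": 85.0, "PCA": 83.58, "SVM": 82.14}

implied = {}
for name, pct in reported.items():
    # Nearest whole number of correctly classified clips.
    correct = round(pct / 100 * TEST_CLIPS)
    implied[name] = (correct, 100 * correct / TEST_CLIPS)

for name, (correct, pct) in implied.items():
    print(f"{name}: {correct}/{TEST_CLIPS} correct = {pct:.2f}%")
```

Under these assumptions the figures correspond to about 119, 117, and 115 correct clips out of 140; the value recomputed for PCA differs from the reported one in the last digit, which may simply reflect rounding.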

COMPARISON OF SUGGESTED VDD SYSTEM'S PERFORMANCE WITH PREVIOUS WORKS
To evaluate the reliability of the proposed VDD system, its best results, obtained with KNN and LDA, should be compared with earlier work in the literature. It is difficult to compare results under the same conditions, because the datasets used in previous work are not openly available and differ from the dataset collected for the proposed VDD system. Examples of these differences are lighting conditions, the psychosomatic state of the subjects, and the presence of occlusion. Table 4 compares the proposed system with earlier work in terms of recognition technique, number of subjects, feature details, and accuracy; the table shows that various recognition techniques were applied to distinguish the two classes in previous work. The effectiveness of the proposed system depends on the number of subjects and the recognition accuracy, which also represent its two principal challenges. As shown in Table 4, the proposed VDD system outperforms related work in terms of the size of the collected dataset and the accuracy achieved.

CONCLUSION
The main objectives of the proposed system are to identify and use specific facial muscle movements as reliable indicators for diagnosing depression in an unconstrained environment, and to establish a reliable tool that assists clinicians in diagnosing depression. The results support the hypothesis that facial expressions can be used for this purpose. Six Action Units (AUs 4, 5, 6, 7, 10, and 12) have a strong effect on depression diagnosis for the dataset gathered, whereas the omitted action units (AUs 1, 2, 9, 14, 15, 17, 20, 23, 25, 26, 28, and 45) did not contribute to detection accuracy. The proposed VDD system is an automatic system that can classify any video clip given as input, even if the clip is not part of the data on which the system was trained.