Keywords: VoxCeleb1 dataset
Engineering and Technology Journal,
2021, Volume 39, Issue 1B, Pages 129-140
Recently, age estimation from speech has received growing interest because it is important for many applications, such as custom call routing, targeted marketing, and user profiling. In this work, an automatic, text-independent system for estimating speaker age from short speech utterances is proposed. Four groups of features are extracted from each utterance frame, and 10 statistical functionals are then computed for each extracted feature dimension, followed by dimensionality reduction using Linear Discriminant Analysis (LDA). Finally, bidirectional Gated Recurrent Neural Networks (G-RNNs) are used to predict speaker age. Experiments are conducted on the VoxCeleb1 dataset to evaluate the proposed system; this is the first attempt to evaluate such a system on this dataset. In the gender-dependent system, the Mean Absolute Error (MAE) is 9.25 years for female speakers and 10.33 years for male speakers, and the Root Mean Square Error (RMSE) is 13.17 and 13.26, respectively. In the gender-independent system, the MAE is 10.96 years and the RMSE is 15.47. The results show that the proposed system performs well on short-duration utterances, taking into consideration the high noise ratio in the VoxCeleb1 dataset.
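As a rough illustration of the functional-extraction step and of the two error metrics reported above, the following Python sketch collapses a matrix of frame-level features into one fixed-length utterance vector and computes MAE and RMSE. The abstract does not name the 10 functionals, so the set used here (mean, standard deviation, minimum, maximum, range, three quartiles, interquartile range, and skewness) is an assumption chosen only for illustration; the function names are likewise hypothetical, and the LDA and G-RNN stages are not shown.

```python
import numpy as np


def utterance_functionals(frames: np.ndarray) -> np.ndarray:
    """Collapse frame-level features of shape (n_frames, n_dims)
    into a single vector of shape (10 * n_dims,).

    The 10 functionals below are an assumed, illustrative set;
    the source does not specify which ones it uses.
    """
    mean = frames.mean(axis=0)
    std = frames.std(axis=0)
    lo = frames.min(axis=0)
    hi = frames.max(axis=0)
    q25, q50, q75 = np.percentile(frames, [25, 50, 75], axis=0)

    # Skewness: third standardized moment; eps guards constant dimensions.
    eps = 1e-12
    skew = ((frames - mean) ** 3).mean(axis=0) / (std ** 3 + eps)

    stats = [mean, std, lo, hi, hi - lo, q25, q50, q75, q75 - q25, skew]
    return np.concatenate(stats)


def mean_absolute_error(y_true, y_pred) -> float:
    """MAE: average absolute difference between true and predicted ages."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


def root_mean_square_error(y_true, y_pred) -> float:
    """RMSE: square root of the average squared age error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


if __name__ == "__main__":
    # Toy input: 4 frames with 3 feature dimensions each.
    toy_frames = np.arange(12, dtype=float).reshape(4, 3)
    vector = utterance_functionals(toy_frames)
    print(vector.shape)

    # Toy ages in years, not results from the paper.
    print(mean_absolute_error([30, 40], [33, 36]))
    print(root_mean_square_error([30, 40], [33, 36]))
```

Because RMSE squares each error before averaging, it penalizes large misses more heavily than MAE, which is consistent with the reported RMSE values being larger than the corresponding MAE values.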