Document Type: Research Paper

Authors

Electrical Engineering Dept., University of Technology-Iraq, Alsina’a street, 10066 Baghdad, Iraq.

Abstract

Speech recognition is widely used in robot control and automation. Its use in robots is nevertheless limited by its susceptibility to background noise. This paper proposes a speech recognition algorithm for controlling robots in noisy environments. The proposed algorithm is based on Power-Normalized Cepstral Coefficients (PNCC), a noise-resistant feature extraction technique, and a modified K-Nearest Neighbors (KNN) classifier with Dynamic Time Warping (DTW). The new KNN-DTW classifier integrates weighted KNN and DTW. The proposed algorithm emerged from experiments comparing the PNCC and Mel-frequency cepstral coefficients (MFCC) feature extraction techniques with different classifiers, namely KNN-DTW, two types of KNN (weighted KNN and Medium-KNN), and two types of Support Vector Machine (SVM) (Linear SVM and Quadratic SVM). Accuracy was evaluated on the audio-visual data corpus UOTletters, which includes 30 speakers, 26 English letters, and 1560 utterances. The database is divided equally, with 50% used for training and 50% for testing. In a noise-free environment, the accuracy of the proposed algorithm reached 100%. Moreover, the proposed algorithm demonstrates greater noise immunity across all five noise levels, with an average accuracy difference of 13.67% compared to baseline algorithms.
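The classifier idea described above, weighted KNN voting over DTW distances, can be illustrated with a minimal sketch. This is not the authors' implementation: the inverse-distance weighting, the Euclidean frame distance, and the names `dtw_distance` and `weighted_knn_dtw` are assumptions made for illustration, and the feature sequences stand in for PNCC frames.

```python
import math
from collections import defaultdict


def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of feature vectors.

    Assumption: Euclidean distance between frames, unconstrained warping path.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # repeat frame of b
                                 cost[i][j - 1],      # repeat frame of a
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def weighted_knn_dtw(query, templates, k=3, eps=1e-9):
    """Classify `query` from labelled templates given as (label, sequence).

    Assumption: each of the k nearest templates votes with weight
    1 / (DTW distance + eps); the label with the largest total wins.
    """
    scored = sorted((dtw_distance(query, seq), label) for label, seq in templates)
    votes = defaultdict(float)
    for dist, label in scored[:k]:
        votes[label] += 1.0 / (dist + eps)
    return max(votes, key=votes.get)


# Toy example: one close "A" template and two distant "B" templates.
templates = [
    ("A", [[0.1], [0.1]]),
    ("B", [[3.0], [3.0]]),
    ("B", [[4.0], [4.0]]),
]
print(weighted_knn_dtw([[0.0], [0.0]], templates, k=3))
```

In this toy case an unweighted majority vote with k = 3 would pick "B", whereas the distance weighting lets the single close "A" template dominate. That behaviour is the usual motivation for weighting KNN votes.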

Highlights

  • The proposed algorithm is based on PNCC feature extraction with a new classifier, Weighted-KNN-DTW.
  • The Weighted-KNN-DTW classifier integrates weighted KNN and DTW.
  • The accuracy of the proposed algorithm was calculated at different levels of white noise (20 dB, 15 dB, 10 dB, and 5 dB).
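
As an illustration of how white noise can be mixed into an utterance at a given level, the sketch below scales Gaussian noise to a target signal-to-noise ratio (SNR). It assumes that the dB values above are SNRs computed as a power ratio over the whole utterance and that the noise is Gaussian; the source does not state how its noise was generated, and the function names are hypothetical.

```python
import math
import random


def add_white_noise(signal, snr_db, seed=None):
    """Return `signal` plus white Gaussian noise at the requested SNR in dB.

    Assumption: SNR = 10 * log10(signal power / noise power), with both
    powers averaged over the full utterance.
    """
    rng = random.Random(seed)
    signal_power = sum(x * x for x in signal) / len(signal)
    target_noise_power = signal_power / (10 ** (snr_db / 10))
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    # Rescale so the measured noise power matches the target.
    measured_power = sum(n * n for n in noise) / len(noise)
    scale = math.sqrt(target_noise_power / measured_power)
    return [x + scale * n for x, n in zip(signal, noise)]


def measured_snr_db(clean, noisy):
    """SNR in dB recovered from a clean signal and its noisy version."""
    signal_energy = sum(x * x for x in clean)
    noise_energy = sum((y - x) ** 2 for x, y in zip(clean, noisy))
    return 10 * math.log10(signal_energy / noise_energy)


# Toy "utterance": a 440 Hz tone sampled at 16 kHz (placeholder for real speech).
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(1600)]
for level in (20, 15, 10, 5):
    noisy = add_white_noise(clean, level, seed=0)
    print(level, round(measured_snr_db(clean, noisy), 2))
```

Because the noise is rescaled after it is drawn, the measured SNR of each noisy signal matches the requested level up to floating-point error.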
