Speaker Recognition System Using Deep Learning with Convolutional Neural Network

Authors

  • Sandeep Kumar, School of Information and Technology, C-DAC Noida
  • Samridhi Dev, School of Information and Technology, C-DAC Noida

DOI:

https://doi.org/10.26438/ijcse/v8i10.6064

Keywords:

Convolutional neural network, speaker recognition, Keras, voice signal spectrogram, tuneR

Abstract

Identifying a person by voice seems easy for human beings: as people interact with a particular person, the mind becomes attuned to that voice, and the brain can readily recognize it the next time it is heard. The proposed system is designed and implemented on this principle using a Convolutional Neural Network (CNN). 110 voice samples were collected from 11 different participants/speakers, and each voice signal was converted into an image of its spectrogram. 90% of the data was used for training and the remaining 10% for testing. The system was implemented in RStudio using the R programming language and achieved 82% accuracy. The proposed system is simple and cost-effective.
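The paper's pipeline (recorded in R with tuneR and Keras) converts each voice signal into a spectrogram image and splits the 110 samples 90/10. As a rough, language-agnostic sketch of those two steps, here is a minimal Python version using only NumPy; the frame length, hop size, and synthetic test tone are illustrative assumptions, not values from the paper:

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude short-time Fourier transform: one column per frame."""
    window = np.hanning(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window
        frames.append(np.abs(np.fft.rfft(frame)))
    # Log scale compresses dynamic range, as in typical spectrogram images.
    return np.log1p(np.array(frames).T)

# Synthetic stand-in for one recorded voice sample (1 s at 8 kHz).
sr = 8000
t = np.arange(sr) / sr
sample = np.sin(2 * np.pi * 220 * t) + 0.5 * np.sin(2 * np.pi * 440 * t)

spec = spectrogram(sample)
print(spec.shape)  # (frequency bins, time frames) -> (129, 61)

# 90/10 split of the paper's 110 samples: 99 for training, 11 for testing.
n = 110
idx = np.random.permutation(n)
train, test = idx[:int(0.9 * n)], idx[int(0.9 * n):]
print(len(train), len(test))  # 99 11
```

The resulting 2-D array plays the role of the spectrogram image that the CNN consumes; in the paper this step is done in R and the model is trained with Keras.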



Published

2020-10-31

How to Cite

[1]
S. Kumar and S. Dev, “Speaker Recognition System Using Deep Learning with Convolutional Neural Network”, Int. J. Comp. Sci. Eng., vol. 8, no. 10, pp. 60–64, Oct. 2020.

Issue

Section

Research Article