Speaker Recognition System Using Deep Learning with Convolutional Neural Network
DOI:
https://doi.org/10.26438/ijcse/v8i10.6064
Keywords:
Convolutional neural network, speaker recognition, Keras, voice signal spectrogram, tuneR
Abstract
Identifying people by their voice comes easily to humans: as a listener interacts with a particular person, the mind becomes trained on that voice and the brain can readily recognize it the next time. The proposed system is designed and implemented on this principle, using a Convolutional Neural Network (CNN). A total of 110 voice samples were collected from 11 different participants (speakers). Each voice signal was converted into an image of its spectrogram. 90% of the data was used for training and the remaining 10% for testing. The implementation was done in RStudio using the R programming language. The system achieved 82% accuracy. The proposed system is simple and cost-effective.
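The two data-preparation steps named in the abstract, converting a voice signal to a spectrogram and holding out 10% of the samples for testing, can be sketched as follows. This is an illustrative sketch only, not the authors' code: the paper reports an R implementation (the keywords mention tuneR and Keras), whereas this example uses plain Python. The frame length, hop size, 8 kHz sampling rate, synthetic test tone, the function names `spectrogram` and `stratified_split`, the even distribution of 10 recordings per speaker, and the per-speaker (stratified) split are all assumptions; the abstract does not state how the test samples were chosen. CNN training is omitted.

```python
import cmath
import math
import random
from collections import defaultdict


def spectrogram(signal, frame_len=64, hop=32):
    """Magnitude spectrogram via a naive short-time DFT with a Hann window.

    Returns a list of frames; each frame holds frame_len // 2 + 1 magnitudes.
    (Hypothetical helper; frame_len and hop are assumed values.)
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_len)]
        mags = []
        for k in range(frame_len // 2 + 1):
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            mags.append(abs(acc))
        frames.append(mags)
    return frames


def stratified_split(labels, test_fraction=0.1, seed=0):
    """Hold out test_fraction of each speaker's samples; returns index lists.

    (Hypothetical helper; stratifying by speaker is an assumption.)
    """
    by_speaker = defaultdict(list)
    for idx, speaker in enumerate(labels):
        by_speaker[speaker].append(idx)
    rng = random.Random(seed)
    train, test = [], []
    for speaker in sorted(by_speaker):
        idxs = by_speaker[speaker][:]
        rng.shuffle(idxs)
        n_test = max(1, round(len(idxs) * test_fraction))
        test.extend(idxs[:n_test])
        train.extend(idxs[n_test:])
    return train, test


# Synthetic 1 kHz tone at an assumed 8 kHz sampling rate, standing in for a recording.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(512)]
spec = spectrogram(tone)

# 11 speakers x 10 recordings each = 110 samples (even distribution assumed).
labels = [speaker for speaker in range(11) for _ in range(10)]
train_idx, test_idx = stratified_split(labels)
```

Under these assumptions the split yields 99 training and 11 test samples, one test recording per speaker. Each row of `spec` is one time frame; rendering the rows as pixel intensities would give the kind of spectrogram image the abstract describes as CNN input.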
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors contributing to this journal agree to publish their articles under the Creative Commons Attribution 4.0 International License, allowing third parties to share their work (copy, distribute, transmit) and to adapt it, under the condition that the authors are given credit and that in the event of reuse or distribution, the terms of this license are made clear.
