Classification of Audio Segments using Voice Activity Detection

Authors

  • S. Kaur, Dept. of Computer Science and Engineering, Baba Banda Singh Bahadur Engineering College, Fatehgarh Sahib, Punjab, India
  • P. Mittal, Dept. of Computer Science and Engineering, Baba Banda Singh Bahadur Engineering College, Fatehgarh Sahib, Punjab, India

DOI:

https://doi.org/10.26438/ijcse/v8i9.101105

Keywords:

Fractal Dimensions

Abstract

Voice activity detection (VAD) is the task of classifying audio frames as speech or non-speech. An effective, noise-tolerant VAD technique improves the performance of many speech-processing technologies. In this paper, an unsupervised VAD method is proposed to identify speech-presence and speech-absence segments in an audio signal. To keep the algorithm effective and computationally fast, it uses long-term parameters extracted with the Petrosian algorithm for fractal dimension. This system plays a significant role in achieving improved speech quality. Two datasets, one recorded in English and one in Arabic, are used to analyze the output of the proposed algorithm. A set of 85 audio signals from the TIMIT database, at different signal-to-noise ratios (SNRs), is tested with the algorithm in a single batch. The evaluation suggests that the proposed algorithm identifies speech segments in audio at different SNRs.
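
The Petrosian fractal dimension mentioned above can be illustrated with a minimal Python sketch. It uses the commonly cited form of the Petrosian estimate, log10(N) / (log10(N) + log10(N / (N + 0.4 N_delta))), where N is the frame length and N_delta is the number of sign changes in the first difference of the frame. The function names, the frame length, and the hop size are assumptions made for illustration. The abstract does not state the paper's framing parameters or its speech/non-speech decision rule, so none is implemented here.

```python
import math
from typing import List, Sequence


def petrosian_fd(frame: Sequence[float]) -> float:
    """Petrosian fractal dimension of one frame (commonly cited formula)."""
    n = len(frame)
    if n < 3:
        raise ValueError("frame must contain at least 3 samples")
    # First difference of the frame.
    diff = [frame[i + 1] - frame[i] for i in range(n - 1)]
    # N_delta: number of sign changes between consecutive differences.
    n_delta = sum(1 for a, b in zip(diff, diff[1:]) if a * b < 0)
    log_n = math.log10(n)
    return log_n / (log_n + math.log10(n / (n + 0.4 * n_delta)))


def frame_fds(signal: Sequence[float], frame_len: int, hop: int) -> List[float]:
    """Per-frame Petrosian fractal dimension over a sliding window.

    frame_len and hop are hypothetical parameters, not values from the paper.
    """
    last_start = len(signal) - frame_len
    return [
        petrosian_fd(signal[start:start + frame_len])
        for start in range(0, last_start + 1, hop)
    ]


if __name__ == "__main__":
    ramp = [float(i) for i in range(16)]       # smooth: no sign changes
    zigzag = [float(i % 2) for i in range(16)]  # irregular: many sign changes
    print(frame_fds(ramp, frame_len=8, hop=4))
    print(frame_fds(zigzag, frame_len=8, hop=4))
```

A smooth frame such as the ramp yields a dimension of 1, while a rapidly alternating frame yields a larger value. A detector built on this feature would still need a rule that maps the per-frame values to speech-presence and speech-absence labels.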

Published

2020-09-30

How to Cite

[1]
S. Kaur and P. Mittal, “Classification of Audio Segments using Voice Activity Detection”, Int. J. Comp. Sci. Eng., vol. 8, no. 9, pp. 101–105, Sep. 2020.

Section

Research Article