Classification of Audio Segments using Voice Activity Detection

Authors

S. Kaur, Dept. of Computer Science and Engineering, Baba Banda Singh Bahadur Engineering College, Fatehgarh Sahib, Punjab, India
P. Mittal, Dept. of Computer Science and Engineering, Baba Banda Singh Bahadur Engineering College, Fatehgarh Sahib, Punjab, India

DOI:

https://doi.org/10.26438/ijcse/v8i9.101105

Keywords:

Fractal Dimensions

Abstract

Voice activity detection (VAD) classifies audio frames as speech or non-speech. An effective, noise-tolerant VAD technique underpins the performance of many speech-processing technologies. In this paper, an unsupervised VAD method is proposed to identify the speech-presence and speech-absence segments in an audio signal. To make the algorithm effective and computationally fast, it uses long-term parameters extracted with the Petrosian algorithm for fractal dimension. Such a system plays a significant role in achieving improved speech quality. Two datasets, recorded in English and Arabic, are used to analyse the output of the proposed algorithm. An array of 85 audio signals from the TIMIT database, at different signal-to-noise ratios (SNRs), is tested with the algorithm at once. The evaluation suggests that the proposed algorithm identifies segments in audio with different SNRs.
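
As a rough illustration of the feature named in the abstract, the following sketch computes the Petrosian fractal dimension of a frame, using the standard definition based on the number of sign changes in the first difference of the signal. The function names, the frame length and the hop size are assumptions made for this example and are not taken from the paper. How the paper turns these values into long-term parameters and speech/non-speech decisions is not stated in the abstract, so that step is not shown.

```python
import numpy as np


def petrosian_fd(frame):
    """Petrosian fractal dimension of a 1-D frame (standard definition)."""
    x = np.asarray(frame, dtype=float)
    n = x.size
    diff = np.diff(x)
    # Count sign changes in the first difference of the signal.
    n_delta = int(np.sum(diff[1:] * diff[:-1] < 0))
    return np.log10(n) / (np.log10(n) + np.log10(n / (n + 0.4 * n_delta)))


def frame_fd(signal, frame_len=400, hop=160):
    """Petrosian FD per frame; frame_len and hop are illustrative values only."""
    signal = np.asarray(signal, dtype=float)
    starts = range(0, len(signal) - frame_len + 1, hop)
    return np.array([petrosian_fd(signal[s:s + frame_len]) for s in starts])
```

A monotone ramp has no sign changes in its first difference, so its value is exactly 1, while a rapidly alternating signal gives a value above 1. Calling `frame_fd` on a full recording yields one value per frame, which a detector could then smooth or threshold.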

Published

2020-09-30

How to Cite

[1]

S. Kaur and P. Mittal, “Classification of Audio Segments using Voice Activity Detection”, Int. J. Comp. Sci. Eng., vol. 8, no. 9, pp. 101–105, Sep. 2020.

Issue

Vol. 8 No. 9 (2020): IJCSE September Edition

Section

Research Article

License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Authors contributing to this journal agree to publish their articles under the Creative Commons Attribution 4.0 International License, allowing third parties to share their work (copy, distribute, transmit) and to adapt it, under the condition that the authors are given credit and that in the event of reuse or distribution, the terms of this license are made clear.
