VAARTALAP: Embedding Whisper-AI-like Model into a Video-Conferencing System to Aid Real-Time Translation and Transcription
DOI:
https://doi.org/10.26438/ijcse/v11i12.2125
Keywords:
Sound transcription, Sound translation, AI, Deep learning, Real-time, Language barrier
Abstract
In an era of digital communication, effective communication crosses regional boundaries and linguistic obstacles, making the world more connected. As corporations, institutions, and individuals engage on a worldwide scale, the demand for seamless multilingual communication has never been greater. This article explores an initiative that uses the real-time translation and transcription services offered by Whisper-AI to transform video conferencing. The goal of the research is to create an AI model that integrates a translation- and transcription-based model into a real-time video-conferencing system. By using state-of-the-art speech recognition and translation technologies, participants can converse in real time without language barriers.
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors contributing to this journal agree to publish their articles under the Creative Commons Attribution 4.0 International License, allowing third parties to share their work (copy, distribute, transmit) and to adapt it, under the condition that the authors are given credit and that in the event of reuse or distribution, the terms of this license are made clear.
