Factored Language Modeling
DOI: https://doi.org/10.26438/ijcse/v6si1.1925
Keywords: Language model, Perplexity, Factored language model, Backoff
Abstract
Language modeling is a technique for predicting the most probable next word in a sentence. It is a first and essential step in the successful implementation of natural language processing applications such as machine translation and speech recognition, where it helps ensure the correctness and fluency of the output. The n-gram model is the traditional way to implement a language model: only the preceding n-1 words in the sentence are used to predict the next word. Factored language modeling is a method that uses linguistic knowledge about each word, along with the word itself, to construct the language model. This paper describes the factored language modeling technique and compares its results with those of the traditional n-gram technique, using perplexity as the evaluation measure.
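To make the comparison in the abstract concrete, here is a minimal sketch of evaluating a word n-gram model and a factored model by perplexity. It assumes a hypothetical toy corpus in which each word carries a single extra factor, a part-of-speech tag. The "factored" model here is a deliberately simplified stand-in: it linearly interpolates P(w_t | w_{t-1}) with P(w_t | tag_{t-1}), each add-one smoothed, rather than using the generalized parallel backoff of full factored language models. The corpus, tag set, `AddOneBigram` class and weight `LAMBDA` are all illustrative assumptions, not the paper's setup.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus of (word, tag) pairs; the tag is the extra linguistic factor.
TRAIN = [
    [("the", "DET"), ("cat", "NOUN"), ("sat", "VERB")],
    [("the", "DET"), ("dog", "NOUN"), ("ran", "VERB")],
    [("a", "DET"), ("cat", "NOUN"), ("ran", "VERB")],
]
TEST = [[("a", "DET"), ("dog", "NOUN"), ("sat", "VERB")]]

BOS, EOS, UNK = ("<s>", "<s>"), ("</s>", "</s>"), "<unk>"
LAMBDA = 0.5  # hypothetical interpolation weight; normally tuned on held-out data


def pad(sentence):
    """Add sentence-start and sentence-end markers (word and tag)."""
    return [BOS] + sentence + [EOS]


class AddOneBigram:
    """P(word | context) with add-one (Laplace) smoothing over a fixed vocabulary."""

    def __init__(self, vocab):
        self.vocab = vocab
        self.counts = defaultdict(Counter)

    def observe(self, context, word):
        self.counts[context][word] += 1

    def prob(self, context, word):
        c = self.counts[context]
        return (c[word] + 1) / (sum(c.values()) + len(self.vocab))


def train(corpus):
    vocab = {w for s in corpus for w, _ in s} | {EOS[0], UNK}
    word_lm = AddOneBigram(vocab)  # conditions on the previous word
    tag_lm = AddOneBigram(vocab)   # conditions on the previous word's tag
    for sentence in corpus:
        padded = pad(sentence)
        for (pw, pt), (w, _) in zip(padded, padded[1:]):
            word_lm.observe(pw, w)
            tag_lm.observe(pt, w)
    return vocab, word_lm, tag_lm


def perplexity(corpus, vocab, prob_fn):
    """PP = exp(-(1/N) * sum of log P(w_i | history)) over all predicted tokens."""
    log_sum, n = 0.0, 0
    for sentence in corpus:
        padded = pad(sentence)
        for (pw, pt), (w, _) in zip(padded, padded[1:]):
            w = w if w in vocab else UNK  # map unseen words to <unk>
            log_sum += math.log(prob_fn(pw, pt, w))
            n += 1
    return math.exp(-log_sum / n)


vocab, word_lm, tag_lm = train(TRAIN)


def word_only(pw, pt, w):
    # Traditional bigram: only the previous word is used.
    return word_lm.prob(pw, w)


def factored(pw, pt, w):
    # Simplified factored model: mix word-conditioned and tag-conditioned estimates.
    return LAMBDA * word_lm.prob(pw, w) + (1 - LAMBDA) * tag_lm.prob(pt, w)


pp_word = perplexity(TEST, vocab, word_only)
pp_factored = perplexity(TEST, vocab, factored)
print(f"word bigram perplexity:    {pp_word:.2f}")
print(f"factored model perplexity: {pp_factored:.2f}")
```

On this toy data the factored variant scores a lower perplexity because the word bigrams "a dog" and "dog sat" never occur in training, while the tag-level patterns DET followed by "dog" and NOUN followed by "sat" do. This is the kind of generalization the extra linguistic factor is meant to provide when word-level counts are sparse.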
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors contributing to this journal agree to publish their articles under the Creative Commons Attribution 4.0 International License, allowing third parties to share their work (copy, distribute, transmit) and to adapt it, under the condition that the authors are given credit and that in the event of reuse or distribution, the terms of this license are made clear.
