Factored Language Modeling

Authors

  • AR Babhulgaonkar Department of Computer Science & Engineering, Walchand College of Engineering, Sangli, India
  • SP Sonavane Department of Information Technology, Walchand College of Engineering, Sangli, India

DOI:

https://doi.org/10.26438/ijcse/v6si1.1925

Keywords:

Language model, Perplexity, Factored language model, Backoff

Abstract

Language modeling is the task of estimating the most probable next word in a sentence. It is an essential first step in natural language processing applications such as machine translation and speech recognition, where it ensures the correctness and fluency of the target output. The n-gram model is the traditional way to implement a language model: it uses only the preceding words in the sentence to predict the next word. Factored language modeling instead uses linguistic knowledge about each word, along with the word itself, to construct the language model. This paper describes the factored language modeling technique and compares its results against the traditional n-gram technique, using perplexity as the measure.
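The n-gram baseline and the perplexity measure mentioned in the abstract can be illustrated with a minimal sketch. It assumes a bigram model with add-one (Laplace) smoothing and a toy corpus; the function names, the smoothing choice, and the data are illustrative assumptions and are not taken from the paper.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Count unigrams and bigrams over tokenized sentences with boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams

def bigram_prob(prev, word, unigrams, bigrams):
    """Add-one smoothed estimate of P(word | prev); smoothing choice is an assumption."""
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

def perplexity(sentences, unigrams, bigrams):
    """Perplexity = 2 ** (negative average log2 probability per predicted token)."""
    log_sum, n_tokens = 0.0, 0
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        for prev, word in zip(padded, padded[1:]):
            log_sum += math.log2(bigram_prob(prev, word, unigrams, bigrams))
            n_tokens += 1
    return 2 ** (-log_sum / n_tokens)

# Toy training data (hypothetical, for illustration only).
train = [["the", "cat", "sat"], ["the", "dog", "sat"]]
unigrams, bigrams = train_bigram(train)

seen_ppl = perplexity([["the", "cat", "sat"]], unigrams, bigrams)
scrambled_ppl = perplexity([["sat", "cat", "the"]], unigrams, bigrams)
print(seen_ppl < scrambled_ppl)
```

Lower perplexity means the model finds the test text less surprising, which is why a word order seen in training scores better than a scrambled one. A factored model would extend the conditioning context from previous words alone to bundles of factors per word (for example, the word together with its part-of-speech tag), and would be compared with the baseline using the same measure.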

Published

2025-11-12
How to Cite

[1]
A. Babhulgaonkar and S. Sonavane, “Factored Language Modeling”, Int. J. Comp. Sci. Eng., vol. 6, no. 1, pp. 19–25, Nov. 2025.