A Comparative Study of Three IR models for Bengali Document Retrieval

Authors

  • Chatterjee S, Department of Computer Science & Engineering, Jadavpur University, Kolkata, India
  • Sarkar K, Department of Computer Science & Engineering, Jadavpur University, Kolkata, India

Keywords:

Information Retrieval, Bengali language, LSI, BM25, probabilistic, Query

Abstract

In this paper, we study and examine selected approaches to Bengali information retrieval. These approaches use keywords to describe the content of each document. We chose three models in order to understand their working mechanisms and shortcomings: the TF-IDF vector space model, the Latent Semantic Indexing (LSI) model, and the BM25 model. Understanding these shortcomings is a necessary step toward overcoming them. The models are evaluated on a Bengali dataset and a set of Bengali queries that we created, and the results are reported in the results section of this paper. Our study reveals that the Okapi BM25 model performs best among the three IR models studied for Bengali document retrieval.
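For readers unfamiliar with the best-performing model named above, the following is a minimal sketch of Okapi BM25 scoring in Python. It is an illustration under stated assumptions, not the authors' implementation: the parameter values k1 = 1.5 and b = 0.75 are common defaults rather than values taken from the paper, the smoothed IDF variant is one of several in use, and the toy corpus uses romanized, whitespace-tokenized placeholder text instead of the paper's Bengali dataset.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document against a tokenized query with Okapi BM25.

    k1 and b are assumed default values, not parameters reported in the paper.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed IDF; the "+ 1" inside the log keeps it non-negative.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1.0 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1.0) / norm
        scores.append(score)
    return scores


# Hypothetical toy corpus (romanized placeholder tokens), not the paper's data.
docs = [
    "bangla bhasha tathya anusandhan".split(),
    "cricket khela aaj".split(),
    "bangla sahitya o bhasha".split(),
]
query = "bangla tathya".split()

scores = bm25_scores(query, docs)
# Document indices ordered from most to least relevant.
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking)
```

In this toy example the first document matches both query terms and ranks first, the third matches one term, and the second matches none and receives a score of zero. A real Bengali pipeline would also need language-specific tokenization and possibly stemming, which this sketch omits.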

Published

2025-11-24

How to Cite

[1]
S. Chatterjee and K. Sarkar, “A Comparative Study of Three IR models for Bengali Document Retrieval”, Int. J. Comp. Sci. Eng., vol. 7, no. 1, pp. 220–225, Nov. 2025.