A Comparative Study of Three IR models for Bengali Document Retrieval
Keywords:
Information Retrieval, Bengali language, LSI, BM25, probabilistic, Query
Abstract
In this paper, we study selected information retrieval approaches for Bengali document retrieval. These approaches use keywords to describe the content of each document. We chose three models in order to understand their working mechanisms and shortcomings: the TF-IDF vector space model, the Latent Semantic Indexing (LSI) model, and the Okapi BM25 model. Understanding these shortcomings is a prerequisite for overcoming them. The models are evaluated on a Bengali dataset and a set of Bengali queries that we created, and the results are reported in the results section of this paper. Our study reveals that the Okapi BM25 model performs best among the three IR models studied for Bengali document retrieval.
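As an illustration of the kind of keyword-based ranking compared here, the following is a minimal sketch of Okapi BM25 scoring. It is not the implementation used in the paper: the function name `bm25_scores`, the parameter values `k1=1.2` and `b=0.75`, the smoothed IDF variant, and the toy tokenized documents are all assumptions made for this example.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against `query_terms`.

    Hypothetical helper for illustration; k1 and b are common defaults,
    not values taken from the paper.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Smoothed IDF; the "1 +" inside the log keeps it non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Toy, already-tokenized Bengali documents (assumed data, not the paper's corpus).
docs = [
    ["নদী", "জল", "নদী"],           # river, water, river
    ["বই", "স্কুল"],                # book, school
    ["জল", "বই", "স্কুল", "শিক্ষক"],  # water, book, school, teacher
]
query = ["নদী", "জল"]               # river, water

scores = bm25_scores(query, docs)
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking)
```

The first document matches both query terms and should rank highest, while the second shares no term with the query and receives a score of zero. A real system would add Bengali-specific tokenization, stop-word removal, and stemming before scoring.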
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors contributing to this journal agree to publish their articles under the Creative Commons Attribution 4.0 International License, allowing third parties to share their work (copy, distribute, transmit) and to adapt it, under the condition that the authors are given credit and that in the event of reuse or distribution, the terms of this license are made clear.
