Information Retrieval System Using Vector Space Model for Document Summarization

Authors

  • Chavan VA, Department of Computer Engineering, Savitribai Phule Pune University, Maharashtra, India
  • Durugkar SR, Department of Computer Engineering, Savitribai Phule Pune University, Maharashtra, India

Keywords:

Vector space model, Document frequency, Term frequency, Document context

Abstract

Document summarization is the process of reducing the size of a text document while retaining its most important content in the reduced document (the summary). A great deal of work has been done on document summarization in recent years. Various techniques are available, but most of them rely on sentence similarity to extract sentences. Because the context of a document is also important for summarization, our method uses a term indexing model to assign index weights to the document as well as to the sentences in it. The proposed system uses context-based document indexing built on the vector space model. This indexing model works with document frequency (DF) and term frequency (TF): together, DF and TF yield the document indexing weights used for summarization. We compare our system with a traditional term-based indexing model and will show that our system gives better results than that model.
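The abstract does not give the weighting formula, so the following is only a minimal sketch of how TF and DF weights in a vector space model could drive extractive summarization. The smoothed TF-IDF form, the choice of counting DF over sentences, the cosine score against a whole-document vector, and all function names (tokenize, tf_idf_vectors, cosine, summarize) are assumptions made for illustration, not the authors' method.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic runs (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf_vectors(sentences):
    """Return one sparse weight vector (dict term -> weight) per sentence.

    Assumption: DF is counted over sentences and combined with TF using a
    smoothed inverse-DF factor; the paper's actual formula may differ.
    """
    token_lists = [tokenize(s) for s in sentences]
    n = len(token_lists)
    # Document frequency: number of sentences containing each term.
    df = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)  # term frequency within this sentence
        vectors.append(
            {t: tf[t] * (math.log((1 + n) / (1 + df[t])) + 1.0) for t in tf}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def summarize(sentences, k=2):
    """Pick the k sentences closest to the whole-document vector.

    Assumption: the document vector is the sum of the sentence vectors.
    Selected sentences are returned in their original order.
    """
    vectors = tf_idf_vectors(sentences)
    doc_vector = {}
    for vec in vectors:
        for term, weight in vec.items():
            doc_vector[term] = doc_vector.get(term, 0.0) + weight
    scores = [cosine(vec, doc_vector) for vec in vectors]
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]


if __name__ == "__main__":
    text = [
        "Vector space models index documents by term weights.",
        "The weather was pleasant.",
        "Term weights combine term frequency and document frequency.",
        "Documents are summarized by extracting sentences.",
    ]
    for sentence in summarize(text, k=2):
        print(sentence)
```

Scoring each sentence against the document vector is one simple way to bring document context into sentence selection; a pairwise sentence-similarity ranking would be a different design with the same weighting step.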

Published

2014-10-31

How to Cite

[1]
V. A. Chavan and S. R. Durugkar, “Information Retrieval System Using Vector Space Model for Document Summarization”, Int. J. Comp. Sci. Eng., vol. 2, no. 10, pp. 46–50, Oct. 2014.

Section

Research Article