Automatic News Article Summarization

Authors

  • Rananavare LB Dept. of CSE, Sri Venkateswara University College of Engineering, Tirupathi, India
  • P Venkata Subba Reddy Dept. of CSE, Sri Venkateswara University College of Engineering, Tirupathi, India

DOI:

https://doi.org/10.26438/ijcse/v6i2.230237

Keywords:

Text Summarization, Natural Language Processing, News Paper Articles, Intelligence mining, RDF Triplets, NER

Abstract

A summary condenses a lengthy document by highlighting its salient features. It lets readers grasp the main content quickly, saving time and helping them decide whether to read the entire document. A summary should be shorter than the original article, so only pertinent information should be included. The main goal of a newspaper article summary is for readers to come away knowing what the article is about without having to read it in full. This work proposes a news article summarization system that automatically accesses information from various local online newspapers and summarizes it using heterogeneous articles. To support ad hoc keyword-based extraction of news articles, the system uses a tailor-made web crawler that crawls the websites in search of relevant articles. Computational linguistic techniques, mainly triplet extraction, semantic similarity calculation, and OPTICS clustering with DBSCAN, are used alongside a sentence selection heuristic to generate coherent and cogent summaries irrespective of the number of articles supplied to the engine. Performance is evaluated using the ROUGE metric.
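As an illustration of the evaluation step, the following is a minimal sketch of a ROUGE-N style score, assuming simple lowercased whitespace tokenization and a single reference summary. The function names `ngram_counts` and `rouge_n` are hypothetical and are not taken from the paper; the authors' actual evaluation setup may differ (for example, in stemming, stop-word handling, or the ROUGE variant used).

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Illustrative ROUGE-N: clipped n-gram overlap between two texts.

    Assumption: tokens are lowercased, whitespace-separated words.
    Returns recall, precision and F1 as a dict.
    """
    cand = ngram_counts(candidate.lower().split(), n)
    ref = ngram_counts(reference.lower().split(), n)

    # Counter intersection keeps the minimum count per n-gram (clipping).
    overlap = sum((cand & ref).values())
    ref_total = sum(ref.values())
    cand_total = sum(cand.values())

    recall = overlap / ref_total if ref_total else 0.0
    precision = overlap / cand_total if cand_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"recall": recall, "precision": precision, "f1": f1}


if __name__ == "__main__":
    system_summary = "the cat sat on the mat"
    human_summary = "the cat lay on the mat"
    print(rouge_n(system_summary, human_summary, n=1))
    print(rouge_n(system_summary, human_summary, n=2))
```

Recall is the share of the reference summary's n-grams that also appear in the generated summary, which is the quantity ROUGE-N is usually reported on; precision and F1 are included here only for convenience.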

Published

2025-11-12

How to Cite

[1]
L. B. Rananavare and P. Venkata Subba Reddy, “Automatic News Article Summarization”, Int. J. Comp. Sci. Eng., vol. 6, no. 2, pp. 230–237, Nov. 2025.

Section

Research Article