An Overview of Various Classification Concepts of Web Page Content

Authors

  • Khan R, Department of Computer Science, SRK University, Bhopal, India
  • Gupta RK, Department of Computer Science, SRK University, Bhopal, India
  • Namdeo V, Department of Computer Science, SRK University, Bhopal, India

DOI:

https://doi.org/10.26438/ijcse/v7i1.377380

Keywords:

Algorithm, Assumption, Classification, Directory, Features, Information, Process, XML, Webpage

Abstract

This paper surveys work on classifying the content of web pages. The Web is a huge store of information, and precise automated web page classifiers are needed to manage web directories and improve search engine performance. In the web page classification problem, each term in each HTML/XML tag of each web page can be used as a feature, so selecting the best features is an efficient way to reduce the feature space of the resulting classification problem. Content classification of web pages is essential for many Web information retrieval tasks, such as web directory management and targeted scanning. The uncontrolled nature of web content makes web page classification harder than traditional text classification. However, the interdependent nature of hypertext also provides features that support the process. The paper reviews prior work on web page classification, discusses the importance of these Web-specific features and algorithms, describes leading practices, and examines the assumptions underlying the use of information from adjacent pages.
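
To illustrate the idea of treating each term within each HTML/XML tag as a feature, the following is a minimal sketch in Python using only the standard library. It is not the authors' method: the names TagTermExtractor and tag_term_features, the tokenization rule, and the list of void tags are assumptions made for this example.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Tags that never have a closing tag; they must not be pushed on the stack.
# (A partial list, assumed sufficient for this illustration.)
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class TagTermExtractor(HTMLParser):
    """Counts (tag, term) pairs, keying each term by its enclosing tag."""

    def __init__(self):
        super().__init__()
        self.stack = []            # currently open tags, innermost last
        self.features = Counter()  # (tag, term) -> occurrence count

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        tag = self.stack[-1] if self.stack else "root"
        # Hypothetical tokenization: lowercase alphanumeric runs.
        for term in re.findall(r"[a-z0-9]+", data.lower()):
            self.features[(tag, term)] += 1


def tag_term_features(html):
    """Return a Counter of (tag, term) features for one page."""
    parser = TagTermExtractor()
    parser.feed(html)
    parser.close()
    return parser.features


if __name__ == "__main__":
    page = (
        "<html><head><title>Web page classification</title></head>"
        "<body><h1>Classification</h1><p>Web pages</p></body></html>"
    )
    for (tag, term), count in sorted(tag_term_features(page).items()):
        print(tag, term, count)
```

Because the same term yields a separate feature for every tag it appears in, the feature space grows quickly, which is why a feature selection step is needed before classification.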

Published

2019-01-31

How to Cite

[1]
R. Khan, R. Gupta, and V. Namdeo, “An Overview of Various Classification Concepts of Web Page Content”, Int. J. Comp. Sci. Eng., vol. 7, no. 1, pp. 377–380, Jan. 2019.