A New Technique of Web Page Classification and Optimization

Authors

  • Khan R, Department of Computer Science, SRK University, Bhopal, India
  • Gupta RK, Department of Computer Science, SRK University, Bhopal, India
  • Namdeo V, Department of Computer Science, SRK University, Bhopal, India

DOI:

https://doi.org/10.26438/ijcse/v7i1.381385

Keywords:

Accuracy, Artificial Bee Colony, Classification, Clustering, Firefly, Features, Homogeneous, HTML, Information, Optimization, Precision, Web

Abstract

The rapid development of the internet and of web publishing techniques has created numerous information sources published as HTML pages on the World Wide Web (WWW). The WWW is now a popular medium through which people all around the world can spread and gather information of all kinds. This work discusses the importance of Web-specific features and algorithms, describes state-of-the-art practices, and states its hypothesis, with the aim of giving a better description of the Web page classification problem. The Firefly Algorithm (FA) is a recent nature-inspired optimization algorithm that simulates the flash patterns and characteristics of fireflies. Clustering is a popular data analysis technique for identifying homogeneous groups of objects based on the values of their attributes. Here FA is used for clustering on benchmarks, where it is more suitable than Artificial Bee Colony (ABC), Particle Swarm Optimization (PSO), and the nine other methods used. The proposed webpage optimization is an improved, optimized web page classification that uses the firefly algorithm with a Naïve Bayes (NB) classifier. Naïve Bayes serves as the classifier paired with the firefly algorithm. Current classification techniques use word consistency and grouping techniques for classifying web pages. These techniques take an ad hoc approach to reviewing and reconciling all the keywords on a website for classification. These methods are effective, but not without problems such as slow processing, differences in word meaning, and poor identification of sentences; they also disregard the homonymy of words. Hence this work performs better than existing approaches on parameters such as accuracy and precision.
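
As an illustration of the firefly idea mentioned in the abstract, the following is a minimal Python sketch of the standard firefly update rule, in which a dimmer firefly moves toward a brighter one with an attractiveness that decays with distance. It is not the authors' implementation: the function name `firefly_minimize`, the parameter values (`beta0`, `gamma`, `alpha`, the 0.97 decay), and the `sphere` test objective are all assumptions chosen for illustration. In a web page classification setting the objective would instead score a clustering or a feature subset, which the abstract does not specify.

```python
import math
import random


def firefly_minimize(objective, dim, bounds, n_fireflies=15, n_iters=50,
                     beta0=1.0, gamma=1.0, alpha=0.2, seed=0):
    """Minimize `objective` with the standard firefly update (illustrative values)."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Random initial positions; a lower cost means a brighter firefly.
    swarm = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_fireflies)]
    cost = [objective(x) for x in swarm]
    best_i = min(range(n_fireflies), key=cost.__getitem__)
    best_x, best_cost = list(swarm[best_i]), cost[best_i]

    for _ in range(n_iters):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if cost[j] < cost[i]:  # j is brighter, so i moves toward j
                    r2 = sum((a - b) ** 2 for a, b in zip(swarm[i], swarm[j]))
                    beta = beta0 * math.exp(-gamma * r2)  # attraction decays with distance
                    # Attraction term plus a small random step, clipped to the bounds.
                    swarm[i] = [
                        min(hi, max(lo, a + beta * (b - a) + alpha * (rng.random() - 0.5)))
                        for a, b in zip(swarm[i], swarm[j])
                    ]
                    cost[i] = objective(swarm[i])
                    if cost[i] < best_cost:
                        best_x, best_cost = list(swarm[i]), cost[i]
        alpha *= 0.97  # assumed schedule: shrink the random step over time
    return best_x, best_cost


def sphere(x):
    """Placeholder objective (assumption): sum of squares, minimized at the origin."""
    return sum(v * v for v in x)


best_x, best_cost = firefly_minimize(sphere, dim=2, bounds=(-5.0, 5.0))
print(best_x, best_cost)
```

The sketch tracks the best position seen so far, so the returned cost never exceeds that of the best initial firefly. Pairing such an optimizer with a Naïve Bayes classifier would require defining the objective in terms of classification quality, which is left open here.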

Published

2019-01-31

How to Cite

R. Khan, R. K. Gupta, and V. Namdeo, “A New Technique of Web Page Classification and Optimization”, Int. J. Comp. Sci. Eng., vol. 7, no. 1, pp. 381–385, Jan. 2019.

Section

Research Article