An Approach to Design Personalized Focused Crawler

Authors

  • Hardik P Trivedi, BE Computers, K. J. Somaiya College of Engineering, Mumbai, India
  • Gaurav N Daxini, BE Computers, K. J. Somaiya College of Engineering, Mumbai, India
  • Jignesh A Oswal, BE Computers, K. J. Somaiya College of Engineering, Mumbai, India
  • Vinay D Gor, BE Computers, K. J. Somaiya College of Engineering, Mumbai, India
  • Swati Mali, M Tech Computers, K. J. Somaiya College of Engineering, Mumbai, India

Keywords:

Web Crawler, Focused Crawler, World Wide Web (WWW), Content Analysis, Link Scoring, Change Detection

Abstract

The volume of data on the World Wide Web (WWW) and the rate at which it changes make it impossible to crawl the Web completely. The challenge for a crawler is to fetch only the relevant pages from this information explosion. A focused crawler addresses the relevance problem to a certain extent by concentrating on web pages about a given topic or set of topics. A focused crawler with a page change detection policy can further narrow the search to new or updated pages, thereby avoiding both redundant crawling and missed updates. This paper proposes a design for a focused crawler with a web page change detection policy.
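
As a rough illustration of the two ideas named in the abstract, the sketch below combines a topic relevance check with a change check based on a content hash. It is not the method proposed in the paper: the term-frequency score, the SHA-256 fingerprint, the threshold value, and all function names (`relevance`, `fingerprint`, `should_process`) are assumptions chosen only to make the idea concrete.

```python
import hashlib
import re


def relevance(text, topic_terms):
    """Fraction of words in `text` that are topic terms (assumed scoring rule)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in topic_terms)
    return hits / len(words)


def fingerprint(text):
    """Content hash used to detect whether a page changed (assumed: SHA-256)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def should_process(url, text, topic_terms, seen, threshold=0.1):
    """Return True only for on-topic pages that are new or changed.

    `seen` maps each URL to the fingerprint stored on the previous visit.
    The threshold of 0.1 is an arbitrary illustrative value.
    """
    if relevance(text, topic_terms) < threshold:
        return False  # off-topic: skip without recording anything
    digest = fingerprint(text)
    if seen.get(url) == digest:
        return False  # unchanged since the last crawl
    seen[url] = digest  # remember the latest version
    return True
```

A crawler loop would call `should_process` on each fetched page and index or follow links only when it returns `True`, so unchanged and off-topic pages cost a fetch but no further work.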


Published

2014-03-31

How to Cite

[1]
H. P. Trivedi, G. N. Daxini, J. A. Oswal, V. D. Gor, and S. Mali, “An Approach to Design Personalized Focused Crawler”, Int. J. Comp. Sci. Eng., vol. 2, no. 3, pp. 144–147, Mar. 2014.

Section

Research Article