Harvesting the Resources of Invisible Web
Keywords:
Search Engines, Invisible Web, Surface Web, Internet Portals
Abstract
The World Wide Web is steadily becoming an important part of social, cultural, political, educational, academic, and commercial life. The Web contains a wide range of information and applications in areas of societal interest. A great number of Web users rely on search engines for information retrieval, but still hesitate before making a final decision, often because only rough and limited information about the products is made available. Millions of high-quality resources are available on the Web that general-purpose search engines cannot see. One contributing reason could be the use of irrelevant keywords or the choice of the wrong search engine for a particular request. Often a search engine cannot determine exactly what the searcher wants. Beyond these reasons, the major reason searches sometimes fail to return useful results is the technical inability of search engines to access and retrieve some of the content present on the Web. That is, some information is hidden from even the most capable search engines. Information that remains inaccessible to Web search engines is termed the “Invisible Web”. The Invisible Web contains resources that are not indexed by general-purpose search engines, but this does not mean that these resources are mere leftovers or unimportant. Information that a search engine cannot reach is as significant as information it can. The Invisible Web is a phenomenon to be reckoned with. This paper provides an overview of the Invisible Web and examines why search engines cannot see all Web content. The various resources present in the Invisible Web are also discussed. The paper also provides a list of search engines that can mine and harvest the Invisible Web.
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors contributing to this journal agree to publish their articles under the Creative Commons Attribution 4.0 International License, allowing third parties to share their work (copy, distribute, transmit) and to adapt it, under the condition that the authors are given credit and that in the event of reuse or distribution, the terms of this license are made clear.
