Web Data Scraper Tools: Survey
Keywords:
Wrapper, Scraper, Document Object Model (DOM)
Abstract
The World Wide Web contains a huge and rapidly growing amount of information. Data on the web are usually stored in unstructured or semi-structured form. To obtain the essential data from the web, a number of data scraper tools have been developed. In this paper we briefly survey the Web data scraper process, present a taxonomy for characterizing Web data scraper tools, and provide a qualitative analysis of them. We hope this work will stimulate other studies aimed at a more comprehensive analysis of data scraper approaches and tools for Web data.
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
