Giving Future Vision to IR: A Query Clustering Approach

Authors

  • Gaurav Dubey, Amity University, Noida, India
  • Romina Nayak, Kellton Tech Solutions Ltd., Gurgaon, India
  • Neha Wadhwa, Amity University, Noida, India
  • Ajay Rana, Amity University, Noida, India

Keywords:

Data Warehouse, Information Retrieval, Query Clustering, Apriori, Subject Area Identification

Abstract

Information Retrieval (IR) has become increasingly laborious given the volume of data handled today. Search engines face an ever-increasing demand to give precise responses to user queries in minimal time. In this paper, we present a query clustering approach that identifies Frequently Asked Questions (FAQs) for answering future queries. The proposed approach is based on identifying distinct subjects from queries asked and logged in the past. The queries falling under each subject category are then reduced to a group that represents the frequently asked queries. Because these queries have been asked frequently in the past, they are likely to be repeated in the future. This gives the interface (e.g., a search engine) the ability to predict future queries and respond in a time-efficient manner. We apply this approach to a Real Estate data warehouse, which demonstrates its viability and efficiency in the Real Estate domain as well.
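As a rough illustration of the idea described in the abstract, the following sketch groups logged queries by subject and then runs a small Apriori-style frequent-itemset search within each group. It is not the authors' implementation: the log format, the subject labels, the names QUERY_LOG, frequent_itemsets and frequent_queries_by_subject, and the support threshold are all assumptions made for this example, and subject identification is taken as already done.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical query log: (subject, set of attributes the query asked about).
# Subject labels are assumed to be known; the abstract does not say how
# they are derived.
QUERY_LOG = [
    ("sales", {"price", "city"}),
    ("sales", {"price", "city", "year"}),
    ("sales", {"price", "city"}),
    ("sales", {"agent"}),
    ("rentals", {"rent", "city"}),
    ("rentals", {"rent", "city"}),
    ("rentals", {"rent", "bedrooms"}),
]


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search.

    Returns {itemset: count} for every itemset that occurs in at least
    min_support transactions.
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count individual items.
    counts = defaultdict(int)
    for t in transactions:
        for item in t:
            counts[frozenset([item])] += 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(current)

    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into size-k candidates.
        candidates = {a | b for a, b in combinations(list(current), 2)
                      if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k - 1))}
        # Count step: scan the transactions for each surviving candidate.
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        result.update(current)
        k += 1
    return result


def frequent_queries_by_subject(log, min_support):
    """Group logged queries by subject, then mine each group separately."""
    by_subject = defaultdict(list)
    for subject, attributes in log:
        by_subject[subject].append(attributes)
    return {subject: frequent_itemsets(queries, min_support)
            for subject, queries in by_subject.items()}


FAQS = frequent_queries_by_subject(QUERY_LOG, min_support=2)
```

In this sketch, FAQS["sales"] holds the attribute combinations that recur among sales queries; an interface could treat those as candidates to precompute or cache. How frequent groups are actually turned into answers is left open here.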

Published

2014-09-30

How to Cite

[1]
G. Dubey, R. Nayak, N. Wadhwa, and A. Rana, “Giving Future Vision to IR: A Query Clustering Approach”, Int. J. Comp. Sci. Eng., vol. 2, no. 9, pp. 12–17, Sep. 2014.

Section

Research Article