TACTICS FOR DYNAMIC DATA CLEANSING AND DATA PROFILING USING DIMENSIONS FOR DATA QUALITY ASSESSMENT
DOI:
https://doi.org/10.26438/ijcse/v6i4.271276
Keywords:
Data Analysis, Data Profiling, Data Cleansing, Data Standardization, Data Score Cards
Abstract
We classify the data quality problems that data cleansing addresses and provide an overview of the principal solution approaches. Data cleansing is particularly needed when integrating heterogeneous data sources and should be addressed together with schema-related data transformations. We also discuss current tool support for data cleansing. Data profiling is a specific form of data analysis, applied here to customer data, that detects and characterizes important features of data sets. It delineates data structure, content, rules, and relationships by using statistical methods to report standard characteristics of the data: data types, field lengths, cardinality of columns, granularity, value sets, format patterns, content patterns, implied rules, and cross-column and cross-file data relationships, along with the cardinality of those relationships. Data deduplication has been advocated as a promising and effective technique for saving storage space by removing duplicated data from data centres or clouds. It is the process of identifying redundancy in data and then removing it. The resulting unique data is consolidated into a single format using data cleansing and data standardization. Scorecards are used to measure data quality progress and are shared with stakeholders through a URL link.
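As an illustration of the profiling, standardization, deduplication, and scorecard steps summarized above, the following is a minimal Python sketch. It is not the authors' implementation: the function names, the digit/letter pattern encoding, the exact-match duplicate rule after standardization, the completeness metric, and the sample records are all assumptions made for this example.

```python
from collections import Counter


def format_pattern(value):
    """Encode a value's shape: digits become 9, letters become A (assumed convention)."""
    return "".join(
        "9" if ch.isdigit() else "A" if ch.isalpha() else ch for ch in value
    )


def profile_column(values):
    """Report basic profiling characteristics for one column of string values."""
    non_null = [v for v in values if v not in (None, "")]
    lengths = [len(v) for v in non_null]
    return {
        "count": len(values),
        "nulls": len(values) - len(non_null),
        "cardinality": len(set(non_null)),
        "min_length": min(lengths) if lengths else 0,
        "max_length": max(lengths) if lengths else 0,
        "patterns": Counter(format_pattern(v) for v in non_null),
    }


def standardize(value):
    """Collapse whitespace and lowercase so equivalent spellings compare equal."""
    return " ".join(value.split()).lower()


def deduplicate(records, key_fields):
    """Keep the first record for each standardized key (exact-match rule, an assumption)."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(standardize(record[field]) for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def completeness_score(profile):
    """Scorecard metric: fraction of non-null values in the profiled column."""
    if profile["count"] == 0:
        return 0.0
    return (profile["count"] - profile["nulls"]) / profile["count"]


# Hypothetical customer records used only for illustration.
records = [
    {"name": "Ann Lee", "phone": "555-0101"},
    {"name": "ann  lee ", "phone": "555-0101"},
    {"name": "Bob Ray", "phone": ""},
    {"name": "Cy Dunn", "phone": "5550103"},
]

phone_profile = profile_column([r["phone"] for r in records])
unique_records = deduplicate(records, ("name", "phone"))
phone_completeness = completeness_score(phone_profile)
```

In this sketch the two spellings of "Ann Lee" collapse to one record only because standardization runs before comparison; a real pipeline would likely need fuzzy matching for near-duplicates, which is outside what this example shows.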
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors contributing to this journal agree to publish their articles under the Creative Commons Attribution 4.0 International License, allowing third parties to share their work (copy, distribute, transmit) and to adapt it, under the condition that the authors are given credit and that in the event of reuse or distribution, the terms of this license are made clear.
