Identification of Duplicate Chunks Using Content Approach

Authors

  • Kaur G Department of Information Technology, Chandigarh Engineering College, Mohali, India
  • Singh Devgan M Department of Information Technology, Chandigarh Engineering College, Mohali, India

DOI:

https://doi.org/10.26438/ijcse/v5i10.110117

Keywords:

Data Deduplication, Duplicate Chunks, Hashing, Execution Time, Polynomial Chunking

Abstract

This article discusses the implementation of functions that identify duplicate chunks using block-based, file-based and content-based approaches. Chunking and hashing functions form the core of deduplication algorithms; the unit at which data is chunked is also referred to as deduplication granularity. An analysis of the three methods shows that the content approach is slightly slower but more accurate than the file and block strategies: it identifies duplicate chunks about 0.2-0.3% more slowly, but its accuracy is 1-2% higher than that of the block and file methods. This work is useful for building duplicate-content-aware applications, especially for checking multiple patterns, matching paraphrased content and detecting plagiarism. The proposed methods can be used for both inline and post-processing deduplication, and they can be extended to include background and foreground processing.
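The abstract names chunking and hashing as the core of deduplication and compares file, block and content granularity. The following is a minimal sketch of those three granularities, assuming Python 3.9+. It is not the authors' implementation. The content-defined chunker uses a Rabin-Karp-style rolling polynomial hash as a stand-in for the polynomial chunking named in the keywords. The window size, base, modulus, boundary mask and minimum/maximum chunk sizes are hypothetical parameters that do not come from the paper. SHA-256 fingerprints are used to recognise duplicate chunks.

```python
import hashlib
import random

# Hypothetical parameters for illustration only (not taken from the paper).
WINDOW = 16           # bytes covered by the rolling-hash window
BASE = 257            # polynomial base
MOD = (1 << 31) - 1   # modulus for the polynomial hash
MASK = (1 << 6) - 1   # cut when the low 6 bits of the hash are zero
MIN_CHUNK, MAX_CHUNK = 32, 256  # MIN_CHUNK >= WINDOW, so cuts depend on content only


def file_chunks(data: bytes) -> list[bytes]:
    """File-level granularity: the whole file is a single chunk."""
    return [data]


def fixed_chunks(data: bytes, size: int = 64) -> list[bytes]:
    """Block-level granularity: fixed-size blocks at fixed offsets."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def content_chunks(data: bytes) -> list[bytes]:
    """Content-defined chunking driven by a rolling polynomial hash."""
    chunks, start, h = [], 0, 0
    top = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window
    for i, b in enumerate(data):
        if i - start >= WINDOW:
            # Slide the window: remove the oldest byte's contribution.
            h = (h - data[i - WINDOW] * top) % MOD
        h = (h * BASE + b) % MOD
        length = i - start + 1
        # Cut at a content-determined point, or force a cut at MAX_CHUNK.
        if (length >= MIN_CHUNK and (h & MASK) == 0) or length >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0  # reset the hash at each boundary
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def duplicate_ratio(old: bytes, new: bytes, chunker) -> float:
    """Fraction of new's chunks whose SHA-256 fingerprint already occurs in old."""
    index = {hashlib.sha256(c).hexdigest() for c in chunker(old)}
    new_chunks = chunker(new)
    dups = sum(1 for c in new_chunks if hashlib.sha256(c).hexdigest() in index)
    return dups / len(new_chunks) if new_chunks else 0.0


if __name__ == "__main__":
    old = random.Random(0).randbytes(1 << 16)  # 64 KiB of pseudo-random data
    new = b"X" + old                           # one byte inserted at the front
    for name, fn in [("file", file_chunks), ("block", fixed_chunks),
                     ("content", content_chunks)]:
        print(f"{name:8s} duplicate ratio: {duplicate_ratio(old, new, fn):.3f}")
```

Inserting a single byte at the front shifts every fixed-size block, so block-level matching finds no duplicates and file-level matching treats the whole file as new. Content-defined boundaries, by contrast, resynchronise shortly after the edit, so most chunks are still recognised. The per-byte hash update is one plausible source of the extra cost of the content approach that the abstract reports.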



Published

2025-11-12

How to Cite

G. Kaur and M. Singh Devgan, “Identification of Duplicate Chunks Using Content Approach”, Int. J. Comp. Sci. Eng., vol. 5, no. 10, pp. 110–117, Nov. 2025.


Section

Research Article