A Deduplication-Aware Similarity Finding and Removal System for Cloud Provider and Its Users
DOI:
https://doi.org/10.26438/ijcse/v6i9.732736
Keywords:
Data deduplication, delta compression, storage system, index structure, performance evaluation
Abstract
Data reduction has become increasingly important in storage systems owing to the explosive growth of digital data worldwide, which has ushered in the big data era. In existing systems, cloud providers offer limited processing capacity and thus dissatisfy their users with poor service quality. It is therefore vital for a cloud provider to select appropriate servers to provide services, so that it reduces cost as much as possible while still satisfying its users. The main drawback here is data duplication, and to overcome these problems we adopt the proposed model. In this paper, we present DARE, a low-overhead Deduplication-Aware Resemblance detection and Elimination scheme that effectively exploits existing duplicate-adjacency information for highly efficient resemblance detection in data-deduplication-based backup/archiving storage systems. Our experimental results on backup data sets show that DARE consumes only about 1/4 and 1/2, respectively, of the computation and indexing overheads required by conventional super-feature approaches while detecting 2-10% more redundancy and achieving higher throughput, by exploiting existing duplicate-adjacency information for resemblance detection and finding the “sweet spot” for the super-feature approach.
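The duplicate-adjacency idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that both backup streams are already split into chunks, that SHA-1 fingerprints identify exact duplicates, and that a non-duplicate chunk sitting next to a duplicate chunk is a candidate for delta compression against the old chunk in the same relative position. The names fingerprint and dupadj_candidates are hypothetical.

```python
import hashlib


def fingerprint(chunk: bytes) -> str:
    # Assumption: SHA-1 is used as the deduplication fingerprint.
    return hashlib.sha1(chunk).hexdigest()


def dupadj_candidates(old_chunks, new_chunks):
    """Return (new_index, old_index) pairs of delta-compression candidates.

    A new chunk that is not an exact duplicate is paired with the old chunk
    that occupies the same position relative to an adjacent duplicate chunk.
    """
    # Map each fingerprint in the old stream to its first position.
    old_index = {}
    for i, chunk in enumerate(old_chunks):
        old_index.setdefault(fingerprint(chunk), i)

    # matched[j] is the old position of new chunk j's duplicate, or None.
    matched = [old_index.get(fingerprint(chunk)) for chunk in new_chunks]

    pairs = []
    for j, m in enumerate(matched):
        if m is not None:
            continue  # exact duplicate: deduplication already handles it
        # Check the previous neighbor first, then the next neighbor.
        for step in (-1, 1):
            n = j + step
            if 0 <= n < len(new_chunks) and matched[n] is not None:
                base = matched[n] - step  # same relative position in old stream
                if 0 <= base < len(old_chunks):
                    pairs.append((j, base))
                    break
    return pairs


if __name__ == "__main__":
    old = [b"aaaa", b"bbbb", b"cccc", b"dddd"]
    new = [b"aaaa", b"bbXb", b"cccc", b"eeee"]
    print(dupadj_candidates(old, new))
```

The pairs returned are only candidates: whether delta compression actually pays off for a pair still has to be checked, and chunks with no duplicate neighbor would fall back to a feature-based approach such as super-features.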
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors contributing to this journal agree to publish their articles under the Creative Commons Attribution 4.0 International License, allowing third parties to share their work (copy, distribute, transmit) and to adapt it, under the condition that the authors are given credit and that in the event of reuse or distribution, the terms of this license are made clear.
