ASID: Application Specific Time Efficient Inline Deduplication on Cloud Storage
Keywords:
cloud, application, deduplication, integrity, performance, inline
Abstract
Because third-party cloud storage services relieve their customers of most maintenance work, enterprises and organizations are increasingly attracted to them. As a result, a huge amount of data is outsourced to cloud storage servers, and uncontrolled data proliferation has become a major issue. This growing volume of backup data calls for better data management techniques to reduce the storage space consumed on cloud servers. Data deduplication is one of the most popular data management approaches; it prevents duplicate data from being stored. This paper presents an application-specific inline data deduplication system on the cloud server side, along with efficient and optimized file upload and download operations. The system implements and compares utility-based and object-map-based techniques for searching for duplicate content at the file level and the chunk level. The map object plays an important role in finding duplicates quickly because it avoids read operations on the existing files. For file downloads, the system also provides server-side data integrity checking so that cloud users can verify the originality of a file. The performance of the system is evaluated on random files in the form of flat files, structured files, and unstructured files. The experimental results demonstrate the performance of the deduplication system in terms of time and memory usage.
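The map-based duplicate search and the download-time integrity check described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the class name `DedupStore`, the fixed-size chunking, the in-memory dictionaries, and the use of SHA-1 as the fingerprint function are all assumptions made for illustration, since the abstract does not specify them.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return a hex digest used as the content fingerprint (SHA-1 is an assumption)."""
    return hashlib.sha1(data).hexdigest()


class DedupStore:
    """Hypothetical in-memory store with file-level and chunk-level deduplication."""

    def __init__(self, chunk_size: int = 4096):
        self.chunk_size = chunk_size  # fixed-size chunking is an assumption
        self.chunks = {}      # chunk fingerprint -> chunk bytes (each stored once)
        self.file_index = {}  # file fingerprint -> list of chunk fingerprints
        self.files = {}       # file name -> file fingerprint

    def upload(self, name: str, data: bytes) -> int:
        """Store a file inline and return the number of newly stored chunks."""
        file_fp = fingerprint(data)
        self.files[name] = file_fp
        # File-level check: a map lookup, no stored file is read back.
        if file_fp in self.file_index:
            return 0
        recipe = []
        new_chunks = 0
        # Chunk-level check: only chunks with unseen fingerprints are stored.
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            chunk_fp = fingerprint(chunk)
            if chunk_fp not in self.chunks:
                self.chunks[chunk_fp] = chunk
                new_chunks += 1
            recipe.append(chunk_fp)
        self.file_index[file_fp] = recipe
        return new_chunks

    def download(self, name: str) -> bytes:
        """Rebuild a file from its chunks and verify its integrity."""
        file_fp = self.files[name]
        data = b"".join(self.chunks[fp] for fp in self.file_index[file_fp])
        # Integrity check: the rebuilt content must match the recorded fingerprint.
        if fingerprint(data) != file_fp:
            raise ValueError("integrity check failed for " + name)
        return data


# Small usage example with a tiny chunk size so the effect is visible.
store = DedupStore(chunk_size=4)
first = store.upload("a.txt", b"AAAABBBBCCCC")   # all three chunks are new
second = store.upload("b.txt", b"AAAABBBBCCCC")  # identical file: file-level hit
third = store.upload("c.txt", b"AAAADDDDCCCC")   # only the middle chunk is new
```

In this sketch, every duplicate decision is a dictionary lookup on a fingerprint, which is the sense in which a map avoids reading existing files. A utility-based search, by contrast, would have to compare the incoming content against stored files directly.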
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors contributing to this journal agree to publish their articles under the Creative Commons Attribution 4.0 International License, allowing third parties to share their work (copy, distribute, transmit) and to adapt it, under the condition that the authors are given credit and that in the event of reuse or distribution, the terms of this license are made clear.
