Detection of Longest Common Sub Sequence in Normal DNA and Dengue Virus Affected Human DNA using Self Organizing Map

Authors

  • G Tamilpavai Dept. of Computer Science and Engineering, Government College of Engineering, Tirunelveli, Tamil Nadu, India
  • C Vishnuppriya Dept. of Computer Science and Engineering, Government College of Engineering, Tirunelveli, Tamil Nadu, India

DOI:

https://doi.org/10.26438/ijcse/v8i1.17

Keywords:

Bioinformatics, K-mers, Longest Common Sub Sequence (LCSS), String pattern matching algorithms

Abstract

Bioinformatics is an active research area that combines biology and computer science. Analysis of disease-causing human deoxyribonucleic acid (DNA) sequences is one of its major application areas. Among severe diseases, the number of Dengue cases and deaths has risen in Tamil Nadu. Identifying the sequence motifs involved in Dengue virus infection is essential for early prediction and for saving human lives, and disease diagnosis involves a wide range of steps. The scope of the proposed work is to find the longest common subsequence present in normal and Dengue virus affected human DNA sequences. The human DNA sequences are collected from the National Center for Biotechnology Information (NCBI) database. Each human DNA sequence is split into k-mers using a k-mer separation rule. The separated k-mers are then clustered using the Self Organizing Map (SOM) algorithm, with mean, median and standard deviation used as the clustering features. The resulting k-mer clusters are given to the Longest Common Subsequence (LCSS) algorithm, which finds the longest subsequence common to every k-mer cluster. The time taken to identify the LCSS is compared for normal and Dengue virus affected DNA.
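
The k-mer separation, feature computation and LCSS steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several details are assumptions, since the abstract does not specify them: k-mers are taken as overlapping windows with step 1, nucleotides are encoded as A=1, C=2, G=3, T=4 before computing statistics, and the population standard deviation is used. The function names `kmers`, `features` and `lcs` are hypothetical. The SOM clustering step is omitted here; the feature tuples are the kind of input such a clustering step might take.

```python
from statistics import mean, median, pstdev

# Assumed numeric encoding of nucleotides; the abstract does not give one.
ENCODING = {"A": 1, "C": 2, "G": 3, "T": 4}


def kmers(sequence, k):
    """Return all overlapping substrings of length k (assumed step of 1)."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def features(kmer):
    """Return (mean, median, population std. dev.) of the encoded k-mer."""
    values = [ENCODING[base] for base in kmer]
    return (mean(values), median(values), pstdev(values))


def lcs(a, b):
    """Return one longest common subsequence of strings a and b."""
    # table[i][j] holds the LCS length of a[:i] and b[:j].
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, start=1):
        for j, y in enumerate(b, start=1):
            if x == y:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Walk back from the bottom-right corner to recover the subsequence.
    result = []
    i, j = len(a), len(b)
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            result.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(result))
```

The dynamic-programming table makes the pairwise LCSS computation take time proportional to the product of the two sequence lengths, which is one reason grouping short k-mers before comparison may help on long sequences.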

Published

2020-01-31

How to Cite

[1]
G. Tamilpavai and C. Vishnuppriya, “Detection of Longest Common Sub Sequence in Normal DNA and Dengue Virus Affected Human DNA using Self Organizing Map”, Int. J. Comp. Sci. Eng., vol. 8, no. 1, pp. 1–8, Jan. 2020.

Section

Research Article