A Review of Clustering Methods Forming Non-Convex Clusters with Missing and Noisy Data
Keywords:
Clustering, convex, non-convex, missing values, Big Data, noisy data, data mining, density based
Abstract
Clustering is among the foremost problems in machine learning. Big Data sets, being versatile, multi-sourced, and multivariate, may contain noise and missing values and may form clusters of arbitrary shape. Because of this unpredictable nature, a clustering method for Big Data should handle missing values and noise and should be able to form arbitrarily shaped clusters. Partition-based clustering methods do not form non-convex clusters. Hierarchical clustering methods can form arbitrarily shaped clusters, but they are not suitable for large data sets because of their time and computational complexity. Density-based and grid-based methods do not address missing values. Combining different clustering methods could offset their individual limitations with respect to a data set's geometrical and spatial properties, such as missing data, non-convex shapes, and noise.
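The contrast between partition-based and density-based methods can be illustrated with a small sketch. It is not taken from the reviewed paper: the data (two concentric rings plus one outlier), the parameter values `eps=0.5` and `min_pts=4`, and the helper names `ring`, `centroid`, `region_query`, and `dbscan` are all assumptions chosen for illustration. The sketch is a minimal pure-Python version of the density-based idea behind DBSCAN. The two rings share the same centroid, so a nearest-centroid partition has no way to tell them apart, while density-based expansion follows each ring and flags the isolated point as noise.

```python
import math

def ring(radius, n):
    """n evenly spaced points on a circle around the origin (synthetic data)."""
    return [(radius * math.cos(2 * math.pi * i / n),
             radius * math.sin(2 * math.pi * i / n)) for i in range(n)]

def centroid(points):
    """Arithmetic mean of a list of 2-D points."""
    return (sum(p[0] for p in points) / len(points),
            sum(p[1] for p in points) / len(points))

def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN-style labelling: 0, 1, ... for clusters, -1 for noise."""
    NOISE, UNSEEN = -1, None
    labels = [UNSEEN] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNSEEN:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE          # may later be claimed as a border point
            continue
        cluster += 1                   # i is a core point: start a new cluster
        labels[i] = cluster
        queue = list(neighbours)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point reached from a core point
            if labels[j] is not UNSEEN:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)   # j is also core: keep expanding
    return labels

# Assumed toy data: two concentric rings and one far-away outlier.
inner = ring(1.0, 40)
outer = ring(3.0, 120)
points = inner + outer + [(10.0, 10.0)]

labels = dbscan(points, eps=0.5, min_pts=4)

# Both rings have (numerically) the same centroid, which is why a
# nearest-centroid assignment cannot separate them.
print(centroid(inner), centroid(outer))
print(sorted(set(labels)))
```

Each ring ends up in its own cluster even though neither is convex, and the outlier keeps the noise label. The sketch assumes complete data: a record with a missing coordinate has no well-defined Euclidean distance, which is the gap in density-based methods that the abstract points out.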
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors contributing to this journal agree to publish their articles under the Creative Commons Attribution 4.0 International License, allowing third parties to share their work (copy, distribute, transmit) and to adapt it, under the condition that the authors are given credit and that in the event of reuse or distribution, the terms of this license are made clear.
