Clustering Incomplete Mixed Datasets by using Extended Squeezer Algorithm and Finding Incomplete Set Mixed Dissimilarity (ISMD)
DOI:
https://doi.org/10.26438/ijcse/v6i9.432437
Keywords:
Incomplete set mixed dissimilarity, k-prototype, extended squeezer algorithm, Python programming
Abstract
Clustering mixed datasets is a challenging task. Traditional algorithms such as k-prototype can cluster mixed datasets, but only complete ones, and missing values are common in real datasets. To handle incomplete mixed datasets, we use an extended squeezer algorithm that incorporates a new dissimilarity measure, incomplete set mixed dissimilarity (ISMD), for numerical and categorical attribute values. The measure takes missing values into account when computing dissimilarities, so the extended squeezer algorithm clusters the incomplete dataset without filling in the missing values and without initializing any clusters at the beginning. The method is compared with the traditional k-prototype algorithm on benchmark datasets. The experimental results show that ISMD with the extended squeezer algorithm gives better accuracy than the traditional k-prototype algorithm and overcomes the limitation of having to choose initial clusters. The method is implemented in Python. The results show a significant improvement in the clustering results.
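The abstract does not give the ISMD formula or the details of the extended squeezer algorithm, so the following Python sketch only illustrates the general idea under stated assumptions. It assumes that a missing value (represented as None) contributes the maximal per-attribute dissimilarity of 1, that numerical attributes use a range-normalized absolute difference, and that categorical attributes use simple mismatch. The one-pass loop is a simplified, squeezer-style procedure: each record joins the closest existing cluster if the average dissimilarity is within a threshold, and otherwise starts a new cluster. The function names, the threshold value, and the toy data are all hypothetical and are not taken from the paper.

```python
import math

MISSING = None  # assumption: missing attribute values are stored as None


def mixed_dissimilarity(a, b, numeric_idx, ranges):
    """Illustrative mixed dissimilarity in [0, 1]; NOT the paper's ISMD formula."""
    total = 0.0
    for j, (x, y) in enumerate(zip(a, b)):
        if x is MISSING or y is MISSING:
            total += 1.0  # assumption: a missing value counts as maximally dissimilar
        elif j in numeric_idx:
            # range-normalized absolute difference for numerical attributes
            total += abs(x - y) / ranges[j] if ranges[j] else 0.0
        else:
            # simple mismatch for categorical attributes
            total += 0.0 if x == y else 1.0
    return total / len(a)


def squeezer_like(records, numeric_idx, threshold):
    """One-pass clustering with no initial clusters and no imputation."""
    # Attribute ranges are computed from the observed (non-missing) values only.
    ranges = {}
    for j in numeric_idx:
        vals = [r[j] for r in records if r[j] is not MISSING]
        ranges[j] = (max(vals) - min(vals)) if vals else 0.0

    clusters, labels = [], []
    for rec in records:
        best, best_d = None, math.inf
        for c, members in enumerate(clusters):
            # average dissimilarity between the record and the cluster members
            d = sum(mixed_dissimilarity(rec, m, numeric_idx, ranges)
                    for m in members) / len(members)
            if d < best_d:
                best, best_d = c, d
        if best is not None and best_d <= threshold:
            clusters[best].append(rec)
            labels.append(best)
        else:
            clusters.append([rec])  # start a new cluster
            labels.append(len(clusters) - 1)
    return labels


# Toy data: attribute 0 is numerical, attribute 1 is categorical.
records = [(1.0, "a"), (1.2, "a"), (None, "a"), (9.0, "b"), (8.5, None)]
labels = squeezer_like(records, numeric_idx={0}, threshold=0.6)
print(labels)
```

In this sketch the number of clusters is determined by the threshold and the order of the records, which is why no initial cluster centers are needed. The original Squeezer algorithm compares records against cluster summaries; comparing against all members, as done here, is a simplification for readability.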
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors contributing to this journal agree to publish their articles under the Creative Commons Attribution 4.0 International License, allowing third parties to share their work (copy, distribute, transmit) and to adapt it, under the condition that the authors are given credit and that in the event of reuse or distribution, the terms of this license are made clear.
