Decision Models for Record Linkage Using OCCT-One Class Clustering Tree

Authors

  • D. Angelin Ponrani, Dept. of CSE, Kongunadu College of Engineering & Technology, Trichy, Tamilnadu, India

Keywords:

Linkage, Clustering, Splitting, Decision Tree

Abstract

Record linkage is traditionally performed among entities of the same type, whether or not those entities share a common identifier. In this paper we propose a linkage method that can also link matching entities of different types. The proposed technique is based on a one-class clustering tree (OCCT) that characterizes the entities to be linked. The tree is built so that it is easy to understand and can be transformed into association rules. The inner nodes of the tree consist of features of the first set of entities, and the leaves represent features of the matching entities in the second set. The data is split using two splitting criteria, and two pruning methods are used when creating the one-class clustering tree. The proposed system achieves improved precision and recall.
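The tree structure described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the Jaccard-overlap splitting score is an assumed stand-in for the two unnamed splitting criteria, pruning is omitted in favour of a simple `min_size` stopping rule, and the attribute names (`city`, `plan`) and all function names are hypothetical. Inner nodes test attributes of the first entity set (A), and each leaf stores value counts of the matching records from the second set (B).

```python
from collections import Counter, defaultdict


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def split_score(pairs, attr):
    """Mean pairwise Jaccard overlap of the B-side records across the
    partitions induced by A-side attribute `attr` (lower is better).
    Assumed criterion, chosen for illustration only."""
    groups = defaultdict(set)
    for a, b in pairs:
        groups[a[attr]].add(tuple(sorted(b.items())))
    sets = list(groups.values())
    if len(sets) < 2:
        return 1.0  # the attribute does not split the data at all
    sims = [jaccard(sets[i], sets[j])
            for i in range(len(sets)) for j in range(i + 1, len(sets))]
    return sum(sims) / len(sims)


def build_tree(pairs, a_attrs, min_size=2):
    """Recursively split known matching (a, b) pairs on A-side attributes."""
    best = min(a_attrs, key=lambda at: split_score(pairs, at), default=None)
    if best is None or len(pairs) < min_size or split_score(pairs, best) >= 1.0:
        # Leaf: per-attribute value counts of the matching B-side records.
        counts = defaultdict(Counter)
        for _, b in pairs:
            for key, value in b.items():
                counts[key][value] += 1
        return {"leaf": counts, "n": len(pairs)}
    children = defaultdict(list)
    for a, b in pairs:
        children[a[best]].append((a, b))
    rest = [at for at in a_attrs if at != best]
    return {"attr": best,
            "children": {v: build_tree(p, rest, min_size)
                         for v, p in children.items()}}


def match_score(tree, a, b):
    """Route `a` down the tree, then score `b` against the leaf counts."""
    node = tree
    while "leaf" not in node:
        node = node["children"].get(a[node["attr"]])
        if node is None:
            return 0.0  # unseen A-side value: no evidence of a match
    if node["n"] == 0:
        return 0.0
    score = 1.0
    for key, value in b.items():
        score *= node["leaf"][key][value] / node["n"]
    return score


# Hypothetical training data: known matching pairs (record from A, record from B).
pairs = [
    ({"city": "Trichy"}, {"plan": "prepaid"}),
    ({"city": "Trichy"}, {"plan": "prepaid"}),
    ({"city": "Chennai"}, {"plan": "postpaid"}),
    ({"city": "Chennai"}, {"plan": "postpaid"}),
]
tree = build_tree(pairs, ["city"])
likely = match_score(tree, {"city": "Trichy"}, {"plan": "prepaid"})
unlikely = match_score(tree, {"city": "Trichy"}, {"plan": "postpaid"})
```

Because only matching pairs are used for training, the sketch is one-class in the sense the abstract describes: a candidate pair is scored by how typical its B-side values are for the leaf reached by its A-side record. Each root-to-leaf path can also be read as a rule of the form "if A.city = Trichy then B.plan is usually prepaid", which is the kind of association-rule reading the abstract mentions.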

Published

2014-12-06

How to Cite

[1]
D. Angelin Ponrani, “Decision Models for Record Linkage Using OCCT-One Class Clustering Tree”, Int. J. Comp. Sci. Eng., vol. 2, no. 11, pp. 27–31, Dec. 2014.

Section

Research Article