Pragmatic Aspects of Token-based Technique in Detecting Source Code Duplicates
Keywords:
Code Clone Detection, Clone Detection Techniques, Token-based Clone Detection Technique
Abstract
The clone research community has described several techniques for detecting code duplicates in a code base. These fall into four main classes: textual (text-based) techniques, lexical (token-based) techniques, syntactic techniques (including tree-based and metrics-based approaches), and semantic techniques. The literature lists various clone detection tools in each category, capable of detecting clones in batch mode as well as in a real-time development environment. Most of these tools, however, use tokens as the intermediate representation of the source code to which their clone detection algorithms are applied. This paper therefore focuses on the token-based intermediate representation and its pragmatic aspects for detecting code duplication. By discussing the practical process of converting source code into tokens and how code duplicates are then detected, the authors shed light on the less obvious pros and cons of the token-based approach, helping researchers decide whether to select and implement it, or reject it, as the intermediate representation for their duplication detection algorithms.
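To make the token-based idea concrete, the following is a minimal Python sketch of the two steps the abstract mentions: converting source code into a normalized token sequence, then looking for repeated token windows. It is an illustration under stated assumptions, not the method of this paper or of any particular tool. The keyword set, the regular expression, the function names `tokenize` and `find_duplicates`, and the fixed `window` size are all hypothetical choices made for this example.

```python
import re
from collections import defaultdict

# Hypothetical keyword set for a small C-like language (assumption for this sketch).
KEYWORDS = {"int", "return", "if", "else", "for", "while"}

# Identifiers, integer literals, a few two-character operators, then any other
# single non-whitespace character. Whitespace is skipped implicitly.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|==|!=|<=|>=|\S")

def tokenize(source):
    """Turn source text into a list of normalized tokens."""
    tokens = []
    for lexeme in TOKEN_RE.findall(source):
        if lexeme in KEYWORDS:
            tokens.append(lexeme)      # keywords are kept as-is
        elif re.fullmatch(r"[A-Za-z_]\w*", lexeme):
            tokens.append("ID")        # every identifier becomes the same token
        elif re.fullmatch(r"\d+", lexeme):
            tokens.append("NUM")       # every integer literal becomes the same token
        else:
            tokens.append(lexeme)      # operators and punctuation are kept as-is
    return tokens

def find_duplicates(tokens, window=6):
    """Map each token window that occurs more than once to its start indices."""
    seen = defaultdict(list)
    for i in range(len(tokens) - window + 1):
        seen[tuple(tokens[i:i + window])].append(i)
    return {w: starts for w, starts in seen.items() if len(starts) > 1}

if __name__ == "__main__":
    first = "int total = a + b;"
    second = "int sum = x + y;"
    stream = tokenize(first + "\n" + second)
    for win, starts in find_duplicates(stream, window=7).items():
        print(" ".join(win), "at token offsets", starts)
```

Because identifiers and literals are replaced by placeholder tokens, the two statements in the usage example yield the same token sequence even though their variable names differ. This renaming tolerance is a commonly cited strength of token-based detection; the same normalization can also pair fragments that are not meaningfully related, which is one of the trade-offs of the approach.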
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors contributing to this journal agree to publish their articles under the Creative Commons Attribution 4.0 International License, allowing third parties to share their work (copy, distribute, transmit) and to adapt it, under the condition that the authors are given credit and that in the event of reuse or distribution, the terms of this license are made clear.
