Characteristic mining of Mathematical Formulas from Document - A Comparative Study on Sequence Matcher and Levenshtein Distance procedure

Authors

  • Appa Rao G, Department of CSE, GIT, GITAM, VISAKHAPATNAM, INDIA
  • Srinivas G, Department of IT, ANITS, VISAKHAPATNAM, INDIA
  • Venkata Rao K, Department of CSSE, Andhra University, VISAKHAPATNAM, INDIA
  • Prasad Reddy PVGD, Department of CSSE, Andhra University, VISAKHAPATNAM, INDIA

DOI:

https://doi.org/10.26438/ijcse/v6i4.400404

Keywords:

Levenshtein distance, Sequence matcher

Abstract

The key problem addressed in this work is how to identify mathematically related keywords in a given text file and store them in a separate math text file, which contains only keywords related to mathematics. The math dataset is built from a large collection of tested documents and stored in this math text file. The dataset is trained on a large number of text files and grows as it is trained with various text samples, so that in the end it contains only math-related keywords. The proposed approaches are evaluated on text containing individual formulas and repeated formulas. Two approaches are proposed, Sequence Matcher and Levenshtein distance, both of which measure string similarity. Retrieval performance is evaluated on a dataset of repeated formulas and formulas that appear only once, and the time taken for retrieval is also measured.
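The two similarity measures compared in the abstract can be illustrated with a minimal sketch. It assumes Python, whose standard library provides `difflib.SequenceMatcher` (its `ratio()` returns 2M/T, where M is the number of matched characters and T the combined length of both strings). It also assumes that a token is accepted as a math keyword when its best similarity to any stored keyword reaches a threshold. The threshold of 0.8, the normalisation of the edit distance to a 0 to 1 ratio, the helper names `levenshtein`, `levenshtein_ratio`, `sequence_matcher_ratio` and `find_matches`, and the toy data are all hypothetical choices made for illustration. None of them is taken from the paper.

```python
"""Sketch: comparing difflib.SequenceMatcher with a Levenshtein ratio
for matching text tokens against a stored list of math keywords."""
import time
from difflib import SequenceMatcher


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum single-character insertions, deletions or
    substitutions turning a into b (two-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the shorter string inner
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute or keep
        previous = current
    return previous[-1]


def levenshtein_ratio(a: str, b: str) -> float:
    """Assumed normalisation: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def sequence_matcher_ratio(a: str, b: str) -> float:
    """difflib similarity 2*M/T, M = matched characters, T = len(a)+len(b)."""
    return SequenceMatcher(None, a, b).ratio()


def find_matches(tokens, math_keywords, similarity, threshold=0.8):
    """Hypothetical retrieval step: keep each token whose best similarity to
    any stored math keyword reaches the threshold, and time the search."""
    start = time.perf_counter()
    matches = []
    for token in tokens:
        best = max((similarity(token, kw) for kw in math_keywords), default=0.0)
        if best >= threshold:
            matches.append(token)
    return matches, time.perf_counter() - start


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    math_keywords = ["sin(x)", "cos(x)", "x^2+y^2", "integral"]
    tokens = ["sin(x)", "cosx", "apple", "x^2+y^3"]
    for name, fn in [("SequenceMatcher", sequence_matcher_ratio),
                     ("Levenshtein", levenshtein_ratio)]:
        found, seconds = find_matches(tokens, math_keywords, fn)
        print(f"{name}: {found} in {seconds:.6f} s")
```

On this toy data the two measures disagree on "cosx". Against "cos(x)", difflib matches 4 of 10 characters' worth of length and scores 2·4/10 = 0.8. The assumed Levenshtein ratio needs 2 insertions over a longer string of length 6 and scores 1 − 2/6 ≈ 0.67. As a result, only the SequenceMatcher variant accepts it at the 0.8 threshold. The elapsed time returned by `find_matches` mirrors the abstract's point that retrieval time is measured alongside accuracy.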

Published

2025-11-12

How to Cite

G. Appa Rao, G. Srinivas, K. Venkata Rao, and P. Prasad Reddy, “Characteristic mining of Mathematical Formulas from Document - A Comparative Study on Sequence Matcher and Levenshtein Distance procedure”, Int. J. Comp. Sci. Eng., vol. 6, no. 4, pp. 400–404, Nov. 2025.

Section

Research Article