An Exploratory Review on Text Detection and Recognition in Images and Videos Using Machine Learning and Deep Learning Techniques
DOI:
https://doi.org/10.26438/ijcse/v13i2.8697
Keywords:
Text Detection, Optical Character Recognition (OCR), Deep Learning, Machine Learning, Text Recognition, Image Processing, Video Analysis, Feature Extraction, Computer Vision and Natural Language Processing (NLP)
Abstract
The increasing reliance on images and videos as sources of information has led to a growing demand for automated text detection and recognition systems. This literature review explores current advancements in text extraction methodologies, focusing on machine learning and deep learning techniques. Various approaches, including text detection, localization, recognition, and tracking, are discussed alongside the challenges posed by environmental conditions, text alignment, font variations, and background noise. The study highlights applications in license plate recognition, industrial automation, vehicle tracking, and self-navigating automobiles, where text extraction plays a crucial role. Furthermore, a comparative analysis of existing machine learning-based and deep learning-based models is conducted, evaluating their effectiveness in different scenarios. The review also discusses the evaluation metrics used to validate model performance and identifies the computational challenges associated with processing high-resolution images and videos in real time. The findings emphasize the need for robust mathematical models and optimization techniques to improve the efficiency and accuracy of text recognition systems.
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors contributing to this journal agree to publish their articles under the Creative Commons Attribution 4.0 International License, allowing third parties to share their work (copy, distribute, transmit) and to adapt it, under the condition that the authors are given credit and that in the event of reuse or distribution, the terms of this license are made clear.
