Matrix Method for Distinction between Text and Non-Text Images
Keywords:
text recognition, distance transform, classifier
Abstract
Distinguishing text images from non-text images is a major challenge in computer vision and a prerequisite for efficiently extracting text from an image. A text-extraction algorithm achieves higher efficiency if it is known beforehand whether the input is a text image or a non-text image. For many images, such as old manuscripts, text extraction is very difficult; in such cases, a reliable distinction between text and non-text makes text detection easier, more accurate, and faster. The method can also be applied to detect and extract text from signboards. In our approach, we build a system that accepts any type of image as input. The input image is first processed and converted into a binary image. The distance transform is then applied, and the distances between the various points in the image are calculated. Duplicate distance values are merged into a single value and sorted in ascending order. The total area of the binary image is computed, along with the area of the image region corresponding to each distance-transform value. The total area is divided by the area of each corresponding distance-transform value, and the resulting quotients are taken as the feature values. The feature values are then divided into small intervals and passed to a classifier. The accuracy of the classifier is calculated and evaluated for the distinction between text and non-text images. This method is simple and accurate for distinguishing text from non-text images and also aids in extracting text from an image. Experiments have been conducted on a dataset of simple text and non-text images, and they demonstrate the efficiency of the proposed method.
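The following Python sketch illustrates the pipeline described in the abstract: binarize the image, apply the distance transform, merge and sort the distance values, form the area-ratio feature values, bin them into intervals, and pass them to a classifier. It assumes OpenCV, NumPy, and scikit-learn; the function names, the number of intervals, and the choice of a linear SVM are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the described text/non-text feature pipeline.
# Assumptions: OpenCV for binarization and the distance transform,
# scikit-learn's SVC as a stand-in classifier, 16 histogram intervals.
import cv2
import numpy as np
from sklearn.svm import SVC

N_INTERVALS = 16  # assumed number of intervals for the feature values


def extract_features(image_path):
    """Binarize, apply the distance transform, and derive area-ratio features."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    # Convert to a binary image (foreground = 255) via Otsu thresholding.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # Distance transform: each foreground pixel gets its distance
    # to the nearest background pixel.
    dist = cv2.distanceTransform(binary, cv2.DIST_L2, 5)

    # Merge duplicate distance values and sort them in ascending order.
    values = np.unique(dist[dist > 0])

    # Total area of the binary image (number of foreground pixels).
    total_area = np.count_nonzero(binary)

    features = []
    for v in values:
        # Area of the region whose distance-transform value equals v.
        area_v = np.count_nonzero(np.isclose(dist, v))
        # Feature value: total area divided by the area at this distance value.
        features.append(total_area / area_v)

    # Summarize the variable-length feature values into fixed intervals.
    hist, _ = np.histogram(features, bins=N_INTERVALS)
    return hist.astype(np.float32)


def train_classifier(image_paths, labels):
    """Train a classifier on the interval features (labels: 1 = text, 0 = non-text)."""
    X = np.vstack([extract_features(p) for p in image_paths])
    clf = SVC(kernel="linear")
    clf.fit(X, labels)
    return clf
```

The histogram step is one possible way to turn the variable number of feature values into a fixed-length vector for the classifier; the paper's own interval scheme may differ.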
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors contributing to this journal agree to publish their articles under the Creative Commons Attribution 4.0 International License, allowing third parties to share their work (copy, distribute, transmit) and to adapt it, under the condition that the authors are given credit and that in the event of reuse or distribution, the terms of this license are made clear.
