Grammar And Context Based Approach For Identification And Translation Of Proverbs Using Trie-Based Ontology
DOI:
https://doi.org/10.26438/ijcse/v6si10.6063
Keywords:
Machine translation, Proverbs, Idioms, English, Regional Languages, grammar-based approach, classification, context-based approach, trie-based ontology
Abstract
Most current machine translation systems that translate English into regional Indian languages either ignore idioms in the text or return a literal, word-for-word rendering of the phrase, which loses the essence of the proverb. Two main issues arise: detecting proverbs in a given paragraph in a timely manner, and the separate processing needed to translate idioms into other languages. This paper presents a combination of a natural-language grammar-based approach and a context-based approach for detecting idioms in English text, and further presents a trie-based ontology that can be used to translate proverbs into regional languages. The grammar-based approach parses English sentences, identifies their parts-of-speech tags, and statistically estimates the probability that a given sentence is a proverb, using grammar-based rules that apply only to proverbs. The context-based approach classifies the keywords in the proverb and compares them with the keywords in the remaining part of the paragraph. By combining these two approaches, a proverb can be identified with better accuracy. For quick translation of detected proverbs into regional languages, a keyword-based priority search using parts-of-speech tags can be run on a previously developed trie-based ontology.
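As an illustration of the lookup step only, the following is a minimal sketch of a word-level trie that maps the keywords of a known proverb to a stored rendering. It is not the authors' implementation: the `STOPWORDS` set, the `keywords` helper, the `ProverbTrie` class, and the placeholder translation string are all hypothetical, and the abstract's parts-of-speech-based keyword selection and priority ordering are replaced here by a simple stopword filter that keeps the original word order.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Assumption: a tiny hand-written stopword list stands in for the
# parts-of-speech-based keyword selection mentioned in the abstract.
STOPWORDS = {"a", "an", "the", "in", "is", "of", "to", "and"}


def keywords(sentence: str) -> List[str]:
    """Lowercase each word, strip surrounding punctuation, drop stopwords."""
    words = (w.strip(".,;:!?\"'").lower() for w in sentence.split())
    return [w for w in words if w and w not in STOPWORDS]


@dataclass
class TrieNode:
    children: Dict[str, "TrieNode"] = field(default_factory=dict)
    translation: Optional[str] = None  # set only where a proverb ends


class ProverbTrie:
    """Word-level trie keyed by the keyword sequence of each proverb."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, proverb: str, translation: str) -> None:
        node = self.root
        for word in keywords(proverb):
            node = node.children.setdefault(word, TrieNode())
        node.translation = translation

    def lookup(self, sentence: str) -> Optional[str]:
        """Return the stored rendering for an exact keyword match, else None."""
        node = self.root
        for word in keywords(sentence):
            node = node.children.get(word)
            if node is None:
                return None
        return node.translation


trie = ProverbTrie()
# Placeholder string: no real regional-language rendering is supplied here.
trie.insert("A stitch in time saves nine.", "<regional-language rendering>")
result = trie.lookup("a stitch in time saves nine")
partial = trie.lookup("A stitch in time")
```

In this sketch `result` holds the stored placeholder, while `partial` is `None` because the keyword path stops before a node that carries a translation. Lookup cost depends on the number of keywords in the sentence rather than on the number of stored proverbs, which is the usual reason for choosing a trie for this kind of quick retrieval.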
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors contributing to this journal agree to publish their articles under the Creative Commons Attribution 4.0 International License, allowing third parties to share their work (copy, distribute, transmit) and to adapt it, under the condition that the authors are given credit and that in the event of reuse or distribution, the terms of this license are made clear.
