Rule based POS Tagger for Sanskrit
DOI: https://doi.org/10.61841/xkrtn624
Keywords: Annotated Corpora, Tokenization, Morphological Analysis.
Abstract
Part-of-speech (POS) tagging assigns each word in a sentence a suitable tag from a given tag set. This paper takes a rule-based approach to POS tagging of Sanskrit words, with morphological analysis as its foundation. The twelfth chapter of the Bhagavad Gita serves as the input text. Annotated corpora will be developed and used to retrieve the grammatical category of each word in the input. Sanskrit has a precisely specified grammar, formulated by Panini (around the 4th century B.C.), with a layered grammatical structure. A rule-based approach is therefore better suited to the tagging task than the stochastic or probabilistic approach used by existing systems. The project aims to improve accuracy by combining efficient lookup strategies, searching and sorting techniques, and rule formation that exploits the richness of Sanskrit grammar, so that the grammatical category of each word can be narrowed down quickly. The major challenge is tokenizing joined words. Because Sanskrit has many inflected noun and verb forms, identifying the correct grammatical category requires taking contextual meaning and semantics into account. Semantic analysis, derivational analysis, and Sandhi analysis are also performed.
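The lookup-then-rules idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the tag names, the tiny lexicon, the suffix rules, and the function names `tag_word` and `tag_sentence` are all hypothetical, the words are written in simplified transliteration without diacritics, and the input is assumed to have already been split at Sandhi boundaries.

```python
from typing import Dict, List, Tuple

# Hypothetical mini-lexicon of indeclinables and pronouns (assumption,
# not taken from the paper). A real system would use an annotated corpus.
LEXICON: Dict[str, str] = {
    "ca": "CONJ",   # "and"
    "na": "NEG",    # "not"
    "eva": "PART",  # emphatic particle
    "mam": "PRON",  # "me"
}

# Hypothetical suffix rules (assumption). Real Sanskrit morphology needs
# far more rules plus context to resolve ambiguous endings.
SUFFIX_RULES: List[Tuple[str, str]] = [
    ("anti", "VERB"),  # e.g. "gacchanti" (they go)
    ("ati", "VERB"),   # e.g. "gacchati" (he/she goes)
    ("asya", "NOUN"),  # genitive singular of a-stems
    ("ena", "NOUN"),   # instrumental singular of a-stems
    ("ah", "NOUN"),    # nominative singular with visarga
]

# Try longer suffixes first so that "anti" is tested before "ati".
SORTED_RULES = sorted(SUFFIX_RULES, key=lambda rule: len(rule[0]), reverse=True)


def tag_word(word: str) -> str:
    """Tag one token: exact lexicon lookup first, then suffix rules."""
    token = word.lower()
    if token in LEXICON:
        return LEXICON[token]
    for suffix, tag in SORTED_RULES:
        if token.endswith(suffix):
            return tag
    return "UNK"  # no rule applied


def tag_sentence(sentence: str) -> List[Tuple[str, str]]:
    """Tag a whitespace-separated, already Sandhi-split sentence."""
    return [(word, tag_word(word)) for word in sentence.split()]


if __name__ == "__main__":
    for word, tag in tag_sentence("ramah vanam gacchati ca"):
        print(f"{word}\t{tag}")
```

In this sketch the dictionary lookup takes constant time per word, and the suffix rules act only as a fallback. Words that match neither, such as "vanam" here, are tagged `UNK`, which is where contextual and semantic rules of the kind the abstract mentions would be needed.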
License
Copyright (c) 2024 Author
This work is licensed under a Creative Commons Attribution 4.0 International License.