Rule based POS Tagger for Sanskrit

Authors

  • Sharada Adinarayanan, B.Tech Computer Science and Engineering, School of Computing, SASTRA Deemed University, Thanjavur, Tamil Nadu, India
  • J. Naren, Assistant Professor, School of Computing, SASTRA Deemed University, Thanjavur, Tamil Nadu, India
  • P. Sriranjanie, B.Tech Computer Science and Engineering, School of Computing, SASTRA Deemed University, Thanjavur, Tamil Nadu, India
  • Dr. G. Vithya, Professor, School of Computing, KL University, Vijayawada, AP, India

DOI:

https://doi.org/10.61841/xkrtn624

Keywords:

Annotated Corpora, Tokenization, Morphological Analysis.

Abstract

POS tagging is the process of assigning each word in a sentence a suitable tag from a given tagset. In this paper, a rule-based NLP approach is taken to tag the parts of speech of Sanskrit words. The foundation for POS tagging is morphological analysis. The twelfth chapter of the Bhagavad Gita is used as input for the POS tagging process. Annotated corpora are developed and used to retrieve the grammatical categories of the input text. Sanskrit has a well-defined grammar, codified by Panini (c. 400 B.C.), with a layered grammatical structure. A rule-based approach is therefore better suited to the tagging process than the stochastic or probabilistic approach of existing systems. This work aims to improve accuracy by using efficient lookup strategies, searching and sorting techniques, and finally rule formation (exploiting the richness of Sanskrit grammar) to quickly narrow down the grammatical category assigned to each word. The major challenge is tokenizing joined words. Since Sanskrit has many inflected noun and verb forms, identifying the correct grammatical category requires taking contextual meaning and semantics into account. Semantic analysis, derivational analysis, and Sandhi analysis are also performed.
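The lookup-then-rules idea described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes IAST-transliterated input, and the coarse tagset, the five-entry `LEXICON` standing in for the annotated corpus, and the `SUFFIX_RULES` are all hypothetical. Words are first looked up in the lexicon; only on a miss are morphological suffix rules tried, longest suffix first. No Sandhi splitting is attempted.

```python
import re
import unicodedata

# Hypothetical toy lexicon standing in for the annotated corpus:
# IAST-transliterated surface forms mapped to a hypothetical coarse tag.
LEXICON: dict[str, str] = {
    "arjuna": "NOUN",
    "uvāca": "VERB",
    "evaṃ": "ADV",
    "ye": "PRON",
    "tvāṃ": "PRON",
}

# Illustrative suffix rules, NOT the authors' rule set. They are only
# consulted when lexicon lookup fails.
SUFFIX_RULES: list[tuple[str, str]] = [
    ("anti", "VERB"),  # e.g. gacchanti: 3rd person plural present (parasmaipada)
    ("ate", "VERB"),   # e.g. labhate, paryupāsate: ātmanepada present forms
    ("ti", "VERB"),    # e.g. gacchati: 3rd person singular present (parasmaipada)
    ("āḥ", "NOUN"),    # e.g. devāḥ: nominative plural of a-stem masculine nouns
]

# Check longer suffixes first so that "anti" wins over "ti".
_RULES_LONGEST_FIRST = sorted(SUFFIX_RULES, key=lambda rule: len(rule[0]), reverse=True)


def tokenize(text: str) -> list[str]:
    """Split a verse into word tokens.

    Normalizes to NFC so precomposed and decomposed diacritics compare equal,
    replaces dandas, pipes, periods and digits (verse numbers, including
    Devanagari digits) with spaces, then splits on whitespace.
    Sandhi-joined words are NOT split here.
    """
    text = unicodedata.normalize("NFC", text)
    cleaned = re.sub(r"[।॥|.\d]+", " ", text)
    return cleaned.lower().split()


def tag_word(word: str) -> str:
    """Tag one token: lexicon lookup first, then suffix rules, else UNK."""
    # 1. Exact lookup in the annotated lexicon.
    if word in LEXICON:
        return LEXICON[word]
    # 2. Fall back to the longest matching suffix rule.
    for suffix, tag in _RULES_LONGEST_FIRST:
        if word.endswith(suffix):
            return tag
    # 3. No lexical or morphological evidence: leave the word untagged.
    return "UNK"


def tag_sentence(text: str) -> list[tuple[str, str]]:
    """Tag every token of a verse, returning (word, tag) pairs."""
    return [(word, tag_word(word)) for word in tokenize(text)]


if __name__ == "__main__":
    verse = "arjuna uvāca । evaṃ satatayuktā ye bhaktās tvāṃ paryupāsate"
    for word, tag in tag_sentence(verse):
        print(f"{word}\t{tag}")
```

On the opening of Bhagavad Gita 12.1, lookup tags arjuna, uvāca, evaṃ, ye and tvāṃ, and the -ate rule tags paryupāsate as VERB, but satatayuktā and bhaktās come out as UNK. Both are surface forms of satatayuktāḥ and bhaktāḥ altered by Sandhi (the visarga is dropped before ye and becomes s before tvāṃ), so the āḥ rule no longer matches. This is the kind of joined-word case the abstract identifies as the main challenge.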

Published

18.09.2024

How to Cite

Rule based POS Tagger for Sanskrit. (2024). International Journal of Psychosocial Rehabilitation, 23(1), 336-345. https://doi.org/10.61841/xkrtn624