A Corpus-Based Word Sense Disambiguation For Geez Language

Amlakie Aschale Alemu; Kinde Anlay Fante

doi:10.20372/ejssdastu:v8.i1.2021.283

Amlakie Aschale Alemu (Debretabor)

Keywords: Natural Language Processing, Word Sense Disambiguation, Semi-supervised, ADTree, Ge’ez Language

Abstract

In natural language processing, every language has a number of ambiguous words, and resolving this ambiguity with a corpus-based approach can support the development of word sense disambiguation (WSD) for the language. The absence of automatic word sense disambiguation for Ge’ez is a challenge for developing natural language processing applications in the language. This study therefore designs a word sense disambiguation prototype model for Ge’ez words, using a corpus-based technique to extract training sets so as to minimize the amount of human intervention required. Because no Ge’ez WordNet or annotated datasets are available, six words were chosen: ሀለፈ (Halafe), ቆመ (ḱome), ባረከ (bareke), አስተርዓየ (astaraya), ገብረ (gebire), and ሰዓለ (Se’ale). Separate datasets for the six ambiguous words were prepared for the development of this Ge’ez WSD model. The final classification task was performed on the fully labeled training set using the AdaBoost, SMO, and ADTree classification algorithms in the WEKA package. We compared corpus-based machine learning approaches (unsupervised, supervised, and semi-supervised) and found that the semi-supervised approach achieved the best performance. Using the ADTree algorithm, the proposed method achieved average precision, recall, F1-score, and accuracy of 92.1%, 91.3%, 91%, and 91.1%, respectively. A window size of 4-4 was the optimal window size for identifying the meaning of the selected ambiguous Ge’ez words using the ADTree algorithm.
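The abstract names two mechanisms without spelling them out: a 4-4 context window around each ambiguous word, and a semi-supervised step that grows a labeled training set with little human effort. The Python sketch below illustrates both under stated assumptions. It assumes "4-4" means four tokens to the left and four to the right of the ambiguous word, and it uses a generic self-training loop as one common form of semi-supervised learning. The word-overlap scorer is a hypothetical stand-in for illustration only; it is not ADTree, AdaBoost, or SMO, which the study ran in WEKA. All function names, thresholds, and toy tokens are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter


def context_window(tokens, target_index, left=4, right=4):
    """Return up to `left` tokens before and `right` tokens after the target.

    Assumption: a "4-4" window means four context words on each side.
    """
    start = max(0, target_index - left)
    before = tokens[start:target_index]
    after = tokens[target_index + 1:target_index + 1 + right]
    return before + after


def train(labeled):
    """Build a bag-of-words profile (word counts) for each sense."""
    profiles = {}
    for context, sense in labeled:
        profiles.setdefault(sense, Counter()).update(context)
    return profiles


def predict(profiles, context):
    """Hypothetical stand-in classifier: score each sense by how many
    context words appear in its profile, and return (sense, score)."""
    best_sense, best_score = None, -1
    for sense in sorted(profiles):  # sorted for deterministic tie-breaking
        # Counter returns 0 for unseen words without inserting them.
        score = sum(1 for word in context if profiles[sense][word] > 0)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense, best_score


def self_train(labeled, unlabeled, min_score=1, max_rounds=5):
    """Generic self-training: repeatedly label the unlabeled contexts the
    current model scores at least `min_score` and add them to the seed set.

    Returns the grown labeled set and the contexts left unlabeled.
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        profiles = train(labeled)
        newly_labeled, remaining = [], []
        for context in pool:
            sense, score = predict(profiles, context)
            if score >= min_score:
                newly_labeled.append((context, sense))
            else:
                remaining.append(context)
        if not newly_labeled:
            break  # nothing confident enough to add; stop early
        labeled.extend(newly_labeled)
        pool = remaining
    return labeled, pool


if __name__ == "__main__":
    tokens = [f"t{i}" for i in range(10)]
    print(context_window(tokens, 5))  # ['t1', 't2', 't3', 't4', 't6', 't7', 't8', 't9']
```

In the study itself, the final, fully labeled set would then be passed to a stronger classifier (ADTree, AdaBoost, or SMO in WEKA). The overlap scorer here only keeps the sketch self-contained.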