POS Tagging for Amharic: A Machine Learning Approach

Bibliographic details
Main author: Kefena, Sintayehu Hirpassa
Publication date: 2020
Other authors: Lehal, Gurpreet Singh
Document type: Article
Language: eng
Source title: INFOCOMP: Jornal de Ciência da Computação
Full text: https://infocomp.dcc.ufla.br/index.php/infocomp/article/view/627
Abstract: In this paper, we address the problem of automatically predicting the parts of speech of words in Amharic sentences. We present an experiment involving the study and implementation of a POS tagging model. Four statistical taggers, namely the Trigrams’n’Tags (TnT) tagger, a Conditional Random Field (CRF) tagger, a Naive Bayes (NB) classifier, and a Decision Tree (DT) classifier, are applied to Amharic, a morphologically rich language. We compare the performance of all taggers using training and test sets of the same size. Various language-dependent and language-independent feature sets were constructed, and a combination of them is applied for each algorithm. With these inputs, the CRF-based model outperformed the other taggers. The best accuracy obtained in our experiment is 94.08%. Finally, our study shows that linguistic features play a decisive part in overcoming the limitations of the baseline statistical model for Amharic.
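Illustrative sketch (not part of the record, and not the authors' implementation): the abstract describes combining token-level feature sets with statistical classifiers such as Naive Bayes. The Python below shows one way that idea might look. The feature set (affixes up to three characters, neighbouring words, a digit flag), the names `token_features` and `NaiveBayesTagger`, and the toy corpus are all assumptions made for illustration; the tokens are invented placeholders, not real Amharic, and the paper's actual features and data are not reproduced here.

```python
import math
from collections import Counter, defaultdict


def token_features(words, i, affix_len=3):
    """Build a feature dict for words[i]. Hypothetical feature set, not the paper's."""
    word = words[i]
    feats = {
        "word": word,
        "prev": words[i - 1] if i > 0 else "<s>",
        "next": words[i + 1] if i < len(words) - 1 else "</s>",
        "is_digit": str(word.isdigit()),
    }
    # Affix features: a cheap, language-independent proxy for morphology.
    for n in range(1, affix_len + 1):
        if len(word) > n:
            feats[f"prefix{n}"] = word[:n]
            feats[f"suffix{n}"] = word[-n:]
    return feats


class NaiveBayesTagger:
    """Multinomial Naive Bayes over (feature name, value) pairs, add-one smoothing."""

    def __init__(self):
        self.tag_counts = Counter()
        self.feat_counts = defaultdict(Counter)  # tag -> (name, value) -> count
        self.vocab = set()

    def fit(self, tagged_sentences):
        for sent in tagged_sentences:
            words = [w for w, _ in sent]
            for i, (_, tag) in enumerate(sent):
                self.tag_counts[tag] += 1
                for item in token_features(words, i).items():
                    self.feat_counts[tag][item] += 1
                    self.vocab.add(item)
        return self

    def predict_token(self, words, i):
        total = sum(self.tag_counts.values())
        feats = list(token_features(words, i).items())
        best_tag, best_score = None, -math.inf
        for tag, count in self.tag_counts.items():
            # Log prior plus smoothed log likelihood of each observed feature.
            score = math.log(count / total)
            denom = sum(self.feat_counts[tag].values()) + len(self.vocab)
            for item in feats:
                score += math.log((self.feat_counts[tag][item] + 1) / denom)
            if score > best_score:
                best_tag, best_score = tag, score
        return best_tag

    def tag(self, words):
        return [self.predict_token(words, i) for i in range(len(words))]


# Invented placeholder tokens (NOT real Amharic): "nouns" end in -o, "verbs" in -ach.
TRAIN = [
    [("tako", "N"), ("sulach", "V")],
    [("miro", "N"), ("penach", "V")],
    [("belo", "N"), ("dorach", "V")],
]

tagger = NaiveBayesTagger().fit(TRAIN)
predicted = tagger.tag(["zeno", "kirach"])  # both words are unseen in TRAIN
```

On this toy corpus the unseen words can only be tagged from their suffixes and sentence position, which is the kind of signal affix features are meant to supply for a morphologically rich language. A sequence model such as a CRF would additionally score tag-to-tag transitions, which this per-token classifier ignores.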
id UFLA-5_e363e3490a091b0be72b4323897401a4
oai_identifier_str oai:infocomp.dcc.ufla.br:article/627
network_acronym_str UFLA-5
network_name_str INFOCOMP: Jornal de Ciência da Computação
repository_id_str
author Kefena, Sintayehu Hirpassa
author_facet Kefena, Sintayehu Hirpassa
Lehal, Gurpreet Singh
author_role author
author2 Lehal, Gurpreet Singh
author2_role author
dc.contributor.author.fl_str_mv Kefena, Sintayehu Hirpassa
Lehal, Gurpreet Singh
publishDate 2020
dc.date.none.fl_str_mv 2020-06-18
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
info:eu-repo/semantics/publishedVersion
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv https://infocomp.dcc.ufla.br/index.php/infocomp/article/view/627
url https://infocomp.dcc.ufla.br/index.php/infocomp/article/view/627
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv https://infocomp.dcc.ufla.br/index.php/infocomp/article/view/627/534
dc.rights.driver.fl_str_mv Copyright (c) 2020 Sintayehu Hirpassa Kefena, Gurpreet Singh Lehal
info:eu-repo/semantics/openAccess
rights_invalid_str_mv Copyright (c) 2020 Sintayehu Hirpassa Kefena, Gurpreet Singh Lehal
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Editora da UFLA
publisher.none.fl_str_mv Editora da UFLA
dc.source.none.fl_str_mv INFOCOMP Journal of Computer Science; Vol. 19 No. 1 (2020): June 2020
1982-3363
1807-4545
reponame:INFOCOMP: Jornal de Ciência da Computação
instname:Universidade Federal de Lavras (UFLA)
instacron:UFLA
instname_str Universidade Federal de Lavras (UFLA)
instacron_str UFLA
institution UFLA
reponame_str INFOCOMP: Jornal de Ciência da Computação
collection INFOCOMP: Jornal de Ciência da Computação
repository.name.fl_str_mv INFOCOMP: Jornal de Ciência da Computação - Universidade Federal de Lavras (UFLA)
repository.mail.fl_str_mv infocomp@dcc.ufla.br||apfreire@dcc.ufla.br
_version_ 1799874742198468608