POS Tagging for Amharic: A Machine Learning Approach
Main author: | Kefena, Sintayehu Hirpassa |
---|---|
Publication date: | 2020 |
Other authors: | Lehal, Gurpreet Singh |
Document type: | Article |
Language: | eng |
Source title: | INFOCOMP: Jornal de Ciência da Computação |
Full text: | https://infocomp.dcc.ufla.br/index.php/infocomp/article/view/627 |
Abstract: | In this paper, we address the problem of automatically predicting the parts of speech of words in Amharic sentences. We present an experiment involving the study and implementation of a POS tagging model. Four statistical taggers, namely the Trigrams’n’Tags (TnT) tagger, a Conditional Random Field (CRF) tagger, a Naive Bayes (NB) classifier and a Decision Tree (DT) classifier, are applied to Amharic, a morphologically rich language. We compare the performance of all taggers using training and testing datasets of the same size. Various language-dependent and language-independent feature sets were formed, and a combination of them was applied for each algorithm. With these inputs, the CRF-based model outperformed the other taggers. The best accuracy obtained in our experiment is 94.08%. Finally, our study shows that linguistic features play a decisive part in overcoming the limitations of the baseline statistical model for Amharic. |
id |
UFLA-5_e363e3490a091b0be72b4323897401a4 |
---|---|
oai_identifier_str |
oai:infocomp.dcc.ufla.br:article/627 |
network_acronym_str |
UFLA-5 |
network_name_str |
INFOCOMP: Jornal de Ciência da Computação |
repository_id_str |
|
dc.title.none.fl_str_mv |
POS Tagging for Amharic: A Machine Learning Approach |
author |
Kefena, Sintayehu Hirpassa |
author_facet |
Kefena, Sintayehu Hirpassa Lehal, Gurpreet Singh |
author_role |
author |
author2 |
Lehal, Gurpreet Singh |
author2_role |
author |
dc.contributor.author.fl_str_mv |
Kefena, Sintayehu Hirpassa Lehal, Gurpreet Singh |
publishDate |
2020 |
dc.date.none.fl_str_mv |
2020-06-18 |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
https://infocomp.dcc.ufla.br/index.php/infocomp/article/view/627 |
url |
https://infocomp.dcc.ufla.br/index.php/infocomp/article/view/627 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
https://infocomp.dcc.ufla.br/index.php/infocomp/article/view/627/534 |
dc.rights.driver.fl_str_mv |
Copyright (c) 2020 Sintayehu Hirpassa Kefena, Gurpreet Singh Lehal info:eu-repo/semantics/openAccess |
rights_invalid_str_mv |
Copyright (c) 2020 Sintayehu Hirpassa Kefena, Gurpreet Singh Lehal |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.publisher.none.fl_str_mv |
Editora da UFLA |
publisher.none.fl_str_mv |
Editora da UFLA |
dc.source.none.fl_str_mv |
INFOCOMP Journal of Computer Science; Vol. 19 No. 1 (2020): June 2020 1982-3363 1807-4545 reponame:INFOCOMP: Jornal de Ciência da Computação instname:Universidade Federal de Lavras (UFLA) instacron:UFLA |
instname_str |
Universidade Federal de Lavras (UFLA) |
instacron_str |
UFLA |
institution |
UFLA |
reponame_str |
INFOCOMP: Jornal de Ciência da Computação |
collection |
INFOCOMP: Jornal de Ciência da Computação |
repository.name.fl_str_mv |
INFOCOMP: Jornal de Ciência da Computação - Universidade Federal de Lavras (UFLA) |
repository.mail.fl_str_mv |
infocomp@dcc.ufla.br||apfreire@dcc.ufla.br |
_version_ |
1799874742198468608 |