POS Tagging for Amharic: A Machine Learning Approach

Kefena, Sintayehu Hirpassa; Lehal, Gurpreet Singh

Bibliographic details
Main author:	Kefena, Sintayehu Hirpassa
Publication date:	2020
Other authors:	Lehal, Gurpreet Singh
Document type:	Article
Language:	eng
Source title:	INFOCOMP: Jornal de Ciência da Computação
Full text:	https://infocomp.dcc.ufla.br/index.php/infocomp/article/view/627
Abstract:	This paper addresses the problem of automatically predicting the parts of speech of words in Amharic sentences. We present an experiment involving the study and implementation of a POS tagging model. Four statistical taggers, namely the Trigrams’n’Tags (TnT) tagger, a Conditional Random Field (CRF) tagger, a Naive Bayes (NB) classifier, and a Decision Tree (DT) classifier, are applied to a morphologically rich language: Amharic. We compare the performance of all taggers using training and testing datasets of the same size. Various language-dependent and language-independent feature sets were formed, and a combination of them is applied for each algorithm. With these inputs, the CRF-based model outperformed the other taggers. The best accuracy obtained in our experiment is 94.08%. Finally, our study shows that linguistic features play a decisive part in overcoming the limitations of the baseline statistical model for Amharic.

Item metadata

id	UFLA-5_e363e3490a091b0be72b4323897401a4
oai_identifier_str	oai:infocomp.dcc.ufla.br:article/627
network_acronym_str	UFLA-5
network_name_str	INFOCOMP: Jornal de Ciência da Computação
repository_id_str
dc.title.none.fl_str_mv	POS Tagging for Amharic: A Machine Learning Approach
author	Kefena, Sintayehu Hirpassa
author_facet	Kefena, Sintayehu Hirpassa Lehal, Gurpreet Singh
author_role	author
author2	Lehal, Gurpreet Singh
author2_role	author
dc.contributor.author.fl_str_mv	Kefena, Sintayehu Hirpassa Lehal, Gurpreet Singh
publishDate	2020
dc.date.none.fl_str_mv	2020-06-18
dc.type.driver.fl_str_mv	info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion
format	article
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	https://infocomp.dcc.ufla.br/index.php/infocomp/article/view/627
url	https://infocomp.dcc.ufla.br/index.php/infocomp/article/view/627
dc.language.iso.fl_str_mv	eng
language	eng
dc.relation.none.fl_str_mv	https://infocomp.dcc.ufla.br/index.php/infocomp/article/view/627/534
dc.rights.driver.fl_str_mv	Copyright (c) 2020 Sintayehu Hirpassa Kefena, Gurpreet Singh Lehal info:eu-repo/semantics/openAccess
rights_invalid_str_mv	Copyright (c) 2020 Sintayehu Hirpassa Kefena, Gurpreet Singh Lehal
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.publisher.none.fl_str_mv	Editora da UFLA
publisher.none.fl_str_mv	Editora da UFLA
dc.source.none.fl_str_mv	INFOCOMP Journal of Computer Science; Vol. 19 No. 1 (2020): June 2020 1982-3363 1807-4545 reponame:INFOCOMP: Jornal de Ciência da Computação instname:Universidade Federal de Lavras (UFLA) instacron:UFLA
instname_str	Universidade Federal de Lavras (UFLA)
instacron_str	UFLA
institution	UFLA
reponame_str	INFOCOMP: Jornal de Ciência da Computação
collection	INFOCOMP: Jornal de Ciência da Computação
repository.name.fl_str_mv	INFOCOMP: Jornal de Ciência da Computação - Universidade Federal de Lavras (UFLA)
repository.mail.fl_str_mv	infocomp@dcc.ufla.br||apfreire@dcc.ufla.br
_version_	1799874742198468608
