Translating the Language of Aviation. The Development and Detailed analysis of the English-Bengali Aviation Corpus for Machine translation
Main author: | Paul, Saptarshi |
---|---|
Publication date: | 2022 |
Document type: | Article |
Language: | eng |
Source title: | INFOCOMP: Jornal de Ciência da Computação |
Full text: | https://infocomp.dcc.ufla.br/index.php/infocomp/article/view/1966 |
Abstract: | Corpus-based transliteration and translation approaches, such as statistical machine translation (SMT) and neural machine translation (NMT) models, depend entirely on the parallel corpus. It is the corpus that ultimately determines the Translation Accuracy (TA) of the model. With the regular and common domains already well covered, current corpus research spans domains ranging from medicine to aero-science. The work becomes more interesting when Indian languages are involved, especially in technical domains such as aeronautics and aviation. With corpora for technical domains now emerging for English-Indian language pairs such as English-Bengali, the automatic analysis of such corpora is an aspect that researchers are taking up. Such analysis also helps developers and researchers further improve the quality of the corpus and set new benchmarks for the development of future corpora. This paper deals with the need for, and the development and detailed analysis of, a bilingual aviation corpus for the English-Bengali language pair. |
id |
UFLA-5_1225d43ef68add3e97e3daf10f0b57dc |
---|---|
oai_identifier_str |
oai:infocomp.dcc.ufla.br:article/1966 |
network_acronym_str |
UFLA-5 |
network_name_str |
INFOCOMP: Jornal de Ciência da Computação |
repository_id_str |
|
dc.title.none.fl_str_mv |
Translating the Language of Aviation. The Development and Detailed analysis of the English-Bengali Aviation Corpus for Machine translation |
author |
Paul, Saptarshi |
author_role |
author |
dc.contributor.author.fl_str_mv |
Paul, Saptarshi |
publishDate |
2022 |
dc.date.none.fl_str_mv |
2022-06-01 |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
https://infocomp.dcc.ufla.br/index.php/infocomp/article/view/1966 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
https://infocomp.dcc.ufla.br/index.php/infocomp/article/view/1966/577 |
dc.rights.driver.fl_str_mv |
Copyright (c) 2022 Saptarshi Paul info:eu-repo/semantics/openAccess |
rights_invalid_str_mv |
Copyright (c) 2022 Saptarshi Paul |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.publisher.none.fl_str_mv |
Editora da UFLA |
dc.source.none.fl_str_mv |
INFOCOMP Journal of Computer Science; Vol. 21 No. 1 (2022): June 2022 1982-3363 1807-4545 reponame:INFOCOMP: Jornal de Ciência da Computação instname:Universidade Federal de Lavras (UFLA) instacron:UFLA |
instname_str |
Universidade Federal de Lavras (UFLA) |
instacron_str |
UFLA |
institution |
UFLA |
reponame_str |
INFOCOMP: Jornal de Ciência da Computação |
repository.name.fl_str_mv |
INFOCOMP: Jornal de Ciência da Computação - Universidade Federal de Lavras (UFLA) |
repository.mail.fl_str_mv |
infocomp@dcc.ufla.br||apfreire@dcc.ufla.br |
_version_ |
1799874742685007872 |