Integrating corpus-based and NLP approach to extract terminology and domain-oriented information: an example of US military corpus
Main author: | Chen, Liang-Ching |
---|---|
Publication date: | 2022 |
Other authors: | Chang, Kuei-Hu; Yang, Shu-Ching |
Document type: | Article |
Language: | eng |
Source title: | Acta scientiarum. Technology (Online) |
Full text: | http://www.periodicos.uem.br/ojs/index.php/ActaSciTechnol/article/view/60486 |
Abstract: | In modern information and communication technology (ICT), highly efficient and accurate corpus-based approaches for processing natural language data (NLD) are critical. Traditional corpus-based approaches to processing a corpus (i.e., the collected NLD) have mainly focused on quantifying and ranking words to assist humans in extracting keywords. However, these approaches cannot identify the meanings behind the words, and so cannot properly extract terminologies or the information associated with them. To address this issue, this paper proposes an integrated linguistic analysis approach that combines two corpus-based approaches with a rule-based natural language processing (NLP) approach. The integrated approach extracts and identifies terminologies and creates a text database, from which deeper domain-oriented information is extracted by using the terminologies as channels to retrieve core information from the target corpus. The military domain is an uncommon research field, and its data are often classified as confidential, so little research has focused on it. Nevertheless, military information is vital to national security and should not be ignored. Hence, to verify the proposed approach for extracting terminologies and their associated information, the researchers adopt US Army Field Manual (FM) 8-10-6 as the target corpus and empirical case. Compared with AntConc 3.5.8 and Tongpoon-Patanasorn’s hybrid approach, the results indicate that, in terms of terminology identification, text database creation, and domain knowledge extraction, only the proposed approach can handle all of these tasks. |
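
The abstract contrasts two ideas: ranking words by how often they occur, and using extracted terms as channels to pull related information out of the corpus. As a rough illustration only (this is not the authors' implementation, which this record does not describe), the Python sketch below ranks words by raw frequency and then retrieves every sentence that contains a chosen term. The function names, the stopword list, and the sample text are all hypothetical; the sample is invented and is not taken from FM 8-10-6.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return alphabetic tokens (hyphenated words kept whole)."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def rank_words(text, stopwords=frozenset()):
    """Return (word, count) pairs sorted by descending raw frequency."""
    counts = Counter(t for t in tokenize(text) if t not in stopwords)
    return counts.most_common()


def sentences_with_term(text, term):
    """Return the sentences that contain `term` as a whole word (case-insensitive)."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return [s for s in sentences if pattern.search(s)]


# Invented toy corpus, used only to exercise the functions above.
sample = (
    "The litter is carried by four bearers. "
    "Ambulance crews load each litter at the exchange point. "
    "Weather may delay evacuation."
)
# Hypothetical stopword list; a real study would use a curated one.
stopwords = frozenset({"the", "is", "by", "at", "may", "each"})

ranking = rank_words(sample, stopwords)
top_term = ranking[0][0]                       # most frequent non-stopword
context = sentences_with_term(sample, top_term)
```

Frequency ranking alone only surfaces candidate words; the second step, retrieving the sentences around a term, is what starts to expose the information attached to it. A rule-based NLP stage such as the one the abstract mentions would sit between these two steps, and is not shown here.
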
id |
UEM-6_a05231c3ad1b93f62db5304aa648f65f |
---|---|
oai_identifier_str |
oai:periodicos.uem.br/ojs:article/60486 |
network_acronym_str |
UEM-6 |
network_name_str |
Acta scientiarum. Technology (Online) |
repository_id_str |
|
dc.title.none.fl_str_mv |
Integrating corpus-based and NLP approach to extract terminology and domain-oriented information: an example of US military corpus |
title |
Integrating corpus-based and NLP approach to extract terminology and domain-oriented information: an example of US military corpus |
spellingShingle |
Integrating corpus-based and NLP approach to extract terminology and domain-oriented information: an example of US military corpus Chen, Liang-Ching Information; communication and technology (ICT); corpus-based approach; natural language data (NLD); natural language processing (NLP); military |
title_short |
Integrating corpus-based and NLP approach to extract terminology and domain-oriented information: an example of US military corpus |
title_full |
Integrating corpus-based and NLP approach to extract terminology and domain-oriented information: an example of US military corpus |
title_fullStr |
Integrating corpus-based and NLP approach to extract terminology and domain-oriented information: an example of US military corpus |
title_full_unstemmed |
Integrating corpus-based and NLP approach to extract terminology and domain-oriented information: an example of US military corpus |
title_sort |
Integrating corpus-based and NLP approach to extract terminology and domain-oriented information: an example of US military corpus |
author |
Chen, Liang-Ching |
author_facet |
Chen, Liang-Ching; Chang, Kuei-Hu; Yang, Shu-Ching |
author_role |
author |
author2 |
Chang, Kuei-Hu; Yang, Shu-Ching |
author2_role |
author author |
dc.contributor.author.fl_str_mv |
Chen, Liang-Ching; Chang, Kuei-Hu; Yang, Shu-Ching |
dc.subject.por.fl_str_mv |
Information, communication and technology (ICT); corpus-based approach; natural language data (NLD); natural language processing (NLP); military |
topic |
Information, communication and technology (ICT); corpus-based approach; natural language data (NLD); natural language processing (NLP); military |
description |
In modern information and communication technology (ICT), highly efficient and accurate corpus-based approaches for processing natural language data (NLD) are critical. Traditional corpus-based approaches to processing a corpus (i.e., the collected NLD) have mainly focused on quantifying and ranking words to assist humans in extracting keywords. However, these approaches cannot identify the meanings behind the words, and so cannot properly extract terminologies or the information associated with them. To address this issue, this paper proposes an integrated linguistic analysis approach that combines two corpus-based approaches with a rule-based natural language processing (NLP) approach. The integrated approach extracts and identifies terminologies and creates a text database, from which deeper domain-oriented information is extracted by using the terminologies as channels to retrieve core information from the target corpus. The military domain is an uncommon research field, and its data are often classified as confidential, so little research has focused on it. Nevertheless, military information is vital to national security and should not be ignored. Hence, to verify the proposed approach for extracting terminologies and their associated information, the researchers adopt US Army Field Manual (FM) 8-10-6 as the target corpus and empirical case. Compared with AntConc 3.5.8 and Tongpoon-Patanasorn’s hybrid approach, the results indicate that, in terms of terminology identification, text database creation, and domain knowledge extraction, only the proposed approach can handle all of these tasks. |
publishDate |
2022 |
dc.date.none.fl_str_mv |
2022-07-28 |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://www.periodicos.uem.br/ojs/index.php/ActaSciTechnol/article/view/60486 10.4025/actascitechnol.v44i1.60486 |
url |
http://www.periodicos.uem.br/ojs/index.php/ActaSciTechnol/article/view/60486 |
identifier_str_mv |
10.4025/actascitechnol.v44i1.60486 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
http://www.periodicos.uem.br/ojs/index.php/ActaSciTechnol/article/view/60486/751375154620 |
dc.rights.driver.fl_str_mv |
Copyright (c) 2022 Acta Scientiarum. Technology http://creativecommons.org/licenses/by/4.0 info:eu-repo/semantics/openAccess |
rights_invalid_str_mv |
Copyright (c) 2022 Acta Scientiarum. Technology http://creativecommons.org/licenses/by/4.0 |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.publisher.none.fl_str_mv |
Universidade Estadual De Maringá |
publisher.none.fl_str_mv |
Universidade Estadual De Maringá |
dc.source.none.fl_str_mv |
Acta Scientiarum. Technology; Vol 44 (2022): Continuous publication; e60486 1806-2563 1807-8664 reponame:Acta scientiarum. Technology (Online) instname:Universidade Estadual de Maringá (UEM) instacron:UEM |
instname_str |
Universidade Estadual de Maringá (UEM) |
instacron_str |
UEM |
institution |
UEM |
reponame_str |
Acta scientiarum. Technology (Online) |
collection |
Acta scientiarum. Technology (Online) |
repository.name.fl_str_mv |
Acta scientiarum. Technology (Online) - Universidade Estadual de Maringá (UEM) |
repository.mail.fl_str_mv |
actatech@uem.br |
_version_ |
1799315338053025792 |