Integrating corpus-based and NLP approach to extract terminology and domain-oriented information: an example of US military corpus

Bibliographic details
Main author: Chen, Liang-Ching
Publication date: 2022
Other authors: Chang, Kuei-Hu; Yang, Shu-Ching
Document type: Article
Language: eng
Source title: Acta scientiarum. Technology (Online)
Full text: http://www.periodicos.uem.br/ojs/index.php/ActaSciTechnol/article/view/60486
Abstract: In modern information, communication and technology (ICT), highly efficient and accurate corpus-based approaches to processing natural language data (NLD) are critical. Traditional corpus-based approaches to processing a corpus (i.e., the collected NLD) have mainly focused on quantifying and ranking words to assist humans in extracting keywords. However, these approaches cannot identify the meanings behind the words, so they can properly extract neither terminologies nor the information associated with them. To address this issue, this paper proposes an integrated linguistic analysis approach that combines two corpus-based approaches with a rule-based natural language processing (NLP) approach. The approach extracts and identifies terminologies and creates a text database, then uses the terminologies as channels to retrieve deeper, domain-oriented core information from the target corpus. The military domain is an uncommon research field, and its data are often classified as confidential, so little research has focused on it. Nevertheless, military information is vital to national security and should not be ignored. Hence, to verify the proposed approach for extracting terminologies and their associated information, the researchers adopt the US Army field manual (FM) 8-10-6 as the target corpus and empirical case. Compared with AntConc 3.5.8 and Tongpoon-Patanasorn’s hybrid approach, the results indicate that only the proposed approach can handle all three tasks: terminology identification, text database creation, and domain knowledge extraction.
id UEM-6_a05231c3ad1b93f62db5304aa648f65f
oai_identifier_str oai:periodicos.uem.br/ojs:article/60486
network_acronym_str UEM-6
network_name_str Acta scientiarum. Technology (Online)
dc.subject.por.fl_str_mv Information, communication and technology (ICT); corpus-based approach; natural language data (NLD); natural language processing (NLP); military
publishDate 2022
dc.date.none.fl_str_mv 2022-07-28
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
info:eu-repo/semantics/publishedVersion
dc.identifier.uri.fl_str_mv http://www.periodicos.uem.br/ojs/index.php/ActaSciTechnol/article/view/60486
10.4025/actascitechnol.v44i1.60486
dc.language.iso.fl_str_mv eng
dc.relation.none.fl_str_mv http://www.periodicos.uem.br/ojs/index.php/ActaSciTechnol/article/view/60486/751375154620
dc.rights.driver.fl_str_mv Copyright (c) 2022 Acta Scientiarum. Technology
http://creativecommons.org/licenses/by/4.0
info:eu-repo/semantics/openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Universidade Estadual De Maringá
dc.source.none.fl_str_mv Acta Scientiarum. Technology; Vol. 44 (2022): Continuous publication; e60486
1806-2563
1807-8664
reponame:Acta scientiarum. Technology (Online)
instname:Universidade Estadual de Maringá (UEM)
instacron:UEM
repository.name.fl_str_mv Acta scientiarum. Technology (Online) - Universidade Estadual de Maringá (UEM)
repository.mail.fl_str_mv actatech@uem.br