Computational-linguistic processing of lexical bundles: A corpus-based study in the area of Pharmaceutical Regulation

Bibliographic details
Main author: Mazza, Luciene Novais
Publication date: 2015
Document type: Article
Language: Portuguese (por)
Source title: Calidoscópio (Online)
Full text: https://revistas.unisinos.br/index.php/calidoscopio/article/view/cld.2015.133.13
Abstract: This paper presents a computational tool developed to extract three-word lexical bundles and uses it to demonstrate the automatic recognition of recurring lexical items across regulatory documents. The quantitative analysis examines a specific type of document prepared by the pharmaceutical industry, one directly related to matters handled by public health protection agencies. The quantitative data collection methods can nonetheless be used to investigate other linguistic features across a variety of genres and text types, and they allow the linguistics researcher to identify easily which terms belong to a specific domain. The study focuses on the frequency of lexical patterns in language use, particularly in business contexts, and seeks to extend Douglas Biber's work on recurrent word combinations, which makes use of tools and techniques developed in corpus-based linguistics. The theoretical framework is Corpus Linguistics, a theory able to connect its concepts with computational approaches and so support both the design of tools for end users and the extraction of the lexical bundles. The corpus comprises English-language documents from fifteen different manufacturing sites of a multinational pharmaceutical company, totaling about 110,000 words and covering different writers in different parts of the world. The investigation shows that text-internal features can be examined by extracting lexical bundles and comparing them across documents of the same specific domain.
Keywords: lexical bundles, domain-specific corpus, linguistic-computational tool.
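The kind of extraction the abstract describes can be illustrated with a minimal sketch. It is not the tool presented in the paper: the function names, the tokenization rule, and the `min_freq` and `min_docs` thresholds are assumptions chosen for illustration only. It counts every contiguous three-word sequence across a set of documents and keeps those that recur and appear in more than one document.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and return its word tokens.

    Assumption: English text; words may contain internal hyphens or
    apostrophes. Digits and punctuation are dropped.
    """
    return re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())


def three_word_bundles(documents, min_freq=2, min_docs=2):
    """Return (bundle, frequency, document_count) tuples.

    min_freq and min_docs are hypothetical cut-offs, not values taken
    from the paper. Sequences are counted across sentence boundaries,
    which a fuller implementation might want to prevent.
    """
    freq = Counter()
    doc_range = defaultdict(set)
    for doc_id, text in enumerate(documents):
        tokens = tokenize(text)
        # Slide a window of three tokens over the document.
        for trigram in zip(tokens, tokens[1:], tokens[2:]):
            freq[trigram] += 1
            doc_range[trigram].add(doc_id)
    bundles = [
        (" ".join(trigram), count, len(doc_range[trigram]))
        for trigram, count in freq.items()
        if count >= min_freq and len(doc_range[trigram]) >= min_docs
    ]
    # Most frequent first, ties broken alphabetically.
    return sorted(bundles, key=lambda b: (-b[1], b[0]))


if __name__ == "__main__":
    # Invented sample sentences, not taken from the study's corpus.
    sample = [
        "The batch was released in accordance with the approved procedure.",
        "Testing was performed in accordance with the site procedure.",
    ]
    for bundle, count, docs in three_word_bundles(sample):
        print(f"{bundle}\t{count}\t{docs}")
```

Requiring a bundle to occur in more than one document is one simple way to favor sequences shared across the corpus over the repetitions of a single writer. Suitable thresholds depend on corpus size and would need to be set for the data at hand.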
id Unisinos-3_7c00b2f7e28d5f1acb39b5c76ac580b0
oai_identifier_str oai:ojs2.revistas.unisinos.br:article/9837
network_acronym_str Unisinos-3
network_name_str Calidoscópio (Online)
repository_id_str
dc.title.none.fl_str_mv Computational-linguistic processing of lexical bundles: A corpus-based study in the area of Pharmaceutical Regulation
Processamento linguístico-computacional de pacotes lexicais: um estudo de corpus na área de Regulamentação Farmacêutica
author Mazza, Luciene Novais
author_facet Mazza, Luciene Novais
author_role author
dc.contributor.author.fl_str_mv Mazza, Luciene Novais
publishDate 2015
dc.date.none.fl_str_mv 2015-12-21
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
info:eu-repo/semantics/publishedVersion
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv https://revistas.unisinos.br/index.php/calidoscopio/article/view/cld.2015.133.13
url https://revistas.unisinos.br/index.php/calidoscopio/article/view/cld.2015.133.13
dc.language.iso.fl_str_mv por
language por
dc.relation.none.fl_str_mv https://revistas.unisinos.br/index.php/calidoscopio/article/view/cld.2015.133.13/5071
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Unisinos
publisher.none.fl_str_mv Unisinos
dc.source.none.fl_str_mv Calidoscópio; Vol. 13 No. 3 (2015): September/December; 424-439
2177-6202
reponame:Calidoscópio (Online)
instname:Universidade do Vale do Rio dos Sinos (UNISINOS)
instacron:Unisinos
instname_str Universidade do Vale do Rio dos Sinos (UNISINOS)
instacron_str Unisinos
institution Unisinos
reponame_str Calidoscópio (Online)
collection Calidoscópio (Online)
repository.name.fl_str_mv Calidoscópio (Online) - Universidade do Vale do Rio dos Sinos (UNISINOS)
repository.mail.fl_str_mv cmira@unisinos.br
_version_ 1792203886641020928