AnoTex: structured data filtering routine of the scientific article genre as contribution to PLN

Fonseca, Cláudia Aparecida; Souza Netto, Rafael Santiago de; Guelpeli, Marcus Vinícius Carvalho; Bodolay, Adriana Nascimento

Bibliographic details
Main author:	Fonseca, Cláudia Aparecida
Publication date:	2018
Other authors:	Souza Netto, Rafael Santiago de; Guelpeli, Marcus Vinícius Carvalho; Bodolay, Adriana Nascimento
Document type:	Article
Language:	Portuguese (por)
Source title:	Texto livre
DOI:	10.17851/1983-3652.11.3.40-64
Full text:	https://periodicos.ufmg.br/index.php/textolivre/article/view/16811
Abstract:	The diversity of language resources, which enables the construction of Natural Language Processing applications, creates a need for tools that are equally flexible. In addition, these tools should be as user-friendly as they are useful, in order to reduce the effort for new users while still performing efficiently for expert users. This article presents AnoTex, a textual annotator that filters structured data of the scientific article genre, collected from the files available in the database of SciELO – Scientific Electronic Library Online. The extraction process produced a database of filtered information structured in XML format, which delimits and identifies the markings of the genre under analysis and is available for use in various tools and applications. Other existing text annotation tools are presented, and it is argued that AnoTex is the first to combine a good level of ease of use with structured resources, constitutive of the genre, of high linguistic quality. The results demonstrate how categorizing the constituent elements of the genre, through their representation in treebanks, can condense the available information in a hierarchical and dynamic way, built during compilation. These features may suggest new usage strategies for the collected markings, meeting needs for improved access to and retrieval of information through text processing tools. KEYWORDS: Natural Language Processing; textual genre; textual annotator; annotation of corpus.

Item metadata

id	UFMG-9_ec006b44a899084e815c6c70b5eebb4d
oai_identifier_str	oai:periodicos.ufmg.br:article/16811
network_acronym_str	UFMG-9
network_name_str	Texto livre
spelling	2021-03-22T13:32:34Z; Revista; http://www.periodicos.letras.ufmg.br/index.php/textolivre; PUB; https://periodicos.ufmg.br/index.php/textolivre/oai; revistatextolivre@letras.ufmg.br; 1983-3652; opendoar:2021-03-22T13:32:34
dc.title.none.fl_str_mv	AnoTex: structured data filtering routine of the scientific article genre as contribution to PLN; AnoTex: rotina de filtragem de dados estruturados do gênero artigo científico como contribuição para o PLN
title	AnoTex: structured data filtering routine of the scientific article genre as contribution to PLN
spellingShingle	AnoTex: structured data filtering routine of the scientific article genre as contribution to PLN; Fonseca, Cláudia Aparecida; Natural Language Processing; textual genre; textual annotator; annotation of corpus
title_short	AnoTex: structured data filtering routine of the scientific article genre as contribution to PLN
title_full	AnoTex: structured data filtering routine of the scientific article genre as contribution to PLN
title_fullStr	AnoTex: structured data filtering routine of the scientific article genre as contribution to PLN
title_full_unstemmed	AnoTex: structured data filtering routine of the scientific article genre as contribution to PLN
title_sort	AnoTex: structured data filtering routine of the scientific article genre as contribution to PLN
author	Fonseca, Cláudia Aparecida
author_facet	Fonseca, Cláudia Aparecida; Souza Netto, Rafael Santiago de; Guelpeli, Marcus Vinícius Carvalho; Bodolay, Adriana Nascimento
author_role	author
author2	Souza Netto, Rafael Santiago de; Guelpeli, Marcus Vinícius Carvalho; Bodolay, Adriana Nascimento
author2_role	author author author
dc.contributor.author.fl_str_mv	Fonseca, Cláudia Aparecida; Souza Netto, Rafael Santiago de; Guelpeli, Marcus Vinícius Carvalho; Bodolay, Adriana Nascimento
dc.subject.por.fl_str_mv	Processamento de Linguagem Natural; gênero textual; anotador textual; anotação de corpus
topic	Natural Language Processing; textual genre; textual annotator; annotation of corpus
publishDate	2018
dc.date.none.fl_str_mv	2018-12-26
dc.type.driver.fl_str_mv	info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion
format	article
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	https://periodicos.ufmg.br/index.php/textolivre/article/view/16811 10.17851/1983-3652.11.3.40-64
url	https://periodicos.ufmg.br/index.php/textolivre/article/view/16811
identifier_str_mv	10.17851/1983-3652.11.3.40-64
dc.language.iso.fl_str_mv	por
language	por
dc.relation.none.fl_str_mv	https://periodicos.ufmg.br/index.php/textolivre/article/view/16811/13572
dc.rights.driver.fl_str_mv	Copyright (c) 2018 Texto Livre: Linguagem e Tecnologia info:eu-repo/semantics/openAccess
rights_invalid_str_mv	Copyright (c) 2018 Texto Livre: Linguagem e Tecnologia
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.publisher.none.fl_str_mv	Universidade Federal de Minas Gerais
publisher.none.fl_str_mv	Universidade Federal de Minas Gerais
dc.source.none.fl_str_mv	Texto Livre; Vol. 11 No. 3 (2018): Texto Livre: Linguagem e Tecnologia; 40-64 Texto Livre; Vol. 11 Núm. 3 (2018): Texto Livre: Linguagem e Tecnologia; 40-64 Texto Livre; Vol. 11 No 3 (2018): Texto Livre: Linguagem e Tecnologia; 40-64 Texto Livre; v. 11 n. 3 (2018): Texto Livre: Linguagem e Tecnologia; 40-64 1983-3652 reponame:Texto livre instname:Universidade Federal de Minas Gerais (UFMG) instacron:UFMG
instname_str	Universidade Federal de Minas Gerais (UFMG)
instacron_str	UFMG
institution	UFMG
reponame_str	Texto livre
collection	Texto livre
repository.name.fl_str_mv	Texto livre - Universidade Federal de Minas Gerais (UFMG)
repository.mail.fl_str_mv	revistatextolivre@letras.ufmg.br
_version_	1822183026528878592
dc.identifier.doi.none.fl_str_mv	10.17851/1983-3652.11.3.40-64
