Tipologia de traços linguísticos de textos do português do Brasil dos séculos XVI, XVII, XVIII e XIX: uma proposta para a classificação automática de gêneros textuais

Souza, Jacqueline Aparecida de

Bibliographic details
Main author:	Souza, Jacqueline Aparecida de
Publication date:	2010
Document type:	Master's thesis (Dissertação)
Language:	por
Source title:	Repositório Institucional da UFSCAR
Full text:	https://repositorio.ufscar.br/handle/ufscar/5698
Abstract:	Drawing on the methodological principles of Corpus Linguistics and on the concepts of genre proposed by Swales (1990) and Biber (1995), this research describes linguistic features characteristic of historical texts, correlates them with their respective genres, and proposes a typology of features that makes it possible to identify the genre of each text automatically. The research used the corpus of 16th-, 17th- and 18th-century Portuguese from the Historical Dictionary of Brazilian Portuguese project (Dicionário Histórico do Português do Brasil; Institutos do Milênio program/CNPq, UNESP/Araraquara), which comprises 2,459 texts and 7.5 million words. To produce a historical description, the study started from synchronic characteristics taken from the table of contemporary features compiled by Aires (2005). The corpus was processed with Philologic and Unitex, and a dedicated tool was developed to extract and quantify the features. For classification, algorithms available in Weka (Waikato Environment for Knowledge Analysis) were used, including Naive Bayes, Bayes Net, SMO, Multilayer Perceptron, RBFNetwork, J48, and NBTree. The description is based on 62 features, which cover statistics computed over the text as a whole and over words, including classes of verbs, pronouns, and adverbs, as well as discourse markers, expressions, and lexical units. The study concludes that the genres share specific linguistic characteristics but also show patterns of their own, such as the use of particular expressions and the frequency of lexical units. Despite the limitations and complications of working with a historical corpus, the classifiers built on these features performed satisfactorily, with correct-classification rates of 84% and 92%.

Item metadata

id	SCAR_2dd2f2fcc8b482e6d3e441a37b5ad6d7
oai_identifier_str	oai:repositorio.ufscar.br:ufscar/5698
network_acronym_str	SCAR
network_name_str	Repositório Institucional da UFSCAR
repository_id_str	4322
dc.title.por.fl_str_mv	Tipologia de traços linguísticos de textos do português do Brasil dos séculos XVI, XVII, XVIII e XIX: uma proposta para a classificação automática de gêneros textuais
author	Souza, Jacqueline Aparecida de
author_facet	Souza, Jacqueline Aparecida de
author_role	author
dc.contributor.authorlattes.por.fl_str_mv	http://lattes.cnpq.br/8939049222796130
dc.contributor.author.fl_str_mv	Souza, Jacqueline Aparecida de
dc.contributor.advisor1.fl_str_mv	Almeida, Gladis Maria de Barcellos
dc.contributor.advisor1Lattes.fl_str_mv	http://lattes.cnpq.br/4046789388750478
dc.contributor.authorID.fl_str_mv	3a003217-dab7-4fa0-993c-9e1dc625cd67
contributor_str_mv	Almeida, Gladis Maria de Barcellos
dc.subject.por.fl_str_mv	Linguistics; Corpus linguistics; Machine learning; Historical corpus; Linguistic features; Textual genres; Automatic classification
dc.subject.eng.fl_str_mv	Corpus linguistics; Features; Textual genre; Automatic classification
dc.subject.cnpq.fl_str_mv	LINGUISTICA, LETRAS E ARTES::LINGUISTICA
publishDate	2010
dc.date.issued.fl_str_mv	2010-02-26
dc.date.available.fl_str_mv	2011-01-17 2016-06-02T20:25:07Z
dc.date.accessioned.fl_str_mv	2016-06-02T20:25:07Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
format	masterThesis
status_str	publishedVersion
dc.identifier.citation.fl_str_mv	SOUZA, Jacqueline Aparecida de. Tipologia de traços linguísticos de textos do português do Brasil dos séculos XVI, XVII, XVIII e XIX: uma proposta para a classificação automática de gêneros textuais. 2010. 167 f. Dissertação (Mestrado em Ciências Humanas) - Universidade Federal de São Carlos, São Carlos, 2010.
dc.identifier.uri.fl_str_mv	https://repositorio.ufscar.br/handle/ufscar/5698
url	https://repositorio.ufscar.br/handle/ufscar/5698
dc.language.iso.fl_str_mv	por
language	por
dc.relation.confidence.fl_str_mv	-1 -1
dc.relation.authority.fl_str_mv	8eac6ac4-a936-48dd-b9d7-997dc0548cbc
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.publisher.none.fl_str_mv	Universidade Federal de São Carlos
dc.publisher.program.fl_str_mv	Programa de Pós-Graduação em Linguística - PPGL
dc.publisher.initials.fl_str_mv	UFSCar
dc.publisher.country.fl_str_mv	BR
publisher.none.fl_str_mv	Universidade Federal de São Carlos
dc.source.none.fl_str_mv	reponame:Repositório Institucional da UFSCAR instname:Universidade Federal de São Carlos (UFSCAR) instacron:UFSCAR
instname_str	Universidade Federal de São Carlos (UFSCAR)
instacron_str	UFSCAR
institution	UFSCAR
reponame_str	Repositório Institucional da UFSCAR
collection	Repositório Institucional da UFSCAR
bitstream.url.fl_str_mv	https://repositorio.ufscar.br/bitstream/ufscar/5698/1/3377.pdf https://repositorio.ufscar.br/bitstream/ufscar/5698/2/3377.pdf.jpg
bitstream.checksum.fl_str_mv	d15885076635f742d9e61ee253c4d220 ccf23b4b219d4019d39a7d2d30cf1de1
bitstream.checksumAlgorithm.fl_str_mv	MD5 MD5
repository.name.fl_str_mv	Repositório Institucional da UFSCAR - Universidade Federal de São Carlos (UFSCAR)
repository.mail.fl_str_mv
_version_	1813715545756794880
