Uma análise comparativa entre as abordagens linguística e estatística para extração automática de termos relevantes de corpora

Santos, Carlos Alberto dos

Bibliographic details
Main author:	Santos, Carlos Alberto dos
Publication date:	2018
Document type:	Master's thesis
Language:	por
Source title:	Biblioteca Digital de Teses e Dissertações da PUC_RS
Full text:	http://tede2.pucrs.br/tede2/handle/tede/8233
Abstract:	Linguistic processing of corpora is known to demand high computational effort because of the complexity of its algorithms, yet its results are generally considered better than those of statistical processing, whose computational demand is lower. This dissertation describes a comparative analysis of the linguistic and statistical approaches to term extraction. Experiments were carried out on four English-language corpora built from scientific papers, from which terms were extracted using both approaches. The resulting term lists were refined with relevance metrics and a stop list, and then compared against the reference lists of the corpora using recall. These reference lists, in turn, were built from the context of the corpora with the help of Internet searches. The results showed that statistical extraction combined with the stop list and relevance metrics can produce results superior to those of linguistic extraction refined with the same metrics. The conclusion is that the statistical approach combined with these techniques can be the ideal option for extracting relevant terms, since it requires few computational resources and yields results superior to those obtained with linguistic processing.
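
The pipeline summarized in the abstract (statistical extraction, stop-list filtering, ranking, and recall against a reference list) can be illustrated with a minimal sketch. This is not the dissertation's implementation: the abstract does not name the relevance metrics or the stop list used, so raw n-gram frequency stands in for a relevance metric, and STOP_WORDS, extract_candidates, top_terms and recall are hypothetical names chosen for illustration.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative stop list; a real one would be far longer.
STOP_WORDS = {"the", "of", "a", "an", "and", "is", "in", "to", "for", "on", "with"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def extract_candidates(text, max_n=2):
    """Count n-grams (n = 1..max_n) that neither start nor end with a stop word."""
    tokens = tokenize(text)
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if gram[0] in STOP_WORDS or gram[-1] in STOP_WORDS:
                continue
            counts[" ".join(gram)] += 1
    return counts


def top_terms(counts, k):
    """Return the k most frequent candidates.

    Assumption: raw frequency is used here as a stand-in relevance metric.
    """
    return [term for term, _ in counts.most_common(k)]


def recall(extracted, reference):
    """Fraction of reference terms that appear in the extracted list."""
    reference = set(reference)
    if not reference:
        return 0.0
    return len(reference & set(extracted)) / len(reference)
```

With a reference list of four terms, an extracted list that contains two of them has a recall of 0.5. Recall ignores how many non-reference terms were extracted, so it measures coverage of the reference list rather than precision.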

Item metadata

id	P_RS_2b38de42a0ca15b7235729b0a9de9d55
oai_identifier_str	oai:tede2.pucrs.br:tede/8233
network_acronym_str	P_RS
network_name_str	Biblioteca Digital de Teses e Dissertações da PUC_RS
repository_id_str
spelling	Submitted by PPG Ciência da Computação (ppgcc@pucrs.br) on 2018-07-26T19:48:07Z. Approved for entry into archive by Sheila Dias (sheila.dias@pucrs.br) on 2018-08-01T13:39:36Z (GMT). Made available in DSpace on 2018-08-01T14:31:21Z (GMT). No. of bitstreams: 1. CARLOS ALBERTO DOS SANTOS_DIS.pdf: 1271475 bytes, checksum: 856ae87ad633d3c772b413816caa43d1 (MD5). Previous issue date: 2018-04-27. The work has no publication restrictions.
dc.title.por.fl_str_mv	Uma análise comparativa entre as abordagens linguística e estatística para extração automática de termos relevantes de corpora
title	Uma análise comparativa entre as abordagens linguística e estatística para extração automática de termos relevantes de corpora
author	Santos, Carlos Alberto dos
author_facet	Santos, Carlos Alberto dos
author_role	author
dc.contributor.advisor1.fl_str_mv	Vieira, Renata
dc.contributor.advisor1Lattes.fl_str_mv	http://lattes.cnpq.br/6218967777630412
dc.contributor.author.fl_str_mv	Santos, Carlos Alberto dos
contributor_str_mv	Vieira, Renata
dc.subject.por.fl_str_mv	Extração de Termos; Mineração de Texto; Lista de Referência; Métricas Estatísticas; Extração Linguística; Extração Estatística; Stop List
topic	Extração de Termos; Mineração de Texto; Lista de Referência; Métricas Estatísticas; Extração Linguística; Extração Estatística; Stop List; Term Extraction; Text Mining; Reference List; Stop List; Statistical Metrics; Linguistic Extraction; Statistical Extraction; CIENCIA DA COMPUTACAO::TEORIA DA COMPUTACAO
dc.subject.eng.fl_str_mv	Term Extraction; Text Mining; Reference List; Stop List; Statistical Metrics; Linguistic Extraction; Statistical Extraction
dc.subject.cnpq.fl_str_mv	CIENCIA DA COMPUTACAO::TEORIA DA COMPUTACAO
publishDate	2018
dc.date.accessioned.fl_str_mv	2018-08-01T14:31:21Z
dc.date.issued.fl_str_mv	2018-04-27
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
format	masterThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://tede2.pucrs.br/tede2/handle/tede/8233
url	http://tede2.pucrs.br/tede2/handle/tede/8233
dc.language.iso.fl_str_mv	por
language	por
dc.relation.program.fl_str_mv	1974996533081274470
dc.relation.confidence.fl_str_mv	500 500
dc.relation.cnpq.fl_str_mv	-862078257083325301
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.publisher.none.fl_str_mv	Pontifícia Universidade Católica do Rio Grande do Sul
dc.publisher.program.fl_str_mv	Programa de Pós-Graduação em Ciência da Computação
dc.publisher.initials.fl_str_mv	PUCRS
dc.publisher.country.fl_str_mv	Brasil
dc.publisher.department.fl_str_mv	Escola Politécnica
publisher.none.fl_str_mv	Pontifícia Universidade Católica do Rio Grande do Sul
dc.source.none.fl_str_mv	reponame:Biblioteca Digital de Teses e Dissertações da PUC_RS instname:Pontifícia Universidade Católica do Rio Grande do Sul (PUCRS) instacron:PUC_RS
instname_str	Pontifícia Universidade Católica do Rio Grande do Sul (PUCRS)
instacron_str	PUC_RS
institution	PUC_RS
reponame_str	Biblioteca Digital de Teses e Dissertações da PUC_RS
collection	Biblioteca Digital de Teses e Dissertações da PUC_RS
bitstream.url.fl_str_mv	http://tede2.pucrs.br/tede2/bitstream/tede/8233/4/CARLOS+ALBERTO+DOS+SANTOS_DIS.pdf.jpg http://tede2.pucrs.br/tede2/bitstream/tede/8233/3/CARLOS+ALBERTO+DOS+SANTOS_DIS.pdf.txt http://tede2.pucrs.br/tede2/bitstream/tede/8233/2/CARLOS+ALBERTO+DOS+SANTOS_DIS.pdf http://tede2.pucrs.br/tede2/bitstream/tede/8233/1/license.txt
bitstream.checksum.fl_str_mv	103da34f8ec836cec3f1b9ef796f9b6a 90a2757919191ba894d9112d019e106d 856ae87ad633d3c772b413816caa43d1 5a9d6006225b368ef605ba16b4f6d1be
bitstream.checksumAlgorithm.fl_str_mv	MD5 MD5 MD5 MD5
repository.name.fl_str_mv	Biblioteca Digital de Teses e Dissertações da PUC_RS - Pontifícia Universidade Católica do Rio Grande do Sul (PUCRS)
repository.mail.fl_str_mv	biblioteca.central@pucrs.br
_version_	1799765334964568064
