Vector representation of texts applied to prediction models

Bibliographic details
Main author: Stern, Deborah Bassi
Publication date: 2020
Document type: Master's thesis
Language: eng
Source title: Repositório Institucional da UFSCAR
Full text: https://repositorio.ufscar.br/handle/ufscar/12362
Abstract: Natural Language Processing has gone through substantial changes over time. It was only recently that statistical approaches started receiving attention. The Word2Vec model is one of these. It is a shallow neural network designed to fit vector representations of words according to their syntactic and semantic values. The word embeddings obtained by this method are state of the art. The method has many uses, one of which is fitting prediction models based on texts. It is common in the literature for a text to be represented as the mean of its word embeddings. The resulting vector is then used in the predictive model as a set of explanatory variables. In this dissertation, we propose extracting more information from a text by adding other summary statistics besides the mean, such as other moments and quantiles. The resulting improvement in the prediction models is studied on real datasets.
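The idea in the abstract can be illustrated with a minimal sketch. It is not the dissertation's implementation: the toy embeddings, the function name text_features, and the particular statistics chosen (standard deviation as a second-moment summary, and the three quartiles) are assumptions made here for illustration. The abstract only states that "other moments and quantiles" are added to the mean.

```python
import numpy as np

# Toy 3-dimensional embeddings, invented for illustration only; in practice
# they would come from a trained Word2Vec model.
EMBEDDINGS = {
    "good": np.array([1.0, 0.0, 2.0]),
    "bad": np.array([-1.0, 0.0, 0.0]),
    "movie": np.array([0.0, 3.0, 1.0]),
}


def text_features(tokens, embeddings, quantiles=(0.25, 0.5, 0.75), mean_only=False):
    """Summarise the word embeddings of a text, coordinate by coordinate.

    Hypothetical helper: returns the mean alone (the common baseline) or the
    mean followed by the standard deviation and the requested quantiles.
    """
    # Stack the vectors of the known words into an (n_words, dim) matrix.
    vectors = np.array([embeddings[t] for t in tokens if t in embeddings])
    if vectors.size == 0:
        raise ValueError("no token in the text has an embedding")

    mean = vectors.mean(axis=0)
    if mean_only:
        return mean

    std = vectors.std(axis=0)                     # population standard deviation
    qs = np.quantile(vectors, quantiles, axis=0)  # shape (n_quantiles, dim)
    # Layout: [mean | std | first quantile | second quantile | ...]
    return np.concatenate([mean, std, qs.ravel()])


tokens = "good movie bad".split()
baseline = text_features(tokens, EMBEDDINGS, mean_only=True)
extended = text_features(tokens, EMBEDDINGS)
```

Either vector can be passed to a regression or classification model as the row of explanatory variables for that text. The extended version has dim * (2 + number of quantiles) entries instead of dim, so it carries information about the spread of the word vectors that the mean alone discards.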
id SCAR_31e9e6ebc15d6795e4df203dd2417ce2
oai_identifier_str oai:repositorio.ufscar.br:ufscar/12362
network_acronym_str SCAR
network_name_str Repositório Institucional da UFSCAR
repository_id_str 4322
funding Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES), Finance Code 001
dc.title.eng.fl_str_mv Vector representation of texts applied to prediction models
dc.title.alternative.por.fl_str_mv Representações vetoriais de textos aplicados a modelos preditivos
dc.contributor.authorlattes.por.fl_str_mv http://lattes.cnpq.br/0580971589615088
dc.contributor.author.fl_str_mv Stern, Deborah Bassi
dc.contributor.advisor1.fl_str_mv Izbicki, Rafael
dc.contributor.advisor1Lattes.fl_str_mv http://lattes.cnpq.br/9991192137633896
dc.contributor.authorID.fl_str_mv 3fafee68-46cd-4b16-addf-945cc55088d8
dc.subject.por.fl_str_mv Processamento de linguagem natural
Redes neurais
Representação vetorial de palavras
Modelos de predição
dc.subject.eng.fl_str_mv Natural language processing
Neural networks
WordVectors
Prediction models
dc.subject.cnpq.fl_str_mv CIENCIAS EXATAS E DA TERRA::PROBABILIDADE E ESTATISTICA
publishDate 2020
dc.date.accessioned.fl_str_mv 2020-03-27T16:25:42Z
dc.date.available.fl_str_mv 2020-03-27T16:25:42Z
dc.date.issued.fl_str_mv 2020-03-09
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.citation.fl_str_mv STERN, Deborah Bassi. Vector representation of texts applied to prediction models. 2020. Dissertação (Mestrado em Estatística) – Universidade Federal de São Carlos, São Carlos, 2020. Disponível em: https://repositorio.ufscar.br/handle/ufscar/12362.
dc.identifier.uri.fl_str_mv https://repositorio.ufscar.br/handle/ufscar/12362
dc.language.iso.fl_str_mv eng
dc.relation.confidence.fl_str_mv 600
600
dc.relation.authority.fl_str_mv 3e57f161-19fe-4345-9e87-bc60eb7be98f
dc.rights.driver.fl_str_mv Attribution-NonCommercial-NoDerivs 3.0 Brazil
http://creativecommons.org/licenses/by-nc-nd/3.0/br/
info:eu-repo/semantics/openAccess
dc.publisher.none.fl_str_mv Universidade Federal de São Carlos
Câmpus São Carlos
dc.publisher.program.fl_str_mv Programa Interinstitucional de Pós-Graduação em Estatística - PIPGEs
dc.publisher.initials.fl_str_mv UFSCar
dc.source.none.fl_str_mv reponame:Repositório Institucional da UFSCAR
instname:Universidade Federal de São Carlos (UFSCAR)
instacron:UFSCAR
bitstream.url.fl_str_mv https://repositorio.ufscar.br/bitstream/ufscar/12362/1/dissertacao.pdf
https://repositorio.ufscar.br/bitstream/ufscar/12362/2/pipges-ufscar_cartacomprovante_.pdf
https://repositorio.ufscar.br/bitstream/ufscar/12362/3/license_rdf
https://repositorio.ufscar.br/bitstream/ufscar/12362/4/dissertacao.pdf.txt
https://repositorio.ufscar.br/bitstream/ufscar/12362/6/pipges-ufscar_cartacomprovante_.pdf.txt
https://repositorio.ufscar.br/bitstream/ufscar/12362/5/dissertacao.pdf.jpg
https://repositorio.ufscar.br/bitstream/ufscar/12362/7/pipges-ufscar_cartacomprovante_.pdf.jpg
bitstream.checksum.fl_str_mv 9cf919af9fe04ae0ab3391925268f534
f1a5e158a646cc4da736d267e95271c4
e39d27027a6cc9cb039ad269a5db8e34
3ad80f646b83608578ebe3ad8c440946
55d83805226deb1ac3e0ecf9322b273a
2b1cc65704a4cf6601cd8442801da45f
d53fe2474cb075c4525d913a606a5399
bitstream.checksumAlgorithm.fl_str_mv MD5
repository.name.fl_str_mv Repositório Institucional da UFSCAR - Universidade Federal de São Carlos (UFSCAR)