Vector representation of texts applied to prediction models

Stern, Deborah Bassi


Bibliographic details
Main author:	Stern, Deborah Bassi
Publication date:	2020
Document type:	Master's dissertation
Language:	eng
Source title:	Biblioteca Digital de Teses e Dissertações da USP
Full text:	https://www.teses.usp.br/teses/disponiveis/104/104131/tde-10062020-102333/
Abstract:	Natural Language Processing has gone through substantial changes over time. Only recently did statistical approaches start receiving attention. The Word2Vec model is one of these: a shallow neural network designed to fit vector representations of words according to their syntactic and semantic values. The word embeddings obtained by this method are state of the art. The method has many uses, one of which is fitting prediction models based on texts. It is common in the literature for a text to be represented as the mean of its word embeddings; the resulting vector is then used in the predictive model as a set of explanatory variables. In this dissertation, we propose extracting more information from the text by adding other summary statistics besides the mean, such as other moments and quantiles. The improvement in the prediction models is studied on real datasets.
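
The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the dissertation's implementation: the toy 3-dimensional embeddings, the function name `text_features`, and the particular choice of statistics (mean, variance, and three quantiles per dimension) are assumptions made here for illustration. In practice the vectors would come from a trained Word2Vec model.

```python
import numpy as np

# Hypothetical toy embeddings (assumption): real Word2Vec vectors come from
# a trained model and usually have far more dimensions.
TOY_EMBEDDINGS = {
    "good": np.array([0.9, 0.1, 0.3]),
    "movie": np.array([0.2, 0.8, 0.5]),
    "bad": np.array([-0.7, 0.2, 0.4]),
}


def text_features(tokens, embeddings, quantiles=(0.25, 0.5, 0.75)):
    """Summarize a text by per-dimension statistics of its word embeddings.

    Returns the concatenation of the mean, the variance, and the requested
    quantiles, each computed across the words of the text.
    """
    # Keep only tokens that have an embedding; unknown words are skipped.
    vectors = np.array([embeddings[t] for t in tokens if t in embeddings])
    if vectors.size == 0:
        raise ValueError("none of the tokens has an embedding")

    mean = vectors.mean(axis=0)                   # usual text representation
    variance = vectors.var(axis=0)                # a second-moment summary
    quants = np.quantile(vectors, quantiles, axis=0)  # (n_quantiles, dim)

    return np.concatenate([mean, variance, quants.ravel()])


# "unknown" has no embedding, so only "good" and "movie" contribute.
features = text_features(["good", "movie", "unknown"], TOY_EMBEDDINGS)
```

With 3-dimensional embeddings, the resulting vector has 3 (mean) + 3 (variance) + 9 (three quantiles) = 15 entries, all of which could be passed to a predictive model as explanatory variables in place of the mean alone.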

Item metadata

id	USP_c550f9efa5cfb5fa4884a784ab4ea64a
oai_identifier_str	oai:teses.usp.br:tde-10062020-102333
network_acronym_str	USP
network_name_str	Biblioteca Digital de Teses e Dissertações da USP
repository_id_str	2721
dc.title.none.fl_str_mv	Vector representation of texts applied to prediction models Representações vetoriais de textos aplicados a modelos preditivos
title	Vector representation of texts applied to prediction models
title_short	Vector representation of texts applied to prediction models
title_full	Vector representation of texts applied to prediction models
title_fullStr	Vector representation of texts applied to prediction models
title_full_unstemmed	Vector representation of texts applied to prediction models
title_sort	Vector representation of texts applied to prediction models
author	Stern, Deborah Bassi
author_facet	Stern, Deborah Bassi
author_role	author
dc.contributor.none.fl_str_mv	Izbicki, Rafael
dc.contributor.author.fl_str_mv	Stern, Deborah Bassi
dc.subject.por.fl_str_mv	Modelos de predição; Natural language processing; Neural networks; Prediction models; Processamento de linguagem natural; Redes neurais; Representação vetorial de palavras; WordVectors
topic	Modelos de predição; Natural language processing; Neural networks; Prediction models; Processamento de linguagem natural; Redes neurais; Representação vetorial de palavras; WordVectors
description	Natural Language Processing has gone through substantial changes over time. Only recently did statistical approaches start receiving attention. The Word2Vec model is one of these: a shallow neural network designed to fit vector representations of words according to their syntactic and semantic values. The word embeddings obtained by this method are state of the art. The method has many uses, one of which is fitting prediction models based on texts. It is common in the literature for a text to be represented as the mean of its word embeddings; the resulting vector is then used in the predictive model as a set of explanatory variables. In this dissertation, we propose extracting more information from the text by adding other summary statistics besides the mean, such as other moments and quantiles. The improvement in the prediction models is studied on real datasets.
publishDate	2020
dc.date.none.fl_str_mv	2020-03-09
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
format	masterThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	https://www.teses.usp.br/teses/disponiveis/104/104131/tde-10062020-102333/
url	https://www.teses.usp.br/teses/disponiveis/104/104131/tde-10062020-102333/
dc.language.iso.fl_str_mv	eng
language	eng
dc.relation.none.fl_str_mv
dc.rights.driver.fl_str_mv	Release the content for public access. info:eu-repo/semantics/openAccess
rights_invalid_str_mv	Release the content for public access.
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.coverage.none.fl_str_mv
dc.publisher.none.fl_str_mv	Biblioteca Digital de Teses e Dissertações da USP
publisher.none.fl_str_mv	Biblioteca Digital de Teses e Dissertações da USP
dc.source.none.fl_str_mv	reponame:Biblioteca Digital de Teses e Dissertações da USP instname:Universidade de São Paulo (USP) instacron:USP
instname_str	Universidade de São Paulo (USP)
instacron_str	USP
institution	USP
reponame_str	Biblioteca Digital de Teses e Dissertações da USP
collection	Biblioteca Digital de Teses e Dissertações da USP
repository.name.fl_str_mv	Biblioteca Digital de Teses e Dissertações da USP - Universidade de São Paulo (USP)
repository.mail.fl_str_mv	virginia@if.usp.br || atendimento@aguia.usp.br || virginia@if.usp.br
_version_	1826319343478112256
