Using structured and unstructured data for product price prediction

Bibliographic details
Main author: CARVALHO, Giovanni Paolo Santos de
Publication date: 2020
Document type: Dissertation (master's)
Language: English
Source: Repositório Institucional da UFPE
Full text: https://repositorio.ufpe.br/handle/123456789/39488
Abstract: Product price estimation is a relatively new trend in e-commerce that helps customers decide whether to buy or sell a product by giving them a starting point for what a fair price could be. In this work, we are particularly interested in predicting prices from online product offers. These offers usually include a natural-language text describing the product (unstructured data) and a specification listing its properties (structured data). In this dissertation, we aim to predict the price of product offers based on both structured and unstructured information. To that end, we propose an attention-based network that models the structured data on its own as well as its interaction with the unstructured data, and combines both to perform the prediction. For the structured information, we apply a regular fully-connected network; to model the interaction between the two modalities (the product's properties and its description), we employ a co-attention network. These networks are combined and used by a neural network regressor to learn a vector representation (embedding) of the product offer. This vector can then be used as a feature set by any regressor to perform product price prediction. The architecture is designed to work with generic structured and unstructured product-offer data; in this study, it is evaluated on a car price prediction task, for which we collected a dataset by scraping 11 car classifieds websites. Our experimental evaluation shows that: (1) regressors using the learned embedding obtained the best results, outperforming the same regressors trained on the raw features in almost all scenarios; and (2) simple models such as Linear Regression using the learned embedding achieved results comparable to stronger algorithms such as LightGBM.
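The abstract describes the architecture only at a high level. The following is a minimal, illustrative sketch of how its forward pass could look in plain Python, assuming a simple dot-product co-attention with mean pooling and a single ReLU layer for the structured features; the dissertation's actual co-attention formulation, layer sizes, and pooling are not stated in this record. All function names (`co_attention`, `offer_embedding`, etc.), dimensions, and weights are hypothetical, and it is further assumed that each product property is already embedded as a vector with the same size as the description word vectors. The weights here are random and untrained, whereas in the described approach they would be learned by the neural network regressor.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def weighted_sum(weights, vectors):
    """Return sum_k weights[k] * vectors[k]."""
    dim = len(vectors[0])
    return [sum(w * vec[t] for w, vec in zip(weights, vectors)) for t in range(dim)]


def mean_vector(vectors):
    dim = len(vectors[0])
    return [sum(vec[t] for vec in vectors) / len(vectors) for t in range(dim)]


def dense_relu(x, weights, bias):
    """One fully-connected layer with ReLU: relu(W x + b). Hypothetical sizes."""
    return [max(0.0, dot(row, x) + b) for row, b in zip(weights, bias)]


def co_attention(props, words):
    """Assumed dot-product co-attention between properties and description words.

    props: m vectors of size d; words: n vectors of size d.
    Returns 2*d values: the pooled description-aware property context,
    followed by the pooled property-aware word context.
    """
    affinity = [[dot(p, w) for w in words] for p in props]  # m x n
    # Each property attends over the description words (softmax along a row).
    prop_ctx = [weighted_sum(softmax(row), words) for row in affinity]
    # Each word attends over the properties (softmax along a column).
    columns = [[affinity[i][j] for i in range(len(props))] for j in range(len(words))]
    word_ctx = [weighted_sum(softmax(col), props) for col in columns]
    return mean_vector(prop_ctx) + mean_vector(word_ctx)  # list concatenation


def offer_embedding(numeric, props, words, mlp_w, mlp_b):
    """Concatenate the fully-connected view of the structured data with the co-attention view."""
    return dense_relu(numeric, mlp_w, mlp_b) + co_attention(props, words)


if __name__ == "__main__":
    rng = random.Random(0)
    d, hidden = 4, 3
    # Hypothetical inputs: 2 normalized numeric features, 3 property vectors,
    # and 5 description-token vectors, all randomly generated.
    numeric = [0.8, -0.3]
    props = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(3)]
    words = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(5)]
    mlp_w = [[rng.gauss(0, 1) for _ in range(2)] for _ in range(hidden)]
    mlp_b = [0.0] * hidden
    emb = offer_embedding(numeric, props, words, mlp_w, mlp_b)
    print(len(emb))  # hidden + 2 * d = 11
```

The resulting vector plays the role of the offer embedding: once the network has been trained, such a vector could be passed as a feature set to any downstream regressor, for example Linear Regression or LightGBM, as the abstract describes.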
id UFPE_f3e099c9bc8f1be32fff340361731927
oai_identifier_str oai:repositorio.ufpe.br:123456789/39488
network_acronym_str UFPE
network_name_str Repositório Institucional da UFPE
repository_id_str 2221
dc.title.pt_BR.fl_str_mv Using structured and unstructured data for product price prediction
dc.contributor.authorLattes.pt_BR.fl_str_mv http://lattes.cnpq.br/0657811885828755
dc.contributor.advisorLattes.pt_BR.fl_str_mv http://lattes.cnpq.br/7113249247656195
dc.contributor.advisor-coLattes.pt_BR.fl_str_mv http://lattes.cnpq.br/3084134533707587
dc.contributor.author.fl_str_mv CARVALHO, Giovanni Paolo Santos de
dc.contributor.advisor1.fl_str_mv BARBOSA, Luciano de Andrade
dc.contributor.advisor-co1.fl_str_mv REN, Tsang Ing
dc.subject.por.fl_str_mv Inteligência computacional (Computational intelligence)
Otimização (Optimization)
dc.date.issued.fl_str_mv 2020-01-17
dc.date.accessioned.fl_str_mv 2021-03-26T15:39:52Z
dc.date.available.fl_str_mv 2021-03-26T15:39:52Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
dc.identifier.citation.fl_str_mv CARVALHO, Giovanni Paolo Santos de. Using structured and unstructured data for product price prediction. 2020. Master's dissertation (Computer Science), Universidade Federal de Pernambuco, Recife, 2020.
dc.identifier.uri.fl_str_mv https://repositorio.ufpe.br/handle/123456789/39488
dc.language.iso.fl_str_mv eng
dc.rights.driver.fl_str_mv http://creativecommons.org/licenses/by-nc-nd/3.0/br/
info:eu-repo/semantics/openAccess
dc.publisher.none.fl_str_mv Universidade Federal de Pernambuco
dc.publisher.program.fl_str_mv Programa de Pos Graduacao em Ciencia da Computacao
dc.publisher.initials.fl_str_mv UFPE
dc.publisher.country.fl_str_mv Brasil
dc.source.none.fl_str_mv reponame:Repositório Institucional da UFPE
instname:Universidade Federal de Pernambuco (UFPE)
instacron:UFPE
bitstream.url.fl_str_mv https://repositorio.ufpe.br/bitstream/123456789/39488/4/DISSERTA%c3%87%c3%83O%20Giovanni%20Paolo%20Santos%20de%20Carvalho.pdf.txt
https://repositorio.ufpe.br/bitstream/123456789/39488/5/DISSERTA%c3%87%c3%83O%20Giovanni%20Paolo%20Santos%20de%20Carvalho.pdf.jpg
https://repositorio.ufpe.br/bitstream/123456789/39488/1/DISSERTA%c3%87%c3%83O%20Giovanni%20Paolo%20Santos%20de%20Carvalho.pdf
https://repositorio.ufpe.br/bitstream/123456789/39488/2/license_rdf
https://repositorio.ufpe.br/bitstream/123456789/39488/3/license.txt
bitstream.checksum.fl_str_mv 154e520c9ba20b92d0eff946141b7983
b0ceb36fcc9e44b95cbb84b21725217b
1ff40afc18228a403022d1b471424039
e39d27027a6cc9cb039ad269a5db8e34
bd573a5ca8288eb7272482765f819534
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
MD5
MD5
MD5
repository.name.fl_str_mv Repositório Institucional da UFPE - Universidade Federal de Pernambuco (UFPE)
repository.mail.fl_str_mv attena@ufpe.br