Exploiting linked data in Dbpedia to reduce prediction error in matrix factorization recommenders

Bibliographic details
Main author: Pereira, Victor Martinez Vidal
Publication date: 2022
Document type: Dissertation
Language: eng
Source title: Repositório Institucional da UFBA
Full text: https://repositorio.ufba.br/handle/ri/36099
Abstract: Recommender Systems suggest the items most likely to interest each user. Providing personalized recommendations is a challenge that can be addressed by filtering algorithms, among which Collaborative Filtering (CF) has made considerable progress in the last few years. CF methods based on Matrix Factorization (MF) use optimization algorithms to reduce prediction error. However, they still commonly suffer from data sparsity and residual prediction error. Studies point to data available in the Semantic Web as a way to improve recommender systems and address the challenges related to CF techniques. Motivated by these premises, the present work, conducted by the author at the RecSys Research Group at UFBA, developed a data pipeline and an algorithm that processes the Ratings Matrix, combining semantic similarities computed from Linked Open Data (LOD) to estimate missing ratings. The experiments took subsets of 1000 samples from three different datasets (MovieLens, LastFM and LibraryThing), calculated two semantic similarity metrics, Linked Data Similarity Distance (LDSD) and Resource Similarity (RESIM), and applied three MF-based algorithms (SVD, SVD++ and NMF). Results suggest that the proposed pipeline reduces the Root Mean Square Error (RMSE) on all subsets, with statistical confidence supported by a parametric one-way ANOVA followed by Tukey's multiple comparison test.
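The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea it describes: filling gaps in a ratings matrix with semantic similarities and scoring predictions with RMSE. It is not the dissertation's method. The toy ratings, the similarity values and the helper names (`sim`, `estimate`, `densify`, `rmse`) are all assumptions made for illustration; in particular, a distance such as LDSD would first have to be converted into a similarity (for example 1 - distance, assuming the distance is normalized to [0, 1]).

```python
import math

# Hypothetical toy data: user -> {item: rating}. Not taken from the dissertation.
RATINGS = {
    "u1": {"A": 5.0, "B": 3.0},
    "u2": {"A": 4.0, "C": 2.0},
}

# Hypothetical symmetric item-item similarities in [0, 1], standing in for
# values that a metric such as LDSD or RESIM could yield after conversion.
SIMILARITY = {
    frozenset(("A", "B")): 0.8,
    frozenset(("A", "C")): 0.5,
    frozenset(("B", "C")): 0.2,
}


def sim(a, b):
    """Look up the similarity of two items; unknown pairs count as 0."""
    return 1.0 if a == b else SIMILARITY.get(frozenset((a, b)), 0.0)


def estimate(user_ratings, target):
    """Similarity-weighted mean of the user's known ratings.

    Returns None when no rated item is similar to the target.
    """
    weight_sum = sum(sim(target, item) for item in user_ratings)
    if weight_sum == 0.0:
        return None
    weighted = sum(sim(target, item) * r for item, r in user_ratings.items())
    return weighted / weight_sum


def densify(ratings, items):
    """Copy the ratings and add an estimate for every missing (user, item) pair."""
    dense = {}
    for user, known in ratings.items():
        dense[user] = dict(known)
        for item in items:
            if item not in known:
                value = estimate(known, item)
                if value is not None:
                    dense[user][item] = value
    return dense


def rmse(pairs):
    """Root Mean Square Error over (predicted, actual) pairs."""
    pairs = list(pairs)
    return math.sqrt(sum((p - t) ** 2 for p, t in pairs) / len(pairs))


dense = densify(RATINGS, ["A", "B", "C"])
```

In a pipeline of the kind the abstract outlines, a densified matrix like `dense` would then be passed to an MF algorithm (SVD, SVD++ or NMF), and `rmse` would compare its predictions against held-out ratings.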
id UFBA-2_841ba77b7eaf82e7f6d64a21285d5c1f
oai_identifier_str oai:repositorio.ufba.br:ri/36099
network_acronym_str UFBA-2
network_name_str Repositório Institucional da UFBA
repository_id_str 1932
spelling Submitted by Victor Martinez Vidal Pereira (victor.martinez@ufba.br) on 2022-09-27T16:03:16Z. No. of bitstreams: 1. PGCOMP-2022-Dissertação_Mestrado-Victor_Martinez_Vidal_Pereira.pdf: 6671829 bytes, checksum: 632045b133a74efbdcd08d96e235fd23 (MD5)
Approved for entry into archive by Solange Rocha (soluny@gmail.com) on 2022-10-04T13:52:04Z (GMT). No. of bitstreams: 1. PGCOMP-2022-Dissertação_Mestrado-Victor_Martinez_Vidal_Pereira.pdf: 6671829 bytes, checksum: 632045b133a74efbdcd08d96e235fd23 (MD5)
Made available in DSpace on 2022-10-04T13:52:05Z (GMT). No. of bitstreams: 1. PGCOMP-2022-Dissertação_Mestrado-Victor_Martinez_Vidal_Pereira.pdf: 6671829 bytes, checksum: 632045b133a74efbdcd08d96e235fd23 (MD5). Previous issue date: 2022-06-21
ORIGINAL: PGCOMP-2022-Dissertação_Mestrado-Victor_Martinez_Vidal_Pereira.pdf (Master's dissertation of Victor Martinez Vidal Pereira), application/pdf, 6671829 bytes
LICENSE: license.txt, text/plain, 1715 bytes
TEXT: PGCOMP-2022-Dissertação_Mestrado-Victor_Martinez_Vidal_Pereira.pdf.txt (Extracted text), text/plain, 136291 bytes
OAI request URL: http://192.188.11.11:8080/oai/request
dc.title.pt_BR.fl_str_mv Exploiting linked data in Dbpedia to reduce prediction error in matrix factorization recommenders
dc.title.alternative.pt_BR.fl_str_mv Explorando Linked Data na DBpedia para reduzir Erro Predito em Recomendadores baseados em Fatorização de Matriz
Exploração de dados vinculados na Dbpedia para reduzir erro de previsão em recomendadores de factorização matricial
author Pereira, Victor Martinez Vidal
author_facet Pereira, Victor Martinez Vidal
author_role author
dc.contributor.advisor1.fl_str_mv Durão, Frederico Araujo
dc.contributor.advisor1ID.fl_str_mv https://orcid.org/0000-0002-7766-6666
dc.contributor.advisor1Lattes.fl_str_mv http://lattes.cnpq.br/6271096128174325
dc.contributor.referee1.fl_str_mv Durão, Frederico Araujo
dc.contributor.referee1ID.fl_str_mv https://orcid.org/0000-0002-7766-6666
dc.contributor.referee1Lattes.fl_str_mv http://lattes.cnpq.br/6271096128174325
dc.contributor.referee2.fl_str_mv Pereira, Adriano César Machado
dc.contributor.referee2ID.fl_str_mv https://orcid.org/0000-0003-2389-0512
dc.contributor.referee2Lattes.fl_str_mv http://lattes.cnpq.br/6813736989856243
dc.contributor.referee3.fl_str_mv Coimbra, Danilo Barbosa
dc.contributor.referee3ID.fl_str_mv https://orcid.org/0000-0003-2218-1351
dc.contributor.referee3Lattes.fl_str_mv http://lattes.cnpq.br/9590398895954821
dc.contributor.authorID.fl_str_mv https://orcid.org/0000-0002-2438-8439
dc.contributor.authorLattes.fl_str_mv http://lattes.cnpq.br/2228036140992682
dc.contributor.author.fl_str_mv Pereira, Victor Martinez Vidal
contributor_str_mv Durão, Frederico Araujo
Durão, Frederico Araujo
Pereira, Adriano César Machado
Coimbra, Danilo Barbosa
dc.subject.cnpq.fl_str_mv CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO::METODOLOGIA E TECNICAS DA COMPUTACAO::SISTEMAS DE INFORMACAO
dc.subject.por.fl_str_mv Sistemas de recomendação
Fatorização de matrizes
Dados abertos
Erro predito
dc.subject.other.pt_BR.fl_str_mv Recommender systems
Matrix factorization
Linked open data
Prediction error
publishDate 2022
dc.date.accessioned.fl_str_mv 2022-10-04T13:52:05Z
dc.date.available.fl_str_mv 2022-10-04T13:52:05Z
dc.date.issued.fl_str_mv 2022-06-21
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.citation.fl_str_mv PEREIRA, Victor Martinez Vidal. Exploiting linked data in Dbpedia to reduce prediction error in matrix factorization recommenders. 2022. 64 f. Dissertação (Mestrado em Ciências da Computação) Instituto de Computação, Universidade Federal da Bahia, Salvador, Ba, 2022.
dc.identifier.uri.fl_str_mv https://repositorio.ufba.br/handle/ri/36099
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.publisher.none.fl_str_mv Universidade Federal da Bahia
dc.publisher.program.fl_str_mv Programa de Pós-Graduação em Ciência da Computação (PGCOMP) 
dc.publisher.initials.fl_str_mv UFBA
dc.publisher.country.fl_str_mv Brasil
dc.publisher.department.fl_str_mv Instituto de Computação - IC
publisher.none.fl_str_mv Universidade Federal da Bahia
dc.source.none.fl_str_mv reponame:Repositório Institucional da UFBA
instname:Universidade Federal da Bahia (UFBA)
instacron:UFBA
instname_str Universidade Federal da Bahia (UFBA)
instacron_str UFBA
institution UFBA
reponame_str Repositório Institucional da UFBA
collection Repositório Institucional da UFBA
bitstream.url.fl_str_mv https://repositorio.ufba.br/bitstream/ri/36099/1/PGCOMP-2022-Dissertac%cc%a7a%cc%83o_Mestrado-Victor_Martinez_Vidal_Pereira.pdf
https://repositorio.ufba.br/bitstream/ri/36099/2/license.txt
https://repositorio.ufba.br/bitstream/ri/36099/3/PGCOMP-2022-Dissertac%cc%a7a%cc%83o_Mestrado-Victor_Martinez_Vidal_Pereira.pdf.txt
bitstream.checksum.fl_str_mv 632045b133a74efbdcd08d96e235fd23
67bf4f75790b0d8d38d8f112a48ad90b
1a6462211fb5a9723b255d3ee3c6a50e
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
MD5
repository.name.fl_str_mv Repositório Institucional da UFBA - Universidade Federal da Bahia (UFBA)
repository.mail.fl_str_mv
_version_ 1808459652516544512