Exploiting linked data in Dbpedia to reduce prediction error in matrix factorization recommenders
Main author: | Pereira, Victor Martinez Vidal |
---|---|
Publication date: | 2022 |
Document type: | Master's dissertation |
Language: | eng |
Source title: | Repositório Institucional da UFBA |
Full text: | https://repositorio.ufba.br/handle/ri/36099 |
Abstract: | Recommender Systems provide suggestions for items that are most likely of interest to users. Providing personalized recommendations is a challenge that can be addressed by filtering algorithms among which Collaborative Filtering (CF) has demonstrated much progress in the last few years. By using Matrix Factorization (MF) techniques, CF methods reduce prediction error by using optimization algorithms. However, they usually face problems such as data sparsity and prediction error. Studies point to the use of data available in Semantic Web as a path to improve recommender systems and address the challenges related to CF techniques. Motivated by these premises, the present work, conducted by me at RecSys Research Group at UFBA, developed a data pipeline along with an algorithm that processes the Ratings Matrix combining semantic similarities of Linked Open Data (LOD) and estimates missing ratings. The experiments took subsets of 1000 samples from three different datasets (Movielens, LastFM and LibraryThing), calculated two semantic similarity metrics, Linked Data Similarity Distance (LDSD) and Resource Similarity (RESIM), and applied three MF-based algorithms (SVD, SVD++ and NMF). Results suggest the proposed pipeline is able to reduce Root Mean Square Error (RMSE) of all subsets with statistical confidence supported by parametric test one-way ANOVA followed by Tukey’s multiple comparison test. |
id |
UFBA-2_841ba77b7eaf82e7f6d64a21285d5c1f |
---|---|
oai_identifier_str |
oai:repositorio.ufba.br:ri/36099 |
network_acronym_str |
UFBA-2 |
network_name_str |
Repositório Institucional da UFBA |
repository_id_str |
1932 |
spelling |
Submitted by Victor Martinez Vidal Pereira (victor.martinez@ufba.br) on 2022-09-27T16:03:16Z. No. of bitstreams: 1. PGCOMP-2022-Dissertação_Mestrado-Victor_Martinez_Vidal_Pereira.pdf: 6671829 bytes, checksum: 632045b133a74efbdcd08d96e235fd23 (MD5). Approved for entry into archive by Solange Rocha (soluny@gmail.com) on 2022-10-04T13:52:04Z (GMT). Made available in DSpace on 2022-10-04T13:52:05Z (GMT). Previous issue date: 2022-06-21. Bitstreams: ORIGINAL PGCOMP-2022-Dissertação_Mestrado-Victor_Martinez_Vidal_Pereira.pdf (application/pdf, 6671829 bytes); LICENSE license.txt (text/plain, 1715 bytes); TEXT PGCOMP-2022-Dissertação_Mestrado-Victor_Martinez_Vidal_Pereira.pdf.txt (extracted text, text/plain, 136291 bytes). ri/36099, 2022-10-05 14:07:16.346. OAI request: http://192.188.11.11:8080/oai/request. opendoar:1932, 2022-10-05T17:07:16. |
dc.title.pt_BR.fl_str_mv |
Exploiting linked data in Dbpedia to reduce prediction error in matrix factorization recommenders |
dc.title.alternative.pt_BR.fl_str_mv |
Explorando Linked Data na DBpedia para reduzir Erro Predito em Recomendadores baseados em Fatorização de Matriz; Exploração de dados vinculados na Dbpedia para reduzir erro de previsão em recomendadores de factorização matricial |
title |
Exploiting linked data in Dbpedia to reduce prediction error in matrix factorization recommenders |
spellingShingle |
Exploiting linked data in Dbpedia to reduce prediction error in matrix factorization recommenders Pereira, Victor Martinez Vidal CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO::METODOLOGIA E TECNICAS DA COMPUTACAO::SISTEMAS DE INFORMACAO Sistemas de recomendação Fatorização de matrizes Dados abertos Erro predito Recommender systems Matrix factorization Linked open data Prediction error |
title_short |
Exploiting linked data in Dbpedia to reduce prediction error in matrix factorization recommenders |
title_full |
Exploiting linked data in Dbpedia to reduce prediction error in matrix factorization recommenders |
title_fullStr |
Exploiting linked data in Dbpedia to reduce prediction error in matrix factorization recommenders |
title_full_unstemmed |
Exploiting linked data in Dbpedia to reduce prediction error in matrix factorization recommenders |
title_sort |
Exploiting linked data in Dbpedia to reduce prediction error in matrix factorization recommenders |
author |
Pereira, Victor Martinez Vidal |
author_facet |
Pereira, Victor Martinez Vidal |
author_role |
author |
dc.contributor.advisor1.fl_str_mv |
Durão, Frederico Araujo |
dc.contributor.advisor1ID.fl_str_mv |
0000-0002-7766-6666 |
dc.contributor.advisor1Lattes.fl_str_mv |
http://lattes.cnpq.br/6271096128174325 |
dc.contributor.referee1.fl_str_mv |
Durão, Frederico Araujo |
dc.contributor.referee1ID.fl_str_mv |
https://orcid.org/0000-0002-7766-6666 |
dc.contributor.referee1Lattes.fl_str_mv |
http://lattes.cnpq.br/6271096128174325 |
dc.contributor.referee2.fl_str_mv |
Pereira, Adriano César Machado |
dc.contributor.referee2ID.fl_str_mv |
https://orcid.org/0000-0003-2389-0512 |
dc.contributor.referee2Lattes.fl_str_mv |
http://lattes.cnpq.br/6813736989856243 |
dc.contributor.referee3.fl_str_mv |
Coimbra, Danilo Barbosa |
dc.contributor.referee3ID.fl_str_mv |
0000-0003-2218-1351 |
dc.contributor.referee3Lattes.fl_str_mv |
http://lattes.cnpq.br/9590398895954821 |
dc.contributor.authorID.fl_str_mv |
https://orcid.org/0000-0002-2438-8439 |
dc.contributor.authorLattes.fl_str_mv |
http://lattes.cnpq.br/2228036140992682 |
dc.contributor.author.fl_str_mv |
Pereira, Victor Martinez Vidal |
contributor_str_mv |
Durão, Frederico Araujo Durão, Frederico Araujo Pereira, Adriano César Machado Coimbra, Danilo Barbosa |
dc.subject.cnpq.fl_str_mv |
CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO::METODOLOGIA E TECNICAS DA COMPUTACAO::SISTEMAS DE INFORMACAO |
topic |
CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO::METODOLOGIA E TECNICAS DA COMPUTACAO::SISTEMAS DE INFORMACAO Sistemas de recomendação Fatorização de matrizes Dados abertos Erro predito Recommender systems Matrix factorization Linked open data Prediction error |
dc.subject.por.fl_str_mv |
Sistemas de recomendação Fatorização de matrizes Dados abertos Erro predito |
dc.subject.other.pt_BR.fl_str_mv |
Recommender systems Matrix factorization Linked open data Prediction error |
description |
Recommender Systems provide suggestions for items that are most likely of interest to users. Providing personalized recommendations is a challenge that can be addressed by filtering algorithms among which Collaborative Filtering (CF) has demonstrated much progress in the last few years. By using Matrix Factorization (MF) techniques, CF methods reduce prediction error by using optimization algorithms. However, they usually face problems such as data sparsity and prediction error. Studies point to the use of data available in Semantic Web as a path to improve recommender systems and address the challenges related to CF techniques. Motivated by these premises, the present work, conducted by me at RecSys Research Group at UFBA, developed a data pipeline along with an algorithm that processes the Ratings Matrix combining semantic similarities of Linked Open Data (LOD) and estimates missing ratings. The experiments took subsets of 1000 samples from three different datasets (Movielens, LastFM and LibraryThing), calculated two semantic similarity metrics, Linked Data Similarity Distance (LDSD) and Resource Similarity (RESIM), and applied three MF-based algorithms (SVD, SVD++ and NMF). Results suggest the proposed pipeline is able to reduce Root Mean Square Error (RMSE) of all subsets with statistical confidence supported by parametric test one-way ANOVA followed by Tukey’s multiple comparison test. |
publishDate |
2022 |
dc.date.accessioned.fl_str_mv |
2022-10-04T13:52:05Z |
dc.date.available.fl_str_mv |
2022-10-04T13:52:05Z |
dc.date.issued.fl_str_mv |
2022-06-21 |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
format |
masterThesis |
status_str |
publishedVersion |
dc.identifier.citation.fl_str_mv |
PEREIRA, Victor Martinez Vidal. Exploiting linked data in Dbpedia to reduce prediction error in matrix factorization recommenders. 2022. 64 f. Dissertação (Mestrado em Ciências da Computação) Instituto de Computação, Universidade Federal da Bahia, Salvador, Ba, 2022. |
dc.identifier.uri.fl_str_mv |
https://repositorio.ufba.br/handle/ri/36099 |
identifier_str_mv |
PEREIRA, Victor Martinez Vidal. Exploiting linked data in Dbpedia to reduce prediction error in matrix factorization recommenders. 2022. 64 f. Dissertação (Mestrado em Ciências da Computação) Instituto de Computação, Universidade Federal da Bahia, Salvador, Ba, 2022. |
url |
https://repositorio.ufba.br/handle/ri/36099 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.publisher.none.fl_str_mv |
Universidade Federal da Bahia |
dc.publisher.program.fl_str_mv |
Programa de Pós-Graduação em Ciência da Computação (PGCOMP) |
dc.publisher.initials.fl_str_mv |
UFBA |
dc.publisher.country.fl_str_mv |
Brasil |
dc.publisher.department.fl_str_mv |
Instituto de Computação - IC |
publisher.none.fl_str_mv |
Universidade Federal da Bahia |
dc.source.none.fl_str_mv |
reponame:Repositório Institucional da UFBA instname:Universidade Federal da Bahia (UFBA) instacron:UFBA |
instname_str |
Universidade Federal da Bahia (UFBA) |
instacron_str |
UFBA |
institution |
UFBA |
reponame_str |
Repositório Institucional da UFBA |
collection |
Repositório Institucional da UFBA |
bitstream.url.fl_str_mv |
https://repositorio.ufba.br/bitstream/ri/36099/1/PGCOMP-2022-Dissertac%cc%a7a%cc%83o_Mestrado-Victor_Martinez_Vidal_Pereira.pdf https://repositorio.ufba.br/bitstream/ri/36099/2/license.txt https://repositorio.ufba.br/bitstream/ri/36099/3/PGCOMP-2022-Dissertac%cc%a7a%cc%83o_Mestrado-Victor_Martinez_Vidal_Pereira.pdf.txt |
bitstream.checksum.fl_str_mv |
632045b133a74efbdcd08d96e235fd23 67bf4f75790b0d8d38d8f112a48ad90b 1a6462211fb5a9723b255d3ee3c6a50e |
bitstream.checksumAlgorithm.fl_str_mv |
MD5 MD5 MD5 |
repository.name.fl_str_mv |
Repositório Institucional da UFBA - Universidade Federal da Bahia (UFBA) |
repository.mail.fl_str_mv |
|
_version_ |
1808459652516544512 |
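
The abstract describes estimating missing ratings from semantic similarities and evaluating with RMSE. The sketch below is only an illustration of those two ideas and is not the dissertation's algorithm: the function names `estimate_rating` and `rmse`, the similarity-weighted average, and the toy similarity values are all assumptions made here. In the actual work the similarities would come from LDSD or RESIM computed over DBpedia, and the predictions from SVD, SVD++ or NMF.

```python
import math


def rmse(actual, predicted):
    """Root Mean Square Error between two equal-length rating lists."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def estimate_rating(user_ratings, target_item, similarity):
    """Hypothetical fill-in rule: similarity-weighted mean of a user's ratings.

    user_ratings: dict mapping item id -> known rating for one user.
    similarity:   dict mapping (item_a, item_b) -> similarity in [0, 1];
                  pairs are looked up in either order.
    Returns None when no rated item has a positive similarity to the target.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for item, rating in user_ratings.items():
        sim = similarity.get((target_item, item),
                             similarity.get((item, target_item), 0.0))
        weighted_sum += sim * rating
        weight_total += sim
    if weight_total == 0.0:
        return None
    return weighted_sum / weight_total


if __name__ == "__main__":
    # Toy values, invented for illustration only.
    ratings = {"film_a": 4.0, "film_b": 2.0}
    sims = {("film_c", "film_a"): 0.75, ("film_b", "film_c"): 0.25}
    print(estimate_rating(ratings, "film_c", sims))
    print(rmse([1.0, 2.0], [2.0, 3.0]))
```

A weighted mean is used here because it is the simplest way to let more similar items contribute more to the estimate; any comparison against the results reported in the dissertation would require its actual pipeline and data.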