Simulation, modelling and classification of wiki contributors: Spotting the good, the bad, and the ugly

Bibliographic details
Main author: García-Méndez, Silvia
Publication date: 2022
Other authors: Leal, Fátima; Malheiro, Benedita; Burguillo-Rial, Juan Carlos; Veloso, Bruno; Chis, Adriana E.; González–Vélez, Horacio
Document type: Article
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://hdl.handle.net/10400.22/20675
Abstract: Data crowdsourcing is a data acquisition process where groups of voluntary contributors feed platforms with highly relevant data ranging from news, comments, and media to knowledge and classifications. It typically processes user-generated data streams to provide and refine popular services such as wikis, collaborative maps, e-commerce sites, and social networks. Nevertheless, this modus operandi raises severe concerns regarding ill-intentioned data manipulation in adversarial environments. This paper presents a simulation, modelling, and classification approach to automatically identify human and non-human (bots) as well as benign and malign contributors by using data fabrication to balance classes within experimental data sets, data stream modelling to build and update contributor profiles and, finally, autonomic data stream classification. By employing WikiVoyage – a free worldwide wiki travel guide open to contribution from the general public – as a testbed, our approach proves to significantly boost the confidence and quality of the classifier by using a class-balanced data stream, comprising both real and synthetic data. Our empirical results show that the proposed method distinguishes between benign and malign bots as well as human contributors with a classification accuracy of up to 92 %.
id RCAP_9240a1058eb3dfce1c9e5a990c77661e
oai_identifier_str oai:recipp.ipp.pt:10400.22/20675
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
spelling This work has been supported by: (i) Xunta de Galicia, Spain, grant ED481B-2021-118; (ii) National Funds through the FCT – Fundação para a Ciência e a Tecnologia, Portugal (Portuguese Foundation for Science and Technology) as part of project UIDB/50014/2020; (iii) CHIST-ERA, Ireland and the Irish Research Council, Ireland as part of the “Smart Pharmaceutical Manufacturing (SPuMoNI)” research project [Apr/2019–Dec/2022]; and (iv) University of Vigo, Spain/CISUG for open access charge.
2023-03-13T13:16:12Z
Portal Agregador
ONG
https://www.rcaap.pt/oai/openaire
opendoar:7160
2024-03-19T17:40:43.530507
false
dc.title.none.fl_str_mv Simulation, modelling and classification of wiki contributors: Spotting the good, the bad, and the ugly
dc.contributor.none.fl_str_mv Repositório Científico do Instituto Politécnico do Porto
dc.contributor.author.fl_str_mv García-Méndez, Silvia
Leal, Fátima
Malheiro, Benedita
Burguillo-Rial, Juan Carlos
Veloso, Bruno
Chis, Adriana E.
González–Vélez, Horacio
dc.subject.por.fl_str_mv Classification
Data reliability
Stream processing
Synthetic data
Data fabrication
Wiki contributors
publishDate 2022
dc.date.none.fl_str_mv 2022-07-15T08:50:32Z
2022
2022-01-01T00:00:00Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10400.22/20675
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv 1569-190X
10.1016/j.simpat.2022.102616
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Elsevier
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
_version_ 1799131495581876224