Simulation, modelling and classification of wiki contributors: Spotting the good, the bad, and the ugly
Main author: | García-Méndez, Silvia |
---|---|
Publication date: | 2022 |
Other authors: | Leal, Fátima; Malheiro, Benedita; Burguillo-Rial, Juan Carlos; Veloso, Bruno; Chis, Adriana E.; González–Vélez, Horacio |
Document type: | Article |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | http://hdl.handle.net/10400.22/20675 |
Abstract: | Data crowdsourcing is a data acquisition process where groups of voluntary contributors feed platforms with highly relevant data ranging from news, comments, and media to knowledge and classifications. It typically processes user-generated data streams to provide and refine popular services such as wikis, collaborative maps, e-commerce sites, and social networks. Nevertheless, this modus operandi raises severe concerns regarding ill-intentioned data manipulation in adversarial environments. This paper presents a simulation, modelling, and classification approach to automatically identify human and non-human (bots) as well as benign and malign contributors by using data fabrication to balance classes within experimental data sets, data stream modelling to build and update contributor profiles and, finally, autonomic data stream classification. By employing WikiVoyage – a free worldwide wiki travel guide open to contribution from the general public – as a testbed, our approach proves to significantly boost the confidence and quality of the classifier by using a class-balanced data stream, comprising both real and synthetic data. Our empirical results show that the proposed method distinguishes between benign and malign bots as well as human contributors with a classification accuracy of up to 92 %. |
id |
RCAP_9240a1058eb3dfce1c9e5a990c77661e |
---|---|
oai_identifier_str |
oai:recipp.ipp.pt:10400.22/20675 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
spelling |
Simulation, modelling and classification of wiki contributors: Spotting the good, the bad, and the ugly; Classification; Data reliability; Stream processing; Synthetic data; Data fabrication; Wiki contributors; Data crowdsourcing is a data acquisition process where groups of voluntary contributors feed platforms with highly relevant data ranging from news, comments, and media to knowledge and classifications. It typically processes user-generated data streams to provide and refine popular services such as wikis, collaborative maps, e-commerce sites, and social networks. Nevertheless, this modus operandi raises severe concerns regarding ill-intentioned data manipulation in adversarial environments. This paper presents a simulation, modelling, and classification approach to automatically identify human and non-human (bots) as well as benign and malign contributors by using data fabrication to balance classes within experimental data sets, data stream modelling to build and update contributor profiles and, finally, autonomic data stream classification. By employing WikiVoyage – a free worldwide wiki travel guide open to contribution from the general public – as a testbed, our approach proves to significantly boost the confidence and quality of the classifier by using a class-balanced data stream, comprising both real and synthetic data. Our empirical results show that the proposed method distinguishes between benign and malign bots as well as human contributors with a classification accuracy of up to 92 %.; This work has been supported by: (i) Xunta de Galicia, Spain grant ED481B-2021-118, Spain; (ii) National Funds through the FCT – Fundação para a Ciência e a Tecnologia, Portugal (Portuguese Foundation for Science and Technology) as part of project UIDB/50014/2020; (iii) CHIST-ERA, Ireland and the Irish Research Council, Ireland as part of the “Smart Pharmaceutical Manufacturing (SPuMoNI)” research project [Apr/2019–Dec/2022]; and (iv) University of Vigo, Spain/CISUG for open access charge.; Elsevier; Repositório Científico do Instituto Politécnico do Porto; García-Méndez, Silvia; Leal, Fátima; Malheiro, Benedita; Burguillo-Rial, Juan Carlos; Veloso, Bruno; Chis, Adriana E.; González–Vélez, Horacio; 2022-07-15T08:50:32Z; 2022; 2022-01-01T00:00:00Z; info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/article; application/pdf; http://hdl.handle.net/10400.22/20675; eng; 1569-190X; 10.1016/j.simpat.2022.102616; info:eu-repo/semantics/openAccess; reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos); instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron:RCAAP; 2023-03-13T13:16:12Z; oai:recipp.ipp.pt:10400.22/20675; Portal Agregador; ONG; https://www.rcaap.pt/oai/openaire; opendoar:7160; 2024-03-19T17:40:43.530507; Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; false |
dc.title.none.fl_str_mv |
Simulation, modelling and classification of wiki contributors: Spotting the good, the bad, and the ugly |
title |
Simulation, modelling and classification of wiki contributors: Spotting the good, the bad, and the ugly |
spellingShingle |
Simulation, modelling and classification of wiki contributors: Spotting the good, the bad, and the ugly; García-Méndez, Silvia; Classification; Data reliability; Stream processing; Synthetic data; Data fabrication; Wiki contributors |
title_short |
Simulation, modelling and classification of wiki contributors: Spotting the good, the bad, and the ugly |
title_full |
Simulation, modelling and classification of wiki contributors: Spotting the good, the bad, and the ugly |
title_fullStr |
Simulation, modelling and classification of wiki contributors: Spotting the good, the bad, and the ugly |
title_full_unstemmed |
Simulation, modelling and classification of wiki contributors: Spotting the good, the bad, and the ugly |
title_sort |
Simulation, modelling and classification of wiki contributors: Spotting the good, the bad, and the ugly |
author |
García-Méndez, Silvia |
author_facet |
García-Méndez, Silvia; Leal, Fátima; Malheiro, Benedita; Burguillo-Rial, Juan Carlos; Veloso, Bruno; Chis, Adriana E.; González–Vélez, Horacio |
author_role |
author |
author2 |
Leal, Fátima; Malheiro, Benedita; Burguillo-Rial, Juan Carlos; Veloso, Bruno; Chis, Adriana E.; González–Vélez, Horacio |
author2_role |
author author author author author author |
dc.contributor.none.fl_str_mv |
Repositório Científico do Instituto Politécnico do Porto |
dc.contributor.author.fl_str_mv |
García-Méndez, Silvia; Leal, Fátima; Malheiro, Benedita; Burguillo-Rial, Juan Carlos; Veloso, Bruno; Chis, Adriana E.; González–Vélez, Horacio |
dc.subject.por.fl_str_mv |
Classification; Data reliability; Stream processing; Synthetic data; Data fabrication; Wiki contributors |
topic |
Classification; Data reliability; Stream processing; Synthetic data; Data fabrication; Wiki contributors |
description |
Data crowdsourcing is a data acquisition process where groups of voluntary contributors feed platforms with highly relevant data ranging from news, comments, and media to knowledge and classifications. It typically processes user-generated data streams to provide and refine popular services such as wikis, collaborative maps, e-commerce sites, and social networks. Nevertheless, this modus operandi raises severe concerns regarding ill-intentioned data manipulation in adversarial environments. This paper presents a simulation, modelling, and classification approach to automatically identify human and non-human (bots) as well as benign and malign contributors by using data fabrication to balance classes within experimental data sets, data stream modelling to build and update contributor profiles and, finally, autonomic data stream classification. By employing WikiVoyage – a free worldwide wiki travel guide open to contribution from the general public – as a testbed, our approach proves to significantly boost the confidence and quality of the classifier by using a class-balanced data stream, comprising both real and synthetic data. Our empirical results show that the proposed method distinguishes between benign and malign bots as well as human contributors with a classification accuracy of up to 92 %. |
publishDate |
2022 |
dc.date.none.fl_str_mv |
2022-07-15T08:50:32Z 2022 2022-01-01T00:00:00Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10400.22/20675 |
url |
http://hdl.handle.net/10400.22/20675 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
1569-190X 10.1016/j.simpat.2022.102616 |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.publisher.none.fl_str_mv |
Elsevier |
publisher.none.fl_str_mv |
Elsevier |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos); instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv |
|
_version_ |
1799131495581876224 |