Simulation, modelling and classification of wiki contributors: Spotting the good, the bad, and the ugly

García-Méndez, Silvia; Leal, Fátima; Malheiro, Benedita; Burguillo-Rial, Juan Carlos; Veloso, Bruno; Chis, Adriana E.; González–Vélez, Horacio


Bibliographic details
Main author:	García-Méndez, Silvia
Publication date:	2022
Other authors:	Leal, Fátima; Malheiro, Benedita; Burguillo-Rial, Juan Carlos; Veloso, Bruno; Chis, Adriana E.; González–Vélez, Horacio
Document type:	Article
Language:	eng
Source title:	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text:	http://hdl.handle.net/10400.22/20675
Abstract:	Data crowdsourcing is a data acquisition process where groups of voluntary contributors feed platforms with highly relevant data ranging from news, comments, and media to knowledge and classifications. It typically processes user-generated data streams to provide and refine popular services such as wikis, collaborative maps, e-commerce sites, and social networks. Nevertheless, this modus operandi raises severe concerns regarding ill-intentioned data manipulation in adversarial environments. This paper presents a simulation, modelling, and classification approach to automatically identify human and non-human (bots) as well as benign and malign contributors by using data fabrication to balance classes within experimental data sets, data stream modelling to build and update contributor profiles and, finally, autonomic data stream classification. By employing WikiVoyage – a free worldwide wiki travel guide open to contribution from the general public – as a testbed, our approach proves to significantly boost the confidence and quality of the classifier by using a class-balanced data stream, comprising both real and synthetic data. Our empirical results show that the proposed method distinguishes between benign and malign bots as well as human contributors with a classification accuracy of up to 92 %.

Item metadata

id	RCAP_9240a1058eb3dfce1c9e5a990c77661e
oai_identifier_str	oai:recipp.ipp.pt:10400.22/20675
network_acronym_str	RCAP
network_name_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str	7160
spelling	Simulation, modelling and classification of wiki contributors: Spotting the good, the bad, and the ugly | Classification | Data reliability | Stream processing | Synthetic data | Data fabrication | Wiki contributors | Data crowdsourcing is a data acquisition process where groups of voluntary contributors feed platforms with highly relevant data ranging from news, comments, and media to knowledge and classifications. It typically processes user-generated data streams to provide and refine popular services such as wikis, collaborative maps, e-commerce sites, and social networks. Nevertheless, this modus operandi raises severe concerns regarding ill-intentioned data manipulation in adversarial environments. This paper presents a simulation, modelling, and classification approach to automatically identify human and non-human (bots) as well as benign and malign contributors by using data fabrication to balance classes within experimental data sets, data stream modelling to build and update contributor profiles and, finally, autonomic data stream classification. By employing WikiVoyage – a free worldwide wiki travel guide open to contribution from the general public – as a testbed, our approach proves to significantly boost the confidence and quality of the classifier by using a class-balanced data stream, comprising both real and synthetic data. Our empirical results show that the proposed method distinguishes between benign and malign bots as well as human contributors with a classification accuracy of up to 92 %. | This work has been supported by: (i) Xunta de Galicia, Spain grant ED481B-2021-118, Spain; (ii) National Funds through the FCT – Fundação para a Ciência e a Tecnologia, Portugal (Portuguese Foundation for Science and Technology) as part of project UIDB/50014/2020; (iii) CHIST-ERA, Ireland and the Irish Research Council, Ireland as part of the "Smart Pharmaceutical Manufacturing (SPuMoNI)" research project [Apr/2019–Dec/2022]; and (iv) University of Vigo, Spain/CISUG for open access charge. | Elsevier | Repositório Científico do Instituto Politécnico do Porto | García-Méndez, Silvia | Leal, Fátima | Malheiro, Benedita | Burguillo-Rial, Juan Carlos | Veloso, Bruno | Chis, Adriana E. | González–Vélez, Horacio | 2022-07-15T08:50:32Z | 2022 | 2022-01-01T00:00:00Z | info:eu-repo/semantics/publishedVersion | info:eu-repo/semantics/article | application/pdf | http://hdl.handle.net/10400.22/20675 | eng | 1569-190X | 10.1016/j.simpat.2022.102616 | info:eu-repo/semantics/openAccess | reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) | instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação | instacron:RCAAP | 2023-03-13T13:16:12Z | oai:recipp.ipp.pt:10400.22/20675 | Portal Agregador | ONG | https://www.rcaap.pt/oai/openaire | opendoar:7160 | 2024-03-19T17:40:43.530507 | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação | false
dc.title.none.fl_str_mv	Simulation, modelling and classification of wiki contributors: Spotting the good, the bad, and the ugly
title	Simulation, modelling and classification of wiki contributors: Spotting the good, the bad, and the ugly
spellingShingle	Simulation, modelling and classification of wiki contributors: Spotting the good, the bad, and the ugly García-Méndez, Silvia Classification Data reliability Stream processing Synthetic data Data fabrication Wiki contributors
title_short	Simulation, modelling and classification of wiki contributors: Spotting the good, the bad, and the ugly
title_full	Simulation, modelling and classification of wiki contributors: Spotting the good, the bad, and the ugly
title_fullStr	Simulation, modelling and classification of wiki contributors: Spotting the good, the bad, and the ugly
title_full_unstemmed	Simulation, modelling and classification of wiki contributors: Spotting the good, the bad, and the ugly
title_sort	Simulation, modelling and classification of wiki contributors: Spotting the good, the bad, and the ugly
author	García-Méndez, Silvia
author_facet	García-Méndez, Silvia Leal, Fátima Malheiro, Benedita Burguillo-Rial, Juan Carlos Veloso, Bruno Chis, Adriana E. González–Vélez, Horacio
author_role	author
author2	Leal, Fátima Malheiro, Benedita Burguillo-Rial, Juan Carlos Veloso, Bruno Chis, Adriana E. González–Vélez, Horacio
author2_role	author author author author author author
dc.contributor.none.fl_str_mv	Repositório Científico do Instituto Politécnico do Porto
dc.contributor.author.fl_str_mv	García-Méndez, Silvia Leal, Fátima Malheiro, Benedita Burguillo-Rial, Juan Carlos Veloso, Bruno Chis, Adriana E. González–Vélez, Horacio
dc.subject.por.fl_str_mv	Classification Data reliability Stream processing Synthetic data Data fabrication Wiki contributors
topic	Classification Data reliability Stream processing Synthetic data Data fabrication Wiki contributors
description	Data crowdsourcing is a data acquisition process where groups of voluntary contributors feed platforms with highly relevant data ranging from news, comments, and media to knowledge and classifications. It typically processes user-generated data streams to provide and refine popular services such as wikis, collaborative maps, e-commerce sites, and social networks. Nevertheless, this modus operandi raises severe concerns regarding ill-intentioned data manipulation in adversarial environments. This paper presents a simulation, modelling, and classification approach to automatically identify human and non-human (bots) as well as benign and malign contributors by using data fabrication to balance classes within experimental data sets, data stream modelling to build and update contributor profiles and, finally, autonomic data stream classification. By employing WikiVoyage – a free worldwide wiki travel guide open to contribution from the general public – as a testbed, our approach proves to significantly boost the confidence and quality of the classifier by using a class-balanced data stream, comprising both real and synthetic data. Our empirical results show that the proposed method distinguishes between benign and malign bots as well as human contributors with a classification accuracy of up to 92 %.
publishDate	2022
dc.date.none.fl_str_mv	2022-07-15T08:50:32Z 2022 2022-01-01T00:00:00Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/article
format	article
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://hdl.handle.net/10400.22/20675
url	http://hdl.handle.net/10400.22/20675
dc.language.iso.fl_str_mv	eng
language	eng
dc.relation.none.fl_str_mv	1569-190X 10.1016/j.simpat.2022.102616
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.publisher.none.fl_str_mv	Elsevier
publisher.none.fl_str_mv	Elsevier
dc.source.none.fl_str_mv	reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP
instname_str	Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str	RCAAP
institution	RCAAP
reponame_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_	1799131495581876224
