Simulation, modelling and classification of wiki contributors: Spotting the good, the bad, and the ugly

Bibliographic details
Main author: García-Méndez, Silvia
Publication date: 2022
Other authors: Leal, Fátima; Malheiro, Benedita; Burguillo-Rial, Juan Carlos; Veloso, Bruno; Chis, Adriana E.; González-Vélez, Horacio
Document type: Article
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://hdl.handle.net/11328/4289
https://doi.org/10.1016/j.simpat.2022.102616
Abstract: Data crowdsourcing is a data acquisition process where groups of voluntary contributors feed platforms with highly relevant data ranging from news, comments, and media to knowledge and classifications. It typically processes user-generated data streams to provide and refine popular services such as wikis, collaborative maps, e-commerce sites, and social networks. Nevertheless, this modus operandi raises severe concerns regarding ill-intentioned data manipulation in adversarial environments. This paper presents a simulation, modelling, and classification approach to automatically identify human and non-human (bots) as well as benign and malign contributors by using data fabrication to balance classes within experimental data sets, data stream modelling to build and update contributor profiles and, finally, autonomic data stream classification. By employing WikiVoyage – a free worldwide wiki travel guide open to contribution from the general public – as a testbed, our approach proves to significantly boost the confidence and quality of the classifier by using a class-balanced data stream, comprising both real and synthetic data. Our empirical results show that the proposed method distinguishes between benign and malign bots as well as human contributors with a classification accuracy of up to 92 %.
id RCAP_ee3ad3ac6f7b40b793583d13929bf112
oai_identifier_str oai:repositorio.upt.pt:11328/4289
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
spellingShingle Simulation, modelling and classification of wiki contributors: Spotting the good, the bad, and the ugly
García-Méndez, Silvia
Classification
Data reliability
Stream processing
Synthetic data
Data fabrication
Wiki contributors
author García-Méndez, Silvia
author_facet García-Méndez, Silvia
Leal, Fátima
Malheiro, Benedita
Burguillo-Rial, Juan Carlos
Veloso, Bruno
Chis, Adriana E.
González-Vélez, Horacio
author_role author
author2 Leal, Fátima
Malheiro, Benedita
Burguillo-Rial, Juan Carlos
Veloso, Bruno
Chis, Adriana E.
González-Vélez, Horacio
author2_role author
author
author
author
author
author
dc.contributor.author.fl_str_mv García-Méndez, Silvia
Leal, Fátima
Malheiro, Benedita
Burguillo-Rial, Juan Carlos
Veloso, Bruno
Chis, Adriana E.
González-Vélez, Horacio
dc.subject.por.fl_str_mv Classification
Data reliability
Stream processing
Synthetic data
Data fabrication
Wiki contributors
topic Classification
Data reliability
Stream processing
Synthetic data
Data fabrication
Wiki contributors
publishDate 2022
dc.date.none.fl_str_mv 2022-06-27T10:56:39Z
2022-06-27
2022-06-01T00:00:00Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv García-Méndez, S., Leal, F., Malheiro, B., Burguillo-Rial, J. C., Veloso, B., Chis, A. E., & González-Vélez, H. (2022). Simulation, modelling and classification of wiki contributors: Spotting the good, the bad, and the ugly. Simulation Modelling Practice and Theory, 120, 102616, 1-13. https://doi.org/10.1016/j.simpat.2022.102616. Repositório Institucional UPT. http://hdl.handle.net/11328/4289
http://hdl.handle.net/11328/4289
https://doi.org/10.1016/j.simpat.2022.102616
url http://hdl.handle.net/11328/4289
https://doi.org/10.1016/j.simpat.2022.102616
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv 1569-190X (Print)
dc.rights.driver.fl_str_mv http://creativecommons.org/licenses/by-nc-nd/4.0/
info:eu-repo/semantics/openAccess
rights_invalid_str_mv http://creativecommons.org/licenses/by-nc-nd/4.0/
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
image/png
dc.publisher.none.fl_str_mv Elsevier
publisher.none.fl_str_mv Elsevier
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
_version_ 1799134976993656832