Interpretable Classification of Wiki-Review Streams

Bibliographic details
Main author: García-Méndez, Silvia
Publication date: 2023
Other authors: Leal, Fátima; Malheiro, Benedita; Burguillo-Rial, Juan Carlos
Document type: Article
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
Full text: http://hdl.handle.net/10400.22/25145
Abstract: Wiki articles are created and maintained by a crowd of editors, producing a continuous stream of reviews. Reviews can take the form of additions, reverts, or both. This crowdsourcing model is exposed to manipulation since neither reviews nor editors are automatically screened and purged. To protect articles against vandalism or damage, the stream of reviews can be mined to classify reviews and profile editors in real-time. The goal of this work is to anticipate and explain which reviews to revert. This way, editors are informed why their edits will be reverted. The proposed method employs stream-based processing, updating the profiling and classification models on each incoming event. The profiling uses side and content-based features employing Natural Language Processing, and editor profiles are incrementally updated based on their reviews. Since the proposed method relies on self-explainable classification algorithms, it is possible to understand why a review has been classified as a revert or a non-revert. In addition, this work contributes an algorithm for generating synthetic data for class balancing, making the final classification fairer. The proposed online method was tested with a real data set from Wikivoyage, which was balanced through the aforementioned synthetic data generation. The results attained near-90% values for all evaluation metrics (accuracy, precision, recall, and F-measure).
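The stream-based idea described in the abstract (classify each incoming review, then update the editor profile with it) can be illustrated with a minimal sketch. This is not the authors' method: the profile fields, the `revert_ratio` feature, the 0.5 threshold, and the sample stream are all assumptions chosen for illustration, and a single threshold rule stands in for the self-explainable classifiers the paper mentions. Content-based NLP features and synthetic data generation are not covered.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class EditorProfile:
    """Running statistics for one editor (hypothetical feature set)."""
    reviews: int = 0
    reverted: int = 0

    @property
    def revert_ratio(self) -> float:
        # Share of this editor's past reviews that were reverted.
        return self.reverted / self.reviews if self.reviews else 0.0


def predict(profile: EditorProfile, threshold: float = 0.5):
    """Rule-based prediction that returns its own explanation."""
    will_revert = profile.revert_ratio > threshold
    reason = (f"editor revert ratio {profile.revert_ratio:.2f} "
              f"{'>' if will_revert else '<='} threshold {threshold:.2f}")
    return will_revert, reason


def run_stream(events, threshold: float = 0.5):
    """Test-then-train loop: predict each event, then update the profile."""
    profiles = defaultdict(EditorProfile)
    correct = 0
    explanations = []
    for editor, was_reverted in events:
        profile = profiles[editor]
        guess, reason = predict(profile, threshold)  # test first
        correct += guess == was_reverted
        explanations.append(reason)
        profile.reviews += 1                         # then train
        profile.reverted += was_reverted
    accuracy = correct / len(events) if events else 0.0
    return accuracy, explanations, profiles


# Illustrative stream of (editor, was_reverted) events; the data is made up.
stream = [("ana", False), ("bot42", True), ("bot42", True),
          ("ana", False), ("bot42", True)]
accuracy, explanations, profiles = run_stream(stream)
print(accuracy)
print(explanations[-1])
```

Because every prediction is made before the profile sees the true label, the accuracy is measured on unseen events only, which is the usual way to evaluate a model that is updated on each incoming event.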
id RCAP_5ded4645958eaff0fd5a2d7bea9e3b85
oai_identifier_str oai:recipp.ipp.pt:10400.22/25145
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
repository_id_str 7160
dc.contributor.none.fl_str_mv Repositório Científico do Instituto Politécnico do Porto
dc.contributor.author.fl_str_mv García-Méndez, Silvia
Leal, Fátima
Malheiro, Benedita
Burguillo-Rial, Juan Carlos
dc.subject.por.fl_str_mv Data reliability and fairness
Data-stream processing and classification
Synthetic data
Transparency
Vandalism
Wikis
publishDate 2023
dc.date.none.fl_str_mv 2023-12-13
2023-12-13T00:00:00Z
2024-03-11T11:58:32Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10400.22/25145
url http://hdl.handle.net/10400.22/25145
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv S. García-Méndez, F. Leal, B. Malheiro and J. C. Burguillo-Rial, "Interpretable Classification of Wiki-Review Streams," in IEEE Access, vol. 11, pp. 141137-141151, 2023, doi: 10.1109/ACCESS.2023.3342472
10.1109/ACCESS.2023.3342472
2169-3536
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv IEEE
publisher.none.fl_str_mv IEEE
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
_version_ 1799138180908187648