Interpretable Classification of Wiki-Review Streams
Main author: | García-Méndez, Silvia |
---|---|
Publication date: | 2023 |
Other authors: | Leal, Fátima; Malheiro, Benedita; Burguillo-Rial, Juan Carlos |
Document type: | Article |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
Full text: | http://hdl.handle.net/10400.22/25145 |
Abstract: | Wiki articles are created and maintained by a crowd of editors, producing a continuous stream of reviews. Reviews can take the form of additions, reverts, or both. This crowdsourcing model is exposed to manipulation since neither reviews nor editors are automatically screened and purged. To protect articles against vandalism or damage, the stream of reviews can be mined to classify reviews and profile editors in real time. The goal of this work is to anticipate and explain which reviews to revert. This way, editors are informed why their edits will be reverted. The proposed method employs stream-based processing, updating the profiling and classification models on each incoming event. The profiling uses side and content-based features employing Natural Language Processing, and editor profiles are incrementally updated based on their reviews. Since the proposed method relies on self-explainable classification algorithms, it is possible to understand why a review has been classified as a revert or a non-revert. In addition, this work contributes an algorithm for generating synthetic data for class balancing, making the final classification fairer. The proposed online method was tested with a real data set from Wikivoyage, which was balanced through the aforementioned synthetic data generation. The results attained near-90% values for all evaluation metrics (accuracy, precision, recall, and F-measure). |
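
The abstract describes updating editor profiles and the classifier on each incoming event and explaining each decision. As a rough illustration only, and not the authors' method, the Python sketch below shows a test-then-train (prequential) loop in which each review is classified first and then used to update its editor's profile. The `EditorProfile` fields, the single threshold rule, and the `(editor, was_reverted)` stream format are assumptions made for this example; the side and content-based features and the self-explainable classifiers used in the paper are not reproduced here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class EditorProfile:
    """Running per-editor statistics (hypothetical fields for illustration)."""
    reviews: int = 0
    reverted: int = 0

    @property
    def revert_rate(self) -> float:
        return self.reverted / self.reviews if self.reviews else 0.0


def predict(profile: EditorProfile, threshold: float = 0.5):
    """Return (label, explanation); the rule itself serves as the explanation."""
    label = profile.reviews > 0 and profile.revert_rate > threshold
    why = (f"editor revert rate {profile.revert_rate:.2f} "
           f"{'>' if label else '<='} threshold {threshold:.2f} "
           f"over {profile.reviews} past reviews")
    return label, why


def prequential(stream, threshold: float = 0.5):
    """Test-then-train: predict each event first, then learn from its label."""
    profiles = defaultdict(EditorProfile)
    tp = fp = tn = fn = 0
    for editor, was_reverted in stream:
        profile = profiles[editor]
        predicted, _ = predict(profile, threshold)  # 1) test on the new event
        if predicted and was_reverted:
            tp += 1
        elif predicted and not was_reverted:
            fp += 1
        elif not predicted and was_reverted:
            fn += 1
        else:
            tn += 1
        profile.reviews += 1                        # 2) then update the profile
        profile.reverted += int(was_reverted)
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }
```

Calling `prequential([("a", True), ("a", True), ("b", False)])` on a toy stream returns the four running metrics named in the abstract, and `predict` returns a human-readable reason alongside each label. A real system would replace the threshold rule with an incremental, interpretable learner and a richer feature set.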
id | RCAP_5ded4645958eaff0fd5a2d7bea9e3b85 |
---|---|
oai_identifier_str | oai:recipp.ipp.pt:10400.22/25145 |
network_acronym_str | RCAP |
network_name_str | Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
repository_id_str | 7160 |
dc.title.none.fl_str_mv | Interpretable Classification of Wiki-Review Streams |
title | Interpretable Classification of Wiki-Review Streams |
spellingShingle | Interpretable Classification of Wiki-Review Streams; García-Méndez, Silvia; Data reliability and fairness; Data-stream processing and classification; Synthetic data; Transparency; Vandalism; Wikis |
title_short | Interpretable Classification of Wiki-Review Streams |
title_full | Interpretable Classification of Wiki-Review Streams |
title_fullStr | Interpretable Classification of Wiki-Review Streams |
title_full_unstemmed | Interpretable Classification of Wiki-Review Streams |
title_sort | Interpretable Classification of Wiki-Review Streams |
author | García-Méndez, Silvia |
author_facet | García-Méndez, Silvia; Leal, Fátima; Malheiro, Benedita; Burguillo-Rial, Juan Carlos |
author_role | author |
author2 | Leal, Fátima; Malheiro, Benedita; Burguillo-Rial, Juan Carlos |
author2_role | author; author; author |
dc.contributor.none.fl_str_mv | Repositório Científico do Instituto Politécnico do Porto |
dc.contributor.author.fl_str_mv | García-Méndez, Silvia; Leal, Fátima; Malheiro, Benedita; Burguillo-Rial, Juan Carlos |
dc.subject.por.fl_str_mv | Data reliability and fairness; Data-stream processing and classification; Synthetic data; Transparency; Vandalism; Wikis |
topic | Data reliability and fairness; Data-stream processing and classification; Synthetic data; Transparency; Vandalism; Wikis |
publishDate | 2023 |
dc.date.none.fl_str_mv | 2023-12-13; 2023-12-13T00:00:00Z; 2024-03-11T11:58:32Z |
dc.type.status.fl_str_mv | info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv | info:eu-repo/semantics/article |
format | article |
status_str | publishedVersion |
dc.identifier.uri.fl_str_mv | http://hdl.handle.net/10400.22/25145 |
url | http://hdl.handle.net/10400.22/25145 |
dc.language.iso.fl_str_mv | eng |
language | eng |
dc.relation.none.fl_str_mv | S. García-Méndez, F. Leal, B. Malheiro and J. C. Burguillo-Rial, "Interpretable Classification of Wiki-Review Streams," in IEEE Access, vol. 11, pp. 141137-141151, 2023, doi: 10.1109/ACCESS.2023.3342472; 10.1109/ACCESS.2023.3342472; 2169-3536 |
dc.rights.driver.fl_str_mv | info:eu-repo/semantics/openAccess |
eu_rights_str_mv | openAccess |
dc.format.none.fl_str_mv | application/pdf |
dc.publisher.none.fl_str_mv | IEEE |
publisher.none.fl_str_mv | IEEE |
dc.source.none.fl_str_mv | reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos); instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron:RCAAP |
instname_str | Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str | RCAAP |
institution | RCAAP |
reponame_str | Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
collection | Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
repository.name.fl_str_mv | Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv | |
_version_ | 1799138180908187648 |