Improving Active Learning Performance through the Use of Data Augmentation

Bibliographic details
Main author: Fonseca, João
Publication date: 2023
Other authors: Bação, Fernando
Document type: Article
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://hdl.handle.net/10362/149803
Abstract: Fonseca, J., & Bação, F. (2023). Improving Active Learning Performance through the Use of Data Augmentation. International Journal of Intelligent Systems, 2023, 1-17. https://doi.org/10.1155/2023/7941878 --- Funding: This research was supported by three research grants of the Portuguese Foundation for Science and Technology (“Fundação para a Ciência e a Tecnologia”): SFRH/BD/151473/2021 - MIT Portugal PhD Grant; DSAIPA/DS/0116/2019, and PCIF/SSI/0102/2017.
id RCAP_6c68151cf006cb0742b14ad0ac7a35fc
oai_identifier_str oai:run.unl.pt:10362/149803
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
spelling Improving Active Learning Performance through the Use of Data Augmentation
Software
Theoretical Computer Science
Human-Computer Interaction
Artificial Intelligence
SDG 11 - Sustainable Cities and Communities
Fonseca, J., & Bação, F. (2023). Improving Active Learning Performance through the Use of Data Augmentation. International Journal of Intelligent Systems, 2023, 1-17. https://doi.org/10.1155/2023/7941878 --- Funding: This research was supported by three research grants of the Portuguese Foundation for Science and Technology (“Fundação para a Ciência e a Tecnologia”): SFRH/BD/151473/2021 - MIT Portugal PhD Grant; DSAIPA/DS/0116/2019, and PCIF/SSI/0102/2017.
Active learning (AL) is a well-known technique to optimize data usage in training, through the interactive selection of unlabeled observations, out of a large pool of unlabeled data, to be labeled by a supervisor. Its focus is to find the unlabeled observations that, once labeled, will maximize the informativeness of the training dataset, therefore reducing data-related costs. The literature describes several methods to improve the effectiveness of this process. Nonetheless, there is a paucity of research developed around the application of artificial data sources in AL, especially outside image classification or NLP. This paper proposes a new AL framework, which relies on the effective use of artificial data. It may be used with any classifier, generation mechanism, and data type and can be integrated with multiple other state-of-the-art AL contributions. This combination is expected to increase the ML classifier’s performance and reduce both the supervisor’s involvement and the amount of required labeled data at the expense of a marginal increase in computational time. The proposed method introduces a hyperparameter optimization component to improve the generation of artificial instances during the AL process as well as an uncertainty-based data generation mechanism. We compare the proposed method to the standard framework and an oversampling-based active learning method for more informed data generation in an AL context. The models’ performance was tested using four different classifiers, two AL-specific performance metrics, and three classification performance metrics over 15 different datasets. We demonstrated that the proposed framework, using data augmentation, significantly improved the performance of AL, both in terms of classification performance and data selection efficiency (all code and preprocessed data developed for this study are available at https://github.com/joaopfonseca/publications/).
NOVA Information Management School (NOVA IMS)
Information Management Research Center (MagIC) - NOVA Information Management School
RUN
Fonseca, João
Bação, Fernando
2023-02-27T22:23:57Z
2023-02-20
2023-02-20T00:00:00Z
info:eu-repo/semantics/publishedVersion
info:eu-repo/semantics/article
17
application/pdf
http://hdl.handle.net/10362/149803
eng
0884-8173
PURE: 54410436
https://doi.org/10.1155/2023/7941878
info:eu-repo/semantics/openAccess
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
2024-03-11T05:31:39Z
oai:run.unl.pt:10362/149803
Portal Agregador
ONG
https://www.rcaap.pt/oai/openaire
opendoar:7160
2024-03-20T03:53:52.212127
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
false
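Illustrative note (not part of the catalog record): the abstract describes a pool-based AL loop in which artificial data is generated from the labeled set before the classifier is trained and uncertain observations are queried. The sketch below is a minimal illustration of that general idea, not the authors' implementation. It assumes scikit-learn and NumPy, a synthetic dataset, a logistic regression classifier, least-confidence uncertainty sampling, and a simple same-class interpolation generator; the names augment and run_active_learning are hypothetical. It omits the hyperparameter optimization component and the uncertainty-based generation mechanism mentioned in the abstract.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def augment(X, y, rng, n_new):
    """Hypothetical generator: interpolate between random same-class pairs."""
    if n_new <= 0:
        return X, y
    X_new, y_new = [], []
    classes = np.unique(y)
    for _ in range(n_new):
        label = rng.choice(classes)
        idx = np.flatnonzero(y == label)
        i, j = rng.choice(idx, size=2, replace=True)
        lam = rng.random()  # interpolation weight in [0, 1)
        X_new.append(X[i] + lam * (X[j] - X[i]))
        y_new.append(label)
    return np.vstack([X, np.array(X_new)]), np.concatenate([y, np.array(y_new)])


def run_active_learning(n_iterations=5, batch_size=10, n_initial=20, seed=0):
    """Pool-based AL with augmentation of the labeled set at each iteration."""
    rng = np.random.default_rng(seed)
    # Synthetic data stands in for a real dataset (assumption).
    X, y = make_classification(
        n_samples=600, n_features=10, n_informative=5, random_state=seed
    )
    X_pool, X_test, y_pool, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed, stratify=y
    )
    # Boolean mask marking which pool observations have been labeled so far.
    mask = np.zeros(len(X_pool), dtype=bool)
    mask[rng.choice(len(X_pool), size=n_initial, replace=False)] = True

    scores = []
    for _ in range(n_iterations):
        # Generate one artificial observation per labeled observation.
        X_aug, y_aug = augment(X_pool[mask], y_pool[mask], rng, int(mask.sum()))
        clf = LogisticRegression(max_iter=1000).fit(X_aug, y_aug)
        scores.append(clf.score(X_test, y_test))

        # Least-confidence uncertainty over the still-unlabeled pool.
        candidates = np.flatnonzero(~mask)
        proba = clf.predict_proba(X_pool[candidates])
        uncertainty = 1.0 - proba.max(axis=1)
        query = candidates[np.argsort(uncertainty)[-batch_size:]]
        mask[query] = True  # the "supervisor" reveals these labels
    return scores, int(mask.sum())


if __name__ == "__main__":
    scores, n_labeled = run_active_learning()
    print("test accuracy per iteration:", [round(s, 3) for s in scores])
    print("labeled observations at the end:", n_labeled)
```

Training on the augmented set while querying only from real, unlabeled observations keeps the supervisor's workload tied to real data; the generator, classifier, and query strategy can each be swapped independently.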
dc.title.none.fl_str_mv Improving Active Learning Performance through the Use of Data Augmentation
title Improving Active Learning Performance through the Use of Data Augmentation
spellingShingle Improving Active Learning Performance through the Use of Data Augmentation
Fonseca, João
Software
Theoretical Computer Science
Human-Computer Interaction
Artificial Intelligence
SDG 11 - Sustainable Cities and Communities
title_short Improving Active Learning Performance through the Use of Data Augmentation
title_full Improving Active Learning Performance through the Use of Data Augmentation
title_fullStr Improving Active Learning Performance through the Use of Data Augmentation
title_full_unstemmed Improving Active Learning Performance through the Use of Data Augmentation
title_sort Improving Active Learning Performance through the Use of Data Augmentation
author Fonseca, João
author_facet Fonseca, João
Bação, Fernando
author_role author
author2 Bação, Fernando
author2_role author
dc.contributor.none.fl_str_mv NOVA Information Management School (NOVA IMS)
Information Management Research Center (MagIC) - NOVA Information Management School
RUN
dc.contributor.author.fl_str_mv Fonseca, João
Bação, Fernando
dc.subject.por.fl_str_mv Software
Theoretical Computer Science
Human-Computer Interaction
Artificial Intelligence
SDG 11 - Sustainable Cities and Communities
topic Software
Theoretical Computer Science
Human-Computer Interaction
Artificial Intelligence
SDG 11 - Sustainable Cities and Communities
description Fonseca, J., & Bação, F. (2023). Improving Active Learning Performance through the Use of Data Augmentation. International Journal of Intelligent Systems, 2023, 1-17. https://doi.org/10.1155/2023/7941878 --- Funding: This research was supported by three research grants of the Portuguese Foundation for Science and Technology (“Fundação para a Ciência e a Tecnologia”): SFRH/BD/151473/2021 - MIT Portugal PhD Grant; DSAIPA/DS/0116/2019, and PCIF/SSI/0102/2017.
publishDate 2023
dc.date.none.fl_str_mv 2023-02-27T22:23:57Z
2023-02-20
2023-02-20T00:00:00Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10362/149803
url http://hdl.handle.net/10362/149803
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv 0884-8173
PURE: 54410436
https://doi.org/10.1155/2023/7941878
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv 17
application/pdf
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
_version_ 1799138128895672320