Improving Active Learning Performance through the Use of Data Augmentation
Main author: | Fonseca, João |
---|---|
Publication date: | 2023 |
Other authors: | Bação, Fernando |
Document type: | Article |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | http://hdl.handle.net/10362/149803 |
Summary: | Fonseca, J., & Bação, F. (2023). Improving Active Learning Performance through the Use of Data Augmentation. International Journal of Intelligent Systems, 2023, 1-17. https://doi.org/10.1155/2023/7941878 --- Funding: This research was supported by three research grants of the Portuguese Foundation for Science and Technology (“Fundação para a Ciência e a Tecnologia”): SFRH/BD/151473/2021 - MIT Portugal PhD Grant; DSAIPA/DS/0116/2019; and PCIF/SSI/0102/2017. |
id |
RCAP_6c68151cf006cb0742b14ad0ac7a35fc |
---|---|
oai_identifier_str |
oai:run.unl.pt:10362/149803 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
spelling |
Improving Active Learning Performance through the Use of Data Augmentation; Software; Theoretical Computer Science; Human-Computer Interaction; Artificial Intelligence; SDG 11 - Sustainable Cities and Communities
Fonseca, J., & Bação, F. (2023). Improving Active Learning Performance through the Use of Data Augmentation. International Journal of Intelligent Systems, 2023, 1-17. https://doi.org/10.1155/2023/7941878 --- Funding: This research was supported by three research grants of the Portuguese Foundation for Science and Technology (“Fundação para a Ciência e a Tecnologia”): SFRH/BD/151473/2021 - MIT Portugal PhD Grant; DSAIPA/DS/0116/2019; and PCIF/SSI/0102/2017.
Active learning (AL) is a well-known technique to optimize data usage in training, through the interactive selection of unlabeled observations, out of a large pool of unlabeled data, to be labeled by a supervisor. Its focus is to find the unlabeled observations that, once labeled, will maximize the informativeness of the training dataset, therefore reducing data-related costs. The literature describes several methods to improve the effectiveness of this process. Nonetheless, there is a paucity of research developed around the application of artificial data sources in AL, especially outside image classification or NLP. This paper proposes a new AL framework, which relies on the effective use of artificial data. It may be used with any classifier, generation mechanism, and data type and can be integrated with multiple other state-of-the-art AL contributions. This combination is expected to increase the ML classifier’s performance and reduce both the supervisor’s involvement and the amount of required labeled data at the expense of a marginal increase in computational time. The proposed method introduces a hyperparameter optimization component to improve the generation of artificial instances during the AL process as well as an uncertainty-based data generation mechanism.
We compare the proposed method to the standard framework and an oversampling-based active learning method for more informed data generation in an AL context. The models’ performance was tested using four different classifiers, two AL-specific performance metrics, and three classification performance metrics over 15 different datasets. We demonstrated that the proposed framework, using data augmentation, significantly improved the performance of AL, both in terms of classification performance and data selection efficiency (all code and preprocessed data developed for this study are available at https://github.com/joaopfonseca/publications/).
NOVA Information Management School (NOVA IMS); Information Management Research Center (MagIC) - NOVA Information Management School; RUN; Fonseca, João; Bação, Fernando; 2023-02-27T22:23:57Z; 2023-02-20; 2023-02-20T00:00:00Z; info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/article; 17; application/pdf; http://hdl.handle.net/10362/149803; eng; 0884-8173; PURE: 54410436; https://doi.org/10.1155/2023/7941878; info:eu-repo/semantics/openAccess; reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos); instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron:RCAAP; 2024-03-11T05:31:39Z; oai:run.unl.pt:10362/149803; Portal Agregador; ONG; https://www.rcaap.pt/oai/openaire; opendoar:7160; 2024-03-20T03:53:52.212127; Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; false |
dc.title.none.fl_str_mv |
Improving Active Learning Performance through the Use of Data Augmentation |
title |
Improving Active Learning Performance through the Use of Data Augmentation |
spellingShingle |
Improving Active Learning Performance through the Use of Data Augmentation; Fonseca, João; Software; Theoretical Computer Science; Human-Computer Interaction; Artificial Intelligence; SDG 11 - Sustainable Cities and Communities |
author |
Fonseca, João |
author_facet |
Fonseca, João; Bação, Fernando |
author_role |
author |
author2 |
Bação, Fernando |
author2_role |
author |
dc.contributor.none.fl_str_mv |
NOVA Information Management School (NOVA IMS); Information Management Research Center (MagIC) - NOVA Information Management School; RUN |
dc.contributor.author.fl_str_mv |
Fonseca, João; Bação, Fernando |
dc.subject.por.fl_str_mv |
Software; Theoretical Computer Science; Human-Computer Interaction; Artificial Intelligence; SDG 11 - Sustainable Cities and Communities |
topic |
Software; Theoretical Computer Science; Human-Computer Interaction; Artificial Intelligence; SDG 11 - Sustainable Cities and Communities |
description |
Fonseca, J., & Bação, F. (2023). Improving Active Learning Performance through the Use of Data Augmentation. International Journal of Intelligent Systems, 2023, 1-17. https://doi.org/10.1155/2023/7941878 --- Funding: This research was supported by three research grants of the Portuguese Foundation for Science and Technology (“Fundação para a Ciência e a Tecnologia”): SFRH/BD/151473/2021 - MIT Portugal PhD Grant; DSAIPA/DS/0116/2019; and PCIF/SSI/0102/2017. |
publishDate |
2023 |
dc.date.none.fl_str_mv |
2023-02-27T22:23:57Z; 2023-02-20; 2023-02-20T00:00:00Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10362/149803 |
url |
http://hdl.handle.net/10362/149803 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
0884-8173; PURE: 54410436; https://doi.org/10.1155/2023/7941878 |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
17; application/pdf |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos); instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv |
|
_version_ |
1799138128895672320 |
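
The abstract in this record describes an active learning loop in which artificial observations are generated from the labeled set before each classifier update, and the most uncertain unlabeled observations are then sent to the supervisor. The following is a minimal sketch of that general idea only, not the authors' implementation (their code is at the GitHub URL given in the abstract). Everything in it is an assumption made for illustration: the nearest-centroid classifier, the same-class interpolation generator, the margin-based uncertainty score, the synthetic two-blob data, and all function and parameter names. The hyperparameter optimization and uncertainty-based generation components mentioned in the abstract are left out.

```python
import random


def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def fit(X, y):
    """Stand-in classifier (assumption): one centroid per class."""
    model = {}
    for c in set(y):
        rows = [x for x, t in zip(X, y) if t == c]
        model[c] = [sum(col) / len(rows) for col in zip(*rows)]
    return model


def predict(model, x):
    """Assign x to the class with the nearest centroid."""
    return min(model, key=lambda c: dist(model[c], x))


def margin(model, x):
    """Gap between the two nearest centroids; a small gap means high uncertainty.
    Requires at least two classes in the model."""
    d = sorted(dist(center, x) for center in model.values())
    return d[1] - d[0]


def augment(X, y, n_new, rng):
    """Stand-in generator (assumption): interpolate between same-class pairs."""
    X_new, y_new = [], []
    for _ in range(n_new):
        i = rng.randrange(len(X))
        j = rng.choice([k for k in range(len(X)) if y[k] == y[i]])
        lam = rng.random()
        X_new.append([a + lam * (b - a) for a, b in zip(X[i], X[j])])
        y_new.append(y[i])
    return X_new, y_new


def active_learning(pool, oracle, initial, budget, batch, n_new, seed=0):
    """Label up to `budget` pool items, training on real plus artificial data."""
    rng = random.Random(seed)
    labeled = list(initial)
    labels = [oracle(i) for i in labeled]
    seen = set(labeled)
    unlabeled = [i for i in range(len(pool)) if i not in seen]
    while True:
        X = [pool[i] for i in labeled]
        X_art, y_art = augment(X, labels, n_new, rng)
        model = fit(X + X_art, labels + y_art)
        if len(labeled) >= budget or not unlabeled:
            return model, labeled
        # Query the supervisor for the most uncertain observations first.
        unlabeled.sort(key=lambda i: margin(model, pool[i]))
        k = min(batch, budget - len(labeled))
        chosen, unlabeled = unlabeled[:k], unlabeled[k:]
        labeled += chosen
        labels += [oracle(i) for i in chosen]


# Synthetic two-class pool (assumption); the oracle simply looks up the true label.
data_rng = random.Random(42)
pool = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(100)]
pool += [[data_rng.gauss(4, 1), data_rng.gauss(4, 1)] for _ in range(100)]
truth = [0] * 100 + [1] * 100

model, labeled = active_learning(
    pool, truth.__getitem__, initial=[0, 1, 100, 101], budget=20, batch=4, n_new=10
)
accuracy = sum(predict(model, x) == t for x, t in zip(pool, truth)) / len(pool)
print(f"labeled {len(labeled)} of {len(pool)} observations, accuracy {accuracy:.2f}")
```

The initial labeled set must contain at least two classes, because the margin score compares the two nearest centroids. In a real setting, `fit`, `augment`, and `margin` would be replaced by the classifier, generator, and uncertainty criterion of choice.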