Supervised Data Augmentation

Fernandes, Paulo Alexandre Castillo

Bibliographic details
Main author:	Fernandes, Paulo Alexandre Castillo
Publication date:	2020
Document type:	Dissertation
Language:	eng
Source title:	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text:	http://hdl.handle.net/10316/92165
Abstract:	Master's dissertation in Informatics Engineering presented to the Faculdade de Ciências e Tecnologia

Item metadata

id	RCAP_fc43cadcce158d600cbbfe83ad6a4692
oai_identifier_str	oai:estudogeral.uc.pt:10316/92165
network_acronym_str	RCAP
network_name_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str	7160
spelling	Supervised Data Augmentation
Aprendizagem Máquina; Aumento de Dados; Redes Generativas Adversariais; Computação Evolucionária; Exploração do Espaço Latente; Machine Learning; Data Augmentation; Generative Adversarial Network; Evolutionary Computation; Latent Space Exploration
Master's dissertation in Informatics Engineering presented to the Faculdade de Ciências e Tecnologia
Machine Learning (ML) has seen tremendous advances in recent years and is currently spreading into many areas of technology. However, ML depends on the information stored in datasets to learn how to perform given tasks. Datasets often suffer from imbalances and missing information, which makes ML models harder to train. The way to address this is through Data Augmentation (DA). Although there are already many ways to perform DA, few studies have looked into using Generative Adversarial Networks (GANs) to produce samples for this task. A GAN is a model that can learn the distribution of a dataset through training and generate samples according to that distribution. The number of different samples a GAN can produce is innumerable, and many of them are distinct from those found in the training set because they combine different features. The main problem is that there is usually no control over the generation of new instances. The literature indicates that an extra way of managing this generation may be needed. This raises the questions "How can this management be done?" and "How can a GAN be made to generate a certain type of data?". This dissertation explores the use of GANs to perform DA while trying to answer these questions. To that end, a framework is proposed for performing DA on image datasets, the ELSEGAN (Evolutionary Latent Space Exploration Generative Adversarial Network). More specifically, GANs will be used to generate images for several datasets in order to improve classifiers in Image Classification (IC) tasks. A supervisor module will be added to the GAN to manage the production and addition of images. Supervision is performed through [...]
This work will look into the use of GANs to perform DA while trying to answer these questions. More specifically, GANs will be used to generate images for several training sets in order to improve classifier performance in Image Classification (IC) tasks. In addition, a supervisor module will be added to the GAN to manage the generation and addition of the generator's images. Supervision will initially be done through Evolutionary Computation (EC), which will be used to explore the GAN's latent space in search of sets of images that optimise a certain objective. Several algorithms will be explored, as well as different supervision metrics and criteria, which will lead to distinct results. Ultimately, the main contribution of this dissertation will be a three-module framework for performing DA on image datasets, the ELSEGAN (Evolutionary Latent Space Exploration Generative Adversarial Network). This framework comprises a GAN, responsible for learning the distribution of the original dataset and generating images accordingly; a supervisor, responsible for managing the generation and filtering of new instances by evolving sets of images using EC; and a classifier, which evaluates the performance of the whole framework.
Machine Learning (ML) has seen tremendous advances in recent years and is currently invading many areas of technology. ML, though, is dependent on stored information in datasets to learn how to perform a certain task. Many times, datasets suffer from imbalances and missing information, which makes it more difficult to train ML models. The way to solve this is by performing Data Augmentation (DA). Although there are many ways to perform DA, there are still only a few pieces of research that look into the usage of Generative Adversarial Networks (GANs) to produce samples for this task. A GAN is a model that is able to learn the distribution of a dataset through training and can generate samples according to this distribution. The number of different samples that a GAN can produce is innumerable, and many of these samples are even distinct from what is found in the training set because of the combination of different features. The main issue is that usually there is no control over how the model generates new samples. The literature indicates that it might be necessary to include an extra form of management in the generation phase of the GAN. This raises the questions "How can this management be done?" and "How can a GAN be made to generate a certain type of data?".
This dissertation explores the usage of GANs to perform DA while also trying to answer the previous questions. Thus, a framework is proposed for performing DA on datasets of images, the Evolutionary Latent Space Exploration Generative Adversarial Network (ELSEGAN). More specifically, GANs will be used to generate images for several datasets in order to improve classifiers in tasks of Image Classification (IC). A supervisor module will be added to the GAN to manage the generation and addition of images. The supervision is performed through Evolutionary Computation (EC), which is used to explore the latent space of the GAN and search for sets of images that optimise a certain objective. Different EC algorithms were explored, as well as different metrics and criteria for supervision. Finally, a classifier is used to assess the performance of the models that were created by the framework and DA approaches.
The first experiments were centred around the process of supervision and exploration of the latent space using EC. The exploration featured three EC algorithms, namely Random Sampling (RS), Genetic Algorithm (GA) and Multi-dimensional Archive of Phenotypic Elites (MAP-Elites), guided by an image similarity criterion that was tested using two distinct image similarity metrics, Root-Mean-Squared Error (RMSE) and Normalized Cross-Correlation (NCC). Overall, the experiments performed on a set of image datasets show that it is possible to guide the exploration of latent space with EC to find sets of images that show optimisation of a certain criterion.
In a second set of experiments, the ELSEGAN was put to the test by performing DA within the context of a real-world problem using the Human Sperm Head Morphology (HuSHeM) dataset, a biomedical multi-class problem with a small number of samples that provides a challenge to the different supervised classification approaches. Additionally, another method of supervision was explored that was guided by the loss of a previously trained classifier. Furthermore, the possibility of a new process of training that featured dynamic DA was also tested. The results of classifier performance tests revealed that the classifiers that were trained with DA showed an overall improvement over those with no DA, increasing the performance by more than 5% in some cases.
In the end, the experimental results attained throughout the dissertation show the validity and potential of the ELSEGAN approach.
Universidade de Coimbra - 6 months, 388.81€ per month 2020-11-04 info:eu-repo/semantics/publishedVersion info:eu-repo/semantics/masterThesis http://hdl.handle.net/10316/92165 TID:202553884 eng Fernandes, Paulo Alexandre Castillo info:eu-repo/semantics/openAccess reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP 2022-05-25T04:56:00Z oai:estudogeral.uc.pt:10316/92165 Portal Agregador ONG https://www.rcaap.pt/oai/openaire opendoar:7160 2024-03-19T21:11:19.980231 Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação false
dc.title.none.fl_str_mv	Supervised Data Augmentation
title	Supervised Data Augmentation
spellingShingle	Supervised Data Augmentation; Fernandes, Paulo Alexandre Castillo; Aprendizagem Máquina; Aumento de Dados; Redes Generativas Adversariais; Computação Evolucionária; Exploração do Espaço Latente; Machine Learning; Data Augmentation; Generative Adversarial Network; Evolutionary Computation; Latent Space Exploration
title_short	Supervised Data Augmentation
title_full	Supervised Data Augmentation
title_fullStr	Supervised Data Augmentation
title_full_unstemmed	Supervised Data Augmentation
title_sort	Supervised Data Augmentation
author	Fernandes, Paulo Alexandre Castillo
author_facet	Fernandes, Paulo Alexandre Castillo
author_role	author
dc.contributor.author.fl_str_mv	Fernandes, Paulo Alexandre Castillo
dc.subject.por.fl_str_mv	Aprendizagem Máquina; Aumento de Dados; Redes Generativas Adversariais; Computação Evolucionária; Exploração do Espaço Latente; Machine Learning; Data Augmentation; Generative Adversarial Network; Evolutionary Computation; Latent Space Exploration
topic	Aprendizagem Máquina; Aumento de Dados; Redes Generativas Adversariais; Computação Evolucionária; Exploração do Espaço Latente; Machine Learning; Data Augmentation; Generative Adversarial Network; Evolutionary Computation; Latent Space Exploration
description	Master's dissertation in Informatics Engineering presented to the Faculdade de Ciências e Tecnologia
publishDate	2020
dc.date.none.fl_str_mv	2020-11-04
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
format	masterThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://hdl.handle.net/10316/92165 TID:202553884
url	http://hdl.handle.net/10316/92165
identifier_str_mv	TID:202553884
dc.language.iso.fl_str_mv	eng
language	eng
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.source.none.fl_str_mv	reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP
instname_str	Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str	RCAAP
institution	RCAAP
reponame_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_	1799134010130038784
