Understanding the search space of methods for automatically designing graph neural networks

Bibliographic details
Main author: Matheus Henrique do Nascimento Nunes
Publication date: 2021
Document type: Master's thesis (Dissertação)
Language: eng
Source: Repositório Institucional da UFMG
Full text: http://hdl.handle.net/1843/47526
https://orcid.org/0000-0001-5975-7903
Abstract: Graph-structured data has become increasingly available and, due to its ubiquity, an object of study in many areas of research. Because graphs lack a natural notion of sequence, Machine Learning (ML) methods have historically struggled with this kind of data. Specialized methods for performing ML over graph data have therefore attracted considerable attention from the research community, especially Graph Neural Networks (GNNs), which have been applied extensively to real-world data, achieving state-of-the-art results in tasks such as circuit design, movie recommendation, and anomaly detection. Many GNN models have been proposed recently, and choosing the best model for each problem has become a cumbersome and error-prone task. To mitigate this problem, recent works have proposed strategies for applying Neural Architecture Search (NAS), a set of methods designed to configure neural networks automatically and very successful on Convolutional Neural Networks (CNNs), which deal with image data, to GNN models. Automatically configured GNNs have achieved high performance, surpassing human-crafted ones. However, the literature on NAS for GNNs is still in its early stages, and methods that have been successfully applied to NAS for CNNs have yet to be tested on GNNs. In this work we conducted a comprehensive comparative analysis of a proposed Evolutionary Algorithm against a Reinforcement Learning method from the literature and a simple Random Search baseline, considering seven real-world datasets and two search spaces. We showed that Random Search is just as effective as the more complex methods at finding well-performing architectures. Another interesting finding is that all three search methods converge very early in the search (after about 10% of the budget). We hypothesized that this may be due to the presence of neutrality (regions in which all solutions have very similar performance values) in the search space.
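The first part's central claim, that uniform Random Search matches more elaborate NAS methods on these search spaces, is easy to picture as code. The sketch below is purely illustrative (the option names and the `evaluate` callback are hypothetical stand-ins, not the dissertation's actual search space or training loop):

```python
import random

# Toy illustration: a discrete GNN architecture search space is a
# cross-product of design choices; Random Search samples it uniformly.
SEARCH_SPACE = {                     # hypothetical option names
    "aggregation": ["sum", "mean", "max"],
    "activation": ["relu", "tanh", "elu"],
    "hidden_dim": [16, 64, 256],
    "num_layers": [2, 3, 4],
}

def sample_architecture(rng):
    """Draw one architecture uniformly at random from the space."""
    return {key: rng.choice(options) for key, options in SEARCH_SPACE.items()}

def random_search(evaluate, budget, seed=0):
    """Return the best (architecture, score) found within the budget."""
    rng = random.Random(seed)
    best_arch, best_score = None, float("-inf")
    for _ in range(budget):
        arch = sample_architecture(rng)
        score = evaluate(arch)       # e.g. validation accuracy of the trained GNN
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score
```

The `evaluate` callback would wrap training and validating one GNN configuration, which is where virtually all of the search cost lies; the search logic itself is trivial.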
In the second part of this work, we conducted an extensive visual and analytical evaluation of one of the literature's search spaces, using dimensionality reduction and Fitness Landscape Analysis techniques. We demonstrated that the search space for NAS in GNNs presents high searchability (i.e., it is not difficult for algorithms to explore it and find a suitable solution) and neutrality (i.e., there are many regions of the search space in which the performance of neighboring solutions is roughly equal). We hypothesize that, in the future, less expensive methods can be used to perform this optimization without loss of generality.
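The neutrality notion used in both parts, neighborhoods in which fitness barely changes, can be made concrete with a toy metric. This sketch is an assumed, simplified stand-in, not the dissertation's actual Fitness Landscape Analysis measures:

```python
def neighbors(arch, search_space):
    """All architectures differing from `arch` in exactly one choice."""
    for key, options in search_space.items():
        for option in options:
            if option != arch[key]:
                yield {**arch, key: option}

def neutrality_degree(arch, fitness, search_space, tol=0.01):
    """Fraction of neighbors whose fitness is within `tol` of arch's.

    A value near 1.0 means the architecture sits on a neutral plateau:
    single-choice edits barely move the performance.
    """
    base = fitness(arch)
    nbrs = list(neighbors(arch, search_space))
    close = sum(1 for n in nbrs if abs(fitness(n) - base) <= tol)
    return close / len(nbrs)
```

Averaging such a degree over sampled architectures gives one rough signal of the plateau structure that would explain why all three search methods converge within about 10% of the budget.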
Funding: CNPq - Conselho Nacional de Desenvolvimento Científico e Tecnológico
File: dissertacao_fixed_pdfa.pdf (application/pdf, 9,778,665 bytes)
dc.title.pt_BR.fl_str_mv Understanding the search space of methods for automatically designing graph neural networks
dc.title.alternative.pt_BR.fl_str_mv Uma análise do espaço de busca de métodos para o design automático de graph neural networks
dc.contributor.advisor1.fl_str_mv Gisele Lobo Pappa
dc.contributor.advisor1Lattes.fl_str_mv http://lattes.cnpq.br/5936682335701497
dc.contributor.referee1.fl_str_mv Fabrício Murai Ferreira
dc.contributor.referee2.fl_str_mv Nuno Lourenço
dc.contributor.authorLattes.fl_str_mv http://lattes.cnpq.br/9801186721884441
dc.contributor.author.fl_str_mv Matheus Henrique do Nascimento Nunes
contributor_str_mv Gisele Lobo Pappa
Fabrício Murai Ferreira
Nuno Lourenço
dc.subject.por.fl_str_mv Graph Neural Networks
Automated Machine Learning
Neural Architecture Search
dc.subject.other.pt_BR.fl_str_mv Computação - Teses
Redes neurais ( Computação) - Teses
Aprendizado de máquina - Teses
dc.date.issued.fl_str_mv 2021-12-07
dc.date.accessioned.fl_str_mv 2022-11-29T12:32:33Z
dc.date.available.fl_str_mv 2022-11-29T12:32:33Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
dc.identifier.uri.fl_str_mv http://hdl.handle.net/1843/47526
dc.identifier.orcid.pt_BR.fl_str_mv https://orcid.org/0000-0001-5975-7903
dc.language.iso.fl_str_mv eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
dc.publisher.none.fl_str_mv Universidade Federal de Minas Gerais
dc.publisher.program.fl_str_mv Programa de Pós-Graduação em Ciência da Computação
dc.publisher.initials.fl_str_mv UFMG
dc.publisher.country.fl_str_mv Brasil
dc.publisher.department.fl_str_mv ICX - DEPARTAMENTO DE CIÊNCIA DA COMPUTAÇÃO
dc.source.none.fl_str_mv reponame:Repositório Institucional da UFMG
instname:Universidade Federal de Minas Gerais (UFMG)
instacron:UFMG
bitstream.url.fl_str_mv https://repositorio.ufmg.br/bitstream/1843/47526/3/dissertacao_fixed_pdfa.pdf
https://repositorio.ufmg.br/bitstream/1843/47526/4/license.txt
bitstream.checksum.fl_str_mv 0b47e93ca63eebc7b7dacc6d79fc0d47
cda590c95a0b51b4d15f60c9642ca272
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
repository.name.fl_str_mv Repositório Institucional da UFMG - Universidade Federal de Minas Gerais (UFMG)