Understanding the search space of methods for automatically designing graph neural networks
Main author: | Matheus Henrique do Nascimento Nunes |
---|---|
Publication date: | 2021 |
Document type: | Master's thesis |
Language: | eng |
Source title: | Repositório Institucional da UFMG |
Full text: | http://hdl.handle.net/1843/47526 https://orcid.org/0000-0001-5975-7903 |
Abstract: | Graph-structured data has become increasingly available and, due to its ubiquity, an object of study in many areas of research. Because graphs lack the notion of sequence, Machine Learning (ML) methods have historically struggled to work on this kind of data. Specialized methods for performing ML over graph data have gained a lot of attention from the research community, especially Graph Neural Networks (GNNs), which have been extensively used on real-world data, achieving state-of-the-art results in tasks such as circuit design, movie recommendation, and anomaly detection. Many GNN models have been proposed recently, and choosing the best model for each problem has become a cumbersome and error-prone task. Aiming to mitigate this problem, recent works have proposed strategies for applying Neural Architecture Search (NAS) - a set of methods designed to automatically configure neural networks, highly successful on Convolutional Neural Networks (CNNs), which deal with image data - to GNN models. Automatically configured GNNs have achieved high-performance results, surpassing human-crafted ones. However, the literature on NAS for GNNs is still in its early stages, and methods that have been successfully applied to NAS for CNNs have yet to be tested on GNNs. In this work we conducted a comprehensive comparative analysis of a proposed Evolutionary Algorithm against a Reinforcement Learning method from the literature and a simple Random Search baseline, considering seven real-world datasets and two search spaces. We showed that Random Search is just as effective at finding well-performing architectures as the other, more complex methods. Another interesting finding is that all three search methods converge very early in the search (within about 10% of the budget). We hypothesized that this might be happening due to the presence of neutrality (regions in which all solutions have very similar performance values) in the search space. 
Shifting the focus from the first part of this work, in the second part we conducted an extensive visual and analytical evaluation of one of the literature's search spaces, using dimensionality reduction and Fitness Landscape Analysis techniques. We demonstrated that the search space for NAS in GNNs presents high searchability (i.e., it is not difficult for algorithms to explore it and find a suitable solution) and neutrality (i.e., there are many regions of the search space in which the performance of neighboring solutions is roughly equal). We hypothesize that, in the future, less expensive methods can be used to perform the optimization in this scenario without loss of generality. |
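The two ideas the abstract leans on - Random Search over a discrete architecture space and neutrality among one-step neighbors - can be sketched in a few lines. Everything below is illustrative: the search space, the `toy_fitness` surrogate, and all names are assumptions for demonstration, not the dissertation's actual search spaces or evaluation protocol.

```python
import random

# Hypothetical GNN architecture search space (illustrative only; the real
# spaces studied in the dissertation differ).
SEARCH_SPACE = {
    "aggregation": ["sum", "mean", "max"],
    "activation": ["relu", "tanh", "elu"],
    "hidden_dim": [16, 32, 64, 128],
    "num_layers": [1, 2, 3],
    "dropout": [0.0, 0.3, 0.5],
}

def sample_architecture(rng):
    """Draw one architecture uniformly at random from the space."""
    return {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}

def random_search(fitness, budget, seed=0):
    """Plain Random Search: sample `budget` architectures, keep the best."""
    rng = random.Random(seed)
    best, best_fit = None, float("-inf")
    for _ in range(budget):
        arch = sample_architecture(rng)
        f = fitness(arch)
        if f > best_fit:
            best, best_fit = arch, f
    return best, best_fit

def neighbors(arch):
    """All one-change neighbors: vary exactly one design choice."""
    for k, options in SEARCH_SPACE.items():
        for opt in options:
            if opt != arch[k]:
                yield {**arch, k: opt}

def neutrality(arch, fitness, eps=1e-3):
    """Fraction of neighbors whose fitness is within `eps` of the center,
    a simple proxy for the neutral regions discussed in the text."""
    f0 = fitness(arch)
    neigh = list(neighbors(arch))
    close = sum(1 for n in neigh if abs(fitness(n) - f0) <= eps)
    return close / len(neigh)

def toy_fitness(arch):
    # Synthetic surrogate in which only depth matters, so 9 of the 11
    # one-change neighbors of any point are neutral.
    return 0.6 + 0.05 * arch["num_layers"]

# Demo: run Random Search with a small budget, then probe neutrality
# around the best architecture found.
best, fit = random_search(toy_fitness, budget=100)
nt = neutrality(best, toy_fitness)
```

Under this surrogate every architecture sits in a highly neutral region (`nt` is 9/11 regardless of the point probed), which mirrors why a search method can converge early: most moves change nothing, and a uniform sampler reaches the few choices that matter almost immediately.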
id |
UFMG_2d28dc1165e5b59c0398b04dcc00ddc4 |
---|---|
oai_identifier_str |
oai:repositorio.ufmg.br:1843/47526 |
network_acronym_str |
UFMG |
network_name_str |
Repositório Institucional da UFMG |
repository_id_str |
|
spelling |
Funding: CNPq - Conselho Nacional de Desenvolvimento Científico e Tecnológico
dc.title.pt_BR.fl_str_mv |
Understanding the search space of methods for automatically designing graph neural networks |
dc.title.alternative.pt_BR.fl_str_mv |
Uma análise do espaço de busca de métodos para o design automático de graph neural networks |
title |
Understanding the search space of methods for automatically designing graph neural networks |
spellingShingle |
Understanding the search space of methods for automatically designing graph neural networks Matheus Henrique do Nascimento Nunes Graph Neural Networks Automated Machine Learning Neural Architecture Search Computação - Teses Redes neurais (Computação) - Teses Aprendizado de máquina - Teses |
title_short |
Understanding the search space of methods for automatically designing graph neural networks |
title_full |
Understanding the search space of methods for automatically designing graph neural networks |
title_fullStr |
Understanding the search space of methods for automatically designing graph neural networks |
title_full_unstemmed |
Understanding the search space of methods for automatically designing graph neural networks |
title_sort |
Understanding the search space of methods for automatically designing graph neural networks |
author |
Matheus Henrique do Nascimento Nunes |
author_facet |
Matheus Henrique do Nascimento Nunes |
author_role |
author |
dc.contributor.advisor1.fl_str_mv |
Gisele Lobo Pappa |
dc.contributor.advisor1Lattes.fl_str_mv |
http://lattes.cnpq.br/5936682335701497 |
dc.contributor.referee1.fl_str_mv |
Fabrício Murai Ferreira |
dc.contributor.referee2.fl_str_mv |
Nuno Lourenço |
dc.contributor.authorLattes.fl_str_mv |
http://lattes.cnpq.br/9801186721884441 |
dc.contributor.author.fl_str_mv |
Matheus Henrique do Nascimento Nunes |
contributor_str_mv |
Gisele Lobo Pappa Fabrício Murai Ferreira Nuno Lourenço |
dc.subject.por.fl_str_mv |
Graph Neural Networks Automated Machine Learning Neural Architecture Search |
topic |
Graph Neural Networks Automated Machine Learning Neural Architecture Search Computação - Teses Redes neurais (Computação) - Teses Aprendizado de máquina - Teses |
dc.subject.other.pt_BR.fl_str_mv |
Computação - Teses Redes neurais (Computação) - Teses Aprendizado de máquina - Teses |
publishDate |
2021 |
dc.date.issued.fl_str_mv |
2021-12-07 |
dc.date.accessioned.fl_str_mv |
2022-11-29T12:32:33Z |
dc.date.available.fl_str_mv |
2022-11-29T12:32:33Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
format |
masterThesis |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/1843/47526 |
dc.identifier.orcid.pt_BR.fl_str_mv |
https://orcid.org/0000-0001-5975-7903 |
url |
http://hdl.handle.net/1843/47526 https://orcid.org/0000-0001-5975-7903 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.publisher.none.fl_str_mv |
Universidade Federal de Minas Gerais |
dc.publisher.program.fl_str_mv |
Programa de Pós-Graduação em Ciência da Computação |
dc.publisher.initials.fl_str_mv |
UFMG |
dc.publisher.country.fl_str_mv |
Brasil |
dc.publisher.department.fl_str_mv |
ICX - DEPARTAMENTO DE CIÊNCIA DA COMPUTAÇÃO |
publisher.none.fl_str_mv |
Universidade Federal de Minas Gerais |
dc.source.none.fl_str_mv |
reponame:Repositório Institucional da UFMG instname:Universidade Federal de Minas Gerais (UFMG) instacron:UFMG |
instname_str |
Universidade Federal de Minas Gerais (UFMG) |
instacron_str |
UFMG |
institution |
UFMG |
reponame_str |
Repositório Institucional da UFMG |
collection |
Repositório Institucional da UFMG |
bitstream.url.fl_str_mv |
https://repositorio.ufmg.br/bitstream/1843/47526/3/dissertacao_fixed_pdfa.pdf https://repositorio.ufmg.br/bitstream/1843/47526/4/license.txt |
bitstream.checksum.fl_str_mv |
0b47e93ca63eebc7b7dacc6d79fc0d47 cda590c95a0b51b4d15f60c9642ca272 |
bitstream.checksumAlgorithm.fl_str_mv |
MD5 MD5 |
repository.name.fl_str_mv |
Repositório Institucional da UFMG - Universidade Federal de Minas Gerais (UFMG) |
repository.mail.fl_str_mv |
|
_version_ |
1803589529103237120 |