Algoritmo genético compacto com dominância para seleção de variáveis

Bibliographic details
Main author: Nogueira, Heber Valdo
Publication date: 2017
Document type: Master's thesis
Language: por
Source title: Repositório Institucional da UFG
Full text: http://repositorio.bc.ufg.br/tede/handle/tede/7360
Abstract: The feature selection problem consists of selecting a subset of attributes that reduces the computational processing and storage resources required, mitigates the effects of the curse of dimensionality, and improves the performance of predictive models. Among the strategies used to solve this type of problem, evolutionary algorithms such as the Genetic Algorithm stand out. Despite the relative success of the Genetic Algorithm in solving various types of problems, several improvements have been proposed to enhance its performance. These improvements focus mainly on population representation, search mechanisms, and evaluation methods. One such proposal is the Compact Genetic Algorithm (CGA), which introduces new ways of representing the population and of guiding the search for better solutions. Applying this type of strategy to the variable selection problem often leads to overfitting. Several studies in the area indicate that a multiobjective approach may be able to mitigate this problem. In this context, this work proposes an implementation of a version of the Compact Genetic Algorithm that minimizes more than one objective simultaneously. The algorithm makes use of the concept of Pareto dominance and is therefore called the Compact Genetic Algorithm with Dominance (CGAD). As a case study, to evaluate the performance of the proposed algorithm, CGAD is combined with Multiple Linear Regression (MLR) to select variables that better predict protein concentration in wheat samples. The proposed algorithm is compared with the CGA and the Mutation-based Compact Genetic Algorithm. The results indicate that CGAD is able to select a small set of variables, reducing the prediction error of the calibration model and minimizing the possibility of overfitting.
id UFG-2_bd56ebd281957d243c42c9999c3ea521
oai_identifier_str oai:repositorio.bc.ufg.br:tede/7360
network_acronym_str UFG-2
network_name_str Repositório Institucional da UFG
repository_id_str
spelling Submitted by Luciana Ferreira (lucgeral@gmail.com) on 2017-05-23T11:37:07Z. No. of bitstreams: 2. Dissertação - Heber Valdo Nogueira - 2017.pdf: 1812540 bytes, checksum: 14c0f7496303095925cd3ae974fd4b7b (MD5). license_rdf: 0 bytes, checksum: d41d8cd98f00b204e9800998ecf8427e (MD5).
Approved for entry into archive by Luciana Ferreira (lucgeral@gmail.com) on 2017-05-23T11:37:50Z (GMT).
Made available in DSpace on 2017-05-23T11:37:51Z (GMT). Previous issue date: 2017-04-20.
dc.title.eng.fl_str_mv Algoritmo genético compacto com dominância para seleção de variáveis
dc.title.alternative.eng.fl_str_mv Compact genetic algorithm with dominance for variable selection
title Algoritmo genético compacto com dominância para seleção de variáveis
spellingShingle Algoritmo genético compacto com dominância para seleção de variáveis
Nogueira, Heber Valdo
Seleção de variáveis
Algoritmo genético compacto
Otimização multiobjetivo
Feature selection
Compact genetic algorithm
Multiobjective optimization
CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
title_short Algoritmo genético compacto com dominância para seleção de variáveis
title_full Algoritmo genético compacto com dominância para seleção de variáveis
title_fullStr Algoritmo genético compacto com dominância para seleção de variáveis
title_full_unstemmed Algoritmo genético compacto com dominância para seleção de variáveis
title_sort Algoritmo genético compacto com dominância para seleção de variáveis
author Nogueira, Heber Valdo
author_facet Nogueira, Heber Valdo
author_role author
dc.contributor.advisor1.fl_str_mv Soares, Anderson da Silva
dc.contributor.advisor1Lattes.fl_str_mv http://lattes.cnpq.br/1096941114079527
dc.contributor.advisor-co1.fl_str_mv Soares, Telma Woerle de Lima
dc.contributor.advisor-co1Lattes.fl_str_mv http://lattes.cnpq.br/6296363436468330
dc.contributor.referee1.fl_str_mv Soares, Anderson da Silva
dc.contributor.referee2.fl_str_mv Soares, Telma Woerle de Lima
dc.contributor.referee3.fl_str_mv Coelho, Clarimar José
dc.contributor.referee4.fl_str_mv Dias, Jailson Cardoso
dc.contributor.authorLattes.fl_str_mv http://lattes.cnpq.br/2529656171716581
dc.contributor.author.fl_str_mv Nogueira, Heber Valdo
contributor_str_mv Soares, Anderson da Silva
Soares, Telma Woerle de Lima
Soares, Anderson da Silva
Soares, Telma Woerle de Lima
Coelho, Clarimar José
Dias, Jailson Cardoso
dc.subject.por.fl_str_mv Seleção de variáveis
Algoritmo genético compacto
Otimização multiobjetivo
topic Seleção de variáveis
Algoritmo genético compacto
Otimização multiobjetivo
Feature selection
Compact genetic algorithm
Multiobjective optimization
CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
dc.subject.eng.fl_str_mv Feature selection
Compact genetic algorithm
Multiobjective optimization
dc.subject.cnpq.fl_str_mv CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
description The feature selection problem consists of selecting a subset of attributes that reduces the computational processing and storage resources required, mitigates the effects of the curse of dimensionality, and improves the performance of predictive models. Among the strategies used to solve this type of problem, evolutionary algorithms such as the Genetic Algorithm stand out. Despite the relative success of the Genetic Algorithm in solving various types of problems, several improvements have been proposed to enhance its performance. These improvements focus mainly on population representation, search mechanisms, and evaluation methods. One such proposal is the Compact Genetic Algorithm (CGA), which introduces new ways of representing the population and of guiding the search for better solutions. Applying this type of strategy to the variable selection problem often leads to overfitting. Several studies in the area indicate that a multiobjective approach may be able to mitigate this problem. In this context, this work proposes an implementation of a version of the Compact Genetic Algorithm that minimizes more than one objective simultaneously. The algorithm makes use of the concept of Pareto dominance and is therefore called the Compact Genetic Algorithm with Dominance (CGAD). As a case study, to evaluate the performance of the proposed algorithm, CGAD is combined with Multiple Linear Regression (MLR) to select variables that better predict protein concentration in wheat samples. The proposed algorithm is compared with the CGA and the Mutation-based Compact Genetic Algorithm. The results indicate that CGAD is able to select a small set of variables, reducing the prediction error of the calibration model and minimizing the possibility of overfitting.
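The description pairs the selection algorithm with Multiple Linear Regression and reports both a small variable set and a low prediction error. The sketch below shows one way a candidate variable mask could be scored on two objectives. It assumes the objectives are validation RMSE and the number of selected variables, which the record does not state explicitly. The functions `fit_mlr` and `objectives` and the tiny data set are hypothetical, and the regression is solved with plain normal equations for illustration only.

```python
import math


def fit_mlr(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    # Augmented system [X^T X | X^T y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(A[r][c]))
        if abs(A[piv][c]) < 1e-12:
            raise ValueError("singular design matrix")
        A[c], A[piv] = A[piv], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [x - f * z for x, z in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def objectives(mask, X_train, y_train, X_val, y_val):
    """Return (validation RMSE, number of selected variables) for a 0/1 mask."""
    cols = [i for i, bit in enumerate(mask) if bit]
    if not cols:
        # Assumption: an empty mask is treated as worst on both objectives.
        return (float("inf"), float("inf"))
    pick = lambda X: [[row[i] for i in cols] for row in X]
    beta = fit_mlr(pick(X_train), y_train)
    preds = [beta[0] + sum(b * v for b, v in zip(beta[1:], row))
             for row in pick(X_val)]
    rmse = math.sqrt(sum((p - t) ** 2 for p, t in zip(preds, y_val)) / len(y_val))
    return (rmse, len(cols))


if __name__ == "__main__":
    # Hypothetical data: y depends only on the first variable (y = 2*x0 + 1).
    X_train = [[0, 1], [1, 0], [2, 1], [3, 0]]
    y_train = [1, 3, 5, 7]
    X_val = [[4, 1], [5, 0]]
    y_val = [9, 11]
    for mask in ([1, 0], [0, 1], [1, 1]):
        print(mask, objectives(mask, X_train, y_train, X_val, y_val))
```

Scoring the error on data held out from the fit is what lets the second objective counter overfitting: adding variables always helps on the training set, but not necessarily on validation samples.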
publishDate 2017
dc.date.accessioned.fl_str_mv 2017-05-23T11:37:51Z
dc.date.issued.fl_str_mv 2017-04-20
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.citation.fl_str_mv NOGUEIRA, H. V. Algoritmo genético compacto com dominância para seleção de variáveis. 2017. 64 f. Dissertação (Mestrado em Ciência da Computação) - Universidade Federal de Goiás, Goiânia, 2017.
dc.identifier.uri.fl_str_mv http://repositorio.bc.ufg.br/tede/handle/tede/7360
identifier_str_mv NOGUEIRA, H. V. Algoritmo genético compacto com dominância para seleção de variáveis. 2017. 64 f. Dissertação (Mestrado em Ciência da Computação) - Universidade Federal de Goiás, Goiânia, 2017.
url http://repositorio.bc.ufg.br/tede/handle/tede/7360
dc.language.iso.fl_str_mv por
language por
dc.relation.program.fl_str_mv -3303550325223384799
dc.relation.confidence.fl_str_mv 600
600
600
dc.relation.department.fl_str_mv -7712266734633644768
dc.relation.cnpq.fl_str_mv 3671711205811204509
dc.rights.driver.fl_str_mv http://creativecommons.org/licenses/by-nc-nd/4.0/
info:eu-repo/semantics/openAccess
rights_invalid_str_mv http://creativecommons.org/licenses/by-nc-nd/4.0/
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Universidade Federal de Goiás
dc.publisher.program.fl_str_mv Programa de Pós-graduação em Ciência da Computação (INF)
dc.publisher.initials.fl_str_mv UFG
dc.publisher.country.fl_str_mv Brasil
dc.publisher.department.fl_str_mv Instituto de Informática - INF (RG)
publisher.none.fl_str_mv Universidade Federal de Goiás
dc.source.none.fl_str_mv reponame:Repositório Institucional da UFG
instname:Universidade Federal de Goiás (UFG)
instacron:UFG
instname_str Universidade Federal de Goiás (UFG)
instacron_str UFG
institution UFG
reponame_str Repositório Institucional da UFG
collection Repositório Institucional da UFG
bitstream.url.fl_str_mv http://repositorio.bc.ufg.br/tede/bitstreams/16e57d2f-2a76-408d-9953-5e1ea3aa4661/download
http://repositorio.bc.ufg.br/tede/bitstreams/f78e5c26-c8d4-44db-960a-3f58d714ef0e/download
http://repositorio.bc.ufg.br/tede/bitstreams/48a6b91a-8be3-44db-b99c-3d71538bc842/download
http://repositorio.bc.ufg.br/tede/bitstreams/b863fb58-509b-4675-b253-7a952f58ce73/download
http://repositorio.bc.ufg.br/tede/bitstreams/da00ddd6-f9d0-4ec5-bc66-75be9101cd13/download
bitstream.checksum.fl_str_mv bd3efa91386c1718a7f26a329fdcb468
4afdbb8c545fd630ea7db775da747b2f
d41d8cd98f00b204e9800998ecf8427e
d41d8cd98f00b204e9800998ecf8427e
14c0f7496303095925cd3ae974fd4b7b
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
MD5
MD5
MD5
repository.name.fl_str_mv Repositório Institucional da UFG - Universidade Federal de Goiás (UFG)
repository.mail.fl_str_mv tasesdissertacoes.bc@ufg.br
_version_ 1798044425944825856