The population initialization affects the performance of subgroup discovery evolutionary algorithms in high dimensional datasets

Bibliographic details
Main author: TORREÃO, Vítor de Albuquerque
Publication date: 2019
Document type: Master's thesis
Language: eng
Source title: Repositório Institucional da UFPE
dARK ID: ark:/64986/001300000vsvp
Full text: https://repositorio.ufpe.br/handle/123456789/34516
Abstract: Knowledge Discovery in Databases (KDD) is a broad area of Artificial Intelligence concerned with extracting useful information and insights from a given dataset. An important subclass of KDD tasks, called Subgroup Discovery (SD), deals with discovering interesting subsets of the data. Many Evolutionary Algorithms (EAs) have been proposed to solve the Subgroup Discovery task, with considerable success in low-dimensional datasets. Some of these, however, have been shown to perform poorly in high-dimensional problems. SSDP, currently the best-performing Evolutionary Algorithm for Subgroup Discovery in high-dimensional datasets, initializes its population in a peculiar way, limiting individuals to the smallest possible size. As with most population-based techniques, the outcome of an Evolutionary Algorithm usually depends on the initial set of solutions, which is typically generated at random. The impact of the choice of initialization technique on the final solution has been the topic of many published works in the broad area of evolutionary computation. Despite this, few studies address this topic in the specific scenario of Subgroup Discovery tasks, especially for high-dimensional datasets. The ultimate goal of this research project is to evaluate the impact of initial population generation on the end result of an Evolutionary Algorithm used to solve a Subgroup Discovery task in high-dimensional data. Specifically, we provide new initialization methods, designed for the specific characteristics of Subgroup Discovery tasks, which can be used in virtually any EA. Our experiments show that changing only the initialization method improves the performance of state-of-the-art Evolutionary Algorithms in high-dimensional datasets.
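The contrast the abstract draws between limiting individuals to the smallest possible size and generating them at random can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: individuals are modeled as sets of item indices, and the function names `init_minimal` and `init_random` and the parameters `n_items`, `pop_size`, and `max_size` are hypothetical. It is not the initialization used by SSDP, nor any of the methods proposed in the thesis.

```python
import random


def init_minimal(n_items, pop_size, rng):
    """Build a population in which every individual holds exactly one item.

    Assumption: an individual is a set of item indices, so the smallest
    possible individual is a single-item set.
    """
    items = list(range(n_items))
    rng.shuffle(items)
    # Walk through the shuffled items, wrapping around if pop_size > n_items.
    return [frozenset([items[i % n_items]]) for i in range(pop_size)]


def init_random(n_items, pop_size, max_size, rng):
    """Build a population of random item sets with 1..max_size items each."""
    population = []
    for _ in range(pop_size):
        size = rng.randint(1, min(max_size, n_items))
        # rng.sample draws distinct indices, so the set has exactly `size` items.
        population.append(frozenset(rng.sample(range(n_items), size)))
    return population


rng = random.Random(0)  # fixed seed so the sketch is repeatable
minimal = init_minimal(n_items=1000, pop_size=50, rng=rng)
mixed = init_random(n_items=1000, pop_size=50, max_size=5, rng=rng)
```

Under these assumptions, both functions return populations of the same length, so an evolutionary loop could swap one for the other without further changes; the only difference is the size of the individuals the search starts from.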
id UFPE_2d459d73bbd1fda0f260f664566be57d
oai_identifier_str oai:repositorio.ufpe.br:123456789/34516
network_acronym_str UFPE
network_name_str Repositório Institucional da UFPE
repository_id_str 2221
dc.title.pt_BR.fl_str_mv The population initialization affects the performance of subgroup discovery evolutionary algorithms in high dimensional datasets
title The population initialization affects the performance of subgroup discovery evolutionary algorithms in high dimensional datasets
spellingShingle The population initialization affects the performance of subgroup discovery evolutionary algorithms in high dimensional datasets
TORREÃO, Vítor de Albuquerque
Inteligência artificial
Aprendizagem de máquina
Mineração de dados
title_short The population initialization affects the performance of subgroup discovery evolutionary algorithms in high dimensional datasets
title_full The population initialization affects the performance of subgroup discovery evolutionary algorithms in high dimensional datasets
title_fullStr The population initialization affects the performance of subgroup discovery evolutionary algorithms in high dimensional datasets
title_full_unstemmed The population initialization affects the performance of subgroup discovery evolutionary algorithms in high dimensional datasets
title_sort The population initialization affects the performance of subgroup discovery evolutionary algorithms in high dimensional datasets
author TORREÃO, Vítor de Albuquerque
author_facet TORREÃO, Vítor de Albuquerque
author_role author
dc.contributor.authorLattes.pt_BR.fl_str_mv http://lattes.cnpq.br/8574157197594723
dc.contributor.advisorLattes.pt_BR.fl_str_mv http://lattes.cnpq.br/5736183954752317
dc.contributor.author.fl_str_mv TORREÃO, Vítor de Albuquerque
dc.contributor.advisor1.fl_str_mv VIMIEIRO, Renato
contributor_str_mv VIMIEIRO, Renato
dc.subject.por.fl_str_mv Inteligência artificial
Aprendizagem de máquina
Mineração de dados
topic Inteligência artificial
Aprendizagem de máquina
Mineração de dados
description Knowledge Discovery in Databases (KDD) is a broad area of Artificial Intelligence concerned with extracting useful information and insights from a given dataset. An important subclass of KDD tasks, called Subgroup Discovery (SD), deals with discovering interesting subsets of the data. Many Evolutionary Algorithms (EAs) have been proposed to solve the Subgroup Discovery task, with considerable success in low-dimensional datasets. Some of these, however, have been shown to perform poorly in high-dimensional problems. SSDP, currently the best-performing Evolutionary Algorithm for Subgroup Discovery in high-dimensional datasets, initializes its population in a peculiar way, limiting individuals to the smallest possible size. As with most population-based techniques, the outcome of an Evolutionary Algorithm usually depends on the initial set of solutions, which is typically generated at random. The impact of the choice of initialization technique on the final solution has been the topic of many published works in the broad area of evolutionary computation. Despite this, few studies address this topic in the specific scenario of Subgroup Discovery tasks, especially for high-dimensional datasets. The ultimate goal of this research project is to evaluate the impact of initial population generation on the end result of an Evolutionary Algorithm used to solve a Subgroup Discovery task in high-dimensional data. Specifically, we provide new initialization methods, designed for the specific characteristics of Subgroup Discovery tasks, which can be used in virtually any EA. Our experiments show that changing only the initialization method improves the performance of state-of-the-art Evolutionary Algorithms in high-dimensional datasets.
publishDate 2019
dc.date.accessioned.fl_str_mv 2019-10-11T19:49:06Z
dc.date.available.fl_str_mv 2019-10-11T19:49:06Z
dc.date.issued.fl_str_mv 2019-03-15
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv https://repositorio.ufpe.br/handle/123456789/34516
dc.identifier.dark.fl_str_mv ark:/64986/001300000vsvp
url https://repositorio.ufpe.br/handle/123456789/34516
identifier_str_mv ark:/64986/001300000vsvp
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv Attribution-NonCommercial-NoDerivs 3.0 Brazil
http://creativecommons.org/licenses/by-nc-nd/3.0/br/
info:eu-repo/semantics/embargoedAccess
rights_invalid_str_mv Attribution-NonCommercial-NoDerivs 3.0 Brazil
http://creativecommons.org/licenses/by-nc-nd/3.0/br/
eu_rights_str_mv embargoedAccess
dc.publisher.none.fl_str_mv Universidade Federal de Pernambuco
dc.publisher.program.fl_str_mv Programa de Pos Graduacao em Ciencia da Computacao
dc.publisher.initials.fl_str_mv UFPE
dc.publisher.country.fl_str_mv Brasil
publisher.none.fl_str_mv Universidade Federal de Pernambuco
dc.source.none.fl_str_mv reponame:Repositório Institucional da UFPE
instname:Universidade Federal de Pernambuco (UFPE)
instacron:UFPE
instname_str Universidade Federal de Pernambuco (UFPE)
instacron_str UFPE
institution UFPE
reponame_str Repositório Institucional da UFPE
collection Repositório Institucional da UFPE
bitstream.url.fl_str_mv https://repositorio.ufpe.br/bitstream/123456789/34516/5/DISSERTA%c3%87%c3%83O%20Vitor%20de%20Albuquerque%20Torre%c3%a3o.pdf.jpg
https://repositorio.ufpe.br/bitstream/123456789/34516/1/DISSERTA%c3%87%c3%83O%20Vitor%20de%20Albuquerque%20Torre%c3%a3o.pdf
https://repositorio.ufpe.br/bitstream/123456789/34516/2/license_rdf
https://repositorio.ufpe.br/bitstream/123456789/34516/3/license.txt
https://repositorio.ufpe.br/bitstream/123456789/34516/4/DISSERTA%c3%87%c3%83O%20Vitor%20de%20Albuquerque%20Torre%c3%a3o.pdf.txt
bitstream.checksum.fl_str_mv 966482f8dec3ca85a1819cd0c00bd58a
080c38115ebbc80b72388d7cf3cd8973
e39d27027a6cc9cb039ad269a5db8e34
bd573a5ca8288eb7272482765f819534
8d294390f9d21fc0a6ec668b78b63434
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
MD5
MD5
MD5
repository.name.fl_str_mv Repositório Institucional da UFPE - Universidade Federal de Pernambuco (UFPE)
repository.mail.fl_str_mv attena@ufpe.br
_version_ 1815172928160923648