Algoritmos genético para imputação múltipla de dados na classificação multirrótulo

Bibliographic details
Main author:	JACOB JUNIOR, Antonio Fernando Lavareda
Publication date:	2024
Document type:	Thesis
Language:	por
Source title:	Biblioteca Digital de Teses e Dissertações da UFMA
Full text:	https://tedebc.ufma.br/jspui/handle/tede/tede/5255
Abstract:	Missing data is a prevalent problem that requires attention, as most data analysis techniques cannot handle it. The problem is particularly critical in Multi-Label Classification (MLC), where only a few studies have investigated missing data. MLC differs from Single-Label Classification (SLC) by allowing an instance to be associated with multiple classes. Movie classification is a didactic example, since a movie can be labeled “drama” and “biography” simultaneously. One of the most common treatments for missing data is imputation, which seeks plausible values to fill in the missing ones. In this scenario, we propose a novel imputation method based on a multi-objective genetic algorithm that optimizes multiple data imputations, called Multiple Imputation of Multi-label Classification data with a genetic algorithm, or simply EvoImp. We applied the proposed method to multi-label learning and evaluated its performance on six synthetic databases under various missing-value distribution scenarios. The method was compared with other state-of-the-art imputation strategies, such as K-Means Imputation (KMI) and Weighted K-Nearest Neighbors Imputation (WKNNI). The results showed that the proposed method outperformed the baselines in all scenarios, achieving the best values for Exact Match, Accuracy, and Hamming Loss. The superior results were consistent across dataset domains and sizes, demonstrating the robustness of EvoImp. Thus, EvoImp represents a feasible solution for treating missing data in multi-label learning.
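
The abstract names three evaluation measures (Exact Match, Accuracy, and Hamming Loss) without defining them. The sketch below shows the definitions commonly used in the multi-label literature; it is an illustration only and is not taken from the thesis, which may define the measures differently. The function names, the Jaccard-style definition of Accuracy, the convention for empty label sets, and the toy movie data are all assumptions made for this example.

```python
from typing import Sequence

# A label matrix has one row per instance and one 0/1 column per label.
LabelMatrix = Sequence[Sequence[int]]


def exact_match(y_true: LabelMatrix, y_pred: LabelMatrix) -> float:
    """Fraction of instances whose predicted label set is entirely correct."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return hits / len(y_true)


def hamming_loss(y_true: LabelMatrix, y_pred: LabelMatrix) -> float:
    """Fraction of individual instance-label pairs that are predicted wrongly."""
    n_labels = len(y_true[0])
    wrong = sum(
        ti != pi
        for t, p in zip(y_true, y_pred)
        for ti, pi in zip(t, p)
    )
    return wrong / (len(y_true) * n_labels)


def multilabel_accuracy(y_true: LabelMatrix, y_pred: LabelMatrix) -> float:
    """Mean over instances of |true AND pred| / |true OR pred| (assumed definition)."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        inter = sum(1 for ti, pi in zip(t, p) if ti == 1 and pi == 1)
        union = sum(1 for ti, pi in zip(t, p) if ti == 1 or pi == 1)
        # Assumed convention: two empty label sets count as a perfect match.
        total += inter / union if union else 1.0
    return total / len(y_true)


# Hypothetical toy data: 3 movies, label columns = (drama, biography, comedy).
Y_TRUE = [[1, 1, 0], [0, 0, 1], [1, 0, 0]]
Y_PRED = [[1, 1, 0], [0, 1, 1], [0, 0, 0]]

if __name__ == "__main__":
    print("Exact Match :", exact_match(Y_TRUE, Y_PRED))
    print("Hamming Loss:", hamming_loss(Y_TRUE, Y_PRED))
    print("Accuracy    :", multilabel_accuracy(Y_TRUE, Y_PRED))
```

Exact Match and Accuracy are better when higher, while Hamming Loss is better when lower, so "best" in the abstract means the highest values for the first two and the lowest for the third.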

Item metadata

id	UFMA_bdd9556351ee3603d2ca3ba0450b1b6e
oai_identifier_str	oai:tede2:tede/5255
network_acronym_str	UFMA
network_name_str	Biblioteca Digital de Teses e Dissertações da UFMA
repository_id_str	2131
spelling	Submitted by Jonathan Sousa de Almeida (jonathan.sousa@ufma.br) on 2024-04-24T15:03:26Z. No. of bitstreams: 1. AntonioFernandoLavaredaJacobJunior.pdf: 2311747 bytes, checksum: d479dcaf409dbe30f889fe10369550c0 (MD5). Made available in DSpace on 2024-04-24T15:03:26Z (GMT). Previous issue date: 2024-02-23. Sponsorship: CNPq.
dc.title.por.fl_str_mv	Algoritmos genético para imputação múltipla de dados na classificação multirrótulo
dc.title.alternative.eng.fl_str_mv	Genetic algorithms for multiple imputation of data in multi-label classification
title	Algoritmos genético para imputação múltipla de dados na classificação multirrótulo
author	JACOB JUNIOR, Antonio Fernando Lavareda
author_facet	JACOB JUNIOR, Antonio Fernando Lavareda
author_role	author
dc.contributor.advisor1.fl_str_mv	SANTANA, Ewaldo Eder Carvalho
dc.contributor.advisor1Lattes.fl_str_mv	http://lattes.cnpq.br/0660692009750374
dc.contributor.advisor-co1.fl_str_mv	LOBATO, Fábio Manoel França
dc.contributor.advisor-co1Lattes.fl_str_mv	http://lattes.cnpq.br/8320014491229434
dc.contributor.referee1.fl_str_mv	SANTANA, Ewaldo Eder Carvalho
dc.contributor.referee1Lattes.fl_str_mv	http://lattes.cnpq.br/0660692009750374
dc.contributor.referee2.fl_str_mv	LOBATO, Fábio Manoel França
dc.contributor.referee2Lattes.fl_str_mv	http://lattes.cnpq.br/8320014491229434
dc.contributor.referee3.fl_str_mv	BARROS FILHO, Allan Kardec Duailibe
dc.contributor.referee3Lattes.fl_str_mv	http://lattes.cnpq.br/0492330410079141
dc.contributor.referee4.fl_str_mv	SILVA, Francisco Jose Da Silva e
dc.contributor.referee4Lattes.fl_str_mv	http://lattes.cnpq.br/0770343284012942
dc.contributor.referee5.fl_str_mv	CORTES, Omar Andres Carmona
dc.contributor.referee5Lattes.fl_str_mv	http://lattes.cnpq.br/5523293886612004
dc.contributor.authorLattes.fl_str_mv	http://lattes.cnpq.br/4510520291728075
dc.contributor.author.fl_str_mv	JACOB JUNIOR, Antonio Fernando Lavareda
contributor_str_mv	SANTANA, Ewaldo Eder Carvalho; LOBATO, Fábio Manoel França; SANTANA, Ewaldo Eder Carvalho; LOBATO, Fábio Manoel França; BARROS FILHO, Allan Kardec Duailibe; SILVA, Francisco Jose Da Silva e; CORTES, Omar Andres Carmona
dc.subject.por.fl_str_mv	valores ausentes; classificação multirrótulo; algoritmos genéticos.
topic	valores ausentes; classificação multirrótulo; algoritmos genéticos. missing values; multi-label classification; genetic algorithms. Ciências Exatas e da Terra
dc.subject.eng.fl_str_mv	missing values; multi-label classification; genetic algorithms.
dc.subject.cnpq.fl_str_mv	Ciências Exatas e da Terra
publishDate	2024
dc.date.accessioned.fl_str_mv	2024-04-24T15:03:26Z
dc.date.issued.fl_str_mv	2024-02-23
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/doctoralThesis
format	doctoralThesis
status_str	publishedVersion
dc.identifier.citation.fl_str_mv	JACOB JUNIOR, Antonio Fernando Lavareda. Algoritmos genético para imputação múltipla de dados na classificação multirrótulo. 2024. 97 f. Tese (Programa de Pós-Graduação em Engenharia de Eletricidade/CCET) - Universidade Federal do Maranhão, São Luís, 2024.
dc.identifier.uri.fl_str_mv	https://tedebc.ufma.br/jspui/handle/tede/tede/5255
dc.language.iso.fl_str_mv	por
language	por
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.publisher.none.fl_str_mv	Universidade Federal do Maranhão
dc.publisher.program.fl_str_mv	PROGRAMA DE PÓS-GRADUAÇÃO EM ENGENHARIA DE ELETRICIDADE/CCET
dc.publisher.initials.fl_str_mv	UFMA
dc.publisher.country.fl_str_mv	Brasil
dc.publisher.department.fl_str_mv	DEPARTAMENTO DE ENGENHARIA DA ELETRICIDADE/CCET
publisher.none.fl_str_mv	Universidade Federal do Maranhão
dc.source.none.fl_str_mv	reponame:Biblioteca Digital de Teses e Dissertações da UFMA instname:Universidade Federal do Maranhão (UFMA) instacron:UFMA
instname_str	Universidade Federal do Maranhão (UFMA)
instacron_str	UFMA
institution	UFMA
reponame_str	Biblioteca Digital de Teses e Dissertações da UFMA
collection	Biblioteca Digital de Teses e Dissertações da UFMA
bitstream.url.fl_str_mv	http://tedebc.ufma.br:8080/bitstream/tede/5255/2/AntonioFernandoLavaredaJacobJunior.pdf http://tedebc.ufma.br:8080/bitstream/tede/5255/1/license.txt
bitstream.checksum.fl_str_mv	d479dcaf409dbe30f889fe10369550c0 97eeade1fce43278e63fe063657f8083
bitstream.checksumAlgorithm.fl_str_mv	MD5 MD5
repository.name.fl_str_mv	Biblioteca Digital de Teses e Dissertações da UFMA - Universidade Federal do Maranhão (UFMA)
repository.mail.fl_str_mv	repositorio@ufma.br
_version_	1809926184167800832
