Semi-Supervised Self-Organizing Maps with Time-Varying Structures for Clustering and Classification
Main author: | BRAGA, Pedro Henrique Magalhães |
---|---|
Publication date: | 2019 |
Document type: | Dissertation |
Language: | eng |
Source title: | Repositório Institucional da UFPE |
dARK ID: | ark:/64986/001300000nmdc |
Full text: | https://repositorio.ufpe.br/handle/123456789/33484 |
Abstract: | In recent years, advances in technology have produced datasets of increasing size, in both the number of samples and the number of features. Despite these advances, building a sufficiently large set of properly labeled data, with enough examples for each class, remains difficult. Organizing and labeling such data is challenging, expensive, and time-consuming. It is also usually done manually, and people label with different formats and styles, which introduces noise and errors into the dataset. Hence, there is growing interest in semi-supervised learning, since many learning tasks offer a plentiful supply of unlabeled data but too little labeled data. It is therefore important to put forward semi-supervised learning models that combine both types of data, benefiting from the distinct information each provides, to obtain better performance in both clustering and classification and thereby expand the range of machine learning applications. It is also important to develop methods that are easy to parameterize, so that they are robust to the different characteristics of the data at hand. In this sense, Self-Organizing Maps (SOM) are a good option. The SOM is a biologically inspired neural model that uses unsupervised, incremental learning to produce prototypes of the input data. However, because it is unsupervised, a standard SOM cannot by itself perform semi-supervised learning. This Dissertation therefore presents new SOM-based proposals for semi-supervised clustering and classification. They introduce into SOM the standard concepts of Learning Vector Quantization (LVQ), which can be seen as its supervised counterpart, to build hybrid approaches. 
The proposed models can dynamically switch between the two types of learning at training time, according to the availability of labels, and automatically adjust themselves to the local variance observed in each data cluster. The experimental results show that the proposed models can surpass the performance of other traditional methods in both classification and clustering quality. The work also extends the range of possible applications of SOM- and LVQ-based models by combining them with recent and promising Deep Learning techniques to solve the more complex problems commonly found in that field. |
id |
UFPE_060a7cf75d1a99073350aade58c8a677 |
---|---|
oai_identifier_str |
oai:repositorio.ufpe.br:123456789/33484 |
network_acronym_str |
UFPE |
network_name_str |
Repositório Institucional da UFPE |
repository_id_str |
2221 |
spelling |
BRAGA, Pedro Henrique Magalhães; http://lattes.cnpq.br/2868489638143233; http://lattes.cnpq.br/1931667959910637; BASSANI, Hansenclever de França; 2019-09-23T18:08:25Z; 2019-09-23T18:08:25Z; 2019-02-26; https://repositorio.ufpe.br/handle/123456789/33484; ark:/64986/001300000nmdc; CNPq; eng; Universidade Federal de Pernambuco; Programa de Pos Graduacao em Ciencia da Computacao; UFPE; Brasil; Attribution-NonCommercial-NoDerivs 3.0 Brazil; http://creativecommons.org/licenses/by-nc-nd/3.0/br/; info:eu-repo/semantics/openAccess; Inteligência Computacional; Mapas Auto-Organizáveis; Aprendizagem Semi-Supervisionada; Semi-Supervised Self-Organizing Maps with Time-Varying Structures for Clustering and Classification; info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/masterThesis; mestrado; reponame:Repositório Institucional da UFPE; instname:Universidade Federal de Pernambuco (UFPE); instacron:UFPE; https://repositorio.ufpe.br/oai/request; attena@ufpe.br; opendoar:2221; 2019-10-25T11:34:24 |
dc.title.pt_BR.fl_str_mv |
Semi-Supervised Self-Organizing Maps with Time-Varying Structures for Clustering and Classification |
title |
Semi-Supervised Self-Organizing Maps with Time-Varying Structures for Clustering and Classification |
author |
BRAGA, Pedro Henrique Magalhães |
author_facet |
BRAGA, Pedro Henrique Magalhães |
author_role |
author |
dc.contributor.authorLattes.pt_BR.fl_str_mv |
http://lattes.cnpq.br/2868489638143233 |
dc.contributor.advisorLattes.pt_BR.fl_str_mv |
http://lattes.cnpq.br/1931667959910637 |
dc.contributor.author.fl_str_mv |
BRAGA, Pedro Henrique Magalhães |
dc.contributor.advisor1.fl_str_mv |
BASSANI, Hansenclever de França |
contributor_str_mv |
BASSANI, Hansenclever de França |
dc.subject.por.fl_str_mv |
Computational Intelligence; Self-Organizing Maps; Semi-Supervised Learning |
topic |
Computational Intelligence; Self-Organizing Maps; Semi-Supervised Learning |
publishDate |
2019 |
dc.date.accessioned.fl_str_mv |
2019-09-23T18:08:25Z |
dc.date.available.fl_str_mv |
2019-09-23T18:08:25Z |
dc.date.issued.fl_str_mv |
2019-02-26 |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
format |
masterThesis |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
https://repositorio.ufpe.br/handle/123456789/33484 |
dc.identifier.dark.fl_str_mv |
ark:/64986/001300000nmdc |
url |
https://repositorio.ufpe.br/handle/123456789/33484 |
identifier_str_mv |
ark:/64986/001300000nmdc |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.rights.driver.fl_str_mv |
Attribution-NonCommercial-NoDerivs 3.0 Brazil http://creativecommons.org/licenses/by-nc-nd/3.0/br/ info:eu-repo/semantics/openAccess |
rights_invalid_str_mv |
Attribution-NonCommercial-NoDerivs 3.0 Brazil http://creativecommons.org/licenses/by-nc-nd/3.0/br/ |
eu_rights_str_mv |
openAccess |
dc.publisher.none.fl_str_mv |
Universidade Federal de Pernambuco |
dc.publisher.program.fl_str_mv |
Programa de Pos Graduacao em Ciencia da Computacao |
dc.publisher.initials.fl_str_mv |
UFPE |
dc.publisher.country.fl_str_mv |
Brasil |
publisher.none.fl_str_mv |
Universidade Federal de Pernambuco |
dc.source.none.fl_str_mv |
reponame:Repositório Institucional da UFPE instname:Universidade Federal de Pernambuco (UFPE) instacron:UFPE |
instname_str |
Universidade Federal de Pernambuco (UFPE) |
instacron_str |
UFPE |
institution |
UFPE |
reponame_str |
Repositório Institucional da UFPE |
collection |
Repositório Institucional da UFPE |
bitstream.url.fl_str_mv |
https://repositorio.ufpe.br/bitstream/123456789/33484/5/DISSERTA%c3%87%c3%83O%20Pedro%20Henrique%20Magalh%c3%a3es%20Braga.pdf.jpg https://repositorio.ufpe.br/bitstream/123456789/33484/1/DISSERTA%c3%87%c3%83O%20Pedro%20Henrique%20Magalh%c3%a3es%20Braga.pdf https://repositorio.ufpe.br/bitstream/123456789/33484/2/license_rdf https://repositorio.ufpe.br/bitstream/123456789/33484/3/license.txt https://repositorio.ufpe.br/bitstream/123456789/33484/4/DISSERTA%c3%87%c3%83O%20Pedro%20Henrique%20Magalh%c3%a3es%20Braga.pdf.txt |
bitstream.checksum.fl_str_mv |
d1ccf97c52c9be1e7f7a5c4a01bdd306 53b4ec5b9247fc14aa7965377a927e38 e39d27027a6cc9cb039ad269a5db8e34 bd573a5ca8288eb7272482765f819534 48f54ce6e0a3d0ec2d7ae6bffbd23dc8 |
bitstream.checksumAlgorithm.fl_str_mv |
MD5 MD5 MD5 MD5 MD5 |
repository.name.fl_str_mv |
Repositório Institucional da UFPE - Universidade Federal de Pernambuco (UFPE) |
repository.mail.fl_str_mv |
attena@ufpe.br |
_version_ |
1815172868412014592 |
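
The abstract describes models that switch between unsupervised (SOM-style) and supervised (LVQ-style) learning at training time, depending on whether a sample carries a label. The Python sketch below illustrates only that switching idea and should not be read as the dissertation's algorithm: the function `train_step`, its parameters, the fixed learning rate, and the use of `-1` for an unlabeled prototype are all assumptions made for illustration. It also omits the neighborhood updates, the time-varying map structure, and the local-variance adaptation mentioned in the abstract.

```python
import numpy as np


def train_step(prototypes, proto_labels, x, y=None, lr=0.1):
    """One hybrid update on the nearest prototype (hypothetical helper).

    prototypes:   (k, d) float array, modified in place.
    proto_labels: length-k int array; -1 marks a prototype with no class yet.
    x:            input sample of shape (d,).
    y:            class label of x, or None when the sample is unlabeled.
    Returns the index of the winning prototype.
    """
    # Find the winner: the prototype closest to the sample.
    dists = np.linalg.norm(prototypes - x, axis=1)
    w = int(np.argmin(dists))

    if y is None or proto_labels[w] == -1:
        # Unsupervised, SOM-style step: pull the winner toward the sample.
        prototypes[w] += lr * (x - prototypes[w])
        if y is not None:
            # Assumption: an unlabeled winner adopts the sample's label.
            proto_labels[w] = y
    elif proto_labels[w] == y:
        # Supervised, LVQ1-style attraction: labels agree.
        prototypes[w] += lr * (x - prototypes[w])
    else:
        # Supervised, LVQ1-style repulsion: labels disagree.
        prototypes[w] -= lr * (x - prototypes[w])
    return w


if __name__ == "__main__":
    protos = np.array([[0.0, 0.0], [1.0, 1.0]])
    labels = np.array([-1, 1])
    # Unlabeled sample: the nearest prototype simply moves toward it.
    train_step(protos, labels, np.array([0.2, 0.0]))
    # Labeled sample whose class differs from the winner's: winner is pushed away.
    train_step(protos, labels, np.array([0.9, 0.9]), y=0)
    print(protos)
```

Passing `y=None` for unlabeled samples keeps a single training loop for both kinds of data, which is the property the abstract emphasizes.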