Aplicação de métodos de redução de dimensionalidade não lineares em classificadores paramétricos e não paramétricos

Hirasawa, João Gabriel Viana


Bibliographic details
Main author:	Hirasawa, João Gabriel Viana
Publication date:	2023
Document type:	Undergraduate thesis (Trabalho de conclusão de curso)
Language:	por
Source title:	Repositório Institucional da UFSCAR
Full text:	https://repositorio.ufscar.br/handle/ufscar/18998
Abstract:	Much of the data collected and used in machine learning applications lies in high-dimensional spaces. Images, text documents, and sensor data are examples of data that are collected continuously and whose number of attributes can easily exceed the number of samples in the set. As a consequence, the curse of dimensionality makes it necessary to study ways of mitigating its negative effects on models that use such high-dimensional data sets. One solution is dimensionality reduction methods, which seek to generate representations with a more manageable number of dimensions while minimizing the loss of information. The use of such methods within machine learning is therefore a promising field, as they simplify the structure of the data that feeds the models. This work aimed to evaluate the use of different non-linear dimensionality reduction methods together with parametric and non-parametric models in classification tasks. UMAP and PaCMAP were applied to high-dimensional data sets available on the OpenML platform, and the classification performance of the Quadratic Discriminant Analysis (QDA), Gaussian Naive Bayes, k-NN, and XGBoost models was evaluated. The results show an improvement in performance for the parametric models, mainly with the supervised implementation of UMAP. Although the methods were not as effective with XGBoost, a more robust and heavier model, their use reduced the model's execution time, which indicates an opportunity for application and further study in these situations.

Item metadata

id	SCAR_90d11793a0bf777b60327521212b7d07
oai_identifier_str	oai:repositorio.ufscar.br:ufscar/18998
network_acronym_str	SCAR
network_name_str	Repositório Institucional da UFSCAR
repository_id_str	4322
dc.title.por.fl_str_mv	Aplicação de métodos de redução de dimensionalidade não lineares em classificadores paramétricos e não paramétricos
dc.title.alternative.eng.fl_str_mv	Application of non-linear dimensionality reduction methods to parametric and non-parametric classifiers
title	Aplicação de métodos de redução de dimensionalidade não lineares em classificadores paramétricos e não paramétricos
spellingShingle	Aplicação de métodos de redução de dimensionalidade não lineares em classificadores paramétricos e não paramétricos Hirasawa, João Gabriel Viana Redução de dimensionalidade UMAP PaCMAP Reconhecimento de padrões CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO::METODOLOGIA E TECNICAS DA COMPUTACAO
title_short	Aplicação de métodos de redução de dimensionalidade não lineares em classificadores paramétricos e não paramétricos
title_full	Aplicação de métodos de redução de dimensionalidade não lineares em classificadores paramétricos e não paramétricos
title_fullStr	Aplicação de métodos de redução de dimensionalidade não lineares em classificadores paramétricos e não paramétricos
title_full_unstemmed	Aplicação de métodos de redução de dimensionalidade não lineares em classificadores paramétricos e não paramétricos
title_sort	Aplicação de métodos de redução de dimensionalidade não lineares em classificadores paramétricos e não paramétricos
author	Hirasawa, João Gabriel Viana
author_facet	Hirasawa, João Gabriel Viana
author_role	author
dc.contributor.authorlattes.por.fl_str_mv	http://lattes.cnpq.br/5009735422150540
dc.contributor.advisor1orcid.por.fl_str_mv	https://orcid.org/0000-0001-8253-2729
dc.contributor.author.fl_str_mv	Hirasawa, João Gabriel Viana
dc.contributor.advisor1.fl_str_mv	Levada, Alexandre Luis Magalhães
dc.contributor.advisor1Lattes.fl_str_mv	http://lattes.cnpq.br/3341441596395463
contributor_str_mv	Levada, Alexandre Luis Magalhães
dc.subject.por.fl_str_mv	Redução de dimensionalidade UMAP PaCMAP Reconhecimento de padrões
topic	Redução de dimensionalidade UMAP PaCMAP Reconhecimento de padrões CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO::METODOLOGIA E TECNICAS DA COMPUTACAO
dc.subject.cnpq.fl_str_mv	CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO::METODOLOGIA E TECNICAS DA COMPUTACAO
publishDate	2023
dc.date.accessioned.fl_str_mv	2023-12-06T20:12:24Z
dc.date.available.fl_str_mv	2023-12-06T20:12:24Z
dc.date.issued.fl_str_mv	2023-12-04
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/bachelorThesis
format	bachelorThesis
status_str	publishedVersion
dc.identifier.citation.fl_str_mv	HIRASAWA, João Gabriel Viana. Aplicação de métodos de redução de dimensionalidade não lineares em classificadores paramétricos e não paramétricos. 2023. Trabalho de Conclusão de Curso (Graduação em Engenharia de Computação) – Universidade Federal de São Carlos, São Carlos, 2023. Disponível em: https://repositorio.ufscar.br/handle/ufscar/18998.
dc.identifier.uri.fl_str_mv	https://repositorio.ufscar.br/handle/ufscar/18998
identifier_str_mv	HIRASAWA, João Gabriel Viana. Aplicação de métodos de redução de dimensionalidade não lineares em classificadores paramétricos e não paramétricos. 2023. Trabalho de Conclusão de Curso (Graduação em Engenharia de Computação) – Universidade Federal de São Carlos, São Carlos, 2023. Disponível em: https://repositorio.ufscar.br/handle/ufscar/18998.
url	https://repositorio.ufscar.br/handle/ufscar/18998
dc.language.iso.fl_str_mv	por
language	por
dc.rights.driver.fl_str_mv	Attribution 3.0 Brazil http://creativecommons.org/licenses/by/3.0/br/ info:eu-repo/semantics/openAccess
rights_invalid_str_mv	Attribution 3.0 Brazil http://creativecommons.org/licenses/by/3.0/br/
eu_rights_str_mv	openAccess
dc.publisher.none.fl_str_mv	Universidade Federal de São Carlos Câmpus São Carlos Engenharia de Computação - EC
dc.publisher.initials.fl_str_mv	UFSCar
publisher.none.fl_str_mv	Universidade Federal de São Carlos Câmpus São Carlos Engenharia de Computação - EC
dc.source.none.fl_str_mv	reponame:Repositório Institucional da UFSCAR instname:Universidade Federal de São Carlos (UFSCAR) instacron:UFSCAR
instname_str	Universidade Federal de São Carlos (UFSCAR)
instacron_str	UFSCAR
institution	UFSCAR
reponame_str	Repositório Institucional da UFSCAR
collection	Repositório Institucional da UFSCAR
bitstream.url.fl_str_mv	https://repositorio.ufscar.br/bitstream/ufscar/18998/1/TCC_Joao_Gabriel_Viana_Hirasawa_Final.pdf https://repositorio.ufscar.br/bitstream/ufscar/18998/2/license_rdf https://repositorio.ufscar.br/bitstream/ufscar/18998/3/TCC_Joao_Gabriel_Viana_Hirasawa_Final.pdf.txt
bitstream.checksum.fl_str_mv	655d9f15a2108970a67707468dd95d94 3185b4de2190c2d366d1d324db01f8b8 509b1cff8f4596afaebd11cdb4b51f93
bitstream.checksumAlgorithm.fl_str_mv	MD5 MD5 MD5
repository.name.fl_str_mv	Repositório Institucional da UFSCAR - Universidade Federal de São Carlos (UFSCAR)
repository.mail.fl_str_mv
_version_	1802136430426718208
