PUL-SSC: Aprendizado baseado em uma única classe com agrupamento semissupervisionado

Bibliographic details
Main author: SHIH TING JU
Publication date: 2021
Document type: Master's thesis
Language: Portuguese (por)
Source title: Repositório Institucional da UFMS
Full text: https://repositorio.ufms.br/handle/123456789/4032
Abstract: The large volume of data now available is a source of information for commercial and academic purposes. One approach to extracting knowledge from such data that has gained prominence is one-class classification (OCC). OCC, which decides whether an example belongs to a specific class, is appropriate for datasets in which the classes are imbalanced or in which only examples of the class of interest are available during training. Several OCC algorithms in the literature use unsupervised clustering to delimit the boundary of the class of interest, and their results are competitive with those of other OCC algorithms. Although semi-supervised learning has been shown to achieve better results than unsupervised learning in several areas, semi-supervised clustering remains little explored for OCC. One approach to OCC is Positive and Unlabeled Learning (PUL), in which learning uses only positive (class-of-interest) and unlabeled data. PUL algorithms seek to delimit the positive class. This master's project proposes a new algorithm, PUL-SSC (Positive and Unlabeled Learning with Semi-Supervised Clustering), which learns the boundary of the class of interest by creating and using must-link and cannot-link constraints, clustering the data with a semi-supervised algorithm, and applying a transductive learning process for label propagation. Two widely used semi-supervised clustering algorithms were employed: PCK-Means and MPCK-Means. In the experimental evaluation, the semi-supervised algorithms outperformed the k-Means-based algorithm and the one-class SVM (OC-SVM) in most scenarios. In particular, the distance-based algorithm MPCK-Means was dominant in most of the comparisons on numerical and textual datasets.
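The pairwise constraints mentioned in the abstract can be illustrated with a minimal sketch. It assumes, as one plausible reading only, that every pair of labeled positive examples forms a must-link pair; the abstract does not say how PUL-SSC derives its cannot-link pairs, so the one used here is made up. The function names, example ids, and cluster assignment are all hypothetical. The violation count mirrors the kind of penalty that constrained k-Means variants such as PCK-Means add to their objective: a must-link pair is violated when its two examples land in different clusters, a cannot-link pair when they land in the same one.

```python
from itertools import combinations


def must_link_from_positives(positive_ids):
    """Return every unordered pair of labeled positive examples.

    Assumption for illustration: all labeled positives should share a cluster.
    """
    return list(combinations(sorted(positive_ids), 2))


def count_violations(assignment, must_link, cannot_link):
    """Count constraints broken by a cluster assignment.

    assignment maps example id -> cluster id.
    Returns (violated must-link pairs, violated cannot-link pairs).
    """
    ml = sum(1 for a, b in must_link if assignment[a] != assignment[b])
    cl = sum(1 for a, b in cannot_link if assignment[a] == assignment[b])
    return ml, cl


positives = [0, 1, 2]    # hypothetical ids of labeled positive examples
must_link = must_link_from_positives(positives)
cannot_link = [(0, 5)]   # hypothetical pair; the real derivation rule is not given
assignment = {0: 0, 1: 0, 2: 1, 3: 1, 4: 1, 5: 0}

print(count_violations(assignment, must_link, cannot_link))  # (2, 1)
```

In this toy assignment, example 2 is separated from the other two positives (two must-link violations) and examples 0 and 5 share a cluster (one cannot-link violation).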
id UFMS_d3e923ae6ec4261724f1f6b2ad078c32
oai_identifier_str oai:repositorio.ufms.br:123456789/4032
network_acronym_str UFMS
network_name_str Repositório Institucional da UFMS
repository_id_str 2124
dc.title.pt_BR.fl_str_mv PUL-SSC: Aprendizado baseado em uma única classe com agrupamento semissupervisionado
author SHIH TING JU
author_facet SHIH TING JU
author_role author
dc.contributor.advisor1.fl_str_mv Bruno Magalhaes Nogueira
dc.contributor.author.fl_str_mv SHIH TING JU
contributor_str_mv Bruno Magalhaes Nogueira
dc.subject.por.fl_str_mv one-class learning
semi-supervised clustering
metric learning
publishDate 2021
dc.date.accessioned.fl_str_mv 2021-10-04T17:35:09Z
dc.date.available.fl_str_mv 2021-10-04T17:35:09Z
dc.date.issued.fl_str_mv 2021
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv https://repositorio.ufms.br/handle/123456789/4032
url https://repositorio.ufms.br/handle/123456789/4032
dc.language.iso.fl_str_mv por
language por
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.publisher.none.fl_str_mv Fundação Universidade Federal de Mato Grosso do Sul
dc.publisher.initials.fl_str_mv UFMS
dc.publisher.country.fl_str_mv Brazil
publisher.none.fl_str_mv Fundação Universidade Federal de Mato Grosso do Sul
dc.source.none.fl_str_mv reponame:Repositório Institucional da UFMS
instname:Universidade Federal de Mato Grosso do Sul (UFMS)
instacron:UFMS
instname_str Universidade Federal de Mato Grosso do Sul (UFMS)
instacron_str UFMS
institution UFMS
reponame_str Repositório Institucional da UFMS
collection Repositório Institucional da UFMS
bitstream.url.fl_str_mv https://repositorio.ufms.br/bitstream/123456789/4032/3/dissertacao_final_corrigida_shih.pdf.jpg
https://repositorio.ufms.br/bitstream/123456789/4032/2/dissertacao_final_corrigida_shih.pdf.txt
https://repositorio.ufms.br/bitstream/123456789/4032/1/dissertacao_final_corrigida_shih.pdf
bitstream.checksum.fl_str_mv f9f335f280f12b8d5882d784a438f9f7
477ffca9d34203c8e64498db160e9060
191b9ece72e8db504291142492709be9
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
MD5
repository.name.fl_str_mv Repositório Institucional da UFMS - Universidade Federal de Mato Grosso do Sul (UFMS)
repository.mail.fl_str_mv ri.prograd@ufms.br
_version_ 1807552858906689536
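The MD5 checksums listed in this record can be used to check a downloaded bitstream. The following is a minimal sketch, assuming the checksums appear in the same order as the bitstream URLs, so that the third value belongs to dissertacao_final_corrigida_shih.pdf. The function names and the local file path are hypothetical.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path, expected_md5):
    """Return True if the file's MD5 equals the checksum from the record."""
    return md5_of_file(path) == expected_md5.strip().lower()


# Third checksum in the record, assumed to belong to the PDF bitstream.
EXPECTED_PDF_MD5 = "191b9ece72e8db504291142492709be9"

# Usage, after downloading the PDF to a local path of your choice:
# matches_record("dissertacao_final_corrigida_shih.pdf", EXPECTED_PDF_MD5)
```

MD5 is adequate here for detecting a corrupted or truncated download; it is not a guarantee against deliberate tampering.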