PUL-SSC: Aprendizado baseado em uma única classe com agrupamento semissupervisionado
Main author: | SHIH TING JU |
---|---|
Publication date: | 2021 |
Document type: | Master's thesis |
Language: | por |
Source title: | Repositório Institucional da UFMS |
Full text: | https://repositorio.ufms.br/handle/123456789/4032 |
Abstract: | The large volume of data now available is a source of information for both commercial and academic purposes. One approach to extracting knowledge from such data that has gained prominence is one-class classification (OCC). Using OCC to decide whether an example belongs to a specific class is appropriate for datasets in which the classes are imbalanced or in which only data from the class of interest are available during training. Several OCC algorithms in the literature use unsupervised clustering to delimit the boundary of the class of interest, and their results are competitive with those of other OCC algorithms. Although semi-supervised learning has been shown to achieve better results than unsupervised learning in several areas, semi-supervised clustering remains little explored for OCC. One approach to OCC is Positive and Unlabeled Learning (PUL), in which learning uses only positive (class-of-interest) and unlabeled data. PUL algorithms seek to delimit the positive class. This master's project proposes a new algorithm, PUL-SSC (Positive and Unlabeled Learning with Semi-Supervised Clustering), which learns the boundary of the class of interest by creating and using must-link and cannot-link constraints, clustering the data with a semi-supervised algorithm, and applying a transductive learning process for label propagation. Two widely used semi-supervised clustering algorithms were employed: PCK-Means and MPCK-Means. In the experimental evaluation, the semi-supervised algorithms outperformed the k-Means-based algorithm and the one-class SVM (OC-SVM) in most scenarios. In particular, the distance-based algorithm MPCK-Means was dominant in most of the comparisons on numerical and textual datasets. |
id |
UFMS_d3e923ae6ec4261724f1f6b2ad078c32 |
---|---|
oai_identifier_str |
oai:repositorio.ufms.br:123456789/4032 |
network_acronym_str |
UFMS |
network_name_str |
Repositório Institucional da UFMS |
repository_id_str |
2124 |
dc.title.pt_BR.fl_str_mv |
PUL-SSC: Aprendizado baseado em uma única classe com agrupamento semissupervisionado |
author |
SHIH TING JU |
author_facet |
SHIH TING JU |
author_role |
author |
dc.contributor.advisor1.fl_str_mv |
Bruno Magalhaes Nogueira |
dc.contributor.author.fl_str_mv |
SHIH TING JU |
contributor_str_mv |
Bruno Magalhaes Nogueira |
dc.subject.por.fl_str_mv |
one-class learning; semi-supervised clustering; metric learning |
publishDate |
2021 |
dc.date.accessioned.fl_str_mv |
2021-10-04T17:35:09Z |
dc.date.available.fl_str_mv |
2021-10-04T17:35:09Z |
dc.date.issued.fl_str_mv |
2021 |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
format |
masterThesis |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
https://repositorio.ufms.br/handle/123456789/4032 |
url |
https://repositorio.ufms.br/handle/123456789/4032 |
dc.language.iso.fl_str_mv |
por |
language |
por |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.publisher.none.fl_str_mv |
Fundação Universidade Federal de Mato Grosso do Sul |
dc.publisher.initials.fl_str_mv |
UFMS |
dc.publisher.country.fl_str_mv |
Brazil |
publisher.none.fl_str_mv |
Fundação Universidade Federal de Mato Grosso do Sul |
dc.source.none.fl_str_mv |
reponame:Repositório Institucional da UFMS instname:Universidade Federal de Mato Grosso do Sul (UFMS) instacron:UFMS |
instname_str |
Universidade Federal de Mato Grosso do Sul (UFMS) |
instacron_str |
UFMS |
institution |
UFMS |
reponame_str |
Repositório Institucional da UFMS |
collection |
Repositório Institucional da UFMS |
bitstream.url.fl_str_mv |
https://repositorio.ufms.br/bitstream/123456789/4032/3/dissertacao_final_corrigida_shih.pdf.jpg https://repositorio.ufms.br/bitstream/123456789/4032/2/dissertacao_final_corrigida_shih.pdf.txt https://repositorio.ufms.br/bitstream/123456789/4032/1/dissertacao_final_corrigida_shih.pdf |
bitstream.checksum.fl_str_mv |
f9f335f280f12b8d5882d784a438f9f7 477ffca9d34203c8e64498db160e9060 191b9ece72e8db504291142492709be9 |
bitstream.checksumAlgorithm.fl_str_mv |
MD5 MD5 MD5 |
repository.name.fl_str_mv |
Repositório Institucional da UFMS - Universidade Federal de Mato Grosso do Sul (UFMS) |
repository.mail.fl_str_mv |
ri.prograd@ufms.br |
_version_ |
1815448057107447808 |