Label noise detection under Noise at Random model with ensemble filters
Main author: | MOURA, Kecia Gomes de |
---|---|
Publication date: | 2019 |
Document type: | Master's dissertation |
Language: | por |
Source title: | Repositório Institucional da UFPE |
Full text: | https://repositorio.ufpe.br/handle/123456789/36043 |
Abstract: | Label noise detection has been widely studied in Machine Learning because of its importance for improving the quality of training data. Ensembles of classifiers have achieved satisfactory noise detection: an instance is flagged as mislabeled if a high proportion of the members in the pool misclassify it. Previous studies have evaluated this approach empirically and reported accuracy results; however, most of them assumed that label noise is generated completely at random. This is a strong assumption, since other types of label noise occur in practice and can influence noise detection results. This work investigates the performance of ensemble noise detection under two noise models: Noisy at Random (NAR), in which the probability of label noise depends on the instance's class, and Noisy Completely at Random (NCAR), in which the probability of label noise is independent of the instance's class and features. In this setting, we also investigate the effect of class distribution on noise detection performance, since under the NAR assumption it changes the total noise level observed in a dataset. Furthermore, the ensemble vote threshold is evaluated and contrasted with the approaches most common in the literature. Finally, experiments show that choosing one noise generation model over another can lead to different results when aspects such as class imbalance and the ratio of noise levels across classes are taken into account. |
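
The abstract contrasts two noise-generation models and an ensemble vote filter. The sketch below illustrates both under stated assumptions; it is not the dissertation's code. All names (`inject_nar`, `inject_ncar`, `expected_noise_level`, `ensemble_filter`) are hypothetical. It assumes that a flipped label is drawn uniformly from the other classes and that each ensemble member's predictions are already available (for example, out-of-fold predictions of each classifier in the pool). Under NAR with class proportions pi_c and class noise rates eta_c, the expected overall noise level is the sum over classes of pi_c * eta_c, which is why changing the class distribution changes the total noise level even when the per-class rates stay fixed.

```python
import random
from collections import Counter


def inject_nar(labels, rate_per_class, classes, rng):
    """NAR: the probability of flipping a label depends only on its true class."""
    noisy = list(labels)
    for i, y in enumerate(labels):
        if rng.random() < rate_per_class[y]:
            # Assumption: the wrong label is drawn uniformly from the other classes.
            noisy[i] = rng.choice([c for c in classes if c != y])
    return noisy


def inject_ncar(labels, noise_rate, classes, rng):
    """NCAR: every instance has the same flip probability, whatever its class."""
    return inject_nar(labels, {c: noise_rate for c in classes}, classes, rng)


def expected_noise_level(labels, rate_per_class):
    """Expected fraction of noisy labels: sum over classes of pi_c * eta_c."""
    counts = Counter(labels)
    n = len(labels)
    return sum(counts[c] / n * rate_per_class[c] for c in counts)


def ensemble_filter(member_predictions, labels, threshold):
    """Flag instance i when the fraction of members whose prediction differs
    from its (possibly noisy) label is at least `threshold`.

    threshold=1.0 gives a consensus vote; with an odd number of members,
    threshold=0.5 gives a majority vote.
    """
    n_members = len(member_predictions)
    flags = []
    for i, y in enumerate(labels):
        errors = sum(1 for preds in member_predictions if preds[i] != y)
        flags.append(errors / n_members >= threshold)
    return flags


if __name__ == "__main__":
    rates = {"a": 0.1, "b": 0.4}  # hypothetical class-dependent NAR rates
    # Same NAR rates, opposite class distributions -> different total noise.
    print(round(expected_noise_level(["a"] * 8 + ["b"] * 2, rates), 2))  # 0.16
    print(round(expected_noise_level(["a"] * 2 + ["b"] * 8, rates), 2))  # 0.34

    # One random NAR corruption of an imbalanced label vector.
    rng = random.Random(42)
    print(inject_nar(["a"] * 8 + ["b"] * 2, rates, ["a", "b"], rng))

    # Hypothetical predictions of a 3-member pool on three instances.
    labels = ["a", "a", "b"]
    preds = [["a", "b", "a"], ["a", "b", "b"], ["a", "b", "a"]]
    print(ensemble_filter(preds, labels, 0.5))  # [False, True, True]
    print(ensemble_filter(preds, labels, 1.0))  # [False, True, False]
```

Treating NCAR as NAR with one shared rate keeps the comparison explicit: the only difference between the two models here is whether the flip probability varies with the class. Lowering the vote threshold flags more instances (higher recall of noisy labels, more false alarms), while consensus voting flags only instances that every member rejects.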
id |
UFPE_18141e8b69e5533b1b14735c02312193 |
---|---|
oai_identifier_str |
oai:repositorio.ufpe.br:123456789/36043 |
network_acronym_str |
UFPE |
network_name_str |
Repositório Institucional da UFPE |
repository_id_str |
2221 |
spelling |
Funding: CNPq.
Abstract (Portuguese version, translated): Data noise detection has been widely studied in Machine Learning because of its importance for improving the quality of training data. Satisfactory noise detection has been achieved by using a set of classifiers (an ensemble). In this approach, an instance is considered mislabeled if a high proportion of the classifiers misclassify it. Previous works evaluated this approach empirically and obtained results in accuracy. However, most of them assume that label noise is generated completely at random in a dataset. This single assumption can be misleading or lead to incomplete results, since there are other types of label noise that are feasible in practice and can influence detection results. This work investigates noise detection performance under the "Noisy at Random" (NAR) model, in which the probability of label noise depends on the instance's class, in comparison with the "Noisy Completely at Random" (NCAR) model, in which label noise is entirely random. In this scenario, we also investigate the effect of class imbalance on noise detection performance, since this imbalance changes the total noise level observed under the NAR assumption. In addition, the ensemble voting threshold is evaluated and contrasted with the most common approaches in the literature. Finally, several experiments show that choosing one noise generation model over another can lead to different results when aspects such as class imbalance and the proportion of noise in each class are taken into account.
Files: DISSERTAÇÃO Kecia Gomes de Moura.pdf (application/pdf, 14581706 bytes); license_rdf (application/rdf+xml; charset=utf-8, 811 bytes); license.txt (text/plain; charset=utf-8, 2310 bytes); DISSERTAÇÃO Kecia Gomes de Moura.pdf.txt (extracted text, text/plain, 151393 bytes); DISSERTAÇÃO Kecia Gomes de Moura.pdf.jpg (generated thumbnail, image/jpeg, 1252 bytes).
Record last modified: 2020-01-18 02:15:56.005; OAI datestamp: 2020-01-18T05:15:56; OAI endpoint: https://repositorio.ufpe.br/oai/request; OpenDOAR ID: 2221. |
dc.title.pt_BR.fl_str_mv |
Label noise detection under Noise at Random model with ensemble filters |
author |
MOURA, Kecia Gomes de |
author_role |
author |
dc.contributor.authorLattes.pt_BR.fl_str_mv |
http://lattes.cnpq.br/9290980721484997 |
dc.contributor.advisorLattes.pt_BR.fl_str_mv |
http://lattes.cnpq.br/2984888073123287 |
dc.contributor.advisor-coLattes.pt_BR.fl_str_mv |
http://lattes.cnpq.br/8577312109146354 |
dc.contributor.advisor1.fl_str_mv |
PRUDÊNCIO, Ricardo Bastos Cavalcante |
dc.contributor.advisor-co1.fl_str_mv |
CAVALCANTI, George Darmiton da Cunha |
dc.subject.por.fl_str_mv |
Machine Learning; Noise Detection (Detecção de Ruído); Classifier Combination (Combinação de Classificadores); Random Noise (Ruído Aleatório) |
publishDate |
2019 |
dc.date.issued.fl_str_mv |
2019-03-15 |
dc.date.accessioned.fl_str_mv |
2020-01-17T12:05:08Z |
dc.date.available.fl_str_mv |
2020-01-17T12:05:08Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
format |
masterThesis |
status_str |
publishedVersion |
dc.identifier.citation.fl_str_mv |
MOURA, Kecia Gomes de. Label noise detection under Noise at Random model with ensemble filters. 2019. Master's dissertation (Computer Science) – Universidade Federal de Pernambuco, Recife, 2019. |
dc.identifier.uri.fl_str_mv |
https://repositorio.ufpe.br/handle/123456789/36043 |
dc.language.iso.fl_str_mv |
por |
dc.rights.driver.fl_str_mv |
Attribution-NonCommercial-NoDerivs 3.0 Brazil http://creativecommons.org/licenses/by-nc-nd/3.0/br/ info:eu-repo/semantics/openAccess |
dc.publisher.none.fl_str_mv |
Universidade Federal de Pernambuco |
dc.publisher.program.fl_str_mv |
Programa de Pós-Graduação em Ciência da Computação |
dc.publisher.initials.fl_str_mv |
UFPE |
dc.publisher.country.fl_str_mv |
Brazil |
dc.source.none.fl_str_mv |
reponame:Repositório Institucional da UFPE instname:Universidade Federal de Pernambuco (UFPE) instacron:UFPE |
bitstream.url.fl_str_mv |
https://repositorio.ufpe.br/bitstream/123456789/36043/1/DISSERTA%c3%87%c3%83O%20Kecia%20Gomes%20de%20Moura.pdf https://repositorio.ufpe.br/bitstream/123456789/36043/2/license_rdf https://repositorio.ufpe.br/bitstream/123456789/36043/3/license.txt https://repositorio.ufpe.br/bitstream/123456789/36043/4/DISSERTA%c3%87%c3%83O%20Kecia%20Gomes%20de%20Moura.pdf.txt https://repositorio.ufpe.br/bitstream/123456789/36043/5/DISSERTA%c3%87%c3%83O%20Kecia%20Gomes%20de%20Moura.pdf.jpg |
bitstream.checksum.fl_str_mv |
74a15642887988cf4b0828145119f76f e39d27027a6cc9cb039ad269a5db8e34 bd573a5ca8288eb7272482765f819534 7704117667298464b6427ce87302f542 ce01ec110c3fa57fbb3b047fb3c18066 |
bitstream.checksumAlgorithm.fl_str_mv |
MD5 MD5 MD5 MD5 MD5 |
repository.name.fl_str_mv |
Repositório Institucional da UFPE - Universidade Federal de Pernambuco (UFPE) |
repository.mail.fl_str_mv |
attena@ufpe.br |