Evaluation of Multiclass Novelty Detection Algorithms for Data Streams

Bibliographic details
Main author: de Faria, ER
Publication date: 2015
Other authors: Goncalves, IR; João Gama; de Leon Ferreira Carvalho, ACPDF
Document type: Article
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://repositorio.inesctec.pt/handle/123456789/5312
http://dx.doi.org/10.1109/tkde.2015.2441713
Abstract: Data stream mining is an emergent research area that investigates knowledge extraction from large amounts of continuously generated data produced by non-stationary distributions. Novelty detection, the ability to identify new or previously unknown situations, is a useful ability for learning systems, especially when dealing with data streams, where concepts may appear, disappear, or evolve over time. Several studies are currently investigating the application of novelty detection techniques to data streams. However, there is no consensus on how to evaluate the performance of these techniques. In this study, we propose a new evaluation methodology for multiclass novelty detection in data streams that is able to deal with: i) unsupervised learning, which generates novelty patterns without an association with the true classes, where one class may be composed of a novelty set; ii) a confusion matrix that grows over time; iii) a confusion matrix with a column representing unknown examples, i.e., those not explained by the model; and iv) the representation of the evaluation measures over time. We propose a new methodology to associate the novelty patterns detected by the algorithm, in an unsupervised fashion, with the true classes. Finally, we evaluate the performance of the proposed methodology through the use of known novelty detection algorithms with artificial and real data sets.
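The four elements listed in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how novelty patterns are associated with true classes, so the majority rule used here is an assumption, and the names UNKNOWN, build_confusion, associate_patterns, evaluate and evaluate_over_time are hypothetical.

```python
from collections import Counter, defaultdict

UNKNOWN = "unknown"  # hypothetical label for examples the model does not explain


def build_confusion(true_labels, predicted_labels):
    """Count (true class, predicted label) pairs; columns may include
    known classes, novelty patterns, and the UNKNOWN column."""
    matrix = defaultdict(Counter)
    for truth, pred in zip(true_labels, predicted_labels):
        matrix[truth][pred] += 1
    return matrix


def associate_patterns(matrix, known_classes):
    """Assumed rule: map each novelty pattern to the true class that
    contributes the most examples to it."""
    votes = defaultdict(Counter)
    for truth, row in matrix.items():
        for pred, count in row.items():
            if pred != UNKNOWN and pred not in known_classes:
                votes[pred][truth] += count
    return {p: c.most_common(1)[0][0] for p, c in votes.items()}


def evaluate(true_labels, predicted_labels, known_classes):
    """Return the unknown rate, the error rate over explained examples,
    and the pattern-to-class mapping."""
    matrix = build_confusion(true_labels, predicted_labels)
    mapping = associate_patterns(matrix, known_classes)
    total = len(true_labels)
    unknown = sum(row[UNKNOWN] for row in matrix.values())
    errors = 0
    for truth, row in matrix.items():
        for pred, count in row.items():
            if pred == UNKNOWN:
                continue
            # Novelty patterns are resolved to a true class before comparing.
            if mapping.get(pred, pred) != truth:
                errors += count
    explained = total - unknown
    return {
        "unknown_rate": unknown / total if total else 0.0,
        "error_rate": errors / explained if explained else 0.0,
        "mapping": mapping,
    }


def evaluate_over_time(true_labels, predicted_labels, known_classes, step):
    """Recompute the measures on growing prefixes of the stream.
    A trailing partial window shorter than `step` is not reported."""
    series = []
    for end in range(step, len(true_labels) + 1, step):
        result = evaluate(true_labels[:end], predicted_labels[:end], known_classes)
        series.append((end, result["unknown_rate"], result["error_rate"]))
    return series
```

Calling evaluate_over_time with a small step gives one (examples seen, unknown rate, error rate) triple per checkpoint, which can then be plotted against the stream position. Recomputing from the start at each checkpoint keeps the sketch short; an incremental update of the matrix would be preferable on a long stream.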
id RCAP_e8154b1afdfb286afe3cb13d6d55d0b1
oai_identifier_str oai:repositorio.inesctec.pt:123456789/5312
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
dc.title.none.fl_str_mv Evaluation of Multiclass Novelty Detection Algorithms for Data Streams
author de Faria,ER
author_facet de Faria,ER
Goncalves,IR
João Gama
de Leon Ferreira Carvalho,ACPDF
author_role author
author2 Goncalves,IR
João Gama
de Leon Ferreira Carvalho,ACPDF
author2_role author
author
author
dc.contributor.author.fl_str_mv de Faria,ER
Goncalves,IR
João Gama
de Leon Ferreira Carvalho,ACPDF
publishDate 2015
dc.date.none.fl_str_mv 2015-01-01T00:00:00Z
2015
2018-01-03T10:35:14Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://repositorio.inesctec.pt/handle/123456789/5312
http://dx.doi.org/10.1109/tkde.2015.2441713
url http://repositorio.inesctec.pt/handle/123456789/5312
http://dx.doi.org/10.1109/tkde.2015.2441713
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
_version_ 1799131609791725568