Performance of combined models in discrete binary classification

Bibliographic details
Main author: Marques, A.
Publication date: 2017
Other authors: Ferreira, A. S., Cardoso, M. G. M. S.
Document type: Article
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://hdl.handle.net/10071/13072
Abstract: Diverse Discrete Discriminant Analysis (DDA) models perform differently in different samples. This fact has encouraged research in combined models, which seem particularly promising when the a priori classes are not well separated or when small or moderate-sized samples are considered, as often occurs in practice. In this study, we evaluate the performance of a convex combination of two DDA models: the First-Order Independence Model (FOIM) and the Dependence Trees Model (DTM). We use simulated data sets with two classes and consider diverse data complexity factors that may influence the performance of the combined model: the separation of classes, balance, and number of missing states, as well as sample size and the number of parameters to be estimated in DDA. We resort to cross-validation to evaluate the precision of classification. The results illustrate the advantage of the proposed combination over FOIM and DTM: it yields the best results, especially when very small samples are considered. The experimental study also ranks the data complexity factors according to their relative impact on classification performance, by means of a regression model. It leads to the conclusion that the separation of classes is the most influential factor in classification performance. The ratio between the number of degrees of freedom and sample size, along with the proportion of missing states in the minority class, also has a significant impact on classification performance. An additional gain of this study, also deriving from the estimated regression model, is the ability to successfully predict the precision of classification in a real data set based on the data complexity factors.
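Illustrative note: the convex combination mentioned in the abstract can be sketched as below. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the combined class-conditional estimate has the form beta * P_FOIM(x | class) + (1 - beta) * P_DTM(x | class) with beta in [0, 1]. The function names, the additive smoothing constant `alpha`, and the fixed chain-shaped tree (x1 -> x2 -> ...) standing in for DTM are all hypothetical; a real DTM would choose the tree structure from the data, and the record does not say how beta is selected.

```python
from collections import Counter
from math import prod


def fit_foim(rows, n_states, alpha=1.0):
    """Independence estimate of P(x | class): product of smoothed marginals.

    rows: observations of one class, each a tuple of integer states.
    n_states: number of possible states of each variable.
    alpha: additive smoothing constant (an assumption, not from the source).
    """
    n = len(rows)
    counts = [Counter(row[j] for row in rows) for j in range(len(n_states))]

    def prob(x):
        return prod(
            (counts[j][x[j]] + alpha) / (n + alpha * n_states[j])
            for j in range(len(n_states))
        )

    return prob


def fit_chain_tree(rows, n_states, alpha=1.0):
    """Stand-in for DTM: a fixed chain, P(x1) * prod_j P(x_j | x_{j-1})."""
    n = len(rows)
    p = len(n_states)
    first = Counter(row[0] for row in rows)
    pairs = [Counter((row[j - 1], row[j]) for row in rows) for j in range(1, p)]
    parents = [Counter(row[j - 1] for row in rows) for j in range(1, p)]

    def prob(x):
        value = (first[x[0]] + alpha) / (n + alpha * n_states[0])
        for j in range(1, p):
            num = pairs[j - 1][(x[j - 1], x[j])] + alpha
            den = parents[j - 1][x[j - 1]] + alpha * n_states[j]
            value *= num / den
        return value

    return prob


def convex_combination(p_a, p_b, beta):
    """Return x -> beta * p_a(x) + (1 - beta) * p_b(x), with 0 <= beta <= 1."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    return lambda x: beta * p_a(x) + (1.0 - beta) * p_b(x)


def classify(x, class_models, priors):
    """Assign x to the class with the largest prior * P(x | class)."""
    return max(class_models, key=lambda k: priors[k] * class_models[k](x))


# Toy usage with two binary variables and two classes (made-up data).
data = {
    "a": [(0, 0), (0, 0), (1, 1), (0, 1)],
    "b": [(1, 1), (1, 1), (1, 0), (0, 1)],
}
states = [2, 2]
models = {
    k: convex_combination(fit_foim(v, states), fit_chain_tree(v, states), 0.5)
    for k, v in data.items()
}
label = classify((0, 0), models, {"a": 0.5, "b": 0.5})
```

Because both component estimates sum to one over all state vectors, any beta in [0, 1] keeps the combined estimate a valid probability function; beta = 1 recovers the independence model alone and beta = 0 the tree model alone.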
id RCAP_e56692c4df942a7915fba014c7237ac8
oai_identifier_str oai:repositorio.iscte-iul.pt:10071/13072
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
dc.title.none.fl_str_mv Performance of combined models in discrete binary classification
title Performance of combined models in discrete binary classification
author Marques, A.
author_facet Marques, A.
Ferreira, A. S.
Cardoso, M. G. M. S.
author_role author
author2 Ferreira, A. S.
Cardoso, M. G. M. S.
author2_role author
author
dc.contributor.author.fl_str_mv Marques, A.
Ferreira, A. S.
Cardoso, M. G. M. S.
dc.subject.por.fl_str_mv Classification performance
Combined models for classification
Discrete discriminant analysis
Separability
publishDate 2017
dc.date.none.fl_str_mv 2017-04-20T15:21:27Z
2017-01-01T00:00:00Z
2017
2019-03-21T17:47:10Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10071/13072
url http://hdl.handle.net/10071/13072
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv 1614-1881
10.1027/1614-2241/a000117
dc.rights.driver.fl_str_mv info:eu-repo/semantics/embargoedAccess
eu_rights_str_mv embargoedAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Hogrefe and Huber Publisher
publisher.none.fl_str_mv Hogrefe and Huber Publisher
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
_version_ 1799134817492664320