Performance of combined models in discrete binary classification

Marques, A.; Ferreira, A. S.; Cardoso, M. G. M. S.

Bibliographic details
Main author:	Marques, A.
Publication date:	2017
Other authors:	Ferreira, A. S., Cardoso, M. G. M. S.
Document type:	Article
Language:	eng
Source title:	Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
Full text:	http://hdl.handle.net/10071/13072
Abstract:	Diverse Discrete Discriminant Analysis (DDA) models perform differently in different samples. This fact has encouraged research on combined models, which seem particularly promising when the a priori classes are not well separated or when small or moderately sized samples are considered, as often occurs in practice. In this study, we evaluate the performance of a convex combination of two DDA models: the First-Order Independence Model (FOIM) and the Dependence Trees Model (DTM). We use simulated data sets with two classes and consider several data complexity factors that may influence the performance of the combined model: the separation of classes, balance, and number of missing states, as well as sample size and the number of parameters to be estimated in DDA. We use cross-validation to evaluate the precision of classification. The results illustrate the advantage of the proposed combination over FOIM and DTM: it yields the best results, especially when very small samples are considered. By means of a regression model, the experimental study also ranks the data complexity factors according to their relative impact on classification performance. It leads to the conclusion that the separation of classes is the most influential factor. The ratio between the number of degrees of freedom and sample size, along with the proportion of missing states in the minority class, also has a significant impact on classification performance. An additional gain of this study, also deriving from the estimated regression model, is the ability to successfully predict the precision of classification in a real data set based on the data complexity factors.
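As an illustration of the convex combination described in the abstract, the following is a minimal Python sketch and not the authors' implementation. It assumes integer-coded discrete predictors, Laplace smoothing (the `alpha` parameter), and a fixed chain-shaped dependence tree standing in for DTM; a full DTM would select the tree structure from the data. All function names and the mixing weight `beta` are hypothetical.

```python
import numpy as np


def fit_foim(X, n_states, alpha=1.0):
    """First-order independence: smoothed marginal probabilities per variable."""
    n, p = X.shape
    probs = np.empty((p, n_states))
    for j in range(p):
        counts = np.bincount(X[:, j], minlength=n_states)
        probs[j] = (counts + alpha) / (n + alpha * n_states)
    return probs


def foim_likelihood(x, probs):
    """P(x | class) as a product of marginal state probabilities."""
    x = np.asarray(x)
    return float(np.prod(probs[np.arange(len(x)), x]))


def fit_chain(X, n_states, alpha=1.0):
    """Assumed stand-in for DTM: a chain tree X1 -> X2 -> ... -> Xp."""
    n, p = X.shape
    first = (np.bincount(X[:, 0], minlength=n_states) + alpha) / (n + alpha * n_states)
    trans = np.full((p - 1, n_states, n_states), alpha, dtype=float)
    for j in range(1, p):
        # Count co-occurrences of (state of X_{j-1}, state of X_j).
        np.add.at(trans[j - 1], (X[:, j - 1], X[:, j]), 1.0)
    trans /= trans.sum(axis=2, keepdims=True)  # rows become P(X_j | X_{j-1})
    return first, trans


def chain_likelihood(x, first, trans):
    """P(x | class) under the chain-shaped dependence tree."""
    x = np.asarray(x)
    lik = first[x[0]]
    for j in range(1, len(x)):
        lik *= trans[j - 1, x[j - 1], x[j]]
    return float(lik)


def combined_likelihood(x, foim, chain, beta):
    """Convex combination: beta * FOIM + (1 - beta) * tree model, 0 <= beta <= 1."""
    return beta * foim_likelihood(x, foim) + (1.0 - beta) * chain_likelihood(x, *chain)


def classify(x, models, priors, beta):
    """Assign x to the class with the larger prior-weighted combined likelihood."""
    scores = [priors[c] * combined_likelihood(x, *models[c], beta) for c in (0, 1)]
    return int(np.argmax(scores))


# Toy usage on simulated binary predictors (illustrative data only).
rng = np.random.default_rng(0)
X0 = rng.integers(0, 2, size=(30, 3))           # class 0 sample
X1 = (rng.random((20, 3)) < 0.8).astype(int)    # class 1 sample, mostly ones
models = {c: (fit_foim(X, 2), fit_chain(X, 2)) for c, X in ((0, X0), (1, X1))}
priors = {0: 30 / 50, 1: 20 / 50}
print(classify([1, 1, 1], models, priors, beta=0.5))
```

Setting `beta` to 1 or 0 recovers the pure independence model or the pure tree model, so the weight can be tuned (for instance by cross-validation) between the two extremes.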

Item metadata

id	RCAP_e56692c4df942a7915fba014c7237ac8
oai_identifier_str	oai:repositorio.iscte-iul.pt:10071/13072
network_acronym_str	RCAP
network_name_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
repository_id_str	7160
dc.title.none.fl_str_mv	Performance of combined models in discrete binary classification
title	Performance of combined models in discrete binary classification
spellingShingle	Performance of combined models in discrete binary classification Marques, A. Classification performance Combined models for classification Discrete discriminant analysis Separability
title_short	Performance of combined models in discrete binary classification
title_full	Performance of combined models in discrete binary classification
title_fullStr	Performance of combined models in discrete binary classification
title_full_unstemmed	Performance of combined models in discrete binary classification
title_sort	Performance of combined models in discrete binary classification
author	Marques, A.
author_facet	Marques, A.; Ferreira, A. S.; Cardoso, M. G. M. S.
author_role	author
author2	Ferreira, A. S.; Cardoso, M. G. M. S.
author2_role	author author
dc.contributor.author.fl_str_mv	Marques, A.; Ferreira, A. S.; Cardoso, M. G. M. S.
dc.subject.por.fl_str_mv	Classification performance; Combined models for classification; Discrete discriminant analysis; Separability
topic	Classification performance; Combined models for classification; Discrete discriminant analysis; Separability
publishDate	2017
dc.date.none.fl_str_mv	2017-04-20T15:21:27Z 2017-01-01T00:00:00Z 2017 2019-03-21T17:47:10Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/article
format	article
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://hdl.handle.net/10071/13072
url	http://hdl.handle.net/10071/13072
dc.language.iso.fl_str_mv	eng
language	eng
dc.relation.none.fl_str_mv	1614-1881 10.1027/1614-2241/a000117
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/embargoedAccess
eu_rights_str_mv	embargoedAccess
dc.format.none.fl_str_mv	application/pdf
dc.publisher.none.fl_str_mv	Hogrefe and Huber Publisher
publisher.none.fl_str_mv	Hogrefe and Huber Publisher
dc.source.none.fl_str_mv	reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP
instname_str	Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str	RCAAP
institution	RCAAP
reponame_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
collection	Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
repository.name.fl_str_mv	Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_	1799134817492664320
