Comparison of machine-learning algorithms to build a predictive model for detecting undiagnosed diabetes - ELSA-Brasil: accuracy study

Bibliographic details
Main author: Olivera, André Rodrigues
Publication date: 2017
Other authors: Roesler, Valter; Iochpe, Cirano; Schmidt, Maria Inês; Vigo, Álvaro; Barreto, Sandhi Maria; Duncan, Bruce Bartholow
Document type: Article
Language: eng
Source title: São Paulo medical journal (Online)
Full text: http://old.scielo.br/scielo.php?script=sci_arttext&pid=S1516-31802017000300234
Abstract: CONTEXT AND OBJECTIVE: Type 2 diabetes is a chronic disease associated with a wide range of serious health complications that have a major impact on overall health. The aims here were to develop and validate predictive models for detecting undiagnosed diabetes using data from the Longitudinal Study of Adult Health (ELSA-Brasil) and to compare the performance of different machine-learning algorithms in this task. DESIGN AND SETTING: Comparison of machine-learning algorithms to develop predictive models using data from ELSA-Brasil. METHODS: After selecting a subset of 27 candidate variables from the literature, models were built and validated in four sequential steps: (i) parameter tuning with tenfold cross-validation, repeated three times; (ii) automatic variable selection using forward selection, a wrapper strategy with four different machine-learning algorithms and tenfold cross-validation (repeated three times), to evaluate each subset of variables; (iii) error estimation of model parameters with tenfold cross-validation, repeated ten times; and (iv) generalization testing on an independent dataset. The models were created with the following machine-learning algorithms: logistic regression, artificial neural network, naïve Bayes, K-nearest neighbor and random forest. RESULTS: The best models were created using artificial neural networks and logistic regression. These achieved mean areas under the curve of, respectively, 75.24% and 74.98% in the error estimation step and 74.17% and 74.41% in the generalization testing step. CONCLUSION: Most of the predictive models produced similar results and demonstrated the feasibility of identifying individuals with the highest probability of having undiagnosed diabetes through easily obtained clinical data.
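Note: The evaluation scheme named in the abstract (tenfold cross-validation repeated several times, scored by area under the ROC curve) can be illustrated with a minimal standard-library sketch. This is not the authors' code. The function names (`auc`, `repeated_stratified_folds`, `cross_validated_auc`), the stratified fold assignment, and the `fit` callback interface are assumptions made for illustration only; the abstract does not say whether the folds were stratified.

```python
import random
from statistics import mean


def auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    # Count positive/negative pairs ranked correctly; ties count one half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def repeated_stratified_folds(labels, k=10, repeats=3, seed=0):
    """Yield (train_idx, test_idx) for k-fold CV repeated `repeats` times.

    Assumption: folds are stratified by class, which the abstract does not state.
    """
    rng = random.Random(seed)
    for _ in range(repeats):
        folds = [[] for _ in range(k)]
        for cls in sorted(set(labels)):
            idx = [i for i, y in enumerate(labels) if y == cls]
            rng.shuffle(idx)
            # Deal indices round-robin so each fold keeps the class ratio.
            for j, i in enumerate(idx):
                folds[j % k].append(i)
        for f in range(k):
            test = sorted(folds[f])
            train = sorted(i for g in range(k) if g != f for i in folds[g])
            yield train, test


def cross_validated_auc(fit, features, labels, k=10, repeats=3, seed=0):
    """Mean test-fold AUC; `fit(X, y)` must return a function scoring one row."""
    results = []
    for train, test in repeated_stratified_folds(labels, k, repeats, seed):
        score = fit([features[i] for i in train], [labels[i] for i in train])
        results.append(auc([labels[i] for i in test],
                           [score(features[i]) for i in test]))
    return mean(results)
```

With `repeats=3` this mirrors the tuning and variable-selection steps, and with `repeats=10` the error estimation step; each class needs at least `k` members so that every test fold contains both classes.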
id APM-1_0c32f898136510887bb79234e5fdb56a
oai_identifier_str oai:scielo:S1516-31802017000300234
network_acronym_str APM-1
network_name_str São Paulo medical journal (Online)
repository_id_str
dc.contributor.author.fl_str_mv Olivera,André Rodrigues
Roesler,Valter
Iochpe,Cirano
Schmidt,Maria Inês
Vigo,Álvaro
Barreto,Sandhi Maria
Duncan,Bruce Bartholow
dc.subject.por.fl_str_mv Supervised machine learning
Decision support techniques
Data mining
Models, statistical
Diabetes mellitus, type 2
publishDate 2017
dc.date.none.fl_str_mv 2017-06-01
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://old.scielo.br/scielo.php?script=sci_arttext&pid=S1516-31802017000300234
url http://old.scielo.br/scielo.php?script=sci_arttext&pid=S1516-31802017000300234
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv 10.1590/1516-3180.2016.0309010217
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv text/html
dc.publisher.none.fl_str_mv Associação Paulista de Medicina - APM
publisher.none.fl_str_mv Associação Paulista de Medicina - APM
dc.source.none.fl_str_mv Sao Paulo Medical Journal v.135 n.3 2017
reponame:São Paulo medical journal (Online)
instname:Associação Paulista de Medicina
instacron:APM
instname_str Associação Paulista de Medicina
instacron_str APM
institution APM
reponame_str São Paulo medical journal (Online)
collection São Paulo medical journal (Online)
repository.name.fl_str_mv São Paulo medical journal (Online) - Associação Paulista de Medicina
repository.mail.fl_str_mv revistas@apm.org.br
_version_ 1754209265341431808