Comparison of machine-learning algorithms to build a predictive model for detecting undiagnosed diabetes - ELSA-Brasil: accuracy study
Main author: | Olivera, André Rodrigues |
---|---|
Publication date: | 2017 |
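Other authors: | Roesler, Valter; Iochpe, Cirano; Schmidt, Maria Inês; Vigo, Álvaro; Barreto, Sandhi Maria; Duncan, Bruce Bartholow |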
Document type: | Article |
Language: | eng |
Source title: | São Paulo medical journal (Online) |
Full text: | http://old.scielo.br/scielo.php?script=sci_arttext&pid=S1516-31802017000300234 |
Abstract: | CONTEXT AND OBJECTIVE: Type 2 diabetes is a chronic disease associated with a wide range of serious health complications that have a major impact on overall health. The aims here were to develop and validate predictive models for detecting undiagnosed diabetes using data from the Longitudinal Study of Adult Health (ELSA-Brasil) and to compare the performance of different machine-learning algorithms in this task. DESIGN AND SETTING: Comparison of machine-learning algorithms to develop predictive models using data from ELSA-Brasil. METHODS: After selecting a subset of 27 candidate variables from the literature, models were built and validated in four sequential steps: (i) parameter tuning with tenfold cross-validation, repeated three times; (ii) automatic variable selection using forward selection, a wrapper strategy with four different machine-learning algorithms and tenfold cross-validation (repeated three times), to evaluate each subset of variables; (iii) error estimation of model parameters with tenfold cross-validation, repeated ten times; and (iv) generalization testing on an independent dataset. The models were created with the following machine-learning algorithms: logistic regression, artificial neural network, naïve Bayes, K-nearest neighbor and random forest. RESULTS: The best models were created using artificial neural networks and logistic regression. These achieved mean areas under the curve of, respectively, 75.24% and 74.98% in the error estimation step and 74.17% and 74.41% in the generalization testing step. CONCLUSION: Most of the predictive models produced similar results and demonstrated the feasibility of identifying individuals with the highest probability of having undiagnosed diabetes through easily obtained clinical data. |
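
The validation scheme described in the abstract (repeated tenfold cross-validation scored by area under the curve, wrapped inside forward variable selection) can be sketched roughly as follows. This is a minimal illustration only, not the authors' implementation: the record does not say which software was used, the data are synthetic, and the toy centroid scorer merely stands in for the five algorithms named in the abstract. All function names, the stratified fold assignment, and the stopping rule (stop when the mean AUC no longer improves) are assumptions made for this example.

```python
import random
from statistics import mean


def auc(labels, scores):
    """AUC via the rank-sum identity: P(score_pos > score_neg), ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def repeated_stratified_kfold(y, k=10, repeats=3, seed=0):
    """Yield (train, test) index lists for k-fold CV, reshuffled `repeats` times."""
    rng = random.Random(seed)
    for _ in range(repeats):
        pos = [i for i, v in enumerate(y) if v == 1]
        neg = [i for i, v in enumerate(y) if v == 0]
        rng.shuffle(pos)
        rng.shuffle(neg)
        for f in range(k):
            # Every k-th positive and negative goes to fold f (keeps class balance).
            test = pos[f::k] + neg[f::k]
            held_out = set(test)
            train = [i for i in range(len(y)) if i not in held_out]
            yield train, test


def fit_centroid_scorer(X, y, features):
    """Toy stand-in model: project rows onto the difference of class means."""
    def centroid(cls):
        rows = [x for x, v in zip(X, y) if v == cls]
        return [mean(r[j] for r in rows) for j in features]
    w = [a - b for a, b in zip(centroid(1), centroid(0))]
    return lambda row: sum(wj * row[j] for wj, j in zip(w, features))


def cv_auc(X, y, features, k=10, repeats=3, seed=0):
    """Mean AUC over all folds of the repeated cross-validation."""
    fold_aucs = []
    for train, test in repeated_stratified_kfold(y, k, repeats, seed):
        model = fit_centroid_scorer([X[i] for i in train], [y[i] for i in train], features)
        fold_aucs.append(auc([y[i] for i in test], [model(X[i]) for i in test]))
    return mean(fold_aucs)


def forward_selection(X, y, candidates, k=10, repeats=3, seed=0):
    """Wrapper strategy: greedily add the variable that most improves CV AUC."""
    selected, best = [], 0.5
    remaining = list(candidates)
    while remaining:
        score, feat = max((cv_auc(X, y, selected + [f], k, repeats, seed), f) for f in remaining)
        if score <= best:  # assumed stopping rule: no further improvement
            break
        selected.append(feat)
        remaining.remove(feat)
        best = score
    return selected, best


# Synthetic data: column 0 is strongly informative, column 1 is noise, column 2 is weak.
rng = random.Random(42)
y = [i % 2 for i in range(300)]
X = [[2.0 * v + rng.gauss(0, 1), rng.gauss(0, 1), 0.5 * v + rng.gauss(0, 1)] for v in y]
selected, best_auc = forward_selection(X, y, candidates=[0, 1, 2])
print(selected, round(best_auc, 3))
```

In a real analysis the toy scorer would be replaced by an actual classifier, and the final generalization test would use an independent dataset that never enters the selection loop, as step (iv) of the abstract requires.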
id |
APM-1_0c32f898136510887bb79234e5fdb56a |
---|---|
oai_identifier_str |
oai:scielo:S1516-31802017000300234 |
network_acronym_str |
APM-1 |
network_name_str |
São Paulo medical journal (Online) |
repository_id_str |
|
spelling |
Supervised machine learning; Decision support techniques; Data mining; Models, statistical; Diabetes mellitus, type 2; Associação Paulista de Medicina - APM; 2017-06-01; info:eu-repo/semantics/article; info:eu-repo/semantics/publishedVersion; text/html; http://old.scielo.br/scielo.php?script=sci_arttext&pid=S1516-31802017000300234; Sao Paulo Medical Journal v.135 n.3 2017; reponame:São Paulo medical journal (Online); instname:Associação Paulista de Medicina; instacron:APM; 10.1590/1516-3180.2016.0309010217; info:eu-repo/semantics/openAccess; Olivera, André Rodrigues; Roesler, Valter; Iochpe, Cirano; Schmidt, Maria Inês; Vigo, Álvaro; Barreto, Sandhi Maria; Duncan, Bruce Bartholow; eng; 2017-07-20T00:00:00Z; oai:scielo:S1516-31802017000300234; Revista; http://www.scielo.br/spmj; https://old.scielo.br/oai/scielo-oai.php; revistas@apm.org.br; 1806-9460; 1516-3180; opendoar:2017-07-20T00:00; São Paulo medical journal (Online) - Associação Paulista de Medicina; false |
dc.title.none.fl_str_mv |
Comparison of machine-learning algorithms to build a predictive model for detecting undiagnosed diabetes - ELSA-Brasil: accuracy study |
author |
Olivera, André Rodrigues |
author_facet |
Olivera, André Rodrigues; Roesler, Valter; Iochpe, Cirano; Schmidt, Maria Inês; Vigo, Álvaro; Barreto, Sandhi Maria; Duncan, Bruce Bartholow |
author_role |
author |
author2 |
Roesler, Valter; Iochpe, Cirano; Schmidt, Maria Inês; Vigo, Álvaro; Barreto, Sandhi Maria; Duncan, Bruce Bartholow |
author2_role |
author author author author author author |
dc.contributor.author.fl_str_mv |
Olivera, André Rodrigues; Roesler, Valter; Iochpe, Cirano; Schmidt, Maria Inês; Vigo, Álvaro; Barreto, Sandhi Maria; Duncan, Bruce Bartholow |
dc.subject.por.fl_str_mv |
Supervised machine learning; Decision support techniques; Data mining; Models, statistical; Diabetes mellitus, type 2 |
topic |
Supervised machine learning; Decision support techniques; Data mining; Models, statistical; Diabetes mellitus, type 2 |
publishDate |
2017 |
dc.date.none.fl_str_mv |
2017-06-01 |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://old.scielo.br/scielo.php?script=sci_arttext&pid=S1516-31802017000300234 |
url |
http://old.scielo.br/scielo.php?script=sci_arttext&pid=S1516-31802017000300234 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
10.1590/1516-3180.2016.0309010217 |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
text/html |
dc.publisher.none.fl_str_mv |
Associação Paulista de Medicina - APM |
publisher.none.fl_str_mv |
Associação Paulista de Medicina - APM |
dc.source.none.fl_str_mv |
Sao Paulo Medical Journal v.135 n.3 2017; reponame:São Paulo medical journal (Online); instname:Associação Paulista de Medicina; instacron:APM |
instname_str |
Associação Paulista de Medicina |
instacron_str |
APM |
institution |
APM |
reponame_str |
São Paulo medical journal (Online) |
collection |
São Paulo medical journal (Online) |
repository.name.fl_str_mv |
São Paulo medical journal (Online) - Associação Paulista de Medicina |
repository.mail.fl_str_mv |
revistas@apm.org.br |
_version_ |
1754209265341431808 |