Strategic procedure in three stages for the selection of variables to obtain balanced results in public health research
Main author: | Lozano, Manuel |
---|---|
Publication date: | 2018 |
Other authors: | Manyes, Lara; Peiró, Juanjo; Iftimi, Adina; Ramada, José María |
Document type: | Article |
Language: | eng |
Source title: | Cadernos de Saúde Pública |
Full text: | http://old.scielo.br/scielo.php?script=sci_arttext&pid=S0102-311X2018000704001 |
Abstract: | Multidisciplinary research in public health draws on methods from many scientific disciplines. A main characteristic of this type of research is the handling of large data sets. Classic statistical variable selection methods, known as “screen and clean” and applied in a single step, select the variables with the greatest explanatory weight in the model. These methods, commonly used in public health research, may induce masking and multicollinearity, excluding variables that experts in each discipline consider relevant and skewing the result. Specific techniques, such as penalized regression and Bayesian statistics, have been used to address this problem; they offer more balanced results among subsets of variables, but with less restrictive selection thresholds. Combining classical methods, this manuscript proposes a three-step procedure that captures the relevant variables of each scientific discipline, minimizes the number of variables selected within each of them, and obtains a balanced distribution that explains most of the variability. The procedure was applied to a dataset from a public health study. Compared with the single-step methods, the proposed method achieves a greater reduction in the number of variables, as well as a balanced distribution among the scientific disciplines associated with the response variable. We propose an innovative procedure for variable selection, apply it to our dataset, and compare it with the classic single-step procedures. |
id |
FIOCRUZ-5_91121cc8365887c979281b170fa46de2 |
---|---|
oai_identifier_str |
oai:scielo:S0102-311X2018000704001 |
network_acronym_str |
FIOCRUZ-5 |
network_name_str |
Cadernos de Saúde Pública |
spelling |
Cadernos de Saúde Pública | ISSN 1678-4464 | ISSN 0102-311X | http://cadernos.ensp.fiocruz.br/csp/ | https://old.scielo.br/oai/scielo-oai.php | 2018-08-29T00:00:00Z |
dc.contributor.author.fl_str_mv |
Lozano, Manuel; Manyes, Lara; Peiró, Juanjo; Iftimi, Adina; Ramada, José María |
dc.subject.por.fl_str_mv |
Statistics as Topic Methods Interdisciplinary Research |
publishDate |
2018 |
dc.date.none.fl_str_mv |
2018-01-01 |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.identifier.uri.fl_str_mv |
http://old.scielo.br/scielo.php?script=sci_arttext&pid=S0102-311X2018000704001 |
dc.language.iso.fl_str_mv |
eng |
dc.relation.none.fl_str_mv |
10.1590/0102-311x00174017 |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
text/html |
dc.publisher.none.fl_str_mv |
Escola Nacional de Saúde Pública Sergio Arouca, Fundação Oswaldo Cruz |
dc.source.none.fl_str_mv |
Cadernos de Saúde Pública v.34 n.7 2018 reponame:Cadernos de Saúde Pública instname:Fundação Oswaldo Cruz (FIOCRUZ) instacron:FIOCRUZ |
repository.name.fl_str_mv |
Cadernos de Saúde Pública - Fundação Oswaldo Cruz (FIOCRUZ) |
repository.mail.fl_str_mv |
cadernos@ensp.fiocruz.br |