Multiple imputation in big identifiable data for educational research: An example from the Brazilian education assessment system

Bibliographic details
Main author: Ferrão, Maria Eugénia
Publication date: 2020
Other authors: Prata, Paula; Alves, Maria Teresa G.
Document type: Article
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://hdl.handle.net/10400.6/10484
Abstract: Almost all quantitative studies in educational assessment, evaluation and educational research are based on incomplete data sets, which have been a problem for years without a single solution. The use of big identifiable data poses new challenges in dealing with missing values. In the first part of this paper, we present the state of the art of the topic in the Brazilian education scientific literature, and how researchers have dealt with missing data since the turn of the century. Next, we use open access software to analyze real-world data, the 2017 Prova Brasil, for several federation units to document how the naïve assumption of missing completely at random may substantially affect statistical conclusions, researcher interpretations, and subsequent implications for policy and practice. We conclude with straightforward suggestions for any education researcher on applying R routines to conduct the hypothesis test of missing completely at random and, if the null hypothesis is rejected, how to implement multiple imputation, which appears to be one of the most appropriate methods for handling missing data.
id RCAP_e13b79e2396f1d3fac07a2456c2d3609
oai_identifier_str oai:ubibliorum.ubi.pt:10400.6/10484
network_acronym_str RCAP
repository_id_str 7160
spelling Centro-01-0145-FEDER-000019-C4-Centro de Competências em Cloud Computing and by the Brazilian Coordination for the Improvement of Higher Education Personnel Foundation, through a post-doc fellowship for a research project, which took place at the Faculty of Sciences of the University of Beira Interior, Portugal (Capes-PVE88881.169888/2018-01), and partially supported by the Brazilian National Council for Scientific and Technological Development (CNPq-process 440172 / 2017-9).
dc.contributor.none.fl_str_mv uBibliorum
dc.contributor.author.fl_str_mv Ferrão, Maria Eugénia
Prata, Paula
Alves, Maria Teresa G.
dc.subject.por.fl_str_mv Prova Brasil
Missing data
R
Multiple imputation
dc.date.none.fl_str_mv 2020-10-26T09:41:00Z
2020
2020-01-01T00:00:00Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10400.6/10484
dc.language.iso.fl_str_mv eng
dc.relation.none.fl_str_mv Ferrão, Maria Eugénia, Prata, Paula, & Alves, Maria Teresa Gonzaga. (2020). Multiple imputation in big identifiable data for educational research: An example from the Brazilian education assessment system. Ensaio: Avaliação e Políticas Públicas em Educação, 28(108), 599-621. Epub May 08, 2020. https://doi.org/10.1590/s0104-40362020002802346
10.1590/s0104-40362020002802346
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Scielo
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP