Multiple imputation in big identifiable data for educational research: An example from the Brazilian education assessment system

Ferrão, Maria Eugénia; Prata, Paula; Alves, Maria Teresa Gonzaga

Bibliographic details
Main author:	Ferrão, Maria Eugénia
Publication date:	2020
Other authors:	Prata, Paula; Alves, Maria Teresa Gonzaga
Document type:	Article
Language:	por eng spa
Source title:	Ensaio (Rio de Janeiro. Online)
Full text:	https://revistas.cesgranrio.org.br/index.php/ensaio/article/view/2346
Abstract:	Almost all quantitative studies in educational assessment, evaluation and educational research are based on incomplete data sets, which have been a problem for years without a single solution. The use of big identifiable data poses new challenges in dealing with missing values. In the first part of this paper, we present the state of the art of the topic in the Brazilian education scientific literature, and how researchers have dealt with missing data since the turn of the century. Next, we use open access software to analyze real-world data, the 2017 Prova Brasil, for several federation units to document how the naïve assumption of missing completely at random may substantially affect statistical conclusions, researcher interpretations, and subsequent implications for policy and practice. We conclude with straightforward suggestions for any education researcher on applying R routines to test the hypothesis of missing completely at random and, if the null hypothesis is rejected, how to implement multiple imputation, which appears to be one of the most appropriate methods for handling missing data.
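The abstract refers to R routines for multiple imputation; those routines are not reproduced in this record. As a rough illustration of the final pooling step only, the sketch below combines estimates from several imputed data sets using Rubin's rules. It is written in Python rather than R, the function name `pool_rubin` is hypothetical, and the input numbers are invented for illustration, not results from the paper.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m point estimates and their squared standard errors (Rubin's rules)."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")
    q_bar = mean(estimates)        # pooled point estimate
    u_bar = mean(variances)        # average within-imputation variance
    b = variance(estimates)        # between-imputation variance (divisor m - 1)
    t = u_bar + (1 + 1 / m) * b    # total variance of the pooled estimate
    return q_bar, t


# Hypothetical estimates from m = 3 imputed data sets (illustrative numbers only).
estimates = [10.0, 12.0, 14.0]
variances = [1.0, 1.0, 1.0]
pooled, total_var = pool_rubin(estimates, variances)
print(pooled, total_var)
```

The total variance exceeds the average within-imputation variance whenever the imputed data sets disagree, which is how multiple imputation carries the uncertainty due to missing values into the final standard error.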

Item metadata

id	CESGRANRIO_3dcef08790650135d92d67a8290466fc
oai_identifier_str	oai:ojs.localhost:article/2346
network_acronym_str	CESGRANRIO
network_name_str	Ensaio (Rio de Janeiro. Online)
repository_id_str
dc.title.none.fl_str_mv	Multiple imputation in big identifiable data for educational research: An example from the Brazilian education assessment system Imputación múltiple en grandes datos identificables para la investigación educativa: un ejemplo del sistema brasileño de evaluación educativa Imputação múltipla em grandes dados identificáveis para pesquisa educacional: um exemplo do sistema brasileiro de avaliação educacional
title	Multiple imputation in big identifiable data for educational research: An example from the Brazilian education assessment system
spellingShingle	Multiple imputation in big identifiable data for educational research: An example from the Brazilian education assessment system Ferrão, Maria Eugénia Educação Prova Brasil; Missing data; R; Multiple imputation Prueba Brasil; Datos omisos; R; Imputación múltiple Prova Brasil; Dados omissos; R; Imputação múltipla
title_short	Multiple imputation in big identifiable data for educational research: An example from the Brazilian education assessment system
title_full	Multiple imputation in big identifiable data for educational research: An example from the Brazilian education assessment system
title_fullStr	Multiple imputation in big identifiable data for educational research: An example from the Brazilian education assessment system
title_full_unstemmed	Multiple imputation in big identifiable data for educational research: An example from the Brazilian education assessment system
title_sort	Multiple imputation in big identifiable data for educational research: An example from the Brazilian education assessment system
author	Ferrão, Maria Eugénia
author_facet	Ferrão, Maria Eugénia Prata, Paula Alves, Maria Teresa Gonzaga
author_role	author
author2	Prata, Paula Alves, Maria Teresa Gonzaga
author2_role	author author
dc.contributor.none.fl_str_mv	Fundação para a Ciência e Tecnologia
dc.contributor.author.fl_str_mv	Ferrão, Maria Eugénia Prata, Paula Alves, Maria Teresa Gonzaga
dc.subject.none.fl_str_mv
dc.subject.por.fl_str_mv	Educação Prova Brasil; Missing data; R; Multiple imputation Prueba Brasil; Datos omisos; R; Imputación múltiple Prova Brasil; Dados omissos; R; Imputação múltipla
topic	Educação Prova Brasil; Missing data; R; Multiple imputation Prueba Brasil; Datos omisos; R; Imputación múltiple Prova Brasil; Dados omissos; R; Imputação múltipla
description	Almost all quantitative studies in educational assessment, evaluation and educational research are based on incomplete data sets, which have been a problem for years without a single solution. The use of big identifiable data poses new challenges in dealing with missing values. In the first part of this paper, we present the state of the art of the topic in the Brazilian education scientific literature, and how researchers have dealt with missing data since the turn of the century. Next, we use open access software to analyze real-world data, the 2017 Prova Brasil, for several federation units to document how the naïve assumption of missing completely at random may substantially affect statistical conclusions, researcher interpretations, and subsequent implications for policy and practice. We conclude with straightforward suggestions for any education researcher on applying R routines to test the hypothesis of missing completely at random and, if the null hypothesis is rejected, how to implement multiple imputation, which appears to be one of the most appropriate methods for handling missing data.
publishDate	2020
dc.date.none.fl_str_mv	2020-07-07
dc.type.none.fl_str_mv
dc.type.driver.fl_str_mv	info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion
format	article
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	https://revistas.cesgranrio.org.br/index.php/ensaio/article/view/2346 10.1590/s0104-40362020002802346
url	https://revistas.cesgranrio.org.br/index.php/ensaio/article/view/2346
identifier_str_mv	10.1590/s0104-40362020002802346
dc.language.iso.fl_str_mv	por eng spa
language	por eng spa
dc.relation.none.fl_str_mv	https://revistas.cesgranrio.org.br/index.php/ensaio/article/view/2346/1242 https://revistas.cesgranrio.org.br/index.php/ensaio/article/view/2346/1243 https://revistas.cesgranrio.org.br/index.php/ensaio/article/view/2346/1244
dc.rights.driver.fl_str_mv	Direitos autorais 2020 Revista Ensaio: Avaliação e Políticas Públicas em Educação http://creativecommons.org/licenses/by-nc/4.0 info:eu-repo/semantics/openAccess
rights_invalid_str_mv	Direitos autorais 2020 Revista Ensaio: Avaliação e Políticas Públicas em Educação http://creativecommons.org/licenses/by-nc/4.0
eu_rights_str_mv	openAccess
dc.coverage.none.fl_str_mv
dc.publisher.none.fl_str_mv	Fundação Cesgranrio
publisher.none.fl_str_mv	Fundação Cesgranrio
dc.source.none.fl_str_mv	Ensaio: Avaliação e Políticas Públicas em Educação; v. 28, n. 108 (2020): Revista Ensaio Jul./Set.; 599-621 1809-4465 0104-4036 reponame:Ensaio (Rio de Janeiro. Online) instname:Fundação Cesgranrio instacron:CESGRANRIO-2
instname_str	Fundação Cesgranrio
instacron_str	CESGRANRIO-2
institution	CESGRANRIO-2
reponame_str	Ensaio (Rio de Janeiro. Online)
collection	Ensaio (Rio de Janeiro. Online)
repository.name.fl_str_mv	Ensaio (Rio de Janeiro. Online) - Fundação Cesgranrio
repository.mail.fl_str_mv	ensaio@cesgranrio.org.br\|\|fatimacunha@cesgranrio.org.br\|\|alan@cesgranrio.org.br
_version_	1754832030672093184
