Using VAE for Incomplete Educational Data
Main author: | Montecino, Claudia Evelyn Escobar |
---|---|
Publication date: | 2023 |
Document type: | Thesis |
Language: | eng |
Source title: | Biblioteca Digital de Teses e Dissertações da USP |
Full text: | https://www.teses.usp.br/teses/disponiveis/104/104131/tde-24082023-102049/ |
Abstract: | In psychometrics, and particularly in educational assessment, incomplete datasets are common. Lack of time, forgetting the content involved, nervousness, or even the test design are some of the reasons why an individual may leave items unanswered. In this context, it is important to have estimation methods for psychometric models that handle missing data and are affected as little as possible by the lack of information in the unanswered items. In small-scale scenarios, traditional estimation methods for Item Response Theory (IRT) models, for example, are suitable for both complete and incomplete data. However, in high-dimensional situations, such as assessments involving many latent skills and abilities, traditional methods are computationally inefficient or even unable to obtain estimates for so many parameters. Deep learning has been adapted to incorporate IRT models and to make predictions and estimates from large, high-dimensional datasets. In this work, we extend the investigation of Curi, who defined a Two-Parameter Logistic Model (ML2P) within the architecture of a variational autoencoder (VAE) as a proposal to solve the problem of estimating the model's many parameters. We performed a simulation study to compare two variants of deep neural networks, autoencoders (AE) and VAEs, each defined with an ML2P model in the decoder, for situations with a large number of latent traits and complete data. Following favorable results for the VAE, we propose an extension of it (IVAE) that can make predictions when data are missing, making the model more general and useful in practice. Simulations of the proposed model were performed under different scenarios to investigate how efficiently the new method recovers the parameters. The results were also compared with joint maximum likelihood, one of the methods currently most recommended in IRT for higher-dimensional settings, and the model was applied to a real high-dimensional dataset with missing data. |
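
As a rough illustration of the ideas named in the abstract, the sketch below evaluates a multidimensional two-parameter logistic (ML2P) response probability and a log-likelihood that skips unanswered items. It is not the thesis's IVAE implementation. The slope-intercept parameterization, the use of `None` to mark a missing response, and every function and variable name are assumptions made only for this example.

```python
import math


def ml2p_probability(theta, a, b):
    """P(correct answer) under an assumed slope-intercept ML2P form:
    sigmoid(sum_k a_k * theta_k + b)."""
    logit = sum(a_k * t_k for a_k, t_k in zip(a, theta)) + b
    return 1.0 / (1.0 + math.exp(-logit))


def masked_log_likelihood(responses, theta, discriminations, intercepts):
    """Bernoulli log-likelihood summed over answered items only.

    A response of None marks an unanswered item (an assumption of this
    sketch) and contributes nothing to the sum.
    """
    total = 0.0
    for u, a, b in zip(responses, discriminations, intercepts):
        if u is None:
            continue  # missing response: skipped, not scored as wrong
        p = ml2p_probability(theta, a, b)
        total += math.log(p) if u == 1 else math.log(1.0 - p)
    return total


# Hypothetical respondent with two latent traits and three items.
theta = [0.5, -1.0]
discriminations = [[1.0, 0.0], [0.0, 1.5], [0.8, 0.8]]
intercepts = [0.0, 0.5, -0.2]
responses = [1, None, 0]  # the second item was left unanswered

ll = masked_log_likelihood(responses, theta, discriminations, intercepts)
print(ll)
```

Skipping a missing entry is different from scoring it as incorrect: the unanswered item neither raises nor lowers the likelihood, so it does not pull the trait estimates in either direction.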
id |
USP_6c72659e75e41b50d6beac5d1245e2c2 |
---|---|
oai_identifier_str |
oai:teses.usp.br:tde-24082023-102049 |
network_acronym_str |
USP |
network_name_str |
Biblioteca Digital de Teses e Dissertações da USP |
repository_id_str |
2721 |
dc.title.none.fl_str_mv |
Using VAE for Incomplete Educational Data; Usando VAE para Dados Educacionais Incompletos |
title |
Using VAE for Incomplete Educational Data |
author |
Montecino, Claudia Evelyn Escobar |
author_facet |
Montecino, Claudia Evelyn Escobar |
author_role |
author |
dc.contributor.none.fl_str_mv |
Cúri, Mariana |
dc.contributor.author.fl_str_mv |
Montecino, Claudia Evelyn Escobar |
dc.subject.por.fl_str_mv |
Autoencoder; Incomplete educational data; Item response theory; Missing data; Neural networks; Variational autoencoder |
publishDate |
2023 |
dc.date.none.fl_str_mv |
2023-03-13 |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/doctoralThesis |
format |
doctoralThesis |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
https://www.teses.usp.br/teses/disponiveis/104/104131/tde-24082023-102049/ |
url |
https://www.teses.usp.br/teses/disponiveis/104/104131/tde-24082023-102049/ |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.rights.driver.fl_str_mv |
Release the content for public access. info:eu-repo/semantics/openAccess |
rights_invalid_str_mv |
Release the content for public access. |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.publisher.none.fl_str_mv |
Biblioteca Digital de Teses e Dissertações da USP |
publisher.none.fl_str_mv |
Biblioteca Digital de Teses e Dissertações da USP |
dc.source.none.fl_str_mv |
reponame:Biblioteca Digital de Teses e Dissertações da USP instname:Universidade de São Paulo (USP) instacron:USP |
instname_str |
Universidade de São Paulo (USP) |
instacron_str |
USP |
institution |
USP |
reponame_str |
Biblioteca Digital de Teses e Dissertações da USP |
collection |
Biblioteca Digital de Teses e Dissertações da USP |
repository.name.fl_str_mv |
Biblioteca Digital de Teses e Dissertações da USP - Universidade de São Paulo (USP) |
repository.mail.fl_str_mv |
virginia@if.usp.br; atendimento@aguia.usp.br |
_version_ |
1815257312701448192 |