Accurate, timely and portable: course-agnostic early prediction of student performance from LMS logs

Bibliographic details
Main author: Santos, Ricardo Miguel Costa
Publication date: 2023
Document type: Dissertation
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://hdl.handle.net/10362/148995
Summary: Dissertation presented as a partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science
id RCAP_20d26c192d57a8920e4d0d4052c84c6b
oai_identifier_str oai:run.unl.pt:10362/148995
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
abstract Learning management systems are essential intermediaries between students and educational content in the digital era. Among other factors, the institutional adoption of such systems is meant to foster student engagement and lead to better educational outcomes in a scalable manner. However, a significant challenge facing educators and institutions is the timely identification of students who may require special attention and feedback. Early identification allows educators to provide the necessary feedback and adopt suitable corrective measures. A significant body of research has therefore been dedicated to developing early warning systems based on clickstream data. However, comprehensive studies that attempt prediction across multiple courses are rare. Moreover, most predictive models require sophisticated domain knowledge, data skills and computational power that may not be available in practice. In this work, we used an academic year's worth of data collected from all courses at a Portuguese information management school to perform two main experiments on two binary classification problems: students at risk vs. students not at risk, and high-performing students vs. students who are not high-performing. In the first experiment, we compared the performance of traditional machine learning classifiers against majority class classifiers at multiple stages of course completion (specifically, the 10%, 25%, 33%, 50% and 100% course completion thresholds).
For both classification problems, performance on all metrics peaked when using all of the data collected throughout the course: 88.6% accuracy and 92.3% Area Under the Receiver Operating Characteristic curve (AUROC) using Random Forest (RF) for students at risk, and 78.2% accuracy and 79.6% AUROC using ExtraTrees for high-performing students. Concerning early prediction, acceptable performance for classifying at-risk students is achieved as early as the 25% course duration threshold (72.8% AUROC using RF). Performance for high-performing students was generally lower, with AUROC at earlier stages peaking at the courses' midway point (64.4% AUROC using RF). Our second experiment deployed long short-term memory (LSTM) units trained with a time-dependent representation of a single feature (total number of clicks). While this approach achieved inferior performance, we argue that its more straightforward data pre-processing may represent a worthwhile tradeoff against relatively small losses in model performance, especially at earlier moments of prediction. We found the best tradeoff at 33% course duration (64% AUROC against 74% AUROC using RF) to predict at-risk students. To predict high-performing students, we found the best tradeoff at 25% course duration (56% AUROC against 61% using RF). Results obtained using a different set of logs validate the portability of our approach when it comes to static aggregate models.
However, our deep learning approach did not generalize well on this data, which suggests that portability between courses using this approach may only be possible in specific instances.
author Santos, Ricardo Miguel Costa
author_facet Santos, Ricardo Miguel Costa
author_role author
dc.contributor.none.fl_str_mv Henriques, Roberto André Pereira
RUN
dc.contributor.author.fl_str_mv Santos, Ricardo Miguel Costa
dc.subject.por.fl_str_mv Student performance
Learning Analytics
Machine Learning
Classification
Deep Learning
publishDate 2023
dc.date.none.fl_str_mv 2023-02-10T16:51:24Z
2023-01-24
2023-01-24T00:00:00Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10362/148995
TID:203220943
url http://hdl.handle.net/10362/148995
identifier_str_mv TID:203220943
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP