Accurate, timely and portable: course-agnostic early prediction of student performance from LMS logs
Main author: | Santos, Ricardo Miguel Costa |
---|---|
Publication date: | 2023 |
Document type: | Dissertation |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | http://hdl.handle.net/10362/148995 |
Abstract: | Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science |
id |
RCAP_20d26c192d57a8920e4d0d4052c84c6b |
---|---|
oai_identifier_str |
oai:run.unl.pt:10362/148995 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
spelling |
Accurate, timely and portable: course-agnostic early prediction of student performance from LMS logs
Student performance; Learning Analytics; Machine Learning; Classification; Deep Learning
Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science
Learning management systems are essential intermediaries between students and educational content in the digital era. Among other factors, the institutional adoption of such systems is meant to foster student engagement and lead to better educational outcomes in a scalable manner. However, a significant challenge facing educators and institutions is the timely identification of students who may require special attention and feedback. Early identification allows educators to provide the necessary feedback and adopt suitable corrective measures. A significant body of research has therefore been dedicated to developing early warning systems based on clickstream data. However, comprehensive studies that attempt prediction on multiple courses are few and far between. Moreover, most predictive models require sophisticated domain knowledge, data skills and computational power that may not be available in practice.
In this work, we used an academic year’s worth of data collected from all courses at a Portuguese information management school to perform two main experiments on two binary classification problems: students at risk vs students not at risk, and high-performing students vs not high-performing students. In the first experiment, we compared the performance of traditional machine learning classifiers against majority class classifiers at multiple stages of course completion (more specifically, the 10%, 25%, 33%, 50% and 100% course completion thresholds). For both classification problems, performance on all metrics peaked when using all of the data collected throughout the course: 88.6% accuracy and 92.3% Area Under the Receiver Operating Characteristic (AUROC) using Random Forest (RF) for students at risk, and 78.2% accuracy and 79.6% AUROC using ExtraTrees for high-performing students. Concerning early prediction, acceptable performance for classifying at-risk students is achieved as early as the 25% course duration threshold (72.8% AUROC using RF). Performance for high-performing students was generally lower, with AUROC at earlier stages peaking at the courses’ midway point (64.4% AUROC using RF).
Our second experiment deployed long short-term memory (LSTM) units trained with a time-dependent representation of a single feature (number of total clicks). While this approach achieved inferior performance, we argue that its more straightforward data pre-processing may represent a worthwhile tradeoff against relatively small losses in model performance, especially at earlier moments of prediction. We found the best tradeoff at 33% course duration: 64% AUROC against 74% AUROC using RF to predict at-risk students. To predict high-performing students, we found the best tradeoff to occur at 25% course duration (56% AUROC against 61% using RF). Results obtained using a different set of logs validate the portability of our approach when it comes to static aggregate models. However, our deep learning approach did not generalize well on this data, which suggests that portability between courses using this approach may only be possible in specific instances.
Henriques, Roberto André Pereira; RUN; Santos, Ricardo Miguel Costa; 2023-02-10T16:51:24Z; 2023-01-24; 2023-01-24T00:00:00Z; info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/masterThesis; application/pdf; http://hdl.handle.net/10362/148995; TID:203220943; eng; info:eu-repo/semantics/openAccess; reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos); instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron:RCAAP; 2024-03-11T05:30:47Z; oai:run.unl.pt:10362/148995; Portal Agregador; ONG; https://www.rcaap.pt/oai/openaire; opendoar:7160; 2024-03-20T03:53:35.439156; Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; false |
dc.title.none.fl_str_mv |
Accurate, timely and portable: course-agnostic early prediction of student performance from LMS logs |
title |
Accurate, timely and portable: course-agnostic early prediction of student performance from LMS logs |
spellingShingle |
Accurate, timely and portable: course-agnostic early prediction of student performance from LMS logs Santos, Ricardo Miguel Costa Student performance Learning Analytics Machine Learning Classification Deep Learning |
title_short |
Accurate, timely and portable: course-agnostic early prediction of student performance from LMS logs |
title_full |
Accurate, timely and portable: course-agnostic early prediction of student performance from LMS logs |
title_fullStr |
Accurate, timely and portable: course-agnostic early prediction of student performance from LMS logs |
title_full_unstemmed |
Accurate, timely and portable: course-agnostic early prediction of student performance from LMS logs |
title_sort |
Accurate, timely and portable: course-agnostic early prediction of student performance from LMS logs |
author |
Santos, Ricardo Miguel Costa |
author_facet |
Santos, Ricardo Miguel Costa |
author_role |
author |
dc.contributor.none.fl_str_mv |
Henriques, Roberto André Pereira RUN |
dc.contributor.author.fl_str_mv |
Santos, Ricardo Miguel Costa |
dc.subject.por.fl_str_mv |
Student performance Learning Analytics Machine Learning Classification Deep Learning |
topic |
Student performance Learning Analytics Machine Learning Classification Deep Learning |
description |
Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science |
publishDate |
2023 |
dc.date.none.fl_str_mv |
2023-02-10T16:51:24Z 2023-01-24 2023-01-24T00:00:00Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
format |
masterThesis |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10362/148995 TID:203220943 |
url |
http://hdl.handle.net/10362/148995 |
identifier_str_mv |
TID:203220943 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv |
|
_version_ |
1799138126059274240 |
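
The evaluation setup summarised in the abstract (features computed only from logs available up to a course-completion threshold, scored by AUROC and compared against a majority class classifier) can be illustrated with a minimal sketch. This is not the dissertation's code: the event format, the function names and the use of raw click counts as a risk score are all assumptions made for illustration, and only the threshold values come from the abstract.

```python
from collections import Counter

# Course-completion thresholds named in the abstract.
THRESHOLDS = (0.10, 0.25, 0.33, 0.50, 1.00)


def clicks_before(events, course_start, course_end, fraction):
    """Count clicks per student up to `fraction` of the course duration.

    `events` is a hypothetical list of (student_id, timestamp) pairs;
    timestamps and course bounds are plain numbers (e.g. days).
    """
    cutoff = course_start + fraction * (course_end - course_start)
    return Counter(student for student, ts in events if ts <= cutoff)


def auroc(scores, labels):
    """AUROC as the probability that a positive outranks a negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one example of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def majority_baseline_scores(labels):
    """A majority class classifier assigns every student the same score."""
    majority = Counter(labels).most_common(1)[0][0]
    return [float(majority)] * len(labels)


# Toy data (invented): a 10-day course, label 1 = at risk.
events = [("a", 1), ("a", 2), ("a", 8), ("b", 5),
          ("c", 1), ("c", 9), ("d", 3), ("d", 4)]
students = ["a", "b", "c", "d"]
at_risk = [0, 1, 1, 0]

for fraction in THRESHOLDS:
    clicks = clicks_before(events, 0, 10, fraction)
    # Assumption for illustration: fewer clicks means higher risk.
    scores = [-clicks[s] for s in students]
    print(fraction, auroc(scores, at_risk))
```

Because a majority class classifier gives every student the same score, its AUROC is 0.5 by construction, which is why AUROC is a convenient metric for judging whether a model trained on partial logs adds anything over that baseline.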