Uncertainty-Aware AI for ECG arrhythmia multi-label classification
Main author: | Simão, Raquel Filipa Birra |
---|---|
Publication date: | 2022 |
Document type: | Dissertation |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | http://hdl.handle.net/10362/157128 |
Abstract: | Machine Learning (ML) models can predict a variety of diseases, with performance that can exceed that of healthcare professionals. However, when deployed in clinical settings as decision support systems, their generalisation capabilities are often compromised, making healthcare professionals more likely to deliver erroneous diagnoses. This research focuses on uncertainty measures as a means of abstaining from classifying samples with high uncertainty and as a selection criterion for active learning strategies. For this purpose, four large public multi-label Electrocardiogram (ECG) databases were used for the classification of cardiac arrhythmias. Regarding the uncertainty measures, single distribution uncertainty and classical information-theoretic measures of entropy were tested and compared. To this end, three Deep Learning models were developed: a single convolutional neural network and two multiple-model approaches using the Monte-Carlo Dropout and Deep Ensemble techniques. When tested with samples from the same database used for training, all models achieved F1-scores higher than 95%. However, when tested on an external dataset, their performance dropped to approximately 70%, indicating a probable dataset shift. The Deep Ensemble model obtained the highest F1-score on both test sets, with a maximum difference of 3% from the others. In classification with rejection option, the rejection rate increased from the 10% set on the training data to between 30% and 50%, depending on the model or uncertainty measure, with the highest rejection rates obtained on external data. This reveals that classifications on the external dataset have higher uncertainty, which is also an indication of dataset shift. For the active learning approach, the 10% of samples with the highest uncertainty were used to retrain the models. Performance increased by almost 5%, suggesting that uncertainty is a good selection criterion. Although there are still challenges to the implementation of ML models, these preliminary studies show that uncertainty quantification is a valuable method for classification with rejection option and for active learning under dataset shift conditions. |
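
The abstract describes scoring each sample by an entropy-based uncertainty and then either rejecting the most uncertain samples or selecting them for retraining. The following is a minimal sketch of that idea, not the thesis implementation: it assumes each ensemble member (or Monte-Carlo Dropout pass) outputs one sigmoid probability per label, and it uses the mean binary entropy of the averaged prediction as the uncertainty score. The function names and the example probabilities are hypothetical.

```python
import math
from typing import List, Sequence


def binary_entropy(p: float) -> float:
    """Shannon entropy (in bits) of a single Bernoulli label with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def mean_probabilities(member_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average the per-label probabilities over ensemble members or dropout passes."""
    n_members = len(member_probs)
    n_labels = len(member_probs[0])
    return [sum(m[j] for m in member_probs) / n_members for j in range(n_labels)]


def predictive_entropy(member_probs: Sequence[Sequence[float]]) -> float:
    """Mean binary entropy across labels of the averaged prediction (an assumed score)."""
    mean_p = mean_probabilities(member_probs)
    return sum(binary_entropy(p) for p in mean_p) / len(mean_p)


def most_uncertain(scores: Sequence[float], fraction: float) -> List[int]:
    """Indices of the top `fraction` of samples by uncertainty, highest first."""
    k = max(1, round(fraction * len(scores)))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


# Hypothetical outputs: 3 samples, each with 3 members x 2 labels.
samples = [
    [[0.98, 0.02], [0.97, 0.03], [0.99, 0.01]],  # members agree and are confident
    [[0.90, 0.10], [0.20, 0.80], [0.50, 0.50]],  # members disagree
    [[0.70, 0.10], [0.80, 0.20], [0.75, 0.15]],  # moderately confident
]
scores = [predictive_entropy(s) for s in samples]

# The same ranking serves both uses named in the abstract: the selected
# indices can be rejected (abstention) or sent for labelling and retraining.
selected = most_uncertain(scores, fraction=0.34)
print(selected)
```

Ranking by a single score keeps the rejection and active learning steps interchangeable; other uncertainty measures can be substituted for `predictive_entropy` without changing the selection logic.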
id |
RCAP_4e3bd52bcffecb83eb8a6960698022ba |
---|---|
oai_identifier_str |
oai:run.unl.pt:10362/157128 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
author |
Simão, Raquel Filipa Birra |
author_role |
author |
dc.contributor.none.fl_str_mv |
Gamboa, Hugo RUN |
dc.subject.por.fl_str_mv |
Uncertainty Quantification; Monte Carlo Dropout; Deep Ensemble; Dataset shift; Active Learning; Domain/Scientific Area::Engineering and Technology::Other Engineering and Technologies |
publishDate |
2022 |
dc.date.none.fl_str_mv |
2022-11 2022-11-01T00:00:00Z 2023-09-01T13:03:03Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
format |
masterThesis |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10362/157128 |
url |
http://hdl.handle.net/10362/157128 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv |
|
_version_ |
1799138150513115136 |