Estimation of relapse probability in early stages non-small cell lung cancer patients

Bibliographic details
Main author: Pardal, Mariana Raimundo
Publication date: 2022
Document type: Dissertation
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://hdl.handle.net/10362/155748
Abstract: Lung cancer is the third most prevalent cancer among women in Europe and the second most common among men. With an estimated 1.8 million deaths in 2020, it remains the leading cause of cancer mortality worldwide. This number is expected to rise in the coming years, alarming the global health organisations that are trying to reverse the trend. Although early diagnosis and treatment have improved in the hope of increasing survival, recurrence remains a significant problem: between 30% and 70% of patients with early-stage lung cancer who undergo surgery eventually relapse. A promising strategy is to apply machine learning algorithms to electronic health record data to produce a more reliable risk stratification and better identify a patient’s propensity to relapse, with the aim of improving survival rates and patient quality of life. To this end, this research developed three logistic regression models that predict recurrence in early-stage NSCLC patients at time horizons of one, three, and five years after surgery. The dissertation first examines the dataset’s content and gives a descriptive analysis in which each attribute used in the models is described. It then explains logistic regression, the K-fold cross-validation method, and the metrics used to assess the models’ performance. Finally, it presents the implementation and results of the models. The one-year model achieved an accuracy of 91.65%, while the three-year and five-year models achieved 89.71% and 89.94%, respectively. The corresponding AUC values were 91.65%, 89.16%, and 90.23%. This dissertation was conducted in collaboration with the oncology department of the University Hospital Puerta Hierro de Majadahonda, within the European project CLARIFY.
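The abstract names logistic regression, K-fold cross-validation, accuracy, and AUC, and these can be illustrated with a minimal sketch. The code below is not the dissertation's implementation: the data are synthetic, and every function name, the learning rate, the epoch count, and the choice of k = 5 are assumptions made purely for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, x):
    # w[0] is the bias; w[1:] are the feature weights.
    return sigmoid(w[0] + sum(wj * v for wj, v in zip(w[1:], x)))


def fit_logistic(X, y, lr=0.1, epochs=300):
    """Fit weights by batch gradient descent on the mean log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def accuracy(y_true, probs, threshold=0.5):
    hits = sum((p >= threshold) == bool(t) for t, p in zip(y_true, probs))
    return hits / len(y_true)


def auc(y_true, probs):
    """AUC as the chance that a random positive outranks a random negative."""
    pos = [p for t, p in zip(y_true, probs) if t == 1]
    neg = [p for t, p in zip(y_true, probs) if t == 0]
    if not pos or not neg:
        return float("nan")  # undefined when a fold holds a single class
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def k_fold_scores(X, y, k=5, seed=0):
    """Return one (accuracy, AUC) pair per held-out fold."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held = set(fold)
        train = [i for i in idx if i not in held]
        w = fit_logistic([X[i] for i in train], [y[i] for i in train])
        probs = [predict_proba(w, X[i]) for i in fold]
        truth = [y[i] for i in fold]
        scores.append((accuracy(truth, probs), auc(truth, probs)))
    return scores


def make_synthetic(n=200, seed=1):
    # Hypothetical two-feature data; NOT the clinical dataset of the thesis.
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0, 1), rng.gauss(0, 1)]
        p = sigmoid(2.0 * x[0] - 1.0 * x[1])
        X.append(x)
        y.append(1 if rng.random() < p else 0)
    return X, y


if __name__ == "__main__":
    X, y = make_synthetic()
    for fold, (acc, a) in enumerate(k_fold_scores(X, y), start=1):
        print(f"fold {fold}: accuracy={acc:.3f} auc={a:.3f}")
```

Each fold is scored only on patients the model did not see during fitting. In a setting like the one the abstract describes, one such model would presumably be fitted per time horizon, with the label redefined as relapse within one, three, or five years.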
id RCAP_69aa1907d3a3530ec60b1b94f11ec9cc
oai_identifier_str oai:run.unl.pt:10362/155748
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
dc.title.none.fl_str_mv Estimation of relapse probability in early stages non-small cell lung cancer patients
title Estimation of relapse probability in early stages non-small cell lung cancer patients
author Pardal, Mariana Raimundo
author_facet Pardal, Mariana Raimundo
author_role author
dc.contributor.none.fl_str_mv Sousa, Pedro
Guerreiro, Gracinda
RUN
dc.contributor.author.fl_str_mv Pardal, Mariana Raimundo
dc.subject.por.fl_str_mv Non-Small Cell Lung Cancers
Machine Learning
Logistic Regression
Probability of relapse
Domínio/Área Científica::Engenharia e Tecnologia::Engenharia Eletrotécnica, Eletrónica e Informática
topic Non-Small Cell Lung Cancers
Machine Learning
Logistic Regression
Probability of relapse
Domínio/Área Científica::Engenharia e Tecnologia::Engenharia Eletrotécnica, Eletrónica e Informática
publishDate 2022
dc.date.none.fl_str_mv 2022-12
2022-12-01T00:00:00Z
2023-07-24T14:57:43Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10362/155748
url http://hdl.handle.net/10362/155748
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_ 1799138148015407104