Estimation of relapse probability in early stages non-small cell lung cancer patients
Main author: | Pardal, Mariana Raimundo |
---|---|
Publication date: | 2022 |
Document type: | Dissertation |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | http://hdl.handle.net/10362/155748 |
Abstract: | In Europe, lung cancer is the third most prevalent cancer in women and the second most common in men. With an expected 1.8 million deaths in 2020, lung cancer remains the leading cause of cancer mortality worldwide. This number is estimated to increase in the coming years, causing alarm among global health organisations attempting to counter this trend. Even though improvements in early diagnosis and treatment have been made in the hope of increasing survival, recurrence remains a significant problem: between 30% and 70% of patients with early-stage lung cancer who undergo surgery end up experiencing a relapse. A promising strategy is to apply machine learning algorithms to data in electronic health records to produce a more reliable risk stratification and better identify a patient’s propensity to relapse, with the aim of improving survival rates and patient quality of life. For this purpose, this research developed three logistic regression models to predict recurrence in early-stage non-small cell lung cancer (NSCLC) patients at time horizons of one, three, and five years following surgery. The dissertation first examines the dataset’s content and gives a descriptive analysis in which each attribute used in the models is described. It then explains logistic regression, the K-fold cross-validation method, and the metrics used to assess the models’ performance. Finally, the implementation and results of the models are presented. The one-year model achieved an accuracy of 91.65%, while the three-year and five-year models achieved 89.71% and 89.94%, respectively. The corresponding AUC values were 91.65%, 89.16%, and 90.23% for the one-year, three-year, and five-year models. This dissertation was conducted in collaboration with the oncology department of the University Hospital Puerta Hierro de Majadahonda within the European project CLARIFY. |
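
The abstract names logistic regression, K-fold cross-validation, accuracy and AUC, but this record does not include the dissertation's code or data. The following is only a minimal sketch of how those pieces fit together, assuming a synthetic two-feature dataset, plain batch gradient descent, and five folds. All function names, hyperparameters and the data generator are hypothetical and are not taken from the dissertation.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, x):
    # w[0] is the intercept; w[1:] are the feature coefficients.
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(X, y, lr=0.1, epochs=300):
    """Fit weights by batch gradient descent on the mean log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            grad[0] += err
            for j, xij in enumerate(xi):
                grad[j + 1] += err * xij
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def auc(y_true, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly
    (ties count as half)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def k_fold_indices(n, k, seed=0):
    # Shuffle once, then deal indices into k disjoint folds.
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(X, y, k=5):
    """Return mean accuracy and mean AUC over k held-out folds."""
    accs, aucs = [], []
    for fold in k_fold_indices(len(X), k):
        held = set(fold)
        train = [i for i in range(len(X)) if i not in held]
        w = fit_logistic([X[i] for i in train], [y[i] for i in train])
        probs = [predict_proba(w, X[i]) for i in fold]
        truth = [y[i] for i in fold]
        correct = sum((p >= 0.5) == (t == 1) for p, t in zip(probs, truth))
        accs.append(correct / len(fold))
        aucs.append(auc(truth, probs))
    return sum(accs) / k, sum(aucs) / k


def make_synthetic(n=200, seed=1):
    # Hypothetical data: two Gaussian features whose means depend on the label.
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        X.append([rng.gauss(1.5 * label, 1.0), rng.gauss(-1.0 * label, 1.0)])
        y.append(label)
    return X, y


X, y = make_synthetic()
mean_acc, mean_auc = cross_validate(X, y, k=5)
print(f"accuracy={mean_acc:.3f} auc={mean_auc:.3f}")
```

One such model per time horizon (one, three and five years) would mirror the three-model setup the abstract describes. The figures printed here come from synthetic data and say nothing about the reported results.
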
id |
RCAP_69aa1907d3a3530ec60b1b94f11ec9cc |
---|---|
oai_identifier_str |
oai:run.unl.pt:10362/155748 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
dc.title.none.fl_str_mv |
Estimation of relapse probability in early stages non-small cell lung cancer patients |
title |
Estimation of relapse probability in early stages non-small cell lung cancer patients |
author |
Pardal, Mariana Raimundo |
author_facet |
Pardal, Mariana Raimundo |
author_role |
author |
dc.contributor.none.fl_str_mv |
Sousa, Pedro; Guerreiro, Gracinda; RUN |
dc.contributor.author.fl_str_mv |
Pardal, Mariana Raimundo |
dc.subject.por.fl_str_mv |
Non-Small Cell Lung Cancers; Machine Learning; Logistic Regression; Probability of relapse; Domínio/Área Científica::Engenharia e Tecnologia::Engenharia Eletrotécnica, Eletrónica e Informática |
topic |
Non-Small Cell Lung Cancers; Machine Learning; Logistic Regression; Probability of relapse; Domínio/Área Científica::Engenharia e Tecnologia::Engenharia Eletrotécnica, Eletrónica e Informática |
publishDate |
2022 |
dc.date.none.fl_str_mv |
2022-12; 2022-12-01T00:00:00Z; 2023-07-24T14:57:43Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
format |
masterThesis |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10362/155748 |
url |
http://hdl.handle.net/10362/155748 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos); instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv |
|
_version_ |
1799138148015407104 |