Machine learning and feature selection methods for egfr mutation status prediction in lung cancer

Detalhes bibliográficos
Autor(a) principal: Morgado, J
Data de Publicação: 2021
Outros Autores: Pereira, T, Silva, F, Freitas, C, Negrão, E, Lima, BF, Silva, MC, Madureira, AJ, Ramos, I, Hespanhol, V, Costa, JL, Cunha, A, Oliveira, HP
Tipo de documento: Artigo
Idioma: eng
Título da fonte: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Texto Completo: https://hdl.handle.net/10216/152468
Resumo: The evolution of personalized medicine has changed the therapeutic strategy from classical chemotherapy and radiotherapy to a genetic modification targeted therapy, and although biopsy is the traditional method to genetically characterize lung cancer tumor, it is an invasive and painful procedure for the patient. Nodule image features extracted from computed tomography (CT) scans have been used to create machine learning models that predict gene mutation status in a noninvasive, fast, and easy-to-use manner. However, recent studies have shown that radiomic features extracted from an extended region of interest (ROI) beyond the tumor, might be more relevant to predict the mutation status in lung cancer, and consequently may be used to significantly decrease the mortality rate of patients battling this condition. In this work, we investigated the relation between image phenotypes and the mutation status of Epidermal Growth Factor Receptor (EGFR), the most frequently mutated gene in lung cancer with several approved targeted-therapies, using radiomic features extracted from the lung containing the nodule. A variety of linear, nonlinear, and ensemble predictive classification models, along with several feature selection methods, were used to classify the binary outcome of wild-type or mutant EGFR mutation status. The results show that a comprehensive approach using a ROI that included the lung with nodule can capture relevant information and successfully predict the EGFR mutation status with increased performance compared to local nodule analyses. Linear Support Vector Machine, Elastic Net, and Logistic Regression, combined with the Principal Component Analysis feature selection method implemented with 70% of variance in the feature set, were the best-performing classifiers, reaching Area Under the Curve (AUC) values ranging from 0.725 to 0.737. This approach that exploits a holistic analysis indicates that information from more extensive regions of the lung containing the nodule allows a more complete lung cancer characterization and should be considered in future radiogenomic studies.
id RCAP_60879f15a5025b310ccdfbe1d2a51ad9
oai_identifier_str oai:repositorio-aberto.up.pt:10216/152468
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
spelling Machine learning and feature selection methods for egfr mutation status prediction in lung cancerEGFR predictionFeature selectionLung cancerMachine learningRadiogenomicsThe evolution of personalized medicine has changed the therapeutic strategy from classical chemotherapy and radiotherapy to a genetic modification targeted therapy, and although biopsy is the traditional method to genetically characterize lung cancer tumor, it is an invasive and painful procedure for the patient. Nodule image features extracted from computed tomography (CT) scans have been used to create machine learning models that predict gene mutation status in a noninvasive, fast, and easy-to-use manner. However, recent studies have shown that radiomic features extracted from an extended region of interest (ROI) beyond the tumor, might be more relevant to predict the mutation status in lung cancer, and consequently may be used to significantly decrease the mortality rate of patients battling this condition. In this work, we investigated the relation between image phenotypes and the mutation status of Epidermal Growth Factor Receptor (EGFR), the most frequently mutated gene in lung cancer with several approved targeted-therapies, using radiomic features extracted from the lung containing the nodule. A variety of linear, nonlinear, and ensemble predictive classification models, along with several feature selection methods, were used to classify the binary outcome of wild-type or mutant EGFR mutation status. The results show that a comprehensive approach using a ROI that included the lung with nodule can capture relevant information and successfully predict the EGFR mutation status with increased performance compared to local nodule analyses. Linear Support Vector Machine, Elastic Net, and Logistic Regression, combined with the Principal Component Analysis feature selection method implemented with 70% of variance in the feature set, were the best-performing classifiers, reaching Area Under the Curve (AUC) values ranging from 0.725 to 0.737. This approach that exploits a holistic analysis indicates that information from more extensive regions of the lung containing the nodule allows a more complete lung cancer characterization and should be considered in future radiogenomic studies.MDPI20212021-01-01T00:00:00Zinfo:eu-repo/semantics/publishedVersioninfo:eu-repo/semantics/articleapplication/pdfhttps://hdl.handle.net/10216/152468eng2076-341710.3390/app11073273Morgado, JPereira, TSilva, FFreitas, CNegrão, ELima, BFSilva, MCMadureira, AJRamos, IHespanhol, VCosta, JLCunha, AOliveira, HPinfo:eu-repo/semantics/openAccessreponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informaçãoinstacron:RCAAP2023-11-29T14:33:43Zoai:repositorio-aberto.up.pt:10216/152468Portal AgregadorONGhttps://www.rcaap.pt/oai/openaireopendoar:71602024-03-20T00:03:56.718438Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informaçãofalse
dc.title.none.fl_str_mv Machine learning and feature selection methods for egfr mutation status prediction in lung cancer
title Machine learning and feature selection methods for egfr mutation status prediction in lung cancer
spellingShingle Machine learning and feature selection methods for egfr mutation status prediction in lung cancer
Morgado, J
EGFR prediction
Feature selection
Lung cancer
Machine learning
Radiogenomics
title_short Machine learning and feature selection methods for egfr mutation status prediction in lung cancer
title_full Machine learning and feature selection methods for egfr mutation status prediction in lung cancer
title_fullStr Machine learning and feature selection methods for egfr mutation status prediction in lung cancer
title_full_unstemmed Machine learning and feature selection methods for egfr mutation status prediction in lung cancer
title_sort Machine learning and feature selection methods for egfr mutation status prediction in lung cancer
author Morgado, J
author_facet Morgado, J
Pereira, T
Silva, F
Freitas, C
Negrão, E
Lima, BF
Silva, MC
Madureira, AJ
Ramos, I
Hespanhol, V
Costa, JL
Cunha, A
Oliveira, HP
author_role author
author2 Pereira, T
Silva, F
Freitas, C
Negrão, E
Lima, BF
Silva, MC
Madureira, AJ
Ramos, I
Hespanhol, V
Costa, JL
Cunha, A
Oliveira, HP
author2_role author
author
author
author
author
author
author
author
author
author
author
author
dc.contributor.author.fl_str_mv Morgado, J
Pereira, T
Silva, F
Freitas, C
Negrão, E
Lima, BF
Silva, MC
Madureira, AJ
Ramos, I
Hespanhol, V
Costa, JL
Cunha, A
Oliveira, HP
dc.subject.por.fl_str_mv EGFR prediction
Feature selection
Lung cancer
Machine learning
Radiogenomics
topic EGFR prediction
Feature selection
Lung cancer
Machine learning
Radiogenomics
description The evolution of personalized medicine has changed the therapeutic strategy from classical chemotherapy and radiotherapy to a genetic modification targeted therapy, and although biopsy is the traditional method to genetically characterize lung cancer tumor, it is an invasive and painful procedure for the patient. Nodule image features extracted from computed tomography (CT) scans have been used to create machine learning models that predict gene mutation status in a noninvasive, fast, and easy-to-use manner. However, recent studies have shown that radiomic features extracted from an extended region of interest (ROI) beyond the tumor, might be more relevant to predict the mutation status in lung cancer, and consequently may be used to significantly decrease the mortality rate of patients battling this condition. In this work, we investigated the relation between image phenotypes and the mutation status of Epidermal Growth Factor Receptor (EGFR), the most frequently mutated gene in lung cancer with several approved targeted-therapies, using radiomic features extracted from the lung containing the nodule. A variety of linear, nonlinear, and ensemble predictive classification models, along with several feature selection methods, were used to classify the binary outcome of wild-type or mutant EGFR mutation status. The results show that a comprehensive approach using a ROI that included the lung with nodule can capture relevant information and successfully predict the EGFR mutation status with increased performance compared to local nodule analyses. Linear Support Vector Machine, Elastic Net, and Logistic Regression, combined with the Principal Component Analysis feature selection method implemented with 70% of variance in the feature set, were the best-performing classifiers, reaching Area Under the Curve (AUC) values ranging from 0.725 to 0.737. This approach that exploits a holistic analysis indicates that information from more extensive regions of the lung containing the nodule allows a more complete lung cancer characterization and should be considered in future radiogenomic studies.
publishDate 2021
dc.date.none.fl_str_mv 2021
2021-01-01T00:00:00Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv https://hdl.handle.net/10216/152468
url https://hdl.handle.net/10216/152468
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv 2076-3417
10.3390/app11073273
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv MDPI
publisher.none.fl_str_mv MDPI
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_ 1799135963466694656