Evaluation of breast cancer molecular subtypes using artificial intelligence in micro-FTIR hyperspectral images
Main author: | Valle, Matheus Del |
---|---|
Publication date: | 2023 |
Document type: | Thesis |
Language: | eng |
Source title: | Biblioteca Digital de Teses e Dissertações da USP |
Full text: | https://www.teses.usp.br/teses/disponiveis/85/85134/tde-10072023-162427/ |
Abstract: | Breast cancer is the cancer with the highest incidence worldwide. The evaluation of molecular subtypes and their biomarkers plays an essential role in prognosis. The biomarkers used are Estrogen Receptor (ER), Progesterone Receptor (PR), Human Epidermal growth factor Receptor-type 2 (HER2), and Ki67. Based on these, tumors are classified as Luminal A (LA), Luminal B (LB), HER2 subtype, or Triple-Negative Breast Cancer (TNBC). The gold standard for this analysis is histology combined with immunohistochemistry, semi-quantitative techniques that are subject to inter-laboratory and inter-observer variation. Fourier Transform Infrared micro-spectroscopy (micro-FTIR), which provides hyperspectral images carrying biochemical information about biological tissues, is applied together with artificial intelligence (AI) for cancer evaluation. In this thesis, twenty samples of two breast cancer cell lines, BT-474 and SK-BR-3, were used to define the optimal number of co-added scans for machine learning (ML) techniques. The models used were Linear Discriminant Analysis (LDA), Partial Least Squares Discriminant Analysis (PLS-DA), K-Nearest Neighbors (KNN), Support Vector Machines (SVM), Random Forest (RF), and Extreme Gradient Boosting (XGB). Sixty hyperspectral images of 320x320 pixels were collected from thirty patients on a human breast biopsy microarray, with a breast cancer (CA) core and an adjacent tissue (AT) core for each patient. Automated methods based on K-Means clustering were developed to organize the data and pre-process it into one-dimensional (1D) and two-dimensional (2D) form. The dataset was used to train two new deep learning models for breast cancer subtype evaluation: CaReNet-V1, a 1D Convolutional Neural Network (CNN); and CaReNet-V2, a 2D CNN. All ML models achieved similar performance with the b256_064 (256 background scans and 64 sample scans), b256_128, and b128_128 groups; the best accuracy, 0.995, was obtained by the XGB model. The b256_064 group was chosen as the ideal of the three because it has the shortest acquisition time. The K-Means-based method enabled fully automated preprocessing and organization, improving data quality and optimizing CNN training. CaReNet-V1 effectively classified CA and AT (individual spectra test accuracy of 0.89), as well as the HER2 and TNBC subtypes (0.83 and 0.86), with greater difficulty for LA and LB (0.74 and 0.68). The model made it possible to identify the wavenumbers that contributed most to the predictions, providing a direct link to the biochemical content of the samples. CaReNet-V2 outperformed the 1D model, with test accuracies above 0.84, and enabled the prediction of ER, PR, and HER2 levels, although borderline values showed lower performance (minimum accuracy of 0.54). The Ki67 percentage regression yielded a mean absolute error of 3.6%. However, its per-wavenumber impact evaluation was inferior to that of the 1D model. Thus, this study indicates that image-based AI techniques using micro-FTIR could provide additional information for pathology reports and could also serve as screening techniques for patient biopsies. |
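
The abstract names its acquisition groups with labels such as b256_064 (256 background scans and 64 sample scans) and reports the Ki67 regression as a mean absolute error. The following is a minimal Python sketch of both conventions, not code from the thesis: the function names `parse_scan_group` and `mean_absolute_error` and the Ki67 values are hypothetical and chosen only for illustration.

```python
import re


def parse_scan_group(label):
    """Split a group label such as 'b256_064' into (background_scans, sample_scans).

    The 'b<background>_<sample>' pattern is inferred from the abstract's
    explanation of b256_064; the function name is hypothetical.
    """
    match = re.fullmatch(r"b(\d+)_(\d+)", label)
    if match is None:
        raise ValueError(f"unrecognized group label: {label!r}")
    return int(match.group(1)), int(match.group(2))


def mean_absolute_error(y_true, y_pred):
    """Average of |true - predicted| over paired values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# The three groups named in the abstract.
for group in ["b256_064", "b256_128", "b128_128"]:
    background, sample = parse_scan_group(group)
    print(group, "background scans:", background, "sample scans:", sample)

# Made-up Ki67 percentages, used only to show the metric; not thesis data.
ki67_reference = [10.0, 20.0, 30.0]
ki67_predicted = [12.0, 17.0, 30.0]
print("MAE in percentage points:", mean_absolute_error(ki67_reference, ki67_predicted))
```

Under this reading, an error of 3.6% would mean that predicted Ki67 percentages differ from the reference values by 3.6 percentage points on average.
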
id |
USP_14935d4d179b795209ebefc026b4be09 |
---|---|
oai_identifier_str |
oai:teses.usp.br:tde-10072023-162427 |
network_acronym_str |
USP |
network_name_str |
Biblioteca Digital de Teses e Dissertações da USP |
repository_id_str |
2721 |
dc.title.none.fl_str_mv |
Evaluation of breast cancer molecular subtypes using artificial intelligence in micro-FTIR hyperspectral images Avaliação de subtipos moleculares de câncer de mama utilizando inteligência artificial em imagens hiperespectrais por micro-FTI |
title |
Evaluation of breast cancer molecular subtypes using artificial intelligence in micro-FTIR hyperspectral images |
author |
Valle, Matheus Del |
author_facet |
Valle, Matheus Del |
author_role |
author |
dc.contributor.none.fl_str_mv |
Bernardes, Emerson Soares; Zezell, Denise Maria |
dc.contributor.author.fl_str_mv |
Valle, Matheus Del |
dc.subject.por.fl_str_mv |
biomarker level; breast cancer subtype; co-added scans; convolutional neural network; machine learning; micro-FTIR imaging |
topic |
biomarker level; breast cancer subtype; co-added scans; convolutional neural network; machine learning; micro-FTIR imaging |
publishDate |
2023 |
dc.date.none.fl_str_mv |
2023-04-04 |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/doctoralThesis |
format |
doctoralThesis |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
https://www.teses.usp.br/teses/disponiveis/85/85134/tde-10072023-162427/ |
url |
https://www.teses.usp.br/teses/disponiveis/85/85134/tde-10072023-162427/ |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
|
dc.rights.driver.fl_str_mv |
Release the content for public access. info:eu-repo/semantics/openAccess |
rights_invalid_str_mv |
Release the content for public access. |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.coverage.none.fl_str_mv |
|
dc.publisher.none.fl_str_mv |
Biblioteca Digital de Teses e Dissertações da USP |
publisher.none.fl_str_mv |
Biblioteca Digital de Teses e Dissertações da USP |
dc.source.none.fl_str_mv |
reponame:Biblioteca Digital de Teses e Dissertações da USP instname:Universidade de São Paulo (USP) instacron:USP |
instname_str |
Universidade de São Paulo (USP) |
instacron_str |
USP |
institution |
USP |
reponame_str |
Biblioteca Digital de Teses e Dissertações da USP |
collection |
Biblioteca Digital de Teses e Dissertações da USP |
repository.name.fl_str_mv |
Biblioteca Digital de Teses e Dissertações da USP - Universidade de São Paulo (USP) |
repository.mail.fl_str_mv |
virginia@if.usp.br; atendimento@aguia.usp.br |
_version_ |
1809090584886902784 |