Evaluation of breast cancer molecular subtypes using artificial intelligence in micro-FTIR hyperspectral images

Bibliographic details
Main author: Valle, Matheus Del
Publication date: 2023
Document type: Thesis
Language: eng
Source title: Biblioteca Digital de Teses e Dissertações da USP
Full text: https://www.teses.usp.br/teses/disponiveis/85/85134/tde-10072023-162427/
Abstract: Breast cancer is the cancer with the highest incidence worldwide. Evaluation of molecular subtypes and their biomarkers plays an essential role in prognosis. The biomarkers used are Estrogen Receptor (ER), Progesterone Receptor (PR), Human Epidermal growth factor Receptor type 2 (HER2), and Ki67. Based on these, tumors are classified into the Luminal A (LA), Luminal B (LB), HER2, and Triple-Negative Breast Cancer (TNBC) subtypes. The gold standard for this analysis is histology and immunohistochemistry, semi-quantitative techniques subject to inter-laboratory and inter-observer variation. Fourier Transform Infrared micro-spectroscopy (micro-FTIR), which provides hyperspectral images carrying biochemical information about biological tissues, is applied together with artificial intelligence (AI) for cancer evaluation. In this thesis, twenty samples of two breast cancer cell lines, BT-474 and SK-BR-3, were used to define the optimal number of co-added scans for machine learning (ML) techniques, using Linear Discriminant Analysis (LDA), Partial Least Squares Discriminant Analysis (PLS-DA), K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Random Forest (RF), and Extreme Gradient Boosting (XGB) models. Sixty hyperspectral images of 320x320 pixels were collected from thirty patients on a human breast biopsy microarray, each patient contributing one breast cancer (CA) core and one adjacent tissue (AT) core. Automated methods based on K-Means clustering were developed to organize and preprocess the data into one-dimensional (1D) and two-dimensional (2D) form. The dataset was used to train two new deep learning models for breast cancer subtype evaluation: CaReNet-V1, a 1D Convolutional Neural Network (CNN), and CaReNet-V2, a 2D CNN. All ML models achieved similar performance with the b256_064 (256 background scans and 64 sample scans), b256_128, and b128_128 groups, with the best accuracy, 0.995, obtained by the XGB model. The b256_064 group was established as the best of the three because it has the shortest acquisition time. The K-Means-based method enabled fully automated preprocessing and organization, improving data quality and optimizing CNN training. CaReNet-V1 effectively classified CA and AT (individual-spectrum test accuracy of 0.89), as well as the HER2 and TNBC subtypes (0.83 and 0.86), with greater difficulty for LA and LB (0.74 and 0.68). The model enabled evaluation of the wavenumbers contributing most to the predictions, relating them directly to the biochemical content of the samples. CaReNet-V2 performed better than the 1D model, with test accuracies above 0.84, and enabled prediction of ER, PR, and HER2 levels, although borderline values showed lower performance (minimum accuracy of 0.54). Ki67 percentage regression achieved a mean absolute error of 3.6%. On the other hand, CaReNet-V2's evaluation of wavenumber impact was inferior to that of the 1D model. Thus, this study indicates that image-based AI techniques using micro-FTIR are potential providers of additional information for pathology reports and could also serve as biopsy screening techniques for patients.
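The abstract reports that K-Means clustering was used to automate the organization and preprocessing of the micro-FTIR images, but does not describe the procedure. As an illustration only, the following is a minimal Python/NumPy sketch of one common use of K-Means on hyperspectral data: grouping pixel spectra into clusters and keeping the cluster with the highest integrated absorbance as tissue. The function names `kmeans` and `tissue_mask`, the choice k = 2, the farthest-first initialization, and the highest-absorbance rule are hypothetical assumptions, not the method of the thesis.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's K-Means with farthest-first initialization (illustrative sketch).

    X: (n_samples, n_features) array, here one absorbance spectrum per row.
    Returns (labels, centroids).
    """
    rng = np.random.default_rng(seed)
    # Farthest-first init: a random first spectrum, then repeatedly the
    # spectrum farthest (squared Euclidean) from all centroids chosen so far.
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[np.argmax(d)])
    centroids = np.array(centroids, dtype=float)

    labels = None
    for _ in range(n_iter):
        # Assignment step: each spectrum goes to its nearest centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stable: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return labels, centroids


def tissue_mask(cube, k=2, seed=0):
    """Boolean (H, W) tissue mask for an (H, W, n_wavenumbers) cube.

    Hypothetical rule: the cluster whose centroid has the largest integrated
    absorbance is taken as tissue; all other clusters are background.
    """
    h, w, b = cube.shape
    spectra = cube.reshape(h * w, b)
    labels, centroids = kmeans(spectra, k, seed=seed)
    tissue_cluster = int(np.argmax(centroids.sum(axis=1)))
    return (labels == tissue_cluster).reshape(h, w)


if __name__ == "__main__":
    # Synthetic stand-in for a micro-FTIR image: near-zero background with a
    # 16x16 square of higher absorbance playing the role of tissue.
    rng = np.random.default_rng(1)
    cube = rng.normal(0.0, 0.01, size=(32, 32, 100))
    cube[8:24, 8:24, :] += 1.0
    mask = tissue_mask(cube)
    print(int(mask.sum()))  # 256
```

The synthetic cube only stands in for a real 320x320-pixel image; with real data, the number of clusters and the rule for selecting the tissue cluster would have to follow the thesis itself.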
id USP_14935d4d179b795209ebefc026b4be09
oai_identifier_str oai:teses.usp.br:tde-10072023-162427
network_acronym_str USP
network_name_str Biblioteca Digital de Teses e Dissertações da USP
repository_id_str 2721
dc.title.none.fl_str_mv Evaluation of breast cancer molecular subtypes using artificial intelligence in micro-FTIR hyperspectral images
Avaliação de subtipos moleculares de câncer de mama utilizando inteligência artificial em imagens hiperespectrais por micro-FTI
dc.contributor.none.fl_str_mv Bernardes, Emerson Soares
Zezell, Denise Maria
dc.contributor.author.fl_str_mv Valle, Matheus Del
dc.subject.por.fl_str_mv aprendizado máquina
biomarker level
breast cancer subtype
co-added scans
convolutional neural network
imagem micro-FTIR
machine learning
micro-FTIR imaging
nível biomarcador
rede neural convolucional
subtipo câncer mama
varreduras co-adicionadas
dc.date.none.fl_str_mv 2023-04-04
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/doctoralThesis
dc.identifier.uri.fl_str_mv https://www.teses.usp.br/teses/disponiveis/85/85134/tde-10072023-162427/
dc.language.iso.fl_str_mv eng
dc.rights.driver.fl_str_mv Content released for public access.
info:eu-repo/semantics/openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Biblioteca Digital de Teses e Dissertações da USP
dc.source.none.fl_str_mv
reponame:Biblioteca Digital de Teses e Dissertações da USP
instname:Universidade de São Paulo (USP)
instacron:USP
repository.name.fl_str_mv Biblioteca Digital de Teses e Dissertações da USP - Universidade de São Paulo (USP)
repository.mail.fl_str_mv virginia@if.usp.br || atendimento@aguia.usp.br