Analysis of the microarray gene expression for breast cancer progression after the application modified logistic regression

Bibliographic details
Main author: Morais-Rodrigues, Francielly
Publication date: 2019
Other authors: Silvério-Machado, Rita; Kato, Rodrigo Bentes; Rodrigues, Diego Lucas Neres; Valdez-Baez, Juan; Fonseca, Vagner; San, Emmanuel James; Gomes, Lucas Gabriel Rodrigues; Santos, Roselane Gonçalves dos; Viana, Marcus Vinicius Canário; Dutra, Joyce da Cruz Ferraz; Parise, Mariana Teixeira Dornelles; Parise, Doglas; Campos, Frederico F.; Souza, Sandro José de; Ortega, José Miguel; Barh, Debmalya; Ghosh, Preetam; Azevedo, Vasco A. C.; Santos, Marcos A. dos
Document type: Article
Language: eng
Source title: Repositório Institucional da UFRN
Full text: https://repositorio.ufrn.br/jspui/handle/123456789/28210
https://doi.org/10.1016/j.gene.2019.144168
Abstract: Methods based on statistics and linear algebra are increasingly used to address emerging questions in the microarray literature. Microarray technology is a long-established tool for the global analysis of gene expression, allowing the simultaneous investigation of hundreds or thousands of genes in a sample. Microarray data are characterized by a small sample size and a large number of features, which produces a non-square, rank-deficient data matrix that can admit countless solutions in classifiers. To avoid the ‘curse of dimensionality’, many authors perform feature selection or otherwise reduce the size of the data matrix. In this work, we introduce a new logistic regression-based model that classifies breast cancer tumor samples from microarray expression data, using all gene expression features and without reducing the microarray data matrix. If the user still deems feature reduction necessary, it can be performed after the methodology is applied while still maintaining good classification. The methodology allowed the correct classification of breast cancer samples from the Gene Expression Omnibus (GEO) data series GSE65194, GSE20711, and GSE25055. Classification reached a minimum performance of 80% (sensitivity and specificity) and explored all possible data combinations, including breast cancer subtypes. The methodology highlighted genes not yet studied in breast cancer, some of which have been observed in Gene Regulatory Networks (GRNs). We examine the patterns and features of a GRN composed of transcription factors (TFs) in MCF-7 breast cancer cell lines, which provides valuable information regarding breast cancer. In particular, some genes have associated αi* parameters with extreme positive or negative values and, as such, can be identified as breast cancer prediction genes. We indicate that the PKN2, MKL1, MED23, CUL5 and GLI genes demonstrate a tumor suppressor profile, and that the MTR, ITGA2B, TELO2, MRPL9, MTTL1, WIPI1, KLHL20, PI4KB, FOLR1 and SHC1 genes demonstrate an oncogenic profile. We propose that these may serve as potential breast cancer prediction genes and should be prioritized for further clinical studies on breast cancer. The new model assigns values to the αi* parameters associated with gene expression. Some αi* parameters are associated with genes previously described as breast cancer biomarkers, and others with genes not yet studied in relation to this disease.
id UFRN_7e5b700c17f0829d4dee8e03c8a58414
oai_identifier_str oai:https://repositorio.ufrn.br:123456789/28210
network_acronym_str UFRN
network_name_str Repositório Institucional da UFRN
repository_id_str
dc.title.pt_BR.fl_str_mv Analysis of the microarray gene expression for breast cancer progression after the application modified logistic regression
dc.contributor.author.fl_str_mv Morais-Rodrigues, Francielly
Silvério-Machado, Rita
Kato, Rodrigo Bentes
Rodrigues, Diego Lucas Neres
Valdez-Baez, Juan
Fonseca, Vagner
San, Emmanuel James
Gomes, Lucas Gabriel Rodrigues
Santos, Roselane Gonçalves dos
Viana, Marcus Vinicius Canário
Dutra, Joyce da Cruz Ferraz
Parise, Mariana Teixeira Dornelles
Parise, Doglas
Campos, Frederico F.
Souza, Sandro José de
Ortega, José Miguel
Barh, Debmalya
Ghosh, Preetam
Azevedo, Vasco A. C.
Santos, Marcos A. dos
dc.subject.por.fl_str_mv Tumor classification
Samples
New logistic regression-based model
GRN
TFs
MCF-7
Oncogenic
publishDate 2019
dc.date.accessioned.fl_str_mv 2019-12-18T17:09:08Z
dc.date.available.fl_str_mv 2019-12-18T17:09:08Z
dc.date.issued.fl_str_mv 2019-11-21
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.citation.fl_str_mv MORAIS-RODRIGUES, F.; SILVÉRIO-MACHADO, R.; KATO, R. B.; RODRIGUES, D. L. N.; VALDEZ-BAEZ, J.; FONSECA, V.; SAN, E. J.; GOMES, L. G. R.; SANTOS, R. G.; VIANA, M. V. C.; DUTRA, J. C. F.; PARISE, M. T. D.; PARISE, D.; CAMPOS, F. F.; SOUZA, S. J.; ORTEGA, J. M.; BARH, D.; GHOSH, P.; AZEVEDO, V. A. C.; SANTOS, M. A. Analysis of the microarray gene expression for breast cancer progression after the application modified logistic regression. Gene, [s. l.], p. 144168, Nov. 2019. DOI: https://doi.org/10.1016/j.gene.2019.144168. Available at: https://www.sciencedirect.com/science/article/pii/S0378111919308273#!. Accessed: 18 Dec. 2019.
dc.identifier.uri.fl_str_mv https://repositorio.ufrn.br/jspui/handle/123456789/28210
dc.identifier.doi.none.fl_str_mv https://doi.org/10.1016/j.gene.2019.144168
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.source.none.fl_str_mv reponame:Repositório Institucional da UFRN
instname:Universidade Federal do Rio Grande do Norte (UFRN)
instacron:UFRN
instname_str Universidade Federal do Rio Grande do Norte (UFRN)
instacron_str UFRN
institution UFRN
reponame_str Repositório Institucional da UFRN
collection Repositório Institucional da UFRN
bitstream.url.fl_str_mv https://repositorio.ufrn.br/bitstream/123456789/28210/1/SandroSouza_ICe_2019_Analysis%20of%20the%20microarray%20gene.pdf
https://repositorio.ufrn.br/bitstream/123456789/28210/2/license.txt
https://repositorio.ufrn.br/bitstream/123456789/28210/3/SandroSouza_ICe_2019_Analysis%20of%20the%20microarray%20gene.pdf.txt
https://repositorio.ufrn.br/bitstream/123456789/28210/4/SandroSouza_ICe_2019_Analysis%20of%20the%20microarray%20gene.pdf.jpg
bitstream.checksum.fl_str_mv 0f2511427e0c0966cd56d19873e89ea4
e9597aa2854d128fd968be5edc8a28d9
ee6118c5da8c9b2b32228049f156227a
b8832c90d3d20cccdca725f5fac55e07
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
MD5
MD5
repository.name.fl_str_mv Repositório Institucional da UFRN - Universidade Federal do Rio Grande do Norte (UFRN)
repository.mail.fl_str_mv
_version_ 1814832823863869440