Analysis of the microarray gene expression for breast cancer progression after the application modified logistic regression
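Main author: | Morais-Rodrigues, Francielly |
---|---|
Publication date: | 2019 |
Other authors: | Silvério-Machado, Rita; Kato, Rodrigo Bentes; Rodrigues, Diego Lucas Neres; Valdez-Baez, Juan; Fonseca, Vagner; San, Emmanuel James; Gomes, Lucas Gabriel Rodrigues; Santos, Roselane Gonçalves dos; Viana, Marcus Vinicius Canário; Dutra, Joyce da Cruz Ferraz; Parise, Mariana Teixeira Dornelles; Parise, Doglas; Campos, Frederico F.; Souza, Sandro José de; Ortega, José Miguel; Barh, Debmalya; Ghosh, Preetam; Azevedo, Vasco A. C.; Santos, Marcos A. dos |
Document type: | Article |
Language: | eng |
Source title: | Repositório Institucional da UFRN |
Full text: | https://repositorio.ufrn.br/jspui/handle/123456789/28210 https://doi.org/10.1016/j.gene.2019.144168 |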
Abstract: | Methods based on statistics and linear algebra are increasingly used to address emerging questions in the microarray literature. Microarray technology is a long-established tool for the global analysis of gene expression, allowing the simultaneous investigation of hundreds or thousands of genes in a sample. Microarray data are characterized by a small sample size and a large number of features, which yields a non-square, rank-deficient data matrix for which a classifier can admit countless solutions. To avoid this 'curse of dimensionality', many authors perform feature selection or otherwise reduce the size of the data matrix. In this work, we introduce a new logistic regression-based model that classifies breast cancer tumor samples from microarray expression data using all gene expression features, without reducing the data matrix. If the user still considers feature reduction necessary, it can be performed after the methodology has been applied while maintaining good classification. The methodology correctly classified breast cancer samples from the Gene Expression Omnibus (GEO) data series GSE65194, GSE20711, and GSE25055, which contain the microarray data for those samples. Sensitivity and specificity were at least 80% across all data combinations explored, including breast cancer subtypes. The methodology highlighted genes not yet studied in breast cancer, some of which have been observed in Gene Regulatory Networks (GRNs). We also examine the patterns and features of a GRN composed of transcription factors (TFs) in MCF-7 breast cancer cell lines, which provides valuable information about breast cancer. In particular, some genes have associated αi∗ parameters with extreme positive or negative values and can therefore be identified as breast cancer prediction genes. We indicate that the PKN2, MKL1, MED23, CUL5 and GLI genes show a tumor suppressor profile, and that the MTR, ITGA2B, TELO2, MRPL9, MTTL1, WIPI1, KLHL20, PI4KB, FOLR1 and SHC1 genes show an oncogenic profile. We propose that these may serve as potential breast cancer prediction genes and should be prioritized for further clinical studies on breast cancer. The new model assigns values to the αi∗ parameters associated with gene expression. Some αi∗ parameters are associated with genes previously described as breast cancer biomarkers, and others with genes not yet studied in relation to this disease. |
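
The abstract reports performance as sensitivity and specificity and singles out genes whose αi∗ parameters take extreme values. The sketch below illustrates both ideas in plain Python. It is not the authors' modified logistic regression: the function names, the toy labels, and the placeholder feature names (`g1` to `g4`) with their coefficient values are assumptions made purely for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels, 1 = positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return tp / (tp + fn), tn / (tn + fp)


def extreme_coefficients(
    coefs: Dict[str, float], k: int
) -> Tuple[List[str], List[str]]:
    """Return the k most negative and the k most positive feature names."""
    ranked = sorted(coefs, key=lambda name: coefs[name])  # ascending by value
    most_negative = ranked[:k]
    most_positive = ranked[-k:][::-1]  # largest value first
    return most_negative, most_positive


if __name__ == "__main__":
    # Toy labels and predictions (hypothetical, not taken from the GEO series).
    sens, spec = sensitivity_specificity([1, 1, 1, 0, 0], [1, 1, 0, 0, 1])
    print(f"sensitivity={sens:.2f} specificity={spec:.2f}")

    # Hypothetical per-feature parameters standing in for the alpha_i* values.
    toy_coefs = {"g1": -2.0, "g2": 0.1, "g3": 3.5, "g4": -0.4}
    print(extreme_coefficients(toy_coefs, 2))
```

Ranking by signed value rather than by magnitude keeps the two tails separate, which matches the abstract's distinction between extreme positive and extreme negative parameters. Which sign corresponds to which biological profile is not stated in this record, so the sketch makes no claim about it.
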
id |
UFRN_7e5b700c17f0829d4dee8e03c8a58414 |
---|---|
oai_identifier_str |
oai:https://repositorio.ufrn.br:123456789/28210 |
network_acronym_str |
UFRN |
network_name_str |
Repositório Institucional da UFRN |
repository_id_str |
|
dc.title.pt_BR.fl_str_mv |
Analysis of the microarray gene expression for breast cancer progression after the application modified logistic regression |
author |
Morais-Rodrigues, Francielly |
author_facet |
Morais-Rodrigues, Francielly; Silvério-Machado, Rita; Kato, Rodrigo Bentes; Rodrigues, Diego Lucas Neres; Valdez-Baez, Juan; Fonseca, Vagner; San, Emmanuel James; Gomes, Lucas Gabriel Rodrigues; Santos, Roselane Gonçalves dos; Viana, Marcus Vinicius Canário; Dutra, Joyce da Cruz Ferraz; Parise, Mariana Teixeira Dornelles; Parise, Doglas; Campos, Frederico F.; Souza, Sandro José de; Ortega, José Miguel; Barh, Debmalya; Ghosh, Preetam; Azevedo, Vasco A. C.; Santos, Marcos A. dos |
author_role |
author |
author2 |
Silvério-Machado, Rita; Kato, Rodrigo Bentes; Rodrigues, Diego Lucas Neres; Valdez-Baez, Juan; Fonseca, Vagner; San, Emmanuel James; Gomes, Lucas Gabriel Rodrigues; Santos, Roselane Gonçalves dos; Viana, Marcus Vinicius Canário; Dutra, Joyce da Cruz Ferraz; Parise, Mariana Teixeira Dornelles; Parise, Doglas; Campos, Frederico F.; Souza, Sandro José de; Ortega, José Miguel; Barh, Debmalya; Ghosh, Preetam; Azevedo, Vasco A. C.; Santos, Marcos A. dos |
author2_role |
author author author author author author author author author author author author author author author author author author author |
dc.contributor.author.fl_str_mv |
Morais-Rodrigues, Francielly; Silvério-Machado, Rita; Kato, Rodrigo Bentes; Rodrigues, Diego Lucas Neres; Valdez-Baez, Juan; Fonseca, Vagner; San, Emmanuel James; Gomes, Lucas Gabriel Rodrigues; Santos, Roselane Gonçalves dos; Viana, Marcus Vinicius Canário; Dutra, Joyce da Cruz Ferraz; Parise, Mariana Teixeira Dornelles; Parise, Doglas; Campos, Frederico F.; Souza, Sandro José de; Ortega, José Miguel; Barh, Debmalya; Ghosh, Preetam; Azevedo, Vasco A. C.; Santos, Marcos A. dos |
dc.subject.por.fl_str_mv |
Tumor classification; Samples; New logistic regression-based model; GRN; TFs; MCF-7; Oncogenic |
topic |
Tumor classification; Samples; New logistic regression-based model; GRN; TFs; MCF-7; Oncogenic |
publishDate |
2019 |
dc.date.accessioned.fl_str_mv |
2019-12-18T17:09:08Z |
dc.date.available.fl_str_mv |
2019-12-18T17:09:08Z |
dc.date.issued.fl_str_mv |
2019-11-21 |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.citation.fl_str_mv |
MORAIS-RODRIGUES, F.; SILVÉRIO-MACHADO, R.; KATO, R. B.; RODRIGUES, D. L. N.; VALDEZ-BAEZ, J.; FONSECA, V.; SAN, E. J.; GOMES, L. G. R.; SANTOS, R. G.; VIANA, M. V. C.; DUTRA, J. C. F.; PARISE, M. T. D.; PARISE, D.; CAMPOS, F. F.; SOUZA, S. J.; ORTEGA, J. M.; BARH, D.; GHOSH, P.; AZEVEDO, V. A. C.; SANTOS, M. A. Analysis of the microarray gene expression for breast cancer progression after the application modified logistic regression. Gene, [s. l.], p. 144168, Nov. 2019. DOI: https://doi.org/10.1016/j.gene.2019.144168. Available at: https://www.sciencedirect.com/science/article/pii/S0378111919308273#!. Accessed: 18 Dec. 2019. |
dc.identifier.uri.fl_str_mv |
https://repositorio.ufrn.br/jspui/handle/123456789/28210 |
dc.identifier.doi.none.fl_str_mv |
https://doi.org/10.1016/j.gene.2019.144168 |
url |
https://repositorio.ufrn.br/jspui/handle/123456789/28210 https://doi.org/10.1016/j.gene.2019.144168 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.source.none.fl_str_mv |
reponame:Repositório Institucional da UFRN instname:Universidade Federal do Rio Grande do Norte (UFRN) instacron:UFRN |
instname_str |
Universidade Federal do Rio Grande do Norte (UFRN) |
instacron_str |
UFRN |
institution |
UFRN |
reponame_str |
Repositório Institucional da UFRN |
collection |
Repositório Institucional da UFRN |
bitstream.url.fl_str_mv |
https://repositorio.ufrn.br/bitstream/123456789/28210/1/SandroSouza_ICe_2019_Analysis%20of%20the%20microarray%20gene.pdf https://repositorio.ufrn.br/bitstream/123456789/28210/2/license.txt https://repositorio.ufrn.br/bitstream/123456789/28210/3/SandroSouza_ICe_2019_Analysis%20of%20the%20microarray%20gene.pdf.txt https://repositorio.ufrn.br/bitstream/123456789/28210/4/SandroSouza_ICe_2019_Analysis%20of%20the%20microarray%20gene.pdf.jpg |
bitstream.checksum.fl_str_mv |
0f2511427e0c0966cd56d19873e89ea4 e9597aa2854d128fd968be5edc8a28d9 ee6118c5da8c9b2b32228049f156227a b8832c90d3d20cccdca725f5fac55e07 |
bitstream.checksumAlgorithm.fl_str_mv |
MD5 MD5 MD5 MD5 |
repository.name.fl_str_mv |
Repositório Institucional da UFRN - Universidade Federal do Rio Grande do Norte (UFRN) |
repository.mail.fl_str_mv |
|
_version_ |
1814832823863869440 |