Prediction of alpha helices in proteins using Modified Logistic Regression Model

Bibliographic details
Main author: Carmelina Figueiredo Vieira Leite
Publication date: 2016
Document type: Master's dissertation
Language: English
Source: Repositório Institucional da UFMG
Full text: http://hdl.handle.net/1843/33892
Abstract: Advances in protein secondary structure prediction have a direct impact on health and on our knowledge of biological processes. Despite these achievements, protein structure prediction remains a challenge. We therefore propose a de novo method for predicting alpha helices. First, we used PISCES to compile a list of proteins from the Protein Data Bank with low sequence identity to one another. Each protein was split into fragments of size 9 using the sliding-window technique, and the resulting fragments were classified into those that were 100% standard alpha helix and those that were not entirely of that secondary structure type. Each fragment was then characterized with a sliding window of size 3, and each resulting triplet was assigned a value associated with the occurrence of alpha helix structure. With these values, it was possible to predict the alpha helix secondary structure group of an unknown (query) protein. To do so, we used modified logistic regression and built two prediction methods. Accuracy and specificity tests applied to the methods gave results above 70%; sensitivity, however, was poor. One of the methods proved very promising for the secondary structure prediction problem and may also be useful for other purposes. All methods were implemented in MATLAB R2015b (2015).
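The pipeline outlined in the abstract (9-residue fragments, 3-residue triplets carrying a helix-associated value, and a logistic model judged by accuracy, sensitivity and specificity) can be sketched as below. This is a minimal illustration under stated assumptions, not the thesis's MATLAB code: the record does not say what the "modification" to logistic regression is or how triplet values are computed, so the sketch substitutes ordinary logistic regression fitted by gradient descent and a hypothetical triplet score (a smoothed share of each triplet's occurrences that fall inside all-helix fragments). It also assumes secondary structure is supplied as a DSSP-style string in which `H` marks a standard alpha helix. All function names are illustrative.

```python
import math
from collections import defaultdict

FRAG_LEN = 9     # fragment size stated in the abstract
TRIPLET_LEN = 3  # inner sliding-window size stated in the abstract


def windows(seq, size):
    """All contiguous windows of length `size` (stride 1)."""
    return [seq[i:i + size] for i in range(len(seq) - size + 1)]


def fragments(sequence, ss):
    """Pair each 9-residue fragment with label 1 if all 9 positions are 'H', else 0."""
    return [(frag, int(set(states) == {"H"}))
            for frag, states in zip(windows(sequence, FRAG_LEN), windows(ss, FRAG_LEN))]


def triplet_scores(frags, alpha=1.0):
    """Hypothetical triplet value: Laplace-smoothed share of occurrences in all-helix fragments."""
    helix, total = defaultdict(float), defaultdict(float)
    for frag, label in frags:
        for tri in windows(frag, TRIPLET_LEN):
            total[tri] += 1
            helix[tri] += label
    return {tri: (helix[tri] + alpha) / (total[tri] + 2 * alpha) for tri in total}


def features(frag, scores, default=0.5):
    """One feature per triplet position (7 for a 9-mer); unseen triplets get `default`."""
    return [scores.get(tri, default) for tri in windows(frag, TRIPLET_LEN)]


def sigmoid(z):
    z = max(-500.0, min(500.0, z))  # clamp to avoid math.exp overflow
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=500):
    """Ordinary logistic regression by batch gradient descent (not the thesis's modified model)."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            gw = [g + err * xj for g, xj in zip(gw, xi)]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict(x, w, b, threshold=0.5):
    """Return 1 (helix fragment) if the predicted probability reaches `threshold`."""
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= threshold)


def metrics(y_true, y_pred):
    """Accuracy, sensitivity (true-positive rate) and specificity (true-negative rate)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return accuracy, sensitivity, specificity
```

For example, `fragments("ACDEFGHIKL", "HHHHHHHHHC")` returns two 9-residue fragments labelled 1 and 0, because only the first is helical at every position. Using one feature per triplet position keeps the model small (seven inputs per 9-mer); in a real evaluation the triplet scores would be estimated on training proteins only, with the metrics computed on held-out ones.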
id UFMG_b0baf415b4222a57daa9711b33b7d469
oai_identifier_str oai:repositorio.ufmg.br:1843/33892
network_acronym_str UFMG
network_name_str Repositório Institucional da UFMG
Funding: CNPq - Conselho Nacional de Desenvolvimento Científico e Tecnológico; FAPEMIG - Fundação de Amparo à Pesquisa do Estado de Minas Gerais
Full-text file: PPGBioinformatica_CarmelinaFigueiredoVieiraLeite_DissertacaoMESTRADO.pdf (application/pdf, 2691070 bytes)
dc.title.pt_BR.fl_str_mv Prediction of alpha helices in proteins using Modified Logistic Regression Model
dc.contributor.advisor1.fl_str_mv Marcos Augusto dos Santos
dc.contributor.advisor1Lattes.fl_str_mv http://lattes.cnpq.br/7251716819215153
dc.contributor.authorLattes.fl_str_mv http://lattes.cnpq.br/9779299719144051
dc.contributor.author.fl_str_mv Carmelina Figueiredo Vieira Leite
dc.subject.por.fl_str_mv Logistic regression
Prediction
Protein
Structure
dc.subject.other.pt_BR.fl_str_mv Bioinformatics
Logistic models
Predictions
Proteins
dc.date.issued.fl_str_mv 2016-08-29
dc.date.accessioned.fl_str_mv 2020-08-01T20:55:59Z
dc.date.available.fl_str_mv 2020-08-01T20:55:59Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
dc.identifier.uri.fl_str_mv http://hdl.handle.net/1843/33892
dc.language.iso.fl_str_mv eng
dc.rights.driver.fl_str_mv http://creativecommons.org/licenses/by-nc-nd/3.0/pt/
info:eu-repo/semantics/openAccess
dc.publisher.none.fl_str_mv Universidade Federal de Minas Gerais
dc.publisher.program.fl_str_mv Programa de Pós-Graduação em Bioinformática
dc.publisher.initials.fl_str_mv UFMG
dc.publisher.country.fl_str_mv Brasil
dc.publisher.department.fl_str_mv ICB - INSTITUTO DE CIÊNCIAS BIOLÓGICAS
dc.source.none.fl_str_mv reponame:Repositório Institucional da UFMG
instname:Universidade Federal de Minas Gerais (UFMG)
instacron:UFMG
bitstream.url.fl_str_mv https://repositorio.ufmg.br/bitstream/1843/33892/1/PPGBioinformatica_CarmelinaFigueiredoVieiraLeite_DissertacaoMESTRADO.pdf
https://repositorio.ufmg.br/bitstream/1843/33892/2/license_rdf
https://repositorio.ufmg.br/bitstream/1843/33892/3/license.txt
bitstream.checksum.fl_str_mv 3a0874965274058015a7dc51059016d0
cfd6801dba008cb6adbd9838b81582ab
34badce4be7e31e3adb4575ae96af679
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
MD5
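The three MD5 values above can be used to check a downloaded copy of each bitstream. The following minimal sketch assumes the checksums are listed in the same order as the three bitstream URLs (the record does not pair them explicitly); `md5_of_file`, `verify` and `EXPECTED_MD5` are illustrative names, and only Python's standard `hashlib` module is used.

```python
import hashlib

# Checksums listed in this record, assumed to correspond in order to the
# three bitstream URLs (PDF, license_rdf, license.txt).
EXPECTED_MD5 = {
    "PPGBioinformatica_CarmelinaFigueiredoVieiraLeite_DissertacaoMESTRADO.pdf":
        "3a0874965274058015a7dc51059016d0",
    "license_rdf": "cfd6801dba008cb6adbd9838b81582ab",
    "license.txt": "34badce4be7e31e3adb4575ae96af679",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Hex MD5 digest of a local file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_md5):
    """True if the file's MD5 matches the expected hex checksum (case-insensitive)."""
    return md5_of_file(path) == expected_md5.lower()
```

After saving the PDF from the first bitstream URL under its original file name, calling `verify` with that path and the matching `EXPECTED_MD5` entry should return `True` if the download is intact, provided the assumed URL-to-checksum pairing holds. Reading in chunks keeps memory use flat regardless of file size.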
repository.name.fl_str_mv Repositório Institucional da UFMG - Universidade Federal de Minas Gerais (UFMG)