Universal Features for the Classification of Coding and Non-coding DNA Sequences

Bibliographic details
Main author: Carels, Nicolas
Publication date: 2009
Other authors: Vidal, Ramon; Frias, Diego
Document type: Article
Language: eng
Source title: Repositório Institucional da FIOCRUZ (ARCA)
Full text: https://www.arca.fiocruz.br/handle/icict/30768
Affiliations: Fundação Oswaldo Cruz. Instituto Oswaldo Cruz. Laboratório de Genômica Funcional e Bioinformática. Rio de Janeiro, RJ, Brasil / Universidade Estadual de Santa Cruz. Núcleo de Biologia Computacional e Gestão de Informações Biotecnológicas. Ilhéus, BA, Brasil.
id CRUZ_a4f6a31d01a517a2ee466568db55bb01
oai_identifier_str oai:www.arca.fiocruz.br:icict/30768
network_acronym_str CRUZ
network_name_str Repositório Institucional da FIOCRUZ (ARCA)
repository_id_str 2135
abstract In this report, we revisited simple features that allow coding sequences (CDS) to be distinguished from non-coding DNA. The spectrum of codon usage of our sequence sample is large and suggests that these features are universal. The features that we investigated combine (i) the stop codon distribution, (ii) the product of purine probabilities in the three positions of nucleotide triplets, (iii) the product of cytosine, guanine, and adenine probabilities in the 1st, 2nd, and 3rd positions of triplets, respectively, and (iv) the product of G and C probabilities in the 1st and 2nd positions of triplets. These features are a natural consequence of the physico-chemical properties of proteins, and their combination is successful in classifying CDS and non-coding DNA (introns) with a success rate of 95% for sequences above 350 bp. The coding strand and coding frame are implicitly deduced when the sequences are classified as coding.
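The four features listed in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`coding_features`, `six_frame_features`, `reverse_complement`) are hypothetical, the stop codon "distribution" is simplified here to the fraction of stop triplets in a frame, feature (iv) is read as G in 1st position times C in 2nd position, and the standard genetic code is assumed. The record does not state how the features are combined into a classifier, so that step is left out.

```python
from collections import Counter

# Assumption: standard genetic code stop codons.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def coding_features(seq, frame=0):
    """Compute the four features for one reading frame (0, 1 or 2)."""
    seq = seq.upper()
    # Complete triplets only, starting at the frame offset.
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    if not codons:
        raise ValueError("sequence too short for this frame")
    n = len(codons)
    # counts[k][b] = number of triplets with base b at position k.
    counts = [Counter(c[k] for c in codons) for k in range(3)]
    # p[k][b] = observed probability of base b at triplet position k.
    p = [{b: counts[k][b] / n for b in "ACGT"} for k in range(3)]
    purine = [p[k]["A"] + p[k]["G"] for k in range(3)]
    return {
        # (i) simplified: fraction of triplets that are stop codons
        "stop_fraction": sum(c in STOP_CODONS for c in codons) / n,
        # (ii) product of purine probabilities over the three positions
        "purine_product": purine[0] * purine[1] * purine[2],
        # (iii) C in 1st, G in 2nd, A in 3rd position
        "cga_product": p[0]["C"] * p[1]["G"] * p[2]["A"],
        # (iv) assumed reading: G in 1st, C in 2nd position
        "gc_product": p[0]["G"] * p[1]["C"],
    }


def six_frame_features(seq):
    """Features for the three frames of each strand, keyed by (strand, frame)."""
    strands = {"+": seq, "-": reverse_complement(seq)}
    return {(s, f): coding_features(d, f)
            for s, d in strands.items() for f in range(3)}
```

Calling `six_frame_features` on a sequence gives one feature set per strand and frame, which mirrors the abstract's remark that strand and frame follow from the classification: the frame whose features look most "coding" would be the candidate.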
dc.title.pt_BR.fl_str_mv Universal Features for the Classification of Coding and Non-coding DNA Sequences
title Universal Features for the Classification of Coding and Non-coding DNA Sequences
author Carels, Nicolas
author_facet Carels, Nicolas
Vidal, Ramon
Frias, Diego
author_role author
author2 Vidal, Ramon
Frias, Diego
author2_role author
author
dc.contributor.author.fl_str_mv Carels, Nicolas
Vidal, Ramon
Frias, Diego
dc.subject.other.pt_BR.fl_str_mv genomics
exon prediction
purine bias
coding features
open reading frame
ancestral codon
topic genomics
exon prediction
purine bias
coding features
open reading frame
ancestral codon
dc.subject.en.pt_BR.fl_str_mv genomics
exon prediction
purine bias
coding features
open reading frame
ancestral codon
description Fundação Oswaldo Cruz. Instituto Oswaldo Cruz. Laboratório de Genômica Funcional e Bioinformática. Rio de Janeiro, RJ, Brasil / Universidade Estadual de Santa Cruz. Núcleo de Biologia Computacional e Gestão de Informações Biotecnológicas. Ilhéus, BA, Brasil.
publishDate 2009
dc.date.issued.fl_str_mv 2009
dc.date.accessioned.fl_str_mv 2018-12-26T12:52:26Z
dc.date.available.fl_str_mv 2018-12-26T12:52:26Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.citation.fl_str_mv CARELS, Nicolas; VIDAL, Ramon; FRIAS, Diego. Universal Features for the Classification of Coding and Non-coding DNA Sequences. Bioinformatics and Biology Insights, v. 3, p. 37-49, 2009.
dc.identifier.uri.fl_str_mv https://www.arca.fiocruz.br/handle/icict/30768
dc.identifier.issn.pt_BR.fl_str_mv 1177-9322
identifier_str_mv CARELS, Nicolas; VIDAL, Ramon; FRIAS, Diego. Universal Features for the Classification of Coding and Non-coding DNA Sequences. Bioinformatics and Biology Insights, v. 3, p. 37-49, 2009.
1177-9322
url https://www.arca.fiocruz.br/handle/icict/30768
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.publisher.none.fl_str_mv Libertas Academica
publisher.none.fl_str_mv Libertas Academica
dc.source.none.fl_str_mv reponame:Repositório Institucional da FIOCRUZ (ARCA)
instname:Fundação Oswaldo Cruz (FIOCRUZ)
instacron:FIOCRUZ
instname_str Fundação Oswaldo Cruz (FIOCRUZ)
instacron_str FIOCRUZ
institution FIOCRUZ
reponame_str Repositório Institucional da FIOCRUZ (ARCA)
collection Repositório Institucional da FIOCRUZ (ARCA)
bitstream.url.fl_str_mv https://www.arca.fiocruz.br/bitstream/icict/30768/1/license.txt
https://www.arca.fiocruz.br/bitstream/icict/30768/2/nicolas_carels_etal_IOC_2009.pdf
https://www.arca.fiocruz.br/bitstream/icict/30768/3/nicolas_carels_etal_IOC_2009.pdf.txt
bitstream.checksum.fl_str_mv 5a560609d32a3863062d77ff32785d58
1573365bd0a8a5d063f82e0e8587124f
5772f4731240d961d261c46e77097f08
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
MD5
repository.name.fl_str_mv Repositório Institucional da FIOCRUZ (ARCA) - Fundação Oswaldo Cruz (FIOCRUZ)
repository.mail.fl_str_mv repositorio.arca@fiocruz.br
_version_ 1798324937009659904