Universal Features for the Classification of Coding and Non-coding DNA Sequences
Main author: | Carels, Nicolas |
---|---|
Publication date: | 2009 |
Other authors: | Vidal, Ramon; Frias, Diego |
Document type: | Article |
Language: | eng |
Source title: | Repositório Institucional da FIOCRUZ (ARCA) |
Full text: | https://www.arca.fiocruz.br/handle/icict/30768 |
Abstract: | Fundação Oswaldo Cruz. Instituto Oswaldo Cruz. Laboratório de Genômica Funcional e Bioinformática. Rio de Janeiro, RJ, Brasil / Universidade Estadual de Santa Cruz. Núcleo de Biologia Computacional e Gestão de Informações Biotecnológicas. Ilhéus, BA, Brasil. |
id |
CRUZ_a4f6a31d01a517a2ee466568db55bb01 |
---|---|
oai_identifier_str |
oai:www.arca.fiocruz.br:icict/30768 |
network_acronym_str |
CRUZ |
network_name_str |
Repositório Institucional da FIOCRUZ (ARCA) |
repository_id_str |
2135 |
spelling |
Authors: Carels, Nicolas; Vidal, Ramon; Frias, Diego.
Citation: CARELS, Nicolas; VIDAL, Ramon; FRIAS, Diego. Universal Features for the Classification of Coding and Non-coding DNA Sequences. Bioinformatics and Biology Insights, v.3, p.37-49, 2009. ISSN: 1177-9322. Publisher: Libertas Academica. Language: eng. Handle: https://www.arca.fiocruz.br/handle/icict/30768
Keywords: genomics; exon prediction; purine bias; coding features; open reading frame; ancestral codon.
Affiliations (in author order): Fundação Oswaldo Cruz. Instituto Oswaldo Cruz. Laboratório de Genômica Funcional e Bioinformática. Rio de Janeiro, RJ, Brasil / Universidade Estadual de Santa Cruz. Núcleo de Biologia Computacional e Gestão de Informações Biotecnológicas. Ilhéus, BA, Brasil; Universidade Estadual de Santa Cruz. Núcleo de Biologia Computacional e Gestão de Informações Biotecnológicas. Ilhéus, BA, Brasil; Universidade Estadual de Santa Cruz. Núcleo de Biologia Computacional e Gestão de Informações Biotecnológicas. Ilhéus, BA, Brasil.
Abstract: In this report, we revisited simple features that allow the classification of coding sequences (CDS) and non-coding DNA. The spectrum of codon usage in our sequence sample is large, which suggests that these features are universal. The features that we investigated combine (i) the stop codon distribution, (ii) the product of purine probabilities in the three positions of nucleotide triplets, (iii) the product of cytosine, guanine, and adenine probabilities in the 1st, 2nd, and 3rd positions of triplets, respectively, and (iv) the product of G and C probabilities in the 1st and 2nd positions of triplets. These features are a natural consequence of the physico-chemical properties of proteins, and their combination is successful in classifying CDS and non-coding DNA (introns) with a success rate of 95% above 350 bp. The coding strand and coding frame are implicitly deduced when the sequences are classified as coding.
Access: info:eu-repo/semantics/openAccess. Version: info:eu-repo/semantics/publishedVersion. Type: info:eu-repo/semantics/article.
Repository: Repositório Institucional da FIOCRUZ (ARCA) - Fundação Oswaldo Cruz (FIOCRUZ). OAI endpoint: https://www.arca.fiocruz.br/oai/request. Contact: repositorio.arca@fiocruz.br. OpenDOAR ID: 2135 |
dc.title.pt_BR.fl_str_mv |
Universal Features for the Classification of Coding and Non-coding DNA Sequences |
title |
Universal Features for the Classification of Coding and Non-coding DNA Sequences |
spellingShingle |
Universal Features for the Classification of Coding and Non-coding DNA Sequences; Carels, Nicolas; genomics; exon prediction; purine bias; coding features; open reading frame; ancestral codon |
title_short |
Universal Features for the Classification of Coding and Non-coding DNA Sequences |
title_full |
Universal Features for the Classification of Coding and Non-coding DNA Sequences |
title_fullStr |
Universal Features for the Classification of Coding and Non-coding DNA Sequences |
title_full_unstemmed |
Universal Features for the Classification of Coding and Non-coding DNA Sequences |
title_sort |
Universal Features for the Classification of Coding and Non-coding DNA Sequences |
author |
Carels, Nicolas |
author_facet |
Carels, Nicolas; Vidal, Ramon; Frias, Diego |
author_role |
author |
author2 |
Vidal, Ramon; Frias, Diego |
author2_role |
author author |
dc.contributor.author.fl_str_mv |
Carels, Nicolas; Vidal, Ramon; Frias, Diego |
dc.subject.other.pt_BR.fl_str_mv |
genomics; exon prediction; purine bias; coding features; open reading frame; ancestral codon |
topic |
genomics; exon prediction; purine bias; coding features; open reading frame; ancestral codon |
dc.subject.en.pt_BR.fl_str_mv |
genomics; exon prediction; purine bias; coding features; open reading frame; ancestral codon |
description |
Fundação Oswaldo Cruz. Instituto Oswaldo Cruz. Laboratório de Genômica Funcional e Bioinformática. Rio de Janeiro, RJ, Brasil / Universidade Estadual de Santa Cruz. Núcleo de Biologia Computacional e Gestão de Informações Biotecnológicas. Ilhéus, BA, Brasil. |
publishDate |
2009 |
dc.date.issued.fl_str_mv |
2009 |
dc.date.accessioned.fl_str_mv |
2018-12-26T12:52:26Z |
dc.date.available.fl_str_mv |
2018-12-26T12:52:26Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.citation.fl_str_mv |
CARELS, Nicolas; VIDAL, Ramon; FRIAS, Diego. Universal Features for the Classification of Coding and Non-coding DNA Sequences. Bioinformatics and Biology Insights, v.3, p.37-49, 2009. |
dc.identifier.uri.fl_str_mv |
https://www.arca.fiocruz.br/handle/icict/30768 |
dc.identifier.issn.pt_BR.fl_str_mv |
1177-9322 |
identifier_str_mv |
CARELS, Nicolas; VIDAL, Ramon; FRIAS, Diego. Universal Features for the Classification of Coding and Non-coding DNA Sequences. Bioinformatics and Biology Insights, v.3, p.37-49, 2009. ISSN: 1177-9322 |
url |
https://www.arca.fiocruz.br/handle/icict/30768 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.publisher.none.fl_str_mv |
Libertas Academica |
publisher.none.fl_str_mv |
Libertas Academica |
dc.source.none.fl_str_mv |
reponame:Repositório Institucional da FIOCRUZ (ARCA); instname:Fundação Oswaldo Cruz (FIOCRUZ); instacron:FIOCRUZ |
instname_str |
Fundação Oswaldo Cruz (FIOCRUZ) |
instacron_str |
FIOCRUZ |
institution |
FIOCRUZ |
reponame_str |
Repositório Institucional da FIOCRUZ (ARCA) |
collection |
Repositório Institucional da FIOCRUZ (ARCA) |
bitstream.url.fl_str_mv |
https://www.arca.fiocruz.br/bitstream/icict/30768/1/license.txt https://www.arca.fiocruz.br/bitstream/icict/30768/2/nicolas_carels_etal_IOC_2009.pdf https://www.arca.fiocruz.br/bitstream/icict/30768/3/nicolas_carels_etal_IOC_2009.pdf.txt |
bitstream.checksum.fl_str_mv |
5a560609d32a3863062d77ff32785d58 1573365bd0a8a5d063f82e0e8587124f 5772f4731240d961d261c46e77097f08 |
bitstream.checksumAlgorithm.fl_str_mv |
MD5 MD5 MD5 |
repository.name.fl_str_mv |
Repositório Institucional da FIOCRUZ (ARCA) - Fundação Oswaldo Cruz (FIOCRUZ) |
repository.mail.fl_str_mv |
repositorio.arca@fiocruz.br |
_version_ |
1813009154647785472 |
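
The four features named in the article's abstract (stop codon distribution and three products of position-specific nucleotide probabilities) can be illustrated with a short Python sketch. This is not the authors' implementation: the record does not say how the features are estimated, combined, or thresholded, so the function name `triplet_features`, the dictionary keys, and the use of simple per-frame frequencies as probabilities are all assumptions made for illustration. Feature (iv) is read here as G in the 1st position and C in the 2nd, which is also an assumption.

```python
from collections import Counter

# Standard stop codons on the DNA coding strand.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def triplet_features(seq, frame=0):
    """Compute four raw features for one reading frame (hypothetical helper).

    Probabilities are estimated as plain frequencies of a nucleotide at a
    given triplet position; this estimator is an assumption, not taken
    from the article.
    """
    seq = seq.upper()
    # Split the sequence into complete triplets starting at the given frame.
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    n = len(codons)
    if n == 0:
        raise ValueError("sequence too short for this frame")

    # One nucleotide counter per triplet position (1st, 2nd, 3rd).
    counts = [Counter(codon[pos] for codon in codons) for pos in range(3)]

    def prob(pos, bases):
        # Frequency of any of the given bases at a triplet position.
        return sum(counts[pos][b] for b in bases) / n

    return {
        # (i) number of stop codons in this frame
        "stop_codons": sum(codon in STOP_CODONS for codon in codons),
        # (ii) product of purine (A or G) probabilities over the 3 positions
        "purine_product": prob(0, "AG") * prob(1, "AG") * prob(2, "AG"),
        # (iii) P(C in 1st) * P(G in 2nd) * P(A in 3rd)
        "c1_g2_a3_product": prob(0, "C") * prob(1, "G") * prob(2, "A"),
        # (iv) P(G in 1st) * P(C in 2nd)
        "g1_c2_product": prob(0, "G") * prob(1, "C"),
    }


if __name__ == "__main__":
    example = "ATGGCAGAATAA"  # toy sequence, not from the article's data set
    for frame in range(3):
        print(frame, triplet_features(example, frame))
```

Calling the function on each of the three frames of a strand and of its reverse complement would give six feature vectors per sequence; how the article compares them to deduce the coding strand and frame is not stated in this record.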