A procedure to recruit members to enlarge protein family databases - the building of UECOG (UniRef-Enriched COG Database) as a model
Main author: | Fernandes, Gabriel da Rocha |
---|---|
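Publication date: | 2008 |
Other authors: | Barbosa, Daniela Vale Campos; Prosdocimi, Francisco; Neshich, Izabella Agostinho Pena; Santos, Lucas Santana; Coelho Júnior, Oto Soares; Silva, Adriano Barbosa; Melo, Henrique Velloso Ferreira; Mudado, Maurício de Alvarenga; Natale, Darren A.; Campos, Alessandra C. Faria; Campos, Sérgio Vale Aguiar; Ortega, José Miguel |
Document type: | Article |
Language: | eng |
Source title: | Repositório Institucional da UCB |
Full text: | http://twingo.ucb.br:8080/jspui/handle/10869/653 https://repositorio.ucb.br:9443/jspui/handle/123456789/7652 |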
Abstract: | A procedure to recruit members to enlarge protein family databases is described here. The procedure makes use of UniRef50 clusters produced by UniProt. Current family entries are used to recruit additional members based on the UniRef50 clusters to which they belong. Only those additional UniRef50 members that are not fragments and whose length is within a restricted range relative to the original entry are recruited. The enriched dataset is then limited to contain only genomes from selected clades. We used the COG database, which is used for genome annotation and for studies of phylogenetics and gene evolution, as a model. To validate the method, a UniRef-Enriched COG0151 (UECOG) was tested with six distinct procedures that compare recruited members with their recruiters: PSI-BLAST, secondary structure overlap (SOV), Seed Linkage, COGnitor, shared domain content, and neighbor-joining single-linkage. We observed that the first four agree in their validations. Presently, the UniRef50-based recruitment procedure enriches the COG database for Archaea, Bacteria and its subgroups Actinobacteria, Firmicutes, Proteobacteria, and other bacteria by 2.2-, 8.0-, 7.0-, 8.8-, 8.7-, and 4.2-fold, respectively, in terms of sequences, and also considerably increases the number of species. |
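The recruitment rule summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Protein` record, the `recruit` function, and the 0.8 to 1.2 length window are hypothetical, since the abstract states only that the length must fall within "a restricted range" and gives no figures. The sketch also assumes that a candidate is accepted if its length fits the window of any recruiter in the same UniRef50 cluster.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Protein:
    accession: str
    cluster: str        # UniRef50 cluster identifier
    length: int         # sequence length in residues
    is_fragment: bool
    clade: str


def recruit(family, candidates, clades, low=0.8, high=1.2):
    """Return the candidates recruited by the current family entries.

    `low` and `high` are assumed bounds on candidate length relative
    to a recruiter's length; the abstract does not give the real range.
    """
    # Index recruiter lengths by the UniRef50 cluster they belong to.
    recruiter_lengths = defaultdict(list)
    for p in family:
        recruiter_lengths[p.cluster].append(p.length)
    known = {p.accession for p in family}

    recruited = []
    for c in candidates:
        # Skip existing family members and fragments.
        if c.accession in known or c.is_fragment:
            continue
        # Keep only the selected clades.
        if c.clade not in clades:
            continue
        # Accept if the length fits the window of any recruiter
        # that shares the candidate's UniRef50 cluster.
        lengths = recruiter_lengths.get(c.cluster, [])
        if any(low * n <= c.length <= high * n for n in lengths):
            recruited.append(c)
    return recruited
```

For example, a 400-residue recruiter in cluster `U1` would recruit a 410-residue, non-fragment bacterial member of `U1`, but not a fragment, a 200-residue member, a member of another cluster, or a sequence from a clade outside the selected set.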
id |
UCB-2_fe0e62fd31fc233144735100e79ee873 |
---|---|
oai_identifier_str |
oai:200.214.135.189:123456789/7652 |
network_acronym_str |
UCB-2 |
network_name_str |
Repositório Institucional da UCB |
dc.title.pt_BR.fl_str_mv |
A procedure to recruit members to enlarge protein family databases - the building of UECOG (UniRef-Enriched COG Database) as a model |
title |
A procedure to recruit members to enlarge protein family databases - the building of UECOG (UniRef-Enriched COG Database) as a model |
author |
Fernandes, Gabriel da Rocha |
author_facet |
Fernandes, Gabriel da Rocha; Barbosa, Daniela Vale Campos; Prosdocimi, Francisco; Neshich, Izabella Agostinho Pena; Santos, Lucas Santana; Coelho Júnior, Oto Soares; Silva, Adriano Barbosa; Melo, Henrique Velloso Ferreira; Mudado, Maurício de Alvarenga; Natale, Darren A.; Campos, Alessandra C. Faria; Campos, Sérgio Vale Aguiar; Ortega, José Miguel |
author_role |
author |
author2 |
Barbosa, Daniela Vale Campos; Prosdocimi, Francisco; Neshich, Izabella Agostinho Pena; Santos, Lucas Santana; Coelho Júnior, Oto Soares; Silva, Adriano Barbosa; Melo, Henrique Velloso Ferreira; Mudado, Maurício de Alvarenga; Natale, Darren A.; Campos, Alessandra C. Faria; Campos, Sérgio Vale Aguiar; Ortega, José Miguel |
author2_role |
author author author author author author author author author author author author |
dc.contributor.author.fl_str_mv |
Fernandes, Gabriel da Rocha; Barbosa, Daniela Vale Campos; Prosdocimi, Francisco; Neshich, Izabella Agostinho Pena; Santos, Lucas Santana; Coelho Júnior, Oto Soares; Silva, Adriano Barbosa; Melo, Henrique Velloso Ferreira; Mudado, Maurício de Alvarenga; Natale, Darren A.; Campos, Alessandra C. Faria; Campos, Sérgio Vale Aguiar; Ortega, José Miguel |
dc.subject.por.fl_str_mv |
COG; Secondary database; UniRef; UniProt; UECOG |
topic |
COG; Secondary database; UniRef; UniProt; UECOG |
dc.description.abstract.por.fl_txt_mv |
A procedure to recruit members to enlarge protein family databases is described here. The procedure makes use of UniRef50 clusters produced by UniProt. Current family entries are used to recruit additional members based on the UniRef50 clusters to which they belong. Only those additional UniRef50 members that are not fragments and whose length is within a restricted range relative to the original entry are recruited. The enriched dataset is then limited to contain only genomes from selected clades. We used the COG database, which is used for genome annotation and for studies of phylogenetics and gene evolution, as a model. To validate the method, a UniRef-Enriched COG0151 (UECOG) was tested with six distinct procedures that compare recruited members with their recruiters: PSI-BLAST, secondary structure overlap (SOV), Seed Linkage, COGnitor, shared domain content, and neighbor-joining single-linkage. We observed that the first four agree in their validations. Presently, the UniRef50-based recruitment procedure enriches the COG database for Archaea, Bacteria and its subgroups Actinobacteria, Firmicutes, Proteobacteria, and other bacteria by 2.2-, 8.0-, 7.0-, 8.8-, 8.7-, and 4.2-fold, respectively, in terms of sequences, and also considerably increases the number of species. |
dc.description.status.pt_BR.fl_txt_mv |
Published |
publishDate |
2008 |
dc.date.issued.fl_str_mv |
2008 |
dc.date.accessioned.fl_str_mv |
2016-10-10T03:52:14Z |
dc.date.available.fl_str_mv |
2016-10-10T03:52:14Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
status_str |
publishedVersion |
format |
article |
dc.identifier.citation.fl_str_mv |
FERNANDES, Gabriel da Rocha et al. A procedure to recruit members to enlarge protein family databases - the building of UECOG (UniRef-Enriched COG Database) as a model. Genetics and Molecular Research, v. 7, p. 925-932, 2008. |
dc.identifier.uri.fl_str_mv |
http://twingo.ucb.br:8080/jspui/handle/10869/653 https://repositorio.ucb.br:9443/jspui/handle/123456789/7652 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
Text |
dc.source.none.fl_str_mv |
reponame:Repositório Institucional da UCB instname:Universidade Católica de Brasília (UCB) instacron:UCB |
instname_str |
Universidade Católica de Brasília (UCB) |
instacron_str |
UCB |
institution |
UCB |
reponame_str |
Repositório Institucional da UCB |
collection |
Repositório Institucional da UCB |
bitstream.url.fl_str_mv |
https://200.214.135.178:9443/jspui/bitstream/123456789/7652/1/A%20procedure%20to%20recruit%20members%20to%20enlarge%20protein%20family%20databases_the%20building%20of%20UECOG.pdf https://200.214.135.178:9443/jspui/bitstream/123456789/7652/2/license_url https://200.214.135.178:9443/jspui/bitstream/123456789/7652/3/license_text https://200.214.135.178:9443/jspui/bitstream/123456789/7652/4/license_rdf https://200.214.135.178:9443/jspui/bitstream/123456789/7652/5/license.txt https://200.214.135.178:9443/jspui/bitstream/123456789/7652/6/A%20procedure%20to%20recruit%20members%20to%20enlarge%20protein%20family%20databases_the%20building%20of%20UECOG.pdf.txt |
bitstream.checksum.fl_str_mv |
093ef6ed5d0cf349fd5d795606ed20ca 3d480ae6c91e310daba2020f8787d6f9 294cb7010cc40c47642971e073de3dba afd5dad10b1d1e6dc10c8c5d25222c7a 445d1980f282ec865917de35a4c622f6 709875d06fd96b35d5005749bdb24155 |
bitstream.checksumAlgorithm.fl_str_mv |
MD5 MD5 MD5 MD5 MD5 MD5 |
_version_ |
1724829829979701248 |