A procedure to recruit members to enlarge protein family databases - the building of UECOG (UniRef-Enriched COG Database) as a model

Bibliographic details
Main author: Fernandes, Gabriel da Rocha
Publication date: 2008
Other authors: Barbosa, Daniela Vale Campos, Prosdocimi, Francisco, Neshich, Izabella Agostinho Pena, Santos, Lucas Santana, Coelho Júnior, Oto Soares, Silva, Adriano Barbosa, Melo, Henrique Velloso Ferreira, Mudado, Maurício de Alvarenga, Natale, Darren A., Campos, Alessandra C. Faria, Campos, Sérgio Vale Aguiar, Ortega, José Miguel
Document type: Article
Language: English
Source: Repositório Institucional da UCB
Full text: http://twingo.ucb.br:8080/jspui/handle/10869/653
https://repositorio.ucb.br:9443/jspui/handle/123456789/7652
Abstract: This paper describes a procedure to recruit members to enlarge protein family databases. The procedure makes use of the UniRef50 clusters produced by UniProt: current family entries recruit additional members from the UniRef50 clusters to which they belong. Only UniRef50 members that are not fragments and whose length falls within a restricted range relative to the original entry are recruited. The enriched dataset is then limited to genomes from selected clades. We used the COG database, a resource for genome annotation and for studies of phylogenetics and gene evolution, as a model. To validate the method, a UniRef-Enriched COG0151 (UECOG) was tested with distinct procedures that compare recruited members with their recruiters: PSI-BLAST, secondary structure overlap (SOV), Seed Linkage, COGnitor, shared domain content, and neighbor-joining single-linkage; the first four procedures agreed in their validations. Currently, the UniRef50-based recruitment procedure enriches the COG database for Archaea, Bacteria, and the bacterial subgroups Actinobacteria, Firmicutes, Proteobacteria, and other bacteria by 2.2-, 8.0-, 7.0-, 8.8-, 8.7-, and 4.2-fold, respectively, in terms of sequences, and it also considerably increases the number of species.
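The recruitment steps summarized in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Protein` record, the plain dictionaries standing in for UniRef50 cluster membership, the cluster identifiers, and the default `tolerance` of ±20% are all hypothetical, since the abstract does not state the exact length range used for UECOG.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Protein:
    """Hypothetical per-sequence record; field names are illustrative only."""
    accession: str
    length: int
    is_fragment: bool
    clade: str


def recruit_members(
    seeds: list[str],
    proteins: dict[str, Protein],
    cluster_of: dict[str, str],
    cluster_members: dict[str, list[str]],
    allowed_clades: set[str],
    tolerance: float = 0.2,
) -> set[str]:
    """Enlarge a family via the UniRef50 clusters of its current entries.

    `tolerance` is an ASSUMED relative length window (0.2 means +/-20% of the
    recruiting entry's length). Returns the enriched family (seeds plus
    recruits), limited to sequences from the allowed clades.
    """
    seed_set = set(seeds)
    recruited: set[str] = set()
    for seed in seeds:
        cluster = cluster_of.get(seed)
        if cluster is None:
            continue  # this entry belongs to no known cluster, so it recruits nothing
        seed_len = proteins[seed].length
        low, high = seed_len * (1 - tolerance), seed_len * (1 + tolerance)
        for acc in cluster_members[cluster]:
            if acc in seed_set:
                continue  # already a family member
            cand = proteins[acc]
            if cand.is_fragment:
                continue  # fragments are never recruited
            if not (low <= cand.length <= high):
                continue  # length outside the window relative to the recruiter
            recruited.add(acc)
    # Final step: keep only sequences from genomes in the selected clades.
    enriched = seed_set | recruited
    return {acc for acc in enriched if proteins[acc].clade in allowed_clades}
```

With one seed of length 300 in a hypothetical cluster that also contains a 310-residue Firmicutes protein, a 150-residue protein, a fragment, and an out-of-clade sequence, only the seed and the 310-residue protein remain when `allowed_clades={"Firmicutes"}`. Applying the length window relative to each recruiter, rather than to a cluster-wide average, is one reading of "relative to the original entry".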
Citation: FERNANDES, Gabriel da Rocha et al. A procedure to recruit members to enlarge protein family databases - the building of UECOG (UniRef-Enriched COG Database) as a model. Genetics and Molecular Research, v. 7, p. 925-932, 2008.
Keywords: COG; Secondary database; UniRef; UniProt; UECOG
Available in the repository since: 2016-10-10