Challenges in integrating Escherichia coli molecular biology data

Bibliographic details
Main author: Lourenço, Anália
Publication date: 2011
Other authors: Carneiro, S., Ferreira, Eugénio C., Rocha, I., Rocha, Miguel
Document type: Article
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: https://hdl.handle.net/1822/14330
Abstract: One key challenge in Systems Biology is to provide mechanisms to collect and integrate the data needed to meet multiple analysis requirements. Typically, biological contents are scattered over multiple data sources and there is no easy way of comparing heterogeneous data contents. This work discusses ongoing standardisation and interoperability efforts and exposes integration challenges for the model organism Escherichia coli K-12. The goal is to analyse the major obstacles faced by integration processes, suggest ways to systematically identify them and, whenever possible, propose solutions or means to assist manual curation. Integration of gene, protein and compound data was evaluated by performing comparisons over EcoCyc, KEGG, BRENDA, ChEBI, Entrez Gene and UniProt contents. Cross-links, a number of standard nomenclatures and name information supported the comparisons. Except for the gene integration scenario, in no other scenario did a single element of integration perform well enough to support the process by itself. Indeed, the integration of both enzyme and compound records requires considerable curation. The results show that, even for a well-studied model organism, source contents are still far from being as standardized as would be desired, and metadata varies considerably from source to source. Before designing any data integration pipeline, researchers should decide on the sources that best fit the purpose of analysis and be aware of existing conflicts and inconsistencies so that they can intervene in their resolution. Moreover, they should be aware of the limits of automatic integration so that they can define the extent of necessary manual curation for each application.
id RCAP_c61d34927500147be038dcb832171044
oai_identifier_str oai:repositorium.sdum.uminho.pt:1822/14330
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
funding SYSINBIO, a European Coordination and Support Action (call FP7-KBBE-2007-1) in the field of model-driven metabolic engineering; Portuguese FCT (Fundação para a Ciência e Tecnologia) funded MIT-Portugal Program in Bioengineering (MIT-Pt/BS-BB/0082/2008); PhD grant from FCT (ref. SFRH/BD/22863/2005) to S.C.
dc.title.none.fl_str_mv Challenges in integrating Escherichia coli molecular biology data
dc.contributor.none.fl_str_mv Universidade do Minho
dc.contributor.author.fl_str_mv Lourenço, Anália
Carneiro, S.
Ferreira, Eugénio C.
Rocha, I.
Rocha, Miguel
dc.subject.por.fl_str_mv Molecular biology
Data integration
Data standardization
Data interoperability
Semantic heterogeneity
Science & Technology
publishDate 2011
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
dc.identifier.uri.fl_str_mv https://hdl.handle.net/1822/14330
dc.language.iso.fl_str_mv eng
dc.relation.none.fl_str_mv Lourenco, A., Carneiro, S., Rocha, M., Ferreira, E. C., & Rocha, I. (2010, November 7). Challenges in integrating Escherichia coli molecular biology data. Briefings in Bioinformatics. Oxford University Press (OUP). http://doi.org/10.1093/bib/bbq067
ISSN: 1467-5463
DOI: 10.1093/bib/bbq067
PMID: 21059604
http://bib.oxfordjournals.org/content/early/2010/11/04/bib.bbq067.short
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Oxford University Press
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP