Challenges in integrating Escherichia coli molecular biology data
Main author: | Lourenço, Anália |
---|---|
Publication date: | 2011 |
Other authors: | Carneiro, S.; Ferreira, Eugénio C.; Rocha, I.; Rocha, Miguel |
Document type: | Article |
Language: | English |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
Full text: | https://hdl.handle.net/1822/14330 |
Abstract: | One key challenge in Systems Biology is to provide mechanisms to collect and integrate the data needed to meet multiple analysis requirements. Typically, biological contents are scattered over multiple data sources, and there is no easy way of comparing heterogeneous data contents. This work discusses ongoing standardisation and interoperability efforts and exposes integration challenges for the model organism Escherichia coli K-12. The goal is to analyse the major obstacles faced by integration processes, suggest ways to systematically identify them and, whenever possible, propose solutions or means to assist manual curation. Integration of gene, protein and compound data was evaluated by performing comparisons over EcoCyc, KEGG, BRENDA, ChEBI, Entrez Gene and UniProt contents. Cross-links, a number of standard nomenclatures and name information supported the comparisons. Except in the gene integration scenario, no single element of integration performed well enough to support the process by itself. Indeed, the integration of both enzyme and compound records requires considerable curation. The results show that, even for a well-studied model organism, source contents are still far from being as standardised as would be desirable, and metadata vary considerably from source to source. Before designing any data integration pipeline, researchers should decide on the sources that best fit the purpose of the analysis and be aware of existing conflicts and inconsistencies so that they can intervene in their resolution. Moreover, they should be aware of the limits of automatic integration so that they can define the extent of manual curation necessary for each application. |
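The abstract states that cross-links, standard nomenclatures and name information supported the comparisons, and that the limits of automatic integration determine how much manual curation is needed. The Python sketch below illustrates only that general idea and is not the authors' pipeline. It assumes a hypothetical record layout (an `id`, a `name` and a set of cross-reference identifiers `xrefs`) rather than the real EcoCyc, KEGG, BRENDA, ChEBI, Entrez Gene or UniProt formats. It tries cross-links first, falls back to a crude name normalisation, and sends anything unmatched or ambiguous to a curation list.

```python
import re
import unicodedata
from collections import defaultdict


def normalise_name(name: str) -> str:
    """Crude name key: NFKC-normalise, casefold, drop spaces and common punctuation.

    Greek letters and digits are kept, so alpha and beta forms stay distinct.
    """
    text = unicodedata.normalize("NFKC", name).casefold()
    return re.sub(r"[\s\-_,()\[\]']", "", text)


def match_records(source_a, source_b):
    """Pair each record of source_a with at most one record of source_b.

    Records are dicts with hypothetical keys "id", "name" and "xrefs" (a set of
    cross-reference identifiers). Cross-links are tried first; normalised names
    are only a fallback. Returns (matched, curate): matched maps an id in
    source_a to (id in source_b, evidence type), and curate lists ids with zero
    or several candidates, which are left for manual curation.
    """
    # Index source_b by every cross-reference and by its normalised name.
    by_xref = defaultdict(set)
    by_name = defaultdict(set)
    for rec in source_b:
        for xref in rec["xrefs"]:
            by_xref[xref].add(rec["id"])
        by_name[normalise_name(rec["name"])].add(rec["id"])

    matched, curate = {}, []
    for rec in source_a:
        # Union of all source_b records sharing any cross-reference.
        hits = set()
        for xref in rec["xrefs"]:
            hits |= by_xref.get(xref, set())
        evidence = "xref"
        if not hits:
            # No cross-link support: fall back to name comparison.
            hits = by_name.get(normalise_name(rec["name"]), set())
            evidence = "name"
        if len(hits) == 1:
            matched[rec["id"]] = (next(iter(hits)), evidence)
        else:
            curate.append(rec["id"])  # zero or several candidates
    return matched, curate


if __name__ == "__main__":
    # Toy records with made-up identifiers, not real database entries.
    source_a = [
        {"id": "A1", "name": "thrA", "xrefs": {"X:1"}},
        {"id": "A2", "name": "L-Threonine", "xrefs": set()},
        {"id": "A3", "name": "unknown compound", "xrefs": set()},
    ]
    source_b = [
        {"id": "B1", "name": "thrA", "xrefs": {"X:1"}},
        {"id": "B2", "name": "L-threonine", "xrefs": set()},
    ]
    matched, curate = match_records(source_a, source_b)
    print(matched)  # {'A1': ('B1', 'xref'), 'A2': ('B2', 'name')}
    print(curate)   # ['A3']
```

Recording the evidence type (`xref` or `name`) for every match is a deliberate choice. Name-based matches are the weaker signal, so a curator can review them separately instead of accepting them outright.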
Funding: | SYSINBIO, a European Coordination and Support Action (call FP7-KBBE-2007-1) in the field of model-driven metabolic engineering; MIT-Portugal Program in Bioengineering (MIT-Pt/BS-BB/0082/2008), funded by the Portuguese FCT (Fundação para a Ciência e Tecnologia); PhD grant from FCT (ref. SFRH/BD/22863/2005) to S.C. |
Contributor: | Universidade do Minho |
Authors: | Lourenço, Anália; Carneiro, S.; Ferreira, Eugénio C.; Rocha, I.; Rocha, Miguel |
Keywords: | Molecular biology; Data integration; Data standardization; Data interoperability; Semantic heterogeneity; Science & Technology |
Version: | Published version |
Citation: | Lourenço, A., Carneiro, S., Rocha, M., Ferreira, E. C., & Rocha, I. (2010, November 7). Challenges in integrating Escherichia coli molecular biology data. Briefings in Bioinformatics. Oxford University Press (OUP). http://doi.org/10.1093/bib/bbq067 |
ISSN: | 1467-5463 |
DOI: | 10.1093/bib/bbq067 |
PubMed ID: | 21059604 |
Publisher page: | http://bib.oxfordjournals.org/content/early/2010/11/04/bib.bbq067.short |
Access rights: | Open access |
Format: | application/pdf |
Publisher: | Oxford University Press |
Repository institution: | Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação (RCAAP) |