Classifying and discovering genomic sequences in metagenomic repositories
Main author: | Silva, Jorge Miguel |
---|---|
Publication date: | 2023 |
Other authors: | Almeida, João Rafael; Oliveira, José Luís |
Document type: | Article |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | http://hdl.handle.net/10773/37778 |
Abstract: | The taxonomic and functional composition of microbial communities from environmental, agricultural, and therapeutic settings is increasingly being studied using metagenomic methodologies in large-scale genomic applications. This has led to exponential growth in the field and has impacted on healthcare, pharmacology and biotechnology. However, with the current methodologies, it is sometimes difficult to obtain conclusive identification of an organism. In addition, the growth of the metagenomic field has led to the creation of large amounts of data held by different hosts, which characterize data differently and make analysis difficult. Therefore, correct data aggregation and classification improve and facilitate the discovery of repositories of interest. This paper tackles these issues by proposing a methodology for organism identification, data aggregation and content characterization, visualization and selection. We propose a three-step pipeline for organism identification that uses compression-based metrics, an aggregation mechanism for content characterization, and a web database catalogue for data exposition and visualization. |
id |
RCAP_843daf70fc6127a6ea35865c7dcb3601 |
---|---|
oai_identifier_str |
oai:ria.ua.pt:10773/37778 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
dc.title.none.fl_str_mv |
Classifying and discovering genomic sequences in metagenomic repositories |
author |
Silva, Jorge Miguel |
author_facet |
Silva, Jorge Miguel; Almeida, João Rafael; Oliveira, José Luís |
author_role |
author |
author2 |
Almeida, João Rafael; Oliveira, José Luís |
author2_role |
author; author |
dc.contributor.author.fl_str_mv |
Silva, Jorge Miguel; Almeida, João Rafael; Oliveira, José Luís |
dc.subject.por.fl_str_mv |
Taxonomic Classification; Organism Identification; Compression; Web Portal; Data Aggregation; Genomic Catalogue |
topic |
Taxonomic Classification; Organism Identification; Compression; Web Portal; Data Aggregation; Genomic Catalogue |
publishDate |
2023 |
dc.date.none.fl_str_mv |
2023-05-19T09:06:52Z; 2023-01-01T00:00:00Z; 2023 |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10773/37778 |
url |
http://hdl.handle.net/10773/37778 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
1877-0509; 10.1016/j.procs.2023.01.441 |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.publisher.none.fl_str_mv |
Elsevier |
publisher.none.fl_str_mv |
Elsevier |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos); instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv |
|
_version_ |
1799137736503853056 |