Classifying and discovering genomic sequences in metagenomic repositories

Bibliographic details
Main author: Silva, Jorge Miguel
Publication date: 2023
Other authors: Almeida, João Rafael, Oliveira, José Luís
Document type: Article
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://hdl.handle.net/10773/37778
Abstract: The taxonomic and functional composition of microbial communities from environmental, agricultural, and therapeutic settings is increasingly being studied using metagenomic methodologies in large-scale genomic applications. This has led to exponential growth in the field and has had an impact on healthcare, pharmacology and biotechnology. However, with current methodologies, it is sometimes difficult to identify an organism conclusively. In addition, the growth of the metagenomic field has produced large amounts of data held by different hosts, which characterize their data differently and so make analysis difficult. Correct data aggregation and classification therefore facilitate the discovery of repositories of interest. This paper tackles these issues by proposing a methodology for organism identification, data aggregation, and content characterization, visualization and selection. We propose a three-step pipeline for organism identification that uses compression-based metrics, an aggregation mechanism for content characterization, and a web database catalogue for data exposition and visualization.
id RCAP_843daf70fc6127a6ea35865c7dcb3601
oai_identifier_str oai:ria.ua.pt:10773/37778
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
dc.title.none.fl_str_mv Classifying and discovering genomic sequences in metagenomic repositories
dc.contributor.author.fl_str_mv Silva, Jorge Miguel
Almeida, João Rafael
Oliveira, José Luís
dc.subject.por.fl_str_mv Taxonomic Classification
Organism Identification
Compression
Web Portal
Data Aggregation
Genomic Catalogue
publishDate 2023
dc.date.none.fl_str_mv 2023-05-19T09:06:52Z
2023-01-01T00:00:00Z
2023
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10773/37778
dc.language.iso.fl_str_mv eng
dc.relation.none.fl_str_mv ISSN: 1877-0509
DOI: 10.1016/j.procs.2023.01.441
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Elsevier
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP