Classifying and discovering genomic sequences in metagenomic repositories

Silva, Jorge Miguel; Almeida, João Rafael; Oliveira, José Luís

Bibliographic details
Main author:	Silva, Jorge Miguel
Publication date:	2023
Other authors:	Almeida, João Rafael; Oliveira, José Luís
Document type:	Article
Language:	eng
Source title:	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text:	http://hdl.handle.net/10773/37778
Abstract:	Metagenomic methodologies are increasingly used in large-scale genomic applications to study the taxonomic and functional composition of microbial communities from environmental, agricultural, and therapeutic settings. This has led to exponential growth in the field and has affected healthcare, pharmacology and biotechnology. However, current methodologies sometimes fail to identify an organism conclusively. In addition, the growth of the metagenomic field has produced large amounts of data held by different hosts, which characterize their data differently and so make analysis difficult. Correct data aggregation and classification therefore make it easier to discover repositories of interest. This paper tackles these issues by proposing a methodology for organism identification, data aggregation, and content characterization, visualization and selection. We propose a three-step pipeline for organism identification that uses compression-based metrics, an aggregation mechanism for content characterization, and a web database catalogue for data exposition and visualization.
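
This record does not say which compression-based metrics the pipeline uses. Purely as an illustration of the general idea, the sketch below computes the Normalized Compression Distance (NCD), a well-known compression-based similarity measure, with zlib standing in for a DNA-specific compressor. The function names and the notion of picking the closest reference are assumptions made for this example and are not taken from the paper.

```python
import zlib
from typing import Mapping


def compressed_size(data: bytes) -> int:
    """Size in bytes of the zlib-compressed input (assumed stand-in compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance:
    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    where C(.) is the compressed size. Lower values mean more similar inputs.
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def best_match(sample: bytes, references: Mapping[str, bytes]) -> str:
    """Name of the reference sequence with the smallest NCD to the sample
    (hypothetical helper, for illustration only)."""
    return min(references, key=lambda name: ncd(sample, references[name]))
```

A general-purpose compressor such as zlib is a weak model of DNA, so this only conveys the shape of the computation; a real pipeline would likely rely on a compressor designed for genomic data.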

Item metadata

id	RCAP_843daf70fc6127a6ea35865c7dcb3601
oai_identifier_str	oai:ria.ua.pt:10773/37778
network_acronym_str	RCAP
network_name_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str	7160
dc.title.none.fl_str_mv	Classifying and discovering genomic sequences in metagenomic repositories
author	Silva, Jorge Miguel
author_facet	Silva, Jorge Miguel; Almeida, João Rafael; Oliveira, José Luís
author_role	author
author2	Almeida, João Rafael; Oliveira, José Luís
author2_role	author; author
dc.contributor.author.fl_str_mv	Silva, Jorge Miguel; Almeida, João Rafael; Oliveira, José Luís
dc.subject.por.fl_str_mv	Taxonomic Classification; Organism Identification; Compression; Web Portal; Data Aggregation; Genomic Catalogue
publishDate	2023
dc.date.none.fl_str_mv	2023-05-19T09:06:52Z; 2023-01-01T00:00:00Z; 2023
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/article
format	article
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://hdl.handle.net/10773/37778
url	http://hdl.handle.net/10773/37778
dc.language.iso.fl_str_mv	eng
language	eng
dc.relation.none.fl_str_mv	1877-0509; 10.1016/j.procs.2023.01.441
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.publisher.none.fl_str_mv	Elsevier
publisher.none.fl_str_mv	Elsevier
dc.source.none.fl_str_mv	reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos); instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron:RCAAP
instname_str	Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str	RCAAP
institution	RCAAP
reponame_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_	1799137736503853056
